Posts Tagged ‘javascript’

View the generated HTML source in Firefox

July 20, 2016

To view the generated HTML source in Firefox, use this bookmarklet:

javascript: var win = window.open(); win.document.write('<html><head><title>Generated HTML of ' + location.href + '</title></head><body><pre>' + document.documentElement.innerHTML.replace(/&/g, '&amp;').replace(/</g, '&lt;') + '</pre></body></html>'); win.document.close(); void 0;

This tip is from here.
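
The escaping step of the bookmarklet can be pulled out into a small helper, which makes it easier to see why the order of the replacements matters. This is only a minimal sketch and not part of the original tip: the name `escapeHtml` is made up for illustration, and it also escapes `>`, which the bookmarklet leaves alone.

```javascript
// Hypothetical helper (the name is an assumption, not from the original tip).
// Escapes the characters that would otherwise be interpreted as markup
// when the page source is written into a <pre> element.
function escapeHtml(source) {
  return source
    .replace(/&/g, '&amp;')  // must come first, or the entities below get double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');  // optional inside <pre>, added here for symmetry
}

console.log(escapeHtml('<p class="x">Tom & Jerry</p>'));
// prints: &lt;p class="x"&gt;Tom &amp; Jerry&lt;/p&gt;
```

If the ampersand were replaced last, the `&` of every freshly created `&lt;` would be escaped a second time and the source would show up as `&amp;lt;`.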

Scraping AJAX web pages (Part 5.5)

July 13, 2016

Don’t forget to check out the rest of the series too!

This post is very similar to the previous one (Part 5), which scraped a webpage using PhantomJS from the command line and sent the output to stdout.

This time we use PhantomJS again, but we drive it from a Python script through Selenium. The generated HTML source will be available in a variable. Here is the source:

#!/usr/bin/env python3
# encoding: utf-8

"""
required packages:
* selenium
optional packages:
* bs4
* lxml
"""

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# from bs4 import BeautifulSoup

url = "http://simile.mit.edu/crowbar/test.html"

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get(url)
html = driver.page_source
print(html)
# soup = BeautifulSoup(driver.page_source, "lxml") #page_source fetches page after rendering is complete
# driver.save_screenshot('screen.png') # save a screenshot to disk

driver.quit()

The script sets the user agent (optional but recommended). The generated source is stored in the variable html. The two commented-out lines before driver.quit() also work. If you uncomment the first one (together with the BeautifulSoup import at the top), you can feed the source to BeautifulSoup and extract parts of the HTML. If you uncomment the second one, a screenshot of the webpage is saved to disk.

Scraping AJAX web pages (Part 5)

July 13, 2016

Don’t forget to check out the rest of the series too!

I’ve already written about PhantomJS, for instance here. Recall: PhantomJS is a headless WebKit scriptable with a JavaScript API.

The problem is still the same: we have a webpage that contains lots of JavaScript code and we want to get the final HTML that is produced after the JavaScript code has been executed.

Example: http://simile.mit.edu/crowbar/test.html. If you download it with “wget” for instance, you get the text “Hi lame crawler” in the source. However, in the browser some JavaScript code changes this text to “Hi Crowbar!”, and we want to get this generated source. How?

This time we’ll use PhantomJS. We also need a JavaScript file that tells PhantomJS what to do. Let’s call it printSource.js:

var system = require('system');
var page   = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url    = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
  // page.content is the source
  console.log(page.content);
  // call phantom.exit(), otherwise PhantomJS keeps running and the command hangs
  phantom.exit();
});

Note that this code comes from here.

If you want to set the user agent, use this modified script:

var system = require('system');
var page   = require('webpage').create();
page.settings.userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)';
// system.args[0] is the filename, so system.args[1] is the first real argument
var url    = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
  // page.content is the source
  console.log(page.content);
  // call phantom.exit(), otherwise PhantomJS keeps running and the command hangs
  phantom.exit();
});

Then launch the following command:

$ phantomjs printSource.js http://simile.mit.edu/crowbar/test.html

The output is printed to the standard output.
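
The scripts above assume that a URL was passed and that the page loads fine. Here is a slightly more defensive sketch of the same idea. It is my own variation, not the code from the linked source: the helper `urlFromArgs` is a made-up name, and the error messages are simply written with console.log, so they end up on the standard output as well. It relies on the callback of page.open receiving a status string that is 'success' when the page was loaded.

```javascript
// Hypothetical helper: pick the URL out of the argument list.
// args[0] is the script name, so the URL is expected at args[1].
function urlFromArgs(args) {
  return args.length > 1 ? args[1] : null;
}

// The `phantom` global only exists inside PhantomJS, so this part is
// skipped when the file is loaded anywhere else.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = urlFromArgs(system.args);

  if (url === null) {
    console.log('Usage: phantomjs printSource.js <url>');
    phantom.exit(1);
  } else {
    page.open(url, function (status) {
      if (status !== 'success') {
        // the page could not be loaded, so there is no source to print
        console.log('Failed to load ' + url);
        phantom.exit(1);
      } else {
        console.log(page.content);
        phantom.exit(0);
      }
    });
  }
}
```

The non-zero exit code lets a calling shell script tell a failed download apart from a successful one.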

If you want to do the same thing from a Python script, check out Part 5.5 of the series.

Categories: bash

AutoFocus

November 28, 2015

Problem
You visit some websites quite often but the focus is not put on the input field, so you need to click there each time. Google puts the focus on the input field when you want to do a query. Why can’t other sites do the same?

Solution
I got fed up so I wrote a Greasemonkey script that does the autofocus job for me. Thus, after opening such a site, I can type immediately.

You can find the script here: https://github.com/jabbalaci/AutoFocus . The script is very simple and can be customized easily. Currently it contains 2 rules: one for Wikipedia, and one for IMDb.
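
To give an idea of how such a rule-based userscript can work, here is a minimal sketch. It is not the code from the repository: the `RULES` table, the `findSelector` helper and both CSS selectors are assumptions made up for illustration, so inspect the actual pages before relying on them.

```javascript
// Hypothetical rule table: host -> CSS selector of the input field to focus.
// Both selectors are assumptions; check the real pages to confirm them.
const RULES = [
  { host: 'wikipedia.org', selector: '#searchInput' },
  { host: 'imdb.com', selector: 'input[type="text"]' },
];

// Return the selector of the first rule whose host matches the given
// hostname exactly or as a parent domain, or null if no rule applies.
function findSelector(hostname, rules) {
  for (const rule of rules) {
    if (hostname === rule.host || hostname.endsWith('.' + rule.host)) {
      return rule.selector;
    }
  }
  return null;
}

// Only touch the DOM when running in a browser (e.g. as a userscript).
if (typeof document !== 'undefined' && typeof location !== 'undefined') {
  const selector = findSelector(location.hostname, RULES);
  const field = selector ? document.querySelector(selector) : null;
  if (field) {
    field.focus();
  }
}
```

With this layout, adding a new site means adding one more entry to the table.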

HTML / CSS / JavaScript video tutorials

November 2, 2013

Check out http://www.developphp.com/ for HTML / CSS / JavaScript video tutorials. As far as I can tell, they are freely available!

Categories: Uncategorized

Disable Enter in HTML forms

November 29, 2010

Problem

You have an HTML form with several submit buttons. Each button sets some variables that you need for processing the form correctly. You want to prevent the form from being submitted with Enter, so that it can only be sent through the buttons.

Solution

<script type="text/javascript">

function stopRKey(evt) {
  // fall back to window.event for old versions of Internet Explorer
  evt = evt || window.event;
  var node = evt.target || evt.srcElement;
  // cancel Enter (key code 13) when it is pressed in a text input
  if (evt.keyCode == 13 && node && node.type == "text") { return false; }
}

document.onkeypress = stopRKey;

</script> 

This tip is from here.
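
On current browsers the same idea can be expressed with addEventListener and event.key. This is a sketch under assumptions and not the code from the linked tip: the helper name `shouldBlockEnter` is invented here, the check is limited to text inputs just like in the original, and it assumes that cancelling the keydown event is enough to stop the implicit form submission.

```javascript
// Hypothetical helper: decide whether a key event is an Enter press
// inside a single-line text input.
function shouldBlockEnter(evt) {
  const node = evt.target;
  return evt.key === 'Enter' && Boolean(node) && node.type === 'text';
}

// Browser-only wiring: cancel the default action of the key press.
if (typeof document !== 'undefined') {
  document.addEventListener('keydown', function (evt) {
    if (shouldBlockEnter(evt)) {
      evt.preventDefault();
    }
  });
}
```

Keeping the decision in a separate function means it can be tried out with plain objects, for example `shouldBlockEnter({ key: 'Enter', target: { type: 'text' } })`, without opening a browser.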

Categories: html