Archive for December, 2012

2012 in review

December 31, 2012

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

About 55,000 tourists visit Liechtenstein every year. This blog was viewed about 170,000 times in 2012. If it were Liechtenstein, it would take about 3 years for that many people to see it. Your blog had more visits than a small country in Europe!

Click here to see the complete report.

Categories: Uncategorized

Free online polls

December 27, 2012 1 comment
Categories: Uncategorized

Scraping AJAX web pages

December 27, 2012
Categories: Uncategorized

Scraping AJAX web pages (Part 4)

December 27, 2012 8 comments

Don’t forget to check out the rest of the series too!

I managed to solve a problem that had bugged me for a long time. Namely: (1) I want to download the generated source of an AJAX-powered webpage; (2) I want a headless solution, i.e. no browser window; and (3) I want to wait until the AJAX content is fully loaded.

During the past 1.5 years I got quite close :) I could solve everything except issue #3. Now I’m proud to present a complete solution that satisfies all the criteria above.

#!/usr/bin/env python

import sys

from PySide.QtCore import QTimer, QUrl
from PySide.QtGui import QApplication
from PySide.QtWebKit import QWebPage

SEC = 1000 # 1 sec. is 1000 msec.
USER_AGENT = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0'

class JabbaWebkit(QWebPage):
    # 'html' is a class variable
    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''

        if wait:
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            self.loadFinished.connect(app.quit)

        self.mainFrame().load(QUrl(url))

    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()

    def userAgentForUrl(self, url):
        return USER_AGENT

def get_page(url, wait=None):
    # Reuse an existing QApplication so that get_page() can be called several times
    app = QApplication.instance()
    if not app:  # create the QApplication only if it doesn't exist yet
        app = QApplication(sys.argv)
    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html

#############################################################################

if __name__ == "__main__":
    url = 'http://simile.mit.edu/crowbar/test.html'
    print(get_page(url))

It’s also on GitHub. The GitHub version contains more documentation and more examples.
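
A fixed wait is a guess: too short and the AJAX content is still missing, too long and every call is slow. Here is a minimal sketch of one way around that, assuming you know a piece of text (the marker) that only appears once the AJAX content is in the page: retry with growing waits until the marker shows up. The helper name fetch_when_ready and its parameters are hypothetical and not part of Jabba-Webkit; fetch stands for any function that takes (url, wait) like get_page above.

```python
def fetch_when_ready(fetch, url, marker, waits=(1, 3, 5)):
    """Call fetch(url, wait) with growing waits until marker appears in the HTML.

    Returns (html, wait_used). wait_used is None if the marker never showed up;
    in that case html is whatever the last attempt returned.
    """
    html = ''
    for wait in waits:
        html = fetch(url, wait)   # e.g. get_page(url, wait) from the script above
        if marker in html:        # the AJAX content has arrived
            return html, wait
    return html, None
```

With the script above, this would be called as fetch_when_ready(get_page, url, 'some text that the AJAX call inserts'). It relies on get_page being callable several times in one process, which is what the QApplication.instance() trick is meant to allow.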

Update (20121228)
Jabba-Webkit got included in Pycoder’s Weekly #46. Awesome.

GitHub: create a new repository and start using it

December 27, 2012

Problem
You want to create a new GitHub repository and you want to use it right away, i.e. you want to upload some content.

GitHub used to show detailed step-by-step help for all this, but it got removed :(

Solution
On the main page of GitHub, there is a button called “New repository”. Click on it, fill out the fields and create the repo. Now it’s on GitHub.

The next step is to clone it on your local machine:

git clone git@github.com:username/project.git

Use the URL that starts with “git@github.com” here, not the one that starts with “https://”. I once cloned the “https://” URL, and then it kept asking for my username and password at each push :(
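
If you have already cloned with the “https://” URL, you don't have to clone again: git remote set-url origin followed by the new URL points the existing clone at the SSH address. The two forms differ only in how the host and the path are glued together, so the conversion can be sketched in a few lines of Python. The function name https_to_ssh is made up for illustration, and it assumes a URL of the form https://host/username/project, with or without the trailing .git.

```python
def https_to_ssh(url):
    """Turn https://host/username/project(.git) into git@host:username/project.git."""
    prefix = "https://"
    if not url.startswith(prefix):
        raise ValueError("expected an https:// URL, got %r" % url)
    # Split "host/username/project.git" into the host and the repository path
    host, _, path = url[len(prefix):].partition("/")
    path = path.strip("/")
    if not host or path.count("/") != 1:
        raise ValueError("expected https://host/username/project, got %r" % url)
    if not path.endswith(".git"):
        path += ".git"
    return "git@{0}:{1}".format(host, path)
```

The result is what you would pass to git remote set-url origin inside the existing clone.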

Now you can make your local changes and commit them (git add, then git commit). When ready, upload the commits to GitHub:

git push origin master

More info here.

Update (20131216)
If you need to upload your SSH key, follow this guide.

Free list of Elite proxy servers

December 27, 2012

Problem
You want to collect a list of free Elite proxies.

Solution
Currently I scrape these pages to maintain a list of Elite proxies:

It’s enough for me now. If my needs change in the future, I will update this list.

Wikipedia APIs for bots

December 18, 2012 1 comment
Categories: Uncategorized