Archive for the ‘python’ Category

Detailed Twitter info in JSON: an undocumented feature

October 24, 2016

Problem
Using a script, I wanted to find out how many followers I have on Twitter. Here is my (mostly abandoned) Twitter page: https://twitter.com/szathmar . I didn’t want to use the API because I didn’t want to register for an API key, so I went the easy way: let’s scrape the necessary data out :) Digging through the HTML code, I found the number of followers, but I also found a hidden treasure!

Solution
The hidden treasure is a long JSON string that contains all kinds of information about a Twitter user:

[screenshot: hidden_json2]

The screenshot shows just an extract; the JSON string is much longer. Fine, let’s get it!

#!/usr/bin/env python3
# coding: utf-8

import json
import readline  # imported for its side effect: line editing in input()
import sys

import requests
from bs4 import BeautifulSoup

def main():
    url = input("Full twitter URL: ")
    html = requests.get(url).text
    soup = BeautifulSoup(html, "lxml")

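    # the JSON string is stored in the value attribute of an input tag
    tag = soup.find('input', {'class': 'json-data'})
    if tag is None:
        sys.exit("Error: no input tag with class json-data was found.")
    d = json.loads(tag['value'])
    print(json.dumps(d, indent=4))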

    # followers = d['profile_user']['followers_count']
    # print(followers)

##############################################################################

if __name__ == "__main__":
    main()

If you want the number of followers, for instance, then uncomment the last two lines of main().
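The extraction step can also be tried offline. The following is a minimal sketch that uses only the standard library instead of requests and BeautifulSoup; the class JsonDataFinder, the function followers_count and the SAMPLE_HTML string are made up for illustration, and the sample page is an assumption about how the tag looks, not a copy of a real Twitter page.

```python
import json
from html.parser import HTMLParser


class JsonDataFinder(HTMLParser):
    """Collect the value attribute of the first <input class="json-data"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "input" and "json-data" in classes and self.value is None:
            self.value = attrs.get("value")


def followers_count(html):
    """Return profile_user.followers_count from the embedded JSON, or None."""
    finder = JsonDataFinder()
    finder.feed(html)
    if finder.value is None:
        return None
    data = json.loads(finder.value)
    return data.get("profile_user", {}).get("followers_count")


# Made-up sample page (assumption); a real profile page is much larger.
SAMPLE_HTML = (
    '<html><body><input type="hidden" class="json-data" '
    'value="{&quot;profile_user&quot;: {&quot;screen_name&quot;: &quot;szathmar&quot;, '
    '&quot;followers_count&quot;: 70}}"></body></html>'
)

if __name__ == "__main__":
    print(followers_count(SAMPLE_HTML))
```

The function returns None when the tag is missing, so a changed page layout does not end in a traceback.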

Thank you Twitter! It’s really nice of you to provide all these data in JSON!

Sample
The JSON that I could extract from my page is 743 lines long! Here is an extract of it:

...
"profile_image_url": "http://pbs.twimg.com/profile_images/459783802395430912/vcMT0CGX_normal.png",
"business_profile_state": "none",
"url": null,
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme6/bg.gif",
"screen_name": "szathmar",
"is_translator": false,
"friends_count": 123,
"followers_count": 70,
"profile_text_color": "333333",
"profile_link_color": "FF3300",
"translator_type": "none",
"profile_background_color": "709397",
...
Categories: python

[wordpress] using the old-style editor

October 16, 2016

Problem
WordPress.com introduced a new-style editor for writing posts a while ago. However, I really hate it; it’s unusable. How can I get back to the old-style editor?

Solution
By transforming the URL, you can get back to the old-style editor. For instance:

new style: https://wordpress.com/post/ubuntuincident.wordpress.com/5865
old style: https://ubuntuincident.wordpress.com/wp-admin/post.php?post=5865&action=edit

Let’s automate the task with Python:

#!/usr/bin/env python3
# coding: utf-8

import readline
import webbrowser

def main():
    url = input("New style URL: ")
    parts = url.split("/")
    new = "{0}//{1}/wp-admin/post.php?post={2}&action=edit".format(
        parts[0], parts[4], parts[-1]
    )
    print("Old style:", new)
    webbrowser.open_new_tab(new)

##############################################################################

if __name__ == "__main__":
    main()
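The same transformation can be written as a reusable function. This is only a sketch: the name old_style_url is made up, and it assumes that a new-style URL always has the path post/<site>/<post id>, as in the example above.

```python
from urllib.parse import urlsplit


def old_style_url(new_style_url):
    """Convert a new-style WordPress.com editor URL to the old-style one."""
    path_parts = urlsplit(new_style_url).path.strip("/").split("/")
    # expected path: post/<site>/<post id>
    if len(path_parts) != 3 or path_parts[0] != "post":
        raise ValueError("not a new-style editor URL: " + new_style_url)
    _, site, post_id = path_parts
    return "https://{0}/wp-admin/post.php?post={1}&action=edit".format(site, post_id)


if __name__ == "__main__":
    print(old_style_url("https://wordpress.com/post/ubuntuincident.wordpress.com/5865"))
```

Unlike the positional split("/") in the script, this version tolerates a trailing slash and rejects URLs that don’t match the expected pattern.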

Screenshots

New-style shit.

Old-style goodie.

purge a reddit account

August 2, 2016

Problem
You have a reddit account that you want to empty, i.e. delete all the posts and comments you have made.

Solution
Use Shreddit. It deletes a limited number of posts/comments per session, so you may have to re-run it several times. When it cannot remove anything more, it’s done.

Categories: python

Jinja2-like template for PHP

August 1, 2016

Problem
My primary language is Python. When I need to do a simple webpage or a REST API, I use Flask with its built-in Jinja2 template engine.

However, I started to work on a project with some friends and our UI developer chose PHP for the frontend. As I also want to contribute to the UI, I looked around the PHP template engines to see if there is something similar to Jinja2.

Solution
It turned out that Jinja2 was ported to PHP! It’s called Twig and it’s almost the same. So if you use Flask, Twig is a natural choice for PHP.

There are also several MVC frameworks for PHP but I don’t use any (yet?). I have a PHP file (the controller), and a corresponding HTML file (the view, i.e. the template). Let’s see a simple example:

index.php:

<?php
require_once 'vendor/twig/lib/Twig/Autoloader.php';
Twig_Autoloader::register();

$loader = new Twig_Loader_Filesystem('templates');
$twig = new Twig_Environment($loader, array(
    // 'cache' => 'compilation_cache',
));

$context = array(
    'name' => 'Twig',
);

echo $twig->render('index.html', $context);
?>

index.html:

Hello {{ name }}!

It will print the text “Hello Twig!” to the screen.

What happens? The index.php file is the controller. Here you collect all the data that you want to print in the resulting HTML output. The data is put in an associative array (PHP’s counterpart of a Python dictionary), which is then passed to the template file index.html.

You can enable the cache in the index.php file. In this case the view will be “compiled” to a PHP file, making it faster. However, during the development you’d better switch it off. As I noticed, when I change the source code, the cache is not always updated automatically. So if you enable the cache and change the source, don’t forget to purge the cache.

Project layout
My project structure looks like this:

.
├── compilation_cache
├── index.php
├── templates
│   └── index.html
└── vendor
    └── twig
        └── lib
            └── Twig
                ├── Autoloader.php
                └── ... (other files of the Twig template engine)

For security reasons, I think it’s a good idea to move the “vendor” folder somewhere that is not accessible via HTTP. That is, if your project is served from your “~/public_html” folder, move the “vendor” folder outside of “~/public_html”. I’m not sure, but the same may be true for the “compilation_cache” folder too.

Categories: php, python

Scraping AJAX web pages (Part 5.5)

July 13, 2016

Don’t forget to check out the rest of the series too!

This post is very similar to the previous one (Part 5), which scraped a webpage using PhantomJS from the command line and sent the output to stdout.

This time we use PhantomJS again, but we do it from a Python script and wrap Selenium around PhantomJS. The generated HTML source will be available in a variable. Here is the source:

#!/usr/bin/env python3
# encoding: utf-8

"""
required packages:
* selenium
optional packages:
* bs4
* lxml
"""

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# from bs4 import BeautifulSoup

url = "http://simile.mit.edu/crowbar/test.html"

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get(url)
html = driver.page_source
print(html)
# soup = BeautifulSoup(driver.page_source, "lxml") #page_source fetches page after rendering is complete
# driver.save_screenshot('screen.png') # save a screenshot to disk

driver.quit()

The script sets the user agent (optional but recommended). The rendered source is captured in the variable html. The two lines before driver.quit() are commented out, but they work. You could feed the source to BeautifulSoup and then extract parts of the HTML from it. If you uncomment the save_screenshot line, you get a screenshot of the webpage.

email notification from a script

June 15, 2016

Problem
I have a Digital Ocean VPS box where several scripts are running. Some of them run for a day. I would like to get an email notification when a particular script starts / ends, or when something happens.

In short: how to send an email from the command line?

Solution
First, do the necessary configuration to be able to send emails from the command line (more details here).

Sending email without a body:

mailx -s "subject" "to@email.com" < /dev/null 2>/dev/null

Sending email with a body:

echo 'this is the body of the email' | mailx -s "subject" "to@email.com" 2>/dev/null

I also made a Python wrapper for it that you can find here.
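To give an idea of what such a wrapper can look like, here is a minimal sketch. It is not the linked wrapper; the function names build_mailx_command and send_mail are made up, and it assumes that mailx is installed and configured as described above.

```python
import subprocess


def build_mailx_command(subject, to):
    """Return the argument list for: mailx -s <subject> <to>"""
    return ["mailx", "-s", subject, to]


def send_mail(subject, to, body=""):
    """Send an email with mailx; return True if mailx exited with status 0.

    The body is passed on stdin. An empty body plays the role of
    the '< /dev/null' redirection in the shell version.
    """
    result = subprocess.run(
        build_mailx_command(subject, to),
        input=body,
        text=True,                  # pass the body as str (Python 3.7+)
        stderr=subprocess.DEVNULL,  # same as 2>/dev/null
    )
    return result.returncode == 0
```

A script could then call send_mail("job finished", "to@email.com", "this is the body of the email") when it ends. Passing the arguments as a list avoids shell quoting problems in the subject.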

Categories: bash, python

[vim] run current file with Python

Problem
You use (neo)vim for editing your Python code and you want to execute the source code in your editor. The output of the script should appear in the editor.

Solution
I came up with a dynamic solution, i.e. the interpreter is taken from the first line of the code. If you specified “#!/usr/bin/env python2”, then python2 is used; if you have “#!/usr/bin/env python3”, then python3 is used.

But what if you use Anaconda and you have for instance “#!/opt/anaconda3/bin/python3” in the first line? Then simply this interpreter is used.

Here is the snippet from my config file:

" run python script {{{
    function! RunWithPython()
        let first = getline(1)
        let first = substitute(first, "^#!", "", "")
        let first = substitute(first, "\n", "", "")
        let exe = ""    " the Python binary to call

        if first =~ "/usr/bin/env "
            let exe = split(first)[-1]
        elseif first == "/opt/anaconda3/bin/python3"
            let exe = first
        endif
        if exe == ""
            echo "Error: unknown Python interpreter in the first line."
            return
        endif
        " echo exe
        echo system(exe . " " . shellescape(expand('%')))
    endfunction

    au FileType python nnoremap <buffer> <F9> :call RunWithPython()<cr>
" }}}

If you want to use Anaconda with a different path, then simply customize the elseif line (line 10 of the snippet).
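For readers who are more at home in Python than in Vimscript, the interpreter selection can be sketched like this. The function name interpreter_from_shebang and the known_paths parameter are made up for illustration, and the sketch is a bit stricter than the Vim function: it requires the line to start with “#!” and the env form to start the command.

```python
def interpreter_from_shebang(first_line, known_paths=("/opt/anaconda3/bin/python3",)):
    """Return the interpreter named in a shebang line, or None if unknown."""
    if not first_line.startswith("#!"):
        return None
    command = first_line[2:].strip()
    if command.startswith("/usr/bin/env "):
        # "#!/usr/bin/env python3" -> "python3"
        return command.split()[-1]
    if command in known_paths:
        # an absolute path that we explicitly trust, e.g. Anaconda
        return command
    return None


if __name__ == "__main__":
    print(interpreter_from_shebang("#!/usr/bin/env python3"))
```

Adding another interpreter then means adding its path to known_paths, which corresponds to customizing the elseif line in the Vim snippet.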

Categories: python, vim