
Convert your wordpress blog to a PDF book

Problem

You have a WordPress blog (.com or .org), and you would like to convert the whole blog to a PDF book. That is, convert every post to PDF and then join the pieces. The final result should be a single PDF file, like a book.

Related work

LJBook offers an easy and free solution: just upload your exported blog, and it generates a single PDF from it.
However, it didn't work for me. My blog contains a lot of source code, and unfortunately LJBook does not render those code blocks correctly. So I had to find another solution.

My solution

Here is a sample PDF of my whole blog (up to March 6, 2011). With my method, you can generate similar output.

The current version of the script (written in Python) is available here.

Steps to follow:

  • Download the script above and put it in a directory. In this directory, create a subdirectory called “pieces”. The script downloads the HTML files here, and the PDF outputs are also stored in this subdirectory.
  • Customize the beginning of the script: blog name, username, password, etc.
  • The HTML to PDF conversion is done with wkhtmltopdf. Here you will find more info about this tool and how to get it. Download it and store the binary at /opt/wkhtmltopdf/wkhtmltopdf-i386.
  • Optional: disable the sidebar on your WordPress blog. I don’t think you want to see the sidebar on every page of the PDF book :) Refer to this post to find out how to hide the sidebar.
  • Now that everything is set up, you can launch the script. If everything works, the script downloads each public post on your blog and converts it to PDF. Warning! When you launch the script, it deletes all *.html and *.pdf files in the directory “pieces”!
  • Once you have all the PDFs, enter the directory “pieces” and join the PDFs: “pdftk *.pdf cat output book.pdf”. If you don’t have pdftk, install it (sudo apt-get install pdftk).
  • When you are done, don’t forget to re-enable the sidebar on your blog.
  • You might want to edit the final PDF: it will almost certainly contain some empty pages, which you can remove with a PDF editor.
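The conversion and merge steps above can be sketched in Python. This is not the author's script (which also logs in and downloads the posts); it is a hypothetical minimal sketch that assumes the posts have already been saved as pieces/0001.html, pieces/0002.html, and so on, in reading order. The zero-padded names are an assumption that matters: both the shell glob in “pdftk *.pdf” and Python's sorted() order file names alphabetically, so 10.pdf would otherwise land before 2.pdf. The function names clean_pieces, pdf_path_for, html_to_pdf_cmd, merge_cmd and build_book are illustrative only.

```python
import glob
import os
import subprocess

# Assumed locations: the wkhtmltopdf path is the one given in the post;
# "pdftk" is assumed to be on the PATH (e.g. installed via apt-get).
PIECES_DIR = "pieces"
WKHTMLTOPDF = "/opt/wkhtmltopdf/wkhtmltopdf-i386"
PDFTK = "pdftk"


def clean_pieces(directory=PIECES_DIR):
    """Delete old *.html and *.pdf files; call this BEFORE downloading posts."""
    for pattern in ("*.html", "*.pdf"):
        for path in glob.glob(os.path.join(directory, pattern)):
            os.remove(path)


def pdf_path_for(html_path):
    """Map pieces/0001.html to pieces/0001.pdf."""
    return os.path.splitext(html_path)[0] + ".pdf"


def html_to_pdf_cmd(html_path, wkhtmltopdf=WKHTMLTOPDF):
    """Command line 'wkhtmltopdf <input.html> <output.pdf>' for one post."""
    return [wkhtmltopdf, html_path, pdf_path_for(html_path)]


def merge_cmd(pdf_paths, output="book.pdf", pdftk=PDFTK):
    """Command line 'pdftk in1.pdf in2.pdf ... cat output book.pdf'."""
    return [pdftk, *pdf_paths, "cat", "output", output]


def build_book(directory=PIECES_DIR):
    """Convert every saved post to PDF, then merge the PDFs in file-name order.

    Returns the path of the merged book, or None if there is nothing to convert.
    """
    html_files = sorted(glob.glob(os.path.join(directory, "*.html")))
    if not html_files:
        return None
    for html in html_files:
        # check=True stops on the first failed conversion. wkhtmltopdf may exit
        # non-zero when some page resources fail to load; relax this if needed.
        subprocess.run(html_to_pdf_cmd(html), check=True)
    book = os.path.join(directory, "book.pdf")
    subprocess.run(merge_cmd([pdf_path_for(h) for h in html_files], book),
                   check=True)
    return book
```

After the HTML files are saved, calling build_book() from the script's directory runs wkhtmltopdf once per post and pdftk once, leaving pieces/book.pdf. Passing each command as a list instead of a shell string avoids quoting problems with unusual file names.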
  1. April 20, 2011 at 23:35

    Wow :) Me gonna try this for my blog ;)

    • April 20, 2011 at 23:39

      Let me know in a comment if your PDF is ready to be downloaded :)

  2. Dev
    May 11, 2011 at 10:47

    6.63MB only and yet I already have a copy of my whole blog? I’ll take that any day. This is purely gold if you ask me, so thanks a lot for bringing this up and sharing it with all of us. It’s much appreciated Jabba Laci! I’ve slow internet connection so downloading it from dropbox isn’t so fast as you’ve thought, but I’m dead excited to see it!

    ETA: Wow, finished downloading and I must say it looks perfect! The links are even retained too. SWEET!

    Regards,
    Dev from Convert from JPG to PDF

    • May 11, 2011 at 10:55

      I’m glad you like it :) Keep in mind that merging the pieces with pdftk results in a huge PDF file. I also tried Adobe Acrobat 8 Professional, and it produced a much smaller PDF than pdftk. However, AA8P is a Windows application and is not free…

  3. November 13, 2012 at 17:18

    has anyone tried blogbooker to export blogger, wordpress into pdf book?

  4. June 20, 2013 at 02:02

    It appears python2.7 doesn’t like this for some reason. Do you have any ideas? Thanks,

