Archive

Posts Tagged ‘download’

How to download files in parallel?

October 10, 2017

Problem
So far I’ve mostly used wget to download files. You can collect the URLs in a file (one URL per line) and pass it to wget: “wget -i list.txt”. However, wget fetches them one by one, which can be time-consuming. Is there a way to parallelize the downloads?

Solution
Use aria2 for this purpose (its command-line tool is called aria2c). It’s similar to wget, but it downloads several files in parallel. The number of simultaneous downloads has a default value, and you can change it with the -j option. Its basic usage is the same:

aria2c -i list.txt
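For example, to raise the number of parallel downloads, pass -j (short for --max-concurrent-downloads, whose default is 5). A minimal sketch; the URLs below are placeholders, and the aria2c call is skipped if aria2 isn’t installed:

```shell
# Collect the URLs in list.txt, one per line (placeholder hosts).
printf '%s\n' \
    'http://example.com/a.iso' \
    'http://example.com/b.iso' > list.txt

# -j (--max-concurrent-downloads) sets how many files are fetched
# at once; here we raise it from the default of 5 to 8.
if command -v aria2c >/dev/null; then
    aria2c -j 8 -i list.txt
fi
```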
Categories: bash

download tube videos

November 14, 2014

http://www.tubeoffline.com/

They support a lot of tube sites.

Categories: Uncategorized

mirror a website with wget

June 15, 2014

Problem
I want to crawl a website recursively and download it for local use.

Solution

wget -c --mirror -p --html-extension --convert-links --no-parent --reject "index.html*" $url

The options:

  • -c: continue (if you stop the process with CTRL+C and relaunch it, it will resume where it left off)
  • --mirror: turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings
  • -p: get all images etc. needed to display the HTML pages
  • --html-extension: save HTML documents with .html extensions
  • --convert-links: make links in downloaded HTML point to local files
  • --no-parent: never ascend to the parent directory when retrieving recursively; this guarantees that only files below a certain hierarchy will be downloaded
  • --reject "index.html*": don’t download index.html* files

Credits
Tips from here.

Get URLs only
If you want to spider a website and get the URLs only, check this post out. In short:

wget --spider --force-html -r -l2 $url 2>&1 | grep '^--' | awk '{ print $3 }'

Here -l2 sets the maximum recursion depth to 2; adjust this value as needed.
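The grep/awk stage of that pipeline can be sanity-checked offline against a sample log line. This assumes a recent wget whose log lines put the URL in the third whitespace-separated field (older builds format the timestamp differently, which would shift the field number):

```shell
# A wget log line as produced while spidering (a sample, not fetched):
line='--2014-06-15 10:00:00--  http://example.com/page.html'

# Same extraction as in the pipeline above: keep lines starting
# with '--', then print the third field (the URL).
printf '%s\n' "$line" | grep '^--' | awk '{ print $3 }'
```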

Categories: bash

How to download an entire website for off-line reading?

November 30, 2012

Problem

You want to download an entire website (e.g. a blog) for offline reading.

Solution

wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

Tip from here.

Categories: bash

Firefox: get rid of that annoying download pop-up window

May 13, 2012
Categories: firefox

Download all issues of Full Circle Magazine

January 27, 2011

Problem

You want to get all the issues of Full Circle Magazine but you don’t want to download ’em one by one. Is there an easy and painless way to get them in a bundle?

Solution

Here are the necessary URLs up to issue 47:

http://dl.fullcirclemagazine.org/issue0_en.pdf
http://dl.fullcirclemagazine.org/issue1_en.pdf
http://dl.fullcirclemagazine.org/issue2_en.pdf
http://dl.fullcirclemagazine.org/issue3_en.pdf
http://dl.fullcirclemagazine.org/issue4_en.pdf
http://dl.fullcirclemagazine.org/issue5_en.pdf
http://dl.fullcirclemagazine.org/issue6_en.pdf
http://dl.fullcirclemagazine.org/issue7_en.pdf
http://dl.fullcirclemagazine.org/issue8_en.pdf
http://dl.fullcirclemagazine.org/issue9_en.pdf
http://dl.fullcirclemagazine.org/issue10_en.pdf
http://dl.fullcirclemagazine.org/issue11_en.pdf
http://dl.fullcirclemagazine.org/issue12_en.pdf
http://dl.fullcirclemagazine.org/issue13_en.pdf
http://dl.fullcirclemagazine.org/issue14_en.pdf
http://dl.fullcirclemagazine.org/issue15_en.pdf
http://dl.fullcirclemagazine.org/issue16_en.pdf
http://dl.fullcirclemagazine.org/issue17_en.pdf
http://dl.fullcirclemagazine.org/issue18_en.pdf
http://dl.fullcirclemagazine.org/issue19_en.pdf
http://dl.fullcirclemagazine.org/issue20_en.pdf
http://dl.fullcirclemagazine.org/issue21_en.pdf
http://dl.fullcirclemagazine.org/issue22_en.pdf
http://dl.fullcirclemagazine.org/issue23_en.pdf
http://dl.fullcirclemagazine.org/issue24_en.pdf
http://dl.fullcirclemagazine.org/issue25_en.pdf
http://dl.fullcirclemagazine.org/issue26_en.pdf
http://dl.fullcirclemagazine.org/issue27_en.pdf
http://test.fullcirclemagazine.org/wp-content/uploads/2009/08/fullcircle-issue28-eng1.pdf
http://dl.fullcirclemagazine.org/issue29_en.pdf
http://dl.fullcirclemagazine.org/issue30_en.pdf
http://dl.fullcirclemagazine.org/issue31_en.pdf
http://dl.fullcirclemagazine.org/issue32_en.pdf
http://dl.fullcirclemagazine.org/issue33_en.pdf
http://dl.fullcirclemagazine.org/issue34_en.pdf
http://dl.fullcirclemagazine.org/issue35_en.pdf
http://dl.fullcirclemagazine.org/issue36_en.pdf
http://dl.fullcirclemagazine.org/issue37_en.pdf
http://dl.fullcirclemagazine.org/issue38_en.pdf
http://dl.fullcirclemagazine.org/issue39_en.pdf
http://dl.fullcirclemagazine.org/issue40_en.pdf
http://dl.fullcirclemagazine.org/issue41_en.pdf
http://dl.fullcirclemagazine.org/issue42_en.pdf
http://dl.fullcirclemagazine.org/issue43_en.pdf
http://dl.fullcirclemagazine.org/issue44_en.pdf
http://dl.fullcirclemagazine.org/issue45_en.pdf
http://dl.fullcirclemagazine.org/issue46_en.pdf
http://dl.fullcirclemagazine.org/issue47_en.pdf

Save this list to a file called down.txt, then download them all:

wget -i down.txt
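Instead of typing all 48 URLs by hand, the list can be generated (a sketch; note that issue 28 lives at a different URL, so that line would still need correcting manually):

```shell
# Write one URL per line for issues 0-47 into down.txt.
# Issue 28's real location differs; fix that line by hand afterwards.
for i in $(seq 0 47); do
    printf 'http://dl.fullcirclemagazine.org/issue%d_en.pdf\n' "$i"
done > down.txt
```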

Update (20110130) #1:

Unfortunately, issues below 10 are named issueX_en.pdf rather than issue0X_en.pdf. Thus, if you download all the files and list them with ‘ls -al’, issues < 10 will not sort in order with the others. Here is how to fix it:

rename -n 's/issue(\d)_en(.*)/issue0$1_en$2/' *.pdf

It will just print the renames (without executing them). If the result looks OK, remove the ‘-n’ switch and run the command again. Now the files will sort in order.
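If the perl-based rename isn’t available, a plain shell loop does the same zero-padding. A sketch; the touch line just creates placeholder files standing in for the downloaded PDFs:

```shell
# Placeholder files standing in for the real PDFs.
touch issue1_en.pdf issue9_en.pdf issue10_en.pdf

# Zero-pad single-digit issues: issue1_en.pdf -> issue01_en.pdf.
# The glob issue?_en.pdf matches exactly one character after "issue",
# so two-digit issues are left untouched.
for f in issue?_en.pdf; do
    [ -e "$f" ] || continue            # no match: glob stayed literal
    mv -- "$f" "issue0${f#issue}"
done
```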

Update (20110130) #2:

This post was taken over by Ubuntu Life, and the user Capitán suggested an easier solution in a comment over there:

wget http://dl.fullcirclemagazine.org/issue{0..45}_en.pdf

I didn’t know about this wget feature :) Now I see why issues < 10 are named as issueX_en.pdf and not as issue0X_en.pdf

Update (20110203): That {0..45} thing is actually expanded by bash, not by wget! See this post for more info.

Update (20110130) #3:

Another reader of Ubuntu Life, marco, suggests a bash script solution:

for i in {0..45}
do 
wget http://dl.fullcirclemagazine.org/issue${i}_en.pdf; 
done

Or, in one line:

for i in {0..45}; do wget http://dl.fullcirclemagazine.org/issue${i}_en.pdf; done

Increase the number of simultaneous downloads in Vuze

January 9, 2011

Problem

You use Vuze (formerly Azureus) for downloading torrents and you have a nice connection speed, but still, Vuze downloads only 4 or 5 torrents simultaneously. Why? You want to start all the torrents in the list…

Solution

The option to increase the number of simultaneous downloads is a bit hidden. Go to Tools -> Options…, and choose Queue. Here, the first entry is what we need: “Max. simultaneous downloads [0: unlimited]”. Set the value to 0 and hit Apply.

Categories: Uncategorized