Download a list of URLs in Python

This article shows how to download a single URL, or a whole list of URLs, in Python.

There are two possibilities:

  • wget
  • urllib

wget

To download a file, you can call the Linux tool wget from Python through the os module. This won't work on Windows directly; you would first have to install wget for Windows or use Cygwin.

[python]
import os

# Run wget quietly (-q) and write the result to foo1.txt (-O)
h = os.popen('wget -q -O foo1.txt http://foo.html')
h.close()  # waits until wget has finished

# Read the downloaded file into a string
s = open('foo1.txt').read()
[/python]

The -q option makes wget quiet, i.e. it turns off wget's output. Use it if you don't want to see the output. The -O option names the file the download is written to.

For example, suppose you have a text file with links, such as download.txt:

http://media.cinhtau.net/01.jpg
http://media.cinhtau.net/02-03.jpg
http://media.cinhtau.net/04.jpg
http://media.cinhtau.net/05.jpg
http://media.cinhtau.net/06-07.jpg
http://media.cinhtau.net/08.jpg
http://media.cinhtau.net/09.jpg
http://media.cinhtau.net/10-11.jpg
http://media.cinhtau.net/12.jpg
http://media.cinhtau.net/13.jpg
http://media.cinhtau.net/14.jpg
http://media.cinhtau.net/15.jpg
http://media.cinhtau.net/16.jpg
http://media.cinhtau.net/17.jpg
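
A list like this is plain text with one URL per line, so it can be loaded into Python before any downloading starts. The following is only a minimal sketch: the helper name `read_links` is made up for illustration, and ignoring blank lines and lines starting with `#` is an assumption about how you might want to treat such a file, not something the list format requires.

```python
def read_links(path):
    """Return the URLs listed in a text file, one per line."""
    links = []
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            # Assumption: blank lines and '#' comment lines are ignored.
            if line and not line.startswith("#"):
                links.append(line)
    return links
```

Calling `read_links("download.txt")` on the file above would give a Python list with the fourteen URLs in order.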

Now you want to download each link in this file. A small Python program can read the file contents and let wget do the work for you.

[python]
__author__ = "tan"
__date__ = "$Jul 05, 2009 9:38:04 AM$"

import os
from optparse import OptionParser

if __name__ == "__main__":
    print("Download")
    parser = OptionParser()
    parser.add_option("-f", "--file", dest="file")
    (options, args) = parser.parse_args()
    # the value of -f ends up in options.file, not in args
    if options.file is None:
        parser.error("We need a download list!")

    # reading contents
    with open(options.file, "r") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip empty lines
            # now download link
            h = os.popen('wget ' + line)
            h.close()
[/python]

Now invoke the Python program with this option and let it do the work.

python download.py -f download.txt
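
Building the command by string concatenation hands each line of the list to the shell, so a URL containing `&` or `;` can be misread. One possible alternative, sketched here rather than taken from the script above, is the standard `subprocess` module with an argument list. The function names `build_wget_command` and `download_all` are hypothetical, and the sketch assumes `wget` is installed and on the PATH (otherwise `subprocess.run` raises `FileNotFoundError`).

```python
import subprocess


def build_wget_command(url, quiet=True):
    """Build the wget argument list for a single URL."""
    command = ["wget"]
    if quiet:
        command.append("-q")  # -q: turn off wget's output
    command.append(url)
    return command


def download_all(urls):
    """Run wget once per URL and return the URLs that failed."""
    failed = []
    for url in urls:
        # An argument list is passed without a shell, so characters
        # such as '&' or ';' in a URL are not interpreted.
        result = subprocess.run(build_wget_command(url))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Returning the failed URLs instead of stopping at the first error is a design choice for long lists: the remaining downloads still run, and the failures can be retried afterwards.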

urllib

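Another possibility is the urllib module, which offers functionality equivalent to wget from within Python. In Python 3, the urlretrieve function used here lives in urllib.request.

[python]
import sys
import urllib.request

def reporthook(*a):
    # called by urlretrieve with the progress of the transfer
    print(a)

for url in sys.argv[1:]:
    # use everything after the last '/' as the local file name
    i = url.rfind('/')
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
[/python]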
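
The hook passed to `urlretrieve` receives three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file (which can be -1 when the server does not report it). As a rough sketch of how those values could be turned into a progress percentage, and how the local file name could be derived with `urllib.parse` instead of `rfind`, consider the following. It assumes Python 3, where the function is `urllib.request.urlretrieve`; the names `local_name`, `percent_done` and `fetch`, and the fallback name `index.html`, are assumptions made for illustration.

```python
import posixpath
import urllib.parse
import urllib.request


def local_name(url, default="index.html"):
    """Derive a local file name from the last path segment of a URL."""
    path = urllib.parse.urlsplit(url).path
    name = posixpath.basename(path)
    # Assumption: fall back to a fixed name when the URL ends in '/'.
    return name or default


def percent_done(block_num, block_size, total_size):
    """Convert reporthook arguments to a percentage, or None if unknown."""
    if total_size <= 0:
        return None  # the server did not report a size
    return min(100, block_num * block_size * 100 // total_size)


def reporthook(block_num, block_size, total_size):
    """Print the progress each time urlretrieve calls the hook."""
    done = percent_done(block_num, block_size, total_size)
    if done is not None:
        print("%d%%" % done)


def fetch(url):
    """Download url into the current directory and return the file name."""
    name = local_name(url)
    urllib.request.urlretrieve(url, name, reporthook)
    return name
```

Unlike the `rfind('/')` approach, `urlsplit` separates the path from any query string, so a URL ending in `photo.jpg?size=large` would still be saved as `photo.jpg`.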