How to use urlretrieve in Python 3

Posted by jane on Wed, 01 Apr 2020 17:32:56 +0200

In Python 3, the urllib.request module provides the urlretrieve() function, which downloads remote data directly to a local file.

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

  • The parameter filename specifies the local save path (if it is not given, urllib generates a temporary file to hold the data).
  • The parameter reporthook is a callback function, triggered when the connection is established and again after each data block is transferred. We can use this callback to display the current download progress.
  • The parameter data is the data POSTed to the server. The method returns a tuple (filename, headers), where filename is the local save path and headers is the server's response headers.
The following example grabs Baidu's HTML page, saves it to the file './baidu.html', and displays the download progress:

#!/usr/bin/env python
# coding=utf-8
import os
import urllib.request

def cbk(a, b, c):
    '''Reporthook callback for urlretrieve.
    @a: number of data blocks transferred so far
    @b: size of each block in bytes
    @c: total size of the remote file in bytes
    '''
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://www.baidu.com'
dir = os.path.abspath('.')
work_path = os.path.join(dir, 'baidu.html')
urllib.request.urlretrieve(url, work_path, cbk)


The following example uses urlretrieve() to download a file while showing the download progress.

#!/usr/bin/env python
# coding=utf-8
import os
import urllib.request

def cbk(a, b, c):
    '''Reporthook callback for urlretrieve.
    @a: number of data blocks transferred so far
    @b: size of each block in bytes
    @c: total size of the remote file in bytes
    '''
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print('%.2f%%' % per)

url = 'http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2'
dir = os.path.abspath('.')
work_path = os.path.join(dir, 'Python-2.7.5.tar.bz2')
urllib.request.urlretrieve(url, work_path, cbk)


urlopen() makes it easy to fetch a remote HTML page; you can then use Python's regular expressions to parse out the data you need and call urlretrieve() to download it locally. For remote URLs with restricted access or a limited number of connections, you can connect through proxies. If the remote file is too large and a single-threaded download is too slow, you can download with multiple threads. This is the legendary web crawler.
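The fetch-parse-download loop described above can be sketched as follows. The helper extract_links() and the regex it uses are illustrative assumptions, not part of the original post; for real pages an HTML parser is more robust, but a regex keeps the sketch short. The network calls are shown as comments so the example stays self-contained:

```python
import re
import urllib.request

def extract_links(html):
    # Hypothetical helper: pull href targets out of an HTML string
    # with a regular expression.
    return re.findall(r'href=[\'"]?([^\'" >]+)', html)

html = '<a href="http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2">source</a>'
print(extract_links(html))
# → ['http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tar.bz2']

# To crawl for real, fetch the page first, then download each match:
#   page = urllib.request.urlopen('http://www.baidu.com').read().decode('utf-8')
#   for link in extract_links(page):
#       urllib.request.urlretrieve(link, link.rsplit('/', 1)[-1])
#
# To go through a proxy (address is a placeholder), install an opener first:
#   proxy = urllib.request.ProxyHandler({'http': 'http://proxy.example.com:8080'})
#   urllib.request.install_opener(urllib.request.build_opener(proxy))
```

Once the opener is installed, later urlopen() and urlretrieve() calls route through the proxy automatically.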

Topics: Python ftp