How to Download Files from the Web with the requests Module

The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. The requests module doesn’t come
with Python, so you’ll have to install it first. From the command line, run pip3 install requests.

Next, do a simple test to make sure the requests module installed itself correctly. Enter the following into the interactive shell:

(my_env) $ python3
Python 3.7.4 (default, Jul 9 2019, 16:32:37) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>>

If no error messages show up, then the requests module has been successfully installed.

Downloading a Web Page with the requests.get() Function

The requests.get() function takes a string of a URL to download. By calling type() on requests.get()'s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request.

(my_env) $ python3
Python 3.7.4 (default, Jul 9 2019, 16:32:37) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> res = requests.get('https://www.flyhiee.com/file_name')
>>> type(res)
<class 'requests.models.Response'>
>>> res.status_code == requests.codes.ok
False    # the download failed: the server returned an error page instead of the file
>>> len(res.text)
29532
>>> print(res.text[:300])
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="profile" href="http://gmpg.org/xfn/11">
<link rel="pingback" href="https://www.flyhiee.com/xmlrpc.php">
<link type="text/css" media="all" href="https://ww

Checking for Errors

A simpler way to check for success is to call the raise_for_status() method on the Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded.

>>> res.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/Aps/workspace/web_scraping/my_env/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://www.flyhiee.com/file_name

The raise_for_status() method is a good way to ensure that a program halts if a bad download occurs.

If a failed download shouldn't crash your program, you can wrap the raise_for_status() line in try and except statements to handle the error case gracefully.

>>> import requests
>>> res = requests.get('https://www.flyhiee.com/public_html/admin.php')
>>> try:
...     res.raise_for_status()
... except Exception as exc:
...     print('There was a problem: %s' % (exc))
...
There was a problem: 404 Client Error: Not Found for url: https://www.flyhiee.com/public_html/admin.php

Always call raise_for_status() after calling requests.get(). You want to be sure that the download has actually worked before your program continues.
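For convenience, this pattern can be wrapped in a small helper function. The sketch below is illustrative; the function name fetch_text is made up, not part of the requests API:

```python
import requests

def fetch_text(url):
    """Download a URL and return its text, halting on a bad status code."""
    res = requests.get(url)
    res.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
    return res.text

# Usage (any reachable URL works):
# html = fetch_text('https://automatetheboringstuff.com/files/rj.txt')
```

This way, every download in your program goes through one place that checks for errors.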

Saving Downloaded Files to the Hard Drive

From here, you can save the web page to a file on your hard drive with the standard open() function and write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string 'wb' as the second argument to open().

To write the web page to a file, you can use a for loop with the Response object’s iter_content() method.

>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> res.raise_for_status()
>>> playFile = open('RomeoAndJuliet.txt', 'wb')
>>> for chunk in res.iter_content(100000):
...     playFile.write(chunk)
...
100000
78978
>>> playFile.close()

Second Method: Using a with Statement


>>> import requests
>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
>>> with open('text.txt', 'wb') as myfile:
...     for chunk in res.iter_content(100000):
...         myfile.write(chunk)
...
100000
78978

The iter_content() method returns “chunks” of the content on each iteration through the loop. Each chunk is of the bytes data type, and you get to specify how many bytes each chunk will contain.
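To see the chunking idea without making a network request, here is a plain-Python sketch that splits a bytes object into fixed-size pieces the way iter_content() does (this is an analogy, not the actual requests implementation):

```python
def chunked(data, chunk_size):
    """Yield successive chunks of at most chunk_size bytes, like iter_content()."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

# Pretend this is a 178,978-byte downloaded body, as in the example above.
body = b'x' * 178978
sizes = [len(chunk) for chunk in chunked(body, 100000)]
print(sizes)  # [100000, 78978] -- the same sizes write() reported above
```

Writing in chunks like this keeps memory use bounded even for very large downloads, since the whole body never has to sit in a single write() call.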

To review, here’s the complete process for downloading and saving a file:

  1. Call requests.get() to download the file.
  2. Call open() with 'wb' to create a new file in write binary mode.
  3. Loop over the Response object’s iter_content() method.
  4. Call write() on each iteration to write the content to the file.
  5. Call close() to close the file. (You don't need to call close() if you use a with statement.)
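The five steps above can be sketched as one function. The name download_file is made up for illustration; the URL in the usage comment is the rj.txt example from earlier:

```python
import requests

def download_file(url, path, chunk_size=100000):
    """Download url and save the response body to path."""
    res = requests.get(url)                         # 1. download the file
    res.raise_for_status()                          # halt if the download failed
    with open(path, 'wb') as f:                     # 2. open in write binary mode
        for chunk in res.iter_content(chunk_size):  # 3. loop over the chunks
            f.write(chunk)                          # 4. write each chunk
    # 5. the with statement closes the file automatically

# Usage:
# download_file('https://automatetheboringstuff.com/files/rj.txt', 'RomeoAndJuliet.txt')
```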
