Working with files in Python

Files

Note: Pay attention to indentation in the code; Python relies on it to define blocks.

While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time the computer is turned on and the program is started, it has to be written to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.

Data on non-volatile storage media is stored in named locations called files. By reading and writing files, programs can save information between program runs.

Working with files is a lot like working with a notebook. To use a notebook, it has to be opened. When done, it has to be closed. While the notebook is open, it can either be read from or written to. In either case, the notebook holder knows where they are. They can read the whole notebook in its natural order or they can skip around.

All of this applies to files as well. To open a file, we specify its name and indicate whether we want to read or write.

Writing our first file

Let’s begin with a simple program that writes three lines of text into a file:

with open("test.txt", "w") as myfile:
    myfile.write("My first file written from Python\n")
    myfile.write("---------------------------------\n")
    myfile.write("Hello, world!\n")

Opening a file creates what we call a file handle. In this example, the variable myfile refers to the new handle object. Our program calls methods on the handle, and this makes changes to the actual file, which is usually located on our disk.

On line 1, the open function takes two arguments. The first is the name of the file, and the second is the mode. Mode “w” means that we are opening the file for writing.

With mode “w”, if there is no file named test.txt on the disk, it will be created. If there already is one, it will be replaced by the file we are writing.
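The replacement behaviour described above can be seen in a small sketch. The file name demo.txt and the use of a temporary directory are assumptions made for illustration, so that the example does not touch any real files.

```python
import os
import tempfile

# Work in a throwaway directory (an assumption for this demo).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")  # hypothetical file name

    # First write: the file does not exist yet, so it is created.
    with open(path, "w") as myfile:
        myfile.write("first version\n")

    # Second write: mode "w" discards the old contents before writing.
    with open(path, "w") as myfile:
        myfile.write("second version\n")

    # Read the file back to see what survived.
    with open(path, "r") as myfile:
        contents = myfile.read()

print(contents, end="")
```

Only the second line of text is left in the file, so mode "w" should be used with care on files we want to keep.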

A with block makes sure that the file gets closed even if an error occurs (power outages excluded).
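As a rough sketch of what the with block saves us from writing, the first half below closes the file by hand with try and finally, and the second half simulates an error inside a with block. The ValueError and the temporary directory are made up for this demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    path = os.path.join(folder, "test.txt")

    # Roughly what the with block does for us: close the file no matter what.
    myfile = open(path, "w")
    try:
        myfile.write("Hello, world!\n")
    finally:
        myfile.close()
    closed_manually = myfile.closed

    # An error inside a with block still leaves the file closed.
    try:
        with open(path, "w") as myfile:
            myfile.write("Hello again!\n")
            raise ValueError("simulated error")  # made up for illustration
    except ValueError:
        closed_after_error = myfile.closed

print(closed_manually, closed_after_error)  # prints: True True
```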

Reading a file line-at-a-time

Now that the file exists on our disk, we can open it, this time for reading, and read all the lines in the file, one at a time. This time, the mode argument is “r” for reading:

with open("test.txt", "r") as my_new_handle:
    for the_line in my_new_handle:
        # Do something with the line we just read.
        # Here we just print it.
        print(the_line, end="")

We use end="" to suppress the newline character that print usually appends to our strings. Why? Because the string already has its own newline: on each pass, the for statement reads everything up to and including the newline character.
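To make the trailing newline visible, the sketch below prints each line with repr, and also shows one alternative to end="": stripping the newline and letting print add its own. The sample text and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    path = os.path.join(folder, "test.txt")

    # Create a small sample file (contents made up for this example).
    with open(path, "w") as myfile:
        myfile.write("Hello, world!\n")
        myfile.write("Goodbye!\n")

    lines_seen = []
    with open(path, "r") as my_new_handle:
        for the_line in my_new_handle:
            lines_seen.append(the_line)
            # repr shows the newline that came from the file.
            print(repr(the_line))
            # Alternative to end="": remove the newline, let print add one.
            print(the_line.rstrip("\n"))
```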

If we try to open a file that doesn’t exist, we get an error:

>>> mynewhandle = open("wharrah.txt", "r")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'wharrah.txt'
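If a missing file is something our program can live with, one possible approach is to catch the error instead of letting the program stop. The helper name read_if_present is hypothetical, and the temporary directory is only there to guarantee that the file really is missing.

```python
import os
import tempfile


def read_if_present(filename):
    """Return the file's contents, or None if the file does not exist."""
    try:
        with open(filename, "r") as handle:
            return handle.read()
    except FileNotFoundError:
        print("Could not find", filename)
        return None


# A fresh, empty directory cannot contain wharrah.txt.
with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, "wharrah.txt")
    result = read_if_present(missing)
```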

Turning a file into a list of lines

It is often useful to fetch data from a disk file and turn it into a list of lines. Suppose we have a file containing our friends and their email addresses, one per line in the file. But we’d like the lines sorted into alphabetical order. A good plan is to read everything into a list of lines, then sort the list, and then write the sorted list back to another file:

with open("friends.csv", "r") as input_file:
    all_lines = input_file.readlines()

all_lines.sort()

with open("sortedfriends.csv", "w") as output_file:
    for line in all_lines:
        output_file.write(line)

The readlines method reads all the lines and returns them as a list of strings.
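Here is a self-contained sketch of the same plan that can be run without preparing friends.csv first. The names and addresses are made-up sample data, and the files live in a temporary directory. It also uses writelines, the counterpart of readlines, as an alternative to the loop.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    friends_path = os.path.join(folder, "friends.csv")
    sorted_path = os.path.join(folder, "sortedfriends.csv")

    # Made-up sample data, one friend per line.
    with open(friends_path, "w") as sample:
        sample.write("zoe,zoe@example.com\n")
        sample.write("adam,adam@example.com\n")
        sample.write("mia,mia@example.com\n")

    with open(friends_path, "r") as input_file:
        all_lines = input_file.readlines()  # each string keeps its "\n"

    all_lines.sort()

    # writelines writes every string in the list, adding nothing extra.
    with open(sorted_path, "w") as output_file:
        output_file.writelines(all_lines)

    with open(sorted_path, "r") as check:
        sorted_text = check.read()

print(sorted_text, end="")
```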

Reading the whole file at once

Another way of working with text files is to read the complete contents of the file into a string, and then to use our string-processing skills to work with the contents.

We’d normally use this method of processing files if we were not interested in the line structure of the file.

with open("somefile.txt") as f:
    content = f.read()

words = content.split()
print("There are {0} words in the file.".format(len(words)))

Notice here that we left out the “r” mode in line 1. By default, if we don’t supply the mode, Python opens the file for reading.
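A quick way to check the default is to look at the handle's mode attribute. The sketch below does that and repeats the word count on a small sample; the sample text and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    path = os.path.join(folder, "somefile.txt")

    # Made-up sample contents.
    with open(path, "w") as f:
        f.write("the quick brown fox\njumps over the lazy dog\n")

    # No mode given, so the file is opened for reading.
    with open(path) as f:
        mode_used = f.mode
        content = f.read()

words = content.split()
print("Mode was {0}; there are {1} words in the file.".format(mode_used, len(words)))
```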

Many useful line-processing programs will read a text file line-at-a-time and do some minor processing as they write the lines to an output file. They might number the lines in the output file, or insert extra blank lines after every 60 lines to make it convenient for printing on sheets of paper, or extract some specific columns only from each line in the source file, or only print lines that contain a specific substring. We call this kind of program a filter.

Here is a filter that copies one file to another, omitting any lines that begin with #:

# Note: the name filter shadows Python's built-in function of the same name.
def filter(oldfile, newfile):
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        for line in infile:
            # Put any processing logic here
            if not line.startswith("#"):
                outfile.write(line)
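The same pattern covers the other filters mentioned above. As one possible sketch, the hypothetical function number_lines below copies a file and puts a line number in front of every line. The sample file contents and file names are made up.

```python
import os
import tempfile


def number_lines(oldfile, newfile):
    """Copy oldfile to newfile, prefixing each line with its number."""
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        for number, line in enumerate(infile, start=1):
            # The line already ends with a newline, so we add none.
            outfile.write("{0:4d} {1}".format(number, line))


with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    source = os.path.join(folder, "plain.txt")
    target = os.path.join(folder, "numbered.txt")

    with open(source, "w") as f:
        f.write("alpha\nbeta\n")

    number_lines(source, target)

    with open(target, "r") as f:
        numbered = f.read()

print(numbered, end="")
```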

Directories

Files on non-volatile storage media are organized by a set of rules known as a file system. File systems are made up of files and directories, which are containers for both files and other directories.

When we create a new file by opening it and writing, the new file goes in the current directory (wherever we were when we ran the program). Similarly, when we open a file for reading, Python looks for it in the current directory.

If we want to open a file somewhere else, we have to specify the path to the file, which names the directory (or folder) where the file is located, followed by the filename:

>>> wordsfile = open("/usr/share/dict/words", "r")
>>> wordlist = wordsfile.readlines()
>>> print(wordlist[:6])
['\n', 'A\n', "A's\n", 'AOL\n', "AOL's\n", 'Aachen\n']

On Unix-like systems, / separates directory names and filenames, so it cannot be part of a filename. On Windows, both / and \ act as separators, and neither can appear in a filename.
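Because the separator differs between systems, one common approach is to let the os.path module build and take apart paths for us. This is only a sketch; the directory and file names are hypothetical, and nothing is read from or written to the disk.

```python
import os

# Hypothetical directory and file names, joined with the separator
# that is right for the system the program runs on.
path = os.path.join("data", "reports", "words.txt")

# Split the path back into its directory part and its filename.
folder, filename = os.path.split(path)
print(folder)
print(filename)

# A relative path is interpreted starting from the current directory.
print(os.path.isabs(path))
absolute_path = os.path.abspath(path)
print(absolute_path)
```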

What about fetching something from the web?

The Python libraries are pretty messy in places. But here is a very simple example that copies the contents at some web URL to a local file.

import urllib.request

url = "https://rekhasahay.wordpress.com/"
destination_filename = "aps.txt"

urllib.request.urlretrieve(url, destination_filename)

With just one call, the urlretrieve function can download any kind of content from the Internet.

We’ll need to get a few things right before this works:

The resource we’re trying to fetch must exist! Check this using a browser.

We’ll need permission to write to the destination filename, and the file will be created in the “current directory”, i.e. the folder we were in when we ran the program.

If we are behind a proxy server that requires authentication (as some students are), this may require some more special handling to work around our proxy. Use a local resource for the purpose of this demonstration!
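One way to try urlretrieve with a local resource, without any network access, is to point it at a file:// URL. In this sketch the source file, its contents, and the destination name copy.txt are all made up, and everything happens in a temporary directory.

```python
import os
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as folder:  # throwaway directory for the demo
    # A local file that stands in for a web page (made-up contents).
    source = pathlib.Path(folder) / "source.txt"
    source.write_text("A local stand-in for a web page\n")

    url = source.as_uri()  # a file:// URL pointing at the local file
    destination_filename = os.path.join(folder, "copy.txt")

    # Same call as before, but nothing leaves our machine.
    urllib.request.urlretrieve(url, destination_filename)

    with open(destination_filename, "r") as f:
        copied = f.read()

print(copied, end="")
```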

Here, rather than save the web resource to our local disk, we read it directly into a string, and we print that string. This example uses the third-party requests module, which has to be installed separately:

import requests

url = "https://womenwithgifts.org/"
response = requests.get(url)
print(response.text)

Opening the remote URL returns the response from the server. That response contains several types of information, and the requests module allows us to access them in various ways. With response.text, we get the downloaded document as a single string. We could also read it line by line as follows:

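import requests

url = "http://kashmiralad.com/"
response = requests.get(url)

# Iterating over the response object itself yields chunks of bytes,
# not lines, so we split the decoded text into lines instead.
for line in response.text.splitlines():
    print(line)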
