Reading Data

Overview:

  • Teaching: 15 min
  • Exercises: 10 min

Questions

  • How can I read data into my program?
  • What libraries can I use to help me read in data?

Objectives

  • Use input to read from the keyboard.
  • Know that you need to open files and then read the contents.
  • Use libraries to read in files in standard formats.

Read from the keyboard

Occasionally you will want your user to input data via the keyboard; for instance, this might be the name of a file to analyse. We can do this using the function input:

text = input("Enter some text: ")
print(text)

When prompted, type the string into the command line (or the cell output if using Jupyter) and hit the return key to confirm your entry.

The value read in by input is always a string; we can check this with type.

Input

We can ask our users to input text and read it from the keyboard using the function input. The value read in will always be a string and if we want to use text as a number we will first have to convert it with int or float.
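As a minimal sketch of that conversion, the snippet below uses a fixed string in place of the value that input would return. The variable names and the value "42" are illustrative only:

```python
# Stand-in for text = input("Enter a number: "); input always returns a string.
text = "42"
print(type(text))      # <class 'str'>

# Convert the string before doing arithmetic with it.
as_int = int(text)
as_float = float(text)
print(as_int + 1)      # 43
print(as_float / 2)    # 21.0
```

If the text cannot be interpreted as a number (for example "forty-two"), int and float raise a ValueError.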

But all my data is in files

Most of the time the data we want to read will be in files. As an example, we will use the data files provided in the library you cloned from notebooks.azure.com (or alternatively downloaded as described in Setup).

Setup+: Extract data files

Depending on your choices in the setup, you will need to take the following steps to obtain some data files.

JupyterHub Users

For those using JupyterHub, we did this in the setup lesson, so you DO NOT NEED to run the following cell. We leave it here to explain what was done in the setup lesson.

In [1]:
!unzip ./data/python-novice-inflammation-data.zip -d ./data
Archive:  ./data/python-novice-inflammation-data.zip
   creating: ./data/data/
  inflating: ./data/data/inflammation-01.csv  
  inflating: ./data/data/inflammation-02.csv  
  inflating: ./data/data/inflammation-03.csv  
  inflating: ./data/data/inflammation-04.csv  
  inflating: ./data/data/inflammation-05.csv  
  inflating: ./data/data/inflammation-06.csv  
  inflating: ./data/data/inflammation-07.csv  
  inflating: ./data/data/inflammation-08.csv  
  inflating: ./data/data/inflammation-09.csv  
  inflating: ./data/data/inflammation-10.csv  
  inflating: ./data/data/inflammation-11.csv  
  inflating: ./data/data/inflammation-12.csv  
 extracting: ./data/data/small-01.csv  
 extracting: ./data/data/small-02.csv  
 extracting: ./data/data/small-03.csv  

The ! runs a standard bash command instead of Python (more details on how to execute shell commands from within a Python interpreter can be found in the next lesson).

This will create a subfolder in the data folder, also called data, where the .csv files we will need are located. You will need to provide the relative path to these files when opening them in Python, if the notebook you are running commands in is not in the data/data folder too.

Local Installs

For those of you who installed Python locally, make sure you have downloaded the .zip files that are available from the setup lesson, and have extracted them. For consistency with the JupyterHub setup above, we recommend you create a data folder in the directory you are working in, then extract the .zip files there. This will give you the same data/data folder as above.

You are welcome to extract the files to an alternative location on your computer, but bear in mind that the relative paths that you will need to specify when opening them will be different!

Next, in order to read from a file we first need to open the file:

In [2]:
file = open("./data/data/inflammation-01.csv")

Note that we have prepended the filename with ./data/data/ - this is a relative path to the file inflammation-01.csv from the directory our notebook is running in. You will have to do something similar depending on where you saved the data files, and depending on the directory that you are currently running Python in. If the file is in the same directory as you are currently running/working in, you don't need to prepend anything.
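If opening the file fails, it can help to check where Python is running and whether the relative path points at a real file. The following is a small sketch using the standard pathlib module. The path shown assumes the data/data layout described above and may differ on your machine:

```python
from pathlib import Path

# Directory that relative paths are resolved against.
print(Path.cwd())

# Build the relative path used in this lesson (adjust it if your files live elsewhere).
data_file = Path("data") / "data" / "inflammation-01.csv"

# True only if the file exists at that location relative to the current directory.
print(data_file.exists())
```

If the last line prints False, either the files were extracted somewhere else or Python is running in a different directory from the one you expected.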

When we open the file it is a bit like picking a book off the shelf and opening it at the first page. We have not yet read in any of the data contained in the file. In order to do this we must read from the file. In Python and many languages we can do this by reading each line of the file in turn.

In [3]:
line = file.readline()
print(line)
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0

Python allows us to treat the file as a collection of lines which it reads in automatically so we can use the more readable form:

In [4]:
for line in file:
    print(line)
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1

0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1

0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1

0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1

0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1

0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1

0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0

0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0

0,1,0,0,4,3,3,5,5,4,5,8,7,10,13,3,7,13,15,18,8,15,15,16,11,14,12,4,10,10,4,3,4,5,5,3,3,2,2,1

0,1,0,0,3,4,2,7,8,5,2,8,11,5,5,8,14,11,6,11,9,16,18,6,12,5,4,3,5,7,8,3,5,4,5,5,4,0,1,1

0,0,2,1,4,3,6,4,6,7,9,9,3,11,6,12,4,17,13,15,13,12,8,7,4,7,12,9,5,6,5,4,7,3,5,4,2,3,0,1

0,0,0,0,1,3,1,6,6,5,5,6,3,6,13,3,10,13,9,16,15,9,11,4,6,4,11,11,12,3,5,8,7,4,6,4,1,3,0,0

0,1,2,1,1,1,4,1,5,2,3,3,10,7,13,5,7,17,6,9,12,13,10,4,12,4,6,7,6,10,8,2,5,1,3,4,2,0,2,0

0,1,1,0,1,2,4,3,6,4,7,5,5,7,5,10,7,8,18,17,9,8,12,11,11,11,14,6,11,2,10,9,5,6,5,3,4,2,2,0

0,0,0,0,2,3,6,5,7,4,3,2,10,7,9,11,12,5,12,9,13,19,14,17,5,13,8,11,5,10,9,8,7,5,3,1,4,0,2,1

0,0,0,1,2,1,4,3,6,7,4,2,12,6,12,4,14,7,8,14,13,19,6,9,12,6,4,13,6,7,2,3,6,5,4,2,3,0,1,0

0,0,2,1,2,5,4,2,7,8,4,7,11,9,8,11,15,17,11,12,7,12,7,6,7,4,13,5,7,6,6,9,2,1,1,2,2,0,1,0

0,1,2,0,1,4,3,2,2,7,3,3,12,13,11,13,6,5,9,16,9,19,16,11,8,9,14,12,11,9,6,6,6,1,1,2,4,3,1,1

0,1,1,3,1,4,4,1,8,2,2,3,12,12,10,15,13,6,5,5,18,19,9,6,11,12,7,6,3,6,3,2,4,3,1,5,4,2,2,0

0,0,2,3,2,3,2,6,3,8,7,4,6,6,9,5,12,12,8,5,12,10,16,7,14,12,5,4,6,9,8,5,6,6,1,4,3,0,2,0

0,0,0,3,4,5,1,7,7,8,2,5,12,4,10,14,5,5,17,13,16,15,13,6,12,9,10,3,3,7,4,4,8,2,6,5,1,0,1,0

0,1,1,1,1,3,3,2,6,3,9,7,8,8,4,13,7,14,11,15,14,13,5,13,7,14,9,10,5,11,5,3,5,1,1,4,4,1,2,0

0,1,1,1,2,3,5,3,6,3,7,10,3,8,12,4,12,9,15,5,17,16,5,10,10,15,7,5,3,11,5,5,6,1,1,1,1,0,2,1

0,0,2,1,3,3,2,7,4,4,3,8,12,9,12,9,5,16,8,17,7,11,14,7,13,11,7,12,12,7,8,5,7,2,2,4,1,1,1,0

0,0,1,2,4,2,2,3,5,7,10,5,5,12,3,13,4,13,7,15,9,12,18,14,16,12,3,11,3,2,7,4,8,2,2,1,3,0,1,1

0,0,1,1,1,5,1,5,2,2,4,10,4,8,14,6,15,6,12,15,15,13,7,17,4,5,11,4,8,7,9,4,5,3,2,5,4,3,2,1

0,0,2,2,3,4,6,3,7,6,4,5,8,4,7,7,6,11,12,19,20,18,9,5,4,7,14,8,4,3,7,7,8,3,5,4,1,3,1,0

0,0,0,1,4,4,6,3,8,6,4,10,12,3,3,6,8,7,17,16,14,15,17,4,14,13,4,4,12,11,6,9,5,5,2,5,2,1,0,1

0,1,1,0,3,2,4,6,8,6,2,3,11,3,14,14,12,8,8,16,13,7,6,9,15,7,6,4,10,8,10,4,2,6,5,5,2,3,2,1

0,0,2,3,3,4,5,3,6,7,10,5,10,13,14,3,8,10,9,9,19,15,15,6,8,8,11,5,5,7,3,6,6,4,5,2,2,3,0,0

0,1,2,2,2,3,6,6,6,7,6,3,11,12,13,15,15,10,14,11,11,8,6,12,10,5,12,7,7,11,5,8,5,2,5,5,2,0,2,1

0,0,2,1,3,5,6,7,5,8,9,3,12,10,12,4,12,9,13,10,10,6,10,11,4,15,13,7,3,4,2,9,7,2,4,2,1,2,1,1

0,0,1,2,4,1,5,5,2,3,4,8,8,12,5,15,9,17,7,19,14,18,12,17,14,4,13,13,8,11,5,6,6,2,3,5,2,1,1,1

0,0,0,3,1,3,6,4,3,4,8,3,4,8,3,11,5,7,10,5,15,9,16,17,16,3,8,9,8,3,3,9,5,1,6,5,4,2,2,0

0,1,2,2,2,5,5,1,4,6,3,6,5,9,6,7,4,7,16,7,16,13,9,16,12,6,7,9,10,3,6,4,5,4,6,3,4,3,2,1

0,1,1,2,3,1,5,1,2,2,5,7,6,6,5,10,6,7,17,13,15,16,17,14,4,4,10,10,10,11,9,9,5,4,4,2,1,0,1,0

0,1,0,3,2,4,1,1,5,9,10,7,12,10,9,15,12,13,13,6,19,9,10,6,13,5,13,6,7,2,5,5,2,1,1,1,1,3,0,1

0,1,1,3,1,1,5,5,3,7,2,2,3,12,4,6,8,15,16,16,15,4,14,5,13,10,7,10,6,3,2,3,6,3,3,5,4,3,2,1

0,0,0,2,2,1,3,4,5,5,6,5,5,12,13,5,7,5,11,15,18,7,9,10,14,12,11,9,10,3,2,9,6,2,2,5,3,0,0,1

0,0,1,3,3,1,2,1,8,9,2,8,10,3,8,6,10,13,11,17,19,6,4,11,6,12,7,5,5,4,4,8,2,6,6,4,2,2,0,0

0,1,1,3,4,5,2,1,3,7,9,6,10,5,8,15,11,12,15,6,12,16,6,4,14,3,12,9,6,11,5,8,5,5,6,1,2,1,2,0

0,0,1,3,1,4,3,6,7,8,5,7,11,3,6,11,6,10,6,19,18,14,6,10,7,9,8,5,8,3,10,2,5,1,5,4,2,1,0,1

0,1,1,3,3,4,4,6,3,4,9,9,7,6,8,15,12,15,6,11,6,18,5,14,15,12,9,8,3,6,10,6,8,7,2,5,4,3,1,1

0,1,2,2,4,3,1,4,8,9,5,10,10,3,4,6,7,11,16,6,14,9,11,10,10,7,10,8,8,4,5,8,4,4,5,2,4,1,1,0

0,0,2,3,4,5,4,6,2,9,7,4,9,10,8,11,16,12,15,17,19,10,18,13,15,11,8,4,7,11,6,7,6,5,1,3,1,0,0,0

0,1,1,3,1,4,6,2,8,2,10,3,11,9,13,15,5,15,6,10,10,5,14,15,12,7,4,5,11,4,6,9,5,6,1,1,2,1,2,1

0,0,1,3,2,5,1,2,7,6,6,3,12,9,4,14,4,6,12,9,12,7,11,7,16,8,13,6,7,6,10,7,6,3,1,5,4,3,0,0

0,0,1,2,3,4,5,7,5,4,10,5,12,12,5,4,7,9,18,16,16,10,15,15,10,4,3,7,5,9,4,6,2,4,1,4,2,2,2,1

0,1,2,1,1,3,5,3,6,3,10,10,11,10,13,10,13,6,6,14,5,4,5,5,9,4,12,7,7,4,7,9,3,3,6,3,4,1,2,0

0,1,2,2,3,5,2,4,5,6,8,3,5,4,3,15,15,12,16,7,20,15,12,8,9,6,12,5,8,3,8,5,4,1,3,2,1,3,1,0

0,0,0,2,4,4,5,3,3,3,10,4,4,4,14,11,15,13,10,14,11,17,9,11,11,7,10,12,10,10,10,8,7,5,2,2,4,1,2,1

0,0,2,1,1,4,4,7,2,9,4,10,12,7,6,6,11,12,9,15,15,6,6,13,5,12,9,6,4,7,7,6,5,4,1,4,2,2,2,1

0,1,2,1,1,4,5,4,4,5,9,7,10,3,13,13,8,9,17,16,16,15,12,13,5,12,10,9,11,9,4,5,5,2,2,5,1,0,0,1

0,0,1,3,2,3,6,4,5,7,2,4,11,11,3,8,8,16,5,13,16,5,8,8,6,9,10,10,9,3,3,5,3,5,4,5,3,3,0,1

0,1,1,2,2,5,1,7,4,2,5,5,4,6,6,4,16,11,14,16,14,14,8,17,4,14,13,7,6,3,7,7,5,6,3,4,2,2,1,1

0,1,1,1,4,1,6,4,6,3,6,5,6,4,14,13,13,9,12,19,9,10,15,10,9,10,10,7,5,6,8,6,6,4,3,5,2,1,1,1

0,0,0,1,4,5,6,3,8,7,9,10,8,6,5,12,15,5,10,5,8,13,18,17,14,9,13,4,10,11,10,8,8,6,5,5,2,0,2,0

0,0,1,0,3,2,5,4,8,2,9,3,3,10,12,9,14,11,13,8,6,18,11,9,13,11,8,5,5,2,8,5,3,5,4,1,3,1,1,0

This is the equivalent to reading all the lines in a book. If we run this again:

In [5]:
for line in file:
    print(line)

There is no output. It is as if we have reached the end of the book and are stuck. We need to close the book and then we could open it and read it again, or choose another book and read that instead:

In [6]:
file.close()
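The behaviour above can be reproduced with a small throwaway file. This sketch writes two lines to a temporary file (the filename and contents are made up for illustration), reads them all, shows that a second loop finds nothing left, and then closes and reopens the file to read from the start again:

```python
import os
import tempfile

# Create a small example file; its name and contents are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
out = open(path, "w")
out.write("0,1,2\n3,4,5\n")
out.close()

file = open(path)
first_pass = []
for line in file:          # reads every line in the file
    first_pass.append(line)

second_pass = []
for line in file:          # nothing is left to read, so the loop body never runs
    second_pass.append(line)
file.close()

file = open(path)          # reopening starts again at the first line
third_pass = []
for line in file:
    third_pass.append(line)
file.close()

print(len(first_pass), len(second_pass), len(third_pass))   # 2 0 2
```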

Pythonic reading

Having to open, read and close files like this is common in many programming languages. However, typically we just want to read in a whole file for processing in one go. Python has a particular structure, the with structure, that allows us to do this in a very compact form, and which we can combine with the function readlines to read an entire file with one command:

In [7]:
with open("./data/data/inflammation-01.csv") as file:
    read_file = file.readlines()
for line in read_file:
    print(line)
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
0,0,1,2,2,4,2,1,6,4,7,6,6,9,9,15,4,16,18,12,12,5,18,9,5,3,10,3,12,7,8,4,7,3,5,4,4,3,2,1
0,0,2,2,4,2,2,5,5,8,6,5,11,9,4,13,5,12,10,6,9,17,15,8,9,3,13,7,8,2,8,8,4,2,3,5,4,1,1,1
0,0,1,2,3,1,2,3,5,3,7,8,8,5,10,9,15,11,18,19,20,8,5,13,15,10,6,10,6,7,4,9,3,5,2,5,3,2,2,1
0,0,0,3,1,5,6,5,5,8,2,4,11,12,10,11,9,10,17,11,6,16,12,6,8,14,6,13,10,11,4,6,4,7,6,3,2,1,0,0
0,1,1,2,1,3,5,3,5,8,6,8,12,5,13,6,13,8,16,8,18,15,16,14,12,7,3,8,9,11,2,5,4,5,1,4,1,2,0,0
0,1,0,0,4,3,3,5,5,4,5,8,7,10,13,3,7,13,15,18,8,15,15,16,11,14,12,4,10,10,4,3,4,5,5,3,3,2,2,1
0,1,0,0,3,4,2,7,8,5,2,8,11,5,5,8,14,11,6,11,9,16,18,6,12,5,4,3,5,7,8,3,5,4,5,5,4,0,1,1
0,0,2,1,4,3,6,4,6,7,9,9,3,11,6,12,4,17,13,15,13,12,8,7,4,7,12,9,5,6,5,4,7,3,5,4,2,3,0,1
0,0,0,0,1,3,1,6,6,5,5,6,3,6,13,3,10,13,9,16,15,9,11,4,6,4,11,11,12,3,5,8,7,4,6,4,1,3,0,0
0,1,2,1,1,1,4,1,5,2,3,3,10,7,13,5,7,17,6,9,12,13,10,4,12,4,6,7,6,10,8,2,5,1,3,4,2,0,2,0
0,1,1,0,1,2,4,3,6,4,7,5,5,7,5,10,7,8,18,17,9,8,12,11,11,11,14,6,11,2,10,9,5,6,5,3,4,2,2,0
0,0,0,0,2,3,6,5,7,4,3,2,10,7,9,11,12,5,12,9,13,19,14,17,5,13,8,11,5,10,9,8,7,5,3,1,4,0,2,1
0,0,0,1,2,1,4,3,6,7,4,2,12,6,12,4,14,7,8,14,13,19,6,9,12,6,4,13,6,7,2,3,6,5,4,2,3,0,1,0
0,0,2,1,2,5,4,2,7,8,4,7,11,9,8,11,15,17,11,12,7,12,7,6,7,4,13,5,7,6,6,9,2,1,1,2,2,0,1,0
0,1,2,0,1,4,3,2,2,7,3,3,12,13,11,13,6,5,9,16,9,19,16,11,8,9,14,12,11,9,6,6,6,1,1,2,4,3,1,1
0,1,1,3,1,4,4,1,8,2,2,3,12,12,10,15,13,6,5,5,18,19,9,6,11,12,7,6,3,6,3,2,4,3,1,5,4,2,2,0
0,0,2,3,2,3,2,6,3,8,7,4,6,6,9,5,12,12,8,5,12,10,16,7,14,12,5,4,6,9,8,5,6,6,1,4,3,0,2,0
0,0,0,3,4,5,1,7,7,8,2,5,12,4,10,14,5,5,17,13,16,15,13,6,12,9,10,3,3,7,4,4,8,2,6,5,1,0,1,0
0,1,1,1,1,3,3,2,6,3,9,7,8,8,4,13,7,14,11,15,14,13,5,13,7,14,9,10,5,11,5,3,5,1,1,4,4,1,2,0
0,1,1,1,2,3,5,3,6,3,7,10,3,8,12,4,12,9,15,5,17,16,5,10,10,15,7,5,3,11,5,5,6,1,1,1,1,0,2,1
0,0,2,1,3,3,2,7,4,4,3,8,12,9,12,9,5,16,8,17,7,11,14,7,13,11,7,12,12,7,8,5,7,2,2,4,1,1,1,0
0,0,1,2,4,2,2,3,5,7,10,5,5,12,3,13,4,13,7,15,9,12,18,14,16,12,3,11,3,2,7,4,8,2,2,1,3,0,1,1
0,0,1,1,1,5,1,5,2,2,4,10,4,8,14,6,15,6,12,15,15,13,7,17,4,5,11,4,8,7,9,4,5,3,2,5,4,3,2,1
0,0,2,2,3,4,6,3,7,6,4,5,8,4,7,7,6,11,12,19,20,18,9,5,4,7,14,8,4,3,7,7,8,3,5,4,1,3,1,0
0,0,0,1,4,4,6,3,8,6,4,10,12,3,3,6,8,7,17,16,14,15,17,4,14,13,4,4,12,11,6,9,5,5,2,5,2,1,0,1
0,1,1,0,3,2,4,6,8,6,2,3,11,3,14,14,12,8,8,16,13,7,6,9,15,7,6,4,10,8,10,4,2,6,5,5,2,3,2,1
0,0,2,3,3,4,5,3,6,7,10,5,10,13,14,3,8,10,9,9,19,15,15,6,8,8,11,5,5,7,3,6,6,4,5,2,2,3,0,0
0,1,2,2,2,3,6,6,6,7,6,3,11,12,13,15,15,10,14,11,11,8,6,12,10,5,12,7,7,11,5,8,5,2,5,5,2,0,2,1
0,0,2,1,3,5,6,7,5,8,9,3,12,10,12,4,12,9,13,10,10,6,10,11,4,15,13,7,3,4,2,9,7,2,4,2,1,2,1,1
0,0,1,2,4,1,5,5,2,3,4,8,8,12,5,15,9,17,7,19,14,18,12,17,14,4,13,13,8,11,5,6,6,2,3,5,2,1,1,1
0,0,0,3,1,3,6,4,3,4,8,3,4,8,3,11,5,7,10,5,15,9,16,17,16,3,8,9,8,3,3,9,5,1,6,5,4,2,2,0
0,1,2,2,2,5,5,1,4,6,3,6,5,9,6,7,4,7,16,7,16,13,9,16,12,6,7,9,10,3,6,4,5,4,6,3,4,3,2,1
0,1,1,2,3,1,5,1,2,2,5,7,6,6,5,10,6,7,17,13,15,16,17,14,4,4,10,10,10,11,9,9,5,4,4,2,1,0,1,0
0,1,0,3,2,4,1,1,5,9,10,7,12,10,9,15,12,13,13,6,19,9,10,6,13,5,13,6,7,2,5,5,2,1,1,1,1,3,0,1
0,1,1,3,1,1,5,5,3,7,2,2,3,12,4,6,8,15,16,16,15,4,14,5,13,10,7,10,6,3,2,3,6,3,3,5,4,3,2,1
0,0,0,2,2,1,3,4,5,5,6,5,5,12,13,5,7,5,11,15,18,7,9,10,14,12,11,9,10,3,2,9,6,2,2,5,3,0,0,1
0,0,1,3,3,1,2,1,8,9,2,8,10,3,8,6,10,13,11,17,19,6,4,11,6,12,7,5,5,4,4,8,2,6,6,4,2,2,0,0
0,1,1,3,4,5,2,1,3,7,9,6,10,5,8,15,11,12,15,6,12,16,6,4,14,3,12,9,6,11,5,8,5,5,6,1,2,1,2,0
0,0,1,3,1,4,3,6,7,8,5,7,11,3,6,11,6,10,6,19,18,14,6,10,7,9,8,5,8,3,10,2,5,1,5,4,2,1,0,1
0,1,1,3,3,4,4,6,3,4,9,9,7,6,8,15,12,15,6,11,6,18,5,14,15,12,9,8,3,6,10,6,8,7,2,5,4,3,1,1
0,1,2,2,4,3,1,4,8,9,5,10,10,3,4,6,7,11,16,6,14,9,11,10,10,7,10,8,8,4,5,8,4,4,5,2,4,1,1,0
0,0,2,3,4,5,4,6,2,9,7,4,9,10,8,11,16,12,15,17,19,10,18,13,15,11,8,4,7,11,6,7,6,5,1,3,1,0,0,0
0,1,1,3,1,4,6,2,8,2,10,3,11,9,13,15,5,15,6,10,10,5,14,15,12,7,4,5,11,4,6,9,5,6,1,1,2,1,2,1
0,0,1,3,2,5,1,2,7,6,6,3,12,9,4,14,4,6,12,9,12,7,11,7,16,8,13,6,7,6,10,7,6,3,1,5,4,3,0,0
0,0,1,2,3,4,5,7,5,4,10,5,12,12,5,4,7,9,18,16,16,10,15,15,10,4,3,7,5,9,4,6,2,4,1,4,2,2,2,1
0,1,2,1,1,3,5,3,6,3,10,10,11,10,13,10,13,6,6,14,5,4,5,5,9,4,12,7,7,4,7,9,3,3,6,3,4,1,2,0
0,1,2,2,3,5,2,4,5,6,8,3,5,4,3,15,15,12,16,7,20,15,12,8,9,6,12,5,8,3,8,5,4,1,3,2,1,3,1,0
0,0,0,2,4,4,5,3,3,3,10,4,4,4,14,11,15,13,10,14,11,17,9,11,11,7,10,12,10,10,10,8,7,5,2,2,4,1,2,1
0,0,2,1,1,4,4,7,2,9,4,10,12,7,6,6,11,12,9,15,15,6,6,13,5,12,9,6,4,7,7,6,5,4,1,4,2,2,2,1
0,1,2,1,1,4,5,4,4,5,9,7,10,3,13,13,8,9,17,16,16,15,12,13,5,12,10,9,11,9,4,5,5,2,2,5,1,0,0,1
0,0,1,3,2,3,6,4,5,7,2,4,11,11,3,8,8,16,5,13,16,5,8,8,6,9,10,10,9,3,3,5,3,5,4,5,3,3,0,1
0,1,1,2,2,5,1,7,4,2,5,5,4,6,6,4,16,11,14,16,14,14,8,17,4,14,13,7,6,3,7,7,5,6,3,4,2,2,1,1
0,1,1,1,4,1,6,4,6,3,6,5,6,4,14,13,13,9,12,19,9,10,15,10,9,10,10,7,5,6,8,6,6,4,3,5,2,1,1,1
0,0,0,1,4,5,6,3,8,7,9,10,8,6,5,12,15,5,10,5,8,13,18,17,14,9,13,4,10,11,10,8,8,6,5,5,2,0,2,0
0,0,1,0,3,2,5,4,8,2,9,3,3,10,12,9,14,11,13,8,6,18,11,9,13,11,8,5,5,2,8,5,3,5,4,1,3,1,1,0

This is incredibly powerful and we can now process each of the lines in turn.
However even for a standard format like a .csv file this is not trivial.
First we have to split each line to turn it into a list. At this point all the items in the list are strings as when we used input to read from the keyboard, so each value needs to be turned into a numerical value with int or float. Each individual value then needs to be appended to a list, and finally the data from all lines needs to be assembled into a list of lists. This is given as an exercise at the end of the episode.
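For a single line, those steps might look like the sketch below. The example line is shortened and hard-coded for illustration; assembling the whole file is left to the exercise:

```python
# One line as it would come back from readlines(), including the trailing newline.
line = "0,0,1,3,1,2\n"

# Remove the newline, then split on commas to get a list of strings.
items = line.strip().split(",")
print(items)    # ['0', '0', '1', '3', '1', '2']

# Convert each string to a float and collect the results in a new list.
values = []
for item in items:
    values.append(float(item))
print(values)   # [0.0, 0.0, 1.0, 3.0, 1.0, 2.0]
```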

For now, wouldn't it just be much easier if someone had written a library that we could use instead?

Big files!

Note that the above method will read the entire file into your computer's memory. This means that if your file is particularly large, your program may run out of memory and slow down or crash!
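When a file may be too large to hold in memory, one option is to loop over the open file inside the with block, so that only one line is held at a time. A minimal sketch, using a small temporary file as a stand-in for a large one (the filename and contents are illustrative):

```python
import os
import tempfile

# Stand-in for a large file; the name and contents are made up for this example.
path = os.path.join(tempfile.mkdtemp(), "big.csv")
with open(path, "w") as out:
    out.write("1,2,3\n4,5,6\n7,8,9\n")

# Process the file one line at a time instead of calling readlines().
line_count = 0
with open(path) as file:
    for line in file:
        line_count += 1

print(line_count)   # 3
```

Here each line is only counted, but any per-line processing could go in the body of the loop.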

Numpy

There are of course a number of libraries that we can use to read in our data. One of the most useful of these is numpy, a numerical library for Python that has a host of features and optimised routines for performing efficient calculations. In particular, for our purposes, it has a function for reading 'csv' files and converting them automatically into numerical values, if possible. First we must import the library, and according to near-universal tradition, when we import numpy we use the alias np.

In [8]:
import numpy as np

You do not have to use the alias np, in which case you can just import numpy and write numpy everywhere we use np in the code that follows. However we mention and use it because if you look at anyone else's code it is almost certain that this is how they will use the library. To read in a 'csv' file we can now use the single command:

In [9]:
data = np.loadtxt(fname='./data/data/inflammation-01.csv', delimiter=',')

Let's check that something has happened and that the data has been read in:

In [10]:
print(data)
[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]

We can see that data contains values and that printing them results in something different from just printing out each line in the file. When print is used with a numpy object and the data is bigger than can be neatly printed, only the first and last few values are printed. In between, an ellipsis is printed to indicate data that is present in data but omitted for brevity. To check that all the data has been read in as before we can access each 'line' of the data with:

In [11]:
for line in data:
    print(line)
[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.  3.  3. 10.  5.  7.  4.  7.  7.
 12. 18.  6. 13. 11. 11.  7.  7.  4.  6.  8.  8.  4.  4.  5.  7.  3.  4.
  2.  3.  0.  0.]
[ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6. 10. 11.  5.  9.  4.  4.  7. 16.
  8.  6. 18.  4. 12.  5. 12.  7. 11.  5. 11.  3.  3.  5.  4.  4.  5.  5.
  1.  1.  0.  1.]
[ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.  5.  7.  4.  5.  4. 15.  5. 11.
  9. 10. 19. 14. 12. 17.  7. 12. 11.  7.  4.  2. 10.  5.  4.  2.  2.  3.
  2.  2.  1.  1.]
[ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7. 10.  7.  9. 13.  8.  8. 15. 10.
 10.  7. 17.  4.  4.  7.  6. 15.  6.  4.  9. 11.  3.  5.  6.  3.  3.  4.
  2.  3.  2.  1.]
[ 0.  1.  1.  3.  3.  1.  3.  5.  2.  4.  4.  7.  6.  5.  3. 10.  8. 10.
  6. 17.  9. 14.  9.  7. 13.  9. 12.  6.  7.  7.  9.  6.  3.  2.  2.  4.
  2.  0.  1.  1.]
[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.  7.  6.  6.  9.  9. 15.  4. 16.
 18. 12. 12.  5. 18.  9.  5.  3. 10.  3. 12.  7.  8.  4.  7.  3.  5.  4.
  4.  3.  2.  1.]
[ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.  6.  5. 11.  9.  4. 13.  5. 12.
 10.  6.  9. 17. 15.  8.  9.  3. 13.  7.  8.  2.  8.  8.  4.  2.  3.  5.
  4.  1.  1.  1.]
[ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.  7.  8.  8.  5. 10.  9. 15. 11.
 18. 19. 20.  8.  5. 13. 15. 10.  6. 10.  6.  7.  4.  9.  3.  5.  2.  5.
  3.  2.  2.  1.]
[ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.  2.  4. 11. 12. 10. 11.  9. 10.
 17. 11.  6. 16. 12.  6.  8. 14.  6. 13. 10. 11.  4.  6.  4.  7.  6.  3.
  2.  1.  0.  0.]
[ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.  6.  8. 12.  5. 13.  6. 13.  8.
 16.  8. 18. 15. 16. 14. 12.  7.  3.  8.  9. 11.  2.  5.  4.  5.  1.  4.
  1.  2.  0.  0.]
[ 0.  1.  0.  0.  4.  3.  3.  5.  5.  4.  5.  8.  7. 10. 13.  3.  7. 13.
 15. 18.  8. 15. 15. 16. 11. 14. 12.  4. 10. 10.  4.  3.  4.  5.  5.  3.
  3.  2.  2.  1.]
[ 0.  1.  0.  0.  3.  4.  2.  7.  8.  5.  2.  8. 11.  5.  5.  8. 14. 11.
  6. 11.  9. 16. 18.  6. 12.  5.  4.  3.  5.  7.  8.  3.  5.  4.  5.  5.
  4.  0.  1.  1.]
[ 0.  0.  2.  1.  4.  3.  6.  4.  6.  7.  9.  9.  3. 11.  6. 12.  4. 17.
 13. 15. 13. 12.  8.  7.  4.  7. 12.  9.  5.  6.  5.  4.  7.  3.  5.  4.
  2.  3.  0.  1.]
[ 0.  0.  0.  0.  1.  3.  1.  6.  6.  5.  5.  6.  3.  6. 13.  3. 10. 13.
  9. 16. 15.  9. 11.  4.  6.  4. 11. 11. 12.  3.  5.  8.  7.  4.  6.  4.
  1.  3.  0.  0.]
[ 0.  1.  2.  1.  1.  1.  4.  1.  5.  2.  3.  3. 10.  7. 13.  5.  7. 17.
  6.  9. 12. 13. 10.  4. 12.  4.  6.  7.  6. 10.  8.  2.  5.  1.  3.  4.
  2.  0.  2.  0.]
[ 0.  1.  1.  0.  1.  2.  4.  3.  6.  4.  7.  5.  5.  7.  5. 10.  7.  8.
 18. 17.  9.  8. 12. 11. 11. 11. 14.  6. 11.  2. 10.  9.  5.  6.  5.  3.
  4.  2.  2.  0.]
[ 0.  0.  0.  0.  2.  3.  6.  5.  7.  4.  3.  2. 10.  7.  9. 11. 12.  5.
 12.  9. 13. 19. 14. 17.  5. 13.  8. 11.  5. 10.  9.  8.  7.  5.  3.  1.
  4.  0.  2.  1.]
[ 0.  0.  0.  1.  2.  1.  4.  3.  6.  7.  4.  2. 12.  6. 12.  4. 14.  7.
  8. 14. 13. 19.  6.  9. 12.  6.  4. 13.  6.  7.  2.  3.  6.  5.  4.  2.
  3.  0.  1.  0.]
[ 0.  0.  2.  1.  2.  5.  4.  2.  7.  8.  4.  7. 11.  9.  8. 11. 15. 17.
 11. 12.  7. 12.  7.  6.  7.  4. 13.  5.  7.  6.  6.  9.  2.  1.  1.  2.
  2.  0.  1.  0.]
[ 0.  1.  2.  0.  1.  4.  3.  2.  2.  7.  3.  3. 12. 13. 11. 13.  6.  5.
  9. 16.  9. 19. 16. 11.  8.  9. 14. 12. 11.  9.  6.  6.  6.  1.  1.  2.
  4.  3.  1.  1.]
[ 0.  1.  1.  3.  1.  4.  4.  1.  8.  2.  2.  3. 12. 12. 10. 15. 13.  6.
  5.  5. 18. 19.  9.  6. 11. 12.  7.  6.  3.  6.  3.  2.  4.  3.  1.  5.
  4.  2.  2.  0.]
[ 0.  0.  2.  3.  2.  3.  2.  6.  3.  8.  7.  4.  6.  6.  9.  5. 12. 12.
  8.  5. 12. 10. 16.  7. 14. 12.  5.  4.  6.  9.  8.  5.  6.  6.  1.  4.
  3.  0.  2.  0.]
[ 0.  0.  0.  3.  4.  5.  1.  7.  7.  8.  2.  5. 12.  4. 10. 14.  5.  5.
 17. 13. 16. 15. 13.  6. 12.  9. 10.  3.  3.  7.  4.  4.  8.  2.  6.  5.
  1.  0.  1.  0.]
[ 0.  1.  1.  1.  1.  3.  3.  2.  6.  3.  9.  7.  8.  8.  4. 13.  7. 14.
 11. 15. 14. 13.  5. 13.  7. 14.  9. 10.  5. 11.  5.  3.  5.  1.  1.  4.
  4.  1.  2.  0.]
[ 0.  1.  1.  1.  2.  3.  5.  3.  6.  3.  7. 10.  3.  8. 12.  4. 12.  9.
 15.  5. 17. 16.  5. 10. 10. 15.  7.  5.  3. 11.  5.  5.  6.  1.  1.  1.
  1.  0.  2.  1.]
[ 0.  0.  2.  1.  3.  3.  2.  7.  4.  4.  3.  8. 12.  9. 12.  9.  5. 16.
  8. 17.  7. 11. 14.  7. 13. 11.  7. 12. 12.  7.  8.  5.  7.  2.  2.  4.
  1.  1.  1.  0.]
[ 0.  0.  1.  2.  4.  2.  2.  3.  5.  7. 10.  5.  5. 12.  3. 13.  4. 13.
  7. 15.  9. 12. 18. 14. 16. 12.  3. 11.  3.  2.  7.  4.  8.  2.  2.  1.
  3.  0.  1.  1.]
[ 0.  0.  1.  1.  1.  5.  1.  5.  2.  2.  4. 10.  4.  8. 14.  6. 15.  6.
 12. 15. 15. 13.  7. 17.  4.  5. 11.  4.  8.  7.  9.  4.  5.  3.  2.  5.
  4.  3.  2.  1.]
[ 0.  0.  2.  2.  3.  4.  6.  3.  7.  6.  4.  5.  8.  4.  7.  7.  6. 11.
 12. 19. 20. 18.  9.  5.  4.  7. 14.  8.  4.  3.  7.  7.  8.  3.  5.  4.
  1.  3.  1.  0.]
[ 0.  0.  0.  1.  4.  4.  6.  3.  8.  6.  4. 10. 12.  3.  3.  6.  8.  7.
 17. 16. 14. 15. 17.  4. 14. 13.  4.  4. 12. 11.  6.  9.  5.  5.  2.  5.
  2.  1.  0.  1.]
[ 0.  1.  1.  0.  3.  2.  4.  6.  8.  6.  2.  3. 11.  3. 14. 14. 12.  8.
  8. 16. 13.  7.  6.  9. 15.  7.  6.  4. 10.  8. 10.  4.  2.  6.  5.  5.
  2.  3.  2.  1.]
[ 0.  0.  2.  3.  3.  4.  5.  3.  6.  7. 10.  5. 10. 13. 14.  3.  8. 10.
  9.  9. 19. 15. 15.  6.  8.  8. 11.  5.  5.  7.  3.  6.  6.  4.  5.  2.
  2.  3.  0.  0.]
[ 0.  1.  2.  2.  2.  3.  6.  6.  6.  7.  6.  3. 11. 12. 13. 15. 15. 10.
 14. 11. 11.  8.  6. 12. 10.  5. 12.  7.  7. 11.  5.  8.  5.  2.  5.  5.
  2.  0.  2.  1.]
[ 0.  0.  2.  1.  3.  5.  6.  7.  5.  8.  9.  3. 12. 10. 12.  4. 12.  9.
 13. 10. 10.  6. 10. 11.  4. 15. 13.  7.  3.  4.  2.  9.  7.  2.  4.  2.
  1.  2.  1.  1.]
[ 0.  0.  1.  2.  4.  1.  5.  5.  2.  3.  4.  8.  8. 12.  5. 15.  9. 17.
  7. 19. 14. 18. 12. 17. 14.  4. 13. 13.  8. 11.  5.  6.  6.  2.  3.  5.
  2.  1.  1.  1.]
[ 0.  0.  0.  3.  1.  3.  6.  4.  3.  4.  8.  3.  4.  8.  3. 11.  5.  7.
 10.  5. 15.  9. 16. 17. 16.  3.  8.  9.  8.  3.  3.  9.  5.  1.  6.  5.
  4.  2.  2.  0.]
[ 0.  1.  2.  2.  2.  5.  5.  1.  4.  6.  3.  6.  5.  9.  6.  7.  4.  7.
 16.  7. 16. 13.  9. 16. 12.  6.  7.  9. 10.  3.  6.  4.  5.  4.  6.  3.
  4.  3.  2.  1.]
[ 0.  1.  1.  2.  3.  1.  5.  1.  2.  2.  5.  7.  6.  6.  5. 10.  6.  7.
 17. 13. 15. 16. 17. 14.  4.  4. 10. 10. 10. 11.  9.  9.  5.  4.  4.  2.
  1.  0.  1.  0.]
[ 0.  1.  0.  3.  2.  4.  1.  1.  5.  9. 10.  7. 12. 10.  9. 15. 12. 13.
 13.  6. 19.  9. 10.  6. 13.  5. 13.  6.  7.  2.  5.  5.  2.  1.  1.  1.
  1.  3.  0.  1.]
[ 0.  1.  1.  3.  1.  1.  5.  5.  3.  7.  2.  2.  3. 12.  4.  6.  8. 15.
 16. 16. 15.  4. 14.  5. 13. 10.  7. 10.  6.  3.  2.  3.  6.  3.  3.  5.
  4.  3.  2.  1.]
[ 0.  0.  0.  2.  2.  1.  3.  4.  5.  5.  6.  5.  5. 12. 13.  5.  7.  5.
 11. 15. 18.  7.  9. 10. 14. 12. 11.  9. 10.  3.  2.  9.  6.  2.  2.  5.
  3.  0.  0.  1.]
[ 0.  0.  1.  3.  3.  1.  2.  1.  8.  9.  2.  8. 10.  3.  8.  6. 10. 13.
 11. 17. 19.  6.  4. 11.  6. 12.  7.  5.  5.  4.  4.  8.  2.  6.  6.  4.
  2.  2.  0.  0.]
[ 0.  1.  1.  3.  4.  5.  2.  1.  3.  7.  9.  6. 10.  5.  8. 15. 11. 12.
 15.  6. 12. 16.  6.  4. 14.  3. 12.  9.  6. 11.  5.  8.  5.  5.  6.  1.
  2.  1.  2.  0.]
[ 0.  0.  1.  3.  1.  4.  3.  6.  7.  8.  5.  7. 11.  3.  6. 11.  6. 10.
  6. 19. 18. 14.  6. 10.  7.  9.  8.  5.  8.  3. 10.  2.  5.  1.  5.  4.
  2.  1.  0.  1.]
[ 0.  1.  1.  3.  3.  4.  4.  6.  3.  4.  9.  9.  7.  6.  8. 15. 12. 15.
  6. 11.  6. 18.  5. 14. 15. 12.  9.  8.  3.  6. 10.  6.  8.  7.  2.  5.
  4.  3.  1.  1.]
[ 0.  1.  2.  2.  4.  3.  1.  4.  8.  9.  5. 10. 10.  3.  4.  6.  7. 11.
 16.  6. 14.  9. 11. 10. 10.  7. 10.  8.  8.  4.  5.  8.  4.  4.  5.  2.
  4.  1.  1.  0.]
[ 0.  0.  2.  3.  4.  5.  4.  6.  2.  9.  7.  4.  9. 10.  8. 11. 16. 12.
 15. 17. 19. 10. 18. 13. 15. 11.  8.  4.  7. 11.  6.  7.  6.  5.  1.  3.
  1.  0.  0.  0.]
[ 0.  1.  1.  3.  1.  4.  6.  2.  8.  2. 10.  3. 11.  9. 13. 15.  5. 15.
  6. 10. 10.  5. 14. 15. 12.  7.  4.  5. 11.  4.  6.  9.  5.  6.  1.  1.
  2.  1.  2.  1.]
[ 0.  0.  1.  3.  2.  5.  1.  2.  7.  6.  6.  3. 12.  9.  4. 14.  4.  6.
 12.  9. 12.  7. 11.  7. 16.  8. 13.  6.  7.  6. 10.  7.  6.  3.  1.  5.
  4.  3.  0.  0.]
[ 0.  0.  1.  2.  3.  4.  5.  7.  5.  4. 10.  5. 12. 12.  5.  4.  7.  9.
 18. 16. 16. 10. 15. 15. 10.  4.  3.  7.  5.  9.  4.  6.  2.  4.  1.  4.
  2.  2.  2.  1.]
[ 0.  1.  2.  1.  1.  3.  5.  3.  6.  3. 10. 10. 11. 10. 13. 10. 13.  6.
  6. 14.  5.  4.  5.  5.  9.  4. 12.  7.  7.  4.  7.  9.  3.  3.  6.  3.
  4.  1.  2.  0.]
[ 0.  1.  2.  2.  3.  5.  2.  4.  5.  6.  8.  3.  5.  4.  3. 15. 15. 12.
 16.  7. 20. 15. 12.  8.  9.  6. 12.  5.  8.  3.  8.  5.  4.  1.  3.  2.
  1.  3.  1.  0.]
[ 0.  0.  0.  2.  4.  4.  5.  3.  3.  3. 10.  4.  4.  4. 14. 11. 15. 13.
 10. 14. 11. 17.  9. 11. 11.  7. 10. 12. 10. 10. 10.  8.  7.  5.  2.  2.
  4.  1.  2.  1.]
[ 0.  0.  2.  1.  1.  4.  4.  7.  2.  9.  4. 10. 12.  7.  6.  6. 11. 12.
  9. 15. 15.  6.  6. 13.  5. 12.  9.  6.  4.  7.  7.  6.  5.  4.  1.  4.
  2.  2.  2.  1.]
[ 0.  1.  2.  1.  1.  4.  5.  4.  4.  5.  9.  7. 10.  3. 13. 13.  8.  9.
 17. 16. 16. 15. 12. 13.  5. 12. 10.  9. 11.  9.  4.  5.  5.  2.  2.  5.
  1.  0.  0.  1.]
[ 0.  0.  1.  3.  2.  3.  6.  4.  5.  7.  2.  4. 11. 11.  3.  8.  8. 16.
  5. 13. 16.  5.  8.  8.  6.  9. 10. 10.  9.  3.  3.  5.  3.  5.  4.  5.
  3.  3.  0.  1.]
[ 0.  1.  1.  2.  2.  5.  1.  7.  4.  2.  5.  5.  4.  6.  6.  4. 16. 11.
 14. 16. 14. 14.  8. 17.  4. 14. 13.  7.  6.  3.  7.  7.  5.  6.  3.  4.
  2.  2.  1.  1.]
[ 0.  1.  1.  1.  4.  1.  6.  4.  6.  3.  6.  5.  6.  4. 14. 13. 13.  9.
 12. 19.  9. 10. 15. 10.  9. 10. 10.  7.  5.  6.  8.  6.  6.  4.  3.  5.
  2.  1.  1.  1.]
[ 0.  0.  0.  1.  4.  5.  6.  3.  8.  7.  9. 10.  8.  6.  5. 12. 15.  5.
 10.  5.  8. 13. 18. 17. 14.  9. 13.  4. 10. 11. 10.  8.  8.  6.  5.  5.
  2.  0.  2.  0.]
[ 0.  0.  1.  0.  3.  2.  5.  4.  8.  2.  9.  3.  3. 10. 12.  9. 14. 11.
 13.  8.  6. 18. 11.  9. 13. 11.  8.  5.  5.  2.  8.  5.  3.  5.  4.  1.
  3.  1.  1.  0.]

Now, as before when we read and printed each line of the file, the full data set is visible. The format is slightly different: all the commas have been removed and, instead of the original strings, each value has now been converted to a float, as indicated by the decimal point.
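Two quick checks that do not require printing everything are the shape and dtype attributes of the array. The sketch below reads a tiny in-memory stand-in for a .csv file (the first three rows of inflammation-01.csv, truncated to four columns) so that it runs without the data files:

```python
import io
import numpy as np

# A small stand-in for a .csv file; np.loadtxt also accepts file-like objects.
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"
small = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(small.shape)   # (rows, columns): (3, 4)
print(small.dtype)   # float64
```

Running data.shape on the array read from the full file should report one row per line of the file and one column per comma-separated value.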

The use of the loop shows that we can treat data as though it were a list of lines. Each line of the original data can be accessed by its index, just like an item in a list:

In [12]:
print(data[0])
print(data[17])
[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.  3.  3. 10.  5.  7.  4.  7.  7.
 12. 18.  6. 13. 11. 11.  7.  7.  4.  6.  8.  8.  4.  4.  5.  7.  3.  4.
  2.  3.  0.  0.]
[ 0.  0.  0.  1.  2.  1.  4.  3.  6.  7.  4.  2. 12.  6. 12.  4. 14.  7.
  8. 14. 13. 19.  6.  9. 12.  6.  4. 13.  6.  7.  2.  3.  6.  5.  4.  2.
  3.  0.  1.  0.]

We can access individual items in the dataset with two indices and also use the slice that we applied to lists earlier:

In [13]:
print(data[0][0])
print(data[0][1])
print(data[0][2])
print(data[0][:3])
print(data[17][-3:])
0.0
0.0
1.0
[0. 0. 1.]
[0. 1. 0.]
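numpy arrays also accept both indices inside a single pair of brackets, separated by a comma, which gives the same values as the chained form above. A short sketch on a small hand-made array (not the inflammation data):

```python
import numpy as np

# A small 2D array used only for illustration.
small = np.array([[0.0, 0.0, 1.0, 3.0],
                  [0.0, 1.0, 2.0, 1.0]])

print(small[0][2])     # chained indexing: 1.0
print(small[0, 2])     # same element with a single pair of brackets: 1.0
print(small[1, -2:])   # last two values of the second row: [2. 1.]
```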

We can also verify that the value in the dataset has been converted to a numerical type with type:

In [14]:
print(type(data[0][0]))
<class 'numpy.float64'>

This reveals that the value is not simply a float but a special numpy type, numpy.float64. The 64 refers to the number of bits of memory allocated to the value; we can think of this as determining how accurately the computer can represent the value.
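The size of a numpy.float64 can be inspected directly. This sketch uses np.dtype and np.finfo, which report the storage size and the approximate precision of the type:

```python
import numpy as np

# Storage size of one float64 value, in bytes (8 bytes = 64 bits).
size_in_bytes = np.dtype(np.float64).itemsize
print(size_in_bytes)    # 8

# finfo describes floating point types: total bits and approximate decimal digits.
info = np.finfo(np.float64)
print(info.bits)        # 64
print(info.precision)   # 15
```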

Processing a string

When we have data in a standard format, libraries will generally be available to help us read in files - even for proprietary formats (see e.g. the numpy and pandas libraries). However on occasion we will have to write our own parser, for instance our colleague might pass us a text file of marks from a piece of coursework, with each line of data looking like:

#Firstname Surname Mark1 Mark2 Total
James Grant 33 21 54

Write a function that takes a string of this form, i.e. datum = "James Grant 33 21 54", and returns a list, list = ['James', 'Grant', 33, 21, 54]. The line beginning with a # indicates that this is a comment. Remember that for the integer (int) values you will also need to convert them from strings (str)!

Suggestion: Before writing any code write out in natural language each of the steps that your function will perform.

Hint: my_string.split() takes a string and splits it into a list, the default 'delimiter' is whitespace but this can be changed. If you need to split on commas or another character instead you would need to specify this as a parameter e.g. comma_separated_string.split(',').

Solution
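One possible approach is sketched below; it is not the only valid answer. The function name parse_marks is our own choice, and returning None for a comment line is an assumption, since the exercise does not say what should happen to comments:

```python
def parse_marks(datum):
    """Turn a line such as 'James Grant 33 21 54' into ['James', 'Grant', 33, 21, 54]."""
    # Assumption: comment lines (starting with '#') carry no data, so return None.
    if datum.startswith("#"):
        return None

    # Split on whitespace: the first two items are names, the rest are marks.
    items = datum.split()
    result = items[:2]
    for item in items[2:]:
        result.append(int(item))   # convert each mark from str to int
    return result


print(parse_marks("James Grant 33 21 54"))                   # ['James', 'Grant', 33, 21, 54]
print(parse_marks("#Firstname Surname Mark1 Mark2 Total"))   # None
```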

Re-invent the wheel

We would always advise you to use existing libraries wherever possible; however, parsing files is a useful practice ground for the ideas we have been covering. Write a function to read in the .csv formatted inflammation data (that you extracted earlier) that takes a filename passed as a string and returns a 2D list (list of lists), having converted all entries to floats.

Begin by writing in natural language each of the steps that your function will need to perform. Then implement your function and verify that it produces similar output to the numpy parser we introduced above.

Solution
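One possible approach is sketched below; it is not the only valid answer. The function name read_csv is our own choice, and the demonstration uses a tiny temporary file (with made-up name and contents) so that it runs without the lesson's data files:

```python
import os
import tempfile


def read_csv(filename):
    """Read a comma-separated file of numbers and return a list of lists of floats."""
    data = []
    with open(filename) as file:
        for line in file:
            # Skip blank lines, e.g. an empty line at the end of the file.
            if line.strip() == "":
                continue
            row = []
            for item in line.strip().split(","):
                row.append(float(item))   # convert each string to a float
            data.append(row)
    return data


# Demonstration on a small temporary file; its name and contents are illustrative.
path = os.path.join(tempfile.mkdtemp(), "tiny.csv")
with open(path, "w") as out:
    out.write("0,0,1\n0,1,2\n")

print(read_csv(path))   # [[0.0, 0.0, 1.0], [0.0, 1.0, 2.0]]
```

To compare with numpy, the same function can be called on "./data/data/inflammation-01.csv" (assuming the layout used in this episode) and its rows checked against those printed from data.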

Key Points:

  • User-defined values can be read in from the keyboard with input.
  • In order to read the contents of a file it must first be opened; once read, the file must be closed.
  • By using the with structure Python makes sure the file is closed for us.
  • file.readlines() reads the entire contents of a file into a list, where each line is a string.
  • Libraries such as numpy have built-in functions that can read in standard formatted files with a single command.
  • numpy arrays are like multi-dimensional lists, and the contents are accessed with indices and slices, as for lists.