Part 1: data frames (extra powerful dictionaries)

This workbook does not require you to load any datasets.

We will need to import both pandas and numpy by running the cell below:

import pandas as pd
import numpy as np

Before we analyse any spreadsheet data, let us dig a little deeper into Data Frames.

Let us create a simple dictionary first:

gradebook = {}

gradebook["Student id"] = ["UP123", "UP124", "UP125", "UP126"]
gradebook["Marks (out of 10)"] = ["10", "5", "7", "6"]
# Display the gradebook

gradebook
{'Student id': ['UP123', 'UP124', 'UP125', 'UP126'],
 'Marks (out of 10)': ['10', '5', '7', '6']}

Now let’s create a Data Frame that holds the same information but offers much more functionality than a dictionary:

df_gradebook = pd.DataFrame()

df_gradebook["Student id"] = ["UP123", "UP124", "UP125", "UP126"]
df_gradebook["Marks (out of 10)"] = ["10", "5", "7", "6"]
# Display the gradebook

df_gradebook
  Student id Marks (out of 10)
0      UP123                10
1      UP124                 5
2      UP125                 7
3      UP126                 6
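
Because a Data Frame behaves much like a dictionary of columns, the same table can also be built in one step by passing a dictionary to the constructor. The snippet below is a minimal sketch that reuses the gradebook data from above; the name `df_from_dict` is chosen here purely for illustration.

```python
import pandas as pd

# The dictionary from earlier: keys become column names, lists become columns
gradebook = {
    "Student id": ["UP123", "UP124", "UP125", "UP126"],
    "Marks (out of 10)": ["10", "5", "7", "6"],
}

# Build the Data Frame directly from the dictionary
# (df_from_dict is an illustrative name, not part of the workbook)
df_from_dict = pd.DataFrame(gradebook)

print(df_from_dict.columns.tolist())  # ['Student id', 'Marks (out of 10)']
print(df_from_dict.shape)             # (4, 2)
```

Both approaches should give the same table. Adding columns one at a time, as above, can be convenient when the columns come from different sources.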

Even though we used plain lists to construct the Data Frame columns, pandas converted each column to a pandas Series. A Series is built on top of NumPy arrays but adds extra functionality, such as an index and many convenience methods:

type(df_gradebook["Student id"])
pandas.core.series.Series
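
To make the "extra functionality" more concrete, the sketch below builds a standalone Series with the same student ids and looks at three things: its index, its values as a NumPy array, and one of the vectorised string methods. The variable names `ids` and `values` are illustrative only.

```python
import numpy as np
import pandas as pd

# A standalone Series holding the same student ids (ids is an illustrative name)
ids = pd.Series(["UP123", "UP124", "UP125", "UP126"], name="Student id")

# A Series carries an index (row labels) alongside its values
print(ids.index.tolist())  # [0, 1, 2, 3]

# to_numpy() returns the values as a NumPy array
values = ids.to_numpy()
print(isinstance(values, np.ndarray))  # True

# Series methods go beyond what a plain list offers,
# for example string operations applied to every element at once
print(ids.str.replace("UP", "").tolist())  # ['123', '124', '125', '126']
```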

To print a single column of the data frame we use:

print(df_gradebook["Marks (out of 10)"])
0    10
1     5
2     7
3     6
Name: Marks (out of 10), dtype: object
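
The dtype is reported as object rather than a number because the marks were entered as strings ("10", "5", ...). If you want to do arithmetic on them, one option is to convert the column first. The following is a minimal sketch using `pd.to_numeric`; the name `marks_numeric` is hypothetical, and the exact integer dtype you get may depend on your pandas version.

```python
import pandas as pd

# Recreate the gradebook from above so that this cell runs on its own
df_gradebook = pd.DataFrame()
df_gradebook["Student id"] = ["UP123", "UP124", "UP125", "UP126"]
df_gradebook["Marks (out of 10)"] = ["10", "5", "7", "6"]

# Convert the string marks to numbers (marks_numeric is a hypothetical name)
marks_numeric = pd.to_numeric(df_gradebook["Marks (out of 10)"])

# Numeric columns support arithmetic and summary statistics
print(marks_numeric.tolist())  # [10, 5, 7, 6]
print(marks_numeric.mean())    # 7.0
```

Entering the marks as integers in the first place (10 instead of "10") would avoid the conversion step.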

Exercise 1.1

Create a simple Data Frame that represents the following spreadsheet:

[Image: example data]

Print each of the columns of this Data Frame.