Introduction - SOLVED
Contents
Introduction - SOLVED#

In this lecture, we will introduce you to the data analysis library for Python pandas. pandas is used extensively analysing and manipulating data, designed to be open source, powerful, flexible and easy to use. It is used widely in a variety of academic and commercial settings.
OUTLINE:
DataFrames - dictionaries on steroids
Viewing data
Selection - Selecting particular elements of a Data Frame
More operations on DataFrame’s rows/columns/entries
Grouping
Visualisation of DataFrames
Importing and Exporting files/spreadsheets.
Summary Exercises
Importing Pandas#
Like numpy, pandas is a libary and will therefore need to be imported before we can use its commands. We’ll also be making use of the numpy library, so let’s import this at the same time as pandas. We do this by running the following cell:
import pandas as pd # we will be using pandas via the abbreviation pd
import numpy as np
## Important: please read below
Before starting this week’s material please download the following files from Moodle:
titanic.xlsx
avocado.xlsx
C19_portsmouth.xlsx
Upload the titanic and avocado data files to this notebook (ask how if you’ve forgotten).
Each individual notebook will start with a section telling you which data files are required for the exercises contained in that notebook.
Spreadsheets in Pandas - No more Excel!#
In excel or google sheets you deal with spreadsheets. Here is an example of a Titanic spreadsheet containing the basic information about the passengers including if they survived the incident (0 if did not survived, 1 if survived).

The same spreadsheet can be opened by pandas and viewed as so-called DataFrame.
We have attached this spreadsheet to this notebook; it is contained in the titanic.xlsx file.
To import it as pandas Data Frame we use:
df_titanic = pd.read_excel("titanic.xlsx")
First let’s explain the meaning of each column:#
PassengerId, an id was assigned to every passenger,
Name contains the full name of the passenger,
Sex is the stated gender of a passenger.
Age contains the age of the passenger during the cruise.
Ticket contains ticket number
Fare contains passenger fare.
Cabin contains the cabin number for each passenger.
Survived contains information if the passenger survived the sinking of titanic (0 = No; 1 = Yes)
To display the header of this file we use:
df_titanic.head() #run this cell
| PassengerId | Name | Sex | Age | Ticket | Fare | Cabin | Survived | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Braund, Mr. Owen Harris | male | 22.0 | A/5 21171 | 7.2500 | NaN | 0 |
| 1 | 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | PC 17599 | 71.2833 | C85 | 1 |
| 2 | 3 | Heikkinen, Miss. Laina | female | 26.0 | STON/O2. 3101282 | 7.9250 | NaN | 1 |
| 3 | 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 113803 | 53.1000 | C123 | 1 |
| 4 | 5 | Allen, Mr. William Henry | male | 35.0 | 373450 | 8.0500 | NaN | 0 |
You can also display the tail (end rows) of this spreadsheet:
df_titanic.tail()
| PassengerId | Name | Sex | Age | Ticket | Fare | Cabin | Survived | |
|---|---|---|---|---|---|---|---|---|
| 886 | 887 | Montvila, Rev. Juozas | male | 27.0 | 211536 | 13.00 | NaN | 0 |
| 887 | 888 | Graham, Miss. Margaret Edith | female | 19.0 | 112053 | 30.00 | B42 | 1 |
| 888 | 889 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | W./C. 6607 | 23.45 | NaN | 0 |
| 889 | 890 | Behr, Mr. Karl Howell | male | 26.0 | 111369 | 30.00 | C148 | 1 |
| 890 | 891 | Dooley, Mr. Patrick | male | 32.0 | 370376 | 7.75 | NaN | 0 |
Both head and tail function take an integer input, inputting an integer \(n\) allows us to display \(n\) entries
n = 8
df_titanic.head(n)
| PassengerId | Name | Sex | Age | Ticket | Fare | Cabin | Survived | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Braund, Mr. Owen Harris | male | 22.0 | A/5 21171 | 7.2500 | NaN | 0 |
| 1 | 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | PC 17599 | 71.2833 | C85 | 1 |
| 2 | 3 | Heikkinen, Miss. Laina | female | 26.0 | STON/O2. 3101282 | 7.9250 | NaN | 1 |
| 3 | 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 113803 | 53.1000 | C123 | 1 |
| 4 | 5 | Allen, Mr. William Henry | male | 35.0 | 373450 | 8.0500 | NaN | 0 |
| 5 | 6 | Moran, Mr. James | male | NaN | 330877 | 8.4583 | NaN | 0 |
| 6 | 7 | McCarthy, Mr. Timothy J | male | 54.0 | 17463 | 51.8625 | E46 | 0 |
| 7 | 8 | Palsson, Master. Gosta Leonard | male | 2.0 | 349909 | 21.0750 | NaN | 0 |
df_titanic.tail(n)
| PassengerId | Name | Sex | Age | Ticket | Fare | Cabin | Survived | |
|---|---|---|---|---|---|---|---|---|
| 883 | 884 | Banfield, Mr. Frederick James | male | 28.0 | C.A./SOTON 34068 | 10.500 | NaN | 0 |
| 884 | 885 | Sutehall, Mr. Henry Jr | male | 25.0 | SOTON/OQ 392076 | 7.050 | NaN | 0 |
| 885 | 886 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 382652 | 29.125 | NaN | 0 |
| 886 | 887 | Montvila, Rev. Juozas | male | 27.0 | 211536 | 13.000 | NaN | 0 |
| 887 | 888 | Graham, Miss. Margaret Edith | female | 19.0 | 112053 | 30.000 | B42 | 1 |
| 888 | 889 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | W./C. 6607 | 23.450 | NaN | 0 |
| 889 | 890 | Behr, Mr. Karl Howell | male | 26.0 | 111369 | 30.000 | C148 | 1 |
| 890 | 891 | Dooley, Mr. Patrick | male | 32.0 | 370376 | 7.750 | NaN | 0 |
Exercise 0.1#
Import the file avocado.xlsx as pandas Data Frame.
df_avocado = pd.read_excel("avocado.xlsx")
Display its header and its tail:
df_avocado.head()
| Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany |
| 1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany |
| 2 | 2015-12-13 | 0.93 | 118220.22 | 794.70 | 109149.67 | 130.50 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany |
| 3 | 2015-12-06 | 1.08 | 78992.15 | 1132.00 | 71976.41 | 72.58 | 5811.16 | 5677.40 | 133.76 | 0.0 | conventional | 2015 | Albany |
| 4 | 2015-11-29 | 1.28 | 51039.60 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany |
df_avocado.tail()
| Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18244 | 2018-02-04 | 1.63 | 17074.83 | 2046.96 | 1529.20 | 0.00 | 13498.67 | 13066.82 | 431.85 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18245 | 2018-01-28 | 1.71 | 13888.04 | 1191.70 | 3431.50 | 0.00 | 9264.84 | 8940.04 | 324.80 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18246 | 2018-01-21 | 1.87 | 13766.76 | 1191.92 | 2452.79 | 727.94 | 9394.11 | 9351.80 | 42.31 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18247 | 2018-01-14 | 1.93 | 16205.22 | 1527.63 | 2981.04 | 727.01 | 10969.54 | 10919.54 | 50.00 | 0.0 | organic | 2018 | WestTexNewMexico |
| 18248 | 2018-01-07 | 1.62 | 17489.58 | 2894.77 | 2356.13 | 224.53 | 12014.15 | 11988.14 | 26.01 | 0.0 | organic | 2018 | WestTexNewMexico |
Display first 20 rows of the header:
n = 20
df_avocado.head(n)
| Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany |
| 1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany |
| 2 | 2015-12-13 | 0.93 | 118220.22 | 794.70 | 109149.67 | 130.50 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany |
| 3 | 2015-12-06 | 1.08 | 78992.15 | 1132.00 | 71976.41 | 72.58 | 5811.16 | 5677.40 | 133.76 | 0.0 | conventional | 2015 | Albany |
| 4 | 2015-11-29 | 1.28 | 51039.60 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany |
| 5 | 2015-11-22 | 1.26 | 55979.78 | 1184.27 | 48067.99 | 43.61 | 6683.91 | 6556.47 | 127.44 | 0.0 | conventional | 2015 | Albany |
| 6 | 2015-11-15 | 0.99 | 83453.76 | 1368.92 | 73672.72 | 93.26 | 8318.86 | 8196.81 | 122.05 | 0.0 | conventional | 2015 | Albany |
| 7 | 2015-11-08 | 0.98 | 109428.33 | 703.75 | 101815.36 | 80.00 | 6829.22 | 6266.85 | 562.37 | 0.0 | conventional | 2015 | Albany |
| 8 | 2015-11-01 | 1.02 | 99811.42 | 1022.15 | 87315.57 | 85.34 | 11388.36 | 11104.53 | 283.83 | 0.0 | conventional | 2015 | Albany |
| 9 | 2015-10-25 | 1.07 | 74338.76 | 842.40 | 64757.44 | 113.00 | 8625.92 | 8061.47 | 564.45 | 0.0 | conventional | 2015 | Albany |
| 10 | 2015-10-18 | 1.12 | 84843.44 | 924.86 | 75595.85 | 117.07 | 8205.66 | 7877.86 | 327.80 | 0.0 | conventional | 2015 | Albany |
| 11 | 2015-10-11 | 1.28 | 64489.17 | 1582.03 | 52677.92 | 105.32 | 10123.90 | 9866.27 | 257.63 | 0.0 | conventional | 2015 | Albany |
| 12 | 2015-10-04 | 1.31 | 61007.10 | 2268.32 | 49880.67 | 101.36 | 8756.75 | 8379.98 | 376.77 | 0.0 | conventional | 2015 | Albany |
| 13 | 2015-09-27 | 0.99 | 106803.39 | 1204.88 | 99409.21 | 154.84 | 6034.46 | 5888.87 | 145.59 | 0.0 | conventional | 2015 | Albany |
| 14 | 2015-09-20 | 1.33 | 69759.01 | 1028.03 | 59313.12 | 150.50 | 9267.36 | 8489.10 | 778.26 | 0.0 | conventional | 2015 | Albany |
| 15 | 2015-09-13 | 1.28 | 76111.27 | 985.73 | 65696.86 | 142.00 | 9286.68 | 8665.19 | 621.49 | 0.0 | conventional | 2015 | Albany |
| 16 | 2015-09-06 | 1.11 | 99172.96 | 879.45 | 90062.62 | 240.79 | 7990.10 | 7762.87 | 227.23 | 0.0 | conventional | 2015 | Albany |
| 17 | 2015-08-30 | 1.07 | 105693.84 | 689.01 | 94362.67 | 335.43 | 10306.73 | 10218.93 | 87.80 | 0.0 | conventional | 2015 | Albany |
| 18 | 2015-08-23 | 1.34 | 79992.09 | 733.16 | 67933.79 | 444.78 | 10880.36 | 10745.79 | 134.57 | 0.0 | conventional | 2015 | Albany |
| 19 | 2015-08-16 | 1.33 | 80043.78 | 539.65 | 68666.01 | 394.90 | 10443.22 | 10297.68 | 145.54 | 0.0 | conventional | 2015 | Albany |