Part 8: Summary exercises
This workbook requires the titanic and avocado datasets. As usual, you will need to import numpy and pandas.
# Use this cell to import numpy and pandas
import numpy as np
import pandas as pd
# Use this cell to import the titanic and avocado datasets.
df_titanic = pd.read_excel("titanic.xlsx")
df_avocado = pd.read_excel("avocado.xlsx")
Exercise 8.1
We will be processing the titanic.xlsx spreadsheet. Your task is to extract the title each passenger gave (e.g. Miss, Mr, etc.) from the Name column and save it in a new column called Title.
Once you have done that, plot histograms of Survived grouped by Title, and investigate whether a passenger's title affected how likely they were to survive.
Hint: Use your_data_frame["Your Column"].str.split(' ')
To make your histograms readable, add the parameter figsize=(10, 10) to your plotting function.
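To see what the hint does, here is a minimal sketch on a toy data frame. The names are made up, and the "Surname, Title. First names" layout is an assumption about how the Name column is formatted, so check a few rows of the real data before relying on it.

```python
import pandas as pd

# Hypothetical toy data; these names are invented for illustration and
# only mimic an assumed "Surname, Title. First names" layout.
toy = pd.DataFrame(
    {"Name": ["Smith, Mr. John", "Jones, Miss. Anna", "Brown, Mrs. Mary Jane"]}
)

# .str.split(" ") turns each string into a list of its space-separated pieces.
parts = toy["Name"].str.split(" ")

# .str[i] picks element i from every list; here index 1 holds the title.
toy["Title"] = parts.str[1]

print(toy["Title"].tolist())
```

Picking a fixed position only works if every name has the same shape. A surname that itself contains a space would shift the title to a different index, so it is worth inspecting the distinct values of the new column afterwards.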
Exercise 8.2
This exercise is also about the titanic data.
When we plotted the correlation matrix, we noticed that only the numerical columns were displayed. To include a column whose values are categorical (e.g. in this dataset the Sex column has two categories, male and female), we first have to convert it to numbers.
We can use
pd.factorize(your_data_frame["Your Column"])
This returns a pair: an array of integer codes, one per row, that represent each category, and the unique categories themselves. If there are two categories, the codes are 0 and 1.
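The behaviour of pd.factorize can be sketched on a small series. The values below are hypothetical toy data, not rows from the titanic dataset.

```python
import pandas as pd

# Hypothetical toy column with two categories.
toy_sex = pd.Series(["male", "female", "female", "male"])

# pd.factorize returns two objects: the integer codes (one per row)
# and the unique categories in order of first appearance.
codes, uniques = pd.factorize(toy_sex)

print(list(codes))    # integer code for each row
print(list(uniques))  # category that each code stands for
```

Codes are assigned in order of first appearance, so which category becomes 0 depends on the order of the rows. Only the codes are needed when filling a new numerical column.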
Your task is to create a new column that represents Sex numerically using pd.factorize.
Then display the correlation matrix of your new data frame. Which column has the highest correlation with the column Survived (apart from Survived itself)?
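Reading the strongest correlation off a correlation matrix can be sketched as follows. The data frame and its column names (Target, A, B) are hypothetical and all numerical. Using the absolute value, so that strong negative correlations also count, is a choice made for this sketch.

```python
import pandas as pd

# Hypothetical, fully numerical toy data.
toy = pd.DataFrame(
    {
        "Target": [0, 1, 0, 1],
        "A": [0, 1, 0, 1],
        "B": [1, 2, 3, 4],
    }
)

# Pairwise (Pearson) correlation between all columns.
corr = toy.corr()

# Take the column of correlations with Target and drop Target itself,
# since every column correlates perfectly with itself.
others = corr["Target"].drop("Target")

# Label of the column with the largest absolute correlation.
strongest = others.abs().idxmax()
print(strongest)
```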
Exercise 8.3
You are an advisor to a restaurant that is famous for its poached eggs with smashed avocado. The restaurant plans to open a new branch but cannot decide on the region (it is considering all of the regions available in avocado.xlsx). You are asked to make some suggestions, and you decide that it may be best to open the restaurant where avocado prices are, on average, the lowest.
Using avocado.xlsx, find out which region has, on average, the cheapest avocados (the lowest AveragePrice).
Hint: Use groupby
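As a reminder of how groupby works, here is a minimal sketch on toy data. The column names group and value and all the numbers are hypothetical, so adapt them to the columns in the real spreadsheet.

```python
import pandas as pd

# Hypothetical toy data: two groups with two observations each.
toy = pd.DataFrame(
    {
        "group": ["North", "North", "South", "South"],
        "value": [1.0, 2.0, 1.2, 1.4],
    }
)

# Split the rows by group, then average the value column within each group.
means = toy.groupby("group")["value"].mean()

# idxmin returns the index label (here, the group name) of the smallest mean.
lowest = means.idxmin()
print(lowest)
```

Calling means.sort_values() instead shows the whole ranking, which is useful if more than one suggestion is wanted.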