# Part 2: viewing data

This workbook requires you to load the ``titanic`` and ``avocado`` datasets. You will also need to run the following block of code to import ``numpy`` and ``pandas``:

In [None]:
import pandas as pd
import numpy as np

Load the ``titanic`` and ``avocado`` datasets as ``pandas`` DataFrames in the code block below:

In [None]:
# Load the Titanic and avocado data sets
df_titanic = pd.read_excel("titanic.xlsx")
df_avocado = pd.read_excel("avocado.xlsx")

## Exploring datasets in more detail

Let's come back to our ``titanic`` example. We can access the index of the DataFrame as follows:


In [None]:
df_titanic.index

By default it is a pandas ``RangeIndex``. It works similarly to Python's ``range``: it starts at 0, the step is 1, and the last entry is ``stop - 1``.

We may use different types of indexing, but for now we are going to use the default one.
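
As a small illustration of the default index, the sketch below builds a three-row frame and inspects its ``RangeIndex``. The frame ``df_demo`` and its values are made up for this example; they are not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical three-row frame standing in for df_titanic
df_demo = pd.DataFrame({"Survived": [0, 1, 1], "Age": [22.0, 38.0, 26.0]})

idx = df_demo.index
print(idx)                             # RangeIndex(start=0, stop=3, step=1)
print(idx.start, idx.stop, idx.step)   # the same three values as range(0, 3, 1)

# The index holds the same labels that range(3) would produce
labels = list(idx)
print(labels)
```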

How about displaying all the column names of our DataFrame? We do it as follows:

In [None]:
df_titanic.columns

The result is a pandas ``Index``, which can be treated much like a list or numpy array: we can access its elements by position.

In [None]:
df_titanic.columns[0], df_titanic.columns[-1]
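
The sketch below shows a few more list-like operations on the column ``Index``, using a made-up frame ``df_demo`` (an assumption for illustration only). One difference from a list is worth knowing: an ``Index`` cannot be modified in place.

```python
import pandas as pd

# Hypothetical frame with three columns
df_demo = pd.DataFrame({"Survived": [0, 1], "Pclass": [3, 1], "Age": [22.0, 38.0]})

cols = df_demo.columns
first_two = cols[:2]        # slicing returns another Index
as_list = cols.tolist()     # a plain Python list of column names
has_age = "Age" in cols     # membership test on the column names

# Unlike a list, an Index does not support item assignment
try:
    cols[0] = "Alive"
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(as_list, has_age, assignment_failed)
```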

We already know how to view values of a particular column, e.g.

``df_titanic["Survived"]``

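If the column name is a valid Python identifier (for example, it contains no spaces) and does not clash with an existing DataFrame attribute or method, we can also view the values with attribute access: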

In [None]:
df_titanic.Survived
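
To make the limits of attribute access concrete, here is a minimal sketch with a made-up frame ``df_demo``. Its column names are chosen for illustration: one plain name, one containing a space, and one that clashes with the ``DataFrame.count`` method.

```python
import pandas as pd

# Hypothetical frame; the column names are chosen to show the edge cases
df_demo = pd.DataFrame(
    {"Survived": [0, 1, 1], "Total Bags": [10, 20, 30], "count": [1, 2, 3]}
)

survived = df_demo.Survived       # equivalent to df_demo["Survived"]
bags = df_demo["Total Bags"]      # name contains a space: brackets are required
count_col = df_demo["count"]      # brackets always return the column

# Attribute access finds the DataFrame.count method, not the "count" column
print(callable(df_demo.count))
```

Bracket access works for every column name, so it is the safer default in scripts.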

To get a numpy array from a ``pd.Series``, we use ``your_pd_series.values``:

In [None]:
df_titanic.Survived.values
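
The pandas documentation recommends ``Series.to_numpy()`` over ``.values`` in new code. For a plain numeric column the two return the same array, as this sketch on a made-up Series suggests:

```python
import numpy as np
import pandas as pd

# Hypothetical Series standing in for df_titanic.Survived
survived = pd.Series([0, 1, 1], name="Survived")

arr_values = survived.values       # attribute access
arr_numpy = survived.to_numpy()    # method call, recommended by the pandas docs

print(type(arr_numpy))
print(np.array_equal(arr_values, arr_numpy))
```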

Pandas is highly compatible with numpy. In fact, we can convert a whole DataFrame to a numpy array directly:

In [None]:
titanic_np_array = df_titanic.to_numpy()
print(titanic_np_array)
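
One thing to watch for: a numpy array has a single dtype, so a DataFrame that mixes text and numbers is converted to an array of generic Python objects. The sketch below uses a made-up frame ``df_demo`` to show the difference; selecting only the numeric columns first keeps a numeric dtype.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns
df_demo = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Age": [22.0, 38.0], "Survived": [1, 0]}
)

mixed = df_demo.to_numpy()                           # text + numbers
numeric = df_demo[["Age", "Survived"]].to_numpy()    # numbers only

print(mixed.dtype)     # object
print(numeric.dtype)   # float64
```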

We can also get a quick statistical summary of our data. This is done via:

In [None]:
df_titanic.describe()

Note that for pandas objects, displaying (leaving the expression as the last line of a cell) produces nicer output than printing:

In [None]:
print(df_titanic.describe())

Displaying can also be done explicitly with the ``display`` command, as follows:

In [None]:
display(df_titanic.describe())

Note that, by default, the above statistics cover only the numerical columns.
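
If a summary of the non-numerical columns is also wanted, ``describe`` accepts an ``include`` argument. The sketch below uses a made-up frame ``df_demo`` (not the Titanic data) and ``include="all"``, which adds rows such as ``unique``, ``top`` and ``freq`` for text columns.

```python
import pandas as pd

# Hypothetical frame with two text columns and one numeric column
df_demo = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cat"],
        "Sex": ["female", "male", "female"],
        "Age": [22.0, 38.0, 26.0],
    }
)

numeric_summary = df_demo.describe()             # default: numeric columns only
full_summary = df_demo.describe(include="all")   # every column

print(list(numeric_summary.columns))
print(full_summary.loc[["unique", "top", "freq"], "Sex"])
```

Statistics that do not apply to a column (for example ``mean`` for ``Sex``) appear as ``NaN`` in the combined table.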

### Exercise 2.1

With the avocado data frame from Exercise 0.1:

Please display its columns.

Display the stats of this data frame.

Display all the entries of this data frame in the column ``Total Bags``.

Convert your data frame to a numpy array and then print it.

### Transposing your data

You may have heard of the transpose operation. For matrices, a transpose swaps the rows with the columns. This operation also makes sense for numpy arrays and DataFrames.

In [None]:
my_matrix = np.array([[1, 2], [3, 4]])
print(f"Original matrix \n {my_matrix}")
print(f"Transposed matrix \n {my_matrix.T}")

In [None]:
df_titanic_t = df_titanic.T
df_titanic_t

For this particular dataset, transposing is not especially useful. However, it gives us an interesting DataFrame, so let's spend some time on it.
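
One reason transposing a table like this is rarely helpful: each column of a DataFrame has a single dtype, and after transposing, every new column mixes values from columns of different types. The sketch below shows the effect on a made-up frame ``df_demo``, assuming a text column next to a numeric one.

```python
import pandas as pd

# Hypothetical frame: one text column, one numeric column
df_demo = pd.DataFrame({"Name": ["Ann", "Bob", "Cat"], "Age": [22.0, 38.0, 26.0]})
df_demo_t = df_demo.T

print(df_demo.shape, df_demo_t.shape)   # (3, 2) (2, 3)

# Before: "Age" is a float column. After: each column holds a name and an age,
# so pandas falls back to the generic object dtype for all of them.
age_is_float = str(df_demo.dtypes["Age"]) == "float64"
all_object_after = bool((df_demo_t.dtypes == object).all())
print(age_is_float, all_object_after)
```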

### Exercise 2.2

Display the index of the above transposed titanic DataFrame.

Note that the result is printed as an ``Index`` object, but its elements can be accessed by position, just like those of a list.