Summary Exercises
import numpy as np  # Run this cell, or none of the np commands will work
Exercise 8.1
Comparing Data on the Same Scale
Sometimes different datasets use values on very different scales. For example, one dataset might have values like [1, 3, 5], while another might have much larger values like [100, 200, 300].
If we want to compare these datasets fairly, we need to put them on the same scale. One way to do this is to transform each dataset so that:
its average (mean) is 0, and
its spread (standard deviation) is 1.
This process is called standardising the data.
Let’s see a small example:
Our dataset is [1, 3, 5].
The mean (average) is 3.
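The standard deviation is about 1.633. (This is the population standard deviation, which divides by the number of values; it is what np.std() computes by default.)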
To standardise this data:
Subtract the mean from each value.
Divide each result by the standard deviation.
So, \( [1, 3, 5] \rightarrow \left[\frac{1-3}{1.633}, \frac{3-3}{1.633}, \frac{5-3}{1.633}\right] \approx [-1.225, 0, 1.225] \)
The new data has mean 0 and standard deviation 1, so it can be compared directly with other standardised datasets.
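A minimal sketch of the worked example above, standardising the single dataset [1, 3, 5] with np.mean and np.std. It relies on np.std's default ddof=0 (dividing by the number of values), which is what gives the 1.633 used above. It handles one 1D dataset only, so the row-by-row version in the coding task is still yours to write.

```python
import numpy as np

data = np.array([1, 3, 5])

mean = np.mean(data)  # 3.0
std = np.std(data)    # population standard deviation (ddof=0), about 1.633

# Subtract the mean, then divide by the standard deviation
standardised = (data - mean) / std
print(standardised)   # approximately [-1.2247, 0.0, 1.2247]

# Check: mean should be 0 and standard deviation (approximately) 1
print(np.mean(standardised), np.std(standardised))
```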
Your coding task
Write a Python function that:
takes a 2D NumPy array as input,
standardises each row (treat each row as a separate dataset), and
returns a new 2D NumPy array with the standardised rows.
You don’t need to round the results.
Hint: You can use NumPy’s built-in functions np.mean() and np.std() with the axis parameter to calculate the mean and standard deviation for each row.
Example to test your function: try it on arrays like
np.array([[1, 3, 5],
          [2, 4, 6]])
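To see what the axis parameter from the hint does, here is a sketch on the example array. With axis=1, np.mean returns one value per row; adding keepdims=True (a standard argument of both np.mean and np.std) keeps those values as a column of shape (2, 1), which broadcasts against the rows. Only the mean-subtraction step is shown; dividing by the per-row standard deviation is left to you.

```python
import numpy as np

data = np.array([[1, 3, 5],
                 [2, 4, 6]])

row_means = np.mean(data, axis=1)                     # one mean per row: [3., 4.], shape (2,)
row_means_col = np.mean(data, axis=1, keepdims=True)  # same values as a column, shape (2, 1)

print(row_means.shape, row_means_col.shape)           # (2,) (2, 1)

# A (2, 1) column broadcasts across each row of the (2, 3) array,
# so every row has its own mean subtracted.
centred = data - row_means_col
print(centred)                                        # each row becomes [-2., 0., 2.]
```

Without keepdims=True, data - row_means tries to broadcast a shape (2,) array against a (2, 3) array and raises a ValueError, because NumPy aligns trailing dimensions (3 versus 2).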
Exercise 8.2
Padding is an important tool when it comes to image processing and image recognition. Surrounding an image with a border of empty (zero) pixels makes it clear where the real data begins, for example in astronomical image analysis.
Earlier, in Exercise 5.2, we padded a 1D array on the right-hand side. Your task is to write a function that pads a 2D array with zeros on the left and on the right (one column of zeros on each side).
Example:
If the input array is [[1, 2], [3, 4]],
then the padded output array is
[[0, 1, 2, 0], [0, 3, 4, 0]]
Your function should take one input:
a 2D NumPy array.
It should return a new array that is the padded version of the input.
Hint: You can use np.hstack or np.concatenate (with the axis argument). Alternatively, you could create a bigger NumPy array of zeros and use slicing to insert the input array in the right place.
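Once your function works, you can compare its output against NumPy's built-in np.pad, which fills with a constant (zero by default) when called with mode="constant". The pad width ((0, 0), (1, 1)) means no rows added above or below, and one column added on the left and on the right. Treat this as a way to check your answer rather than as the intended solution.

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])

# pad_width is ((rows before, rows after), (columns before, columns after))
padded = np.pad(arr, pad_width=((0, 0), (1, 1)), mode="constant", constant_values=0)
print(padded)
# [[0 1 2 0]
#  [0 3 4 0]]
```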
Exercise 8.3 (Difficult)
The integral of a function \(f\) over an interval \([a, b]\) (where \(a\) and \(b\) are numbers with \(a<b\)) gives the (signed) area under the curve of \(f\) between \(a\) and \(b\).
In mathematics, this integral is denoted by \(\int_a^b f(x)\,dx\).
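We can approximate the value of the integral by summation as follows:
\[ \int_a^b f(x)\,dx \approx f(x_0)\,\delta x + f(x_1)\,\delta x + \dots + f(x_n)\,\delta x, \]
where we divide the interval \([a, b]\) into \(n\) subintervals of the same length (\(n\) should be large).
This can be summarised mathematically as:
\[ \int_a^b f(x)\,dx \approx \sum_{k=0}^{n} f(x_k)\,\delta x, \qquad (\star) \]
where \(\delta x = \frac{b-a}{n}\). Note that \(x_k = a + k\,\delta x\).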
Essentially, we approximate the definite integral as a sum of many rectangles, each of width \(\delta x\) and height \(f(x_k)\).
An example
To illustrate how the interval is split, take the interval \([0, 1]\) (all real numbers between \(0\) and \(1\), including both ends, so \(a=0\) and \(b=1\)). Taking \(n = 2\), we calculate \(\delta x = \frac{b-a}{n} = \frac{1}{2} = 0.5\). In mathematical notation, we can decompose \([0, 1]\) as the union of two subintervals of length \(0.5\):
\[ [0, 1] = \left[0, \tfrac{1}{2}\right] \cup \left[\tfrac{1}{2}, 1\right]. \]
We get \(x_0 = 0\), \(x_1 = \frac{1}{2}\), \(x_2 = 1\). Of course, in practice we want a large \(n\); this small case just illustrates the notation above.
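The small example can be reproduced in code. The sketch below builds the points \(x_0, x_1, x_2\) for \([0, 1]\) with \(n = 2\) in the two ways used in the task (a list comprehension and np.linspace), then, taking \(f(x) = x^2\) purely for illustration, adds up the three rectangle areas \(f(x_k)\,\delta x\) one term at a time.

```python
import numpy as np

a, b, n = 0, 1, 2
dx = (b - a) / n                                  # 0.5

xk_list = [a + k * dx for k in range(0, n + 1)]   # [0.0, 0.5, 1.0]
xk_array = np.linspace(a, b, n + 1)               # array([0. , 0.5, 1. ])

def f(x):
    return x**2  # example function, chosen only for illustration

# Rectangle areas f(x_0)*dx, f(x_1)*dx, f(x_2)*dx, written out term by term
total = f(xk_array[0]) * dx + f(xk_array[1]) * dx + f(xk_array[2]) * dx
print(xk_list, total)                             # total is 0.625
```

With only \(n = 2\) the result (0.625) is far from the exact value \(\frac{1}{3}\), which is why \(n\) should be large.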
Task
Please complete the code of the functions approx_integral_for_loop and approx_integral below. Each should return the right-hand side of the formula marked (\(\star\)) above.
You can first try using a for loop, as this is very similar to the summation exercises from previous weeks. Once that works, try using only NumPy functions.
def approx_integral_for_loop(f, a, b, n):
    dx = (b-a)/n  # this is the step delta x
    xk_vals = [a + k*dx for k in range(0, n+1)]  # this is the list of [x_0, x_1, ..., x_n]
    # Complete the code: you will need a for loop to compute the sum from the formula (*)
Now try using only NumPy; the comments should help:
def approx_integral(f, a, b, n):
    dx = (b-a)/n  # this is the step delta x
    xk_vals = np.linspace(a, b, n+1)  # this is the array of [x_0, x_1, ..., x_n]
    # in this line, define the array of f(x_k) values so that it is [f(x_0), f(x_1), f(x_2), ..., f(x_n)]
    # in this line, use broadcasting to multiply the array from the previous line by dx; this should give you [f(x_0)*dx, f(x_1)*dx, f(x_2)*dx, ..., f(x_n)*dx]
    # in this line, sum up the array you obtained in the previous line; this gives the desired output
    # don't forget to return the output
Test your functions on the examples below:
def f(x):
    return x**2
a = 0
b = 1
n = 1000
print(f"Without numpy: The integral of f over the interval a={a}, b={b} is {approx_integral_for_loop(f, a, b, n)}")
print(f"Numpy: The integral of f over the interval a={a}, b={b} is {approx_integral(f, a, b, n)}")
print("Your answer should be close to 1/3 = 0.333 (3 d.p).")
Without numpy: The integral of f over the interval a=0, b=1 is None
Numpy: The integral of f over the interval a=0, b=1 is None
Your answer should be close to 1/3 = 0.333 (3 d.p).
def f(x):
    return x*np.exp(x)
a = 0
b = 1
n = 1000
print(f"Without numpy: The integral of f over the interval a={a}, b={b} is {approx_integral_for_loop(f, a, b, n)}")
print(f"Numpy: The integral of f over the interval a={a}, b={b} is {approx_integral(f, a, b, n)}")
print("Your answer should be close to 1.0.")
Without numpy: The integral of f over the interval a=0, b=1 is None
Numpy: The integral of f over the interval a=0, b=1 is None
Your answer should be close to 1.0.
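The reference values quoted in the two tests come from evaluating the integrals exactly; the second one uses integration by parts with \(u = x\) and \(dv = e^x\,dx\):

\[ \int_0^1 x^2\,dx = \left[\frac{x^3}{3}\right]_0^1 = \frac{1}{3}, \qquad \int_0^1 x e^x\,dx = \big[x e^x\big]_0^1 - \int_0^1 e^x\,dx = e - (e - 1) = 1. \]

Because the sum in the comments includes both endpoints \(x_0\) and \(x_n\), expect a value slightly above the exact one: for \(f(x) = x^2\) with \(n = 1000\) it comes out at about 0.3338, and the gap shrinks as \(n\) grows.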
External links to other NumPy tutorials
For more details and extended NumPy functionality, please see