Summary Exercises - SOLVED

import numpy as np  # Run this cell or none of the np commands will work

Exercise 8.1

In statistics, we often need to standardise data. Standardisation places all the datasets in our analysis onto a common scale, so that we can make comparisons between different types of datasets.

To do this, we transform each dataset so that it has mean zero and standard deviation one. Every dataset we process then follows this rule.

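As a simple example, take the data set \([1, 3, 5]\). Its mean is \(3\) and its standard deviation is approximately \(1.633\) (rounded to 3 d.p.). Here we use the population standard deviation, \(\sqrt{\frac{(1-3)^2 + (3-3)^2 + (5-3)^2}{3}} = \sqrt{\frac{8}{3}}\), which is also what np.std computes by default.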

To standardise this data set, we subtract the mean from each element and then divide each element by the standard deviation.

Hence we transform \([1, 3, 5]\) into \(\left[\frac{1-3}{1.633}, \frac{3-3}{1.633}, \frac{5-3}{1.633} \right] = \left[ -1.225, 0, 1.225\right]\). The transformed data has mean zero and standard deviation one.
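The same calculation can be checked with numpy. This is a minimal sketch on the 1D data set above; it relies on np.std using the population standard deviation (dividing by the number of elements) by default, which is what gives 1.633 here.

```python
import numpy as np

data = np.array([1, 3, 5])

mean = np.mean(data)  # 3.0
std = np.std(data)    # population standard deviation, about 1.633

standardised = (data - mean) / std
print(standardised)   # approximately [-1.225, 0, 1.225]

# The standardised data has mean 0 and standard deviation 1 (up to rounding error)
print(np.mean(standardised), np.std(standardised))
```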

Your task

Write a function that takes a 2D numpy array as input.

Then standardise each row of the input array (we treat each row as a separate dataset).

Your function should return a new 2D numpy array that contains the standardised rows. Please test your function on a few examples. No rounding is needed.

Hint: Use the axis argument and the built-in numpy functions np.mean and np.std.
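To see what the hint about axis means, here is a small sketch on a made-up 2×3 array. With axis=1, np.mean computes one value per row; keepdims=True keeps the result as a column of shape (2, 1), so it broadcasts across each row when subtracting. Subtracting the shape-(2,) result directly would raise a broadcasting error here, and for a square array it would silently subtract the row means from the columns instead, which is why the column shape matters.

```python
import numpy as np

arr = np.array([[1.0, 3.0, 5.0],
                [2.0, 5.0, 6.0]])

row_means = np.mean(arr, axis=1)                     # one mean per row, shape (2,)
row_means_col = np.mean(arr, axis=1, keepdims=True)  # same values, shape (2, 1)

print(row_means.shape)      # (2,)
print(row_means_col.shape)  # (2, 1)

# The (2, 1) column broadcasts across the 3 columns, so every element
# has the mean of its own row subtracted
centred = arr - row_means_col
print(centred.mean(axis=1))  # each row mean is now (approximately) 0
```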

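def standardise_array(test_array):
    # Row means and standard deviations; keepdims=True returns them as
    # columns of shape (number of rows, 1), so they broadcast along each row
    mu = test_array.mean(axis=1, keepdims=True)
    std = test_array.std(axis=1, keepdims=True)
    zero_mean = test_array - mu  # every row now has mean 0
    unit_std = zero_mean / std   # every row now has standard deviation 1
    return unit_std

# Test the function above
input_array = np.array([[1, 3, 5], [2, 5, 6], [1, 3, 5]])
print(standardise_array(input_array))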
[[-1.22474487  0.          1.22474487]
 [-1.37281295  0.39223227  0.98058068]
 [-1.22474487  0.          1.22474487]]
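A quick way to test the function, as the task asks, is to check that every row of the result has mean 0 and standard deviation 1. This is a minimal sketch on random data; the function is repeated so the block runs on its own.

```python
import numpy as np

def standardise_array(test_array):
    mu = test_array.mean(axis=1, keepdims=True)  # row means as a column
    std = test_array.std(axis=1, keepdims=True)  # row standard deviations as a column
    return (test_array - mu) / std

# A random test array: 4 rows (datasets) of 10 values each
rng = np.random.default_rng(0)
test = rng.normal(loc=5.0, scale=2.0, size=(4, 10))
result = standardise_array(test)

# Each row should now have mean ~0 and standard deviation ~1
print(np.allclose(result.mean(axis=1), 0))  # True
print(np.allclose(result.std(axis=1), 1))   # True
```

One caveat: a row whose values are all equal has standard deviation 0, so the division produces nan (numpy also issues a RuntimeWarning). The exercise does not ask for this case to be handled.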

Exercise 8.2

Padding is an important tool when it comes to image processing and image recognition. Earlier, in Exercise 5.2, we padded a 1D array on the right-hand side. Your task is to write a function that pads a 2D array with zeros on the left and on the right (one column on the left and one on the right).

Example: if the input array is [[1, 2], [3, 4]], then the output padded array is [[0, 1, 2, 0], [0, 3, 4, 0]].

Your function should have one input:

  • a 2D numpy array.

It should return the new array that is the padded version of the input.

Hint: You can use np.vstack, np.hstack, np.concatenate and the axis argument. Alternatively, you could create a larger numpy array of zeros and use slicing to insert the input array in the right place.
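The second approach from the hint, creating a larger array of zeros and slicing the input into it, could look like the sketch below. The name pad_array_slicing is made up for this illustration. It creates the zeros with the input's dtype, so an integer input gives an integer output, matching the example above.

```python
import numpy as np

def pad_array_slicing(input_array):
    numrows, numcols = input_array.shape
    # Start from zeros with two extra columns, using the same dtype as the input
    padded = np.zeros((numrows, numcols + 2), dtype=input_array.dtype)
    # Copy the input into every column except the first and the last
    padded[:, 1:-1] = input_array
    return padded

print(pad_array_slicing(np.array([[1, 2], [3, 4]])))
# [[0 1 2 0]
#  [0 3 4 0]]
```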

def pad_array(input_array):
    numrows = input_array.shape[0]
    # A single column of zeros with one entry per row, shape (numrows, 1)
    zerocol = np.zeros((numrows, 1))
    # Attach the zero column on the left and on the right
    padded = np.hstack((zerocol, input_array, zerocol))
    return padded
input_array = np.array([[1, 2, 6], [3, 4, 5], [3, 4, 6]])

pad_array(input_array)
array([[0., 1., 2., 6., 0.],
       [0., 3., 4., 5., 0.],
       [0., 3., 4., 6., 0.]])
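numpy also provides np.pad, which is handy for checking the answer. With pad_width=((0, 0), (1, 1)) it adds nothing above or below (axis 0) and one entry on each side of every row (axis 1); mode='constant' fills the new entries with constant_values, here 0. Since the input is an integer array, the result stays integer.

```python
import numpy as np

input_array = np.array([[1, 2, 6], [3, 4, 5], [3, 4, 6]])

# pad_width is ((before, after) for rows, (before, after) for columns)
padded = np.pad(input_array, pad_width=((0, 0), (1, 1)), mode='constant', constant_values=0)
print(padded)
# [[0 1 2 6 0]
#  [0 3 4 5 0]
#  [0 3 4 6 0]]
```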

Exercise 8.3 (\(\star\))

The integral of a function \(f\) over an interval \([a, b]\) (where \(a\) and \(b\) are numbers with \(a<b\)) gives the (signed) area under the curve \(f\) between \(a\) and \(b\).

In mathematics, the integral is denoted by \(\int_a^b f(x) dx\).

We can approximate the value of the integral by a sum as follows:

\[(\star) \qquad \int_a^b f(x) dx \approx \sum_{k=0}^n f(x_k) \delta x,\]

where we divide the interval \([a, b]\) into \(n\) subintervals of the same length (\(n\) should be large), as follows:

\[[a, b] = [\underbrace{a}_{=x_0}, \underbrace{a + \delta x}_{=x_1}] \cup [\underbrace{a + \delta x}_{=x_1}, \underbrace{a + 2 \delta x}_{=x_2}] \cup \cdots \cup [\underbrace{a+ (n-1) \delta x}_{=x_{n-1}}, \underbrace{a+n \delta x}_{=x_n} = b],\]

where \(\delta x = \frac{b-a}{n}\). Note that \(x_k = a + k \delta x\).

To illustrate the split, take the interval \([0, 1]\) (all real numbers between \(0\) and \(1\), including both ends, so \(a=0\) and \(b=1\)). With \(n = 2\) we get \(\delta x = \frac{b-a}{n} = \frac{1}{2} = 0.5\), so \([0, 1]\) splits into two subintervals:

\[[0, 1] = \left[0, \frac{1}{2}\right] \cup \left[ \frac{1}{2}, 1\right].\]

This gives \(x_0 = 0\), \(x_1 = \frac{1}{2}\) and \(x_2 = 1\). In general we want \(n\) to be large; this small case only illustrates the notation above.
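The split is easy to reproduce in numpy. This sketch uses the example above (\(a=0\), \(b=1\), \(n=2\)) and, for illustration, \(f(x) = x^2\), the first test function further down. np.linspace(a, b, n+1) returns the \(n+1\) points \(x_0, \dots, x_n\), and the right-hand side of (\(\star\)) is then a single vectorised sum. With only two subintervals the result, 0.625, is far from the exact value \(1/3\), which is why \(n\) should be large.

```python
import numpy as np

a, b, n = 0, 1, 2
dx = (b - a) / n               # delta x = 0.5
xk = np.linspace(a, b, n + 1)  # [x_0, x_1, x_2] = [0.0, 0.5, 1.0]

# The same points from the formula x_k = a + k * delta x
k = np.arange(n + 1)
print(np.allclose(xk, a + k * dx))  # True

def f(x):
    return x**2

# Right-hand side of (star): (f(0) + f(0.5) + f(1)) * 0.5 = (0 + 0.25 + 1) * 0.5
approx = np.sum(f(xk) * dx)
print(approx)  # 0.625
```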

Task

Please complete the code of the function approx_integral below. This function should return the right-hand side of the formula (\(\star\)).

You may want to start with a for loop, since this is very similar to the summation exercises from previous weeks. Once that works, try using only numpy functions.

def approx_integral_for_loop(f, a, b, n):
    dx = (b-a)/n # this is the step delta x
    xk_vals = [a + k*dx for k in range(0, n+1)] # this is the list of [x_0, x_1, ..., x_n]
    total = 0.0
    for xk in xk_vals:
        total += f(xk)*dx
    return total

Now try using only numpy functions:

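def approx_integral(f, a, b, n):
    dx = (b-a)/n                      # the step delta x
    xk_vals = np.linspace(a, b, n+1)  # the n+1 points [x_0, x_1, ..., x_n]
    f_vals = f(xk_vals)               # f evaluated at all points at once
    summand = f_vals*dx               # the terms f(x_k) * delta x
    output = np.sum(summand)          # sum over k = 0, ..., n
    return output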
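The numpy version calls f once on the whole array of points, so f must work element-wise on numpy arrays, as x**2 and x*np.exp(x) do. A function written with the math module, such as the made-up g_scalar below, only accepts single numbers: calling math.exp on an array with several elements raises a TypeError. One way to use such a function anyway is to wrap it with np.vectorize, which applies it element by element. np.vectorize is essentially a Python loop, so it is a convenience rather than a speed-up.

```python
import math
import numpy as np

def approx_integral(f, a, b, n):
    # Same approach as above: sum of f(x_k) * delta x over x_0, ..., x_n
    dx = (b - a) / n
    xk_vals = np.linspace(a, b, n + 1)
    return np.sum(f(xk_vals) * dx)

def g_scalar(x):
    # Hypothetical example: uses math.exp, so it only accepts single numbers
    return x * math.exp(x)

# np.vectorize wraps g_scalar so it is applied to each array element in turn
g_vec = np.vectorize(g_scalar)

result = approx_integral(g_vec, 0, 1, 1000)
print(result)  # close to 1.0, in line with the x*np.exp(x) test below
```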

Test your function on the examples below:

def f(x):
    return x**2

a = 0
b = 1
n = 1000

print(f"Without numpy: The integral of f over the interval a={a}, b={b} is {approx_integral_for_loop(f, a, b, n)}")
print(f"Numpy: The integral of f over the interval a={a}, b={b} is {approx_integral(f, a, b, n)}")
print("Your answer should be close to 1/3 = 0.333 (3 d.p).")
Without numpy: The integral of f over the interval a=0, b=1 is 0.33383350000000034
Numpy: The integral of f over the interval a=0, b=1 is 0.33383350000000006
Your answer should be close to 1/3 = 0.333 (3 d.p).
def f(x):
    return x*np.exp(x)

a = 0
b = 1
n = 1000

print(f"No numpy: The integral of f over the interval a={a}, b={b} is {approx_integral_for_loop(f, a, b, n)}")
print(f"Numpy: The integral of f over the interval a={a}, b={b} is {approx_integral(f, a, b, n)}")
print("Your answer should be close to 1.0.")
No numpy: The integral of f over the interval a=0, b=1 is 1.0013595106278563
Numpy: The integral of f over the interval a=0, b=1 is 1.0013595106278566
Your answer should be close to 1.0.
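To see how the approximation improves as \(n\) grows, the sketch below reuses the numpy version of approx_integral with \(f(x) = x^2\) on \([0, 1]\) and prints the error against the exact value \(1/3\) for a few values of \(n\). For this \(f\) the sum in (\(\star\)) can be worked out by hand: \(\sum_{k=0}^n \left(\frac{k}{n}\right)^2 \frac{1}{n} = \frac{(n+1)(2n+1)}{6n^2}\), so the error is \(\frac{3n+1}{6n^2}\), roughly \(\frac{1}{2n}\). Multiplying \(n\) by 10 therefore divides the error by about 10.

```python
import numpy as np

def approx_integral(f, a, b, n):
    # Numpy version from above: sum of f(x_k) * delta x for k = 0, ..., n
    dx = (b - a) / n
    xk_vals = np.linspace(a, b, n + 1)
    return np.sum(f(xk_vals) * dx)

def f(x):
    return x**2

exact = 1 / 3
for n in [10, 100, 1000, 10000]:
    approx = approx_integral(f, 0, 1, n)
    predicted_error = (3 * n + 1) / (6 * n**2)  # hand-derived error for f(x) = x**2 on [0, 1]
    print(f"n={n:>6}: approx={approx:.8f}, error={approx - exact:.2e}, predicted={predicted_error:.2e}")
```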