6  Numerical mathematics


We will be using numpy and some scipy. These packages provide powerful and fast routines for numerical mathematics. (Most of them are implemented in C.) We will only learn about some parts: linear algebra (using numpy), optimization (using scipy), and statistics (using scipy). Together with pandas (see Section 7.1) for data handling, this is the basic framework for data analysis in Python.

6.1 Arrays, vectors, matrices (np.array)

Let us start by importing numpy.

# uncomment the following line for installing numpy
# !pip3 install numpy
import numpy as np

(Functions from numpy are then called as np.something.)

A vector or matrix is an np.array. Note that numpy does not distinguish row and column vectors: a 1d array is simply a sequence of entries (more on this below). Here, x is an np.array of type float64 and A is a 2d array. Useful attributes and methods:

  • x.ndim: number of dimensions of x.
  • x.shape: tuple of length x.ndim which gives the length of dimension i as entry i. (Recall that e.g. (3,) is a tuple of length 1.)
  • x.dtype: type of entries in x.
  • x.reshape(m, n): returns an array with shape (m, n) (a view of x whenever possible). The total number of entries must match. Using -1 for one dimension lets NumPy infer it.
  • x.T: the transpose of x. For a 1d array, this has no effect.
  • x.copy(): returns an independent copy of x.
  • x.flatten(): returns a 1d copy of x.
x = np.array([1.0, 2.0, 3.0])
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0]
])
print("x =", x)
print("A =")
print(A)
x = [1. 2. 3.]
A =
[[1. 0. 2.]
 [0. 1. 1.]]
print("x.shape =", x.shape)
print("A.shape =", A.shape)
print("x.ndim =", x.ndim)
print("A.ndim =", A.ndim)
print("x.dtype =", x.dtype)
x.shape = (3,)
A.shape = (2, 3)
x.ndim = 1
A.ndim = 2
x.dtype = float64
# reshape and transpose return views (when possible), not copies
print("A.reshape(3,2) =")
print(A.reshape(3,2))
print("A.reshape(-1) =", A.reshape(-1))
print("A.T =")
print(A.T)
A.reshape(3,2) =
[[1. 0.]
 [2. 0.]
 [1. 1.]]
A.reshape(-1) = [1. 0. 2. 0. 1. 1.]
A.T =
[[1. 0.]
 [0. 1.]
 [2. 1.]]
# column vector vs row vector vs 1d array
x_col = x.reshape(-1, 1)
x_row = x.reshape(1, -1)
print("x.shape =", x.shape)
print("x_col.shape =", x_col.shape)
print("x_row.shape =", x_row.shape)
x.shape = (3,)
x_col.shape = (3, 1)
x_row.shape = (1, 3)

An important observation is that the shape of a vector in NumPy is (3,), not (3, 1). This is mathematically convenient in some places (e.g. x @ y is simply the inner product) and dangerous in others (e.g. x.T does nothing for a 1d array). Much of good NumPy programming consists in keeping track of such distinctions.
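A small illustration of why this distinction matters: for a 1d array, transposing is a no-op, while a genuine column vector of shape (3, 1) transposes as expected.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# for a 1d array, transposing does nothing
print((x.T == x).all())   # True

# x @ x is the inner product, no row/column bookkeeping needed
print(x @ x)              # 14.0

# a true column vector has shape (3, 1); its transpose has shape (1, 3)
x_col = x.reshape(-1, 1)
print(x_col.T.shape)      # (1, 3)
```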

6.2 Creating arrays (np.zeros, np.ones, np.arange, np.linspace, np.eye)

There are several convenient ways to create arrays:

  • np.zeros(n): array of n zeros.
  • np.ones(n): array of n ones.
  • np.arange(n): array [0, 1, ..., n-1]. Also np.arange(start, stop, step).
  • np.linspace(a, b, n): n equally spaced points from a to b.
  • np.eye(n): the \(n \times n\) identity matrix.
  • np.empty(n): uninitialized array of length n (fast, but entries are garbage).
  • np.column_stack([a, b, ...]): stack 1d arrays as columns of a 2d array.
  • np.vstack([A, B]) / np.hstack([A, B]): stack arrays vertically / horizontally.
print("np.zeros(4) =", np.zeros(4))
print("np.ones(3) =", np.ones(3))
print("np.arange(10) =", np.arange(10))
print("np.linspace(0, 1, 5) =", np.linspace(0, 1, 5))
np.zeros(4) = [0. 0. 0. 0.]
np.ones(3) = [1. 1. 1.]
np.arange(10) = [0 1 2 3 4 5 6 7 8 9]
np.linspace(0, 1, 5) = [0.   0.25 0.5  0.75 1.  ]
print("np.eye(3) =")
print(np.eye(3))
np.eye(3) =
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
# stacking arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
print("np.column_stack([a, b]) =")
print(np.column_stack([a, b]))
np.column_stack([a, b]) =
[[ 1. 10.]
 [ 2. 20.]
 [ 3. 30.]]
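The bullet list above also mentions np.vstack and np.hstack; a quick sketch of both:

```python
import numpy as np

A = np.ones((2, 3))
B = np.zeros((2, 3))

V = np.vstack([A, B])   # stack vertically: shape (4, 3)
H = np.hstack([A, B])   # stack horizontally: shape (2, 6)
print(V.shape, H.shape)
```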

6.3 Basic arithmetic

Arithmetic operations on arrays are elementwise. This is different from matrix multiplication.

  • x + y, x - y: elementwise addition / subtraction.
  • x * y: elementwise multiplication (not matrix multiplication!).
  • x / y: elementwise division.
  • x ** n: elementwise power.
  • np.abs(x): elementwise absolute value. Also np.sqrt(x), np.exp(x), np.log(x), np.sin(x), etc.
x = np.array([1.0, -2.0, 4.0])
y = np.array([3.0, 5.0, -1.0])
print("x + y =", x + y)
print("x * y =", x * y)
print("2 * x =", 2 * x)
print("x ** 2 =", x ** 2)
x + y = [4. 3. 3.]
x * y = [  3. -10.  -4.]
2 * x = [ 2. -4.  8.]
x ** 2 = [ 1.  4. 16.]
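The elementwise functions from the list (np.sqrt, np.exp, np.log, ...) work the same way, applying to every entry at once:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
print("np.sqrt(x) =", np.sqrt(x))                 # [1. 2. 3.]
print("np.abs(-x) =", np.abs(-x))                 # [1. 4. 9.]
print("np.exp(np.log(x)) =", np.exp(np.log(x)))   # recovers x (up to rounding)
```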

6.4 Indexing, slicing, views, copies

Indexing and slicing work similarly to lists, but NumPy arrays support multi-dimensional indexing. An important difference: a basic slice like a[i:j] creates a view, not a copy, so modifying the slice modifies the original. (Boolean-mask indexing, by contrast, returns a copy.)

  • a[i]: element at index i.
  • a[i:j]: slice from i to j-1.
  • a[::k]: every k-th element.
  • A[i, j]: element at row i, column j.
  • A[:, j]: column j.
  • A[i, :]: row i.
  • a[mask]: elements where boolean array mask is True.
a = np.arange(10)
print("a =", a)
print("a[2:7] =", a[2:7])
print("a[::2] =", a[::2])
a = [0 1 2 3 4 5 6 7 8 9]
a[2:7] = [2 3 4 5 6]
a[::2] = [0 2 4 6 8]
A = np.arange(1, 13).reshape(3, 4)
print("A =")
print(A)
print("A[:, 1] =", A[:, 1])
print("A[1, :] =", A[1, :])
print("A[:2, 1:3] =")
print(A[:2, 1:3])
A =
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
A[:, 1] = [ 2  6 10]
A[1, :] = [5 6 7 8]
A[:2, 1:3] =
[[2 3]
 [6 7]]
# views vs copies: modifying a slice modifies the original!
a = np.arange(10)
b = a[2:6]
b[0] = 99
print("a =", a)
a = [ 0  1 99  3  4  5  6  7  8  9]
# to avoid this, use .copy()
a = np.arange(10)
c = a[2:6].copy()
c[0] = -7
print("a =", a)
print("c =", c)
a = [0 1 2 3 4 5 6 7 8 9]
c = [-7  3  4  5]
# boolean masks
x = np.array([3.0, -1.0, 2.5, -4.0, 0.0, 7.0])
mask = x > 0
print("x[mask] =", x[mask])
x[mask] = [3.  2.5 7. ]
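Boolean masks can also be used on the left-hand side of an assignment, for instance to clip all negative entries to zero in place (a common pattern, not shown above):

```python
import numpy as np

x = np.array([3.0, -1.0, 2.5, -4.0, 0.0, 7.0])
x[x < 0] = 0.0    # set all negative entries to zero, in place
print("x =", x)   # all negative entries are now 0
```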

6.5 Vectorization and broadcasting

In numerical computing, we want to avoid Python loops whenever possible and instead work with full arrays at once. This is called vectorization. It is both faster and more readable.

x = np.linspace(0.0, 1.0, 6)
# vectorized: looks like the mathematical formula
y = 3 * x + 2
print("y =", y)
y = [2.  2.6 3.2 3.8 4.4 5. ]
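For comparison, the same computation with an explicit Python loop; it produces identical results but is slower and more verbose, which is why the vectorized version above is preferred:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 6)

# loop version: build the result entry by entry
y_loop = np.empty_like(x)
for i in range(len(x)):
    y_loop[i] = 3 * x[i] + 2

# vectorized version
y_vec = 3 * x + 2
print(np.allclose(y_loop, y_vec))  # True
```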

Broadcasting allows NumPy to combine arrays of compatible but different shapes. The smaller array is “broadcast” across the larger one.

X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0]
])
b = np.array([10.0, 20.0])
# b is broadcast across all rows of X
print("X + b =")
print(X + b)
X + b =
[[11. 22.]
 [13. 24.]
 [15. 26.]]
# pairwise operations via broadcasting
x = np.array([0.0, 1.5, 3.0, 5.0])
D = np.abs(x[:, None] - x[None, :])
print("distance matrix =")
print(D)
distance matrix =
[[0.  1.5 3.  5. ]
 [1.5 0.  1.5 3.5]
 [3.  1.5 0.  2. ]
 [5.  3.5 2.  0. ]]

The expression x[:, None] turns x into a column vector (shape (4,1)), and x[None, :] into a row vector (shape (1,4)). Broadcasting then produces the full \(4 \times 4\) distance matrix without any loop.
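A rough sketch of the general broadcasting rule: shapes are compared from the right, and dimensions of size 1 are stretched to match. For instance, (3, 1) combined with (1, 4) gives (3, 4); truly incompatible shapes raise an error.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(4).reshape(1, 4)   # shape (1, 4)
print((a + b).shape)             # (3, 4)

# incompatible shapes raise a ValueError
try:
    np.ones(3) + np.ones(4)
except ValueError as e:
    print("error:", e)
```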

6.6 Linear algebra (np.linalg)

NumPy provides a comprehensive linear algebra module. Here, x and y are vectors, A and B are matrices.

  • A @ B or np.dot(A, B): matrix multiplication. For two vectors, x @ y computes the inner product \(\langle x, y \rangle = \sum_j x_j y_j\). For a matrix and a vector, A @ x is the matrix-vector product.
  • np.linalg.norm(x): Euclidean norm \(\|x\|_2\).
  • np.linalg.solve(A, b): solve \(Ax = b\). Preferred over computing \(A^{-1}b\).
  • np.linalg.inv(A): matrix inverse. Usually solve is preferred.
  • np.linalg.det(A): determinant.
  • np.linalg.cond(A): condition number. Large values indicate numerical instability.
  • np.linalg.eig(A): eigenvalues and eigenvectors.
x = np.array([1.0, -2.0, 2.0])
y = np.array([3.0, 1.0, -1.0])
print("<x,y> =", x @ y)
print("||x||_2 =", np.linalg.norm(x))
# np.abs(), np.sum(), np.max() are elementwise/aggregation functions
# described in detail in the aggregation section below
print("||x||_1 =", np.sum(np.abs(x)))
print("||x||_inf =", np.max(np.abs(x)))
<x,y> = -1.0
||x||_2 = 3.0
||x||_1 = 5.0
||x||_inf = 2.0
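The 1-norm and infinity-norm can also be obtained directly from np.linalg.norm via its ord parameter; a quick consistency check with the manual computations above:

```python
import numpy as np

x = np.array([1.0, -2.0, 2.0])
print(np.linalg.norm(x, 1))        # 5.0 (same as np.sum(np.abs(x)))
print(np.linalg.norm(x, np.inf))   # 2.0 (same as np.max(np.abs(x)))
```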
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
print("A @ x =", A @ x)
A @ x = [-1. -1.]
# solving Ax = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print("solution x =", x)
print("A @ x =", A @ x)
solution x = [2. 3.]
A @ x = [9. 8.]
# eigenvalues
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("eigenvalues =", eigenvalues)
eigenvalues = [3. 1.]

One often sees formulas like \(x = A^{-1}b\), but numerically it is usually better to solve the system directly using np.linalg.solve rather than computing the inverse explicitly.
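The condition number from the list above quantifies this danger: for a nearly singular matrix, np.linalg.cond is huge and computed solutions become unreliable. A small illustration (the second matrix is a made-up example, chosen so that its rows are almost identical):

```python
import numpy as np

A_good = np.array([[3.0, 1.0], [1.0, 2.0]])
A_bad = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])  # rows almost identical

print("cond(A_good) =", np.linalg.cond(A_good))  # small: well-conditioned
print("cond(A_bad)  =", np.linalg.cond(A_bad))   # huge: nearly singular
```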

6.7 Aggregation and the axis parameter

NumPy provides functions for aggregating arrays. When applied to 2d arrays, the axis parameter controls the direction.

  • np.sum(x), np.mean(x), np.median(x), np.var(x), np.std(x): sum, mean, median, variance, standard deviation.
  • np.min(x), np.max(x): minimum, maximum.
  • np.cumsum(x): cumulative sum.
  • axis=0: aggregate down the rows (result: one entry per column).
  • axis=1: aggregate across the columns (result: one entry per row).
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
])
print("sum over all =", np.sum(data))
print("column means =", np.mean(data, axis=0))
print("row means =", np.mean(data, axis=1))
sum over all = 45.0
column means = [4. 5. 6.]
row means = [2. 5. 8.]
# standardization: subtract column mean, divide by column std
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)
X_std = (X - mu) / sigma
print("column means after standardization =", np.mean(X_std, axis=0))
print("column stds after standardization =", np.std(X_std, axis=0))
column means after standardization = [0. 0.]
column stds after standardization = [1. 1.]

Standardization is a common preprocessing step. It ensures that all features are on comparable scales, which is important e.g. for gradient-based optimization.

6.8 Random numbers (np.random)

Randomness enters data analysis in many places: synthetic data generation, random initialization of parameters, train-test splitting, and randomized algorithms. The recommended interface is np.random.default_rng, which creates a random number generator with a given seed.

  • rng = np.random.default_rng(seed): create a random number generator.
  • rng.random(n): n uniform random numbers in \([0,1)\).
  • rng.standard_normal(n): n standard normal random numbers.
  • rng.normal(loc, scale, size): normal with given mean and standard deviation.
  • rng.integers(low, high, size): random integers in [low, high).
  • rng.choice(a, size, replace): random sample from array a.
  • rng.permutation(n): a random permutation of 0, ..., n-1.
rng = np.random.default_rng(42)
print("uniform =", rng.random(5))
print("normal =", rng.standard_normal(5))
print("integers =", rng.integers(0, 10, size=5))
uniform = [0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]
normal = [-1.30217951  0.1278404  -0.31624259 -0.01680116 -0.85304393]
integers = [5 3 1 9 7]
# reproducibility: same seed gives same numbers
rng1 = np.random.default_rng(123)
rng2 = np.random.default_rng(123)
print(rng1.standard_normal(4))
print(rng2.standard_normal(4))
[-0.98912135 -0.36778665  1.28792526  0.19397442]
[-0.98912135 -0.36778665  1.28792526  0.19397442]
# rng.choice: draw random samples from an array
rng = np.random.default_rng(42)
names = np.array(["Alice", "Bob", "Charlie", "Diana", "Eve"])
print("pick 3 with replacement =", rng.choice(names, size=3, replace=True))
print("pick 3 without replacement =", rng.choice(names, size=3, replace=False))
pick 3 with replacement = ['Alice' 'Diana' 'Diana']
pick 3 without replacement = ['Eve' 'Diana' 'Bob']
# rng.permutation: shuffle an array (returns a new array)
rng = np.random.default_rng(7)
cards = np.array(["7", "8", "9", "10", "J", "Q", "K", "A"])
shuffled = rng.permutation(cards)
print("original =", cards)
print("shuffled =", shuffled)
# permutation of integers
print("random order of 0..4 =", rng.permutation(5))
original = ['7' '8' '9' '10' 'J' 'Q' 'K' 'A']
shuffled = ['7' 'K' 'A' '9' 'J' 'Q' '8' '10']
random order of 0..4 = [4 2 3 0 1]
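As mentioned at the start of this section, one typical use of rng.permutation is a random train-test split. A minimal sketch (the 80/20 split ratio and the toy data are just an example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
data = np.arange(n) * 1.0      # stand-in for a real data set

idx = rng.permutation(n)       # random order of the indices 0..n-1
n_train = int(0.8 * n)
train = data[idx[:n_train]]    # first 80% of the shuffled indices
test = data[idx[n_train:]]     # remaining 20%
print("train size =", train.size, "test size =", test.size)
```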

6.9 Optimization (scipy.optimize)

The scipy.optimize module provides tools for finding minima of functions and roots of equations. This is essential in many areas of mathematics and data analysis.

  • optimize.minimize_scalar(f): find the minimum of a scalar function \(f: \mathbb{R} \to \mathbb{R}\). Returns a result object with .x (minimizer), .fun (minimum value), .nfev (number of function evaluations).
  • optimize.minimize(f, x0): find the minimum of \(f: \mathbb{R}^d \to \mathbb{R}\), starting from x0. Returns a result object with attributes .x (minimizer), .fun (minimum value), .success, .nit (number of iterations).
  • optimize.minimize(f, x0, jac=grad_f): same, but with analytically supplied gradient for faster convergence.
  • optimize.minimize(f, x0, method='...'): choose a specific algorithm, e.g. 'BFGS', 'Nelder-Mead', 'L-BFGS-B'.
  • optimize.root_scalar(f, bracket=[a, b]): find a root of \(f\) in the interval \([a, b]\). Returns a result object with .root (the root location).
from scipy import optimize
# minimizing a function of one variable
def f(x):
    return (x - 3)**2 + 1

result = optimize.minimize_scalar(f)
print("minimum at x =", result.x)
print("minimum value =", result.fun)
print("function evaluations =", result.nfev)
minimum at x = 3.0
minimum value = 1.0
function evaluations = 9
# minimizing a function of several variables (Rosenbrock function)
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

x0 = np.array([0.0, 0.0])
result = optimize.minimize(rosenbrock, x0)
print("minimum at x =", result.x)
print("success =", result.success)
minimum at x = [0.99999467 0.99998932]
success = True
# providing the gradient for faster convergence
def rosenbrock_grad(x):
    dfdx0 = -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2)
    dfdx1 = 200 * (x[1] - x[0]**2)
    return np.array([dfdx0, dfdx1])

result = optimize.minimize(rosenbrock, x0, jac=rosenbrock_grad, method='BFGS')
print("minimum at x =", result.x)
print("iterations =", result.nit)
minimum at x = [0.99999913 0.99999825]
iterations = 19
# root finding
def g(x):
    return x**3 - 2*x - 5

result = optimize.root_scalar(g, bracket=[2, 3])
print("root at x =", result.root)
print("g(root) =", g(result.root))
root at x = 2.094551481542327
g(root) = 3.552713678800501e-15
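The method argument from the list above also allows box constraints, e.g. with 'L-BFGS-B' and the bounds parameter. A sketch: minimize \((x-3)^2 + 1\) restricted to \([0, 2]\); since the unconstrained minimizer 3 lies outside, the constrained minimum sits on the boundary.

```python
import numpy as np
from scipy import optimize

def f(x):
    return (x[0] - 3)**2 + 1

# constrain x to the interval [0, 2]; the unconstrained minimizer 3 is outside
result = optimize.minimize(f, x0=np.array([1.0]),
                           method='L-BFGS-B', bounds=[(0.0, 2.0)])
print("minimum at x =", result.x)    # close to 2.0 (the boundary)
print("minimum value =", result.fun)
```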

6.10 Exercises

Exercise 1 Find out about the singular value decomposition in np.linalg.svd. Compute the SVD of the matrix \(A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}\) and verify that \(A = U \Sigma V^\top\) by reconstructing \(A\) from the three factors.

# Exercise 1

Exercise 2 Why is it better to use np.linalg.solve rather than np.linalg.inv? Demonstrate with a concrete example: create a \(10 \times 10\) random matrix \(A\) and a vector \(b\). Solve \(Ax = b\) both ways and compare the residuals \(\|Ax - b\|\). Then try with a nearly singular matrix (e.g., add a row that is almost a copy of another) and compare again.

# Exercise 2

Exercise 3 Write a function power_method(A, num_iter=100) that approximates the largest eigenvalue of a matrix \(A\) using the power iteration: start with a random vector \(v\), repeatedly compute \(v \leftarrow Av / \|Av\|\), and return \(v^\top A v\). Compare the result with np.linalg.eig.

# Exercise 3

Exercise 4 Create a 1d array of 20 equally spaced values between 0 and \(2\pi\) using np.linspace. Compute \(\sin(x)\) and \(\cos(x)\) for these values (vectorized, no loop). Then use boolean masking to select only those entries where \(\sin(x) > 0.5\).

# Exercise 4

Exercise 5 Use np.random.default_rng(seed=0) to generate a \(100 \times 4\) matrix of standard normal random numbers. Compute the mean and standard deviation of each column using the axis parameter. Then standardize the matrix (subtract column means, divide by column standard deviations) and verify that the result has column means \(\approx 0\) and column standard deviations \(\approx 1\).

# Exercise 5

Exercise 6 Broadcasting exercise: create a column vector \(a = (1, 2, 3, 4)^\top\) and a row vector \(b = (10, 20, 30)\). Compute their outer product using broadcasting (i.e., a * b with appropriate shapes). Verify the result by comparing with np.outer(a, b).

# Exercise 6

Exercise 7 Given the array temps = np.array([15.2, 18.5, 22.1, 25.3, 19.8]) of temperatures in Celsius, convert all values to Fahrenheit using the formula \(F = C \cdot 9/5 + 32\) — without using a loop. Then find all temperatures above \(70°F\) using boolean masking.

# Exercise 7

Exercise 8 Create two arrays: prices = np.array([10.0, 20.0, 30.0]) and quantities = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), where each row represents a customer’s order. Compute the total cost per customer using broadcasting and np.sum with the axis parameter.

# Exercise 8

Exercise 9 Create a \(5 \times 5\) matrix where entry \((i, j)\) contains the value \(|i - j|\) (the distance between row and column index). Do this without loops, using broadcasting with np.arange and np.abs.

# Exercise 9