Coding starts to become interesting when we can control how variables change, and repeat operations.
if ... elif ... else
As a simple (mathematical) example, let us compute the minimum of two numbers, which works by an if-condition:
a = 2
b = 5
if a < b:
    res = a
else:
    res = b
print(f"The minimum of {a} and {b} is:", res)

The minimum of 2 and 5 is: 2
In general, the structure of an if-statement reads:
if _condition1_:
    case1
elif _condition2_:
    case2
elif _condition3_:
    case3
...
else:
    case_else
Here, read elif as else if: the code enters this branch if its condition is the first one that holds. The else branch only applies if neither the if condition nor any of the elif conditions holds.
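To see the "first condition that holds" rule in action, here is a minimal sketch. The number n = 15 is chosen by us because it satisfies both conditions; only the first matching branch is executed.

```python
n = 15  # a multiple of both 3 and 5

if n % 3 == 0:
    res = "multiple of 3"
elif n % 5 == 0:
    # This condition is true as well, but the branch is skipped
    # because an earlier condition already held.
    res = "multiple of 5"
else:
    res = "neither"

print(res)  # multiple of 3
```

This is why the order of the conditions matters whenever several of them can hold at the same time.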
In fact, Python offers some very handy abbreviations. In particular, look at this piece of code, a so-called conditional expression:
res = a if a < b else b
print(f"The minimum of {a} and {b} is:", res)

The minimum of 2 and 5 is: 2
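As a further small illustration of this one-line form (our own example, not part of the minimum computation), the absolute value of a number can be written the same way. The long version below is given only for comparison.

```python
x = -7

# One-line conditional expression
abs_x = x if x >= 0 else -x

# The same logic as a full if-statement
if x >= 0:
    abs_x_long = x
else:
    abs_x_long = -x

print(abs_x, abs_x_long)  # 7 7
```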
Let us make another example, also using elif. Given some n, the task is to print Fizz if n is a multiple of 3, Buzz, if it is a multiple of 5, and FizzBuzz if both conditions are satisfied.
n = 165
res = ""
if n % 3 == 0 and n % 5 == 0:
    res = "FizzBuzz"
elif n % 3 == 0:
    res = "Fizz"
elif n % 5 == 0:
    res = "Buzz"
print(res)

FizzBuzz
for
Python has for-loops, discussed in this section, and while-loops, discussed in the next section. The basic use case of a for-loop is repeating a task several times. As a simple example, starting from 0 and adding b a total of a times results in a * b.
a = 3
b = 7
res = 0
for i in range(a):
    res = res + b
print(f"{a} * {b} = {res}")

3 * 7 = 21
Importantly, range(a) is a synonym for the numbers 0,..., a-1. (In particular, note that Python usually starts counting at 0 and not at 1, which will also be important in the next chapter.)
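To look at the numbers produced by range, one can convert it to a list. This is a small sketch: list() is a built-in that we use here only for displaying the values, and the three-argument form with a step is an addition of ours.

```python
a = 5

first = list(range(a))          # starts at 0, stops before a
second = list(range(1, a + 1))  # starts at 1, stops before a + 1
third = list(range(0, 10, 3))   # optional third argument: step size

print(first)   # [0, 1, 2, 3, 4]
print(second)  # [1, 2, 3, 4, 5]
print(third)   # [0, 3, 6, 9]
```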
The general structure of a for-loop is as follows:
for _var_ in _iterable_:
    _sequence of instructions_
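The iterable does not have to be a range. As a minimal sketch (the string and the list are our own examples), a for-loop can also run over the characters of a string or the elements of a list:

```python
# Count how often the letter "a" occurs in a string
count = 0
for ch in "banana":
    if ch == "a":
        count += 1
print(count)  # 3

# Sum the elements of a list
total = 0
for x in [2, 3, 5, 7]:
    total += x
print(total)  # 17
```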
Let us look at a more interesting example, computing the greatest common divisor (gcd) of two numbers, a and b. A very simple algorithm goes through all numbers 1,...,a in order and updates the result each time we encounter a number which divides both a and b. (Here, range(1,a+1) is a synonym for the numbers 1,...,a.)
# Basic algorithm for finding the gcd of two numbers
a = 105
b = 33
res = 1
for i in range(1,a+1):
    if a % i == 0 and b % i == 0:
        res = i
print(f"The gcd of {a} and {b} is {res}.")

The gcd of 105 and 33 is 3.
Let us make some remarks on for-loops:
* In the last example, note that the instructions within the for-loop depend on i.
* Within a for-loop, you can use continue in order to stop the execution of the current iteration and immediately jump to the next i.
* Within a for-loop, you can use break in order to stop the whole execution of the for-loop.
# continue skips the rest of the current iteration
for i in range(5):
    if i == 2:
        continue
    print(i, end=" ")
print()

0 1 3 4

# break stops the loop entirely
for i in range(10):
    if i == 5:
        break
    print(i, end=" ")
print()

0 1 2 3 4
while
The for-loop directly iterates over some variable (i above). The while-loop instead comes with a condition, which is checked every time before the loop body is executed; the loop ends as soon as the condition no longer holds. The general structure is:
while _condition_:
    instructions which might change the value of _condition_
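Before the two larger examples, here is a minimal sketch of this structure (the task and the variable names are our own): we count how often a number can be halved, using integer division, until it reaches 1. The loop body changes n, and therefore eventually makes the condition false.

```python
n = 100
steps = 0
while n > 1:
    n = n // 2      # integer division changes the value the condition depends on
    steps += 1
print(steps)  # 6
```

If the body never changed n, the condition would stay true forever and the loop would not terminate.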
Let us make two examples: finding \(\sqrt{2}\) and computing the gcd using Euclid’s algorithm.
For finding \(\sqrt{2}\), note that this is a fixed point of the iteration \[ x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n}.\] (In order to see this, assume that \(x = x_n = x_{n+1}\), and multiply the recursion by \(x\), which gives \(x^2 = x^2/2 + 1\), i.e. \(x^2 = 2\).) We can use this as follows. Here, abs() is a built-in function returning the absolute value of a number.
x = 1
eps = 1e-10
x_prev = 0
while abs(x - x_prev) > eps:
    x_prev = x
    x = x/2 + 1 / x
print(x)

1.414213562373095
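It can be instructive to see how quickly this loop stops. The following sketch repeats the loop with the update x/2 + 1/x from the code above; the counter steps and the final check against x * x are additions of ours.

```python
x = 1.0
eps = 1e-10
x_prev = 0.0
steps = 0            # our addition: counts the loop iterations
while abs(x - x_prev) > eps:
    x_prev = x
    x = x / 2 + 1 / x
    steps += 1

error = abs(x * x - 2)   # how far x * x is from 2
print(f"Stopped after {steps} iterations, |x^2 - 2| = {error}")
```

Only a handful of iterations are needed, since the distance between consecutive values shrinks very fast.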
The above algorithm (using a for-loop) for finding the gcd was not particularly efficient. (You might want to test it with some large numbers.) Euclid’s algorithm is much more efficient. It is based on the observation that a % c == 0 (c divides a) and b % c == 0 iff a % c == 0 and (b % a) % c == 0.
In order to see this, note that a % c == 0 and b % c == 0 iff a % c == 0 and (b - d * a) % c == 0 for any integer d. Then, the result follows from choosing d = b // a, since b - (b // a) * a = b % a. Moreover, every number divides 0, so if b % a == 0, the gcd of a and b is a.
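This observation can be checked numerically. The sketch below, for one pair of numbers chosen by us, collects the common divisors of a and b and the common divisors of a and b % a; the two lists (a container type, used here only for collecting results) should coincide.

```python
a = 33
b = 105

# The identity behind the choice d = b // a
print(b - (b // a) * a == b % a)  # True

divs_ab = []      # common divisors of a and b
divs_a_rem = []   # common divisors of a and b % a
for c in range(1, a + 1):
    if a % c == 0 and b % c == 0:
        divs_ab.append(c)
    if a % c == 0 and (b % a) % c == 0:
        divs_a_rem.append(c)

print(divs_ab)     # [1, 3]
print(divs_a_rem)  # [1, 3]
```

Since both pairs have the same common divisors, they also have the same greatest common divisor.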
We can use these insights, taking a < b for simplicity. Starting with a and b, we check whether b % a == 0. If yes, we are done, and return a. If not, we compute the gcd of a and b % a instead.
# Euclid's algorithm for finding the gcd of two numbers
a_input = 105
b_input = 33
# Copy the input to new variables
a = a_input
b = b_input
# Run Euclid's algorithm
while a != 0:
    r = b % a
    print(f"b = {b}, a = {a}, remainder = {r}")
    b = a
    a = r
print(f"The gcd of {a_input} and {b_input} is {b}.")

b = 33, a = 105, remainder = 33
b = 105, a = 33, remainder = 6
b = 33, a = 6, remainder = 3
b = 6, a = 3, remainder = 0
The gcd of 105 and 33 is 3.
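To get a feeling for the difference in efficiency, the following sketch counts the loop iterations of both approaches. The input numbers and the two counters are our own choices for illustration.

```python
a_input = 123456
b_input = 7890

# Basic algorithm: try every candidate 1, ..., a
steps_basic = 0
res_basic = 1
for i in range(1, a_input + 1):
    steps_basic += 1
    if a_input % i == 0 and b_input % i == 0:
        res_basic = i

# Euclid's algorithm
a = a_input
b = b_input
steps_euclid = 0
while a != 0:
    r = b % a
    b = a
    a = r
    steps_euclid += 1
res_euclid = b

print(f"Basic algorithm: gcd = {res_basic} after {steps_basic} iterations")
print(f"Euclid:          gcd = {res_euclid} after {steps_euclid} iterations")
```

Both loops give the same result, but the basic algorithm always needs a iterations, while Euclid's algorithm finishes after very few.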
def
Once we have implemented something, we want to reuse it without rewriting the code. This means we want to pack our code into its own function. For Euclid’s algorithm, this would look as follows:
a = 105
b = 33
def gcd(a,b):
    """ Compute the greatest common divisor of a and b."""
    while a != 0:
        r = b % a
        b = a
        a = r
    return b
print(f"The gcd of {a} and {b} is {gcd(a,b)}.")

The gcd of 105 and 33 is 3.
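Note that gcd reassigns its parameters a and b inside the loop. A minimal sketch to check that this leaves the variables of the same name outside the function untouched (the variable result is our addition):

```python
a = 105
b = 33

def gcd(a, b):
    """Compute the greatest common divisor of a and b."""
    while a != 0:
        r = b % a
        b = a      # changes only the function's own b
        a = r      # changes only the function's own a
    return b

result = gcd(a, b)
print(result)  # 3
print(a, b)    # 105 33, the outer variables are unchanged
```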
The general structure is
def name_of_the_function(arg1, arg2, arg3=default3, arg4=default4):
    some instructions
    return something
Not all functions return something. Those that do not usually produce some output instead. Here is an example:
def greeting(name, prefix="Hello"):
    """ Print a greeting for name."""
    print(f"{prefix}, {name}!")
greeting("Alice")
greeting("Bob", prefix="Hi")
greeting(prefix="Welcome", name="Charlie")

Hello, Alice!
Hi, Bob!
Welcome, Charlie!
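What does a function without a return statement give back if we store its result anyway? A short sketch, reusing greeting from above (the variable value is ours):

```python
def greeting(name, prefix="Hello"):
    """Print a greeting for name (there is no return statement)."""
    print(f"{prefix}, {name}!")

value = greeting("Alice")   # prints: Hello, Alice!
print(value)                # None
```

So the function still returns something, namely the special value None.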
Here are some important remarks on functions:
* In the gcd example, a and b occur both within and outside of the function. Although the variables a and b, which are private to the function, are changed within the function, the print-statement still knows their values from before the function definition.
* Arguments can have default values, e.g. prefix = "Hello" in the greeting example.
* Arguments can be passed by position. (See greeting("Bob", prefix="Hi"), where you rely on "Bob" being in the name position.) These are called positional arguments.
* Arguments can also be passed by name. (In greeting(prefix="Welcome", name="Charlie"), the output was correct although the order of the two variables differs from the function definition.)
* Docstrings are written directly below the def-line. They indicate what this function is doing and are displayed in various places, e.g. when you hover over a function in your jupyter notebook.
* There are also lambda-functions, which are very quick to write: lambda x: x * x. Most use cases are with container data types, and we will have one in Section 4.1.
raise, try ... except
In code, various things happen, and some go wrong. Raising (throwing) errors, catching them, and dealing with them is the topic of this section. There are many error types implemented. Let us look at some of them:
| Exception | When it happens | Example |
|---|---|---|
| ValueError | Right type, wrong value | int("abc") |
| TypeError | Wrong type used | "2" + 3 |
| IndexError | Index out of range | [1,2][5] |
| KeyError | Dictionary key missing | {"a":1}["b"] |
| ZeroDivisionError | Division by zero | 1 / 0 |
| FileNotFoundError | File does not exist | open("x.txt") |
| PermissionError | No access rights | opening protected file |
| AttributeError | Object has no attribute | "abc".foo |
| NameError | Variable not defined | print(x) |
| ImportError | Import fails | import nonexisting |
| ModuleNotFoundError | Module not found | import xyz |
| AssertionError | assert fails | assert False |
We obtain an error here:
print("This will not work")
print(f"1/0 = {1/0}")
print("If this print works, the program has continued past 1/0")

This will not work
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[12], line 2
      1 print("This will not work")
----> 2 print(f"1/0 = {1/0}")
      3 print("If this print works, the program has continued past 1/0")

ZeroDivisionError: division by zero
However, if we catch the error, the code can run past it:
try:
    print("This will not work")
    print(f"1/0 = {1/0}")
    print("This never runs")
except ZeroDivisionError as e:
    print("You must not divide by zero!")
    print("This is the error message:", e)
print("The program continues!")

This will not work
You must not divide by zero!
This is the error message: division by zero
The program continues!
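Some of the other exception types from the table can be observed in the same way. The following is only a sketch: the helper trigger and the list caught are hypothetical names of ours, and catching the general Exception is done here just to display which error type occurred.

```python
def trigger(label):
    """Hypothetical helper: run one failing expression from the table."""
    if label == "ValueError":
        return int("abc")
    elif label == "TypeError":
        return "2" + 3
    elif label == "IndexError":
        return [1, 2][5]
    elif label == "KeyError":
        return {"a": 1}["b"]
    elif label == "ZeroDivisionError":
        return 1 / 0
    elif label == "AttributeError":
        return "abc".foo

labels = ["ValueError", "TypeError", "IndexError",
          "KeyError", "ZeroDivisionError", "AttributeError"]
caught = []
for label in labels:
    try:
        trigger(label)
    except Exception as e:
        # type(e).__name__ is the name of the exception that was raised
        caught.append(type(e).__name__)
        print(f"{type(e).__name__}: {e}")
```

In real code, it is usually better to catch only the specific exception you expect, as in the ZeroDivisionError example above.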
Now we know how to catch errors, but often, we need to throw them. This is actually simple using raise:
def compute_sth(x):
    # Reject anything that is not an int before computing
    if not isinstance(x, int):
        raise TypeError("x must be an int!")
    y = x ** x
    return y
try:
    print(compute_sth(2))
    print(compute_sth("abc"))
except TypeError:
    print("Calculation not possible, sorry")

4
Calculation not possible, sorry
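Raising and catching can be combined so that a loop survives a bad input and reports the message given to raise. A minimal sketch, where reciprocal and the list messages are hypothetical names of ours:

```python
def reciprocal(x):
    """Hypothetical example: return 1 / x, rejecting x == 0 with a ValueError."""
    if x == 0:
        raise ValueError("x must be non-zero!")
    return 1 / x

messages = []
for x in [4, 0, 0.5]:
    try:
        print(reciprocal(x))
    except ValueError as e:
        # str(e) is the message passed to ValueError above
        messages.append(str(e))
        print("Skipped:", e)
```

The loop prints 0.25, then the "Skipped" line for the input 0, and then continues with 2.0.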
open, read, write, with
Reading and writing files is one of the most common tasks in data analysis. Python’s built-in open function, combined with the with statement, provides a clean way to handle files. The with context manager handles opening and closing files for you. More generally, our Python file (or notebook) must interact with other files or resources (databases, other connections). In order to do this safely, we use with, which guarantees that the resource is entered (opened) and exited (closed) correctly. The general structure is as follows:
with something as name:
    do_something()
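The guarantee that the resource is closed can be checked for files via the attribute closed of a file object. The sketch below works in a temporary directory (created with the standard-library module tempfile, our choice so that no files are left behind); the file name demo.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "w") as f:
        f.write("some text\n")
        print(f.closed)   # False: we are still inside the with block
    print(f.closed)       # True: the file was closed when the block ended
    closed_after_write = f.closed

    # The file is also closed if an error occurs inside the block
    try:
        with open(path, "r") as g:
            raise RuntimeError("something went wrong")
    except RuntimeError:
        pass
    print(g.closed)       # True
    closed_after_error = g.closed
```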
Specifically for input and output from and to files, let f be an open file object. Then, the following are useful:
* open(path, mode): open a file. See below for some details on mode.
* f.read(): read the entire file as a single string.
* f.readline(): read a single line.
* f.readlines(): read all lines into a list of strings.
* f.write(s): write string s to the file.
* f.writelines(lines): write a list of strings to the file.
with open("hello.txt", "w") as f:
    f.write("Hello")
    f.write("This is a new line in the file!\n")
print("File written.")

File written.

with open("hello.txt", "r") as f:
    text = f.read()
print("This is the content of the file:", text)

This is the content of the file: HelloThis is a new line in the file!

# reading line by line (useful for large files)
with open("hello.txt", "r") as f:
    for line in f:
        print(line.strip())
with open("hello.txt", "r") as f:
    print(f.read())

HelloThis is a new line in the file!
HelloThis is a new line in the file!
Here, "hello.txt" is closed at the end of the with block. The second argument of open comes with the following modes:
* r: read (text mode)
* w: (over-)write (text mode)
* a: append (text mode)
* x: create new, i.e. throw an error if the file already exists (text mode)
* rb: read (binary mode)
* wb: (over-)write (binary mode)
* ab: append (binary mode)
* xb: create new, i.e. throw an error if the file already exists (binary mode)
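The following sketch tries out the text modes x, a and w together with writelines, readline and readlines. It works in a temporary directory (via the standard-library module tempfile, our choice), and the file name modes.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")

    # "x" creates the file and fails if it already exists
    with open(path, "x") as f:
        f.writelines(["first line\n", "second line\n"])
    try:
        with open(path, "x") as f:
            f.write("never written\n")
    except FileExistsError:
        print("The file exists already, so mode x refuses to open it.")

    # "a" appends to the existing content instead of overwriting it
    with open(path, "a") as f:
        f.write("third line\n")

    with open(path, "r") as f:
        first = f.readline()   # one line, including its trailing "\n"
        rest = f.readlines()   # the remaining lines as a list of strings

    # "w" overwrites whatever was in the file before
    with open(path, "w") as f:
        f.write("fresh start\n")
    with open(path, "r") as f:
        after_w = f.read()

print(repr(first))
print(rest)
print(repr(after_w))
```

Note that writelines does not add line breaks by itself, so each string carries its own "\n".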
Exercise 1 Do newlines and tabs \n, \t count in str.isspace()? How do numbers interact with str.islower()? If s is the string of a number, what is s.lower()? Play around a bit with these functions and edge cases you might see in data. Then write a function classify_char(c) that takes a single character and returns "letter", "digit", "whitespace", or "other".
# Exercise 1
Exercise 2 Compute \(\sum_{i=1}^{100} i\) using a for-loop.
# Exercise 2
Exercise 3 Find the largest \(n\) such that \(\sum_{i=1}^n i < 1000\).
# Exercise 3
Exercise 4 for-loops can also become nested. Compute \(\sum_{i=1}^{100} \sum_{j=1}^i j\). (Note the double indentation for the inner for-loop!)
# Exercise 4
Exercise 5 Write a function (without using imports) which computes the sum of all digits (cross sum) of an int.
# Exercise 5
Exercise 6 Can you code the FizzBuzz example in a single line of code? So, dependent on some n, give one line of code, which gives Fizz if n is a multiple of 3, Buzz, if it is a multiple of 5, and FizzBuzz if both conditions are satisfied.
# Exercise 6
Exercise 7 Python (since version 3.10) comes with match, which is useful for branching on specific values. Write a function season(month) that takes a month number (1–12) and returns the season ("Winter", "Spring", "Summer", "Fall"). Use match with grouped patterns (e.g. case 3 | 4 | 5:) and a wildcard _ for invalid inputs. Find out how match differs from a chain of elifs.
# Exercise 7
Exercise 8 The following function is supposed to compute the average of a list of numbers, but it contains a subtle bug. Find and fix it. (Hint: try it with a few inputs of different lengths and compare the result to what you expect. Debugging with print statements or the built-in debugger can help; see Section 11.7.)
def average(numbers):
    total = 0
    for i in range(1, len(numbers)):
        total += numbers[i]
    return total / len(numbers)
print(average([10, 20, 30])) # expected: 20.0
print(average([1, 2, 3, 4])) # expected: 2.5
print(average([7, 8])) # expected: 7.5

16.666666666666668
2.25
4.0
# Exercise 8
Exercise 9 Write a function collatz(n) that returns the number of steps it takes for the Collatz sequence starting at n to reach 1. (The rule is: if n is even, divide by 2; if odd, compute 3n+1.)
# Exercise 9
Exercise 10 Write a function newton_sqrt(a, tol=1e-10) that computes \(\sqrt{a}\) using Newton’s method, i.e. the iteration \(x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right)\), starting from \(x_0 = a\), and stopping when \(|x_{k+1} - x_k| < \text{tol}\).
# Exercise 10
Exercise 11 Write a function primes_up_to(n) that returns a list of all prime numbers up to n, using the Sieve of Eratosthenes.
# Exercise 11
Exercise 12 Write a function bisection(f, a, b, tol=1e-10) that finds a root of a continuous function \(f\) on \([a, b]\) using the bisection method. The function should raise a ValueError if \(f(a)\) and \(f(b)\) have the same sign. Test it on \(f(x) = x^3 - 2\).
# Exercise 12
Exercise 13 Write a function caesar(s, k) that shifts every letter in the string s by k positions in the alphabet (wrapping around from z to a). Non-letter characters should remain unchanged. (Hint: ord(c) returns the integer code of a character, e.g. ord("a") == 97, and chr(n) converts an integer back to a character, e.g. chr(97) == "a".)
# Exercise 13