4 Container data types


When working with data, we usually have not just a single int or str, but many of them. For this situation, Python provides the container data types list, tuple, set, and dict.

4.1 Lists (`list`)

A list is the most common container data type in Python. Two fundamental operations on lists are indexing and slicing. Indexing means accessing a single element by its position (starting at 0), and slicing means extracting a sublist using start:stop notation (where the element at position stop is excluded). Negative indices count from the end of the list, so l[-1] is the last element, l[-2] the second-to-last, and so on.

# Indexing: access a single element by position
names = ["Alice", "Bob", "Charlie", "Donna"]
print(names[0])
print(names[2])
# Negative indices count from the end
print(names[-1])
print(names[-2])

Alice
Charlie
Donna
Charlie

# Slicing: extract a sublist
names = ["Alice", "Bob", "Charlie", "Donna"]
print(names[1:3])
print(names[:2])

['Bob', 'Charlie']
['Alice', 'Bob']

A list is written with square brackets, with the entries separated by commas. The list of numbers 1, 2, 3 is [1,2,3]. A list of names is e.g. ["Alice", "Bob", "Charlie", "Donna"]. The entries of a list can have simple data types or can again be containers.

We can access the entries of a list using square brackets. However, we have to be careful: an index at or beyond the length of the list raises an IndexError.
We can access sublists by using i:j for the entries i,...,j-1. If we want to have the beginning until j-1, we use :j. If we want everything after i, we use i:.
Elements of a list can be lists (or other container data types) themselves.
The types of elements of a list can be mixed.
When talking about for-loops, we already had range(n), which stands for 0,...,n-1. This is not exactly a list, but can be turned into one by list(range(n)). The range function also accepts range(start, stop, step), e.g. range(0, 10, 2) gives 0, 2, 4, 6, 8. In general, range(start, stop, step) produces start, start + step, start + 2*step, ..., stopping before stop.

# a list of ints
x = [1,2,3]
print(x[0])
# len(x) gives the length of x
print(x[3] if len(x) > 3 else None)

1
None

Slicing refers to sublists.

x = [1,2,3]
print(f"The first and second element: {x[0:2]}")
print(f"The first two elements: {x[:2]}")
print(f"All elements after the first element: {x[1:]}")

The first and second element: [1, 2]
The first two elements: [1, 2]
All elements after the first element: [2, 3]

Playing around with range.

print(f"The first and second odd number: {list(range(1, 5, 2))}")
print(f"The first and second odd number: {list(range(1, 100, 2))[:2]}")

The first and second odd number: [1, 3]
The first and second odd number: [1, 3]

The elements of a list can be lists themselves.

y = [[1,2], [3,4,5], 6, 7, 8]
print(y[1])
print(y[1][0])

[3, 4, 5]
3

A list can have mixed types.

z = [1, "two", [3,4], 5.0]
print(z[1])

two

Here are some functions on lists.

len(l): length of l.
sum(l): sums all elements of l, if possible.
min(l), max(l): computes the minimum/maximum of l.
sorted(l): returns a sorted version of l.
filter(fun, l): returns an iterator over only those elements x of l for which fun(x) is True. Use list(filter(fun, l)) to obtain a list.
list.append(l, x) (or l.append(x)) (where x is any object) appends the element x to the list l.
list.extend(l, x) (or l.extend(x)) (where x is a list) extends l by all elements of x. (You can also write l + x for this.)
list.insert(l, i, x) (or l.insert(i,x)) inserts x at position min(i, len(l)) in l.
list.remove(l, x) (or l.remove(x)) removes the first occurrence of x in l.
list.pop(l) (or l.pop()) removes the last element from l and returns it.
list.index(l, x) (or l.index(x)) gives the index of the first occurrence of x in l.
list.count(l, x) (or l.count(x)) counts the number of occurrences of x in l.
list.sort(l) (or l.sort()) sorts the list.
list.reverse(l) (or l.reverse()) reverses the list.

For the functions starting with list, there is a catch: we mentioned earlier that a function takes its input (here the list) and does not alter it. Lists, however, are mutable objects, which means that they are changed when such a function is called on them. So, e.g. after l.sort(), the list l itself is sorted, and the function has no return value (more precisely, the return value is None). This can be counterintuitive:

l = [3,2,1]
# This prints None, since the return value of l.sort() is None
print(l.sort())

l = [3,2,1]
# This changes l and makes it sorted.
l.sort()
# Now we can print it 
print(f"l = {l}")

# In contrast, sorted() does not change the list but returns a new sorted list.
l = [3,2,1]
print(sorted(l))

None
l = [1, 2, 3]
[1, 2, 3]
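The other list functions from the overview above change the list in place in the same way. The following is a minimal sketch with made-up values. The comments show the state of l after each step.

```python
# Each method below changes the list l in place.
l = [3, 1, 2]
l.append(4)          # l is now [3, 1, 2, 4]
l.extend([1, 5])     # l is now [3, 1, 2, 4, 1, 5]
l.insert(0, 9)       # l is now [9, 3, 1, 2, 4, 1, 5]
l.remove(1)          # removes the first 1: [9, 3, 2, 4, 1, 5]
last = l.pop()       # removes and returns the last element, 5
print(l)             # [9, 3, 2, 4, 1]
print(last)          # 5
print(l.index(4))    # 3
print(l.count(1))    # 1
l.reverse()
print(l)             # [1, 4, 2, 3, 9]
```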

Assume you want to sort a list by a criterion other than the usual order. For this, sorted has the key parameter. It takes a function, and the list is sorted according to the values this function returns for the elements. Here, we use lambda to define short anonymous functions inline: lambda x: expression creates a function that takes x and returns expression (see also Section 3.4 in Chapter 3). As an example, assume we want to sort a list of ints according to their absolute value:

l = [5, -4, 0, 7, -3, 2]
print(sorted(l, key = lambda x: abs(x)))

[0, 2, -3, -4, 5, 7]

Similarly, you can e.g. filter the list by only taking numbers which are at least one:

l = [5, -4, 0, 7, -3, 2]
print(list(filter(lambda x: x >= 1, l)))

[5, 7, 2]

Let us briefly describe two functions on a list of bools l.

all(l) gives True if all entries of l are True.
any(l) gives True if at least one entry of l is True.
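As a small illustration of these two functions, here is a sketch with hand-written lists of bools. The variable names are chosen only for this example.

```python
some_false = [True, True, False]
all_true = [True, True, True]
print(all(some_false))   # False
print(any(some_false))   # True
print(all(all_true))     # True
# Edge case: for an empty list, all gives True and any gives False
print(all([]))           # True
print(any([]))           # False
```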

A useful function that takes a list of strings and combines them into a single string is str.join (already mentioned briefly in Section 2.5). It is called on the separator string and takes the list as argument.

"sep".join(l): joins the elements of the list of strings l into a single string, with "sep" placed between consecutive elements.

names = ["Alice", "Bob", "Charlie"]
print(", ".join(names))
print(" and ".join(names))

Alice, Bob, Charlie
Alice and Bob and Charlie

4.2 Tuples (`tuple`)

At first glance, a tuple is like a list, but comes with normal brackets () instead of square brackets []. So, (1,2,3) is the tuple of these three numbers, and ("Alice", "Bob", "Charlie") is a tuple of three names. For a tuple l, l[0] still gives the first entry etc. The main difference between tuple and list is that lists are more flexible when adding or changing entries, and tuples are fixed (immutable) once they have been created.

l = ("Alice", "Bob", "Charlie")
# The next line does not work, since l is immutable
# l[0] = "Anna"
l = ["Alice", "Bob", "Charlie"]
# The next line works, since l is mutable. 
l[0] = "Anna"
print(f"l = {l}")

l = ['Anna', 'Bob', 'Charlie']

An important aspect of tuples is that single values can easily be packed into a tuple. An important example is to use a tuple in order to assign two or more values at the same time:

def min_max(a,b):
    (mi, ma) = (a if a < b else b, b if a < b else a)
    return mi, ma

a = 20
b = 10
res = min_max(a,b)
print(f"Minimum and maximum are {res[0]} and {res[1]}.")

Minimum and maximum are 10 and 20.
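Instead of indexing the result with res[0] and res[1], the returned tuple can also be unpacked directly into two variables. The following sketch uses a shortened variant of min_max, and the swap at the end is an additional illustration of the same mechanism.

```python
def min_max(a, b):
    # Returns the pair (minimum, maximum) as a tuple
    return (a, b) if a < b else (b, a)

# Unpack the returned tuple into two variables at once
mi, ma = min_max(20, 10)
print(f"Minimum and maximum are {mi} and {ma}.")

# The same mechanism swaps two variables without a temporary one
x, y = 1, 2
x, y = y, x
print(x, y)  # 2 1
```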

A special note applies to tuples of length 1. Here, note that e.g. (1) does not create a tuple of length 1, but an int. Instead, use (1,) for the tuple of length one.

thisisnotuple = (1)
print(type(thisisnotuple))
print(thisisnotuple[0])
thisisatuple = (1,)
print(type(thisisatuple))
print(f"Its first and only element is {thisisatuple[0]}.")

<class 'int'>

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 3
      1 thisisnotuple = (1)
      2 print(type(thisisnotuple))
----> 3 print(thisisnotuple[0])
      4 thisisatuple = (1,)
      5 print(type(thisisatuple))
      6 print(f"Its first and only element is {thisisatuple[0]}.")

TypeError: 'int' object is not subscriptable
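Since the cell above stops at the error, its last three lines are never executed. As a sketch, here is the comparison again without the failing line. The variable names are chosen only for this example.

```python
not_a_tuple = (1)    # just the int 1 in parentheses
a_tuple = (1,)       # the trailing comma makes it a tuple
print(type(not_a_tuple))   # <class 'int'>
print(type(a_tuple))       # <class 'tuple'>
print(len(a_tuple))        # 1
print(f"Its first and only element is {a_tuple[0]}.")
```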

4.3 Sets (`set`)

As in mathematics, sets have no special order, and there are no duplicates within a set. In Python, they can be created using curly brackets, so e.g. {1,2,3} as well as {1,3,2,3,1} is the set of the numbers 1,2,3. Although there are functions for updating sets (as for lists), the most frequent application is removing duplicates. For example, if you have a list l which can contain duplicates, list(set(l)) contains the same elements, but with duplicates removed (the order of the elements may have changed).

Here are some operations on sets s and t:

len(s): returns the number of unique elements in s.
s | t: The union of s and t.
s & t: The intersection of s and t.
s - t: The difference of s and t, i.e. the elements of s that are not in t.

l = [1,3,2,4,3,2]
print(f"There are {len(set(l))} unique elements in {l}.")

There are 4 unique elements in [1, 3, 2, 4, 3, 2].

Unions, intersection, and set differences work as expected:

s = {1,2,3}
t = {5,4,3}
print(f"The union of {s} and {t} is {s | t}.")
print(f"The intersection of {s} and {t} is {s & t}.")
print(f"The difference of {s} and {t} is {s - t}.")

The union of {1, 2, 3} and {3, 4, 5} is {1, 2, 3, 4, 5}.
The intersection of {1, 2, 3} and {3, 4, 5} is {3}.
The difference of {1, 2, 3} and {3, 4, 5} is {1, 2}.
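The functions for updating sets are not listed in this section. As a minimal sketch, the built-in methods add and discard and the membership test with in behave as follows. The last lines use sorted to give the result of removing duplicates a well-defined order.

```python
s = {1, 2, 3}
s.add(4)        # adds the element 4
s.add(2)        # no effect, since 2 is already in s
s.discard(1)    # removes 1; does nothing if the element is missing
print(sorted(s))   # [2, 3, 4]
print(3 in s)      # True
print(7 in s)      # False

# Removing duplicates from a list, with a well-defined order
l = [1, 3, 2, 4, 3, 2]
unique = sorted(set(l))
print(unique)      # [1, 2, 3, 4]
```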

4.4 Dictionaries (`dict`)

Dictionaries (dict) are nothing but a collection of key-value pairs. Here is an example:

d = {
        "first_name": "Alice",
        "last_name": "Müller",
        "age": 29,
        "hobbies": ["reading", "cycling", "chess"]
    }
print(f"""{d['first_name']} {d['last_name']} is {d['age']} and likes 
    {', '.join(d['hobbies'])}.""")

Alice Müller is 29 and likes 
    reading, cycling, chess.

As you see in the example, the f-string uses nested quotation marks, "...'...'...". For this reason, we use single quotation marks '' as inner separators and double quotation marks "" for the outer quotation marks. If there is no nesting, there is no difference in using single/double quotation marks.
Keys in a dict can be of any immutable data type (like int, str, float, or a tuple whose entries are immutable as well).
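To illustrate the restriction on keys, here is a sketch with a hypothetical dict grid that uses tuples as keys. The try/except construction catches the error Python raises when a list is used as a key. It is only needed here so that the example runs through.

```python
# Tuples of ints can serve as keys, e.g. for positions in a grid
grid = {(0, 0): "origin", (1, 0): "right", (0, 1): "up"}
print(grid[(1, 0)])    # right

# A list is mutable and is therefore rejected as a key
try:
    bad = {[0, 0]: "origin"}
except TypeError as e:
    print(f"TypeError: {e}")
```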

In the first example above, the field 'age' of the dict d is accessed using d['age']. However, assume 'age' is not a field in d. Then, d['age'] gives an error (a KeyError). For this reason, it is often better to use the get-function:

print(f"{d['first_name']} {d['last_name']} lives in {d.get('address')}.")
# This version uses a default field in get
print(f"""{d['first_name']} {d['last_name']} lives 
    in {d.get('address', 'unknown')}.""")
# We can also add "address" to the dict
d["address"] = "Freiburg, Germany"
print(f"""{d['first_name']} {d['last_name']} lives 
    in {d.get('address', 'unknown')}.""")

Alice Müller lives in None.
Alice Müller lives 
    in unknown.
Alice Müller lives 
    in Freiburg, Germany.

Some of the most important functions on a dict d are:

dict.keys(d) (or d.keys()): Returns a view of all keys (use list(d.keys()) to obtain a list).
dict.values(d) (or d.values()): Returns a view of all values.
dict.items(d) (or d.items()): Returns a view of all tuples (key, value).

print(f"The keys are {', '.join(d.keys())}.")
print(f"The values are:")
for v in d.values():
    print(str(v))
print(f"The key-value-pairs are:")
for key, value in d.items():
    print((key, str(value)))

The keys are first_name, last_name, age, hobbies, address.
The values are:
Alice
Müller
29
['reading', 'cycling', 'chess']
Freiburg, Germany
The key-value-pairs are:
('first_name', 'Alice')
('last_name', 'Müller')
('age', '29')
('hobbies', "['reading', 'cycling', 'chess']")
('address', 'Freiburg, Germany')

Let us mention some more functions on dicts:

dict.update(d, d') (or d.update(d')): extends d by the keys in d', and overwrites entries in d by the values of d'. Another possibility is d |= d' (similar to a set-union, available since Python 3.9).
dict.pop(d, key, default) (or d.pop(key, default)): removes the key key from d and returns its value (or default, if the key isn’t found). Without a default, a missing key gives a KeyError.
del d[key]: Deletes the entry with key key from d.

d |= {
    "profession" : "student",
    "subject" : "Mathematics"
}
# del removes an entry from the dict
del d["age"]
for key, value in d.items():
    print((key, str(value)))

('first_name', 'Alice')
('last_name', 'Müller')
('hobbies', "['reading', 'cycling', 'chess']")
('address', 'Freiburg, Germany')
('profession', 'student')
('subject', 'Mathematics')
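The example above uses |= and del. As a complement, here is a minimal sketch of pop and update on a small dict made up for this purpose.

```python
d = {"first_name": "Alice", "age": 29}
age = d.pop("age")              # removes "age" and returns its value
print(age)                      # 29
print(d)                        # {'first_name': 'Alice'}
# The key "address" is missing, so the default is returned
missing = d.pop("address", None)
print(missing)                  # None
# update overwrites existing keys and adds new ones
d.update({"first_name": "Anna", "city": "Freiburg"})
print(d)                        # {'first_name': 'Anna', 'city': 'Freiburg'}
```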

Here is a last example for a list of dicts.

people = [
    {
        "first_name": "Alice",
        "last_name": "Müller",
        "age": 29,
        "hobbies": ["reading", "cycling", "chess"]
    },
    {
        "first_name": "Bob",
        "last_name": "Schmidt",
        "age": 35,
        "hobbies": ["hiking", "photography", "cooking"]
    },
    {
        "first_name": "Clara",
        "last_name": "Weber",
        "age": 22,
        "hobbies": ["painting", "piano", "running"]
    },
    {
        "first_name": "David",
        "last_name": "Fischer",
        "age": 41,
        "hobbies": ["gardening", "woodworking", "cycling"]
    }
]
for p in people:
    # Single quotation marks inside the f-string, so that this also runs before Python 3.12
    print(f"{p['first_name']} {p['last_name']} is {p['age']} and likes "
        + ", ".join(p["hobbies"]) + ".")

Alice Müller is 29 and likes reading, cycling, chess.
Bob Schmidt is 35 and likes hiking, photography, cooking.
Clara Weber is 22 and likes painting, piano, running.
David Fischer is 41 and likes gardening, woodworking, cycling.
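The key parameter from Section 4.1 also works for a list of dicts. As a sketch, the following sorts a shortened version of people by age. The names by_age and youngest are chosen only for this example.

```python
people = [
    {"first_name": "Alice", "age": 29},
    {"first_name": "Bob", "age": 35},
    {"first_name": "Clara", "age": 22},
]
# Sort by the value stored under the key "age"
by_age = sorted(people, key=lambda p: p["age"])
for p in by_age:
    print(f"{p['first_name']} is {p['age']}.")
# min accepts the same key parameter
youngest = min(people, key=lambda p: p["age"])
print(youngest["first_name"])  # Clara
```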

4.5 Combining lists (`zip`, `enumerate`)

When working with multiple lists in parallel, zip and enumerate are very useful built-in functions.

zip(a, b): combines two (or more) lists element-wise into pairs (tuples). Stops at the shorter list.
enumerate(a): yields pairs (index, element) for each element of a.

names = ["Alice", "Bob", "Charlie"]
scores = [85, 92, 78]

# zip: iterate over two lists in parallel
for name, score in zip(names, scores):
    print(f"{name} scored {score}")

Alice scored 85
Bob scored 92
Charlie scored 78

# enumerate: get index and value
for i, name in enumerate(names):
    print(f"{i}: {name}")

0: Alice
1: Bob
2: Charlie

# zip in a list comprehension
diffs = [a - b for a, b in zip([10, 20, 30], [3, 7, 12])]
print(diffs)

[7, 13, 18]
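Two further details can be sketched as follows: zip stops at the shorter list, and the pairs it produces can be turned into a list or a dict. The start parameter of enumerate is an addition not mentioned above. It sets the number the counting begins with.

```python
names = ["Alice", "Bob", "Charlie"]
scores = [85, 92]    # one entry less than names

# zip stops at the shorter list, so "Charlie" is dropped
pairs = list(zip(names, scores))
print(pairs)         # [('Alice', 85), ('Bob', 92)]

# Pairs (key, value) can be turned into a dict
lookup = dict(zip(names, scores))
print(lookup)        # {'Alice': 85, 'Bob': 92}

# enumerate can start counting at 1 instead of 0
for i, name in enumerate(names, start=1):
    print(f"{i}: {name}")
```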

4.6 Comprehensions

Python has very expressive (i.e. short) ways to build new containers from existing ones, called comprehensions. The general forms are (where iterable is a container data type, or anything else a for-loop can run over, such as range(n)):

[expression for variable in iterable if condition]
[expression_if_true if condition else expression_if_false for variable in iterable]
{key: value for variable in iterable if condition}

Let us look at some examples, and you will see how powerful these comprehensions are!

# The squares of all even numbers below 10
print([x*x for x in range(10) if x % 2 == 0])
# The names of people who like cycling
print("People who like cycling are " + 
    ', '.join([f'{p["first_name"]} {p["last_name"]}' for p in people 
        if "cycling" in p['hobbies']]) + ".")

[0, 4, 16, 36, 64]
People who like cycling are Alice Müller, David Fischer.
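The examples above only use the first of the general forms. Here is a sketch of the other two, the if/else form and the dict comprehension, with values made up for this purpose.

```python
# if/else form: every element is kept, but mapped to one of two values
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
print(labels)    # ['even', 'odd', 'even', 'odd', 'even']

# dict comprehension: names with more than three letters and their lengths
names = ["Alice", "Bob", "Charlie"]
lengths = {name: len(name) for name in names if len(name) > 3}
print(lengths)   # {'Alice': 5, 'Charlie': 7}
```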

4.7 Exercises

Exercise 1 Find out how you can sort a list in descending order.

# Exercise 1

Exercise 2 Write a function is_palindrome(s) that returns True if the string s reads the same forwards and backwards (e.g. "racecar"), and False otherwise. Ignore upper/lower case. (Hint: you can reverse a string using slicing.)

# Exercise 2

Exercise 3 Given a list of names ["ALICE", "BOB", "CHARLIE"], convert them all to lowercase.

# Exercise 3

Exercise 4 From a list of fruits ["apple", "banana", "kiwi", "mango", "pear"], create a list of fruits that have exactly 5 letters.

# Exercise 4

Exercise 5 Create a list of tuples representing all possible (x, y) coordinates where x is from [1, 2, 3] and y is from [4, 5].

# Exercise 5

Exercise 6 Given the dict {"a": 1, "b": 2, "c": 3}, create a new dictionary that swaps keys and values {1: "a", ...}. This is best done using a comprehension.

# Exercise 6

Exercise 7 For two strings s and t, write a function myCount(s, t), which counts the number of occurrences of t in s. Unlike count, it should count the number of positions in s at which an occurrence of t starts, so overlapping occurrences are counted as well. For example, "bababab".count("bab") gives 2, but myCount("bababab", "bab") should give 3.

# Exercise 7

Exercise 8 Write a function which removes duplicates from a list (you might want to use set here), but keeps the order of the list.

# Exercise 8

Exercise 9 Assume you have a = [10, 20, 30, 40] and mask = [True, False, True, False]. Obtain the sublist of a which only contains the elements at positions where mask is True (without using other libraries). (You will probably need the enumerate- or zip-function here.)

# Exercise 9

Exercise 10 Given a list of integers, write a function most_frequent(lst) that returns the element that appears most often. If there is a tie, return any of the most frequent ones. (Hint: use a dict to count.)

# Exercise 10

Exercise 11 Write a function flatten(lst) that takes a nested list (e.g. [[1, 2], [3, [4, 5]], 6]) and returns a flat list [1, 2, 3, 4, 5, 6]. Use recursion.

# Exercise 11

Exercise 12 Write a function invert_dict(d) that swaps keys and values of a dictionary. If multiple keys have the same value, collect the keys in a list. For example, {"a": 1, "b": 2, "c": 1} becomes {1: ["a", "c"], 2: ["b"]}.

# Exercise 12

Exercise 13 A sparse vector can be represented as a dict mapping indices to nonzero values, e.g. {0: 3.0, 5: -1.0, 99: 2.0} for a vector that is zero everywhere except at positions 0, 5, 99. Write functions sparse_add(v, w) and sparse_dot(v, w) that compute the sum and inner product of two sparse vectors.

# Exercise 13
