Python for Data Analysis


These are the materials for a course on Python for Data Analysis at the University of Freiburg in summer 2026. The course does not assume any prior knowledge of Python (although it helps). The goal is basic literacy in Python: fundamentals such as data types and control flow, using libraries and APIs, reading and handling data, and some methods such as PCA and regression.

Basically, the course has three parts:

  1. Python basics: data types, functions, loops, conditions, libraries (Chapters 2–5)
  2. Tools for data analysis: numpy, pandas, matplotlib, scipy.stats (Chapters 6–9)
  3. Project: choose data that interests you and create an analysis (Chapter 10)

Within the first two parts, you will be assigned homework, which you can find at the end of each section. For the third part, you will have to find a project on your own. Chapter 11 collects miscellaneous topics (error messages, type hints, git, etc.) that are useful throughout the course.

0.1 How to get the Studienleistung?

To obtain the Studienleistung for this course, two things are required:

  1. A 45-minute multiple choice exam on June 2nd, in class. The exam covers the Python basics from Chapters 2–5 (simple data types, control flow, container data types, and libraries). The exam is held on paper — no computers (and no books or notes) are allowed. The list of possible questions is available here.

  2. The project. The third part of this course is a small data analysis project that you carry out in groups of two. The goal is to apply the skills from Chapters 2–9 to a topic of your choice. Use git and either GitHub or the university GitLab to share code within your group (see Section 11.8 for an introduction). The repository should be public (or shared with the instructors) so that the instructors can follow your progress. The timeline is:

    1. Find a project. Choose a dataset and a question you want to answer. The dataset can come from any source — public APIs (e.g. World Bank, Eurostat, OpenStreetMap), CSV files from the web, or data you collect yourself. The question should be concrete enough that you can answer it with several plots, tables, and statistical tests.

    2. Hand in your project idea (by the Pentecost break). Write a short description (one to two pages are enough) containing what data you will use and where it comes from, and what question(s) you want to answer.

    3. Implement and present your project (last two or three sessions). Build a Jupyter notebook that loads, cleans, analyzes, and visualizes your data. Each project will be presented briefly in class. Details on the presentation are given in Chapter 10.
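The load, clean, analyze steps of such a notebook can be sketched in a few lines. The following is only a minimal illustration under stated assumptions: the data are made up for this example (not a real dataset), and only the standard library is used so that the cell runs without extra packages. In the course itself, these steps would typically be done with pandas, and the visualization step (omitted here) with matplotlib.

```python
import csv
import io
from statistics import mean

# Made-up example data (not a real dataset); in a project this would be
# read from a file or an API instead.
RAW = """city,year,value
A,2020,1.5
A,2021,
B,2020,2.0
B,2021,3.0
"""

# Load: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Clean: drop rows with a missing value and convert strings to numbers.
clean = [
    {"city": r["city"], "year": int(r["year"]), "value": float(r["value"])}
    for r in rows
    if r["value"] != ""
]

# Analyze: compute the mean value per city.
by_city = {}
for r in clean:
    by_city.setdefault(r["city"], []).append(r["value"])
means = {city: mean(values) for city, values in by_city.items()}

print(means)
```

Keeping the three steps in separate, commented cells makes it easier to present the notebook and to see where a problem in the data enters the analysis.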

0.2 Preliminaries

Let us discuss some preliminaries.

Your Python environment: We assume that Python and Jupyter are installed on your system. These course notes were written with Python 3.13, but other versions might work as well. In addition to plain Python, we need some libraries; the file requirements.txt lists them, including their versions. Some Jupyter notebooks might not work if these are not installed. (To be more precise, they must be installed on the Jupyter server that runs the notebooks.) We recommend using a virtual environment to manage your Python packages; see Section 11.6 for details.
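To see which Python version and which libraries the running interpreter actually provides, a small check along the following lines may help. This is a sketch, not part of the course materials: the package list is an assumption based on the tools named in the course outline, and the authoritative list (with versions) remains requirements.txt. The helper name installed_versions is made up for this example.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed package list, taken from the tools named in the course outline;
# the authoritative list (with versions) is the course's requirements.txt.
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python", sys.version.split()[0])
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Run inside a notebook, this reports on the Jupyter server's environment, which is the one that matters for the notebooks.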

General note: We will not keep repeating the distinction between the commands python and python3, and will simply write Python from here on.

Starting a Jupyter notebook:

There are basically three ways Python commands can be run:

  1. In interactive mode on the command line, by typing python3 in your shell.
  2. In Python scripts, whose file names usually end in .py (e.g. myScript.py). You can run these using python3 myScript.py.
  3. In Jupyter notebooks, which let you mix ordinary text with Python commands.
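As an illustration of option 2, the contents of a script such as myScript.py might look as follows. The function name greeting is made up for this example; the `if __name__ == "__main__":` guard is a common Python convention, not something the course requires.

```python
# Contents of a hypothetical file myScript.py


def greeting(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"


# This part runs only when the file is executed as a script
# (python3 myScript.py), not when it is imported from another file.
if __name__ == "__main__":
    print(greeting("world"))
```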

In this course, we will only use option 3. For such Jupyter notebooks, there are at least two ways to run them:

  1. Start the Jupyter server using jupyter lab (hopefully, it is already installed), which opens your browser where you can edit and save what you have done;
  2. Use vscode as your IDE and work from there; see below.

Integrated Development Environment (IDE): When programming, we need to type code. This is possible in any code editor, but works best in a more sophisticated environment such as vscode or pycharm. It is best to activate your virtual environment (see Section 11.6) using source venv/bin/activate before starting your IDE. To start vscode, go to the main folder of the course notes and type code . (including the dot). vscode can be used for all three ways of running Python commands:

  1. Use Terminal -> New Terminal in order to open a terminal, type python3 and you are in interactive mode,
  2. Write a Python script and run it using the opened terminal,
  3. Click on some Jupyter notebook, and vscode starts the Jupyter server for you.
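Whether a terminal, script, or notebook really uses the interpreter from your virtual environment can be checked from Python itself. The sketch below assumes an environment created with the standard venv module; other tools (for example conda) are not detected this way. The function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points to the environment, while
    # sys.base_prefix still points to the underlying Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

If the printed interpreter path does not lie inside your venv folder, the IDE or Jupyter server was probably started without the environment being activated.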

A first Python cell: Here is a first cell of Python code:

x = "Hello, "
y = "world!"
print(x+y)
Hello, world!

If you are working within the Jupyter notebook (and not just reading the HTML file), you can execute this piece of code: press the play button on the left (or hit Control + Enter within the code cell).
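As a small variation that you can try in a new cell, the same string can be built with an f-string instead of the + operator. This is only an illustrative sketch; the variable name message is made up for this example.

```python
x = "Hello, "
y = "world!"

# An f-string inserts the values of x and y into a new string.
message = f"{x}{y}"
print(message)

# In a notebook, the value of the last expression in a cell is displayed
# automatically; in a script, nothing is shown without print().
message
```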

0.3 Using AI (ChatGPT, Gemini, Copilot, Claude, etc.)

In recent years, AI has become increasingly good at writing code. You are encouraged to use this new technology! As usual when learning new things, this tool can help a lot, and it will be able to solve the exercises within the Jupyter notebooks. However, you still have to understand the code and what might go wrong during execution. In this first course in Python, you will end up with up to a few hundred lines of code, which have to be coherent. You must know the structure of that code in order to extend it, fix bugs, etc. If you are ever involved in bigger projects, this is even more important.

For these course notes, I have also used AI (ChatGPT, Gemini, Claude Opus) to suggest which topics I should cover and to make concrete suggestions for a workflow. (While I have been programming in Python for some years now, my experience with numpy and matplotlib was still limited when I started to write the course, since I have used R for such topics in the past.) Not all suggestions from the AI were great. However, I think that the end result for this course is much better than it would have been without the help of AI. I hope this also helps the learning experience. In particular, earlier versions of several chapters (generated with ChatGPT) served as a source of material: their content on numpy, pandas, and scipy.stats was integrated into Chapters 06, 07, 08, and 09 with the help of Claude Code. A detailed log of all AI-assisted edits (including the prompts used) is available in CLAUDE.md.