11  Miscellaneous

We collect here some material that does not really fit anywhere else. We use it throughout the course without further notice.

11.1 Comments

In Python, a comment starts with # and runs to the end of the line.

# This line is a comment.
print("This line is not a comment.")
This line is not a comment.

11.2 Two equality signs

In Python (and many other programming languages), a single and a double equality sign mean different things. The assignment x = 0 says that x should from now on have the value 0. In contrast, x == 0 is a condition, which results in True if x is indeed 0, and False otherwise. There is also the is keyword, which checks whether two variables refer to the same object in memory (not just the same value). For example, x is None is the recommended way to check for None, but for comparing values you should always use ==.
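The difference between ==, and is can be tried out with a minimal sketch like the following. The variable names a, b, c and x are chosen only for illustration.

```python
# Two lists with equal contents are still two separate objects.
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c is a second name for the same list object as a

print(a == b)  # True: same value
print(a is b)  # False: two different objects in memory
print(a is c)  # True: one and the same object

x = None
print(x is None)  # True: the recommended check for None
```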

11.3 User input

Sometimes, when operating on the command line, one wants to ask for user input. This works as follows:

name = input("What is your name?")
print(f"Hello, {name}!")

However, in the Jupyter notebooks we are using in this course, one rather puts such input directly into the code blocks.
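One detail worth knowing is that input always returns a string, so numbers have to be converted explicitly. The following is only a sketch: the helper ask_int and its reader parameter are hypothetical names introduced here so that the example also runs without a terminal.

```python
def ask_int(prompt, reader=input):
    """Ask for a whole number; reader is replaceable so the function can be tested."""
    text = reader(prompt)     # input() always returns a str
    return int(text.strip())  # raises ValueError if text is not a whole number

# Interactive use (commented out so the block also runs without a terminal):
# age = ask_int("How old are you? ")
# print(f"Next year you will be {age + 1}.")

# Non-interactive demonstration with a fake reader that always answers " 41 ":
print(ask_int("How old are you? ", reader=lambda prompt: " 41 "))
```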

11.4 Dot-notation

Here is another very useful notational convention, known as dot-notation: For a function str.something(s, ...), where s is of type str, you can equally write s.something(...). For example, s.strip() is the same as str.strip(s).
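As a small illustration of the two equivalent spellings (the example strings are made up):

```python
s = "  hello world  "

print(s.strip())                  # method call on the object
print(str.strip(s))               # the same function, called via the type
print(s.strip() == str.strip(s))  # True

# Additional arguments come after the object in both forms:
print("a,b,c".split(","))
print(str.split("a,b,c", ","))
```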

11.5 Type hints and assertions (assert, type annotations)

Python does not require you to declare the type of a variable (in contrast to C, say), but you can annotate variables and functions with type hints. Python does not enforce these hints at runtime; they help readability and can be checked by external tools.

  • x: int = 5: annotate x as an int (a hint for readers and tools, not a runtime check).
  • def f(x: float) -> float:: declare that f takes a float and returns a float.
def square(x: float) -> float:
    """Return x squared."""
    return x * x

print(square(3.0))
# This also works, since Python does not enforce the type hint:
print(square(5))
9.0
25
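The section title also mentions assert. In contrast to a type hint, an assert statement is checked when the code runs: if the condition is False, Python raises an AssertionError. The function square_root below is a hypothetical example, not part of the course code.

```python
def square_root(x: float) -> float:
    """Return the square root of a non-negative number."""
    # The message after the comma is shown if the condition fails.
    assert x >= 0, "x must be non-negative"
    return x ** 0.5

print(square_root(9.0))

try:
    square_root(-1.0)
except AssertionError as err:
    print(f"AssertionError: {err}")
```

Note that assertions are skipped when Python is started with the -O option, so they are suited for catching programming mistakes, not for validating user input.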

11.6 Virtual environments and pip (venv, pip)

When working on different Python projects, you may need different versions of the same library. Virtual environments solve this by creating an isolated Python installation per project. The pip tool installs packages into the active environment.

  • python3 -m venv venv: create a virtual environment in the folder venv.
  • source venv/bin/activate: activate the virtual environment (Linux/Mac). On Windows: venv\Scripts\activate. When the environment is active, your terminal prompt will be prefixed with (venv), e.g. (venv) user@host:~$. You can also verify by running which python, which should point to venv/bin/python instead of the system Python.
  • deactivate: leave the virtual environment.
  • pip install package: install a package into the active environment.
  • pip install -r requirements.txt: install all packages listed in requirements.txt.
  • pip freeze: list all installed packages and their versions.
# typical workflow when starting a new project
python3 -m venv venv
source venv/bin/activate
pip install numpy pandas matplotlib scipy
pip freeze > requirements.txt
# typical workflow when cloning an existing project
git clone https://github.com/someone/some_project
cd some_project
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Without a virtual environment, installing packages changes your system-wide Python installation, which can break other projects. It is good practice to always work inside a virtual environment.
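Besides looking at the terminal prompt, you can also check from within Python whether a virtual environment is active. The following is a sketch for environments created with venv; the function name in_virtualenv is made up for this example.

```python
import sys

def in_virtualenv() -> bool:
    """Return True if Python is running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points to the environment folder,
    # while sys.base_prefix still points to the underlying installation.
    return sys.prefix != sys.base_prefix

print(f"{sys.executable = }")   # path of the Python interpreter in use
print(f"{in_virtualenv() = }")
```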

11.7 Reading error messages and debugging using print

When your code raises an error, Python prints a traceback, i.e. a summary of what went wrong and where. Learning to read tracebacks is an important skill in programming. Here is an example:

Traceback (most recent call last):
  File "example.py", line 5, in <module>
    result = divide(10, 0)
  File "example.py", line 2, in divide
    return a / b
ZeroDivisionError: division by zero

Read a traceback from bottom to top:

  1. The last line tells you the error type and a short description (ZeroDivisionError: division by zero).
  2. The lines above show the call stack: which functions were called, in which file, and at which line number. The most recent call is at the bottom.
  3. The line of code that caused the error is shown directly below each location.

For a list of common error types and how to handle them with try/except, see Section 3.5.
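To experiment with tracebacks, an error like the one above can be produced on purpose. The sketch below assumes a function divide as in the example traceback; it catches the error and uses the standard traceback module to get the same text that Python would otherwise print.

```python
import traceback

def divide(a, b):
    return a / b

try:
    result = divide(10, 0)
except ZeroDivisionError:
    text = traceback.format_exc()  # the traceback as a string
    print(text)
    # The last line names the error type and gives a short description.
    last_line = text.strip().splitlines()[-1]
    print(f"{last_line = }")
```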

When the error message alone is not enough to find the bug, you need to debug — i.e. inspect what your code is actually doing step by step.

  • print(x): the simplest debugging tool. Print the value (and possibly the type) of a variable at a critical point.
  • print(f"{x = }"): a shorthand (since Python 3.8) that prints both the variable name and its value.
  • type(x): check the type of a variable. Many bugs come from unexpected types.

For more complex issues, Python has a built-in interactive debugger (pdb). Placing breakpoint() in your code pauses execution at that point and lets you inspect variables interactively. See the Python debugger documentation for details.
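The print-based tools from the list above could be used as in the following sketch. The function mean and the list data are invented for illustration; the point is that printing values and types reveals that data contains strings, not numbers.

```python
def mean(values):
    total = sum(values)
    print(f"{total = }, {len(values) = }")  # inspect intermediate values
    return total / len(values)

data = ["1", "2", "3"]  # strings, e.g. as read from a file or from input()
print(f"{data = }")
print(type(data[0]))    # shows that the entries are of type str

numbers = [int(d) for d in data]  # convert before doing arithmetic
print(f"{mean(numbers) = }")
```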

11.8 git and github

git is a version control system: it tracks changes to files over time, so you can go back to earlier versions, collaborate with others, and see who changed what and when. github is a website that hosts git repositories online.

These course notes are organized using git. You need to install git on your system. Here are the most important git commands:

Getting started:

  • git clone url: download a repository from github to your computer.
  • git pull: update your local copy with the latest changes from github.
  • git status: show which files have been changed, added, or deleted since the last commit.
  • git log: show the history of commits (press q to quit).
  • git log --oneline: same, but one line per commit.

Making changes (for your own projects):

  • git init: turn the current folder into a new git repository.
  • git add file: stage a file for the next commit (i.e. mark it to be included).
  • git add .: stage all changed files.
  • git commit -m "message": save the staged changes as a new commit with a description.
  • git diff: show changes in your working files that have not been staged yet.
  • git diff --staged: show what has been staged for the next commit.

Working with github:

  • git push: upload your commits to github.
  • git remote -v: show which github repository your local copy is connected to.

Undoing things:

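  • git checkout -- file: discard local changes to a file (revert to last commit). In newer versions of git, git restore file does the same.
  • git reset HEAD file: unstage a file (undo git add). In newer versions of git, git restore --staged file does the same.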

A typical workflow looks like this:

# 1. make some changes to files
# 2. check what changed
git status
git diff
# 3. stage and commit
git add file1.py file2.py
git commit -m "add data loading function"
# 4. upload to github
git push

The .gitignore file lists files and folders that git should ignore (e.g. venv/, __pycache__/, .ipynb_checkpoints/, data files). This prevents large or private files from being tracked.

Getting the course materials:

git clone https://github.com/pfaffelh/python_for_data
cd python_for_data
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

The course notes will be updated during the semester. To get the latest version, run git pull. There is one catch: if you have changed a course file locally, git pull will refuse to overwrite your changes. The best way to avoid this is to do all your work in the folder myFiles or only change the .ipynb files — these are excluded from updates via .gitignore.