7  Writing Pythonic Code

Learning objectives for this lesson

After this lesson you should…

  • know the difference between positional and keyword arguments
  • be experts in the use of slice notation when indexing
  • understand how the splat operation works when unpacking collections
  • be comfortable with the use of the set collection and its special features
  • be able to use list and dict comprehensions in place of simple for-loops, including if-clauses
  • be able to enumerate over a collection, as well as zip multiple collections together
  • be able to write nested for-loops to iterate over multiple dimensions
  • know how to specify default argument values in a function
  • know how to write a “docstring” for your custom functions
Figure 7.1: Programmers really care about code looking nice. Some people consider code to be poetry.

When Guido created Python, he was partly motivated by the realization that code is read more often than it is written. With this insight in mind, he created a syntax that was as close to spoken English as possible. We have seen examples of this with statements like for item in collection. When code is written in a clean, concise, and consistent manner, it becomes easy to read, whether it is your own code or someone else’s. Code written this way is called “pythonic”.

Pythonic code means two things. The first is that the formatting of the code follows the standard conventions:

To facilitate nice and consistent code, there is a style guide generally referred to as PEP8.

pep: Python Enhancement Proposal

The Python community created a process through which new features can be proposed for the language. These proposals are called “PEPs”, an acronym for “Python Enhancement Proposal”. A list of all proposals is available, along with whether each has been accepted, rejected or is still in progress. They are numbered starting from 001, and we are now up to around 800.

The second aspect of Pythonic code is how the code is written: common tasks should be done in a way that uses the features of Python to their fullest. Examples include list comprehensions, enumerate, zip and unpacking, all of which are covered in this lesson.

7.1 PEP8, The Python Style Guide

PEP8 was created very early on, and has been a core part of Python ever since. This PEP describes how Python code should look. A condensed version of this guide, including examples, is given here.

Why is it truly important to format your code?

Readability: Formatting your code will help you read it efficiently. It looks more organized, and when someone else looks at your code they’ll get a good impression.
Coding Interviews: In a coding interview, the interviewers will sometimes care whether you format your code properly. Forgetting to do so could hurt your job prospects, simply because of poorly formatted code.
Team Support: Formatting your code becomes more important when you are working in a team. Several people will likely be working on the same software project, and the code you write must be understood by your teammates. Otherwise it becomes harder to work together.
Easier to spot bugs: Badly formatted code can make it really, really hard to spot bugs or even to work on a program. It is also just really horrible to look at. It’s an offense to your eyes.

Here are a few highlights:

  • Indent each new logical block by 4 spaces
  • Use 2 blank lines between top-level function definitions, and at most 1 blank line elsewhere
  • Use “snake_case” for all variable and function names (rather than ‘camelCase’ or ‘PascalCase’)
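As a small illustration, the snippet below follows all three highlights: 4-space indentation, two blank lines between top-level functions, and snake_case names. The two conversion functions are made up for this sketch; they are not part of any library.

```python
def celsius_to_fahrenheit(temp_c):
    # Each new logical block is indented by 4 spaces
    temp_f = temp_c*9/5 + 32
    return temp_f


def fahrenheit_to_celsius(temp_f):
    # Two blank lines separate top-level function definitions
    temp_c = (temp_f-32) * 5/9
    return temp_c


boiling_point_f = celsius_to_fahrenheit(100)  # snake_case variable name
print(boiling_point_f)  # 212.0
```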

Instead of listing off all the rules, the following table demonstrates several (but not all):

Table 7.1: Summary of some coding convention do’s and don’ts.
Yes                                     No
spam(ham[1], {eggs: 2})                 spam( ham[ 1 ], { eggs: 2 } )
if x == 4: print(x, y); x, y = y, x     if x == 4 : print(x , y) ; x , y = y , x
spam(1)                                 spam (1)
dct['key'] = lst[index]                 dct ['key'] = lst [index]
i = i + 1                               i=i+1
submitted += 1                          submitted +=1
x = x*2 - 1                             x = x * 2 - 1
hypot2 = x**2 + y*y                     hypot2 = x ** 2 + y * y
c = (a+b) * (a-b)                       c = (a + b) * (a - b)
c = (a+b) / (a-b)                       c = (a + b)/(a - b)
def complex(real, imag=0.0):            def complex(real, imag = 0.0):
    return magic(r=real, i=imag)            return magic(r = real, i = imag)

Since there are so many rules, it takes quite a while to learn them all. Luckily, there is a type of tool called a linter, which checks your code against these rules and flags any violations. In IDEs such as Spyder and VSCode it is possible to enable linting using a plugin or optional feature. The result looks like this:

Figure 7.2: Screenshot of the Spyder IDE showing orange triangle on each line with a PEP8 violation, and a pop-up describing the specific problem.

7.2 A Deeper Dive into Collections

7.2.1 Unpacking Collections

As we will see below when looking at functions in more detail, Python allows us to unpack the values in a collection into separate variables. Putting a * in front of a variable name collects any leftover values into a list; this * is sometimes called the “splat” operator.

a, *b = 1, 2, 3
print('a is:', a)
print('b is:', b)
a is: 1
b is: [2, 3]

We can even catch several individual values, and splat the rest into a single variable.

a, *b, c = 1, 2, 3, 4
print('a is:', a)
print('b is:', b)
print('c is:', c)
a is: 1
b is: [2, 3]
c is: 4
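A few more cases are worth knowing. The sketch below (the variable names are arbitrary) shows plain unpacking, the common swap idiom, what the starred name receives when there is nothing left over, and the error raised when the counts don't match.

```python
# Plain unpacking: one name per value
x, y = 10, 20
print(x, y)  # 10 20

# Swapping two variables relies on the same mechanism
x, y = y, x
print(x, y)  # 20 10

# The starred name always receives a list, even when it catches nothing
first, *rest = [5]
print(first, rest)  # 5 []

# Without a starred name, the number of names and values must match exactly
try:
    p, q = 1, 2, 3
except ValueError as err:
    print('ValueError:', err)
```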

7.3 Slicing

We briefly covered slicing, but it will come up a lot more when we start to talk about numpy, so let’s take a closer look. We can use the : symbol to indicate a range of indices.

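It works like this: start:stop:step. Any of the three values can be left out, in which case Python uses its default. For a positive step, the defaults amount to 0:None:1, which means start at element 0, stop at the end (indicated by None), and take steps of size 1. Since an omitted value is the same as None, you can also write 0::1, or simply :.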

Figure 7.3: A visual representation of indexing and slicing. The green squares are what is retrieved with normal indexing. The blue squares indicate the elements that will be retrieved when indexing with negative numbers (i.e. counting backward from the end). Finally, the orange squares indicate the elements that will be retrieved using normal slicing.

Using the defaults gets us the entire list:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[:]  # (1)
print(b)
(1) The default is to start at 0 and stop at the end, so a single : alone will give the whole list. It is like writing [0:len(a)] or [0:None]. (Note that [0:-1] is not the same: it stops before the last element.)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can change the start and stop values to get a section of the list:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[2:6]  # (1)
print(b)
(1) Here we start at 2 and stop at 6. It is crucial to note that it stops before 6, not at 6.
[2, 3, 4, 5]

The stop point can be specified as an index counted from the start, or as a distance from the end by using a negative number:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[2:-3]  # (1)
print(b)
(1) Here we start at index 2 and stop 3 elements from the end (i.e. just before index 7).
[2, 3, 4, 5, 6]

Changing the step size is also possible if we want every Nth item:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[2:8:2]  # (1)
print(b)
(1) The third value is the step size, so in this case it retrieves every second item from index 2 up to (but not including) index 8. The default step is 1, so it can be left out to get the normal behavior.
[2, 4, 6]

And of course the step can be negative if we want to walk through the collection backwards:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[6:2:-1]  # (1)
print(b)
(1) The step size can be negative too, which is useful for reversing the order of the items. Note that the start (6) is now larger than the stop (2), and the stop is still excluded.
[6, 5, 4, 3]
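A few more slicing patterns come up often. In this sketch, note that when the step is negative and start and stop are left out, Python starts from the end and walks all the way to the beginning, so a[::-1] is a common way to get a reversed copy.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Negative step with start and stop omitted: the whole list, backwards
print(a[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# Negative start with stop omitted: the last three items
print(a[-3:])    # [7, 8, 9]

# A stop beyond the end is clipped rather than raising an IndexError
print(a[5:100])  # [5, 6, 7, 8, 9]

# Slicing creates a new list, so changing b leaves a untouched
b = a[:]
b[0] = 99
print(a[0], b[0])  # 0 99
```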

Example 7.1 (Using complicated slicing notation) Given a string, retrieve every third letter, starting 5 from the end and working back toward the beginning.

Solution

s = 'The_quick_brown_fox_jumped_over_the_lazy_dog'
s2 = s[-5:0:-3]
print(s2)
ylhroeux_o_i_

Comments

To make sure this worked, let’s do it a different way. Let’s create a list of numerical indices, then use that list to retrieve the letters:

l = list(range(len(s)))
l2 = l[-5:0:-3]
print(l2)
[39, 36, 33, 30, 27, 24, 21, 18, 15, 12, 9, 6, 3]

Now we can retrieve the letters from s using the indices in l2:

s3 = ''
for i in l2:
    s3 = ''.join((s3, s[i]))
print(s3)
ylhroeux_o_i_

And finally, let’s officially confirm that s3 is equal to s2:

s2 == s3
True

7.3.1 Sets

The final type of collection is called a set, which refers to the mathematical concept of a set. A set behaves similarly to a list (and therefore a tuple), with the defining feature being that it contains no duplicate elements. Unlike a list, a set also has no fixed order, so its elements cannot be accessed by index.

A set is defined by putting curly braces around a comma-separated sequence of values:

s = {1, 2.0, 'text'}  # (1)
print("Result:", type(s))
(1) The use of curly braces is a bit unfortunate since this looks a lot like the definition of a dict. The difference is that the collection only contains values, whereas a dict contains key:value pairs. There is actually some logic behind this: a set cannot contain any duplicate items, and similarly a dict cannot contain any duplicate keys. So {'a': 1, 'a': 2} will become {'a': 2}.
Result: <class 'set'>

Python also includes a built-in function for creating sets:

s = set([1, 2.0, 2.0, 2.0, 'text'])  # (1)
print("Result:", s)
(1) Note that we have included multiple instances of 2.0, yet the resulting set only has one instance.
Result: {1, 2.0, 'text'}
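A few related details are shown in the short sketch below: empty curly braces create a dict rather than a set, set() accepts any iterable (including a string), membership can be tested with in, and because sets are unordered they cannot be indexed. The example string is arbitrary.

```python
# Empty curly braces create a dict, not a set; use set() for an empty set
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>

# set() accepts any iterable, e.g. a string, and keeps each character once
letters = set('banana')
print(sorted(letters))  # ['a', 'b', 'n']

# Membership tests with "in" work just like for lists
print('n' in letters)   # True

# Sets have no order, so indexing them raises a TypeError
try:
    letters[0]
except TypeError as err:
    print('TypeError:', err)
```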

Probably the most useful application of sets is for “Venn-diagram” style logic. For instance, if you have two lists of values, you might want to know what they have in common, or how they differ. This can be accomplished with sets.

Disclaimer: This is not a political statement for or against a particular candidate as it does not attempt to locate Donald Trump!

A less political Venn diagram that might be more relatable to a room full of engineers!
Figure 7.4: Examples of Venn diagrams
Figure 7.5: Visual representation of operations between 2 sets

Consider the following 2 sets:

a = {'bob', 'dave', 'jane'}
b = {'dave', 'sue', 'taylor'}

We can find what the two sets have in common by finding the intersection:

a = {'bob', 'dave', 'jane'}
b = {'dave', 'sue', 'taylor'}
c = a.intersection(b)
print("Result:", c)
Result: {'dave'}

Or we can find which items are in a but not in b using difference:

a = {'bob', 'dave', 'jane'}
b = {'dave', 'sue', 'taylor'}
c = a.difference(b)
print("Result:", c)
Result: {'bob', 'jane'}

Table D.1 gives a list of the methods available on a set object which can be used for performing a variety of other “set operations”, such as those shown in Figure 7.5. An abbreviated list is given below with just the ones associated with “Venn-diagram”-type logic.

Table 7.2: Partial list of methods on a set object, including the shortcut operators which have been defined for them
Method                   Shortcut  Description
difference()             -         Returns a set containing the difference between two or more sets
intersection()           &         Returns a set that is the intersection of two other sets
issubset()               <=        Returns whether another set contains this set or not
                         <         Returns whether this set is a proper subset of the other set (contained in it, but not equal to it)
issuperset()             >=        Returns whether this set contains another set or not
                         >         Returns whether this set is a proper superset of the other set (contains it, but is not equal to it)
symmetric_difference()   ^         Returns a set of the items that are in exactly one of the two sets
union()                  |         Returns a set containing the union of sets
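The shortcut operators in Table 7.2 can be used directly on the two sets from above. This sketch sorts multi-element results before printing, since sets themselves have no fixed order; the team set is a made-up extra example for the subset checks.

```python
a = {'bob', 'dave', 'jane'}
b = {'dave', 'sue', 'taylor'}

print(a & b)          # intersection: {'dave'}
print(sorted(a - b))  # difference, in a but not in b: ['bob', 'jane']
print(sorted(b - a))  # difference is not symmetric: ['sue', 'taylor']
print(sorted(a ^ b))  # in exactly one set: ['bob', 'jane', 'sue', 'taylor']
print(sorted(a | b))  # union: ['bob', 'dave', 'jane', 'sue', 'taylor']

# Subset checks
team = {'bob', 'jane'}
print(team <= a)  # True: every member of team is also in a
print(team < a)   # True: a subset, and not equal to a
print(a < a)      # False: a set is not a proper subset of itself
```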

Example 7.2 (Using Sets) We can write chemical formulae in the following form: ['C', 'H', 'H', 'H', 'H'] which would be methane. Write a function which accepts 2 such lists and returns a list of elements which are in common.

Solution

def common_elements(chem1, chem2):
    chem1 = set(chem1)
    chem2 = set(chem2)
    common = chem1.intersection(chem2)
    common = list(common)
    return common

d = dict()  # Could also use empty curly braces - {}

d['ethane'] = ['C', 'C', 'H', 'H', 'H', 'H', 'H', 'H', ]
d['methanol'] = ['C', 'O', 'H', 'H', 'H', 'H'] 

print(common_elements(chem1=d['ethane'], chem2=d['methanol']))
['H', 'C']

Comments

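Here we have used a dictionary to store the chemical formulae, and taken advantage of the ability to use readable words as the keys. Note that since sets have no fixed order, the elements of the returned list may come out in a different order (e.g. ['C', 'H']) on another run.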

7.4 For-Loops Again

7.4.1 Enumerating while Iterating

We have seen For-Loops which iterate over the items in a collection:

c = ['a', 'b', 'c']
for item in c:
    print(item)
a
b
c

And we have seen For-Loops which use an index into a collection:

c = ['a', 'b', 'c']
for i in range(len(c)):
    print(c[i])
a
b
c

But with enumerate we can do both at once!

c = ['a', 'b', 'c']
for i, item in enumerate(c):
    print(i, item)
0 a
1 b
2 c

Having access to both the item and the index where it is located is quite useful, for instance to update the list in place:

c = [22, 33, 44]
for i, item in enumerate(c):
    c[i] = item/11
print(c)
[2.0, 3.0, 4.0]
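enumerate also accepts a start argument, which is handy when the count is meant for humans (who usually count from 1). This sketch also converts the result to a list to show that enumerate simply produces (index, item) pairs.

```python
c = ['a', 'b', 'c']

# start=1 makes the counter begin at 1 instead of 0
for number, item in enumerate(c, start=1):
    print(number, item)

# enumerate produces (index, item) pairs, which list() makes visible
pairs = list(enumerate(c))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```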

7.4.2 Looping over Two Collections Simultaneously

We sometimes have 2 collections which we want to scan together, for example to combine their values into a dict. The zip function pairs up the items at the same position in each collection, so we can use it as follows:

names = ['a', 'b', 'c']
nums = [1, 2, 3]
d = {}
for key, value in zip(names, nums):
    d[key] = value
print(d)
{'a': 1, 'b': 2, 'c': 3}
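Two variations are worth knowing, shown in the sketch below: dict() can build the dictionary from the zipped pairs directly, and zip stops as soon as the shortest collection runs out (extra items are silently dropped). zip also accepts more than two collections; the colors list is an arbitrary example.

```python
names = ['a', 'b', 'c']
nums = [1, 2, 3]

# dict() can build the same dictionary directly from the zipped pairs
d = dict(zip(names, nums))
print(d)  # {'a': 1, 'b': 2, 'c': 3}

# zip stops at the end of the shortest collection, dropping the extra 'z'
short = list(zip(['x', 'y', 'z'], [10, 20]))
print(short)  # [('x', 10), ('y', 20)]

# More than two collections can be zipped together
colors = ['red', 'green', 'blue']
for name, num, color in zip(names, nums, colors):
    print(name, num, color)
```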

7.4.3 Nested For-Loops

We have been mostly looking at linear collections like lists and strings. However, we often have “multidimensional” data, like a “list of lists”. In such cases we need to use one for-loop to scan over the “outer” list, then a second for-loop to scan each “inner” list.

Another way to think about this is a 2D (or ND!) array of data, where one for-loop scans each row, then the second loop scans the columns of each row. Consider the following case of a 5-by-5 matrix:

Example 7.3 (Process each number in a list-of-lists) Compute the square root of each number in the following 2D array:

data = [[1, 2, 3, 4, 5],
        [4, 2, 6, 9, 1],
        [9, 4, 7, 8, 4],
        [1, 6, 2, 8, 2],
        [3, 4, 8, 4, 4]]

Solution

for row in range(len(data)):
    for col in range(len(data[row])):
        data[row][col] = round(data[row][col]**0.5, 5)

Now let’s print our data array, one row at a time:

for row in data:
    print(row)
[1.0, 1.41421, 1.73205, 2.0, 2.23607]
[2.0, 1.41421, 2.44949, 3.0, 1.0]
[3.0, 2.0, 2.64575, 2.82843, 2.0]
[1.0, 2.44949, 1.41421, 2.82843, 1.41421]
[1.73205, 2.0, 2.82843, 2.0, 2.0]

Comments

In languages other than Python, it is common to have 2 or 3 levels of nested for-loops, corresponding to 2D and 3D arrays. Beyond that, the code and the data become quite hard for us simple humans to understand.

In Python, using for-loops to analyze arrays of data like the one above is quite rare! The reason is that Python for-loops are quite slow at this, for reasons to be discussed in the next lecture when we are introduced to numpy.
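When the indices aren't needed, the nested loops can iterate over the rows and values directly. This sketch uses a smaller, made-up 2-by-3 array and builds a new list-of-lists, so the original data is left unchanged (unlike the solution to Example 7.3, which overwrites data in place).

```python
data = [[1, 2, 3],
        [4, 5, 6]]

squared = []
for row in data:          # outer loop: one inner list (row) at a time
    new_row = []
    for value in row:     # inner loop: one value within that row
        new_row.append(value**2)
    squared.append(new_row)

print(squared)  # [[1, 4, 9], [16, 25, 36]]
print(data)     # [[1, 2, 3], [4, 5, 6]] (unchanged)
```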

7.4.4 List Comprehensions

Python prides itself on being easy to read. Often this means expressing simple logic in as few lines as possible, so that it can be read like a sentence. It therefore offers an alternative way to write simple for-loops in a single line:

result = [item/4 for item in [-1, 0, 2, 3]]
print("Result:", result)
Result: [-0.25, 0.0, 0.5, 0.75]

Technically this is called a list comprehension. They can be confusing at first, but eventually you may appreciate the conciseness they offer. It often helps readability to condense logic to a single line.

Example 7.4 (Compare for-loop and list comprehension) Given a list of strings, each a single word, capitalize each word using both a for-loop and a list comprehension.

Solution

strings = ['firstname', 'lastname']

new_1 = []  # (1)
for word in strings:
    new_1.append(word.capitalize())  # (2)

new_2 = [word.capitalize() for word in strings]  # (3)
(1) We must start by defining a new, empty list to which we’ll add the capitalized words
(2) Here we do the capitalization and append it to new_1
(3) This version creates a new list inside the [ ], then we catch it in new_2

Comments

Doing this on a single line is helpful for readability, but only if the for-loop is simple. It is technically possible to put list comprehensions inside other list comprehensions, which is equivalent to nested for-loops, but these are quite difficult to read and understand, and so defeat the purpose.

7.4.5 Other Comprehensions

In addition to list comprehensions, Python also has dict and set comprehensions (but there is no tuple comprehension).

Dictionary comprehensions are helpful for combining two different data sources into a single dictionary. For instance if we have a list of names, and another list of values, we can do:

name = ['a', 'b', 'c', 'd', 'e']
data = [1, 2, 3, 4, 5]
d = {name[i]: data[i] for i in range(len(data))}  # (1)
print(d)
(1) Note that this looks a lot like defining a dict directly, as in d = {'a': 1, 'b': 2, 'c': 3}.
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
Note

The reason this is helpful is that we can effortlessly look up the value associated with 'c' as:

val = d['c']

If this information were stuck in the list format, we’d have to do an extra step:

i = name.index('c')
val = data[i]
print(val)
3

The dict comprehension thus lets us combine the two lists into a single dict, and makes it simple to retrieve values corresponding to each name.

Since comprehensions are just For-Loops, they also work with zip and enumerate:

name = ['a', 'b', 'c', 'd', 'e']
data = [1, 2, 3, 4, 5]
d = {k: v for k, v in zip(name, data)}
print(d)
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
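Set comprehensions use curly braces like a dict comprehension, but without the key: part. Round brackets, on the other hand, create a generator expression rather than a tuple, so the result has to be passed to tuple() to get an actual tuple. The words list in this sketch is an arbitrary example.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Set comprehension: curly braces without a colon; duplicates are removed
first_letters = {w[0] for w in words}
print(sorted(first_letters))  # ['a', 'b', 'c']

# Round brackets give a generator expression, so wrap it in tuple()
lengths = tuple(len(w) for w in words)
print(lengths)  # (5, 7, 6, 9, 6)
```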

7.4.6 Combining If-Statements and Comprehensions

Recall the “One-line If-Statement”:

limit = 5
value = 'a' if 4 < limit else 'b'
print(value)
a

A comprehension can also include an if clause, although here it acts as a filter: only the items that satisfy the condition are kept.

vals = [1, 2, 3, 4]
limit = 3
a = [i for i in vals if i < limit]  # (1)
print(a)
(1) The simple addition of if <comparison> at the end of this line does exactly what it implies linguistically. You can (almost) read this statement and know what it does.
[1, 2]
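Note that an if can appear in two different places in a comprehension, with different meanings. At the end, it filters items out; before the for, as a one-line If-Statement with an else, it chooses a value for every item. The sketch below contrasts the two, reusing the vals and limit values from above, and also shows a filtering dict comprehension.

```python
vals = [1, 2, 3, 4]
limit = 3

# "if" at the end FILTERS: items failing the test are left out
kept = [i for i in vals if i < limit]
print(kept)  # [1, 2]

# "if ... else" before "for" TRANSFORMS: every item is kept, its value is chosen
labels = ['low' if i < limit else 'high' for i in vals]
print(labels)  # ['low', 'low', 'high', 'high']

# The filtering form works in dict comprehensions too
squares = {i: i**2 for i in vals if i % 2 == 0}
print(squares)  # {2: 4, 4: 16}
```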

7.5 Error Catching

In previous weeks we have discussed the differences between types and how Python tries to treat each variable accordingly. However, sometimes this just isn’t possible for Python and an error occurs. So far we have focused our energy on trying to intercept the variables and decide how to treat each one, but Python offers a way to handle this: try and except.

The basic idea is that we try to run some code first, and only if it fails do we do something else.

a = 1
b = 'two'
try:
    c = a + b  # adding an int and a str raises a TypeError
except TypeError:
    print('a and b are not compatible')

Let’s look at an example:

Example 7.5 (Using try/except) Write a function which accepts any number and converts it to an int. If the number is a float, it should be truncated; if the number is complex, the (truncated) real part should be returned.

Solution

def convert_to_int(a):
    try:
        b = int(a)  # This raises a TypeError if a is complex
    except TypeError:
        b = int(a.real)  # real is an attribute of complex numbers holding the real part
    return b

x = 3.5 + 2.2j
y = convert_to_int(x)
print("The result is:", y)
The result is: 3

Comments

Another benefit of this approach is speed. We no longer need to pause our processing to check things; we can just run the try block (which we had to do anyway), and only if there is a problem do we stop to fix it.

Try-Except blocks

There are more “clauses” we can add to a try-except block. We can add an else block, which gets run if there was no exception, and a finally block, which gets run no matter what:

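divisor = 2
try:
    # Some code which will usually run
    result = 10 / divisor
except ZeroDivisionError:
    # Handling of exception if required
    result = None
else:
    # Execute if no exception
    print('Division succeeded:', result)
finally:
    # Some code that is always executed
    print('Done')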

These extra clauses are not very common, but are mentioned so if/when you see them in the wild, you’ll know what you’re seeing.
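A bare except catches every kind of error, including ones we didn't anticipate (such as a typo in a variable name), which can hide bugs. It is usually better to name the exception type(s) you expect, and as err gives access to the error message. The parse_number function below is a hypothetical helper written for this sketch.

```python
def parse_number(text):
    """Hypothetical helper: convert text to a float, or return None."""
    try:
        return float(text)
    except ValueError as err:  # raised for text that isn't a number, like 'abc'
        print('Could not convert:', err)
        return None
    except TypeError:          # raised for input that isn't text or a number, like None
        print('Expected a string or number, got', type(text).__name__)
        return None


print(parse_number('3.5'))  # 3.5
print(parse_number('abc'))  # prints a message, then None
print(parse_number(None))   # prints a message, then None
```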

7.6 Advanced Functions

We have written a lot of functions in this class, but so far have not really extended their capabilities.

7.6.1 Positional vs Keyword Arguments

To start with, let’s identify the difference between positional and keyword arguments. Consider the following function:

def add_values(a, b):
    return a + b

We can call it using positional arguments:

c = add_values(2, 2)

Or keyword arguments:

c = add_values(a=2, b=3)

When using keywords, we can even change the order of the arguments:

c = add_values(b=3, a=2)  # (1)
(1) Note that the order of the arguments is not the same as the function definition. This works because Python will map a=2 onto a and b=3 onto b.

Lastly, note that positional arguments must come first, and then keyword arguments can follow:

c = add_values(2, b=3)
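Giving the same argument twice, once by position and once by keyword, raises a TypeError, as this sketch shows. (Putting a keyword argument before a positional one, e.g. add_values(a=2, 3), is a SyntaxError, which stops the file from running at all, so it can't be demonstrated with try/except in the same file.)

```python
def add_values(a, b):
    return a + b


# 2 fills a by position, then a=3 tries to fill a again: TypeError
try:
    add_values(2, a=3)
except TypeError as err:
    print('TypeError:', err)

# Positional first, then keyword, each argument given exactly once: fine
c = add_values(2, b=3)
print(c)  # 5
```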

7.6.2 Default Arguments

In many functions there are some arguments which we don’t usually want to change, for example a setting like mode='encrypt'. Such arguments can be given a default value, which is used whenever the caller does not supply one.

This can be done in the function definition as follows:

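def add(a, b, mode='normal'):
    if mode == 'normal':
        d = a + b
    elif mode == 'absolute':
        d = abs(a) + abs(b)
    else:
        # Without this branch, an unknown mode would leave d undefined
        raise ValueError(f"Unknown mode: {mode!r}")
    return d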
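Here is how the default plays out when calling the function. The sketch repeats the definition (including an else branch that rejects unknown modes) so it runs on its own.

```python
def add(a, b, mode='normal'):
    if mode == 'normal':
        d = a + b
    elif mode == 'absolute':
        d = abs(a) + abs(b)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return d


print(add(2, -3))                   # -1 (mode falls back to its default, 'normal')
print(add(2, -3, mode='absolute'))  # 5
print(add(2, -3, 'absolute'))       # 5 (a default can also be overridden by position)
```

One caution: a default value is created once, when the function is defined, not each time it is called. A mutable default such as items=[] would therefore be shared between calls, which is why None is the usual default for arguments that should start out as an empty list or dict.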

7.6.3 Unpacking Arguments

If you have a list or tuple of values, such as coefficients (e.g. [1, 3, 5]), that you want to pass into a function, you can do the following:

def func(a, b, c):
    print(a, b, c)

vals = [1, 3, 5]
func(*vals)  # (1)
(1) The * (asterisk) tells Python to unpack the values in vals and place them into a, b, and c of the function. Obviously, the length of vals has to match the number of arguments in func.
1 3 5

And taking this to the next level, we can use dicts as keyword arguments, where the key corresponds to the argument name.

def func(a, b, c):
    print(a, b, c)

vals = {'a': 1, 'c': 5, 'b': 3}
func(**vals)  # (1)
(1) The ** (double asterisk) tells Python to unpack vals, match up the keys with the argument names, and pass the corresponding value from the dictionary, so the value in vals['a'] (1) gets passed to a. Obviously, the keys in vals have to match the number and names of the arguments in func.
1 3 5
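The * and ** forms can be mixed with ordinary arguments in a single call, and a dictionary key that doesn't match an argument name raises a TypeError. In this sketch func also returns its arguments (an addition to the version above) so the result can be checked.

```python
def func(a, b, c):
    print(a, b, c)
    return (a, b, c)  # returned as well, so the result can be checked


# Ordinary, *-unpacked and **-unpacked arguments in one call
func(1, *[3], **{'c': 5})  # prints 1 3 5

# A key that doesn't match any argument name is rejected
try:
    func(**{'a': 1, 'b': 3, 'd': 5})
except TypeError as err:
    print('TypeError:', err)
```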

This functionality is important because there are many packages where functions are designed to work together. In the next example we’ll jump ahead to scipy.stats to see a use case for this.

Example 7.6 (Using results from one function as arguments to another) Consider that we have a long list of data points, such as the weights of many pumpkins, and we want to fit a normal distribution to them. We will start by generating 1000 (artificial) data points with a mean value of 10 and a standard deviation of 2:

import numpy as np

data = np.random.normal(size=1000, loc=10, scale=2)

Solution

Now let’s use the functions in the scipy.stats module to fit a normal distribution to our data:

from scipy import stats

params = stats.norm.fit(data)
print(params)
(np.float64(10.049930995187621), np.float64(1.9974659423824541))

Finally, let’s plot a graph of the cumulative distribution curve using the values of params we found above:

import matplotlib.pyplot as plt

x = np.linspace(0, 20, 100)  # (1)
y = stats.norm.cdf(x, *params)  # (2)
plt.plot(x, y, c='tab:blue')
plt.xlabel('Value')
plt.ylabel('Fraction of values smaller than stated value');
(1) Here we generate 100 values between 0 and 20, equally (i.e. linearly) spaced
(2) Here we ask scipy.stats.norm to generate y values for the given x values using the params we found for this particular dataset. The *params unpacks the tuple returned by fit (the mean and standard deviation) into the loc and scale arguments of cdf.


7.7 Writing Docstrings

We are now quite familiar with writing our own functions. We are also well aware that some functions can be complicated and often need to be explained. This is what “docstrings” are for.

A basic (i.e. minimal) docstring is added to a function as follows:

def calc_area_and_perimeter(r):
    r"""
    Computes the area and perimeter of a circle

    Parameters
    ----------
    r : scalar
        The radius of the circle

    Returns
    -------
    result : tuple of floats
        A tuple containing the area and perimeter, in that order
    """
    area = 3.14159*r**2
    perimeter = 2*3.14159*r
    return (area, perimeter)
(1) The opening r""": The docstring will span several lines, so it should be started with a triple " (i.e. """). The r tells Python this is 'raw' text, so backslashes are treated literally instead of as escape characters.
(2) Computes the area and perimeter of a circle: This is a simple one-line description of what the function does.
(3) Parameters: This is the first section, which outlines all the arguments the function accepts and what each is used for.
(4) ----------: Each section heading needs to be followed by a line of dashes (hyphens) as long as the heading itself.
(5) r : scalar: Now we list each argument by name (r), and indicate the expected type… though remember that Python will allow any type to be passed to a function, so we often need to deal with the possibility that the user sent an unexpected type.
(6) The radius of the circle: A description of the argument: what it is and how it is used. This can be lengthy and include any additional information like units, typical values, etc.
(7) Returns: This is another section like Parameters, and is formatted the same way. This one explains what value(s) is/are returned.
(8) area = ...: And finally, here we put the actual code.
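Python stores the docstring on the function object itself, in the __doc__ attribute, and this is what help() and IDE pop-ups display. The circle_area function below is a small made-up example used only to show this.

```python
def circle_area(r):
    """
    Computes the area of a circle

    Parameters
    ----------
    r : scalar
        The radius of the circle
    """
    return 3.14159*r**2


# The docstring is stored on the function object...
print(circle_area.__doc__)

# ...and help() displays it along with the function signature
help(circle_area)
```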

The full docstring can optionally contain any of the following sections, if applicable:

Table 7.3: The defined sections in the official Numpydoc specification
Section              Description
Short summary        A one-line summary that does not use variable names or the function name
Deprecation warning  To warn users that the object is deprecated
Extended Summary     A few sentences giving an extended description.
Parameters           Description of the function arguments, keywords and their respective types.
Returns              Explanation of the returned values and their types.
Yields               Explanation of the yielded values and their types.
Receives             Explanation of parameters passed to a generator’s .send() method
Other Parameters     Used to describe infrequently used parameters.
Raises               Details which errors get raised and under what conditions
Warns                Details which warnings get raised and under what conditions, formatted similarly to Raises.
Warnings             Cautions to the user in free text/reST.
See Also             Used to refer to related code.
Notes                Provides additional information about the code, possibly including a discussion of the algorithm.
References           References cited in the Notes section may be listed here
Examples             For examples using the doctest format
Numpydoc vs Google

There are several accepted ways to format docstrings, as described well in this Stack Overflow discussion. I strongly suggest you use the numpydoc option, since you’ll be dealing with numpy and other packages in the numpy universe, which generally use it.

A docstring as it appears in the source code

A docstring being converted to a webpage and placed on the project’s documentation site

A docstring being rendered by Spyder and shown in the help panel

A docstring being rendered by VSCode and shown as a popup window
Figure 7.6: Examples of the docstring being rendered by different user interfaces.

7.8 Writing Modules

Figure 7.7: The difference between a “script” and a “module”

Here are some definitions to help:

  • A script is a .py file which we create in our preferred IDE to solve a problem.

  • A module is a .py file, kept in our collection of files, that contains helpful functions which we use repeatedly for many problems.

  • A library (or package) is a collection of several .py files which contain many functions and are categorized for easier access (e.g. numpy.linalg.solve, numpy.random.choice). We can write our own libraries, but usually we will use ones that already exist (e.g. pandas, numpy, scipy, matplotlib, etc.)

Now let’s take a look at this in action. First, let’s create a module which has 2 functions: one for taking the square root of every element in a list, and one for squaring each element.

def square_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**2
    return collection


def root_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**0.5
    return collection

The above code should be placed in a new file called something like “myfuncs.py”, as shown below:

Figure 7.8: Screenshot of the myfuncs.py file containing our custom functions

We can then use the functions in this module in a script as follows:

from myfuncs import root_it, square_it

vals = [1, 3, 5, 9]
squares = square_it(vals)

And the script would look like this:

Figure 7.9: Screenshot of myscript.py showing some calculations using the functions in myfuncs.py
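One thing to watch out for: square_it (and root_it) modify the list that is passed in, because they assign to collection[i]. After squares = square_it(vals), the original vals has therefore been changed too, and squares is the very same list. The sketch below demonstrates this and shows a non-mutating alternative using a list comprehension; square_it_copy is a hypothetical name chosen for illustration.

```python
def square_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**2
    return collection


vals = [1, 3, 5, 9]
squares = square_it(vals)
print(vals)             # [1, 9, 25, 81] (the original list was changed!)
print(squares is vals)  # True: both names refer to the same list


# A non-mutating alternative: build and return a new list instead
def square_it_copy(collection):
    return [val**2 for val in collection]


vals = [1, 3, 5, 9]
squares = square_it_copy(vals)
print(vals)     # [1, 3, 5, 9]
print(squares)  # [1, 9, 25, 81]
```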

7.8.1 Using if __name__ == "__main__" to Test Your Module

It is as important to test our code as it is to write it in the first place. In fact, in large, important projects, more time is often spent writing tests than writing the code itself.

Like everything, there are many ways we can do tests, from simple (a script) to complicated (unit testing suites that are run by continuous integration services!).

We probably want the tests to live close to the code, ideally in the same file. However, we don’t want a bunch of tests to run each time we import the functions. As usual, Python has a solution for us: the if __name__ == "__main__" block.

Let’s update our myfuncs.py file:

def square_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**2
    return collection


def root_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**0.5
    return collection


if __name__ == "__main__":  # (1)
    # The code inside this "if block" is ONLY executed
    # when you "run" this file. It is NOT run when you
    # import the functions from this file in a script
    vals = [1, 2, 3]  # (2)
    ans = [1, 4, 9]  # (3)
    test = square_it(vals)  # (4)
    print(test == ans)  # (5)

    vals = [1, 4, 9]
    ans = [1, 2, 3]
    test = root_it(vals)
    print(test == ans)
(1) This odd-looking statement uses some “backend” Python magic: when a file is run directly, Python sets the special variable __name__ to "__main__", but when the file is imported, __name__ is set to the module’s name (e.g. "myfuncs") instead. Don’t overthink it, just use it. It means that the code inside the block is only run when the current file is run (i.e. with F5).
(2) Here we define some data which we use in our test
(3) This step is crucial: here we define the EXPECTED output given the test data
(4) Next we run our function on the test data
(5) Lastly, we check that our function gave the expected answer. In this case we print True/False, so we can visually check the output and see if anything failed.
True
True
Important

With the above if __name__ == "__main__" block added to our module file we can run (F5) the file each time we make changes to the code to ensure our changes didn’t break anything!

We can add as many tests as we want/need to check all the “edge cases”.

7.8.2 Using “Unit Tests” to Test Your Module

It is not necessary to write a “unit test suite” for a small personal project, but it is worth knowing what unit tests are, and roughly how they work.

A “unit test” is a function that tests one single aspect of your code, or a single ‘unit’. A unit test might look as follows:

from myfuncs import square_it


def test_square_it():  # (1)
    vals = [1, 2, 3]
    ans = [1, 4, 9]
    test = square_it(vals)
    assert ans == test  # (2)
(1) We have literally just wrapped our test from above inside a function. The function name starts with test_, which is the common convention for tests (and is how pytest recognizes them).
(2) Here we have introduced a new Python keyword: assert. This is meant for testing purposes, and it raises an AssertionError if test is not equal to the expected ans. As we saw above, we can catch errors and deal with them, which is what “testing packages” do: when they detect an error, they record it, move on to the next test, and report all the failures at the end of testing.
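Because test functions are ordinary Python functions, they can also be called directly, and a failing assert simply raises an AssertionError. One more detail matters when testing numerical code: floating-point results such as the square root of 2 are not exact, so they should be compared with a tolerance (here math.isclose from the standard library; pytest also provides pytest.approx for this). The test function names in this sketch are made up, and root_it is repeated so the snippet runs on its own.

```python
import math


def root_it(collection):
    for i, val in enumerate(collection):
        collection[i] = val**0.5
    return collection


def test_root_it_exact():
    # Perfect squares give exact results, so == is safe here
    assert root_it([1, 4, 9]) == [1, 2, 3]


def test_root_it_approx():
    # sqrt(2) cannot be stored exactly, so compare with a tolerance instead of ==
    result = root_it([2])
    assert math.isclose(result[0], 1.41421356, rel_tol=1e-8)


# Calling the tests directly: nothing happens when they pass
test_root_it_exact()
test_root_it_approx()

# A failing assert raises an AssertionError (the expected value here is deliberately wrong)
try:
    assert root_it([4]) == [3], 'deliberately wrong expected value'
except AssertionError as err:
    print('AssertionError:', err)
```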

We can then include several of these “unit tests” in a single file (mytests.py) as follows:

Figure 7.10: Screenshot of the mytests.py file which imports our functions from myfuncs and contains several “unit test” functions, indicated by the test_ prefix.
unittest vs pytest

Although Python includes a unittest package in its standard library, there is another package called pytest that is more widely used. It has many, many features, and even supports a plug-in system so that extra functionality can be added. Here is a detailed tutorial on RealPython. And here is a video introduction to pytest specifically.

We can then “run” all these tests from the command line. For this demo we will use pytest. To have pytest test our package we do the following:

  1. Open our “conda/anaconda” prompt
  2. Navigate to the directory containing our test file(s) (mytests.py)
  3. Type python -m pytest mytests.py and hit enter
  4. Enjoy the warm fuzzy feeling inside when all our tests pass and knowing our code is flawless
Figure 7.11: Screenshot showing the output from pytest after running all the tests it found in the mytests.py file

Here is what things look like when a test fails:

Figure 7.12: Screenshot of the output from pytest when one or more tests fail

Large projects contain a LOT of tests. My lab group maintains a package called PoreSpy, which is hosted on GitHub. Each time we write new code and “push it” to GitHub, a set of unit tests is automatically run. Below is the output:

Figure 7.13: Screenshot of the output from pytest on PoreSpy. There were 311 tests, and unfortunately quite a few of them failed (as indicated by the “F”s), so the changes made in this code broke quite a few tests. We’ll have to do quite a bit of work to get this code ready for inclusion in PoreSpy.