3  Data Types and Data Collections

Learning Outcomes for this Lesson

After this lesson you should:

  • …understand the details of the different data types
  • …know how to convert between data types, when possible
  • …know the main kinds of data collections or containers and how they work
  • …know the details of using the most common collections
  • …be comfortable with the basic concepts of loops to iterate through collections

3.1 Data Types

There are several types of data which are used regularly. We have seen most of these already:

  • integers (int): whole numbers
  • floats (float): numbers with decimal places
  • complex numbers (complex): numbers with real and imaginary parts
  • booleans (bool): True and False values
  • strings (str): letters and words
  • None: an abstract type that is self explanatory

To best understand why Python needs to know the “type” of every piece of data, consider the following:

What does the following produce?:

1 + 2

It obviously produces 3. In this case Python knows that + means addition.

What does this produce?:

'One' + 'Two'

It should be obvious that it produces OneTwo. In this case Python knows that + means join.

Both of these interpretations of the + operator are valid. Python needs to know which one to apply, because 1 + 2 = 12 would be a problem!
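As a minimal sketch of this point, the snippet below applies + to numbers and to strings. The variable names are illustrative only and do not come from the lesson.

```python
# The + operator adds numbers but joins strings
number_sum = 1 + 2          # int + int gives addition
text_join = 'One' + 'Two'   # str + str gives joining
digit_join = '1' + '2'      # still str + str, so the digits are joined

print("Result:", number_sum)
print("Result:", text_join)
print("Result:", digit_join)
```

The last line prints 12 rather than 3 because both operands are strings, which is exactly the situation Python avoids by tracking types.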

Let’s also introduce another one of Python’s built-in functions: type(). As you might guess, this function accepts a variable and returns its type.

a = 1
print("Result:", type(a))
Result: <class 'int'>
b = 2.0
print("Result:", type(b))
Result: <class 'float'>

As we will see below, this function can be used to identify the type of anything, not just variables.

3.1.1 Integers, Floats, and Complex Numbers

The three main numerical data types are:

a = 1         # (1)
b = 2.        # (2)
c = 1 + 0.4j  # (3)

  (1) This is an int because it has no decimal point.
  (2) Adding a decimal point makes it a float whether we add trailing digits or not.
  (3) Python also supports complex numbers, which can be invoked by appending j to a number. Python essentially treats the j like a trailing . in a float. Note that Python uses j instead of the more normal i because i gets used a lot in programming as a counter.

We can convert between different types using Python’s built-in int(), float(), and complex() functions:

float(1)
1.0
int(1.0)
1

When we convert from float to int, instead of rounding to the nearest whole number, Python truncates the value and keeps only the integer part. The information contained in the decimals is lost forever. This behavior is an important source of errors in numerical computing, so we must be aware of it. The following case is quite instructive:

int(142.999)
142
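To make the difference between truncating and rounding concrete, here is a small sketch. It uses the built-in round() function, which is not otherwise covered in this section, and illustrative variable names.

```python
value = 142.999

truncated = int(value)   # keeps only the integer part
rounded = round(value)   # rounds to the nearest whole number
negative = int(-2.7)     # truncation moves toward zero, not downward

print("Truncated:", truncated)
print("Rounded:", rounded)
print("Negative truncated:", negative)
```

If rounding is what you actually want, call round() explicitly instead of relying on int().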

We can also add imaginary parts to real numbers:

complex(1)
(1+0j)

In this case the imaginary part is 0.

Note that j by itself is not enough, because Python will let us use j as a variable name. We must write 1j or 0j for a complex number to result:

type(0j)
complex

If we add an int to a complex, it assumes the imaginary component is 0:

1 + (2 + 2j)
(3+2j)

However, we cannot down-convert a complex number to a float or int:

float(2.0 + 3.0j)

Python is willing to drop the decimal places when we convert a float to an int, but dropping the imaginary part of a complex number is taking things too far! The above will produce an error message (a TypeError) because Python is telling us we broke a rule (or tried to at least). We will explore errors and how to handle them later in this course.
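If we really do want only one part of a complex number, we can ask for it explicitly. The sketch below uses the real and imag attributes of Python's complex type and the built-in abs() function; the variable names are illustrative.

```python
z = 2.0 + 3.0j

real_part = z.real   # the real component, as a float
imag_part = z.imag   # the imaginary component, as a float
magnitude = abs(z)   # the length of z in the complex plane

print("Real part:", real_part)
print("Imaginary part:", imag_part)
print("Magnitude:", magnitude)
```

Because we state which part we want, nothing is discarded silently.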

3.1.2 Booleans

Boolean data is named after the mathematician George Boole, who studied the properties of True/False logic. The concept of True and False is very useful in programming for controlling the flow of a program. As we shall see in the next chapter, a program can execute one statement if a condition is True and a different statement if it is False. In Python this data type is called bool.

a = False  # (1)
print("Result:", type(a))
Result: <class 'bool'>

  (1) The words True and False must be capitalized for Python to recognize them. false would just be treated like a regular variable name.

A more useful way to use bools is to check the result of a comparison like “is this less than that” or “is this equal to that”:

a = 3 < 6
print("Result:", a)
Result: True
a = 1 == 1.0
print("Result:", a)
Result: True

bools can be mathematically treated such that True = 1 and False = 0. For instance:

2*True
2

and

5.0*False
0.0

Of course we can have two bools operate on each other too:

True*True
1

Interestingly, this turns the result into an int. This is because Python “promotes” the bool to an int when it sees that multiplication is being applied. This is the same as the conversion of int to float when doing 1 * 2.0.

And:

True*False
0
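One practical consequence of True = 1 and False = 0 is that adding comparisons together counts how many of them are True. This is only a sketch; the comparisons are arbitrary examples.

```python
# Each comparison yields a bool; adding them promotes each to an int
count_true = (3 < 6) + (1 == 1.0) + (2 > 5)

print("Result:", count_true)
print("Result:", type(count_true))
```

Two of the three comparisons are True, so the sum is 2, and the result is an int rather than a bool.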

3.1.3 Strings

Anything between single or double quotes (i.e. ' ' or " ") is a str. The ability to use either type of quote is very handy if we wish to include quotes within the string itself…just use one style to define the str, and the other style inside the str.

a = ''                 # (1)
b = '1.0'              # (2)
c = 'To begin, begin'  # (3)

  (1) Strings can be any length, including 0. In some languages a single letter is called a char for character, but in Python we only have strings.
  (2) Strings can be any characters we wish, including numbers. Don’t be fooled, this is still a str. Although, it can be converted to a float using float(b)!
  (3) Strings can, and often do, contain full sentences, including punctuation, spaces, and so on.

Since we are talking about conversion between data types, it’s worth pointing out that we can convert a str to a float if it happens to be the text representation of a number:

a = '1.0'
b = float(a) 
print("Result:", type(b))
Result: <class 'float'>

And we can do the same for int:

a = '2'
b = int(a)
print("Result:", type(b))
Result: <class 'int'>
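The conversions also work in the other direction using the built-in str() function. The following sketch, with illustrative variable names, goes from text to numbers and back. Note that int() only accepts text that looks like a whole number, so text such as '1.0' has to pass through float() first.

```python
text = '42'

as_int = int(text)           # text to int
as_float = float(text)       # text to float
back_to_text = str(as_int)   # int back to text

# int('1.0') would raise a ValueError, so convert to float first
from_decimal_text = int(float('1.0'))

print("Result:", as_int, as_float, back_to_text, from_decimal_text)
```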

3.1.4 None

The last type to discuss is Python’s None. This is a strange object whose name pretty much explains its purpose. This is used by Python when something is required, but it’s not sure what.

For example, the functions which we wrote in Chapter 2 all returned a value; however, if we omitted the return statement, Python would return None by default. This way, the act of calling a function always returns something.

Example 3.1 (Return None from a function) Write a function that prints a given str but does not explicitly return anything. When calling the function be sure to catch the returned result in a variable, then check its type.

Solution

def print_this(s):
    print(s)

result = print_this('Hello world')
print("Type of result is:", type(result))
Hello world
Type of result is: <class 'NoneType'>

Comments

So even though we didn’t include a return statement in our function, Python still returned something: It returned None.
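A common way to test for None is the is operator, which is not otherwise covered in this lesson. The sketch below reuses the print_this function from the solution; the name nothing_returned is illustrative.

```python
def print_this(s):
    print(s)

result = print_this('Hello world')

# "is None" checks whether the result is the None object
nothing_returned = result is None
print("Did the function return None?", nothing_returned)
```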

3.2 Data Collections

So far we have created individual variables with statements like x = 1. But many times we would like to group variables together into a container, and store that container in its own variable. This way we can pass around all the values within the container by just passing the container. This is exactly the same motive that inspires us to put physical objects in boxes and bins to carry them around.

In Python these are called collections, and there are several kinds in Python:

  • Lists, or list
  • Dictionaries, or dict
  • Tuples, or tuple
  • Sets, or set

The relationship between data and collections is illustrated in Figure 3.1. It shows a schematic diagram of different data types and collection types, as well as some examples of how they interact, specifically how data can be stored inside collections. Also note that collections can be stored inside collections (inside collections (inside collections (…)))!

Figure 3.1: Schematic diagram illustrating the relationship between data and collections.
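As a rough sketch of collections stored inside collections, the dict below holds a str, a list, and a tuple. The keys and values are made up for illustration, and the syntax for each collection is explained in the sections that follow.

```python
experiment = {
    'name': 'run 1',
    'temperatures': [298.0, 303.0, 308.0],  # a list inside a dict
    'limits': (0, 100),                     # a tuple inside a dict
}

# Reach into the outer collection first, then the inner one
first_temperature = experiment['temperatures'][0]
upper_limit = experiment['limits'][1]

print("First temperature:", first_temperature)
print("Upper limit:", upper_limit)
```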

Let’s look at each type of collection in more detail.

3.2.1 Lists

A list in Python is pretty much what it sounds like: multiple items grouped together in a linear list.

Figure 3.2: A typical city street is a linear collection of houses, each with a number which is usually in sequence. A list works in exactly the same way.

Lists are created by grouping values inside square brackets [ ]:

collection = [5, 'a', 2, True, 4.0]
print("Result:", collection)
Result: [5, 'a', 2, True, 4.0]

Python also has a built-in function for creating a list:

collection = list((5, 'a', 2, True))
print("Result:", collection)
Result: [5, 'a', 2, True]

Collections also have a type:

collection = list((5, 'a', 2, True))
print("Result:", type(collection))
Result: <class 'list'>

Each value is located at a numerical location in the list. For the five-item list [5, 'a', 2, True, 4.0], the locations start at 0 and end at 4. Perhaps you can think of these as street addresses, which we are all quite familiar with.

We access the value at a given address using syntax like collection[0].

collection = [5, 'a', 2, True, 4.0]
print("Result:", collection[0])  # (1)
Result: 5

  (1) This fetches the item stored at location 0

Zero-Indexing vs 1-Indexing

Python uses “0-indexing”, meaning that addresses in lists and other number-based collections (e.g. numpy arrays) start at 0. Humans, on the other hand, tend to use 1-indexing, so this can be confusing at first.

In 0-indexing, the first item in the list is at location 0, the second item is at location 1, and so on. So instead of thinking about “first” or “second” item, think “item 0 is at location 0”.
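A consequence of 0-indexing is that the last item sits at location len(collection) - 1, not len(collection). The sketch below shows this. It also uses Python's negative indexing, which is not discussed in this lesson, as a shortcut for counting from the end.

```python
collection = [5, 'a', 2, True, 4.0]

first = collection[0]                   # item at location 0
last = collection[len(collection) - 1]  # item at location 4
also_last = collection[-1]              # -1 means "last item"

print("First:", first)
print("Last:", last)
print("Also last:", also_last)
```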

We can do a lot of other things with lists too:

Replace items:

collection = [1, 2, 3]
collection[0] = 23  # (1)
print("Result:", collection)
Result: [23, 2, 3]

  (1) We can assign to locations using the same syntax we used for fetching

Join lists:

collections = [1, 2, 3] + [4, 5, 6]  # (1)
print("Result:", collections)
Result: [1, 2, 3, 4, 5, 6]

  (1) Note how the addition operator here did something totally different than for ints! This is another illustration of why the “type” of a variable matters…because Python will do something different depending on the type.

Retrieve sub-lists:

collection = [5, 'a', 2, True, 4.0]
print("Result:", collection[1:3])  # (1)
Result: ['a', 2]

  (1) This is called a slice and returns items 1 through 3, not including 3, in a new list. We will learn “a lot more” about slicing when we talk about numpy arrays later in the course.
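A few more slice variations are sketched below. Leaving out the start or the stop, and adding a third "step" number, are standard Python slice features that go slightly beyond what this lesson covers.

```python
collection = [5, 'a', 2, True, 4.0]

first_two = collection[:2]     # from the start up to (not including) 2
from_two = collection[2:]      # from location 2 to the end
every_other = collection[::2]  # every second item

print("Result:", first_two)
print("Result:", from_two)
print("Result:", every_other)
```

Each slice is a new list, so the original collection is left untouched.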

We have already seen a small section of Python’s built-in functions which work on single values in Table 2.5, but Python also includes other functions meant to operate on lists (and collections in general as we’ll see). The full list is given in Table B.1, but a partial list of functions that can be applied to a collection is given below:

Table 3.1: The built-in functions in Python relevant to collections

Function      Description
all()         Returns True if all items in an iterable object are True
any()         Returns True if any item in an iterable object is True
len()         Returns the length of an object
max()         Returns the largest item in an iterable
min()         Returns the smallest item in an iterable
reversed()    Returns a reversed iterator
sorted()      Returns a sorted list
sum()         Sums the items of an iterable

Here we can see some of these in action:

collection = [4, 3, 1, 0, 2]
collection = sorted(collection)
print("Result:", collection)
Result: [0, 1, 2, 3, 4]
collection = [4, 3, 1, 0, 2]
value = sum(collection)
print("Result:", value)
Result: 10
collection = [4, 3, 1, 0, 2]
length = len(collection)
print("Result:", length)
Result: 5
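The remaining functions from Table 3.1 can be tried the same way. This is a sketch with illustrative variable names. Note that reversed() returns an iterator, so it is wrapped in list() to see the values.

```python
collection = [4, 3, 1, 0, 2]

smallest = min(collection)
largest = max(collection)
any_true = any(collection)   # True, since at least one item is nonzero
all_true = all(collection)   # False, since the 0 counts as False
backwards = list(reversed(collection))

print("Result:", smallest, largest)
print("Result:", any_true, all_true)
print("Result:", backwards)
```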

Note that if we try a function like sum on a list containing incompatible data, Python will complain:

collection = [10, '10']
value = sum(collection)  # (1)

  (1) This will raise an error because Python does not know how to add an int and a str (and neither do I!)

Object Oriented Programming is Awesome, but Confusing at First

We can no longer avoid talking about objects and Object Oriented Programming (OOP).

The word object is meant to conjure up images of physical objects that have properties like colors, shapes, lids and handles. Thinking about a piece of code or data as a physical object actually helps us understand the code better, even though it’s confusing at first.

OOP has several defining features. For our current purposes, as an “end-user”, an object is a piece of data that carries with it a set of functions that act on that data (i.e. itself). It works like this:

We can use Python’s sorted function:

collection = sorted([2, 3, 1, 0])
print("Result:", collection)
Result: [0, 1, 2, 3]

Or we can use the function, actually called a method, that comes with the list:

collection = [2, 3, 1, 0]
collection.sort()  # (1)
print("Result:", collection)
Result: [0, 1, 2, 3]

  (1) Several things happen in this step. Firstly, we access the sort method with a dot (.). This is equivalent to typing math.acos to access the acos function of the math library. Secondly, the sort is performed in place, meaning that it does not return a sorted list; the list sorts itself.

Everything is an object

One of the most powerful features of Python is that everything is an object. This means that in addition to the traditional data-types described above, we can assign anything to a variable and pass it around our programs.

def square_it(x):  # (1)
    return x**2

f = square_it  # (2)
val = f(2)  # (3)
print("Result:", val)
Result: 4

  (1) Define a very simple function.
  (2) Here we assign our function to the variable f. f is often said to be a “handle” to the function, referring to the literal act of holding on to something by the handle, which comes from the “object oriented” philosophy.
  (3) Here we run the function that is stored in f. “Running” something is denoted by the presence of the round brackets ().

The key point is that anything in Python can be assigned to a “variable” and treated equally, so we can pass around data, objects, collections, functions, class definitions, etc.
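Because a function is just another object, it can also be handed to a different function as an argument. The sketch below assumes a hypothetical helper called apply_twice, which is not part of Python or of this lesson.

```python
def square_it(x):
    return x**2

def apply_twice(func, x):
    # func is a handle to whatever function was passed in
    return func(func(x))

val = apply_twice(square_it, 3)
print("Result:", val)
```

Here square_it is passed without round brackets, so it is handed over rather than run. apply_twice then runs it twice, giving (3**2)**2 = 81.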

So in addition to supplying an object to a function, you can also call an object’s method to accomplish the same result.
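There is one practical difference between the two routes: the sorted() function leaves the original list alone and returns a new one, while the sort method changes the list itself and returns None. A short sketch with illustrative variable names:

```python
original = [2, 3, 1, 0]

new_list = sorted(original)   # original is not modified
print("After sorted():", original, new_list)

returned = original.sort()    # original is modified in place
print("After .sort():", original)
print("Value returned by .sort():", returned)
```

A common slip is writing collection = collection.sort(), which replaces the list with None.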

You can see all the methods an object possesses by typing dir(collection). The methods that start and end with double-underscores __<method>__ are special methods used internally by Python, while the rest are public. The list has the following public methods:

Table 3.2: List of methods on list objects

Method     Description
append     Adds a new item to the end of itself
clear      Deletes all items from itself
copy       Returns a copy of itself
count      Returns the number of times the given value occurs in itself
extend     Joins another list to the end of itself
index      Returns the first location of the given value
insert     Inserts the given value at the given location
pop        Removes the item at the given address and returns it
remove     Removes the first occurrence of the given value
reverse    Reverses the order of itself
sort       Sorts the values in place
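The sketch below walks through several of these methods on one small list. The comments show the state of the list after each step, and the variable names are illustrative.

```python
items = [3, 1, 2]

items.append(4)             # [3, 1, 2, 4]
items.extend([5, 1])        # [3, 1, 2, 4, 5, 1]
items.insert(0, 9)          # [9, 3, 1, 2, 4, 5, 1]
removed = items.pop(0)      # removes and returns the 9
items.remove(1)             # removes only the first 1: [3, 2, 4, 5, 1]
location = items.index(4)   # location of the value 4
ones = items.count(1)       # how many 1s are left
items.reverse()             # [1, 5, 4, 2, 3]

print("Result:", items, removed, location, ones)
```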

3.2.2 Dictionaries

A dictionary, or dict, behaves nearly the same as a list except items are addressed by name instead of by number.

The most useful way to create a dictionary is to define an empty one, then fill it with items:

d = dict()  # (1)
d['item 1'] = 'foo'  # (2)
d['item 2'] = 'bar'
d['item 1'] = 'baz'  # (3)
print("The dictionary contains:", d)
The dictionary contains: {'item 1': 'baz', 'item 2': 'bar'}

  (1) We can also create an empty dictionary using empty curly braces {}.
  (2) Once the dictionary is initialized we can add as many items as we wish using square brackets [] and a name, which is typically a string.
  (3) We can also overwrite items.

The best part about using dicts is that we can use these descriptive names to store and access values:

d = dict()
d['temperature'] = 298.0
print("The temperature is:", d['temperature'])
The temperature is: 298.0

We can also create a dict on one line by providing the “key-value” pairs inside curly braces {}.

d = {'a': 3, 'b': 5.0, 'c': 'text', 'd': None}  # (1)
print(d)
{'a': 3, 'b': 5.0, 'c': 'text', 'd': None}

  (1) Each item in this statement is called a “key: value” pair, since you must supply a name (“key”) for each piece of data (“value”).

dicts are one of Python’s most useful features

dicts are used everywhere in Python programs. What makes them so useful is that you know what is stored in each location, as long as you give it a meaningful name (key). If you really wanted, you could even include units in the dictionary key, like d['pressure (Pa)'] = 101325. You can’t get much more self explanatory than that! Because they are so useful, they get used a lot in almost all Python projects of any complexity.

In fact, even Python itself uses dicts. Most objects use dicts ‘under the hood’: they have a __dict__ attribute that stores the object’s information.

You can apply Python’s built-in functions as we did with the list:

d = {'a': 3, 'b': 5.0, 'c': 'text', 'd': None}
length = len(d)
print("Result:", length)
Result: 4

Using the dir() function we can see a list of the methods that a dict has to offer:

Table 3.3: List of methods on dict objects

Method        Description
clear         Removes all items from itself
copy          Returns a copy of itself
fromkeys      Creates a new dict from a collection of keys, all set to the same value
get           Retrieves the value stored under the given key
items         Returns a view of the key:value pairs it contains
keys          Returns a view of the keys that it has
popitem       Removes the most recently added key:value pair and returns it
pop           Removes the given key and returns its value; allows a default for a non-existent key
setdefault    Returns the value for the given key, first storing a default value if the key does not exist
update        Incorporates another dictionary into itself
values        Returns a view of the values it contains

Of the list of methods on dicts, the most useful is probably keys(). It returns all the keys that have been created, wrapped in a dict_keys object that can be iterated over or converted to a list:

d = {}
d['T'] = 298.15
d['P'] = 101325
print("The keys are:", d.keys())
The keys are: dict_keys(['T', 'P'])
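Several other dict methods can be sketched on the same dictionary. Wrapping the results in list() turns them into ordinary lists for printing. The 'V' and 'R' keys are illustrative additions, not part of the example above.

```python
d = {'T': 298.15, 'P': 101325}

key_list = list(d.keys())       # the keys, as a list
value_list = list(d.values())   # the values, as a list
pairs = list(d.items())         # (key, value) tuples
missing = d.get('V', 0.0)       # no 'V' key, so the default is returned
d.update({'R': 8.314})          # merge another dict into d
removed = d.pop('P')            # remove 'P' and return its value

print("Result:", key_list, value_list, pairs)
print("Result:", missing, removed, d)
```

Using get() with a default avoids the error that d['V'] would raise for a key that does not exist.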

Example 3.2 (Using a dict in a function) Write a function that uses the ideal gas law to compute molar density given the needed arguments supplied in a dict.

Solution

def molar_density(vals):
    n_V = vals['P']/(vals['R']*vals['T'])
    return n_V

vals = {'P': 101325, 'T': 298, 'R': 8.314}
n_V = molar_density(vals)
print("The density is", n_V)
The density is 40.896894217403165

Comments

The ability to combine all related variables into a single object, with each variable given its own descriptive name is very powerful. When programs get more complicated, it really helps keep things organized and understandable.

3.2.3 Tuples

A tuple is almost identical to a list with one major difference: tuples are immutable, meaning you cannot change one once it’s created. The reason for this behavior is so that a program can be certain the data in a tuple has not been altered, either by accident or deliberately.

A tuple is created when we combine variables inside round brackets ( ):

tup = (1, 2.0, 'text')
print("Result:", type(tup))
Result: <class 'tuple'>

Python also has a built-in function for creating tuples:

tup = tuple([1, 2.0, 'text'])  # (1)
print("Result:", type(tup))
Result: <class 'tuple'>

  (1) Here we have converted a list (defined by the square brackets) into a tuple

However, Python creates tuples for us automatically every time we group variables together with commas, even without the round brackets:

tup = 1, 2.0, 'text'  # (1)
print("Result:", type(tup))
Result: <class 'tuple'>

  (1) Python just assumes we want to package these 3 variables together

We can access the items in the tuple by their numerical location, just like with a list:

tup = 1, 2.0, 'text'
print("Result:", tup[0])
print("Result:", tup[1])
print("Result:", tup[2])
Result: 1
Result: 2.0
Result: text

But we cannot write to any locations. The following will result in a TypeError:

tup = 1, 2.0, 'text'
tup[0] = 4  # (1)

  (1) Item assignment to a tuple is not allowed since they are “immutable”
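Since a tuple cannot be edited, the usual workaround is to build a new tuple from pieces of the old one. The sketch below does this with slicing and the + operator. Note the trailing comma in (4,), which is how Python writes a one-item tuple.

```python
tup = (1, 2.0, 'text')

# Build a new tuple instead of modifying the old one
new_tup = (4,) + tup[1:]

print("Original:", tup)
print("New:", new_tup)
```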

You can “unpack” a tuple into individual, separate variables using the following convenient syntax:

tup = (1, 2.0, 'text')
a, b, c = tup
print("Result:", a)
print("Result:", b)
print("Result:", c)
Result: 1
Result: 2.0
Result: text

Next level Tuple unpacking

And if you really want to get into the weeds you can partially unpack it:

tup = (1, 2.0, 'text')
a, *b = tup  # (1)
print("Result:", a)
print("Result:", b)
Result: 1
Result: [2.0, 'text']

  (1) The *b tells Python to dump all the additional items into a new list and store it in b.
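The starred name does not have to come last. As a sketch, the following grabs the first and last items and collects everything in between; the variable names are illustrative.

```python
values = (1, 2.0, 'text', None, True)

first, *middle, last = values

print("Result:", first)
print("Result:", middle)
print("Result:", last)
```

As before, the starred variable ends up holding a list, even though values is a tuple.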

3.2.4 Strings Are Basically Collections

Strings behave very similarly to collections. They obviously have a length:

s = 'a string'
print("Result:", len(s))
Result: 8

They can be indexed for reading data:

s = 'a string'
print(s[0])
a

And they can be “sliced” to extract sub-strings:

s = 'a string'
print(s[0:5])
a str

But they don’t support item assignment, so s[0] = 'A' won’t work. This is similar to tuples, which also do not support item assignment. Strings are just as immutable as tuples: their attached methods can perform many alterations, but each one returns a new string and leaves the original unchanged.
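The sketch below shows a few ways to get an altered copy of a string. upper() and replace() are standard str methods, and the last line rebuilds the string from a new first letter plus a slice.

```python
s = 'a string'

upper = s.upper()                  # every letter capitalized
replaced = s.replace('a', 'A', 1)  # replace only the first 'a'
rebuilt = 'A' + s[1:]              # new first letter + rest of s

print("Result:", upper)
print("Result:", replaced)
print("Result:", rebuilt)
print("Original is unchanged:", s)
```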

3.3 Introduction to Loops

3.3.1 For-Loops

In Chapter 2 it was mentioned that computers are crazy fast at calculating things. One of the most common ways to use this ability is to have the computer scan through a list of items and perform some computation on each item. This is done using a “for-loop”.

For-Loops are the OG of programming

For-loops are one of the building blocks of computer programming. Each programming language may have or lack certain features, but they all have for-loops! One could argue that for-loops are the entire point of computer programming.

The for-loop has the following syntax:

for i in [0, 1, 2, 3, 4]:
    print(i)
0
1
2
3
4

Iterating

Scanning through a collection is called iterating in Python. The dictionary definition of “iterate” is “to do (something) over again or repeatedly”.

Objects such as lists and dicts are said to be iterable because they contain multiple items over which iteration can be performed. If you try to iterate over a non-iterable object, like an int, Python will complain.

First let’s unpack the first line:

  • The for-loop syntax in Python is almost equivalent to spoken English. for and in are predefined keywords in Python and they mean exactly what they say.
  • The loop will repeat for each item in the given list. In this case the list is [0, 1, 2, 3, 4] but this could contain anything.
  • The : at the end of this line indicates that the logic of the loop is about to begin.

Now let’s look at the second line:

  • The logic inside the for-loop is indented by 4 spaces according to the whitespace rules discussed here: Important 2.1.
  • The value of i will change on each loop. It will take on each of the values in the given list in order ([0, 1, 2, 3, 4] in this case).

Let’s look at a few more variations on the for-loop. Firstly, we can use i to index into a list:

values = ['for', 'loops', 'are', 'fun!']
for i in range(len(values)):
    print(values[i])
for
loops
are
fun!

Here we have used two built-in functions: range() and len(). len() returns the length of values, which is 4 in this case, so range(4) produces the values from 0 up to (but not including) 4. This means i will take on the values 0, 1, 2 and 3, which we use to index into values to print each item.
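To see what range() produces, we can convert it to a list. The sketch below also shows the built-in enumerate() function, which is not covered in this lesson; it supplies both the index and the item on each pass through the loop.

```python
values = ['for', 'loops', 'are', 'fun!']

indices = list(range(len(values)))
print("Indices:", indices)

# enumerate yields (index, item) pairs, unpacked here into i and item
for i, item in enumerate(values):
    print(i, item)
```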

We can make things a bit more concise. Instead of generating indices, let’s just “iterate” over the actual list:

values = ['for', 'loops', 'are', 'fun!']
for item in values:
    print(item)
for
loops
are
fun!

Here we can see that Python assigns each successive word in values to item each time through the loop.
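Loops are most useful when each pass does some computation. As a minimal sketch, the loop below adds up the squares of the numbers in a list, using an illustrative variable called total that starts at 0 and grows on every pass.

```python
values = [1, 2, 3, 4]

total = 0
for item in values:
    total = total + item**2   # add this item's square to the running total

print("Sum of squares:", total)
```

The result is 1 + 4 + 9 + 16 = 30.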

3.4 Exercises

Copy and paste the following code to a new .py file in Spyder and work through each cell until you get the desired result.


# %% Problem 1: Understanding Data Types
#
# Assign the following values to variables and determine their
# data types using the `type()` function. NOTE: You must
# replace the empty strings.
# a. 10
# b. 3.14
# c. "Python"
# d. True
# e. [1, 2, 3]
# f. 7 < 3
# g. (10 == (15-10)*2)
# h. None
#
# Fill in the code below:

a = 10
b = ''
c = ''
d = ''
e = ''
f = ''
g = ''
h = None

print("Type of a:", type(a))
print("Type of b:", )
print("Type of c:", )
print("Type of d:", )
print("Type of e:", )
print("Type of f:", )
print("Type of g:", )
print("Type of h:", )

# %% Problem 1.1
#
# Convert the following values to different types:
# a. Convert the string "five" to a float.
# b. Convert the string "cheese" to an integer.
#
# NOTE: This will result in an error.
#
# Complete the code below:

a = "five"
b = "cheese"

converted_a = None
converted_b = None

print("Converted a to float:", converted_a)
print("Converted b to integer:", converted_b)

# %% Problem 2: Type Conversion
#
# Convert the following values to different types:
# a. Convert the integer 10 to a float.
# b. Convert the string "123" to an integer.
# c. Convert the float 3.14 to an integer.
# d. Convert the boolean True to an integer.
#
#
# Complete the code below:

a = 10
b = "123"
c = 3.14
d = True

converted_a = ''
converted_b = ''
converted_c = ''
converted_d = ''

print("Converted a to float:", converted_a)
print("Converted b to integer:", converted_b)
print("Converted c to integer:", converted_c)
print("Converted d to integer:", converted_d)

# %% Problem 3: Working with Lists
#
# Create a list of the first 5 prime numbers. Then:
# a. Add the number 13 to the list.
# b. Remove the number 2 from the list.
# c. Replace the number 5 with the number 11.
# d. Reverse the order of the list
# e. Print the final list.
#
# Complete the code below:

primes = []

# Add 13
 # insert code here

# Remove 2
 # insert code here

# Replace 5 with 11
 # insert code here

# Reverse the order of the list
 # insert code here

print("Final list of primes:", primes)


# %% Problem 4: Working with Tuples
#
# Create a tuple containing the names of the four seasons
# (Spring, Summer, Fall, Winter). Then:
# a. Access and print the second season.
# b. Try to change the value of the first season to "Autumn"
#    (observe what happens).
#
# Fill in the code below:

seasons = ''

# Access the second season
second_season = ''

print("Second season:", second_season)


# %% Problem 5: Working with Dictionaries
#
# Create a dictionary with the following key-value pairs:
# a. "apple": 2
# b. "banana": 5
# c. "orange": 3
#
# Then:
# a. Add a new key "grape" with the value 7.
# b. Update the value of "banana" to 10.
# c. Remove the "apple" key from the dictionary.
# d. Print the final dictionary.
#
#
# Fill in the code below:

fruits = {}

# Add a new key
# Insert code here

# Update the value of "banana"
# Insert code here

# Remove the "apple" key
# Insert code here

print("Final dictionary of fruits:", fruits)


# %% Problem 6: Looping Through a List
#
# Given the list of numbers `[1, 2, 3, 4, 5]`, write a loop that:
# a. Prints each number in the list.
# b. Prints the square of each number in the list.
#
#
# Fill in the code below:

numbers = [1, 2, 3, 4, 5]

# Loop to print each number
for num in numbers:
    pass  # pass is a placeholder that should be replaced by your code

# Loop to print the square of each number
for num in numbers:
    pass  # pass is a placeholder that should be replaced by your code

#%% Problem 7: Looping Through a Dictionary
#
# Given the dictionary `{"a": 1, "b": 2, "c": 3}`, write a loop that:
# a. Prints each key in the dictionary.
# b. Prints each value in the dictionary.
#
#
# Fill in the code below:

dictionary = {"a": 1, "b": 2, "c": 3}

# Loop to print each key
for key in dictionary:
    pass  # pass is a placeholder that should be replaced by your code

# Loop to print each value
for key in dictionary:
    pass  # pass is a placeholder that should be replaced by your code

#%% Problem 8: More For Loops
#
# Write a for loop to print the following pattern:
#
# *
# **
# ***
# ****
# *****
# ...
#
# Ensure this works for various numbers of rows
#
# Fill in the code below:

rows = 5

for i in range():  # this is incomplete

    print()  #  this is incomplete
    # Note: Print automatically moves to the next line every iteration


# %% Problem 9: Combining Collections and Loops
#
# Create a list of dictionaries, where each dictionary represents
# a student with keys "name" and "grade". Then:
# a. Write a loop to print the name and grade of each student.
# b. Write a loop to find the average grade of all students.
#
# Fill in the code below:

students = [
    {"name": "Alice", "grade": 85},
    {"name": "Bob", "grade": 90},
    {"name": "Charlie", "grade": 78}
]

# Loop to print each student's name and grade
for student in students:
    pass  # pass is a placeholder that should be replaced by your code

# Loop to calculate the average grade
total_grade = 0

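for student in students:
    pass  # pass is a placeholder that should be replaced by your code

average_grade = None  # replace None with your calculation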

print("The average grade is:", average_grade)