5 Working with Text
Upon completion of this lesson you should:
- …know when and how to convert numerical data to and from strings
- …be able to perform operations to manipulate strings
- …know how to format strings to get desired output
- …be able to use f-strings to insert values into strings
- …know how to write multiline strings
- …know how string comparisons work
For more advanced pattern matching in text, Python provides regular expressions (“regex”) through the re module, which is included in the standard library. Regex is outside the scope of this class, but it’s worth logging its existence in the back of your mind for some future use case.
5.1 Creating Strings
We can create a variable containing text in the same way we assign a number:
s = 'some text'
We can also convert numerical data to string format:
number = str(11/7)
print("Result:", number)
Result: 1.5714285714285714
Note that 11/7 is evaluated first, and only then is the result converted to a string.
In addition to assigning values using hand-written statements, it is also possible to ask the user for some text input
using the following built-in function:
s = input()
This feature is rarely used in scientific Python code. However, it is often used in “utility” apps that do things like delete duplicate photos or upload files. These types of programs will often ask “Are you sure (y/n)?”, which indicates that your two options are “y” and “n” (for yes and no, obviously).
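A minimal sketch of how such a utility might interpret the reply. The helper name is_yes is hypothetical, and it assumes that only “y” (in either case) counts as yes. The call to input() is left in a comment so the sketch runs without waiting for a user:

```python
def is_yes(answer):
    """Return True if the reply should count as 'yes' (hypothetical helper)."""
    # strip() drops stray spaces typed before or after the letter;
    # lower() lets 'Y' and 'y' count the same
    return answer.strip().lower() == 'y'


# In a real program the answer would come from the user, for example:
# if is_yes(input("Are you sure (y/n)? ")):
#     print("Deleting files...")
print(is_yes('y'))     # True
print(is_yes(' Y '))   # True
print(is_yes('n'))     # False
```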
5.2 Manipulating Strings
5.2.1 Strings are Containers
When working with strings it is often helpful to keep in mind that strings are containers, where each character is an “item”, much like the items in a list.
For instance, subsets of strings can be accessed by index:
s = 'this is a string'
print(s[0])
t
Or:
s = 'this is a string'
print(s[0:4])
this
Or:
s = 'this is a string'
print(s[5:7])
is
Unfortunately, it is not possible to change a string by assigning to one of its indices. In this regard strings are more like tuples than lists (recall that tuples are immutable).
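A short sketch of what happens if we try anyway, and of the usual workaround: building a new string from a slice of the old one. The replacement letter 'T' is just an illustrative choice:

```python
s = 'this is a string'

try:
    s[0] = 'T'               # strings are immutable, so this raises a TypeError
except TypeError as err:
    print("TypeError:", err)

# Build a new string from a new first letter plus a slice of the old one
s = 'T' + s[1:]
print(s)                     # This is a string
```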
5.2.2 Operations on Strings
As has been pointed out already, the reason Python insists on everything having a type is so that it knows how to perform operations on them.
The + symbol is a very useful example of this. When placed between two ints like 1 + 3, we know that + means addition in the mathematical sense. However, when placed between two strs like 'taylor' + 'travis', we know it means joining the two pieces of text end to end.
Sure enough:
'taylor' + 'travis'
'taylortravis'
This is called concatenation.
We can use other operators on strings too:
'Nom'*5
'NomNomNomNomNom'
'worries'*0
''
Note that - and / don’t work on strings, which also agrees with our understanding of the operations you can do to text.
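A sketch contrasting the operators strings support with ones they don’t. Each unsupported operation is wrapped in a lambda so it can be attempted inside try/except; the example strings are made up. Note that adding a string and a number fails too, which comes up again in Section 5.2.5:

```python
# Operators that strings DO support
print('taylor' + 'travis')   # taylortravis
print('Nom' * 3)             # NomNomNom

# Operators that strings do NOT support raise a TypeError
failed = []
for description, operation in [
    ("'abc' - 'c'", lambda: 'abc' - 'c'),
    ("'abc' / 2", lambda: 'abc' / 2),
    ("'abc' + 1", lambda: 'abc' + 1),
]:
    try:
        operation()
    except TypeError as err:
        failed.append(description)
        print(f"{description} -> TypeError: {err}")
```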
5.2.3 Using the String Object’s Methods
Of Python’s many built-in functions, only a few can be applied to strings in any useful way: sorted() and len(), for example. Luckily, a str object carries with it a rather large set of methods that return modified copies of the string, such as removing trailing spaces or padding a number with 0’s.
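A quick sketch of the two built-in functions mentioned above, using the made-up word 'banana'. Note that sorted() returns a list of characters rather than a string:

```python
word = 'banana'

print(len(word))              # 6 (the number of characters)
print(sorted(word))           # ['a', 'a', 'a', 'b', 'n', 'n']
print(''.join(sorted(word)))  # aaabnn (join() turns the list back into a string)
```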
Method | Description |
---|---|
capitalize() | Converts the first character to upper case and the rest to lower case |
count() | Returns the number of times a specified value occurs in a string |
endswith() | Returns True if the string ends with the specified value |
find() | Searches the string for a specified value and returns the position where it was first found (or -1 if it is not found) |
join() | Joins the elements of an iterable of strings into one string, using the string it is called on as the separator |
lower() | Converts a string into lower case |
replace() | Returns a string where a specified value is replaced with another specified value |
split() | Splits the string at the specified separator, and returns a list |
startswith() | Returns True if the string starts with the specified value |
upper() | Converts a string into upper case |
zfill() | Pads the string on the left with 0s until it reaches the specified width |
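A short sketch trying several of the methods from the table on a made-up string. Each call returns a new value, so the original string s is unchanged at the end:

```python
s = 'hello world'

print(s.capitalize())          # Hello world
print(s.upper())               # HELLO WORLD
print(s.count('o'))            # 2
print(s.find('world'))         # 6 (index where 'world' starts)
print(s.find('xyz'))           # -1 (not found)
print(s.startswith('hello'))   # True
print(s.endswith('.txt'))      # False
print('7'.zfill(3))            # 007
print(s)                       # hello world (the original is unchanged)
```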
Several of these methods are especially useful. For instance, the replace() method can be used to replace any characters which are causing problems. Perhaps we would like to replace the \t escape sequence (a tab) with a comma:
s = '123\t456'
s = s.replace('\t', ',')
print(s)
123,456
Another useful tool is split(), which will turn a single string into multiple strings by splitting it at the given character:
s = 'filename.txt'
s = s.split('.')
print(s)
['filename', 'txt']
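If a file name contains more than one dot, split('.') breaks it at every dot. One way to separate only the extension, assuming the extension is whatever follows the last dot, is rsplit() with a maximum of 1 split. The file name here is made up:

```python
name = 'my.data.file.txt'

print(name.split('.'))       # ['my', 'data', 'file', 'txt']
print(name.rsplit('.', 1))   # ['my.data.file', 'txt']

stem, extension = name.rsplit('.', 1)
print(stem)        # my.data.file
print(extension)   # txt
```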
5.2.4 Joining Multiple Strings into a Single String
We have seen that Python will interpret operators like + differently depending on data type. We can join two strings using:
a = 'Hello'
b = 'World'
c = a + ' ' + b
print(c)
Hello World
It is common to have many strings that need to be “joined”. We could do it the verbose way:
strings = ['this', 'is', 'a', 'list', 'of', 'several', 'strings']  # each word is an item in a list
new_string = ''  # we need to initialize an empty string so we have something to add to
for item in strings:  # item will take on each value in the list
    new_string = new_string + ' ' + item  # + adds each item, so the string grows by 1 word per loop
print(new_string)
 this is a list of several strings
Notice the extra space at the start of the output: a space is added before every word, including the first one.
But because this is so common, Python has a shortcut for us. The join() method of a string object accepts a list of strings as follows:
strings = ['this', 'is', 'a', 'list', 'of', 'several', 'strings']
new_string = ' '.join(strings)  # ' ' is placed between each pair of words
print(new_string)
this is a list of several strings
This time we start from the string ' ', with a single space between the quotes, rather than from an empty string. We then call the join() method of this string and pass it our list strings. Each word in strings is added to the result, separated by the space in the original string.
We can convert a string to a list, then use indexing to change specific characters, then use the join() method to convert back. Converting a string to a list puts each character in a separate entry of the list:
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g']
We can then change any entry as needed. Converting back to a string is a bit tricky though. We can’t just use str() on the list, since that gives the text of the list itself, brackets, quotes and commas included. We must use the join() method to “join” each character in the list back into a string:
this was a string
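A minimal sketch of these steps. The variable names s and s2 and the particular edit (replacing the 'i' at index 5 with 'wa' to turn “is” into “was”) are assumptions chosen to reproduce the output above:

```python
s = 'this is a string'
s2 = list(s)        # one list entry per character
print(s2)

s2[5] = 'wa'        # list entries can be reassigned, unlike string characters
s = ''.join(s2)     # '' joins the entries with nothing in between
print(s)            # this was a string
```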
Another very common situation is splitting a sentence into individual words. We can use the split() method for this, which returns a list with each word as a separate item.
s = 'this is a sentence'
s2 = s.split(' ')  # split wherever there is a space
print(s2)
We can use any character or set of characters as the splitting criterion. Here we have used a space, which gives us a list of the individual words:
['this', 'is', 'a', 'sentence']
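One detail worth knowing: splitting on a single space gives empty strings wherever there are repeated spaces, while calling split() with no argument splits on any run of whitespace. A sketch with a made-up sentence containing extra spaces:

```python
s = 'this  is   a sentence'   # note the repeated spaces

print(s.split(' '))   # ['this', '', 'is', '', '', 'a', 'sentence']
print(s.split())      # ['this', 'is', 'a', 'sentence']
```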
And we can use the join() method to put them all back together once we’re done. We need to catch the result of the split() method in a new variable. Since s2 is already a list with each word as an item, we can use list methods on it; in this case we use insert() to insert a new word. Finally, we use the join() method to put things back into a sentence:
this is a short sentence
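A minimal sketch of these steps. The position passed to insert() (index 3, just before 'sentence') and the name s3 are assumptions chosen to produce the sentence above:

```python
s = 'this is a sentence'
s2 = s.split(' ')          # ['this', 'is', 'a', 'sentence']
s2.insert(3, 'short')      # insert before index 3, i.e. before 'sentence'
s3 = ' '.join(s2)          # put the words back together with spaces
print(s3)                  # this is a short sentence
```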
5.2.5 Converting Strings to Other Types
It may seem a bit silly to convert 1.0 to '1.0' and back, but this is actually extremely common when reading and writing files. We will talk about reading and writing files later in the course, but the main issue is that we simple humans require files that are human readable, while computers prefer other, more efficient formats. “Human readable” basically means a file filled with text. For this reason we need to convert all numerical values to str before writing to a file that is intended to be human readable, and convert them back to numbers when reading such a file.
The following code will not work:
number_of_cookies = 3
message = "There are " + number_of_cookies + " cookies."
Why not? Because you cannot add a string and a numerical value; Python raises a TypeError. Instead we must do:
number_of_cookies = 3
message = "There are " + str(number_of_cookies) + " cookies."
print(message)
There are 3 cookies.
And of course we can convert in the reverse direction too:
input_string = "4.56893"
input_value = float(input_string)
print("Result:", input_value)
Result: 4.56893
Note that converting text to a numerical value requires that the text is eligible for conversion, i.e. that it actually represents a number. For instance, int('a') raises a ValueError.
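When the text comes from a file or a user, it can be handy to catch that error instead of letting the program stop. A minimal sketch, where the helper name to_float_or_none and the choice to return None on failure are assumptions for illustration:

```python
def to_float_or_none(text):
    """Convert text to a float, or return None if it isn't a valid number (hypothetical helper)."""
    try:
        return float(text)
    except ValueError:
        return None


print(to_float_or_none("4.56893"))   # 4.56893
print(to_float_or_none(" 12 "))      # 12.0 (surrounding whitespace is allowed)
print(to_float_or_none("a"))         # None
```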
5.2.6 Escape Characters
One very common task is writing data to a file, and a common subset of this task is writing tabular data, like:
Time | Temperature | Humidity |
---|---|---|
13:31.04 | 29.1 | 55 |
14:29.11 | 28.2 | 56 |
15:30.40 | 28.5 | 54 |
To make our file “human readable” we would insert “tabs” (or spaces, or commas) between values, and “new lines” after each row is complete. Python does not do this for us by default, so we need to insert some “markers” into the text. These are:
- \n for a new line
- \t for a tab
The appearance of a \ tells Python that the next character is special. This \ is called an escape character because it tells Python to “escape” the act of reading the text as pure text.
What if you need an actual \ in the text? You escape it with \\!
One more source of trouble comes from the use of ' and ". If you start your string with ', then Python will end the string when it sees the next '. If you want to use ' in your text then you have two options:
- You can start your string with ", then you can use all the ' you like (and vice versa)
- You can escape the ' with \'.
Escape Sequence | Effect |
---|---|
\' | A single quote |
\" | A double quote |
\\ | A backslash |
\n | A new line |
\t | A tab |
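A sketch putting these escape sequences to work: the header and first row of the Time/Temperature/Humidity table built with \t and \n, followed by the two ways of writing a single quote and a doubled backslash. The Windows-style path is a made-up example:

```python
# Header and first row of the table, separated by tabs and ended with new lines
header = 'Time\tTemperature\tHumidity\n'
row = '13:31.04\t29.1\t55\n'
text = header + row
print(text)          # the values line up in tab-separated columns
print(repr(text))    # repr() shows the escape sequences instead of acting on them

# Two equivalent ways to put a single quote inside a string
quote1 = "It's easy"       # double quotes around a single quote
quote2 = 'It\'s easy'      # escaped single quote
print(quote1 == quote2)    # True

path = 'C:\\Users\\data'   # each \\ becomes a single backslash
print(path)                # C:\Users\data
print(len('\t'), len('\\'))  # 1 1 (each escape sequence is one character)
```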
5.2.7 Inserting Values Into Strings
Printing text that contains the value of variables is so common that Python has a shortcut for this. It is called an f-string.
print(f"The value is {11/7}")
The value is 1.5714285714285714
The above line has several crucial features:
- The string is prefaced with an ‘f’, which tells Python this string is to be given special treatment.
- The curly braces inside the f-string tell Python where active code is located. Python evaluates these expressions, and there can be more than one.
- We did not need to convert the result of 11/7 to a string, as Python knows it is inside a string and does the conversion for us.
The output of numerical values when written to text can often be ugly (e.g. way too many decimal places). Since this is such a common task, Python offers a way to control the conversion.
print(f"The value is {11/7:.2f}")
The value is 1.57
Here we have followed our value (11/7) with a :, then .2f, which means “print this value as a float (f) with 2 decimal places”.
You can also have values written in scientific notation:
print(f"The value is {1111/7:.2e}")
The value is 1.59e+02
Table 5.2 gives examples of different types of output and how to get them.
Number | Format | Output | Description |
---|---|---|---|
3.1415926 | {:.2f} | 3.14 | Format float 2 decimal places |
3.1415926 | {:+.2f} | +3.14 | Format float 2 decimal places with sign |
-1 | {:+.2f} | -1.00 | Format float 2 decimal places with sign |
2.71828 | {:.0f} | 3 | Format float with no decimal places |
5 | {:0>2d} | 05 | Pad number with zeros (left padding, width 2) |
5 | {:x<4d} | 5xxx | Pad number with x’s (right padding, width 4) |
1000000 | {:,} | 1,000,000 | Number format with comma separator |
0.25 | {:.2%} | 25.00% | Format percentage |
1000000000 | {:.2e} | 1.00e+09 | Exponent notation |
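The table writes each format as {:...}; the same text after the colon works inside an f-string. A sketch reproducing every row of the table (the variable names are just for illustration):

```python
pi_approx = 3.1415926
e_approx = 2.71828

print(f"{pi_approx:.2f}")    # 3.14
print(f"{pi_approx:+.2f}")   # +3.14
print(f"{-1:+.2f}")          # -1.00
print(f"{e_approx:.0f}")     # 3
print(f"{5:0>2d}")           # 05
print(f"{5:x<4d}")           # 5xxx
print(f"{1000000:,}")        # 1,000,000
print(f"{0.25:.2%}")         # 25.00%
print(f"{1000000000:.2e}")   # 1.00e+09
```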
And “f-strings” work with variables as well!
pi = 3.14159
print(f"The value of pi is {pi}")
The value of pi is 3.14159
And of course we can do operations on variables:
pi = 3.14159
D = 1.5
print(f"The area of a circle with a diameter of {D} is {pi/4*D**2}")
The area of a circle with a diameter of 1.5 is 1.767144375
“f-strings” were added in Python 3.6 and are a huge timesaver. The previous way to insert values into strings was verbose and hard to read:
print("The value is {a}".format(a=11/7))
The value is 1.5714285714285714
Or we could include the decimal place formatting information:
print("The value is {a:.2e}".format(a=11/7))
The value is 1.57e+00
And multiple variables were handled as:
pi = 3.14159
D = 1.5
print("The area of a circle with a diameter of {a} is {b:.2f}".format(a=D, b=pi/4*D**2))
The area of a circle with a diameter of 1.5 is 1.77
The introduction of “f-strings” was obviously very welcome! However, you will still often see code that uses the format()
style, since it still works just fine and people are either:
- used to the old approach so stick with it
- too busy to find the time to update old code
5.3 Multiline Strings
To write long strings which span multiple lines we could write each line as a separate string and join them using +, as follows:
s = ("this \n"
     + "text \n"
     + "spans \n"
     + "multiple \n"
     + "lines \n")
print(s)
this
text
spans
multiple
lines
But this is rather tedious. Instead we can put our text between triple quotes and write as many lines between them as we wish:
s = """
This
text
spans
multiple
lines
"""
print(s)
This
text
spans
multiple
lines
If we inspect s we’ll find that \n has been inserted for us: '\nThis\ntext\nspans\nmultiple\nlines\n'
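The leading \n appears because the text starts on the line after the opening triple quotes. If that first new line is unwanted, one option is to end the opening line with a backslash, which Python treats as a line continuation inside the string. A short sketch:

```python
s = """\
This
text
spans
multiple
lines
"""
print(repr(s))   # 'This\ntext\nspans\nmultiple\nlines\n'
```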
5.4 Comparing Strings
Relational operators discussed in the previous chapter can also be applied to strings. This might seem strange at first, but the effect of relational operators on strings is quite intuitive…they evaluate based on alphabetical ordering.
'Fred' < 'Harry'
True
'Fred' < 'Fanny'
False
This type of ordering is also called lexicographic ordering. Strictly speaking, Python compares the Unicode code point of each character, so the result depends on the case and length of the strings.
Uppercase precedes lowercase:
'A' < 'a'
True
Comparisons are evaluated character by character:
'abc' < 'abd'
True
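Strings containing digits can be compared too. The following compares the version numbers of a software package, and it gives the right answer here because each part of the two version numbers has the same number of digits (the comparison is still character by character, not numerical):
v1 = "3.11.3"
v2 = "3.12.1"
v1 < v2
True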
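Because string comparison works character by character, "3.9" < "3.11" evaluates to False (it compares '9' with '1'), even though version 3.9 is older. One way around this, assuming every part of the version is a plain integer, is to convert each part to an int and compare tuples. The helper name version_key is hypothetical:

```python
def version_key(version):
    """Turn a version string like '3.11.3' into a tuple of ints, e.g. (3, 11, 3)."""
    return tuple(int(part) for part in version.split('.'))


print("3.9" < "3.11")                            # False: '9' is compared with '1'
print(version_key("3.9") < version_key("3.11"))  # True: 9 is compared with 11
print(version_key("3.11.3"))                     # (3, 11, 3)
```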
String length also matters:
'abc' < 'abcd'
True
Python also provides functionality to search for a string within another string using the in operator:
'abc' in 'abcd'
True
'acb' in 'abcd'
False
The empty string is always a substring of any string:
'' in 'abc'
True
The list of string methods also contains some tools for comparison. For instance, endswith() and startswith() are useful:
string = 'email.address@uwaterloo.ca'
check = string.endswith('@uwaterloo.ca')  # no need to worry about the length of the text we're looking for
result = 'valid' if check else 'invalid'  # a 'one line if-statement' picks 'valid' or 'invalid'
print(f"{string} is a {result} UW email address")  # an f-string inserts both values into the printout
email.address@uwaterloo.ca is a valid UW email address
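endswith() (and startswith()) also accept a tuple of options, which is handy for checking file types. A sketch with made-up file names, where lower() makes the check case-insensitive:

```python
filenames = ['photo.JPG', 'notes.txt', 'scan.png']
image_suffixes = ('.jpg', '.png')   # endswith() accepts a tuple of options

images = []
for name in filenames:
    # lower() makes the check case-insensitive, so '.JPG' matches '.jpg'
    if name.lower().endswith(image_suffixes):
        images.append(name)

print(images)   # ['photo.JPG', 'scan.png']
```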
5.5 Exercises
Copy and paste the following code to a new .py
file in Spyder and work through each cell until you get the desired result.
# %% Problem 1: Combining variables of strings
# Given the following pieces of information about yourself:
name = "Bob"
age = 33
favourite_food = "takoyaki"

# Output the following sentence:
"My name is Bob, and I'm 33 years old, and my favorite food is takoyaki"
# %% Problem 2: Using str methods
# Given the text of someone's name, like the following:
name = 'mr. harry s truman'

# Convert it to this format:
"Mr. Harry S Truman"
# %% Problem 3: Using string formatting
# Given the following information:
a = 12.85949333
b = 25.04383990

# Print the following:
"The values of a and b are 12.859 and 25.044."
# %% Problem 4: Check prefix and suffix
# Given a list of file names like those below, convert them all to 'my_file.py'
a = 'my file.py'
b = 'my.file.py'
c = 'my_file.py.py'

a = ""
b = ""
c = ""

print(a)
print(b)
print(c)
# %% Problem 5: String Length and Slicing
# Given the string `s = "Python Programming"`:
# a. Find the length of the string.
# b. Extract and print the substring "Python".
# c. Extract and print the substring "Programming".
#
# Fill in the code below:
s = "Python Programming"

length =  # Get length of the string
substring_python =  # Slice the first word
substring_programming =  # Slice the second word

print("Length of the string:", length)
print("First word:", substring_python)
print("Second word:", substring_programming)
# %% Problem 6: Multiline Strings
# Write a function that:
# Given a string (s) and number (n), uses a multiline string to write s in n
# different lines.
#
# Example:
# Q6("I'm so excited for the NE121 Quiz!!!", 4)
# --> I'm so excited for the NE121 Quiz!!!
#     I'm so excited for the NE121 Quiz!!!
#     I'm so excited for the NE121 Quiz!!!
#     I'm so excited for the NE121 Quiz!!!
#
# NOTE: Only 1 print statement and one variable are needed!
#
# Fill in the code below:
def Q6(s, n):
    multiline_string =  # Do not use any loops or new lines
    print(multiline_string)
# %% Problem 7: String Comparisons
# Write a function that given two strings, a and b,
# uses a comparison operator to check if a is lexicographically smaller than b.
# If False, check if a is equal to b
# The function should print the description of the outcome
#
# Examples:
# Q7(a="abc", b="abd")
# --> a is lexicographically smaller? True.
# Q7("abc", "abc")
# --> a is lexicographically smaller? False.
# --> The strings are equal.
# Q7("abd", "abc")
# --> a is lexicographically smaller? False.
# --> The strings are not equal.
#
# It is okay if the print statements are not the exact same, as long as the
# message is clear.
#
# Fill in the code below:
def Q7(a, b):
    # No help this time :)
    pass
# %% Problem 8: Creating a Table with Strings (Hard)
# Write a function `create_table(headers, data)` that takes two lists:
# a. `headers`: A list of column headers for the table (e.g., ["Name", "Age", "Country"])
# b. `data`: A list of lists containing rows of data (e.g., [["Alice", 25, "USA"], ["Bob", 30, "Canada"]])
#
# HINT: you will need to use a for loop, \t, and \n
#
# The function should:
# a. Print the headers, with each header separated by a tab.
# b. For each row in `data`, print the row's elements separated by tabs.
# c. Ensure that each row and the headers are printed on a new line.
#
# Example:
# If `headers = ["Name", "Age", "Country"]` and `data = [["Alice", 25, "USA"], ["Bob", 30, "Canada"]]`,
# the output should be:
#
# Name Age Country
# Alice 25 USA
# Bob 30 Canada
#
# Fill in the code below:
def create_table(headers, data):
    # I believe in you
    # -Bardia
    print(table)