Data types in Python: String

String manipulation and formatting using Python's standard library methods

Now that we've reviewed variables, let's explore Python's data types.

Python supports the following data types out of the box:

Description   Type
Text          str
Numeric       int, float, complex
Sequence      list, tuple, range
Map           dict
Set           set, frozenset
Boolean       bool
Binary        bytes, bytearray, memoryview

In this article, we will focus on strings (str).

String

A string is simply a sequence of characters.

You can create a string by enclosing a sequence of characters in single quotes (') or double quotes ("):

my_string = "This is a string"
my_second_string = 'This is also a string'

Let's explore some of the methods that Python provides out of the box for string manipulation and formatting.

This article is a kind of cheatsheet for string manipulations.

I tried to cover the most commonly used methods.

Feel free to only read the bits you are interested in!

Basics

Concatenate

You can concatenate (i.e. 'combine') string literals simply by putting them side by side!

string = "Python" " " "is" " " "a nice language"
print(string)

# >> Python is a nice language
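Side-by-side concatenation only works for literals written directly in the code. For strings stored in variables, the + operator does the same job. Here is a minimal sketch (the variable names are made up for this example):

```python
# Adjacent string literals are joined automatically
greeting = "Python" " " "rocks"

# Strings stored in variables need the + operator
language = "Python"
opinion = "is a nice language"
sentence = language + " " + opinion

print(greeting)
# >> Python rocks

print(sentence)
# >> Python is a nice language
```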

Replicate

You can replicate a string (i.e. 'combine' the same string multiple times) by using the multiplication operator (*):

print("hip ..." * 2)
# >> hip ...hip ...
print("Hooray! 🎉")
# >> Hooray! 🎉

Printing strings

Escape character

You can escape special characters by prefixing them with a backslash (\).

For example, to print He said "hi", you would write the following program:

say_hi = "He said \"hi\""
print(say_hi)

# >> He said "hi"

You can find a list of special characters here.
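As a quick illustration, here is a minimal sketch of a few escape sequences you are likely to meet. The variable names are made up for this example, and the selection is far from exhaustive.

```python
# \t inserts a tab, \n inserts a newline, \\ produces one backslash
tabbed = "Name:\tAlo"
two_lines = "first\nsecond"
backslash = "C:\\temp"

print(tabbed)
print(two_lines)

print(backslash)
# >> C:\temp

# The escaped backslash counts as a single character
print(len(backslash))
# >> 7
```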

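Multi-line strings

A special note on the newline character: a string in single or double quotes cannot span several lines of source code unless you escape each line break with a backslash (\). The escaped line break is not part of the string, but any indentation on the next line is, so start the continuation at the left margin.

multiline = "I am on the first line \
And I am on the first line too"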

print(multiline)

# >> I am on the first line And I am on the first line too

wrong_multilines = "This will not work
because Python isn't aware of our new line character."

# >> 🔴 SyntaxError: EOL while scanning string literal
# (Python 3.10+ words this as "unterminated string literal"; either way, you must escape the newline character.)

Raw string

By preceding your string with r, you create a raw string, in which the escape character (\) is not interpreted and stays in the string as a literal backslash:

say_hi = r"He said \"hi\""  

print(say_hi)

# >> He said \"hi\"
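Raw strings are handy whenever the text itself contains many backslashes, such as Windows paths or regular expressions. The sketch below uses a made-up path purely for illustration:

```python
# Without the r prefix, the "\n" in this path would be read as a newline
path = r"C:\new_folder\notes.txt"

print(path)
# >> C:\new_folder\notes.txt

# The raw string is the same value as the fully escaped version
print(path == "C:\\new_folder\\notes.txt")
# >> True
```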

Triple quoted string

By enclosing a string in triple quotes (''' or """), you can include single quotes ('), double quotes (") and newline characters without escaping them.

Because the newline character doesn't have to be escaped, the string can span several lines:

triple_quoted_string = '''This string will print quotes ('), double quotes (")
and won't mind newline characters despite the lack of escape characters (\\).
This string will be printed on three lines 🎉'''

print(triple_quoted_string)

# >> This string will print quotes ('), double quotes (")
# >> and won't mind newline characters despite the lack of escape characters (\).
# >> This string will be printed on three lines 🎉

Manipulating strings

Indexing

A string is simply a chain of characters. To access a certain character, we can use its index in square brackets. Python uses zero-based indexing, which means that the first index is 0.

my_string = "INDEXING"

# Characters   I   N   D   E   X   I   N   G
# Index        0   1   2   3   4   5   6   7

print(my_string[3])

# >> E
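Two related behaviours are worth knowing: negative indexes count from the end, and strings cannot be modified in place. A minimal sketch, reusing the same example string:

```python
my_string = "INDEXING"

# Negative indexes count from the end of the string
last = my_string[-1]
print(last)
# >> G

# Strings are immutable: assigning to an index raises a TypeError
try:
    my_string[0] = "i"
except TypeError as error:
    print("Cannot modify a string in place:", error)

# An index past the last character raises an IndexError
try:
    my_string[8]
except IndexError as error:
    print("Out of range:", error)
```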

Slicing

We can use string[start: end] to slice a string and obtain a sub-string. start is inclusive but end is exclusive.

my_string = "Hello Brisbane"

# Character   H   e   l   l   o       B   r   i   s   b   a   n   e
# Index       0   1   2   3   4   5   6   7   8   9   10  11  12  13

# Using start and end index

# i.e. give me a string of the characters at index 0,1,2,3 and 4.
print(my_string[0:5])
# >> Hello

# Using only end index

# The start index is considered to be 0
# This is a short hand version of the slice above
print(my_string[:5])
# >> Hello

# Using only the start index

# The end index defaults to the length of the string (here, 14),
# so the last character (index 13) is included
print(my_string[6:])
# >> Brisbane

# Using negative index, we count from the last characters
print(my_string[-8:])
# >> Brisbane
print(my_string[-14:-9])
# >> Hello
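Slices also accept an optional third value, the step, written string[start:end:step]. Here is a short sketch of what it allows, using the same example string:

```python
my_string = "Hello Brisbane"

# A step of 2 keeps every second character
every_second = my_string[::2]
print(every_second)
# >> HloBibn

# A step of -1 walks the string backwards
reversed_string = my_string[::-1]
print(reversed_string)
# >> enabsirB olleH

# Unlike indexing, slicing past the end does not raise an error
print(my_string[6:100])
# >> Brisbane
```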

in and not in

The in and not in operators allow you to check whether a sub-string exists within a string:

my_animals = "🐶, 🐱, 🐭, 🐴"

print("🐶" in my_animals)
# >> True

print("🐘" in my_animals)
# >> False

print("🐘" not in my_animals)
# >> True
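The membership check is case sensitive and only answers yes or no. If you also need to know where the sub-string is, or how often it occurs, the find() and count() methods can help. A minimal sketch with a made-up sentence:

```python
sentence = "Hello Brisbane"

# The check is case sensitive
print("brisbane" in sentence)
# >> False

# Lower-casing first gives a case-insensitive check
found_any_case = "brisbane" in sentence.lower()
print(found_any_case)
# >> True

# find() returns the index of the first match, or -1 when there is none
position = sentence.find("Brisbane")
missing = sentence.find("Sydney")
print(position, missing)
# >> 6 -1

# count() returns the number of non-overlapping occurrences
occurrences = sentence.count("l")
print(occurrences)
# >> 2
```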

Case transformation: upper(), lower(), isupper(), islower()

Pretty straightforward: upper() and lower() return a copy of the string in upper or lower case, leaving the original unchanged. isupper() and islower() allow you to check the casing of a string.

string = "i am not yelling 📣"  

print(string.upper())  
# >> I AM NOT YELLING 📣

print(string.islower())  
# >> True  

print(string.upper().isupper())  
# >> True

String validation: isX()

We have several methods available to validate strings, in a similar fashion to isupper() or islower(). All of them return False for an empty string:

  • isalpha(): only letters (no spaces)
  • isalnum(): only letters and numbers (no spaces)
  • isdecimal(): only decimal digits (no spaces, signs or decimal points)
  • isspace(): only whitespace characters (spaces, tabs, newlines)
  • istitle(): every word starts with an uppercase letter followed by lowercase letters

alpha = "letters"

print(alpha.isalpha())
# >> True

alnum = "42characters"
print(alnum.isalnum())
# >> True

decimal = "42"
print(decimal.isdecimal())
# >> True

space = "\t \n"
print(space.isspace())
# >> True

title = "Batman The Movie"
print(title.istitle())
# >> True
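These checks are useful for validating user input before converting it. The sketch below uses a hypothetical helper, is_valid_age, with an arbitrary upper limit of 150 chosen only for illustration:

```python
def is_valid_age(text):
    """Return True if text is a whole number between 0 and 150."""
    text = text.strip()
    # isdecimal() is False for "", "-3" and "4.2", so int() is only
    # called on strings made of digits
    return text.isdecimal() and 0 <= int(text) <= 150


print(is_valid_age(" 42 "))
# >> True

print(is_valid_age("4.2"))
# >> False

print(is_valid_age(""))
# >> False
```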

startswith() and endswith()

Self-explanatory: these methods allow you to check whether a string starts or ends with a given sub-string.

string = "starts and ends"

print(string.startswith("starts"))
# >> True

print(string.endswith("ends"))
# >> True
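Both methods also accept a tuple of sub-strings and return True if any of them matches. A short sketch with a made-up file name:

```python
filename = "holiday_photo.jpeg"

# True if the string ends with any of the given suffixes
is_image = filename.endswith((".jpg", ".jpeg", ".png"))
print(is_image)
# >> True

# True if the string starts with any of the given prefixes
is_report = filename.startswith(("report_", "summary_"))
print(is_report)
# >> False
```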

Map to and from lists

We haven't covered lists yet, but there are two methods to turn a list of strings into a string and vice versa: join and split.

With join, the string you call the method on is used as the spacer between the items.

# A list of strings
# We will cover lists in a future article
list_of_words = ["Hello", "my", "name", "is", "Alo"]
list_of_animals = ["🦍", "🐘", "🦁"]

# Spacer is " "
string = " ".join(list_of_words)

print(string)
# >> Hello my name is Alo

# Spacer is ", " (a comma followed by a space)
string = ", ".join(list_of_animals)

print(string)
# >> 🦍, 🐘, 🦁

With split, the str provided will be the delimiter.

my_string = "A storm is coming, watch out!"

print(my_string.split(","))
# >> ['A storm is coming', ' watch out!']
# Note the leading space kept in ' watch out!'

print(my_string.split(" "))
# >> ['A', 'storm', 'is', 'coming,', 'watch', 'out!']
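Two variations of split are worth a quick sketch: calling it without an argument, and limiting the number of splits. The sample strings are made up for this example.

```python
messy = "  A   storm is\tcoming  "

# Without an argument, split() cuts on any run of whitespace
# and ignores leading and trailing whitespace
words = messy.split()
print(words)
# >> ['A', 'storm', 'is', 'coming']

# The second argument limits the number of splits
key, value = "name=Alo=Dev".split("=", 1)
print(key)
# >> name
print(value)
# >> Alo=Dev
```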

Justify text: center, ljust and rjust

These three methods allow you to justify text. The first argument defines the total width of the new string, including the existing text. An optional second argument specifies the fill character (a space by default).

print("left".ljust(20))
# >> "left                "

print("right".rjust(20, "-"))
# >> "---------------right"

print("center".center(10, "="))
# >> "==center=="
# The quotes are not printed; they are shown here to make the padding visible.
# Each result has exactly the requested width (here 20, 20 and 10 characters).
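A typical use of these methods is lining up columns of text. The sketch below builds a small price-list style table; the data and the column widths (12 and 3) are arbitrary choices for illustration.

```python
rows = [("apple", 3), ("kiwi", 12), ("watermelon", 1)]

# Pad the name to 12 characters with dots, right-align the count on 3
lines = [name.ljust(12, ".") + str(count).rjust(3) for name, count in rows]

for line in lines:
    print(line)

# >> apple.......  3
# >> kiwi........ 12
# >> watermelon..  1
```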

Removing whitespace and other characters using lstrip, rstrip and strip

Use these three methods to strip whitespace from the left end, the right end, or both ends of a string.

left_padded_string = "        Hello"
print(left_padded_string.lstrip())

# >> Hello

right_padded_string = "Hello         "
print(right_padded_string.rstrip())

# >> Hello

right_and_left_padded_string = "   Hello  "
print(right_and_left_padded_string.strip())

# >> Hello

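You can also strip other characters by giving strip an argument. The argument is treated as a set of characters to remove from both ends, not as a prefix or suffix:

string = "Hellow"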
print(string.strip("w"))

# >> Hello
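Because the argument is a set of characters rather than a word, the result can be surprising. The sketch below shows the difference, using a placeholder URL; note that removeprefix() requires Python 3.9 or later.

```python
url = "www.example.com"

# Every leading or trailing "w" or "." is removed
stripped = url.strip("w.")
print(stripped)
# >> example.com

# Any character from the set is stripped, in any order, from both ends
over_stripped = url.strip("cmowz.")
print(over_stripped)
# >> example

# To remove an exact prefix, removeprefix() is safer (Python 3.9+)
without_prefix = url.removeprefix("www.")
print(without_prefix)
# >> example.com
```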

Formatting strings

str.format

One way to format strings is str.format, which is available in every version of Python 3 (and was backported to Python 2.6).

str.format replaces each pair of curly brackets {} with the arguments passed to format. Note that the order of the arguments matters when the curly brackets are empty!

You can also give position numbers to the curly brackets to decide the order of the arguments.

name = "Alo"
profession = "developer"

print("Hello 👋 I'm {} and I work as a {}.".format(name, profession))
# >> Hello 👋 I'm Alo and I work as a developer.

print("Hello 👋 I'm {} and I work as a {}.".format(profession, name))
# >> Hello 👋 I'm developer and I work as a Alo. 

print("Hello 👋 I'm {1} and I work as a {0}.".format(profession, name))
# >> Hello 👋 I'm Alo and I work as a developer.

You can also name the arguments to avoid any confusion!

name = "Alo"
profession = "developer"

print("Hello 👋 I'm {fname} and I work as a {fprofession}.".format(fprofession=profession, fname=name))
# >> Hello 👋 I'm Alo and I work as a developer.
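The curly brackets can also carry a format specification after a colon, which controls things like the number of decimals or the alignment. A minimal sketch with made-up values:

```python
price = 1234.5

# Two decimal places
two_decimals = "Total: {:.2f}".format(price)
print(two_decimals)
# >> Total: 1234.50

# Thousands separator and two decimal places
with_separator = "Total: {:,.2f}".format(price)
print(with_separator)
# >> Total: 1,234.50

# Pad an integer with zeros to a width of 5
padded = "{:05d}".format(42)
print(padded)
# >> 00042
```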

Formatted string literals

Python 3.6 introduced a new way to format strings: formatted string literals, also known as f-strings.

All you have to do is prefix your string with f and put your expressions inside curly brackets!

word = "cool"
print(f"String literals are very {word}!")

# >> String literals are very cool!

# You can even do basic arithmetic with them

print(f"Two plus two is {2 + 2}.")

# >> Two plus two is 4.
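The expressions inside the curly brackets can be method calls, and they accept a format specification after a colon. Here is a short sketch; the names and values are made up for this example.

```python
name = "Alo"
ratio = 0.4567

# Any expression works, including method calls
shout = f"Hello {name.upper()}!"
print(shout)
# >> Hello ALO!

# A format specification follows the colon; % formats as a percentage
percentage = f"Progress: {ratio:.1%}"
print(percentage)
# >> Progress: 45.7%

# !r inserts the repr() of the value, which is handy for debugging
debug = f"name is {name!r}"
print(debug)
# >> name is 'Alo'
```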

Conclusion ✍️

Phew, that was more than expected, wasn't it?

As you can see, Python gives you a lot of tools out of the box to manipulate and format strings.

These are only a few of the tools available and you can find the whole list of methods available for strings in the standard library here.

In my opinion, it's always worth checking the language documentation when learning a new language. As you can see, Python is pretty similar to other languages when it comes to strings - but sometimes, you can save yourself a bit of time by spending a few minutes reviewing the language's methods.

After all, it's provided with your language - you may as well use it!

Next time, we