1 Basics

Author

Ryan M. Moore, PhD

Published

February 4, 2025

Modified

February 10, 2026

Welcome to the first chapter of Applied Python Programming for Life Scientists: From Fundamentals to Algorithmic Thinking and Data-Driven Discovery! In this chapter, we’ll explore the fundamental building blocks of Python programming, including:

Variables and how to use them
Basic data types (numbers, text, and true/false values)
Common operators for calculations and comparisons
Essential built-in functions
How to control program flow with conditional statements

This is a comprehensive chapter that covers a lot of ground, but don’t feel pressured to master everything at once. This is your first exposure to thinking and programming in Python, and we’ll keep practicing these concepts and building up your skills throughout the course.

1.1 Introduction to Python

Let’s start with some high-level information about Python.

What is Python?

Python is a high-level, interpreted programming language known for its approachability and readability. Created by Guido van Rossum and first released in 1991, it has become one of the most popular languages for data science, scientific computing, and bioinformatics.

High level

Python is a high-level programming language, meaning it handles many complex computational details automatically. For example, rather than managing computer memory directly, Python does this for you. This allows biologists and researchers to focus on solving scientific problems rather than dealing with technical computing details.

Interpreted

Python is an interpreted language, which means you can write code and run it immediately without an extra compilation step. This makes it a great choice for bioinformatics and exploratory data analysis work where you often need to:

Test different approaches to data analysis
Quickly prototype analysis pipelines
Interactively explore datasets

Readable syntax

Python’s code is designed to be readable and clear, often reading almost like English. For example:

if dna_sequence.startswith(start_codon) and dna_sequence.endswith(stop_codon):
    potential_genes.append(dna_sequence)

Even if you’re new to programming, you can probably guess that this code is looking for potential genes by checking a DNA sequence for a start and a stop codon, and if found, adding the sequence to a list of potential genes.

This readability is particularly valuable in research settings where code needs to be shared and reviewed by collaborators.

Use cases

Python is a versatile language that can be used for a wide range of applications, including:

Artificial intelligence and machine learning (e.g., TensorFlow, PyTorch)
Web development (Django, Flask)
Desktop applications (PyQt, Tkinter)
Game development (Pygame)
Automation and scripting

And of course, bioinformatics and scientific computing:

Sequence analysis and processing (Biopython, pysam)
Phylogenetics (ETE Toolkit)
Data visualization (matplotlib, seaborn)
Pipeline automation (snakemake for reproducible workflows)
Microbial ecology and microbiome analysis (QIIME)

Why Python for bioinformatics?

Python has become a widely used tool in bioinformatics for several key reasons:

Rich ecosystem: Extensive libraries specifically for biological data analysis
Active scientific community: Regular updates and support for bioinformatics tools
Integration capabilities: Easily connects with other bioinformatics tools and databases
Data science support: Strong support for data manipulation and statistical analysis
Reproducibility: Excellent tools for creating reproducible research workflows

Whether you’re analyzing sequencing data, building analysis pipelines, or developing new computational methods, Python provides the tools and community support needed for modern biological research.

Good Entry to Other Languages

Python is a flexible language that supports different programming styles and paradigms. The skills you master while learning Python are highly transferable to other languages. While Python has many specifics and quirks, the high-level concepts will take you a long way regardless of the language.

Summary

To summarize why we are using Python for this course:

It’s approachable
It has widespread usage across academia and industry
It’s widely used in data science and life science research
Skills gained from learning Python are transferable to other programming languages

1.2 Variables

Variables are like labeled containers for storing data in your program. Just as you might label test tubes in a lab to keep track of different samples, variables let you give meaningful names to your data, whether they’re numbers, text, true/false values, or more complex information.

For example, instead of working with raw values like this:

if 47 > 40:
    print("Temperature too high!")

Temperature too high!

You can use descriptive variables to make your code clearer:

temperature = 42.3
temperature_threshold = 40.0

if temperature > temperature_threshold:
    print("Temperature too high!")

Temperature too high!

In this section, we’ll cover:

Creating and using variables
Understanding basic data types (numbers, text, true/false values)
Following Python’s naming conventions
Converting between different data types
Best practices for using variables in scientific code

By the end, you’ll be able to use variables effectively to write clear, maintainable research code.

Creating variables

In Python, you create a variable by giving a name to a value using the = operator. Here’s a basic example:

sequence_length = 1000
species_name = "Escherichia coli"

You can then use these variables anywhere in your code by referring to their names. Variables can be combined to create new variables:

# Combining text (string) variables
genus = "Escherichia"
species = "coli"
full_name = genus + " " + species
print(full_name)

# Calculations with numeric variables
reads_forward = 1000000
reads_reverse = 950000
total_reads = reads_forward + reads_reverse
print(total_reads)

Escherichia coli
1950000

Notice how the + operator works differently depending on what type of data we’re using:

With text (strings), it joins them together
With numbers, it adds them

You can also use variables in more complex calculations:

gc_count = 2200
total_bases = 5000
gc_content = gc_count / total_bases
print(gc_content)

0.44

The ability to give meaningful names to values makes your code easier to understand and modify. Instead of trying to remember what the number 5000 represents, you can use a clear variable name like total_bases.

Reassigning variables

Python allows you to change what’s stored in a variable after you create it. Let’s see how this works:

read_depth = 100
print(f"Initial read depth: {read_depth}")

read_depth = 47
print(f"Updated read depth: {read_depth}")

Initial read depth: 100
Updated read depth: 47

Note

You will learn more about the f"..." syntax in the section on strings later in this chapter!

This flexibility extends even further. Python lets you change not just the value, but also the type of data a variable holds:

quality_score = 30
quality_score = "High quality"
print(quality_score)

High quality

While this flexibility can be useful, it can also lead to unexpected behavior if you’re not careful. Here’s an example that could cause problems in a sequence analysis pipeline:

# Correctly calculates and prints the total number of sequences.
sequences_per_sample = 1000
sample_count = 3
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")

# This one produces an unexpected result!
sequences_per_sample = "1000 sequences "
sample_count = 3
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")

total sequences: 3000
total sequences: 1000 sequences 1000 sequences 1000 sequences

In the second case, instead of performing multiplication, Python repeats the string "1000 sequences " 3 times! This is probably not what you wanted in your genomics pipeline!

This kind of type changing can be a common source of bugs, especially when:

Processing input from files or users
Handling missing or invalid data
Converting between different data formats

For this reason, it is often best to be consistent with your variable types throughout your code, and explicitly convert between types when necessary. We will talk more about this throughout the course.
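
As a minimal sketch of explicit conversion, suppose a count arrives as text, as it would when read from a file. The variable names here are illustrative only. Converting the text with the built-in int() function before doing arithmetic gives the numeric result we expect:

```python
# A count read from a file or typed by a user arrives as text (a string).
raw_sequences_per_sample = "1000"
sample_count = 3

# Without conversion, * repeats the string instead of multiplying.
repeated_text = raw_sequences_per_sample * sample_count
print(repeated_text)

# Convert explicitly to an integer first, then multiply.
sequences_per_sample = int(raw_sequences_per_sample)
total_sequences = sequences_per_sample * sample_count
print(f"total sequences: {total_sequences}")
```

The first print shows 100010001000 (the text repeated three times), while the second shows total sequences: 3000.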

Augmented assignments

Let’s look at a common pattern when working with variables. Here’s one way to increment a counter:

read_count = 100
read_count = read_count + 50
print(f"Total reads: {read_count}")

Total reads: 150

Python provides a shorter way to write this using augmented assignment operators:

read_count = 100
read_count += 50
print(f"Total reads: {read_count}")

Total reads: 150

These augmented operators combine arithmetic with assignment. Common ones include:

+=: augmented addition (increment)
-=: augmented subtraction (decrement)
*=: augmented multiplication
/=: augmented division

These operators are particularly handy when updating running totals or counters, like when tracking how many sequences pass quality filters. We’ll explore more uses in the subsequent chapters.
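
Here is a short sketch of the other augmented operators in action. The numbers are made up for illustration:

```python
# Track how many sequences remain after a filtering step.
passing_sequences = 1000
passing_sequences -= 150   # 150 sequences fail the quality filter
print(passing_sequences)   # 850

# Scale a coverage estimate.
coverage = 30
coverage *= 2              # same as coverage = coverage * 2
print(coverage)            # 60

# Division with /= produces a float, just like the / operator.
coverage /= 4              # same as coverage = coverage / 4
print(coverage)            # 15.0
```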

Named constants

Sometimes you’ll want to define values that shouldn’t change throughout your program.

GENETIC_CODE_SIZE = 64
print(f"There are {GENETIC_CODE_SIZE} codons in the standard genetic code")

DNA_BASES = ['A', 'T', 'C', 'G']
print(f"The DNA bases are: {DNA_BASES}")

There are 64 codons in the standard genetic code
The DNA bases are: ['A', 'T', 'C', 'G']

In Python, we use ALL_CAPS names as a convention to indicate these values shouldn’t change. However, it’s important to understand that Python doesn’t actually prevent these values from being changed. For example:

MIN_QUALITY_SCORE = 30
print(f"Filtering sequences with quality scores below {MIN_QUALITY_SCORE}")

MIN_QUALITY_SCORE = 20  # We can change it, even though we shouldn't!
print(f"Filtering sequences with quality scores below {MIN_QUALITY_SCORE}")

Filtering sequences with quality scores below 30
Filtering sequences with quality scores below 20

In a way, Python variables are like labels on laboratory samples: you can always move a label from one test tube to another. When you write:

DNA_BASES = ['A', 'T', 'C', 'G']
DNA_BASES = ['A', 'U', 'C', 'G']  # Oops, switched to RNA bases!
print(f"These are now RNA bases: {DNA_BASES}")

These are now RNA bases: ['A', 'U', 'C', 'G']

You’re not modifying the original list of DNA bases. Instead, you’re creating a new list and moving the DNA_BASES label to point to it instead of the old one. The original list isn’t “protected” in any way. So, it’s more of a convention that ALL_CAPS variables be treated as constants in your code, even though Python won’t enforce this rule.
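
One way to convince yourself of the "moving a label" idea is to attach a second label to the original list before reassigning. This is only a sketch, and original_bases is a name introduced here for illustration:

```python
DNA_BASES = ['A', 'T', 'C', 'G']

# Attach a second label to the same list.
original_bases = DNA_BASES

# Move the DNA_BASES label to a brand-new list.
DNA_BASES = ['A', 'U', 'C', 'G']

print(DNA_BASES)       # ['A', 'U', 'C', 'G']
print(original_bases)  # ['A', 'T', 'C', 'G'] -- the first list is unchanged
```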

Dangerous assignments

Here’s a common pitfall when naming variables in Python: accidentally overwriting built-in functions.

Python has several built-in functions that are always available, including one called str that converts values to strings. For example:

sequence = str()  # Creates an empty string
sequence

Warning

If you convert the static code block below (the one that assigns to str) into a runnable one and then actually run it, it will cause errors in the rest of the notebook in any place that uses the str function. If you do this, you will need to restart the notebook kernel.

However, Python will let you use these built-in names as variable names (though you shouldn’t!):

str = "ATCGGCTAA"  # Again, don't do this!

Now if you try to use the str function later in your code:

quality_score = 35
sequence_info = str(quality_score)  # This will fail!

You’ll get an error:

TypeError: 'str' object is not callable

This error occurs because we’ve “shadowed” the built-in str function with our own variable. Python now thinks we’re trying to use the string “ATCGGCTAA” as a function, which doesn’t work!

We’ll discuss errors in more detail in Chapter 6. For now, remember to avoid using Python’s built-in names (like str, list, dict, set, len) as variable names. You can find a complete list of built-ins in the Python documentation.
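
If you are unsure whether a name is already taken, one possible check (a sketch, not the only way) uses the standard library's builtins module together with the built-in hasattr() function. The candidate names below are just examples:

```python
import builtins

# hasattr() reports whether the builtins module defines a given name.
str_is_builtin = hasattr(builtins, "str")
sequence_is_builtin = hasattr(builtins, "sequence")

print(f"str shadows a built-in? {str_is_builtin}")            # True
print(f"sequence shadows a built-in? {sequence_is_builtin}")  # False
```

A name like sequence is safe to use, while str is best avoided.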

Naming variables

Descriptive variable names are crucial for writing clear, understandable code. When you revisit your analysis scripts months later, good variable names will help you remember what each part of your code does.

Valid names

Python variable names can include:

Letters (A-Z, a-z)
Numbers (0-9, but not as the first character)
Underscores (_)

While Python allows Unicode characters (like Greek letters), it’s usually better to stick with standard characters:

π = 3.14  # Possible, but not recommended
pi = 3.14  # Better!
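
You can ask Python itself whether a piece of text would be a valid name. The sketch below uses the string method isidentifier() and the standard library's keyword module. The example names are made up:

```python
import keyword

# str.isidentifier() checks whether text follows Python's naming rules.
print("read_count".isidentifier())   # True
print("2nd_read".isidentifier())     # False: starts with a digit
print("read-count".isidentifier())   # False: hyphens are not allowed

# Reserved words pass isidentifier() but still cannot be used as names.
print("class".isidentifier())        # True
print(keyword.iskeyword("class"))    # True
```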

Case Sensitivity

Python treats uppercase and lowercase letters as different characters:

sequence = "ATCG"
Sequence = "GCTA"
print(f"{sequence} != {Sequence}")

ATCG != GCTA

To avoid confusion, stick with lowercase for most variable names.

Naming Conventions

For multi-word variable names, Python programmers typically use snake_case (lowercase words separated by underscores):

# Good -- snake case
read_length = 150
sequence_count = 1000
is_high_quality = True

# Avoid -- camelCase or PascalCase
readLength = 150
SequenceCount = 1000

Other languages may have different conventions, or sometimes a particular project might have its own conventions. In these cases, it’s probably best to go with the flow of the project you’re working on.

Guidelines for Good Names

Here are some best practices for naming variables in your code:

Use descriptive names that explain the variable’s purpose:

# Clear and descriptive
sequence_length = 1000
quality_threshold = 30

# Too vague
x = 1000
threshold = 30

Use nouns for variables that hold values:

read_count = 500
dna_sequence = "ATCG"

Boolean variables often start with is_, has_, or similar:

is_paired_end = True
has_adapter = False

Collections, which we’ll cover in Chapter 2, often use plural names:

sequences = ["ATCG", "GCTA"]
quality_scores = [30, 35, 40]

Sometimes, it’s actually nicer to use a short name for a variable. Some common exceptions where short names are okay include:

i, j, k for loop indices
x, y, z for coordinates
Standard abbreviations like msg for message, num for number

Another tip is to keep names reasonably short while still being clear:

# Too long
number_of_sequences_passing_quality_filter = 100
# Better
passing_sequences = 100

Remember: your code will be read more often than it’s written, both by others and by your (future) self. Clear, descriptive variable names make your code easier to understand and maintain.

For more detailed naming guidelines, check Python’s PEP 8 Style Guide.

1.3 Data Types

Python has many different types of data it can work with. Each data type has its own special properties and uses.

In this section, we’ll cover the basic data types you’ll use frequently in your code:

Numbers
- Integers (whole numbers, like sequence lengths or read counts)
- Floating-point numbers (decimal numbers, like expression levels or ratios)
Strings (text, like DNA sequences or gene names)
Boolean values (True/False, like whether a sequence passed quality control)

We’ll learn how to:

Identify what type of data you’re working with
Convert between different types when needed

Understanding these fundamental data types is crucial for handling data correctly in your programs.

Checking the type of a value

Python is a dynamically typed language, which means a variable’s type can change during your program. While this flexibility is useful, you will sometimes need to figure out the type of a variable to avoid errors in your analysis.

You can check a variable’s type using Python’s built-in type() function. Here’s how:

sequence_length = 150
print(type(sequence_length))

sequence = "ATCGGCTAA"
print(type(sequence))

is_valid = True
print(type(is_valid))

<class 'int'>
<class 'str'>
<class 'bool'>

As shown above, type() tells us exactly what kind of data we’re working with. This can be helpful when debugging calculations that aren’t working as expected, or verifying data is in the correct format.

Tip

Don’t worry too much about the class keyword in the output; we’ll cover classes in detail later. For now, focus on recognizing the basic types: int for integers, str for strings (text), and bool for True/False values.
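
When you want a yes/no answer about a value's type rather than the type itself, the built-in isinstance() function is one option. It is not covered elsewhere in this section, so treat this as a brief sketch with illustrative values:

```python
sequence_length = 150
gc_content = 0.44

# isinstance() answers a yes/no question about a value's type.
print(isinstance(sequence_length, int))    # True
print(isinstance(gc_content, float))       # True
print(isinstance(sequence_length, str))    # False
```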

Numeric types (int, float)

Python has two main types for handling numbers:

int: Integers (whole numbers) for counting things like:
- Number of sequences
- Read lengths
- Gene counts
float: Floating-point numbers (decimals) for measurements like:
- Expression levels
- P-values
- GC content percentages

For readability with large numbers, you can use underscores: 1_000_000 reads is clearer than 1000000 reads.

Numeric operations

The operators +, -, *, / are used to perform the basic arithmetic operations.

forward_reads = 1000
reverse_reads = 800
print(forward_reads + reverse_reads)
print(forward_reads - reverse_reads)
print(forward_reads * 2)
print((forward_reads + reverse_reads) / 100)

Float division (/) always returns a float, whereas floor division (//), often called integer division, rounds the result down and returns an int when both operands are integers. Check out the difference between the two kinds of division.

total_bases = 17
reads = 5
print(f"  float division: {total_bases / reads}")
print(f"integer division: {total_bases // reads}")

  float division: 3.4
integer division: 3
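
One detail worth seeing for yourself is how the type of the result depends on the operands. This is a small sketch with arbitrary numbers:

```python
# Both operands are integers, so // returns an int.
int_result = 17 // 5
print(int_result, type(int_result))        # 3 <class 'int'>

# If either operand is a float, // still rounds down but returns a float.
float_result = 17.0 // 5
print(float_result, type(float_result))    # 3.0 <class 'float'>

# / returns a float even when the division is exact.
exact_result = 10 / 5
print(exact_result, type(exact_result))    # 2.0 <class 'float'>
```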

The operator ** is used for exponents.

print(2**8)
print(8**2)

256
64

Parentheses () can be used to group expressions and override the standard order of operations.

# Multiplication before addition
print(2 + 3 * 4)

# This forces addition to happen first
print((2 + 3) * 4)

14
20

Modulo (%) gives the remainder of a division. Here is an example that takes a position in a gene and tells you whether it is the 1st, 2nd, or 3rd base of its codon.

# Position in the gene
position = 17

# Which position in codon? (0, 1, or 2)
codon_position = position % 3

print(codon_position)
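
Floor division and modulo pair up nicely here. The sketch below assumes positions are zero-based (the first base of the gene is position 0) and that the gene starts at the first base of a codon. With a different numbering scheme the results would shift:

```python
# Assumption: positions are zero-based (the first base is position 0).
position = 17

# // tells us which codon the base falls in; % gives the offset within it.
codon_index = position // 3
codon_position = position % 3
print(codon_index, codon_position)   # 5 2

# The built-in divmod() returns both results at once.
print(divmod(position, 3))           # (5, 2)
```

Under these assumptions, position 17 is the third base (offset 2) of the sixth codon (index 5).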

Warning

Be careful about combining negative numbers with floor division or modulo, because the results may not be what you expect.

Here are some examples using floor division with negative numbers:

# Rounds down to nearest integer
print(17 // 5)
# Rounds down, not toward zero
print(-17 // 5)
print(17 // -5)
print(-17 // -5)

3
-4
-4
3

And the same examples, but this time using %:

print(17 % 5)
# Result is positive with positive divisor
print(-17 % 5)
# Result has same sign as divisor
print(17 % -5)
print(-17 % -5)

2
3
-3
-2

Don’t worry too much about the details of how negative numbers work with division and modulo operations. Just be aware that they can behave unexpectedly, and look up the specific rules if you need them.
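
If you do need the rule, it helps to know that Python keeps floor division and modulo consistent with each other: multiplying the quotient by the divisor and adding the remainder always gives back the original number. A quick check with the numbers from above:

```python
a = -17
b = 5

quotient = a // b     # -4 (floor division rounds down)
remainder = a % b     # 3  (takes the sign of the divisor)

# The two results always fit together like this:
print(quotient * b + remainder == a)   # True
```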

Scientific notation

Scientific notation is helpful when working with very large or small numbers:

# 3.2 billion bases
genome_size = 3.2e9

# 0.00000001 mutations per base
mutation_rate = 1e-8

You can use these like any other numbers:

print(1.2e3 == 1200.0)
print(1.2e3 * 10 == 12000.0)

True
True

Precision Considerations

While we won’t get into this too much in this course, it’s good to know a bit about the precision of Python’s number types.

Integers

Python can handle arbitrarily large integers, limited only by memory:

big_number = 125670495610435017239401723907559279347192756
print(big_number)

125670495610435017239401723907559279347192756

Not all languages have this; it’s a pretty nice feature!

Floats

Floating-point numbers have limited precision (about 15-17 decimal digits). This can affect calculations:

x = 0.1
y = 0.2

# Do you think this will be `0.3`?
print(x + y)

0.30000000000000004

While these precision errors are usually small, they can accumulate and cause problems depending on the kinds of calculations you’re doing. (One classic example is dealing with money.)
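
One practical consequence is that comparing floats with == can surprise you. A common workaround, sketched here, is math.isclose() from the standard library, which checks whether two numbers are equal within a small tolerance:

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of the tiny rounding error.
print(total == 0.3)               # False

# math.isclose() checks equality within a small tolerance.
print(math.isclose(total, 0.3))   # True
```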

Strings

Strings are how Python handles text data, like sequences or gene names.

# Strings can use single or double quotes
sequence = 'ATCG'
gene_name = "nrdA"
print(sequence)
print(gene_name)

ATCG
nrdA

Strings are immutable: once created, they cannot be modified. For example, you can’t change individual bases in a sequence directly:

dna = "ATCG"
# This would raise an error:
# dna[0] = "G"

Try uncommenting that line (by deleting the # at the start of the line) and see what happens! If you do, you will see an error that looks something like this:

TypeError: 'str' object does not support item assignment
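
Since you can't change a string in place, the usual approach is to build a new one. This sketch uses concatenation and slicing (slicing is explained later in this chapter). The name mutated_dna is just for illustration:

```python
dna = "ATCG"

# Build a new string: the new first base plus everything after index 0.
mutated_dna = "G" + dna[1:]

print(dna)          # ATCG  (the original is unchanged)
print(mutated_dna)  # GTCG
```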

You can combine strings using the + operator:

# String concatenation
sequence_1 = "ATCG"
sequence_2 = "GCTA"
full_sequence = sequence_1 + sequence_2
print("the sequence is: " + full_sequence)

the sequence is: ATCGGCTA

Special characters can be included using escape sequences:

\n for new line
\t for tab
\\ for backslash

This example uses tabs and newlines to format the output in a specific way:

# Formatting sequence output
print("Sequence 1:\tATCG\nSequence 2:\tGCTA")

Sequence 1: ATCG
Sequence 2: GCTA

F-strings (format strings) are particularly useful for creating formatted output. They allow you to embed variables and expressions in strings using the {expression} syntax:

gene_id = "nrdJ"
position = 37_531

print(f"Gene {gene_id} is located at position {position}")

Gene nrdJ is located at position 37531

F-strings can also format numbers, which can be useful for scientific notation and precision control:

# Two decimal places
gc_content = 0.42857142857
print(f"GC content: {gc_content:.2f}")

# Scientific notation
p_value = 0.000000342
print(f"P-value: {p_value:.2e}")

GC content: 0.43
P-value: 3.42e-07
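
Two more format codes that tend to be handy for reports are the comma (thousands separators) and the percent sign. The values here are invented for the example:

```python
total_reads = 1950000
mapped_fraction = 0.8732

# A comma adds thousands separators.
reads_text = f"Total reads: {total_reads:,}"
print(reads_text)     # Total reads: 1,950,000

# The % code multiplies by 100 and adds a percent sign.
mapped_text = f"Mapped: {mapped_fraction:.1%}"
print(mapped_text)    # Mapped: 87.3%
```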

One other neat thing about format strings is that you can automatically include the name of a variable in the output. This is very useful for trying to figure out what is going on in your programs!

gc_content = 0.42857142857

# Print the name of the variable and its value
print(f"{gc_content=}")

# This time, only show two decimals places.
print(f"{gc_content=:.2f}")

gc_content=0.42857142857
gc_content=0.43

Strings can contain Unicode characters:

# Unicode characters
print("你好")
print("こんにちは")

你好
こんにちは

We mentioned above that Unicode characters can be used as variable names, but it’s better to stick with basic ASCII characters for code:

# Possible, but not recommended
α = 0.05
β = 0.20

# Better!
alpha = 0.05
beta = 0.20

Don’t worry too much about Unicode characters for now. Just know that Python supports them.

Common string operations

String operations are useful for processing and manipulating textual data, formatting output, and cleaning up input in your applications and analysis pipelines.

String concatenation with `+`

The + operator joins strings together:

# Joining DNA sequences
sequence1 = "ATCG"
sequence2 = "GCTA"
combined_sequence = sequence1 + sequence2
print(combined_sequence)

# Adding labels to sequences
gene_id = "nrdA"
labeled_sequence = gene_id + ": " + combined_sequence
print(labeled_sequence)

ATCGGCTA
nrdA: ATCGGCTA

String repetition with `*`

The * operator repeats a string a specified number of times:

# Repeating DNA motifs
motif = "AT"
repeat = motif * 3
print(repeat)

# Creating alignment gap markers
gap = "-" * 6
print(gap)

ATATAT
------

String indexing

Python uses zero-based indexing to access individual characters in a string.

s = "Hello, world!"
print(s[0])
print(s[7])

H
w

You can also use negative indices to count from the end:

s = "Hello, world!"
print(s[-1])
print(s[-8])

!
,

Warning

Many life scientists are familiar with the R programming language.

Be aware that R uses one-based indexing, whereas Python uses zero-based indexing. That is, the first character in a string is at index 0 in Python, but at index 1 in R.

String slicing

Slicing lets you extract parts of a string using the format [start:end]. The end index is exclusive:

s = "Hello, World!"

print(s[0:5])
print(s[7:])
print(s[:5])

Hello
World!
Hello

Take note of how negative indexing works in the context of string slicing:

s = "Hello, World!"

print(s[-6:])
print(s[-12:-8])

World!
ello

Generally, you will want to avoid writing string slices with two negative numbers, as they get a bit tricky to read!
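
To connect slicing to biology, here is a small sketch that pulls codons out of a short, made-up sequence:

```python
sequence = "ATGGCCTAA"

# Each codon is three bases; remember that the end index is exclusive.
first_codon = sequence[0:3]
second_codon = sequence[3:6]
last_codon = sequence[-3:]

print(first_codon)   # ATG
print(second_codon)  # GCC
print(last_codon)    # TAA
```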

String methods

Python strings have lots of built-in methods for common operations. Here are a few common ones:

# Clean up sequence data with leading/trailing white space
raw_sequence = "  ATCG GCTA  "
clean_sequence = raw_sequence.strip()
print("|" + raw_sequence + "|")
print("|" + clean_sequence + "|")

# Convert between upper and lower case
mixed_sequence = "AtCg"
print(mixed_sequence.upper())
print(mixed_sequence.lower())

# Chaining methods
messy_sequence = "  AtCg  "
clean_upper = messy_sequence.strip().upper()
print("|" + clean_upper + "|")

|  ATCG GCTA  |
|ATCG GCTA|
ATCG
atcg
|ATCG|

(You will see more about method chaining in Chapter 5; for now, just make note of how it looks.)
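
Two other string methods that come up a lot with sequence data are count() and replace(). The sketch below uses a made-up sequence, plus the built-in len() function to get its length:

```python
sequence = "ATGCGCTA"

# count() returns how many times a substring occurs.
gc_count = sequence.count("G") + sequence.count("C")
gc_content = gc_count / len(sequence)
print(f"GC content: {gc_content:.2f}")   # GC content: 0.50

# replace() returns a new string; here every T becomes a U.
rna_sequence = sequence.replace("T", "U")
print(rna_sequence)                      # AUGCGCUA
```

Because strings are immutable, replace() leaves sequence untouched and hands back a new string.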

Boolean values

Boolean values represent binary states (True/False) and are used to make decisions in code:

True represents a condition being met
False represents a condition not being met

Important

Note the capitalization. True and False both start with capital letters!

Boolean variables often use prefixes like is_, has_, or contains_ to more clearly indicate their purpose:

is_paired_end = True
has_adapter = False
contains_start_codon = True

Boolean values are used in control flow, that is, they drive decision-making in your code:

is_high_quality = True
if is_high_quality:
    print("Sequence passes quality check!")

has_ambiguous_bases = False
if has_ambiguous_bases:
    # This won't execute because condition is False
    print("Warning: Sequence contains N's")

Sequence passes quality check!

Tip

Later in this chapter (Section 1.4), we will go into more detail about control flow.

Boolean values are created through comparisons:

# Quality score checking
quality_score = 35
print(quality_score > 30)
print(quality_score < 20)
print(quality_score == 40)
print(quality_score != 35)

True
False
False
False

Logical operators (and, or, not) combine boolean values:

# Logical operations
print(True and False)
print(True or False)
print(not True)
print(not False)

False
True
False
True

For example, you could use logical operators to combine multiple logical statements:

is_long_enough = False
is_high_quality = True
print(is_long_enough and is_high_quality)

is_exempt = True
exceeds_threshold = False
print(is_exempt or exceeds_threshold)

False
True

Comparison Operators In Depth

Comparison operators are used to “compare” values. They return a boolean value (True or False) and are often used in conditional statements and loops to control program flow.

The basic comparison operators are:

==: equal to
!=: not equal to
<: strictly less than
<=: less than or equal to
>: strictly greater than
>=: greater than or equal to

Additional operators we’ll cover later:

is, is not: object identity
in, not in: sequence membership

Here are a couple of examples:

# Basic boolean values
is_sunny = True
is_raining = False

print(f"Is it sunny? {is_sunny}")
print(f"Is it raining? {is_raining}")

# Comparison operations produce boolean results
temperature = 25
is_hot = temperature > 30
print(f"Is it hot? {is_hot}")

# Logical operations
is_good_weather = is_sunny and not is_raining
print(f"Is it good weather? {is_good_weather}")

Is it sunny? True
Is it raining? False
Is it hot? False
Is it good weather? True

Here are most of the comparison operators in action. Can you guess what the answers will be?

# Comparison operations
print(5 == 5)
print(5 != 5)
print(5 < 3)
print(5 <= 3)
print(5 <= 5)
print(5 > 3)
print(5 >= 3)
print(5 >= 5)

True
False
False
False
True
True
True
True

Chained Comparisons

Here is a pretty neat Python feature: comparisons can be chained together, e.g. 1 < 2 < 3 is equivalent to 1 < 2 and 2 < 3.

# Chained comparisons
print(1 < 2 < 3)
print(1 < 2 < 2)
print(1 < 2 <= 2)

# This one is a bit weird, but it's valid Python!
print(1 < 2 > 2)

True
False
True
False

The comparison operators can also be used to compare the values of variables.

# Check if value is in valid range
coverage = 30
print(10 < coverage < 50)

quality_score = 35
print(20 < quality_score <= 40)

# Multiple range checks
temperature = 37.2
print(37.0 <= temperature <= 37.5)

True
True
True

Compare that to the same example without the chained comparisons:

# Check if value is in valid range
coverage = 30
print(10 < coverage and coverage < 50)

quality_score = 35
print(20 < quality_score and quality_score <= 40)

# Multiple range checks
temperature = 37.2
print(37.0 <= temperature and temperature <= 37.5)

True
True
True

Tip

Using chained comparisons judiciously can really tidy up your code!

Comparing Strings & Other Values

Python’s comparison operators work with many types of values, not just numbers, allowing comparisons between various types of data. Be careful though: while some comparisons may seem intuitive, others might require careful consideration or custom implementation.

# Comparison of different types
print("Hello" == "Hello")
print("Hello" == "World")
print("Hello" == 5)
print("Hello" == True)

# Some non-numeric types also have a natural ordering.
print("is 'a' < 'b'?", "a" < "b")
print("is 'a' < 'A'?", "a" < "A")

# This is a bit weird, but it's valid Python!
#
# You can look into what's going on here if you're interested,
# but you don't need to worry about it otherwise.
print([1, 2, 3] <= [10, 20, 30])

True
False
False
False
is 'a' < 'b'? True
is 'a' < 'A'? False
True
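
If you are curious why "a" < "A" is False, the built-in ord() function offers a hint. Strings are compared one character at a time using each character's Unicode number, and lists are compared one element at a time in the same way. A brief sketch:

```python
# ord() returns the Unicode code point of a single character.
print(ord("A"))   # 65
print(ord("a"))   # 97

# Uppercase ASCII letters have smaller numbers than lowercase ones.
print("A" < "a")          # True
print("Zebra" < "apple")  # True

# Lists compare element by element; the first difference decides.
print([1, 99] < [2, 0])   # True
```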

Logical Operators In Depth

Logical operators give you a way to combine or modify simple yes/no conditions in your code, much like how you might combine criteria when filtering data in Excel.

You can use logical operators to express conditions like:

“If a DNA sequence is both longer than 250 bases AND has no ambiguous bases, include it in the analysis”
“If a gene is either highly expressed OR shows significant differential expression, flag it for further study”
“If a sample is NOT properly labeled, skip it and log a warning”

These operators (and, or, not) work similarly to the way we combine conditions in everyday language. Just as you might say “I’ll go for a run if it’s not raining AND the temperature is above 60°F,” you can write code that makes decisions based on multiple criteria.

Here are a couple of examples:

# In a sequence quality filtering pipeline
#
# Both conditions must be true
if sequence_length >= 250 and quality_score >= 30:
    keep(sequence)

# In a variant calling pipeline
#
# Either condition being true is sufficient
if mutation_frequency > 0.01 or supporting_reads >= 100:
    report(variant)

# In a data validation step
#
# Triggers if the condition is false
if not sample_id.startswith('PROJ_'):
    warn_user(sample_id)

(If you try to run the above code, you will get errors, because we haven’t defined the functions keep, report, and warn_user. It’s just to illustrate the point.)
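
If you would like a version you can actually run, here is a sketch that stores each decision in a variable instead of calling those undefined functions. All of the values are hypothetical stand-ins for real pipeline data:

```python
# Hypothetical values standing in for real pipeline data.
sequence_length = 300
quality_score = 28
mutation_frequency = 0.002
supporting_reads = 120
sample_id = "TEST_001"

# and: both conditions must be true (quality_score is too low here).
keep_sequence = sequence_length >= 250 and quality_score >= 30
print(f"Keep sequence? {keep_sequence}")      # Keep sequence? False

# or: one true condition is enough (supporting_reads qualifies).
report_variant = mutation_frequency > 0.01 or supporting_reads >= 100
print(f"Report variant? {report_variant}")    # Report variant? True

# not: true when the ID lacks the expected prefix.
needs_warning = not sample_id.startswith("PROJ_")
print(f"Warn about sample? {needs_warning}")  # Warn about sample? True
```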

These operators are sort of like the digital equivalent of the decision-making process you use in the lab: checking multiple criteria before proceeding with an experiment, or having alternative procedures based on different conditions.

Behavior of logical operators

Let’s take a closer look at how Python’s logical operators (and, or, not) work. As I mentioned in the last section, these operators give us ways to check multiple conditions. In plain language:

and: Requires ALL criteria to be met (e.g., both proper staining AND correct cell count)
or: Accepts ANY of several criteria (e.g., either elevated temperature OR positive test result)
not: Reverses a condition (e.g., NOT contaminated)

Here’s a truth table showing all possible combinations.

A	B	A and B	A or B	not A
True	True	True	True	False
True	False	False	True	False
False	True	False	True	True
False	False	False	False	True

Here are the rules:

and only gives True if both conditions are True
or gives True if at least one condition is True
not flips True to False and vice versa
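You can confirm a few rows of the truth table yourself. This is just a quick sketch, and the variable names are only there to label each case.

```python
# and: True only when both sides are True
both_true = True and True
one_false = True and False

# or: True when at least one side is True
at_least_one = False or True
neither = False or False

# not: flips the value
flipped = not True

print(both_true, one_false)   # True False
print(at_least_one, neither)  # True False
print(flipped)                # False
```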

Python can also evaluate non-boolean values (values that aren’t strictly True or False) using these operators. We call values that Python treats as True “truthy” and values it treats as False “falsy”. This becomes important when working with different types of data in your programs or analysis pipelines.

Understanding “Truthy” and “Falsy” Values

In Python, every value can be interpreted as either “true-like” (truthy) or “false-like” (falsy) when used in logical operations. We do something similar in biology, where we might categorize results as “positive” or “negative” even though the underlying data is more complex than a simple yes/no answer.

Think of “falsy” values as representing empty, zero, or null states: the absence of meaningful data. Python considers the following values “falsy”:

False: The boolean False value
None: Python’s way of representing “nothing” or “no value” (e.g., like a blank entry in a spreadsheet)
Any form of zero (like 0, 0.0)
Empty containers:
- Empty string ("")
- Empty list ([])
- Empty set (set())
- Empty dictionary ({})

Everything else is considered “truthy”. That is, it represents the presence of some meaningful value or data.

I realize that the above is a bit abstract, so let’s check out a few examples to try and make it more clear. We can use Python’s built-in bool() function to explicitly check whether Python considers a value truthy or falsy.

Having no samples is falsy:

sample_count = 0
# False (no samples)
print(bool(sample_count))

False

Having an empty list of IDs is falsy:

sample_ids = []
# False (empty list of IDs)
print(bool(sample_ids))

False

Having an empty data table (represented as a dictionary) is falsy:

patient_data = {}
# False (empty data table)
print(bool(patient_data))

False

Having some samples is truthy:

sample_count = 5
# True (we have samples)
print(bool(sample_count))

True

Having some sample IDs is truthy:

sample_ids = ["A1", "B2"]
# True (we have some IDs)
print(bool(sample_ids))

True

Having some patient data is truthy:

patient_data = {"age": 45}
# True (we have some data)
print(bool(patient_data))

True

Just to repeat myself, the basic idea is: absence of data is falsy, presence of data is truthy.

Understanding truthy and falsy values becomes particularly useful when writing conditions in your code, like checking whether you have data before proceeding with analysis.

This example is sort of like saying, “If there are some sample IDs, then process them.”

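# process_samples is a placeholder for your own analysis function;
# it is not defined here, so this snippet is for illustration only.
if sample_ids:
    process_samples(sample_ids)
else:
    print("No samples to process")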

You will often see code that relies on the truthiness or falsiness of values, both “in the wild” and as we work through the course.
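One thing to watch out for: zero is falsy, but zero can also be a perfectly valid measurement. The sketch below uses a made-up variable, mapped_reads, to compare a plain truthiness check with an explicit comparison against None (written as `is not None`). Treat it as an illustration of the pitfall rather than a rule for every situation.

```python
# Hypothetical measurement: 0 mapped reads is a real result, not missing data
mapped_reads = 0

# Truthiness check: treats 0 the same as "no value"
if mapped_reads:
    truthiness_result = "have a value"
else:
    truthiness_result = "missing"

# Explicit check: only None counts as missing
if mapped_reads is not None:
    explicit_result = "have a value"
else:
    explicit_result = "missing"

print(truthiness_result)  # missing
print(explicit_result)    # have a value

# Non-empty strings and containers are truthy, whatever they hold
print(bool("0"))  # True
print(bool([0]))  # True
```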

Using `and` and `or` for Control Flow

One kind of neat thing about the logical operators is that you can directly use them as a type of control flow. (We talk more about control flow in the next section: Section 1.4.)

`and`

Given an expression a and b, the following steps are taken:

First, evaluate a.
If a is “falsy”, then return the value of a.
Otherwise, evaluate b and return its value.

Check it out:

a = "apple"
b = "banana"
result = a and b
print(result)

name = "Maya"
age = 45
result = age >= 18 and f"{name} is an adult"
print(result)

name = "Amira"
age = 15
result = age >= 18 and f"{name} is an adult"
print(result)

banana
Maya is an adult
False

`or`

Given an expression a or b, the following steps are taken:

First, evaluate a.
If a is “truthy”, then return the value of a.
Otherwise, evaluate b and return its value.

Let’s return to the previous example, but this time we will use or instead of and.

a = "apple"
b = "banana"
result = a or b
print(result)

name = "Maya"
age = 45
# Observe that this code isn't really doing what we want it to do.
# `result` will be True, rather than "Maya is an adult".
# That's because it should be using `and`
#   ...again, it's just for illustration.
result = age >= 18 or f"{name} is an adult"
print(result)

name = "Amira"
age = 15
# This code is a bit obscure, and you probably wouldn't
# write it like this in practice.  But it illustrates the
# point.
result = age >= 18 or f"{name} is not an adult"
print(result)

apple
True
Amira is not an adult

Short-Circuiting

The above behavior is known as short-circuiting. You can read a bit more about it in this section of The Python Tutorial.
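Here is a small sketch of why short-circuiting matters in practice. The variable names and the 0.9 threshold are made up for illustration. In the first example, the right-hand side of and is never evaluated, which protects us from dividing by zero. In the second, or supplies a fallback value when the left-hand side is falsy.

```python
# Hypothetical read counts; total_reads is zero on purpose
mapped_reads = 0
total_reads = 0

# Because total_reads > 0 is False, the division on the right is never run,
# so no ZeroDivisionError is raised.
safe_to_divide = total_reads > 0 and mapped_reads / total_reads > 0.9
print(safe_to_divide)  # False

# or returns the first truthy operand, which makes it handy for fallbacks
sample_label = ""
display_label = sample_label or "unlabeled"
print(display_label)  # unlabeled
```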

Tip

Here’s a fun tip. I knew that The Python Tutorial covered short-circuiting somewhere, but couldn’t remember which section it was in. So, I used this book’s search app to find it.

Try searching for “short circuit evaluation” and see what you find.

(Turns out it was in the section on Data Structures.)

1.4 Control Flow

Think of control flow as the decision-making logic in your code. Just as you make decisions in the lab (“if the pH is too high, add buffer”), your code needs to make decisions about how to handle different situations.

In this section, we’ll cover several ways to build these decision points into your code:

Simple if statements (like “if the sequence quality is low, skip it”)
if-else statements (like “if the gene is expressed, mark it as active; otherwise, mark it as inactive”)
if-elif-else chains (for handling multiple possibilities, like taking different actions depending on a range of p-values)
Nested conditions (for more complex decisions – we will see some examples of these too)

Control flow is essential for writing programs that can:

Handle different scenarios
Make decisions based on data
Conditionally process data
Respond to user input

Now, enough waffling, let’s see some code!

`if` Statements

An if statement is essentially a basic yes/no check. Here, we print a message if the quality score is above a certain threshold.

quality_score = 35

if quality_score > 30:
    print("Sample passes QC")

Sample passes QC

`if-else` Statements

Rather than doing nothing if the if condition does not hold, if-else statements handle two alternative outcomes.

expression_level = 1.5

if expression_level > 1.0:
    print("Gene is upregulated")
else:
    print("Gene is not upregulated")

Gene is upregulated

In that example, we wanted to give the user different information depending on whether the gene was upregulated or not.

`if-elif-else` Chains

These are used for handling multiple possibilities, like categorizing p-values or expression levels:

p_value = 0.03
if p_value < 0.01:
    print("Highly significant")
elif p_value < 0.05:
    print("Significant")
else:
    print("Not significant")

Significant

Multiple Conditions

Sometimes you need to check multiple criteria, like filtering sequencing data:

read_length = 100
gc_content = 0.45
quality_score = 35

if read_length >= 100 and quality_score > 30 and 0.4 <= gc_content <= 0.6:
    print("Read passes all quality filters")
else:
    print("Read filtered out")

Read passes all quality filters

Tip

Pay attention to this example…it may come in handy in your homework assignments!

Nested Conditional Statements

Conditional statements can also be nested. Here is some code that is checking if someone can go to the beach. If they are not at work, and the weather is sunny, then they can go to the beach.

Let’s look at a couple of different ways we might express this. Remember that plain English can be ambiguous, so there may be multiple ways we can interpret the task.

at_work = False
weather = "sunny"

if weather == "sunny" and not at_work:
    print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("We can't go to the beach today for some reason.")

It's sunny and you are not at work, let's go to the beach!

Now let’s nest the check for at_work inside the if statement that checks the weather. Note that this code isn’t equivalent to the previous code: it prints a different message for each reason you can’t go to the beach. Since we’re in the section about nested conditionals, it’s a good illustration of how restructuring the conditions can subtly change the meaning of the code!

if weather == "sunny":
    if at_work:
        print("You are at work and can't go to the beach.")
    else:
        print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("It's not sunny, so we can't go to the beach regardless.")

It's sunny and you are not at work, let's go to the beach!

Just so that it is very clear, let’s write code that behaves the same, but un-nested.

if weather == "sunny" and at_work:
    print("You are at work and can't go to the beach.")
elif weather == "sunny":
    print("It's sunny and you are not at work, let's go to the beach!")
else:
    print("It's not sunny, so we can't go to the beach regardless.")

It's sunny and you are not at work, let's go to the beach!

Tip 1.1: Stop & Think

Which of the above two examples do you think is clearer, and why?

General Control Flow Tips

Control flow provides a way to insert decision points into your code. Here are a few tips to keep in mind:

Conditions are checked in order from top to bottom
In an if-elif-else chain, only the first matching condition’s code block will execute
Keep your conditions clear and logical
Try to avoid deeply nested conditions as they can become confusing
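The first two tips are easiest to see with an example where the order of the conditions is wrong. This sketch reuses the p-value thresholds from earlier in this section. The variables broad_first and strict_first are made up to hold each result.

```python
p_value = 0.003

# Broad condition first: 0.003 < 0.05 matches, so the stricter branch never runs
if p_value < 0.05:
    broad_first = "Significant"
elif p_value < 0.01:
    broad_first = "Highly significant"
else:
    broad_first = "Not significant"

# Strict condition first: the most specific check gets the first chance
if p_value < 0.01:
    strict_first = "Highly significant"
elif p_value < 0.05:
    strict_first = "Significant"
else:
    strict_first = "Not significant"

print(broad_first)   # Significant
print(strict_first)  # Highly significant
```

When conditions overlap like this, putting the most specific one first is usually what you want.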

A Note on Keeping Things Simple

Just as you want to keep your experimental protocols clear and straightforward, the same principle applies to writing conditional statements in your code. Trying to follow along with a set of deeply nested if-statements is like trying to follow a complicated diagnostic flowchart: the more branches and decision points you add, the easier it is to lose track of where you are.

For example, imagine designing a PCR troubleshooting guide where each problem leads to three more questions, each with their own set of follow-up questions. While technically complete, it would be challenging for anyone to follow correctly. The same goes for code. When we stack too many decisions inside other decisions, we’re setting ourselves up for confusion.

Here’s why keeping conditions simple matters:

Each decision point is an opportunity for something to go wrong
Complex nested conditions are harder to debug
Simple, clear code is easier for you and your colleagues to review and understand

If you find yourself writing deeply nested conditions, it’s often a sign to step back and consider whether there’s a clearer way to structure your code.
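As one possible way to restructure, here is a sketch that gives each check a descriptive name and then combines them once. The values and thresholds are hypothetical, and this is only one of several reasonable approaches. Both versions produce the same result, but the second is easier to read at a glance.

```python
# Hypothetical read properties and thresholds, for illustration only
read_length = 150
quality_score = 36
gc_content = 0.52

# Deeply nested version: three levels of indentation to reach the result
if read_length >= 100:
    if quality_score > 30:
        if 0.4 <= gc_content <= 0.6:
            nested_result = "pass"
        else:
            nested_result = "fail"
    else:
        nested_result = "fail"
else:
    nested_result = "fail"

# Flatter version: name each check, then combine them once
long_enough = read_length >= 100
high_quality = quality_score > 30
gc_in_range = 0.4 <= gc_content <= 0.6

if long_enough and high_quality and gc_in_range:
    flat_result = "pass"
else:
    flat_result = "fail"

print(nested_result, flat_result)  # pass pass
```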

1.5 Basic Built-in Functions

Python’s built-in functions are your basic toolkit. They’re always there when you need them, no special setup required. You will use them a ton, and get to know them well!

Here are some commonly used built-in functions:

print(): Displays your data or results
len(): Counts the length of something
abs(): Gives you the absolute value
round(): Tidies up decimal numbers
min() and max(): Find the lowest and highest values
sum(): Adds up a collection of numbers
type(): Tells you what kind of data you’re working with (helpful for debugging)

Let’s check out some examples!

print()

# Printing experimental results
print("Gene expression analysis complete!")

Gene expression analysis complete!

len()

# Checking sequence length
dna_sequence = "ATCGATCGTAGCTAGCTAG"
length = len(dna_sequence)
print(f"This DNA sequence is {length} base pairs long.")

This DNA sequence is 19 base pairs long.

abs()

# Working with expression fold changes
fold_change = -2.5
absolute_change = abs(fold_change)
print(f"The absolute fold change is {absolute_change}x.")

The absolute fold change is 2.5x.

round()

# Cleaning up p-values
p_value = 0.0000234567
rounded_p = round(p_value, 6)
print(f"p-value = {rounded_p}")

p-value = 2.3e-05

min() and max()

# Analyzing multiple expression values
expression_levels = [10.2, 5.7, 8.9, 12.3, 6.8]
lowest = min(expression_levels)
highest = max(expression_levels)
print(f"Expression range: {lowest} to {highest}")

Expression range: 5.7 to 12.3

sum()

# Calculating average coverage
coverage_values = [15, 22, 18, 20, 17]
average_coverage = sum(coverage_values) / len(coverage_values)
print(f"Average sequencing coverage: {average_coverage}x")

Average sequencing coverage: 18.4x

type()

# Checking data types
gene_name = "nrdA"
data_type = type(gene_name)
print(f"The variable gene_name is of type: {data_type}")

The variable gene_name is of type: <class 'str'>

To use these functions, just type the function name followed by parentheses containing your data (the “arguments”). Some functions, like min() and max(), can handle multiple inputs, which is handy when comparing several values at once.
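Here is a short sketch of what “multiple inputs” looks like in practice. The pH readings are made-up values. It also shows that round() accepts an optional second argument for the number of decimal places.

```python
# min() and max() accept several separate values...
lowest = min(7.2, 6.8, 7.4)
highest = max(7.2, 6.8, 7.4)
print(lowest, highest)  # 6.8 7.4

# ...or a single collection such as a list
ph_readings = [7.2, 6.8, 7.4]
print(min(ph_readings), max(ph_readings))  # 6.8 7.4

# round() takes an optional second argument: the number of decimal places
print(round(3.14159))     # 3
print(round(3.14159, 2))  # 3.14
```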

1.6 Wrap-Up

In this chapter, we covered some fundamentals of Python programming that you’ll use throughout your bioinformatics work:

Variables help you store and manage data with meaningful names
Data types like numbers, strings, and booleans let you work with different kinds of data
Control flow statements help your programs make decisions based on data
Built-in functions provide essential tools for common programming tasks

Remember:

Choose clear, descriptive variable names
Be mindful of data types when performing operations
Keep conditional logic as simple as possible
Make use of Python’s built-in functions for common tasks

These basics form the foundation for more advanced programming concepts we’ll explore in future chapters. Practice working with these fundamentals! They are the tools you will come back to again and again.

Finally, don’t expect everything to click for you at once. Programming is a skill that develops over time with continued practice. Focus on understanding one concept at a time, and remember that you can always refer back to this chapter as a reference.

Next up, we’ll build on these basics to work with more complex data structures and write functions of our own!

1.7 Practice Problems

Take a look at Appendix A for some practice problems. Try working through them. Applying the concepts from this chapter is one of the most effective ways to learn!
