Modeling Epidemics¶
Due: Friday, October 12th at 4pm
The goal of this assignment is to give you practice with the basics of Python and to get you to think about how to translate a few simple algorithms into code. You will be allowed to work in pairs on some of the later assignments, but you must work alone on this assignment.
Epidemics and contagion are incredibly complex phenomena, involving both biological and social factors. Computer models, though imperfect, can offer insight into disease spread, and can represent infection with varying degrees of complexity.
The SIR epidemic model is simple but is commonly used. In the SIR model, a person can be in one of three states: Susceptible to the disease, Infected with the disease, or Recovered from the disease after infection (the model is named after these three states: S-I-R). In this model, we focus on a network of people, such as a community that could be experiencing an epidemic. Although simple, the SIR model captures that both social factors (like the shape of the network, e.g., how often people in the network interact with each other) and biological factors (like the probability and duration of infection) mediate disease spread.
In this assignment, you will write code to model a simplified version of the SIR epidemic model. Your code will model how infection spreads through a city over time, where time is measured in days. At a high level, your code will iteratively update the disease states in a city, keeping track of the disease states of each person within a city as needed until the end of the simulation. In addition, you will see how to use functions that build on one another to simplify a complex modeling process.
To begin building our model, we must specify the model’s details:
- the health of each person during the simulation, which we will call a person’s disease state,
- the starting state of a community of people or city,
- the neighbors of a given individual in a city,
- the transmission rules for disease spread within the city,
- the rules for acquiring immunity to disease,
- the method for keeping track of timing in a city,
- and the stopping conditions for the model.
We specify each of these details below.
Disease state: all people in the simulation can exist in one of three states, Susceptible, Infected, or Recovered.
- Susceptible: the individual is healthy but may become infected. We will use 'S' to represent susceptible individuals.
- Infected: the individual has an infection currently. We will represent these individuals as 'I1' or 'I0' (we expand on the meaning of these two values when we explain the immunity rules).
- Recovered: the individual has recovered from an infection and will be immune to the infection for the rest of the simulation. We represent these individuals with 'R'. Note that we will not be removing recovered people from our city.
Cities: a city in this simulation is represented as a list of people, each represented by a disease state. For example, a city of ['S', 'I1', 'R'] is composed of three people, the first of whom is susceptible, the second of whom is infected (and specifically, in their first day of infection), and the third of whom is recovered.
You can assume that every city has at least three people.
Neighbors: every person in our simplified model has exactly two neighbors, the person immediately before them in the list (known as their left neighbor) and the person immediately after them in the list (known as their right neighbor). The last and first people are also neighbors. It may be helpful to imagine each city as a ring being represented by a list to determine neighbors. For example, consider the following list of people: ['Mark', 'Sarah', 'Lorraine', 'Marshall']. Sarah has two neighbors: Mark and Lorraine. Likewise, Mark also has two neighbors: Marshall and Sarah. And even Marshall has two neighbors: Lorraine and Mark.
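The ring idea can be illustrated with modular arithmetic on list positions. The sketch below is only one possible way to compute neighbors; the function name ring_neighbors is hypothetical and is not part of the assignment's code.

```python
def ring_neighbors(people, position):
    """Return the (left, right) neighbors of people[position] on a ring."""
    n = len(people)
    # The % operator wraps positions that fall off either end of the list.
    left = people[(position - 1) % n]
    right = people[(position + 1) % n]
    return left, right


people = ['Mark', 'Sarah', 'Lorraine', 'Marshall']
print(ring_neighbors(people, 1))  # ('Mark', 'Lorraine')
print(ring_neighbors(people, 0))  # ('Marshall', 'Sarah')
print(ring_neighbors(people, 3))  # ('Lorraine', 'Mark')
```

In Python, (0 - 1) % 4 evaluates to 3, which is why the first person's left neighbor is the last person in the list.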
Transmission rules: infection spreads from infected people ('I1' or 'I0') to susceptible people ('S') based on infection rate r, the disease states of the susceptible person’s neighbors, and the person’s immune level.
- Infection rate r: infection rate r is a value between 0.0 and 1.0 that models how quickly a given infection spreads through a city. A high infection rate r indicates that the infection is highly contagious, while a low infection rate r indicates that the infection is not very contagious.
- Neighbors: a susceptible person will only become infected if at least one of their neighbors is infected.
You can think about infection transmission as being similar to flipping a weighted coin. If a susceptible person has at least one infected neighbor, we flip a coin to determine the person’s immune level. This value and the infection rate will be used to determine whether the susceptible person will get infected as well. It does not matter which neighbor (the left or right neighbor) is infected. Note that, in general, the coin will not be fair (unless r is 0.5). For example, an infection rate of 1.0 can be thought of as a coin that always lands on one side.
Immunity rules: For simplicity, we will assume that an infected person remains infected for exactly two days (in a more complex model, we could make this value a parameter of our model, to investigate how an epidemic progresses with shorter or longer infection times). When infected, an individual will be in the 'I1' state; the next day they will be in the 'I0' state; the day after that, they will recover from the infection and go into the 'R' state. At that point, they are immune to the disease and cannot become re-infected.
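As a rough illustration of this two-day progression, the following sketch follows one infected person forward in time. The lookup table NEXT_STATE and the function progression are hypothetical names chosen for this example, not part of the provided code.

```python
# Hypothetical lookup table: the state a non-susceptible person moves to
# on the following day.
NEXT_STATE = {'I1': 'I0', 'I0': 'R', 'R': 'R'}


def progression(state, days):
    """Follow a non-susceptible state forward for the given number of days."""
    history = [state]
    for _ in range(days):
        state = NEXT_STATE[state]
        history.append(state)
    return history


print(progression('I1', 3))  # ['I1', 'I0', 'R', 'R']
```

Once a person reaches 'R', they stay there, which matches the rule that recovered people cannot be re-infected.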
Stopping conditions: the simulation should stop after a given number of days or when simulating an additional day yields a city with only people who are susceptible ('S') or recovered ('R'), at which point the infection can no longer spread.
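One way to picture the two stopping conditions is the loop sketched below. It rests on two assumptions that the text does not spell out: the names run_until_done and advance_without_spread are hypothetical, and the day step used here is a deliberately simplified placeholder in which infections age but never spread.

```python
def advance_without_spread(city):
    """Placeholder day step: infections age, but nobody new gets infected."""
    next_state = {'S': 'S', 'I1': 'I0', 'I0': 'R', 'R': 'R'}
    return [next_state[person] for person in city]


def run_until_done(city, max_days, step=advance_without_spread):
    """Apply step once per day until max_days is reached or no one is infected."""
    days = 0
    while days < max_days:
        city = step(city)
        days += 1
        # Stop early once the newest city has no infected people left.
        if 'I1' not in city and 'I0' not in city:
            break
    return city, days


print(run_until_done(['I1', 'S', 'S'], 10))  # (['R', 'S', 'S'], 2)
print(run_until_done(['I1', 'S', 'S'], 1))   # (['I0', 'S', 'S'], 1)
```

The first call stops because the infection died out after two days; the second stops because the day limit was reached while someone was still infected.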
Getting started¶
In the first lab, you learned the basics of how to use git and our git server. We will use git for all the programming assignments and labs in this course.
We have seeded your repository with a directory for this assignment. To pick it up, change to your capp30121-aut-18-username directory (where the string username should be replaced with your username) and then run the command git pull upstream master. You should also run the command git pull to make sure your local copy of your repository is in sync with the server.
At the first lab, you already ran this command, and it pulled the pa1 sub-directory into your capp30121-aut-18-username directory. However, it is good practice to always run git pull upstream master before you start working, since we may occasionally update files (e.g., if we notice bugs in our code). For example, some of the files for this assignment may have changed since you downloaded the initial distribution. After you’ve run git pull upstream master, you can proceed as described in the lab: work on your code and then run git add <filename> for each file you change, followed by git commit -m "some message" and git push to upload your changes to the server before you log out.
You should always upload the latest version of your work to the server using the commands described above before you log out, then run git pull and git pull upstream master before you resume work to retrieve the latest files. This discipline will guarantee that you always have the latest version, no matter which machine you are using. Also, it will be easier for us to help you recover from git and chisubmit problems if you consistently push updates to the server.
As you will see below, we strongly encourage you to experiment with library functions and try out your own functions by hand in ipython3. Let’s get you set up to do that before we describe your tasks for this assignment. Open up a new terminal window and navigate to your pa1 directory. Then, fire up ipython3 from the Linux command-line, set up autoreload, and import your code as follows:
$ ipython3
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import sir
In [4]: import util
In [5]: import random
(Note: In [<number>] represents the ipython3 prompt. Your prompts may look different. Do not type the prompt when issuing commands.)
The commands %load_ext autoreload and %autoreload 2 tell ipython3 to reload your code automatically whenever it changes. We encourage you to use this package whenever you are developing and testing code.
Getting help¶
If, after carefully reading the details of any part of the assignment, you are still confused about how to get started or make progress:
- post a question on Piazza to ask for help, or
- come to office hours
Before you post a question on Piazza, please check to see if someone else has already asked the same question. We especially encourage you to check the “Must read posts for PA #1” post, which we will update over time to be a compendium of important questions and answers. Also, please read the pinned post on “Asking effective questions.” Finally, please add, commit, and push the most recent version of your code to the server (as described above) before you ask your question. Syncing your code will allow us to look at it, which may speed up the process of helping you.
Style¶
Following a consistent style is important because it makes it easier for others to read your code; imagine if you were collaborating on a large software project with 30 other developers, and everyone used their own style of writing code!
To help you understand what constitutes good style, we have put together a style guide for the course: Python Style Guide for Computer Science with Applications. We expect you to use good style (that is, style that matches this guide), and will take this expectation into account when grading.
For this assignment, you may assume that the input passed to your functions has the correct format. You may not change any of the input that is passed to your functions. In general, it is bad style to modify a data structure passed as input to a function, unless that is the explicit purpose of the function. Your function’s client might have other uses for the data they pass to your function and should not be surprised by unexpected changes.
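To make the point about not modifying inputs concrete, here is a small sketch that contrasts a function that overwrites its argument with one that builds a new list. Both function names are hypothetical and unrelated to the assignment's functions.

```python
def shout_in_place(names):
    """Bad style: overwrites the caller's list."""
    for i in range(len(names)):
        names[i] = names[i].upper()
    return names


def shout(names):
    """Better: builds and returns a new list; the input is left untouched."""
    return [name.upper() for name in names]


original = ['Mark', 'Sarah']
result = shout(original)
print(original)  # ['Mark', 'Sarah']
print(result)    # ['MARK', 'SARAH']
```

A caller who passes a list to shout can keep using that list afterwards without surprises, which is the behavior expected of the functions in this assignment.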
Your tasks¶
For this assignment, we will specify a set of functions that you must implement. You will start with basic functions and work your way up to more complex tasks. We will also supply extensive test code. Over the course of the term, we will provide less and less guidance on the appropriate structure for your code.
Task 1: Count the number of infected people in a city¶
In Python, it is common to write helper functions that encapsulate key definitions and are only a few lines long. Your first task is to complete one such function: count_infected.
This function should take a city as input and return the number of infected people in that city. For example, given city ['I0', 'I0', 'I1', 'S'], the function would return 3 (notice how we have to account for the fact that there are two infected states: 'I1' and 'I0'). On the other hand, given city ['S', 'R', 'S', 'S'], the function would return 0.
Testing for Task 1
We have provided an extensive suite of automated tests for this assignment. You may be tempted to do the following: write some code, run the automated tests to find a test that fails, modify your code, and then repeat the process until all of the tests pass. This is a very bad way to debug your code, because it typically takes much longer than taking a methodical step-by-step approach and often yields messy code that passes the tests without actually matching the specification of the problem. Instead, you should try your code out on some of the test cases by hand in ipython3 to get a sense of how your code works before you try the automated tests.
Here, for example, are some sample calls to count_infected:
In [6]: sir.count_infected(['I0', 'I0', 'I1', 'S'])
Out[6]: 3
In [7]: sir.count_infected(['S', 'R', 'S', 'S'])
Out[7]: 0
If you get the wrong answer for some sample input, stop to reason why your code is behaving the way it is and think about how to modify it to get the correct result. If you still can’t determine the problem after reasoning about the code, use print statements to print out key values.
Now on to the automated tests. The file test_sir.py contains automated test code for the tasks in this assignment. The names of the tests for a given function share a common pattern: a prefix that includes the word test_ and the function name followed by the test number. For example, the second test for count_infected is named test_count_infected_2.
For count_infected, we have provided four tests:
- test_count_infected_1: Cities with no infected people.
- test_count_infected_2: Cities with some 'I1' (but no 'I0') infected people.
- test_count_infected_3: Cities with some 'I0' (but no 'I1') infected people.
- test_count_infected_4: Cities with both 'I1' and 'I0' infected people.
Note that each test actually checks multiple cities but, within a given test, they are all of the same type (e.g., in Test 2, all the cities have at least one 'I1' person, and no 'I0' persons).
The reason we test for these four cases is to ensure that our tests have sufficient coverage, meaning that they account for as many different cases as possible in our code. For example, we could be tempted to write tests just for the following two cities:
['S', 'I0', 'I0', 'S']
['S', 'R', 'S', 'S']
However, what if we wrote a solution that forgot to account for the 'I1' state? The above tests would not cover that case.
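The coverage gap can be seen with a deliberately incomplete counter. The function count_only_i0 below is a hypothetical, intentionally buggy stand-in written for this illustration; it is not the reference solution.

```python
def count_only_i0(city):
    """Deliberately incomplete counter: ignores the 'I1' state."""
    return city.count('I0')


# Both of the cities above give the right answer despite the bug...
print(count_only_i0(['S', 'I0', 'I0', 'S']))   # 2
print(count_only_i0(['S', 'R', 'S', 'S']))     # 0
# ...but a city containing 'I1' exposes it (the correct count is 3).
print(count_only_i0(['I0', 'I0', 'I1', 'S']))  # 2
```

Because neither of the two candidate cities contains 'I1', a test suite built only from them would pass this buggy function.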
We will be using the pytest Python testing framework for this and subsequent assignments. To run our automated tests, you will use the py.test command from the Linux command line (not from within ipython3). We recommend opening a new terminal window for running this command, which will allow you to go back and forth easily between testing code by hand in ipython3 and running the test suite using py.test. (When we work on assignments, we usually have three windows open: one for editing, one for experimenting in ipython3, and one for running the automated tests.)
Pytest, which is available on both the lab machines and your VM, has many options. We’ll use three of them: -v, which means run in verbose mode, -x, which means that pytest should stop running tests after a single test failure, and -k, which allows you to describe a subset of the test functions to run. You can see the rest of the options by running the command py.test -h.
For example, running the following command from the Linux command-line:
$ py.test -v -x -k test_count_infected test_sir.py
will run the functions in test_sir.py that have names that start with test_count_infected until they’ve all been run or one fails. (Recall that the $ represents the prompt and is not included in the command.)
Here is (slightly modified) output from using this command to test our reference implementation of count_infected:
$ py.test -v -x -k test_count_infected test_sir.py
collected 47 items / 43 deselected
test_sir.py::test_count_infected_1 PASSED
test_sir.py::test_count_infected_2 PASSED
test_sir.py::test_count_infected_3 PASSED
test_sir.py::test_count_infected_4 PASSED
==== 4 passed, 43 deselected in 0.05 seconds ====
This output shows that our code passed all four tests in the test_count_infected suite. It also shows that there were 43 tests that were deselected (that is, were not run) because they did not match the test selection criteria specified by the argument to -k.
If you fail a test, pytest will tell you the name of the test function that failed and the line in the test code at which the failure was detected. This information can help you determine what is wrong with your program. Read it carefully to understand the test inputs and why the test failed! Then, switch back to testing your function in ipython3 until you have fixed the problem.
For example, if you wrote a solution that did not account for the 'I0' state, you would pass Tests 1 and 2, but would fail Test 3 like this:
test_sir.py::test_count_infected_1 PASSED
test_sir.py::test_count_infected_2 PASSED
test_sir.py::test_count_infected_3 FAILED
generated json report: ...
====================================== FAILURES ======================================
_______________________________ test_count_infected_3 ________________________________
def test_count_infected_3():
'''
Cities with some I0 (but no I1) infected people.
'''
> helper_count_infected(["I0", "S", "S", "S"], 1)
test_sir.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
city = ['I0', 'S', 'S', 'S'], expected = 1
def helper_count_infected(city, expected):
n = sir.count_infected(city)
if n != expected:
s = "City {} has {} infected people, but count_infected returned {}"
> pytest.fail(s.format(city, expected, n))
E Failed: City ['I0', 'S', 'S', 'S'] has 1 infected people, but count_infected returned 0
test_sir.py:89: Failed
!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!
================================ 43 tests deselected =================================
================= 1 failed, 2 passed, 43 deselected in 0.09 seconds ==================
The amount of output can be a bit overwhelming, but you should focus on two things:
- Towards the end of the output, there is a line starting with E. This will usually contain a helpful message telling you why the test failed:
Failed: City ['I0', 'S', 'S', 'S'] has 1 infected people, but count_infected returned 0
This information can help us narrow down the issue with our code. In particular, this error message suggests that we forgot to count the 'I0' people in the city.
- While the above error message is probably enough for us to fix the issue, sometimes we may need to dig a bit deeper by looking at the exact test that fails. pytest helpfully tells us that the test that failed is test_count_infected_3, and even includes part of the code for that test, so you don’t have to look it up in test_sir.py. The line that starts with a greater-than symbol (>) tells you the line in the test that failed:
helper_count_infected(["I0", "S", "S", "S"], 1)
helper_count_infected is a function in test_sir.py that calls count_infected with a given city, and checks whether the return value matches the expected value. You will see this use of helper functions frequently in our tests.
Note that, because we specified the -x option, pytest exited as soon as Test 3 failed (without running Test 4). Omitting the -x option makes sense when you want to get a sense of which tests are passing and which ones aren’t; however, when debugging your code, you should always use the -x option so that you can focus on one error at a time.
Finally, pytest will run any function that starts with test_. You can limit the tests that get run using the -k option along with a string that identifies the desired tests. The string is not required to be a prefix. For example, if you specify -k count, pytest will run all test functions that start with test_ and include the word count.
Also, by default, if you do not supply the name of a specific test file, pytest will look in the current directory tree for Python files that have names that start with test_.
In subsequent examples, we will leave out the name of the file with the test code (test_sir.py) and we will use short substrings to describe the desired tests. For example, the test above could have been run with the following command:
$ py.test -v -x -k count
Debugging suggestions and hints for Task 1
Remember to save any changes you make to your code in your editor as you are debugging. Skipping this step is a common error. Fortunately, we’ve eliminated another common error (forgetting to reload code after it changes) by using the autoreload package. (If you skipped the Getting started section, please go back and follow the instructions to set up autoreload and import random, sir, etc.)
Task 2: Is one of our neighbors infected?¶
Next, you will write a function called has_an_infected_neighbor that will determine whether a susceptible person at a given position in a list has at least one neighbor who is infected.
More specifically, given the city and the person’s position, your code will compute the positions of the specified person’s left and right neighbors in the city and determine whether either one is in an infected state.
When you look at the code, you will see that we included the following line:
assert city[position] == "S"
to verify that the function has been called on a person who is susceptible to infection. In general, assertions have the following form:
assert <boolean expression>
Assertions are a useful way to check that your code is receiving valid inputs: if the boolean expression specified as the assertion’s condition evaluates to False, the assertion will make the function fail with an AssertionError. Simple assertions can greatly simplify the debugging process by highlighting cases where a function is being called incorrectly.
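As a small, unrelated illustration of how an assertion behaves, consider the sketch below. The function halve is hypothetical and exists only to show what happens when an assertion's condition is False.

```python
def halve(n):
    """Hypothetical function that only makes sense for even inputs."""
    assert n % 2 == 0
    return n // 2


print(halve(10))  # 5

try:
    halve(7)
except AssertionError:
    # The assertion stopped the function before it could return a bad value.
    print("halve was called with an odd number")
```

In normal use you would not catch the AssertionError; letting the program stop is what points you at the incorrect call.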
Testing for Task 2
As in the previous task, we suggest you start by trying out your code in ipython3 before you run the automated tests. Here, for example, are some sample calls to has_an_infected_neighbor:
In [8]: sir.has_an_infected_neighbor(['I1', 'S', 'S'], 1)
Out[8]: True
In [9]: sir.has_an_infected_neighbor(['S', 'I1', 'I0'], 0)
Out[9]: True
In [10]: sir.has_an_infected_neighbor(['S', 'S', 'S'], 2)
Out[10]: False
In the first sample call, we want to check whether the susceptible person in position 1 has an infected neighbor. Since their left neighbor (at position 0) is infected, the result should be True.
The next call checks whether the susceptible person in position 0 has an infected neighbor. Both of this person’s neighbors (left at position 2, and right at position 1) are infected, so again the result should be True.
The third call checks the person at position 2. Neither of this person’s neighbors is infected, so the expected result is False.
The table below provides information about the tests for has_an_infected_neighbor. Each row contains a test number, the values that will be passed for the city and position arguments for that test, and the expected result.
Test | City | Position | Expected result |
---|---|---|---|
1 | [‘I1’, ‘S’, ‘S’] | 1 | True |
1 | [‘I0’, ‘S’, ‘S’] | 1 | True |
1 | [‘S’, ‘S’, ‘I0’] | 0 | True |
1 | [‘S’, ‘S’, ‘I1’] | 0 | True |
2 | [‘R’, ‘S’, ‘I0’] | 1 | True |
2 | [‘R’, ‘S’, ‘I1’] | 1 | True |
2 | [‘I1’, ‘S’, ‘S’] | 2 | True |
2 | [‘I0’, ‘S’, ‘S’] | 2 | True |
3 | [‘R’, ‘S’, ‘S’, ‘I1’] | 2 | True |
3 | [‘R’, ‘I1’, ‘S’, ‘S’] | 2 | True |
3 | [‘I1’, ‘S’, ‘S’, ‘S’] | 3 | True |
3 | [‘S’, ‘R’, ‘S’, ‘I1’] | 0 | True |
4 | [‘S’, ‘S’, ‘S’] | 0 | False |
4 | [‘R’, ‘S’, ‘R’] | 1 | False |
4 | [‘S’, ‘S’, ‘R’] | 0 | False |
4 | [‘I0’, ‘S’, ‘S’, ‘R’] | 2 | False |
5 | [‘S’, ‘I0’, ‘I1’] | 0 | True |
5 | [‘I1’, ‘S’, ‘I0’] | 1 | True |
5 | [‘I0’, ‘I1’, ‘S’] | 2 | True |
6 | Large city | 26 | True |
6 | Large city | 43 | True |
6 | Large city | 0 | False |
Test 1 (the first four rows) checks that you are correctly identifying an infected left neighbor. Test 2 checks that you are correctly identifying an infected right neighbor. Test 3 does the same checks on slightly larger cities. Test 4 checks cases where neither of the neighbors is infected. Test 5 checks cases where both neighbors are infected. And finally, Test 6 uses a large city that has infected people at positions 27 and 42.
You can run all of these tests by running the following command from the Linux command-line:
$ py.test -v -x -k neighbor
Debugging suggestions and hints for Task 2
There is a lot going on in this function and, when you are debugging, it can be helpful to know exactly what is happening inside the function. print statements are among the most intuitive ways to identify what your code is actually doing and will become your go-to debugging method.
If you are struggling to get started or to return the correct values from your function, consider the following debugging suggestions:
- Print the positions you calculated for the neighbors.
- Print the values you extracted for the neighbors.
- Make sure that you are returning, not printing, the desired value.
Is your code behaving as expected given these values?
Don’t forget to remove your debugging code (i.e., the print statements) before you submit your solution.
Task 3: Determine infection for a given person¶
Your next task is to complete the function gets_infected_at_position.
This function will determine whether someone at a given position in a list will become infected on the next day of the simulation. More specifically, given a city, a specified susceptible person’s location within that city, and an infection rate r, your code should:
- Determine whether the person has an infected neighbor.
- If and only if the person has an infected neighbor, compute the immunity level of the person and determine whether they will become infected.
- Return whether the person becomes infected as a boolean.
You must use your has_an_infected_neighbor function to determine whether a susceptible person has an infected neighbor. Do not repeat the logic for determining infection transmission from a neighbor in this function!
Earlier, we described infection transmission as being similar to flipping a weighted coin. In this function, if (and only if) the person has an infected neighbor, you will compute the person’s current immune level, a value between 0.0 and 1.0, that is the result of flipping that weighted coin. We will use a random number generator to obtain that value and, more specifically, you will call random.random(), a function that returns a random floating point number that is at least 0.0 and strictly less than 1.0. If the resulting immune level is less than the infection rate, the person will become infected. Another way to think about it is that having an immune level greater than or equal to the infection rate allows a person to fight off the infection.
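The weighted-coin comparison can be tried out in isolation. The sketch below is only an illustration of the idea; the function name flip and the seed value 0 are arbitrary choices for this example, and the function does not check neighbors at all.

```python
import random


def flip(rate):
    """Return True with probability rate (a weighted coin flip)."""
    immune_level = random.random()  # float in the range [0.0, 1.0)
    return immune_level < rate


print(flip(1.0))  # always True: every value in [0.0, 1.0) is below 1.0
print(flip(0.0))  # always False: no value is below 0.0

# With many flips, the fraction of True results approaches the rate.
random.seed(0)
trials = 10000
print(sum(flip(0.3) for _ in range(trials)) / trials)  # close to 0.3
```

This is why an infection rate of 1.0 behaves like a coin that always lands on the same side.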
Unfortunately, using a random number generator means that every call to gets_infected_at_position can produce a different result, even if we use the exact same parameters when calling the function. This is because each time a random number generator like random.random() is called, it returns a new random number. This complicates debugging because the sequence of random numbers generated will impact the simulation.
Fortunately, we can ensure that random.random() returns the same sequence of numbers when it is called by initializing it with a seed value. It is common to set the seed value for a random number generator when debugging. If we do not provide the random number generator with a seed, it will usually derive the seed from the time at which it is called.
Since many of our tests use the same seed (20170217), we have defined a constant, TEST_SEED, with this value in sir.py for your convenience.
Let’s try out setting the seed using the value of sir.TEST_SEED and then making some calls to the random number generator in ipython3:
In [11]: sir.TEST_SEED
Out[11]: 20170217
In [12]: random.seed(sir.TEST_SEED)
In [13]: random.random()
Out[13]: 0.48971492504609215
In [14]: random.random()
Out[14]: 0.23010566619210782
In [15]: random.seed(sir.TEST_SEED)
In [16]: random.random()
Out[16]: 0.48971492504609215
In [17]: random.random()
Out[17]: 0.23010566619210782
(If your attempt to try out these commands in ipython3 fails with a name error, you probably skipped the setup steps described in the Getting started section. Exit ipython3 and restart it following the instructions above.)
Notice that the third and fourth calls to random.random() generate exactly the same values as the first two calls. Why? Because we set the seed to the exact same value before the first and third calls.
This has another implication for testing our code: it is crucial that you only compute a person’s immune level when they have at least one infected neighbor. If you call the random number generator more often than necessary, your code may generate different answers than ours on subsequent tasks.
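The effect of an unnecessary call can be seen directly with the random module. This sketch uses a local constant SEED with the same value as the TEST_SEED described above, so that it runs without importing sir; the variable names are chosen only for this example.

```python
import random

SEED = 20170217  # same value as the TEST_SEED constant described in the text

random.seed(SEED)
first = random.random()
second = random.random()

# Same seed, but one unnecessary call is made before the value we care about.
random.seed(SEED)
random.random()            # the extra call consumes the first number
shifted = random.random()

print(shifted == second)   # True
print(shifted == first)    # False
```

After the extra call, the value that decides the next infection is the second number in the sequence rather than the first, so every later decision can differ from the expected one.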
Testing for Task 3
As in Tasks 1 and 2, we strongly encourage you to do some testing by hand in ipython3 before you start using the automated tests. Unlike previous tasks, you have to be careful to initialize the random seed before calling gets_infected_at_position, to make sure you get the expected results. For example:
In [18]: random.seed(sir.TEST_SEED)
In [19]: sir.gets_infected_at_position(['S', 'I1', 'I1'], 0, 0.5)
Out[19]: True
In [20]: random.seed(sir.TEST_SEED)
In [21]: sir.gets_infected_at_position(['S', 'I1', 'I1'], 0, 0.3)
Out[21]: False
The table below provides information about the automated tests for gets_infected_at_position. Each row contains a test number, the seed used to initialize the random number generator, the values that will be passed for the city, position, and infection_rate arguments for that test, and the expected result. The last column indicates who the infected neighbors are.
Test | Seed | City | Position | Infection rate | Expected result | Infected neighbor(s) |
---|---|---|---|---|---|---|
1 | 20170217 | [‘S’, ‘S’, ‘I1’] | 0 | 0.5 | True | Left |
2 | 20170217 | [‘S’, ‘S’, ‘I1’] | 0 | 0.2 | False | Left |
3 | 20170217 | [‘S’, ‘I1’, ‘S’] | 0 | 1.0 | True | Right |
4 | 20170217 | [‘S’, ‘I1’, ‘S’] | 0 | 0.2 | False | Right |
5 | 20170217 | [‘S’, ‘I1’, ‘I1’] | 0 | 1.0 | True | Both |
6 | 20170217 | [‘S’, ‘I1’, ‘I1’] | 0 | 0.2 | False | Both |
7 | 20170217 | [‘S’, ‘R’, ‘S’] | 0 | 1.0 | False | None |
8 | 20170217 | [‘I1’, ‘S’, ‘S’, ‘S’] | 2 | 1.0 | False | None |
There are an additional three tests that test multiple cases at once:
- Test 9: Similar to tests 1-8 but using position 1
- Test 10: Similar to tests 1-8 but using position 2
- Test 11: Tests all three positions of city ['S', 'S', 'S']
You can run these tests by executing the following command from the Linux command-line:
$ py.test -v -x -k position
Debugging suggestions and hints for Task 3
If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:
- Print the result you are getting from has_an_infected_neighbor.
- Print the immune level (if needed).
- Make sure that you are making the right number of calls to random.random (zero or one).
- When testing in ipython3, ensure that you have reset the seed for the random number generator before each test call to gets_infected_at_position.
Task 4: Move the simulation forward a single day¶
Your fourth task is to complete the function simulate_one_day. In layman’s terms, this function will model one day in a simulation and will act as a helper function to run_simulation. More concretely, simulate_one_day should take the city’s state at the start of the day and the infection rate r, and return a new list of disease states (i.e., the state of the city after one day).
To do this work, you will iterate over all possible locations in the city. If the person at a location is:
- Susceptible ('S'): you need to determine whether they will become infected ('I1') or remain susceptible ('S') for another day using your gets_infected_at_position function.
- Infected ('I1', 'I0'): people in state 'I1' go to state 'I0', and people in state 'I0' go to state 'R'.
- Recovered ('R'): people in this state remain in that state.
As an example, consider the following call to simulate_one_day:
In [22]: sir.simulate_one_day(['I0', 'I1', 'R'], 0.3)
Out[22]: ['R', 'I0', 'R']
Since this city doesn’t have any susceptible people, the states just advance automatically according to the rules we described above ('I0' goes to 'R', 'I1' goes to 'I0', and 'R' stays the same).
However, when you do encounter a susceptible person, you will need to call gets_infected_at_position, which will involve calls to the random number generator. So, when testing simulate_one_day in ipython3, you must make sure to reset the seed (random.seed()) between calls to simulate_one_day.
For example:
In [23]: random.seed(sir.TEST_SEED)
In [24]: sir.simulate_one_day(['I0', 'I1', 'S'], 0.5)
Out[24]: ['R', 'I0', 'I1']
In this case, the first two positions advanced the same way as before,
but the third position required a call to gets_infected_at_position
(which, in this particular case, with this random seed, will return True).
On the other hand, if we use a lower infection rate, the third position
is not infected:
In [25]: random.seed(sir.TEST_SEED)
In [26]: sir.simulate_one_day(['I0', 'I1', 'S'], 0.1)
Out[26]: ['R', 'I0', 'S']
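To see why resetting the seed matters, here is a small standalone sketch that uses only the standard random module. The constant SEED is an assumption: it reuses the seed value listed in the test tables rather than importing sir.TEST_SEED.

```python
import random

SEED = 20170217  # the seed value listed in the test tables

random.seed(SEED)
first = [random.random() for _ in range(3)]

random.seed(SEED)  # reset before the next "test call"
second = [random.random() for _ in range(3)]

# No reset here: the generator simply continues its sequence.
third = [random.random() for _ in range(3)]

print(first == second)  # True: same seed, same draws
print(second == third)
```

Without the reset, a second call to your function would see different random draws than the first, which makes results hard to compare against the expected values.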
Testing for Task 4
The table below provides information about the tests for
simulate_one_day. Each row contains a test number, the seed, the
values that will be passed for the city and infection_rate
arguments for that test, and the expected result.
Test | Seed | City | Infection rate | Result |
---|---|---|---|---|
1 | 20170217 | [‘I1’, ‘I1’, ‘I1’] | 0.0 | [‘I0’, ‘I0’, ‘I0’] |
2 | 20170217 | [‘I0’, ‘I0’, ‘I0’] | 0.0 | [‘R’, ‘R’, ‘R’] |
3 | 20170217 | [‘R’, ‘R’, ‘R’] | 0.0 | [‘R’, ‘R’, ‘R’] |
4 | 20170217 | [‘I1’, ‘I1’, ‘S’] | 0.2 | [‘I0’, ‘I0’, ‘S’] |
5 | 20170217 | [‘I1’, ‘I1’, ‘S’] | 0.5 | [‘I0’, ‘I0’, ‘I1’] |
6 | 20170217 | [‘I0’, ‘I0’, ‘S’] | 0.5 | [‘R’, ‘R’, ‘I1’] |
7 | 20170217 | [‘S’, ‘I0’, ‘S’] | 0.9 | [‘I1’, ‘R’, ‘I1’] |
8 | 20170217 | [‘S’, ‘I0’, ‘S’] | 0.3 | [‘S’, ‘R’, ‘I1’] |
9 | 20170217 | [‘S’, ‘S’, ‘S’] | 1.0 | [‘S’, ‘S’, ‘S’] |
The first three tests check that the rules that involve no randomness are being applied correctly. Notice how, in this case, the infection rate has no effect on the outcome, and we arbitrarily set it to 0.0.
The fourth, fifth, and sixth tests have one susceptible person (who does not become infected in the fourth test, but does become infected in the fifth and sixth tests). The seventh and eighth tests have two susceptible persons (both become infected in the seventh test, only one in the eighth test). The ninth test has no infected people, which means no susceptible persons should become infected (even if we use an infection rate of 1.0). Notice that these tests not only verify basic functionality but also examine trickier edge cases. For example, the sixth test checks that the susceptible person becomes infected even when the infected neighbors recover on the same day.
You can run these tests by executing the following command from the Linux command-line:
$ py.test -v -x -k one
Debugging suggestions for Task 4
If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:
- Use simple infection rates that will not rely on the random number generator (like 0.0 and 1.0) to verify that the states change as expected.
- Print out each person’s old and new disease states. Ensure that the new disease states are correct in all cases.
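For the second suggestion, a small helper can make the before-and-after comparison easier to read. This is just one possible sketch; describe_changes is a hypothetical name, not part of the provided code.

```python
def describe_changes(old_city, new_city):
    """Return one line per person showing the old and new disease state."""
    lines = []
    for position, (old, new) in enumerate(zip(old_city, new_city)):
        marker = "" if old == new else "  <- changed"
        lines.append(f"position {position}: {old} -> {new}{marker}")
    return lines


for line in describe_changes(["I0", "I1", "S"], ["R", "I0", "I1"]):
    print(line)
```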
Detour: data files¶
Before moving on to your next task, we need to take a short detour and
talk about the parameter files used to test the remaining tasks.
These files can be found in the configs subdirectory of your
pa1 directory and are numbered. As an example, the file
1.json (json is pronounced jay-sawn) contains the starting
parameters for simulation 1:
{
"starting_state" : ["S", "I1", "I1"],
"starting_seed" : 20170217,
"max_num_days" : 100,
"infection_rate" : 0.5,
"num_trials" : 10
}
Here, the starting_state refers to the initial disease state of
the city, the random_seed is the seed for the random number
generator, max_num_days refers to the maximum number of days to
simulate, infection_rate refers to the infection rate of the
simulation, and num_trials is the number of trials to run, which
will be used in Task 6.
You can view the contents of a text file using the Linux
commands cat, more, or less. For example, you can run:
$ cat configs/1.json
to see the contents of the file configs/1.json.
We have provided code in util.py that will read a configuration
file and return the parameters for a simulation. You can use the
function util.get_config() to extract the parameters for
each trial from the JSON files provided, as shown below:
In [27]: starting_state, random_seed, max_num_days, \
...: infection_rate, num_trials = util.get_config('./configs/1.json')
In [28]: starting_state
Out[28]: ['S', 'I1', 'I1']
In [29]: infection_rate
Out[29]: 0.5
In [30]: random_seed
Out[30]: 20170217
Once you have loaded these values, you can pass them to a function as needed. For example:
In [31]: random.seed(random_seed)
In [32]: sir.simulate_one_day(starting_state, infection_rate)
Out[32]: ['I1', 'I0', 'I0']
For the remaining tasks, we will list the parameter filename rather than the values of the parameters for each test.
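If you are curious about how a JSON file turns into Python values, the sketch below parses the text of configs/1.json with the standard json module. It is not the implementation of util.get_config, which you should keep using; the file contents are inlined as a string (an assumption made so the sketch runs without the file on disk).

```python
import json

# The contents of configs/1.json as shown above, inlined so the sketch
# runs without the file on disk.
CONFIG_TEXT = """
{
  "starting_state" : ["S", "I1", "I1"],
  "starting_seed" : 20170217,
  "max_num_days" : 100,
  "infection_rate" : 0.5,
  "num_trials" : 10
}
"""

config = json.loads(CONFIG_TEXT)  # a JSON object becomes a Python dict
print(config["starting_state"])   # a JSON array becomes a Python list
print(config["infection_rate"])   # JSON numbers become int or float
print(sorted(config))             # the parameter names in the file
```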
Task 5: Run the simulation¶
Your fifth task is to complete the function run_simulation, which
takes the starting state of the city, the random seed, the maximum
number of days to simulate (max_num_days), and the infection rate
as arguments and returns both the final state of the city and the
number of days simulated as a tuple.
To clarify:
- Your function should run one whole simulation, including setting the seed exactly once before simulating any days.
- Your simulation should start on day 0 and count the number of days simulated. For example, if your simulation starts on day 0 and reaches the stopping conditions after simulating one day, it should return 1 as the number of days. On the other hand, if your simulation starts on day 0 and runs for day 0 and day 1, it should return 2 as the number of days simulated.
- Recall that there are two stopping conditions for this simulation: that max_num_days days have passed or that no one in the city is infected after simulating a given day. You should use the count_infected function from Task 1 to check the second stopping condition, and you should check this condition after you simulate a day (not before). Thus, as long as max_num_days is greater than zero, you should always simulate at least one day, even if no person in the city is infected at the start of the simulation.
Here is an example use of this function:
In [33]: starting_state, random_seed, max_num_days, \
...: infection_rate, num_trials = util.get_config('./configs/2.json')
In [34]: sir.run_simulation(starting_state, random_seed, max_num_days, infection_rate)
Out[34]: (['R', 'R', 'R'], 3)
Notice that our sample use did not include a call to set the random
seed. Your run_simulation function should set the seed based on the
random seed parameter, so you will not need to reset the seed
manually to test this function.
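The seeding and stopping rules described above can be sketched as a loop. This is an illustration under stated assumptions rather than the expected solution: run_until_done and recover_only are hypothetical names, the one-day step is passed in as a parameter so the sketch is self-contained, and the infected check is written inline where your code should use count_infected.

```python
import random


def run_until_done(city, seed, max_num_days, one_day):
    """Sketch of the Task 5 loop; one_day stands in for simulate_one_day."""
    random.seed(seed)  # set the seed exactly once, before any day runs
    days = 0
    while days < max_num_days:
        city = one_day(city)
        days += 1
        # Check the second stopping condition only after simulating a day.
        if not any(state in ("I0", "I1") for state in city):
            break
    return city, days


def recover_only(city):
    """Hypothetical one-day step with no new infections (like rate 0.0)."""
    step = {"I1": "I0", "I0": "R"}
    return [step.get(state, state) for state in city]


print(run_until_done(["S", "I1", "I1"], 20170217, 100, recover_only))
# (['S', 'R', 'R'], 2)
```

Because the check happens after each simulated day, a city with no infected people still reports one day simulated whenever max_num_days is greater than zero.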
Testing Task 5
We have provided six tests for this task. The first three test basic functionality and the last three explicitly check stopping conditions.
Test | Configuration file | Expected Result: city, number of days simulated |
---|---|---|
1 | 3.json | [‘S’, ‘R’, ‘R’], 3 |
2 | 4.json | [‘S’, ‘S’, ‘R’], 2 |
3 | 5.json | [‘R’, ‘S’, ‘R’, ‘R’, ‘S’], 2 |
4 | 6.json | [‘R’, ‘R’, ‘R’], 2 |
5 | 7.json | [‘R’, ‘S’, ‘S’], 1 |
6 | 8.json | [‘R’, ‘R’, ‘R’, ‘I0’, ‘I1’, ‘S’, ‘S’], 3 |
You can run these tests by executing the following command from the Linux command-line.
$ py.test -v -x -k run_simulation
Debugging hints for Task 5
If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:
- If your function returns one fewer or one more day than our test function, please reread the directions and ensure that you are counting the days properly.
- If you are generating the wrong final state for the city, try printing the day (0, 1, 2, etc.), the disease states before the call to simulate_one_day, and the disease states after the call to simulate_one_day.
From this point on, we will not be providing explicit debugging hints. In general, it is a good idea to use print statements to uncover what your code is doing.
Task 6: Determining average infection spread¶
Your sixth task is to complete the function
compute_average_num_infected, which computes the average number of
infected people over num_trials trials for a given city and infection rate. This
function takes the starting state of the city, the random seed,
the maximum number of days to simulate, the infection rate,
and the number of trials to run as arguments and
returns the average number of infected people over the num_trials trials. The number of
infected people per trial is simply the number of people who end the
simulation in state 'I1', 'I0', or 'R'. The average is then the total
number of people infected across all trials divided by the number of trials.
Each time you run a simulation, you should increase the random seed by 1. It is important that you carefully increment your random seed. If you forget to increment your seed, all trials will be identical, and if you increment your seed differently than specified, your code may not pass our tests.
Your implementation should call run_simulation, which sets the
seed, so unlike some of the earlier tasks, you do not need to call
random.seed before running this function.
Here’s a sample use of this function:
In [35]: starting_state, random_seed, max_num_days, infection_rate, num_trials = \
...: util.get_config('./configs/3.json')
In [36]: starting_state
Out[36]: ['S', 'S', 'I1']
In [37]: num_trials
Out[37]: 5
In [38]: sir.compute_average_num_infected(starting_state,
...: random_seed, max_num_days, infection_rate, num_trials)
Out[38]: 2.4
How did the function arrive at an average of 2.4 infected people? We first consider the number of infected people per simulation. Notice how the random seed changes for each trial, while the starting state is the same for every trial.
Simulation number | Seed | Starting state for simulation run | Final state for simulation run | Number of people infected (Ix + R) |
---|---|---|---|---|
1 | 20170217 | [‘S’, ‘S’, ‘I1’] | [‘S’, ‘R’, ‘R’] | 2 |
2 | 20170218 | [‘S’, ‘S’, ‘I1’] | [‘R’, ‘R’, ‘R’] | 3 |
3 | 20170219 | [‘S’, ‘S’, ‘I1’] | [‘R’, ‘S’, ‘R’] | 2 |
4 | 20170220 | [‘S’, ‘S’, ‘I1’] | [‘R’, ‘R’, ‘R’] | 3 |
5 | 20170221 | [‘S’, ‘S’, ‘I1’] | [‘S’, ‘R’, ‘R’] | 2 |
Then, we must average the number of people infected per trial to arrive at: \(12/5 = 2.4\).
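The arithmetic in this worked example can be reproduced directly from the table. The sketch below does not run any simulation: it takes the five final states listed above as given and only shows the seed increments, the per-trial counts, and the average.

```python
# Final states of the five trials in the table above.
final_states = [
    ["S", "R", "R"],
    ["R", "R", "R"],
    ["R", "S", "R"],
    ["R", "R", "R"],
    ["S", "R", "R"],
]

starting_seed = 20170217
# Each trial uses the previous trial's seed plus one.
seeds = [starting_seed + trial for trial in range(len(final_states))]

# Anyone who ends in 'I1', 'I0', or 'R' was infected at some point.
ever_infected = [
    sum(state in ("I1", "I0", "R") for state in city) for city in final_states
]

average = sum(ever_infected) / len(final_states)
print(seeds[-1])      # 20170221
print(ever_infected)  # [2, 3, 2, 3, 2]
print(average)        # 2.4
```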
Testing Task 6
We have provided four tests for this task. The first can be checked easily with print statements, the second and third test basic functionality, and the final one tests an edge case (num_trials = 1).
Test | JSON file | Number of trials (N) | Average number infected |
---|---|---|---|
1 | 9.json | 20 | 4.0 |
2 | 10.json | 10 | 4.6 |
3 | 11.json | 100 | 23.03 |
4 | 12.json | 1 | 8.0 |
You can run these tests by executing the following command from the Linux command-line.
$ py.test -v -x -k average
Task 7: Determining the impact of infection rate¶
Your seventh and final task is to complete the function
infection_rate_param_sweep, which shows how the number of people
infected over a specified number of trials varies with infection rate.
This function takes the starting state, random seed, the maximum
number of days to simulate, a list of infection rates, and a number of
trials as parameters and returns a list of the average number of
people infected for each infection rate. Your function should iterate
through the list of infection rates and call
compute_average_num_infected for each infection rate. You should
store the resulting averages in a list that your function will return.
We have provided a list of infection rates in sir.py called
INFECTION_RATE_LIST. You may find this list useful while you are
writing and testing your function. In addition, you may find it
helpful to consider edge cases of infection rates. For example, at an
infection rate of 0, the disease will never spread from those infected
to those susceptible. Similarly, at an infection rate of 1, the
disease will spread each day from those infected to all of their
susceptible neighbors.
Here is an example use of this function:
In [39]: starting_state, random_seed, max_num_days, infection_rate, num_trials = \
...: util.get_config('./configs/3.json')
In [40]: num_trials
Out[40]: 5
In [41]: sir.INFECTION_RATE_LIST
Out[41]: [0, 0.25, 0.5, 0.75, 1.0]
In [42]: sir.infection_rate_param_sweep(starting_state,
...: random_seed, max_num_days, sir.INFECTION_RATE_LIST, num_trials)
Out[42]: [1.0, 1.6, 2.8, 3.0, 3.0]
Recall that the infection rate measures how contagious a given disease is. A higher infection rate, then, should increase the average number of people infected in a city over a number of trials. As we would expect, the number of people infected per trial increases as the infection rate increases.
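The shape of a parameter sweep is one result per input rate, in the same order. The sketch below shows only that shape: sweep and fake_average are hypothetical names, and fake_average uses a made-up formula in place of compute_average_num_infected (which, in your code, would also receive the starting state, seed, maximum number of days, and number of trials).

```python
def sweep(infection_rates, average_for_rate):
    """Return one average per infection rate, in the same order."""
    return [average_for_rate(rate) for rate in infection_rates]


def fake_average(rate):
    """Hypothetical stand-in for compute_average_num_infected (made-up formula)."""
    return 1.0 + 2.0 * rate


print(sweep([], fake_average))             # []
print(sweep([1.0], fake_average))          # [3.0]
print(sweep([0, 0.5, 1.0], fake_average))  # [1.0, 2.0, 3.0]
```

Trying an empty list, a one-element list, and a longer list is a quick way to confirm that the returned list always has the same length as the list of rates.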
Testing Task 7
We have provided seven tests for this task. The first five tests use
the infection rate list defined by sir.INFECTION_RATE_LIST. Of
these, the first three end quickly and can be checked using print
statements. The next two use much bigger values for the number of
trials to run and cannot be computed by hand easily. The last two
tests check your code on corner cases for the infection rate list: the
sixth test uses [] as the infection rate list, while the seventh
test uses [1.0]. (“Zero, one, many” is a good rule of thumb when
testing a function that works on lists. That is, in most cases, you
should test such functions on a list of size zero, a list of size one,
and a list with many entries.)
Test | JSON file | Average number infected list |
---|---|---|
1 | 13.json | [2.0, 2.4, 3.5, 4.3, 5.0] |
2 | 14.json | [2.0, 2.4, 3.7, 4.6, 5.0] |
3 | 15.json | [2.0, 2.45, 2.77, 2.945, 3.0] |
4 | 16.json | [16.0, 21.824, 29.252, 35.078, 37.0] |
5 | 17.json | [38.0, 72.7, 106.2, 180.0, 397.0] |
6 | 1.json | [] |
7 | 1.json | [3.0] |
You can run these tests by executing the following command from the Linux command-line.
$ py.test -v -x -k sweep
Putting it all together¶
We have included code in sir.py that calls your functions to
compute and then analyze various simulations. It takes a single
argument: the name of the input starting state JSON file.
Here is a sample use of this program:
$ python3 sir.py ./configs/1.json
and here is the output that it should return:
Running initial simulation...
The starting state of the simulation was ['S', 'I1', 'I1'].
The final state of the simulation is ['R', 'R', 'R'].
The simulation ended after day 3.
Running multiple trials...
Over 10 trial(s), on average, 2.8 people were infected
Varying infection parameter...
Rate | Infected
0.0 | 2.00
0.2 | 2.40
0.5 | 2.80
0.8 | 3.00
1.0 | 3.00
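You may notice that the rates in this output read 0.2 and 0.8, while the list shown earlier for sir.INFECTION_RATE_LIST contains 0.25 and 0.75. One plausible explanation, offered here as an assumption about the provided code rather than a description of it, is that the rates are printed with one decimal place. The sketch below reproduces the rows above under that assumption, using the averages from the sample output.

```python
rates = [0, 0.25, 0.5, 0.75, 1.0]     # the list shown earlier for sir.INFECTION_RATE_LIST
averages = [2.0, 2.4, 2.8, 3.0, 3.0]  # values copied from the sample output above

# Assumed formatting: one decimal place for the rate, two for the average.
rows = [f"{rate:.1f} | {average:.2f}" for rate, average in zip(rates, averages)]

print("Rate | Infected")
for row in rows:
    print(row)
```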
Grading¶
Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our PA Rubric page.)
The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:
- Completeness: 75%
- Correctness: 15%
- Design: 0%
- Style: 10%
Obtaining your test score¶
The completeness part of your score will be determined using automated
tests. To get your score for the automated tests, simply run the
following from the Linux command-line. (Remember to leave out the
$ prompt when you type the command.)
$ py.test
$ ../common/grader.py
Notice that we’re running py.test without the -k or -x
options: we want it to run all the tests. If you’re still failing
some tests, and don’t want to see the output from all the failed
tests, you can add the --tb=no option when running py.test:
$ py.test --tb=no
$ ../common/grader.py
Keep in mind that the grader.py program looks at the
results of the last time you ran py.test, so if you make any
changes to your code, you need to re-run py.test before running
the grader. You can also run py.test followed by the grader on
one line:
$ py.test --tb=no; ../common/grader.py
After running the above, you should see something like this (of course, your actual scores may be different!):
Category Passed / Total Score / Points
----------------------------------------------------------------------------------------------------
Task 1: Count the number of infected people in a city 2 / 4 2.50 / 5.00
Task 2: Is one of our neighbors infected? 6 / 6 7.50 / 7.50
Task 3: Determine infection for a given person 11 / 11 7.50 / 7.50
Task 4: Move the simulation forward a single day 9 / 9 15.00 / 15.00
Task 5: Run the simulation 2 / 6 5.00 / 15.00
Task 6: Determining average infection spread 1 / 4 3.75 / 15.00
Task 7: Determining the impact of infection rate 1 / 7 1.43 / 10.00
----------------------------------------------------------------------------------------------------
TOTAL = 42.68 / 75
====================================================================================================
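The numbers in this sample output are consistent with each task's score being the fraction of tests passed multiplied by the points available for that task. The sketch below checks that reading against the sample; it is an inference from the table, not a description of how grader.py is implemented.

```python
# (tests passed, total tests, points available) for Tasks 1-7,
# copied from the sample output above.
results = [
    (2, 4, 5.0),
    (6, 6, 7.5),
    (11, 11, 7.5),
    (9, 9, 15.0),
    (2, 6, 15.0),
    (1, 4, 15.0),
    (1, 7, 10.0),
]

# Assumed rule: score = fraction of tests passed * points available.
scores = [passed / total * points for passed, total, points in results]
total_points = sum(points for _, _, points in results)

print([f"{score:.2f}" for score in scores])
print(f"TOTAL = {sum(scores):.2f} / {total_points:.0f}")  # TOTAL = 42.68 / 75
```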
Submission¶
To submit your assignment, make sure that you have:
- put your name at the top of your file,
- registered for the assignment using chisubmit (if you have not done so already),
- added, committed, and pushed your code to the git server, and
- run the chisubmit submission command.
Here are the relevant commands to run on the Linux command-line.
(Remember to leave out the $ prompt when you type the command.)
$ chisubmit student assignment register pa1
$ git add sir.py
$ git commit -m"final version of PA #1 ready for submission"
$ git push
$ chisubmit student assignment submit pa1
We recommend copying and pasting these commands rather than re-typing them!
Remember to push your code to the server early and often!
Acknowledgments: This assignment was inspired by a discussion of the SIR model in the book Networks, Crowds, and Markets by Easley and Kleinberg. Emma Nechamkin wrote the original version of this assignment.