Modeling Epidemics

Due: Friday, October 12th at 4pm

The goal of this assignment is to give you practice with the basics of Python and to get you to think about how to translate a few simple algorithms into code. You will be allowed to work in pairs on some of the later assignments, but you must work alone on this assignment.

Epidemics and contagion are incredibly complex phenomena, involving both biological and social factors. Computer models, though imperfect, can offer insight into disease spread, and can represent infection with varying degrees of complexity.

The SIR epidemic model is simple but is commonly used. In the SIR model, a person can be in one of three states: Susceptible to the disease, Infected with the disease, or Recovered from the disease after infection (the model is named after these three states: S-I-R). In this model, we focus on a network of people, such as a community that could be experiencing an epidemic. Although simple, the SIR model captures that both social factors (like the shape of the network, e.g., how often people in the network interact with each other) and biological factors (like the probability and duration of infection) mediate disease spread.

In this assignment, you will write code to model a simplified version of the SIR epidemic model. Your code will model how infection spreads through a city over time, where time is measured in days. At a high level, your code will iteratively update the disease states in a city, keeping track of the disease states of each person within a city as needed until the end of the simulation. In addition, you will see how to use functions that build on one another to simplify a complex modeling process.

To begin building our model, we must specify the model’s details:

  • the health of each person during the simulation, which we will call a person’s disease state,
  • the starting state of a community of people or city,
  • the neighbors of a given individual in a city,
  • the transmission rules for disease spread within the city,
  • the rules for acquiring immunity to disease,
  • the method for keeping track of timing in a city,
  • and the stopping conditions for the model.

We specify each of these details below.

Disease state: all people in the simulation can exist in one of three states, Susceptible, Infected, or Recovered.

  • Susceptible: the individual is healthy but may become infected. We will use 'S' to represent susceptible individuals.
  • Infected: the individual has an infection currently. We will represent these individuals as 'I1' or 'I0' (we expand on the meaning of these two values when we explain the immunity rules).
  • Recovered: the individual has recovered from an infection and will be immune to the infection for the rest of the simulation. We represent these individuals with 'R'. Note that we will not be removing recovered people from our city.

Cities: a city in this simulation is represented as a list of people, each represented by a disease state. For example, a city of ['S', 'I1', 'R'] is composed of three people, the first of whom is susceptible, the second of whom is infected (and specifically, in their first day of infection), and the third of whom is recovered.

You can assume that every city has at least three people.

Neighbors: every person in our simplified model has exactly two neighbors, the person immediately before them in the list (known as their left neighbor) and the person immediately after them in the list (known as their right neighbor). The last and first people are also neighbors. It may be helpful to imagine each city as a ring being represented by a list to determine neighbors. For example, consider the following list of people: ['Mark', 'Sarah', 'Lorraine', 'Marshall']. Sarah has two neighbors: Mark and Lorraine. Likewise, Mark also has two neighbors: Marshall and Sarah. And, even Marshall has two neighbors: Lorraine and Mark.
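The ring idea above can be sketched in a few lines, assuming we use Python’s modulo operator to wrap around the ends of the list. The helper name neighbor_positions is hypothetical and is not part of the code we distribute; it only illustrates the arithmetic.

```python
def neighbor_positions(position, size):
    """Return the (left, right) positions of a person's neighbors in a ring.

    Hypothetical helper for illustration; not part of sir.py.
    """
    left = (position - 1) % size   # position 0 wraps around to the last person
    right = (position + 1) % size  # the last position wraps around to 0
    return left, right


people = ['Mark', 'Sarah', 'Lorraine', 'Marshall']

for i, name in enumerate(people):
    left, right = neighbor_positions(i, len(people))
    print(name, "->", people[left], "and", people[right])
```

Running this prints each person followed by their left and right neighbors, which you can compare against the example above.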

Transmission rules: infection spreads from infected people ('I1' or 'I0') to susceptible people ('S') based on infection rate r, the disease states of the susceptible person’s neighbors, and the person’s immune level.

  • Infection rate r: infection rate r is a value between 0.0 and 1.0 that models how quickly a given infection spreads through a city. A high infection rate r indicates that the infection is highly contagious, while a low infection rate r indicates that the infection is not very contagious.
  • Neighbors: a susceptible person will only become infected if at least one of their neighbors is infected.

You can think about infection transmission as being similar to flipping a weighted coin. If a susceptible person has at least one infected neighbor, we flip a coin to determine the person’s immune level. This value and the infection rate will be used to determine whether the susceptible person will get infected as well. It does not matter which neighbor (the left or right neighbor) is infected. Note that, in general, the coin will not be fair (unless r is 0.5). For example, an infection rate of 1.0 can be thought of as a coin that always lands on one side.
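To make the weighted-coin analogy concrete, here is a minimal sketch. It assumes the coin is implemented by comparing a random number in the range [0.0, 1.0) with r; the helper names flip_weighted_coin and fraction_true are hypothetical and exist only for this illustration.

```python
import random


def flip_weighted_coin(r):
    """Return True with probability r (hypothetical helper)."""
    return random.random() < r


def fraction_true(r, flips=10000):
    """Estimate how often the weighted coin comes up True."""
    hits = sum(flip_weighted_coin(r) for _ in range(flips))
    return hits / flips


random.seed(0)  # arbitrary seed so that the run is repeatable
print(fraction_true(0.3))  # close to 0.3
print(fraction_true(1.0))  # always 1.0
print(fraction_true(0.0))  # always 0.0
```

With r set to 1.0 the coin always lands on the "infected" side, and with r set to 0.0 it never does; values in between produce infections roughly that fraction of the time.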

Immunity rules: For simplicity, we will assume that an infected person remains infected for exactly two days (in a more complex model, we could make this value a parameter of our model, to investigate how an epidemic progresses with shorter or longer infection times). When infected, an individual will be in the 'I1' state; the next day they will be in the 'I0' state; the day after that, they will recover from the infection and go into the 'R' state. At that point, they are immune to the disease and cannot become re-infected.
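One way to picture the progression described above is as a lookup table from today’s state to tomorrow’s state. The dictionary name NEXT_STATE is an assumption made for this sketch; you are not required to structure your code this way.

```python
# Hypothetical lookup table for the deterministic part of the model:
# each non-susceptible state maps to the state it has on the following day.
NEXT_STATE = {'I1': 'I0', 'I0': 'R', 'R': 'R'}

state = 'I1'
history = [state]
for _ in range(3):
    state = NEXT_STATE[state]
    history.append(state)

print(history)  # ['I1', 'I0', 'R', 'R']
```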

Stopping conditions: the simulation should stop after a given number of days or when simulating an additional day yields a city with only people who are susceptible ('S') or recovered ('R'), at which point the infection can no longer spread.

Getting started

In the first lab, you learned the basics of how to use git and our git server. We will use git for all the programming assignments and labs in this course.

We have seeded your repository with a directory for this assignment. To pick it up, change to your capp30121-aut-18-username directory (where the string username should be replaced with your username) and then run the command git pull upstream master. You should also run the command git pull to make sure your local copy of your repository is in sync with the server.

At the first lab, you already ran this command, and it pulled the pa1 sub-directory into your capp30121-aut-18-username directory. However, it is good practice to always run git pull upstream master before you start working, since we may occasionally update files (e.g., if we notice bugs in our code, etc.). For example, some of the files for this assignment may have changed since you downloaded the initial distribution. After you’ve run git pull upstream master, you can proceed as described in the lab: work on your code and then run git add <filename> for each file you change, followed by git commit -m"some message" and git push to upload your changes to the server before you log out.

You should always upload the latest version of your work to the server using the commands described above before you log out, then run git pull and git pull upstream master before you resume work to retrieve the latest files. This discipline will guarantee that you always have the latest version, no matter which machine you are using. Also, it will be easier for us to help you recover from git and chisubmit problems if you consistently push updates to the server.

As you will see below, we strongly encourage you to experiment with library functions and try out your own functions by hand in ipython3. Let’s get you set up to do that before we describe your tasks for this assignment. Open up a new terminal window and navigate to your pa1 directory. Then, fire up ipython3 from the Linux command-line, set up autoreload, and import your code as follows:

$ ipython3

In [1]: %load_ext autoreload

In [2]: %autoreload 2

In [3]: import sir

In [4]: import util

In [5]: import random

(Note: In [<number>] represents the ipython3 prompt. Your prompts may look different. Do not type the prompt when issuing commands.)

The commands %load_ext autoreload and %autoreload 2 tell ipython3 to reload your code automatically whenever it changes. We encourage you to use this package whenever you are developing and testing code.

Getting help

If, after carefully reading the details of any part of the assignment, you are still confused about how to get started or make progress:

  • post a question on Piazza to ask for help, or
  • come to office hours

Before you post a question on Piazza, please check to see if someone else has already asked the same question. We especially encourage you to check the “Must read posts for PA #1” post, which we will update over time to be a compendium of important questions and answers. Also, please read the pinned post on “Asking effective questions.” Finally, please add, commit, and push the most recent version of your code to the server (as described above) before you ask your question. Syncing your code will allow us to look at it, which may speed up the process of helping you.

Style

Following a consistent style is important because it makes it easier for others to read your code; imagine if you were collaborating on a large software project with 30 other developers, and everyone used their own style of writing code!

To help you understand what constitutes good style, we have put together a style guide for the course: Python Style Guide for Computer Science with Applications. We expect you to use good style (that is, style that matches this guide), and will take this expectation into account when grading.

For this assignment, you may assume that the input passed to your functions has the correct format. You may not change any of the input that is passed to your functions. In general, it is bad style to modify a data structure passed as input to a function, unless that is the explicit purpose of the function. Your function’s client might have other uses for the data they pass to your function and should not be surprised by unexpected changes.
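As an illustration of leaving inputs untouched, the sketch below builds and returns a new list instead of assigning into the list it was given. The function advance_infections is hypothetical and deliberately ignores susceptible people so that the example stays focused on the style point.

```python
def advance_infections(city):
    """Return a new city in which infected people move one day forward.

    Hypothetical helper for illustration; susceptible people are left as-is.
    """
    next_state = {'I1': 'I0', 'I0': 'R'}
    # Build a fresh list; never assign into the caller's list.
    return [next_state.get(person, person) for person in city]


original = ['S', 'I1', 'I0', 'R']
updated = advance_infections(original)

print(original)  # ['S', 'I1', 'I0', 'R'] (unchanged)
print(updated)   # ['S', 'I0', 'R', 'R']
```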

Your tasks

For this assignment, we will specify a set of functions that you must implement. You will start with basic functions and work your way up to more complex tasks. We will also supply extensive test code. Over the course of the term, we will provide less and less guidance on the appropriate structure for your code.

Task 1: Count the number of infected people in a city

In Python, it is common to write helper functions that encapsulate key definitions and are only a few lines long. Your first task is to complete one such function: count_infected.

This function should take a city as input and return the number of infected people in that city. For example, given city ['I0', 'I0', 'I1', 'S'], the function would return 3 (notice how we have to account for the fact that there are two infected states: 'I1' and 'I0'). On the other hand, given city ['S', 'R', 'S', 'S'], the function would return 0.
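As a rough sketch of what such a helper might look like (one possible shape, not necessarily the reference implementation), a simple loop over the city is enough:

```python
def count_infected(city):
    """Count the people in either infected state ('I1' or 'I0')."""
    infected = 0
    for person in city:
        if person in ('I0', 'I1'):
            infected += 1
    return infected


print(count_infected(['I0', 'I0', 'I1', 'S']))  # 3
print(count_infected(['S', 'R', 'S', 'S']))     # 0
```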

Testing for Task 1

We have provided an extensive suite of automated tests for this assignment. You may be tempted to do the following: write some code, run the automated tests to find a test that fails, modify your code, and then repeat the process until all of the tests pass. This is a very bad way to debug your code, because it typically takes much longer than taking a methodical step-by-step approach and often yields messy code that passes the tests without actually matching the specification of the problem. Instead, you should try your code out on some of the test cases by hand in ipython3 to get a sense of how your code works before you try the automated tests.

Here, for example, are some sample calls to count_infected:

In [6]: sir.count_infected(['I0', 'I0', 'I1', 'S'])
Out[6]: 3

In [7]: sir.count_infected(['S', 'R', 'S', 'S'])
Out[7]: 0

If you get the wrong answer for some sample input, stop to reason why your code is behaving the way it is and think about how to modify it to get the correct result. If you still can’t determine the problem after reasoning about the code, use print statements to print out key values.

Now on to the automated tests. The file test_sir.py contains automated test code for the tasks in this assignment. The names of the tests for a given function share a common pattern: a prefix that includes the word test_ and the function name followed by the test number. For example, the second test for count_infected is named test_count_infected_2.

For count_infected, we have provided four tests:

  • test_count_infected_1: Cities with no infected people.
  • test_count_infected_2: Cities with some 'I1' (but no 'I0') infected people.
  • test_count_infected_3: Cities with some 'I0' (but no 'I1') infected people.
  • test_count_infected_4: Cities with both 'I1' and 'I0' infected people.

Note that each test actually checks multiple cities but, within a given test, they are all of the same type (e.g., in Test 2, all the cities have at least one I1 person, and no I0 persons).

The reason we test for these four cases is to ensure that our tests have sufficient coverage, meaning that they account for as many different cases as possible in our code. For example, we could be tempted to write tests just for the following two cities:

  • ['S', 'I0', 'I0', 'S']
  • ['S', 'R', 'S', 'S']

However, what if we wrote a solution that forgot to account for the I1 state? The above tests would not cover that case.
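The coverage gap can be demonstrated with a deliberately incorrect implementation. The function buggy_count_infected below is hypothetical; it stands in for a solution that forgot the 'I1' state.

```python
def buggy_count_infected(city):
    """Deliberately wrong: counts 'I0' but forgets 'I1'."""
    return sum(1 for person in city if person == 'I0')


# The two tempting test cities do not reveal the bug...
assert buggy_count_infected(['S', 'I0', 'I0', 'S']) == 2
assert buggy_count_infected(['S', 'R', 'S', 'S']) == 0

# ...but a city that contains 'I1' does: the correct answer is 1.
print(buggy_count_infected(['S', 'I1', 'S', 'S']))  # 0
```

Both weak checks pass even though the function is wrong, which is why a test that includes an 'I1' person is needed.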

We will be using the pytest Python testing framework for this and subsequent assignments. To run our automated tests, you will use the py.test command from the Linux command line (not from within ipython3). We recommend opening a new terminal window for running this command, which will allow you to go back and forth easily between testing code by hand in ipython3 and running the test suite using py.test. (When we work on assignments, we usually have three windows open: one for editing, one for experimenting in ipython3, and one for running the automated tests.)

Pytest, which is available on both the lab machines and your VM, has many options. We’ll use three of them: -v, which means run in verbose mode, -x, which means that pytest should stop running tests after a single test failure, and -k, which allows you to describe a subset of the test functions to run. You can see the rest of the options by running the command py.test -h.

For example, running the following command from the Linux command-line:

$ py.test -v -x -k test_count_infected test_sir.py

will run the functions in test_sir.py whose names contain test_count_infected until they’ve all been run or one fails. (Recall that the $ represents the prompt and is not included in the command.)

Here is (slightly modified) output from using this command to test our reference implementation of count_infected:

$ py.test -v -x -k test_count_infected test_sir.py
collected 47 items / 43 deselected

test_sir.py::test_count_infected_1 PASSED
test_sir.py::test_count_infected_2 PASSED
test_sir.py::test_count_infected_3 PASSED
test_sir.py::test_count_infected_4 PASSED

==== 4 passed, 43 deselected in 0.05 seconds ====

This output shows that our code passed all four tests in the test_count_infected suite. It also shows that there were 43 tests that were deselected (that is, were not run) because they did not match the test selection criteria specified by the argument to -k.

If you fail a test, pytest will tell you the name of the test function that failed and the line in the test code at which the failure was detected. This information can help you determine what is wrong with your program. Read it carefully to understand the test inputs and why the test failed! Then, switch back to testing your function in ipython3 until you have fixed the problem.

For example, if you wrote a solution that did not account for the I0 state, you would pass Tests 1 and 2, but would fail Test 3 like this:

test_sir.py::test_count_infected_1 PASSED
test_sir.py::test_count_infected_2 PASSED
test_sir.py::test_count_infected_3 FAILED

 generated json report: ...

====================================== FAILURES ======================================
_______________________________ test_count_infected_3 ________________________________

    def test_count_infected_3():
        '''
        Cities with some I0 (but no I1) infected people.
        '''
>       helper_count_infected(["I0", "S", "S", "S"], 1)

test_sir.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

city = ['I0', 'S', 'S', 'S'], expected = 1

    def helper_count_infected(city, expected):
        n = sir.count_infected(city)

        if n != expected:
            s = "City {} has {} infected people, but count_infected returned {}"
>           pytest.fail(s.format(city, expected, n))
E           Failed: City ['I0', 'S', 'S', 'S'] has 1 infected people, but count_infected returned 0

test_sir.py:89: Failed
!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!
================================ 43 tests deselected =================================
================= 1 failed, 2 passed, 43 deselected in 0.09 seconds ==================

The amount of output can be a bit overwhelming, but you should focus on two things:

  • Towards the end of the output, there is a line starting with E. This will usually contain a helpful message telling you why the test failed:

    Failed: City ['I0', 'S', 'S', 'S'] has 1 infected people, but count_infected returned 0
    

    This information can help us narrow down the issue with our code. In particular, this error message suggests that we forgot to count the 'I0' people in the city.

  • While the above error message is probably enough for us to fix the issue, sometimes we may need to dig a bit deeper by looking at the exact test that fails. pytest helpfully tells us that the test that failed is test_count_infected_3, and even includes part of the code for that test, so you don’t have to look it up in test_sir.py. The line that starts with a greater-than symbol (>) tells you the line in the test that failed:

    helper_count_infected(["I0", "S", "S", "S"], 1)
    

    helper_count_infected is a function in test_sir.py that calls count_infected with a given city, and checks whether the return value matches the expected value. You will see this use of helper functions frequently in our tests.

Note that, because we specified the -x option, pytest exited as soon as Test 3 failed (without running Test 4). Omitting the -x option makes sense when you want to get a sense of which tests are passing and which ones aren’t; however, when debugging your code, you should always use the -x option so that you can focus on one error at a time.

Finally, pytest will run any function that starts with test_. You can limit the tests that get run using the -k option along with a string that identifies the desired tests. The string is not required to be a prefix. For example, if you specify -k count, pytest will run all test functions that start with test_ and include the word count.

Also, by default, if you do not supply the name of a specific test file, pytest will look in the current directory tree for Python files that have names that start with test_.

In subsequent examples, we will leave out the name of the file with the test code (test_sir.py) and we will use short substrings to describe the desired tests. For example, the test above could have been run with the following command:

$ py.test -v -x -k count

Debugging suggestions and hints for Task 1

Remember to save any changes you make to your code in your editor as you are debugging. Skipping this step is a common error. Fortunately, we’ve eliminated another common error – forgetting to reload code after it changes – by using the autoreload package. (If you skipped the Getting started section, please go back and follow the instructions to set up autoreload and import random, sir, etc.)

Task 2: Is one of our neighbors infected?

Next, you will write a function called has_an_infected_neighbor that will determine whether a susceptible person at a given position in a list has at least one neighbor who is infected.

More specifically, given the city and the person’s position, your code will compute the positions of the specified person’s left and right neighbors in the city and determine whether either one is in an infected state.

When you look at the code, you will see that we included the following line:

assert city[position] == "S"

to verify that the function has been called on a person who is susceptible to infection. In general, assertions have the following form:

assert <boolean expression>

Assertions are a useful way to check that your code is receiving valid inputs: if the boolean expression specified as the assertion’s condition evaluates to False, the assertion raises an AssertionError, which makes the function fail. Simple assertions can greatly simplify the debugging process by highlighting cases where a function is being called incorrectly.
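To see how the assertion and the neighbor check fit together, consider the following sketch. It is one possible approach under the ring rule described earlier, not necessarily the reference solution, and it assumes wrap-around is handled with the modulo operator.

```python
def has_an_infected_neighbor(city, position):
    """Sketch: True if either ring neighbor of a susceptible person is infected."""
    # Fails with an AssertionError if the person is not susceptible.
    assert city[position] == "S"

    left = city[(position - 1) % len(city)]
    right = city[(position + 1) % len(city)]
    return left in ("I0", "I1") or right in ("I0", "I1")


print(has_an_infected_neighbor(['I1', 'S', 'S'], 1))  # True

try:
    has_an_infected_neighbor(['I1', 'S', 'S'], 0)  # position 0 is 'I1', not 'S'
except AssertionError:
    print("assertion failed: the person at position 0 is not susceptible")
```

The second call never reaches the neighbor logic: the assertion stops the function as soon as it detects that it was called on someone who is not susceptible.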

Testing for Task 2

As in the previous task, we suggest you start by trying out your code in ipython3 before you run the automated tests. Here, for example, are some sample calls to has_an_infected_neighbor:

In [8]: sir.has_an_infected_neighbor(['I1', 'S', 'S'], 1)
Out[8]: True

In [9]: sir.has_an_infected_neighbor(['S', 'I1', 'I0'], 0)
Out[9]: True

In [10]: sir.has_an_infected_neighbor(['S', 'S', 'S'], 2)
Out[10]: False

In the first sample call, we want to check whether the susceptible person in position 1 has an infected neighbor. Since their left neighbor (at position 0) is infected, the result should be True.

The next call checks whether the susceptible person in position 0 has an infected neighbor. Both of this person’s neighbors (left at position 2, and right at position 1) are infected, so again the result should be True.

The third call checks the person at position 2. Neither of this person’s neighbors is infected, so the expected result is False.

The table below provides information about the tests for has_an_infected_neighbor. Each row contains a test number, the values that will be passed for the city and position arguments for that test, and the expected result.

Tests for has_an_infected_neighbor
Test  City                     Position  Expected result
1     ['I1', 'S', 'S']         1         True
1     ['I0', 'S', 'S']         1         True
1     ['S', 'S', 'I0']         0         True
1     ['S', 'S', 'I1']         0         True
2     ['R', 'S', 'I0']         1         True
2     ['R', 'S', 'I1']         1         True
2     ['I1', 'S', 'S']         2         True
3     ['I0', 'S', 'S']         2         True
3     ['R', 'S', 'S', 'I1']    2         True
3     ['R', 'I1', 'S', 'S']    2         True
3     ['I1', 'S', 'S', 'S']    3         True
3     ['S', 'R', 'S', 'I1']    0         True
4     ['S', 'S', 'S']          0         False
4     ['R', 'S', 'R']          1         False
4     ['S', 'S', 'R']          0         False
4     ['I0', 'S', 'S', 'R']    2         False
5     ['S', 'I0', 'I1']        0         True
5     ['I1', 'S', 'I0']        1         True
5     ['I0', 'I1', 'S']        2         True
6     Large city               26        True
6     Large city               43        True
6     Large city               0         False

Test 1 (the first four rows) checks that you are correctly identifying an infected left neighbor. Test 2 checks that you are correctly identifying an infected right neighbor. Test 3 does the same checks on slightly larger cities. Test 4 checks cases where neither of the neighbors is infected. Test 5 checks cases where both neighbors are infected. And finally, Test 6 uses a large city that has infected people at positions 27 and 42.

You can run all of these tests by running the following command from the Linux command-line:

$ py.test -v -x -k neighbor

Debugging suggestions and hints for Task 2

There is a lot going on in this function and, when you are debugging, it can be helpful to know exactly what is happening inside the function. print statements are among the most intuitive ways to identify what your code is actually doing and will become your go-to debugging method. If you are struggling to get started or to return the correct values from your function, consider the following debugging suggestions:

  • Print the positions you calculated for the neighbors.
  • Print the values you extracted for the neighbors.
  • Make sure that you are returning, not printing, the desired value.

Is your code behaving as expected given these values?

Don’t forget to remove your debugging code (i.e., the print statements) before you submit your solution.

Task 3: Determine infection for a given person

Your next task is to complete the function gets_infected_at_position.

This function will determine whether someone at a given position in a list will become infected on the next day of the simulation. More specifically, given a city, a specified susceptible person’s location within that city, and an infection rate r, your code should:

  1. Determine whether the person has an infected neighbor.
  2. If and only if the person has an infected neighbor, compute the immunity level of the person and determine whether they will become infected.
  3. Return whether the person becomes infected as a boolean.

You must use your has_an_infected_neighbor function to determine whether a susceptible person has an infected neighbor. Do not repeat the logic for determining infection transmission from a neighbor in this function!

Earlier, we described infection transmission as being similar to flipping a weighted coin. In this function, if (and only if) the person has an infected neighbor, you will compute the person’s current immune level, a value between 0.0 and 1.0 that is the result of flipping that weighted coin. We will use a random number generator to obtain that value: you will call random.random(), a function that returns a random floating point number that is at least 0.0 and strictly less than 1.0. If the resulting immune level is less than the infection rate, the person will become infected. Another way to think about it is that having an immune level greater than or equal to the infection rate allows a person to fight off the infection.
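The logic described above might be sketched as follows. This is an illustration under the rules stated in this section, not necessarily the reference solution; the version of has_an_infected_neighbor included here is a stand-in so that the example runs on its own.

```python
import random


def has_an_infected_neighbor(city, position):
    """Stand-in for the Task 2 function so this sketch is self-contained."""
    size = len(city)
    neighbors = (city[(position - 1) % size], city[(position + 1) % size])
    return 'I0' in neighbors or 'I1' in neighbors


def gets_infected_at_position(city, position, infection_rate):
    """Sketch: decide whether the susceptible person at position gets infected."""
    if not has_an_infected_neighbor(city, position):
        return False  # no coin flip, so no random number is consumed
    immune_level = random.random()  # value in [0.0, 1.0)
    return immune_level < infection_rate


print(gets_infected_at_position(['S', 'I1', 'S'], 0, 1.0))  # True
print(gets_infected_at_position(['S', 'R', 'S'], 0, 1.0))   # False
```

The first call always returns True because every immune level is below an infection rate of 1.0; the second returns False without touching the random number generator because no neighbor is infected.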

Unfortunately, using a random number generator means that every call to gets_infected_at_position can produce a different result, even if we use the exact same parameters when calling the function. This is because each time a random number generator like random.random() is called, it returns a new random number. This complicates debugging because the sequence of random numbers generated will impact the simulation.

Fortunately, we can ensure that random.random() returns the same sequence of numbers when it is called by initializing it with a seed value. It is common to set the seed value for a random number generator when debugging. If we do not provide the random number generator with a seed, Python derives one from a source that changes from run to run (the operating system’s randomness source, if available, or else the current time).

Since many of our tests use the same seed (20170217), we have defined a constant, TEST_SEED, with this value in sir.py for your convenience.

Let’s try out setting the seed using the value of sir.TEST_SEED and then making some calls to the random number generator in ipython3:

In [11]: sir.TEST_SEED
Out[11]: 20170217

In [12]: random.seed(sir.TEST_SEED)

In [13]: random.random()
Out[13]: 0.48971492504609215

In [14]: random.random()
Out[14]: 0.23010566619210782

In [15]: random.seed(sir.TEST_SEED)

In [16]: random.random()
Out[16]: 0.48971492504609215

In [17]: random.random()
Out[17]: 0.23010566619210782

(If your attempt to try out these commands in ipython3 fails with a name error, you probably skipped the setup steps described in the Getting started section. Exit ipython3 and restart it following the instructions above.)

Notice that the third and fourth calls to random.random() generate exactly the same values as the first two calls. Why? Because we set the seed to the exact same value before the first and third calls.

This has another implication for testing our code: it is crucial that you only compute a person’s immune level when they have at least one infected neighbor. If you call the random number generator more often than necessary, your code may generate different answers than ours on subsequent tasks.
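The effect of an unnecessary call can be seen directly with the random module. In this sketch, SEED simply repeats the value given above for TEST_SEED, and the "wasted" call stands in for code that computes an immune level for a person with no infected neighbor.

```python
import random

SEED = 20170217  # the value given above for TEST_SEED

random.seed(SEED)
expected = [random.random(), random.random()]

# Same seed, but with one unnecessary draw before the "real" ones.
random.seed(SEED)
random.random()  # wasted call, e.g., for a person with no infected neighbor
shifted = [random.random(), random.random()]

print(expected[1] == shifted[0])  # True: every later value arrives one call early
print(expected == shifted)        # False: the results no longer line up
```

Once the sequences are out of step, every later decision in the simulation may differ from the expected one, even if the rest of the logic is correct.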

Testing for Task 3

As in Tasks 1 and 2, we strongly encourage you to do some testing by hand in ipython3 before you start using the automated tests. Unlike previous tasks, you have to be careful to initialize the random seed before calling gets_infected_at_position, to make sure you get the expected results. For example:

In [18]: random.seed(sir.TEST_SEED)

In [19]: sir.gets_infected_at_position(['S', 'I1', 'I1'], 0, 0.5)
Out[19]: True

In [20]: random.seed(sir.TEST_SEED)

In [21]: sir.gets_infected_at_position(['S', 'I1', 'I1'], 0, 0.3)
Out[21]: False

The table below provides information about the automated tests for gets_infected_at_position. Each row contains a test number, the seed used to initialize the random number generator, the values that will be passed for the city, position, and infection_rate arguments for that test, and the expected result. The last column indicates who the infected neighbors are.

Tests for gets_infected_at_position
Test  Seed      City                     Position  Infection rate  Expected result  Infected neighbor(s)
1     20170217  ['S', 'S', 'I1']         0         0.5             True             Left
2     20170217  ['S', 'S', 'I1']         0         0.2             False            Left
3     20170217  ['S', 'I1', 'S']         0         1.0             True             Right
4     20170217  ['S', 'I1', 'S']         0         0.2             False            Right
5     20170217  ['S', 'I1', 'I1']        0         1.0             True             Both
6     20170217  ['S', 'I1', 'I1']        0         0.2             False            Both
7     20170217  ['S', 'R', 'S']          0         1.0             False            None
8     20170217  ['I1', 'S', 'S', 'S']    2         1.0             False            None

There are an additional three tests that test multiple cases at once:

  • Test 9: Similar to tests 1-8 but using position 1
  • Test 10: Similar to tests 1-8 but using position 2
  • Test 11: Tests all three positions of city ['S', 'S', 'S']

You can run these tests by executing the following command from the Linux command-line:

$ py.test -v -x -k position

Debugging suggestions and hints for Task 3

If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:

  • Print the result you are getting from has_an_infected_neighbor.
  • Print the immune level (if needed).
  • Make sure that you are making the right number of calls to random.random (zero or one).
  • When testing in ipython3, ensure that you have reset the seed for the random number generator before each test call to gets_infected_at_position.

Task 4: Move the simulation forward a single day

Your fourth task is to complete the function simulate_one_day. In layman’s terms, this function will model one day in a simulation and will act as a helper function to run_simulation. More concretely, simulate_one_day should take the city’s state at the start of the day and the infection rate r, and return a new list of disease states (i.e., the state of the city after one day).

To do this work, you will iterate over all possible locations in the city. If the person at a location is:

  1. Susceptible ('S'): you need to determine whether they will become infected ('I1') or remain susceptible ('S') for another day using your gets_infected_at_position function.
  2. Infected ('I1', 'I0'): people in state 'I1' go to state 'I0', and people in state 'I0' go to state 'R'.
  3. Recovered ('R'): people in this state remain in that state.

As an example, consider the following call to simulate_one_day:

In [22]: sir.simulate_one_day(['I0', 'I1', 'R'], 0.3)
Out[22]: ['R', 'I0', 'R']

Since this city doesn’t have any susceptible people, the states just advance automatically according to the rules we described above ('I0' goes to 'R', 'I1' goes to 'I0', and 'R' stays the same).

However, when you do encounter a susceptible person, you will need to call gets_infected_at_position, which will involve calls to the random number generator. So, when testing simulate_one_day in ipython3, you must make sure to reset the seed (using random.seed) between calls to simulate_one_day.

For example:

In [23]: random.seed(sir.TEST_SEED)

In [24]: sir.simulate_one_day(['I0', 'I1', 'S'], 0.5)
Out[24]: ['R', 'I0', 'I1']

In this case, the first two positions advanced the same way as before, but the third position required a call to gets_infected_at_position (which, in this particular case, with this random seed, will return True). On the other hand, if we use a lower infection rate, the third position is not infected:

In [25]: random.seed(sir.TEST_SEED)

In [26]: sir.simulate_one_day(['I0', 'I1', 'S'], 0.1)
Out[26]: ['R', 'I0', 'S']
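
To see why the seed has to be reset before each call, here is a small self-contained sketch. It assumes that sir.TEST_SEED has the value 20170217 used in the test tables. Reseeding makes the generator replay the same sequence, so the first draw after each reset is identical.

```python
import random

TEST_SEED = 20170217  # assumed to match sir.TEST_SEED

random.seed(TEST_SEED)
first = random.random()

random.seed(TEST_SEED)
second = random.random()  # same seed, so the same first draw

third = random.random()   # no reseed: the generator has moved on

print(first == second)  # True
print(second, third)
```

If you forget to reseed, a later call starts from wherever the previous call left the generator, and the result may differ from the expected one.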

Testing for Task 4

The table below provides information about the tests for simulate_one_day. Each row contains a test number, the seed, the values that will be passed for the city and infection_rate arguments for that test, and the expected result.

Tests for simulate_one_day
Test  Seed      City                Infection rate  Result
1     20170217  ['I1', 'I1', 'I1']  0.0             ['I0', 'I0', 'I0']
2     20170217  ['I0', 'I0', 'I0']  0.0             ['R', 'R', 'R']
3     20170217  ['R', 'R', 'R']     0.0             ['R', 'R', 'R']
4     20170217  ['I1', 'I1', 'S']   0.2             ['I0', 'I0', 'S']
5     20170217  ['I1', 'I1', 'S']   0.5             ['I0', 'I0', 'I1']
6     20170217  ['I0', 'I0', 'S']   0.5             ['R', 'R', 'I1']
7     20170217  ['S', 'I0', 'S']    0.9             ['I1', 'R', 'I1']
8     20170217  ['S', 'I0', 'S']    0.3             ['S', 'R', 'I1']
9     20170217  ['S', 'S', 'S']     1.0             ['S', 'S', 'S']

The first three tests check that the rules that involve no randomness are being applied correctly. Notice how, in this case, the infection rate has no effect on the outcome, and we arbitrarily set it to 0.0.

The fourth, fifth, and sixth tests have one susceptible person (who does not become infected in the fourth test, but does become infected in the fifth and sixth tests). The seventh and eighth tests have two susceptible persons (both become infected in the seventh test, only one in the eighth test). The ninth test has no infected people, which means no susceptible persons should become infected (even if we use an infection rate of 1.0). Notice that these tests not only verify basic functionality but also examine trickier edge cases. For example, the sixth test checks that the susceptible person becomes infected even when the infected neighbors recover on the same day.
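
The edge case in the sixth test is easy to get wrong if the city list is updated in place. The sketch below contrasts the two approaches under simplifying assumptions: the infection rate is treated as 1.0 (so no random numbers are needed), neighbors are taken to be only the immediate left and right positions, and all function names are hypothetical. Use the assignment's own definition of neighbors in your code.

```python
NEXT_STATE = {'I1': 'I0', 'I0': 'R', 'R': 'R'}

def infected_next_door(city, i):
    """Assumed rule: look only at the immediate left and right positions."""
    neighbors = city[max(i - 1, 0):i] + city[i + 1:i + 2]
    return any(state in ('I0', 'I1') for state in neighbors)

def step_reading_old_city(city):
    """Build a new list; every decision reads the start-of-day states."""
    new_city = []
    for i, state in enumerate(city):
        if state == 'S':
            new_city.append('I1' if infected_next_door(city, i) else 'S')
        else:
            new_city.append(NEXT_STATE[state])
    return new_city

def step_mutating_in_place(city):
    """Buggy variant: later positions see states already updated today."""
    for i, state in enumerate(city):
        if state == 'S':
            city[i] = 'I1' if infected_next_door(city, i) else 'S'
        else:
            city[i] = NEXT_STATE[state]
    return city

print(step_reading_old_city(['I0', 'I0', 'S']))   # ['R', 'R', 'I1']
print(step_mutating_in_place(['I0', 'I0', 'S']))  # ['R', 'R', 'S']
```

In the second version, the neighbor at position 1 has already recovered by the time position 2 is examined, so the infection is missed.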

You can run these tests by executing the following command from the Linux command-line:

$ py.test -v -x -k one

Debugging suggestions for Task 4

If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:

  • Use simple infection rates that will not rely on the random number generator (like 0.0 and 1.0) to verify that the states change as expected.
  • Print out each person’s old and new disease states. Ensure that the new disease states are correct in all cases.
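
A small helper can make the second suggestion easier to read. The function below is a hypothetical debugging aid, not part of the provided code. It lines up the old and new state for each position and flags the ones that changed.

```python
def describe_changes(old_city, new_city):
    """Return one line per position showing the old and new state."""
    lines = []
    for i, (old, new) in enumerate(zip(old_city, new_city)):
        marker = '' if old == new else '  <- changed'
        lines.append(f'{i}: {old:>2} -> {new:>2}{marker}')
    return lines

# States taken from the example call with an infection rate of 0.5.
for line in describe_changes(['I0', 'I1', 'S'], ['R', 'I0', 'I1']):
    print(line)
```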

Detour: data files

Before moving on to your next task, we need to take a short detour and talk about the parameter files used to test the remaining tasks. These files can be found in the configs subdirectory of your pa1 directory and are numbered. As an example, the file 1.json (json is pronounced jay-sawn) contains the starting parameters for simulation 1:

{
    "starting_state" : ["S", "I1", "I1"],
    "random_seed" : 20170217,
    "max_num_days" : 100,
    "infection_rate" : 0.5,
    "num_trials" : 10
}

Here, the starting_state refers to the initial disease state of the city, the random_seed is the seed for the random number generator, max_num_days refers to the maximum number of days to simulate, infection_rate refers to the infection rate of the simulation, and num_trials is the number of trials to run, which will be used in Task 5.

You can view the contents of a text file using the Linux commands cat, more, or less. For example, you can run:

$ cat configs/1.json

to see the contents of the file configs/1.json.
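
If you would rather look at this kind of data from Python than from the shell, the standard json module can parse it. The sketch below is only an illustration and makes no claim about how the provided util.py works; the text is inlined (with only some of the keys) so that the example runs without the file.

```python
import json

# Text in the same shape as configs/1.json (only some keys are shown).
config_text = '''
{
    "starting_state" : ["S", "I1", "I1"],
    "max_num_days" : 100,
    "infection_rate" : 0.5,
    "num_trials" : 10
}
'''

config = json.loads(config_text)      # JSON object -> Python dictionary
print(config["starting_state"])       # ['S', 'I1', 'I1']
print(config["infection_rate"])       # 0.5
print(type(config["max_num_days"]))   # <class 'int'>
```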

We have provided code in util.py that will read a configuration file and return the parameters for a simulation. You can use the function util.get_config() to extract the parameters for each trial from the JSON files provided, as shown below:

In [27]: starting_state, random_seed, max_num_days, \
    ...:   infection_rate, num_trials = util.get_config('./configs/1.json')

In [28]: starting_state
Out[28]: ['S', 'I1', 'I1']

In [29]: infection_rate
Out[29]: 0.5

In [30]: random_seed
Out[30]: 20170217

Once you have loaded these values, you can pass them to a function as needed. For example:

In [31]: random.seed(random_seed)

In [32]: sir.simulate_one_day(starting_state, infection_rate)
Out[32]: ['I1', 'I0', 'I0']

For the remaining tasks, we will list the parameter filename rather than the values of the parameters for each test.

Task 5: Run the simulation

Your fifth task is to complete the function run_simulation, which takes the starting state of the city, the random seed, the maximum number of days to simulate (max_num_days), and the infection rate as arguments and returns both the final state of the city and the number of days simulated as a tuple.

To clarify:

  • Your function should run one whole simulation, including setting the seed exactly once before simulating any days.
  • Your simulation should start on day 0 and count the number of days simulated. For example, if your simulation starts on day 0 and reaches the stopping conditions after simulating one day, it should return 1 as the number of days. On the other hand, if your simulation starts on day 0 and runs for day 0 and day 1, it should return 2 as the number of days simulated.
  • Recall that there are two stopping conditions for this simulation: that max_num_days days have passed or that no one in the city is infected after simulating a given day. You should use the count_infected function from Task 1 to check the second stopping condition, and you should check this condition after you simulate a day (not before). Thus, as long as max_num_days is greater than zero, you should always simulate at least one day, even if no person in the city is infected at the start of the simulation.
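
The day-counting and stopping rules above can be sketched with a loop like the one below. This is a sketch under stated assumptions, not the required implementation: advance is a deterministic stand-in for one simulated day (it ignores infection entirely), num_infected is a stand-in for count_infected that is assumed to count people in state 'I0' or 'I1', and run_until_done is a hypothetical name.

```python
import random

NEXT_STATE = {'S': 'S', 'I1': 'I0', 'I0': 'R', 'R': 'R'}

def advance(city):
    """Stand-in for one simulated day; ignores infection entirely."""
    return [NEXT_STATE[state] for state in city]

def num_infected(city):
    """Stand-in for count_infected (assumed: states 'I0' and 'I1')."""
    return sum(1 for state in city if state in ('I0', 'I1'))

def run_until_done(city, seed, max_num_days, one_day=advance):
    random.seed(seed)                 # set the seed exactly once
    days = 0
    while days < max_num_days:        # first stopping condition
        city = one_day(city)
        days += 1
        if num_infected(city) == 0:   # second condition, checked after the day
            break
    return city, days

print(run_until_done(['I1', 'R'], 20170217, 100))  # (['R', 'R'], 2)
print(run_until_done(['S', 'R'], 20170217, 100))   # (['S', 'R'], 1)
print(run_until_done(['I1', 'R'], 20170217, 1))    # (['I0', 'R'], 1)
```

The second call shows that one day is still simulated when nobody is infected at the start, and the third shows the max_num_days limit ending the run while someone is still infected.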

Here is an example use of this function:

In [33]: starting_state, random_seed, max_num_days, \
    ...:   infection_rate, num_trials = util.get_config('./configs/2.json')

In [34]: sir.run_simulation(starting_state, random_seed, max_num_days, infection_rate)
Out[34]: (['R', 'R', 'R'], 3)

Notice that our sample use did not include a call to set the random seed. Your run_simulation function should set the seed based on the random seed parameter, so you will not need to reset the seed manually to test this function.

Testing Task 5

We have provided six tests for this task. The first three test basic functionality and the last three explicitly check stopping conditions.

Tests for run_simulation
Test  Configuration file  Expected result (city, number of days simulated)
1     3.json              ['S', 'R', 'R'], 3
2     4.json              ['S', 'S', 'R'], 2
3     5.json              ['R', 'S', 'R', 'R', 'S'], 2
4     6.json              ['R', 'R', 'R'], 2
5     7.json              ['R', 'S', 'S'], 1
6     8.json              ['R', 'R', 'R', 'I0', 'I1', 'S', 'S'], 3

You can run these tests by executing the following command from the Linux command-line.

$ py.test -v -x -k run_simulation

Debugging hints for Task 5

If you are struggling to get started or to return the correct values in your function, consider the following suggestions to debug your code:

  • If your function returns one fewer or one more day than our test function, please reread the directions and ensure that you are counting the days properly.
  • If you are generating the wrong final state for the city, try printing the day (0, 1, 2, etc.), the disease states before the call to simulate_one_day, and the disease states after the call to simulate_one_day.

From this point on, we will not be providing explicit debugging hints. In general, it is a good idea to use print statements to uncover what your code is doing.

Task 6: Determining average infection spread

Your sixth task is to complete the function compute_average_num_infected, which computes the average number of infected people over num_trials trials for a given city and infection rate. This function takes the starting state of the city, the random seed, the maximum number of days to simulate, the infection rate, and the number of trials to run as arguments and returns the average number of infected people over the num_trials trials. The number of infected people per trial is simply the number of people who end the simulation in state 'I1', 'I0' or 'R'. The average number of infected people over time is the average number of people per trial who get infected.

Each time you run a simulation, you should increase the random seed by 1. It is important that you carefully increment your random seed. If you forget to increment your seed, all trials will be identical, and if you increment your seed differently than specified, your code may not pass our tests.

Your implementation should call run_simulation, which sets the seed, so unlike some of the earlier tasks, you do not need to call random.seed before running this function.

Here’s a sample use of this function:

In [35]: starting_state, random_seed, max_num_days, infection_rate, num_trials = \
    ...: util.get_config('./configs/3.json')

In [36]: starting_state
Out[36]: ['S', 'S', 'I1']

In [37]: num_trials
Out[37]: 5

In [38]: sir.compute_average_num_infected(starting_state,
    ...: random_seed, max_num_days, infection_rate, num_trials)
Out[38]: 2.4

How did the function arrive at an average of 2.4 infected people? We first consider the number of infected people per simulation. Notice how the random seed changes for each trial, while the starting state is the same for every trial.

Intermediate values from compute_average_num_infected
Simulation number  Seed      Starting state for simulation run  Final state for simulation run  Number of people infected (Ix + R)
1                  20170217  ['S', 'S', 'I1']                   ['S', 'R', 'R']                 2
2                  20170218  ['S', 'S', 'I1']                   ['R', 'R', 'R']                 3
3                  20170219  ['S', 'S', 'I1']                   ['R', 'S', 'R']                 2
4                  20170220  ['S', 'S', 'I1']                   ['R', 'R', 'R']                 3
5                  20170221  ['S', 'S', 'I1']                   ['S', 'R', 'R']                 2

Then, we must average the number of people infected per trial to arrive at: \(12/5 = 2.4\).
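
The arithmetic can be reproduced directly from the final states listed in the table. The sketch below does not run any simulation; it only shows how the seeds are incremented and how the per-trial counts are averaged. The variable names are illustrative.

```python
starting_seed = 20170217
num_trials = 5

# One seed per trial, each one larger than the previous by 1.
seeds = [starting_seed + trial for trial in range(num_trials)]

# Final states copied from the table of intermediate values.
final_states = [
    ['S', 'R', 'R'],
    ['R', 'R', 'R'],
    ['R', 'S', 'R'],
    ['R', 'R', 'R'],
    ['S', 'R', 'R'],
]

# A person counts as infected if they end in state 'I1', 'I0' or 'R'.
counts = [sum(1 for state in city if state in ('I1', 'I0', 'R'))
          for city in final_states]

print(seeds[-1])                  # 20170221
print(counts)                     # [2, 3, 2, 3, 2]
print(sum(counts) / num_trials)   # 2.4
```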

Testing Task 6

We have provided four tests for this task. The first can be checked easily with print statements, the second and third test basic functionality, and the final one tests an edge case (num_trials = 1).

Tests for compute_average_num_infected
Test  JSON file  Number of trials  Average number infected
1     9.json     20                4.0
2     10.json    10                4.6
3     11.json    100               23.03
4     12.json    1                 8.0

You can run these tests by executing the following command from the Linux command-line.

$ py.test -v -x -k average

Task 7: Determining the impact of infection rate

Your seventh and final task is to complete the function infection_rate_param_sweep, which shows how the number of people infected over a specified number of trials varies with infection rate.

This function takes the starting state, random seed, the maximum number of days to simulate, a list of infection rates, and a number of trials as parameters and returns a list of the average number of people infected for each infection rate. Your function should iterate through the list of infection rates and call compute_average_num_infected for each infection rate. You should store the resulting averages in a list that your function will return.

We have provided a list of infection rates in sir.py called INFECTION_RATE_LIST. You may find this list useful while you are writing and testing your function. In addition, you may find it helpful to consider edge cases of infection rates. For example, at an infection rate of 0, the disease will never spread from those infected to those susceptible. Similarly, at an infection rate of 1, the disease will spread each day from those infected to all of their susceptible neighbors.
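
The shape of a parameter sweep can be sketched independently of the simulation itself. In the sketch below, sweep and toy_average are hypothetical names, and toy_average is a made-up stand-in for the real averaging function; it exists only so that the example runs on its own.

```python
def sweep(infection_rates, average_for_rate):
    """Call average_for_rate once per rate and collect results in order."""
    return [average_for_rate(rate) for rate in infection_rates]

def toy_average(rate):
    """Hypothetical stand-in: 1 person infected at rate 0, 3 at rate 1."""
    return 1.0 + 2.0 * rate

print(sweep([], toy_average))             # []
print(sweep([1.0], toy_average))          # [3.0]
print(sweep([0, 0.5, 1.0], toy_average))  # [1.0, 2.0, 3.0]
```

Trying an empty list, a one-element list, and a longer list is a quick way to confirm that the loop handles every list size.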

Here is an example use of this function:

In [39]: starting_state, random_seed, max_num_days, infection_rate, num_trials = \
    ...: util.get_config('./configs/3.json')

In [40]: num_trials
Out[40]: 5

In [41]: sir.INFECTION_RATE_LIST
Out[41]: [0, 0.25, 0.5, 0.75, 1.0]

In [42]: sir.infection_rate_param_sweep(starting_state,
    ...: random_seed, max_num_days, sir.INFECTION_RATE_LIST, num_trials)
Out[42]: [1.0, 1.6, 2.8, 3.0, 3.0]

Recall that the infection rate measures how contagious a given disease is. A higher infection rate, then, should increase the average number of people infected in a city over a number of trials. As we would expect, the number of people infected per trial increases as the infection rate increases.

Testing Task 7

We have provided seven tests for this task. The first five tests use the infection rate list defined by sir.INFECTION_RATE_LIST. Of these, the first three end quickly and can be checked using print statements. The next two use much bigger values for the number of trials to run and cannot be computed by hand easily. The last two tests check your code on corner cases for the infection rate list: the sixth test uses [] as the infection rate list, while the seventh test uses [1.0]. (“Zero, one, many” is a good rule of thumb when testing a function that works on lists. That is, in most cases, you should test such functions on a list of size zero, a list of size one, and a list with many entries.)

Tests for infection_rate_param_sweep
Test  JSON file  Average number infected list
1     13.json    [2.0, 2.4, 3.5, 4.3, 5.0]
2     14.json    [2.0, 2.4, 3.7, 4.6, 5.0]
3     15.json    [2.0, 2.45, 2.77, 2.945, 3.0]
4     16.json    [16.0, 21.824, 29.252, 35.078, 37.0]
5     17.json    [38.0, 72.7, 106.2, 180.0, 397.0]
6     1.json     []
7     1.json     []  ->  [3.0]

You can run these tests by executing the following command from the Linux command-line.

$ py.test -v -x -k sweep

Putting it all together

We have included code in sir.py that calls your functions to compute and then analyze various simulations. It takes a single argument: the name of the input starting state JSON file.

Here is a sample use of this program:

$ python3 sir.py ./configs/1.json

and here is the output that it should return:

Running initial simulation...
The starting state of the simulation was ['S', 'I1', 'I1'].
The final state of the simulation is ['R', 'R', 'R'].
The simulation ended after day 3.
Running multiple trials...
Over 10 trial(s), on average, 2.8 people were infected
Varying infection parameter...
Rate | Infected
 0.0 | 2.00
 0.2 | 2.40
 0.5 | 2.80
 0.8 | 3.00
 1.0 | 3.00
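
Note that the rates in this table are shown with one decimal place, which is why 0.25 and 0.75 from INFECTION_RATE_LIST appear as 0.2 and 0.8. The sketch below shows one format string that produces lines like these; it is an assumption about the formatting, not the actual code in sir.py. The averages are copied from the sample output.

```python
rates = [0, 0.25, 0.5, 0.75, 1.0]
averages = [2.0, 2.4, 2.8, 3.0, 3.0]

# Width 4 with one decimal for the rate, two decimals for the average.
lines = [f'{rate:4.1f} | {avg:.2f}' for rate, avg in zip(rates, averages)]

print('Rate | Infected')
for line in lines:
    print(line)
```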

Grading

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our PA Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

  • Completeness: 75%
  • Correctness: 15%
  • Design: 0%
  • Style: 10%

Obtaining your test score

The completeness part of your score will be determined using automated tests. To get your score for the automated tests, simply run the following from the Linux command-line. (Remember to leave out the $ prompt when you type the command.)

$ py.test
$ ../common/grader.py

Notice that we’re running py.test without the -k or -x options: we want it to run all the tests. If you’re still failing some tests, and don’t want to see the output from all the failed tests, you can add the --tb=no option when running py.test:

$ py.test --tb=no
$ ../common/grader.py

Take into account that the grader.py program will look at the results of the last time you ran py.test so, if you make any changes to your code, you need to make sure to re-run py.test. You can also just run py.test followed by the grader on one line by running this:

$ py.test --tb=no; ../common/grader.py

After running the above, you should see something like this (of course, your actual scores may be different!):

Category                                                       Passed / Total       Score  / Points
----------------------------------------------------------------------------------------------------
Task 1: Count the number of infected people in a city          2      / 4           2.50   / 5.00
Task 2: Is one of our neighbors infected?                      6      / 6           7.50   / 7.50
Task 3: Determine infection for a given person                 11     / 11          7.50   / 7.50
Task 4: Move the simulation forward a single day               9      / 9           15.00  / 15.00
Task 5: Run the simulation                                     2      / 6           5.00   / 15.00
Task 6: Determining average infection spread                   1      / 4           3.75   / 15.00
Task 7: Determining the impact of infection rate               1      / 7           1.43   / 10.00
----------------------------------------------------------------------------------------------------
                                                                            TOTAL = 42.68  / 75
====================================================================================================
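
The numbers in this sample are consistent with each task's score being proportional to the fraction of its tests that pass. The sketch below checks that arithmetic; the proportional rule is an inference from the sample output, not a description of how grader.py is written.

```python
# (tests passed, tests total, points available) copied from the sample output.
rows = [(2, 4, 5.0), (6, 6, 7.5), (11, 11, 7.5), (9, 9, 15.0),
        (2, 6, 15.0), (1, 4, 15.0), (1, 7, 10.0)]

scores = [passed / total * points for passed, total, points in rows]

print([round(score, 2) for score in scores])
# [2.5, 7.5, 7.5, 15.0, 5.0, 3.75, 1.43]
print(round(sum(scores), 2))  # 42.68
```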

Submission

To submit your assignment, make sure that you have:

  • put your name at the top of your file,
  • registered for the assignment using chisubmit (if you have not done so already),
  • added, committed, and pushed your code to the git server, and
  • run the chisubmit submission command.

Here are the relevant commands to run on the Linux command-line. (Remember to leave out the $ prompt when you type the command.)

$ chisubmit student assignment register pa1

$ git add sir.py
$ git commit -m"final version of PA #1 ready for submission"
$ git push

$ chisubmit student assignment submit pa1

We recommend copying and pasting these commands rather than re-typing them!

Remember to push your code to the server early and often!

Acknowledgments: This assignment was inspired by a discussion of the SIR model in the book Networks, Crowds, and Markets by Easley and Kleinberg. Emma Nechamkin wrote the original version of this assignment.