Homework #2¶
Due: Friday, October 13th at 11:59pm
This homework is intended to provide additional practice with writing Bash scripts. Specifically, you will use more advanced Bash syntax and filtering (i.e., text-processing) commands to process text.
CS Linux Machine¶
You will need access to a Linux-based machine when working on your homework assignments. You should not test your programs on macOS or Windows Linux because these operating systems do not provide all the utility commands necessary for completing this and possibly future assignments. Additionally, even when they do provide a command, it may not support all the options that a Unix-like system provides. We will run and grade all assignments on the CS Linux machines, and all programming assignments must work correctly on these machines. You can work locally on a Unix or Unix-like machine, but make sure that you test your final solutions on a CS Linux machine.
Please follow the instructions provided here
Creating Your Private Repository¶
For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.
To actually get your private repository, you will need this invitation URL:
HW2 invitation (Please check the post “Homework #2 is ready” on Ed)
When you click on an invitation URL, you will have to complete the following steps:
You will need to select your CNetID from a list. This will allow us to know which student is associated with each GitHub account. This step is only done for the very first invitation you accept.
Note
If you are on the waiting list for this course you will not have a repository made for you until you are admitted into the course. I will post the starter code on Ed so you can work on the assignment until you are admitted into the course.
You must click “Accept this assignment” or your repository will not actually be created.
After accepting the assignment, GitHub may take a few minutes to create your repository, and you should receive an email from GitHub when it is ready. Normally, it’s ready within seconds and you can just refresh the page.
You now need to clone your repository (i.e., download it to your machine).
Make sure you’ve set up SSH access on your GitHub account.
For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs51082-aut23/hw2-GITHUB-USERNAME.git.
If you do not know how to use git clone to clone your repository, then follow this guide that GitHub provides: Cloning a Repository
If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.
Each assignment may include a section dedicated to answering a few short answer questions. Please turn in a readable text file named saqs (e.g., saqs.pdf). Store this file inside the hw2/saqs directory. You will place your answers to the following questions in this document. The file is not required to be a PDF; any text document that is easy to open is fine.
IMPORTANT: Eligible Commands & Bash¶
For this homework assignment, you are only allowed to use the following commands with or without using their options in your scripts.
All eligible commands provided in homework 1.
seq
touch
mkdir
readarray
eval
rm
read
exit
comm
tr
grep
sed
awk
Using any other command not specified in this list will result in a major penalty for that specific problem. You are free to use the entirety of the bash scripting language but are restricted to using only the above commands within your script files.
If you have any questions about using a command, ask on Ed before you use it!
Problem 1¶
Problem 1 is composed of a series of five subproblems (i.e., Problem 1a-1e). For each subproblem, you will do the following:
Write a bash script that takes a single argument: a plain text file that contains some random text. The script must exit 1 if the single argument is not supplied. No error message is printed.
Each script will begin with the following skeleton code
#! /usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
# TODO: Error-Check the command line argument is supplied.
# ONE_LINE_COMMAND
However, # ONE_LINE_COMMAND will be replaced with a one-line command (this could also be a pipe of commands) that accomplishes the task of the subproblem. You can only use the commands listed in the Eligible Commands section above. For example, suppose the subproblem asked:
Find the top ten most frequently used words in the supplied command line argument. Assume there is only one word on each line in the file. Place the solution in a file called example.sh
The solution would be:
#! /usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
# TODO: Error-Check the command line argument is supplied.
sort "$1" | uniq -c | sort -n -r | head -n 10
You can also use a backslash \ at the end of a line to continue the command on the next line, so that the pipes line up on separate lines
#! /usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
# TODO: Error-Check the command line argument is supplied.
# Make sure there is no space/character after the backslash!
sort "$1" \
  | uniq -c \
  | sort -n -r \
  | head -n 10
and one sample run (using the indep.txt from module 1) would be:
$ bash example.sh indep.txt
77 of
75 the
64 to
55 and
25 our
20 their
20 has
20 for
18 in
18 He
We will refer to the plain text file given as the single command line argument as $1 in the problem descriptions.
You will do your own testing for the problems. You should define your own plain text files that include data to test your script.
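One way to approach this kind of testing is sketched below. It is only an illustration built around the example.sh pipeline shown earlier: the word list, the temporary file, and the top_words function name are made up for this sketch, and the pipeline is wrapped in a function so that the snippet is self-contained. The idea is to build an input small enough that the correct answer can be worked out by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same pipeline as example.sh, wrapped in a function
# so it can be tried without creating a separate script file.
top_words() {
  sort "$1" | uniq -c | sort -n -r | head -n 10
}

# Build a small input whose correct answer is easy to work out by hand:
# "the" appears three times, "of" twice, and "to" once.
testfile="$(mktemp)"
printf '%s\n' the of the to of the > "$testfile"

# Prints one line per distinct word, most frequent first.
top_words "$testfile"

rm -f "$testfile"
```

The same approach applies to the subproblems: write a short file that contains both inputs that should match and inputs that should not, then compare the script's output with the answer you counted by hand.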
Problem 1a¶
Inside the file p1/a.sh, display the total number of U.S. phone numbers inside $1. For this problem, we will assume a U.S. phone number has the following structure: (AREA_CODE-FIRST_PART-SECOND_PART), where AREA_CODE and FIRST_PART must each contain exactly three digits and SECOND_PART must contain exactly four digits. Each part is separated by a single -. For purposes of this problem, the number must be wrapped in parentheses. For example, (702-567-4954) and (345-324-3124) are valid phone numbers. Your output is a single integer, which is the total number of valid U.S. phone numbers.
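One detail to keep in mind when counting: grep -c counts matching lines, not individual matches, so a line that contains two matches is counted once. The sketch below illustrates the difference on an unrelated pattern (the word cat). The function names and the sample text are made up for illustration, and it assumes wc is among the eligible commands from homework 1, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing two ways of counting.
count_matching_lines() {
  # -c prints the number of lines that contain at least one match.
  grep -c -E "$1" "$2"
}

count_matches() {
  # -o prints every match on its own line, so counting lines counts matches.
  grep -o -E "$1" "$2" | wc -l
}

sample="$(mktemp)"
printf 'cat cat dog\ncat\nbird\n' > "$sample"

count_matching_lines 'cat' "$sample"   # two lines contain "cat"
count_matches 'cat' "$sample"          # "cat" occurs three times

rm -f "$sample"
```

Note also that grep exits with status 1 when nothing matches, which interacts with the errexit and pipefail options set in the skeleton, so it is worth testing a file with zero matches.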
Problem 1b¶
Inside the file p1/b.sh, using $1, find all the dates with the format day/month/year and display only the years between [1900-1999] (inclusive) in ascending order. Your output should be something like
1901
1902
...
Problem 1c¶
Inside the file p1/c.sh, using $1, find all the dates with the format day/month/year and display all the years except 2020 and 2021, which must be removed from the output.
Problem 1d¶
For this problem, assume that $1 contains a single line composed of two positive integers within the range [1,26]. The integers are delimited by a comma (e.g., 4,5). The first integer represents a START index and the second integer represents an END index. START will always be less than END.
Inside the file p1/d.sh, using $1, print out the lowercase letters of the alphabet starting with the letter at the START index and ending with the letter at the END index. Assume a is at index 1 and z is at index 26. The output should contain a single space between characters.
Sample runs
$ echo "2,4" > letters.txt ; bash d.sh letters.txt
b c d
$ echo "1,26" > letters.txt ; bash d.sh letters.txt
a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo "25,26" > letters.txt ; bash d.sh letters.txt
y z
Problem 1e¶
This problem requires you to learn about the eval and column commands. You will need to use them in this problem, so take the time to research what they do. Ask questions on Ed if you need additional clarification.
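As a starting point for that research, here is a small sketch that is unrelated to the table in this problem. It shows the behavior of eval that tends to surprise people: bash performs brace expansion before variable expansion, so {1..$n} is not expanded as a sequence unless the line is parsed a second time. The function name and the item prefix are made up for illustration, and the column call assumes the column utility is installed, as it is on most Linux systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print item1 ... itemN for a count held in a variable.
numbered_items() {
  local n="$1"
  # Without eval, bash would print the literal text item{1..3}, because brace
  # expansion runs before $n is substituted. eval re-parses the line after
  # $n has been replaced, so the sequence expands.
  eval "echo item{1..$n}"
}

numbered_items 3

# column -t aligns whitespace-separated fields into a table.
if command -v column >/dev/null 2>&1; then
  printf 'name qty\napple 3\nkiwi 12\n' | column -t
fi
```

Because eval executes whatever string it is given, it should only be used on input whose format you control, such as the integers described below.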
For this problem, assume that $1 contains a single line composed of three positive integers. The integers are delimited by a comma (e.g., 6,3,2). The first integer represents the total number of homework assignments, the second integer the total number of projects, and the last integer the total number of exams.
Inside the file p1/e.sh, using $1, output a three-column table where the header will always be Homework Projects Exams. The rows for each column will be filled as follows
Each row in the Homework column will be a string prefixed with hw followed by the row number, up until the total number of homework assignments specified in $1.
Each row in the Projects column will be a string prefixed with proj followed by the row number, up until the total number of projects specified in $1.
Each row in the Exams column will be a string prefixed with exam followed by the row number, up until the total number of exams specified in $1.
For example, assume there is a file named course.txt that contains the single line
7,3,2
Then running the script bash e.sh course.txt will produce the following table
Homework  Projects  Exams
hw1       proj1     exam1
hw2       proj2     exam2
hw3       proj3
hw4
hw5
hw6
hw7
Problem 2¶
Imagine you are an intern working at the Chicago Transit Authority (CTA), and your boss provides you with a file that lists all the L stops in Chicago, including information about the CTA lines associated with each stop (i.e., Red Line, Purple Line, Green Line, etc.). Your boss wants you to write a bash script (inside p2/p2.sh) that produces a two-column output where the first column is the stop name and the second column is the name of the line associated with that stop. The columns must be comma separated. For example, given the first 10 lines inside p2/lstops.csv, the output must look like this
Cicero,Pink
Central Park,Pink
Halsted,Green
Cumberland,Blue
Racine,Blue
Paulina,Brown
18th,Pink
Clark/Lake,Blue
Clark/Lake,Brown
Clark/Lake,Green
Clark/Lake,Orange
Clark/Lake,Purple
Clark/Lake,Pink
Jefferson Park,Blue
Unfortunately, the file format is not very computer/programmer friendly. You’ll need to manually look at the file to determine how to generate the two-column output (hint: regular expressions). If a stop has multiple lines associated with it (e.g., "Clark/Lake") then there must be a line in the output for each CTA L line associated with that stop. The output must not have any duplicated lines. The output is not required to be ordered in a specific way. The second column should only contain valid CTA lines: "Red", "Blue", "Brown", "Green", "Orange", "Pink", "Purple", and "Yellow", where each line name begins with a capital letter.
Your script must execute based on the following command line arguments:
bash p2.sh - list all stops with their CTA L lines
bash p2.sh red - list all CTA red line stops
bash p2.sh blue - list all CTA blue line stops
bash p2.sh brown - list all CTA brown line stops
bash p2.sh green - list all CTA green line stops
bash p2.sh orange - list all CTA orange line stops
bash p2.sh pink - list all CTA pink line stops
bash p2.sh purple - list all CTA purple line stops
bash p2.sh yellow - list all CTA yellow line stops
For example, running bash p2.sh pink on the first 10 lines produces
Cicero,Pink
Central Park,Pink
18th,Pink
Clark/Lake,Pink
You can assume that there will always be a file p2/lstops.csv to retrieve the stops from. This means the script does not take the file as a command line argument; instead, you can hardcode the filename in the script file. Please note I’m only showing you the first 10 lines in the examples, but your script should work on the entire file.
Error checking requirements:
The script can take in zero or one command line argument only. If the script receives more than one command line argument then exit 1. No error message is printed.
If the script receives a single command line argument then it must be one of: red, blue, brown, green, orange, pink, purple, or yellow. Anything else should make the script exit 1.
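When testing these requirements, it helps to check the exit status rather than the printed output. The helper below is a hypothetical sketch (expect_status is not part of the starter code): it runs a command, discards its output, and reports whether the exit status matched the expected value. The bash -c calls are stand-ins for real invocations of p2.sh so that the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expect_status EXPECTED COMMAND [ARGS...]
expect_status() {
  local expected="$1"
  shift
  local actual=0
  # Run the command, discarding its output; record a non-zero status if any.
  "$@" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok"
  else
    echo "FAIL: expected $expected, got $actual"
    return 1
  fi
}

expect_status 1 bash -c 'exit 1'   # stand-in for: bash p2.sh red blue
expect_status 0 bash -c 'exit 0'   # stand-in for: bash p2.sh pink
```

In practice you would replace the stand-ins with calls such as expect_status 1 bash p2.sh magenta, covering the too-many-arguments case and at least one invalid line name.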
Grading¶
Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)
The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:
Problem 1: 50%
Problem 2: 50%
There are no automated tests for this homework assignment. We will combine completeness and correctness together and just verify manually that your code is working according to the specification of the problem.
Submission¶
Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Homework #2” assignment page in one of two ways:
Uploading from GitHub directly (recommended way): You can link your GitHub account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop-up window will appear. Click on “Github” and then “Connect to Github” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.
Note
For either option, you must upload the entire directory structure; otherwise, the automated tests will not run correctly and you will be penalized if we have to manually run the tests. Going with the first option will do this automatically for you. You can always add additional directories and files (and even files/directories inside the starter directories) but the default directory/file structure must not change.
Depending on the assignment, once you submit your work, an “autograder” will run. This autograder should produce the same test results as when you run the code yourself; if it doesn’t, please let us know so we can look into it. A few other notes:
You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy.
Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).
Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.