Homework #2

Due: Friday, October 13th at 11:59pm

This homework is intended to provide additional practice with writing Bash scripts, specifically with more advanced Bash syntax and with filtering (i.e., text-processing) commands.

CS Linux Machine

You will need access to a Linux-based machine when working on your homework assignments. You should not test your programs on macOS or Linux on Windows because these environments do not provide all of the utility commands necessary for completing this and possibly future assignments. Even when they do provide a command, it may not support all of the options that the Linux version provides. We will run and grade all assignments on the CS Linux machines, and all programming assignments must work correctly on these machines. You can work locally on a Unix or Unix-like machine, but make sure that you test your final solutions on a CS Linux machine.

Please follow the instructions provided here

Creating Your Private Repository

For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.

To actually get your private repository, you will need this invitation URL:

  • HW2 invitation (please check the post “Homework #2 is ready” on Ed)

When you click on an invitation URL, you will have to complete the following steps:

  1. You will need to select your CNetID from a list. This will allow us to know what student is associated with each GitHub account. This step is only done for the very first invitation you accept.

Note

If you are on the waiting list for this course, you will not have a repository made for you until you are admitted into the course. I will post the starter code on Ed so you can work on the assignment until you are admitted.

  2. You must click “Accept this assignment” or your repository will not actually be created.

  3. After accepting the assignment, GitHub may take a few minutes to create your repository. You should receive an email from GitHub when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.

  4. You now need to clone your repository (i.e., download it to your machine).
    • Make sure you’ve set up SSH access on your GitHub account.

    • For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs51082-aut23/hw2-GITHUB-USERNAME.git.

    • If you do not know how to use git clone to clone your repository, follow this guide that GitHub provides: Cloning a Repository

If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.

Short Answer Questions

Each assignment may include a section dedicated to a few short-answer questions. Please turn in a readable text file named saqs (e.g., saqs.pdf) and store it inside the hw2/saqs directory. You will place your answers to the short-answer questions in this document. The file does not have to be a PDF; any text document that is easy to open is fine.

IMPORTANT: Eligible Commands & Bash

For this homework assignment, you are only allowed to use the following commands, with or without their options, in your scripts.

  • All eligible commands provided in homework 1.

  • seq

  • touch

  • mkdir

  • readarray

  • eval

  • rm

  • read

  • exit

  • comm

  • tr

  • grep

  • sed

  • awk

Using any other command not specified in this list will result in a major penalty for that specific problem. You are free to use the entirety of the bash scripting language but are restricted to using only the above commands within your script files.

If you have any questions about using a command, ask on Ed before you use it!

Problem 1

Problem 1 is composed of a series of five subproblems (i.e., Problems 1a-1e). For each subproblem, you will do the following:

  1. Write a bash script that takes a single argument: a plain text file that contains some arbitrary text. The script must exit 1 if the single argument is not supplied. No error message is printed.

  2. Each script will begin with the following skeleton code:

    #! /usr/bin/env bash
    
    set -o errexit
    set -o nounset
    set -o pipefail
    
    # TODO: Error-Check the command line argument is supplied.
    
    # ONE_LINE_COMMAND
    

    However, # ONE_LINE_COMMAND will be replaced with a one-line command (this could also be a pipeline of commands) that accomplishes the task of the subproblem. You can only use the commands listed in the Eligible Commands section above. For example, if the subproblem asked:

    Find the top ten most frequently used words in the supplied command line argument. Assume there is only one word on each line in the file. Place the solution in a file called example.sh

    The solution would be:

    #! /usr/bin/env bash
    
    set -o errexit
    set -o nounset
    set -o pipefail
    
    # TODO: Error-Check the command line argument is supplied.
    
    sort "$1" | uniq -c | sort -n -r | head -n 10
    

    You can also use a backslash \ to split the pipeline across separate lines so that the pipes line up:

    #! /usr/bin/env bash
    
    set -o errexit
    set -o nounset
    set -o pipefail
    
    # TODO: Error-Check the command line argument is supplied.
    
    # Make sure there is no space or other character after each backslash!
    sort "$1" \
    | uniq -c \
    | sort -n -r \
    | head -n 10
    

    and one sample run (using the indep.txt from module 1) would be:

    $ bash example.sh indep.txt
        77 of
        75 the
        64 to
        55 and
        25 our
        20 their
        20 has
        20 for
        18 in
        18 He
    
  3. We will refer to the plain text file given as the single command line argument as $1 in the problem descriptions.

  4. You will do your own testing for the problems. You should create your own plain text files that include data to test your scripts.
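
The argument check described in step 1 can be sketched as follows. This is a minimal illustration rather than the required form: the helper name has_one_arg is hypothetical, and it returns a status instead of exiting so that it can be tried out interactively. Note that with set -o nounset enabled, expanding "$1" when no argument was supplied makes bash print an "unbound variable" message, so the count in $# is checked before $1 is ever used.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when exactly one argument is given.
# It inspects the argument count ($#) and never expands $1, so it is
# safe to call even when `set -o nounset` is active.
has_one_arg() {
    [ "$#" -eq 1 ]
}

# In a script, the check would sit before the one-line command, e.g.:
#   has_one_arg "$@" || exit 1
```

Because the check prints nothing on failure, it matches the requirement that no error message is shown.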

Problem 1a

Inside the file p1/a.sh, display the total number of U.S. phone numbers inside $1. For this problem, we will assume a U.S. phone number has the following structure: (AREA_CODE-FIRST_PART-SECOND_PART), where AREA_CODE and FIRST_PART must each contain exactly three digits and SECOND_PART must contain exactly four digits. The parts are separated by a single -. For the purposes of this problem, the number must be wrapped in parentheses. For example, (702-567-4954) and (345-324-3124) are valid phone numbers. Your output is a single integer, which is the total number of valid U.S. phone numbers.
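
One thing worth checking in your own tests is that a single line may contain more than one match, so counting matching lines and counting matches can give different answers. The sketch below illustrates the difference with a deliberately unrelated pattern (pairs of uppercase letters); the function names are hypothetical and this is not the solution to the problem.

```shell
#!/usr/bin/env bash

# Hypothetical helper: number of LINES in file $2 that contain at least
# one match of the extended regular expression $1.
count_matching_lines() {
    grep -c -E "$1" "$2"
}

# Hypothetical helper: number of MATCHES of $1 in file $2.
# grep -o prints every match on its own line; the second grep counts
# those lines (the empty pattern matches every line).
count_matches() {
    grep -o -E "$1" "$2" | grep -c ''
}

# Example: for a file containing the three lines "AB CD", "xx", "EF",
#   count_matching_lines '[[:upper:]]{2}' file   prints 2
#   count_matches        '[[:upper:]]{2}' file   prints 3
```

Keep in mind that grep exits with a non-zero status when nothing matches, which interacts with errexit and pipefail in the skeleton; test your scripts on a file with no matches as well.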

Problem 1b

Inside the file p1/b.sh, using $1, find all the dates with the format day/month/year and display only the years in the range [1900,1999] (inclusive) in ascending order. Your output should look something like:

1901
1902
...

Problem 1c

Inside the file p1/c.sh, using $1, find all the dates with the format day/month/year and display all the years except 2020 and 2021.

Problem 1d

For this problem, assume that $1 contains a single line composed of two positive integers within the range [1,26]. The integers are delimited by a comma (e.g., 4,5). The first integer represents a START index and the second integer represents an END index. START will always be less than END.

Inside the file p1/d.sh, using $1, print out the lowercase letters of the alphabet starting with the letter at the START index and ending with the letter at the END index. Assume a is at index 1 and z is at index 26. The output should contain a single space between characters.

Sample runs

$ echo "2,4" > letters.txt ; bash d.sh letters.txt
b c d
$ echo "1,26" > letters.txt ; bash d.sh letters.txt
a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo "25,26" > letters.txt ; bash d.sh letters.txt
y z

Problem 1e

This problem requires you to learn about the eval and column commands. You will need to use them in this problem, so take the time to research what they do. Ask questions on Ed if you need additional clarification.
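
As a starting point for that research, here is a small sketch of one common reason to reach for eval: bash performs brace expansion before variable expansion, so a range built from a variable is not expanded unless the text is parsed a second time. The function names are hypothetical, and the digit check is an added precaution (an assumption of this sketch), since eval executes whatever text it is given.

```shell
#!/usr/bin/env bash

# Without eval: brace expansion runs before $1 is substituted, so the
# braces survive and are printed literally (e.g. {1..3}).
literal_braces() {
    echo {1..$1}
}

# With eval: the string is built first ($1 is substituted), then eval
# parses it again, and this time brace expansion sees real numbers.
expand_to() {
    # eval runs arbitrary text, so accept digits only.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    eval "echo {1..$1}"
}

# Example:
#   literal_braces 3   prints {1..3}
#   expand_to 3        prints 1 2 3
```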

For this problem, assume that $1 contains a single line composed of three positive integers. The integers are delimited by a comma (e.g., 6,3,2). The first integer represents the total number of homework assignments, the second integer the total number of projects, and the last integer the total number of exams.

Inside the file p1/e.sh, using $1, output a three-column table where the header will always be Homework Projects Exams. The rows for each column will be filled as follows:

  • Each row in the Homework column will be a string prefixed with hw followed by the row number up until the total number of homework assignments specified in $1.

  • Each row in the Projects column will be a string prefixed with proj followed by the row number up until the total number of projects specified in $1.

  • Each row in the Exams column will be a string prefixed with exam followed by the row number up until the total number of exams specified in $1.

For example, assume there is a file named course.txt that contains the single line

7,5,2

Then running the script bash e.sh course.txt will produce the following table

Homework  Projects  Exams
hw1       proj1     exam1
hw2       proj2     exam2
hw3       proj3
hw4
hw5
hw6
hw7
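
For the column command, the sketch below shows the table mode on unrelated data. The helper name align_rows is hypothetical, and the sketch assumes a column implementation that supports the -t option, as the util-linux version on Linux does.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads whitespace-separated rows on stdin and
# prints them as a table with aligned columns.
align_rows() {
    column -t
}

# Example:
#   printf '%s\n' 'Name Count' 'apple 3' 'kiwi 12' | align_rows
```

By default column -t splits each row on runs of whitespace, so an empty cell in the middle of a row would shift the later cells to the left. In the table above the empty cells are always at the end of a row, which is the easy case.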

Problem 2

Imagine you are an intern working at the Chicago Transit Authority (CTA), and your boss gives you a file that lists all the L stops in Chicago along with the CTA lines associated with each stop (i.e., Red Line, Purple Line, Green Line, etc.). Your boss wants you to write a bash script (inside p2/p2.sh) that produces two-column output where the first column is the stop name and the second column is the name of a line associated with that stop. The columns must be comma separated. For example, for the first 10 lines inside p2/lstops.csv, the output must look like this:

Cicero,Pink
Central Park,Pink
Halsted,Green
Cumberland,Blue
Racine,Blue
Paulina,Brown
18th,Pink
Clark/Lake,Blue
Clark/Lake,Brown
Clark/Lake,Green
Clark/Lake,Orange
Clark/Lake,Purple
Clark/Lake,Pink
Jefferson Park,Blue

Unfortunately, the file format is not very computer/programmer friendly. You’ll need to manually look at the file to determine how to generate the two column output (hint: regular expressions). If a stop has multiple lines associated with it (e.g., "Clark/Lake") then there must be a line in the output for each CTA L line associated with that stop. The output must not have any duplicated lines. The output is not required to be ordered in a specific way. The second column should only contain valid CTA Lines: "Red", "Blue", "Brown", "Green", "Orange", "Pink", "Purple", and "Yellow", where each line begins with a capital letter.

Your script must execute based on the following command line arguments:

  • bash p2.sh - list all stops with their CTA L lines

  • bash p2.sh red - list all CTA red line stops

  • bash p2.sh blue - list all CTA blue line stops

  • bash p2.sh brown - list all CTA brown line stops

  • bash p2.sh green - list all CTA green line stops

  • bash p2.sh orange - list all CTA orange line stops

  • bash p2.sh pink - list all CTA pink line stops

  • bash p2.sh purple - list all CTA purple line stops

  • bash p2.sh yellow - list all CTA yellow line stops

For example, running bash p2.sh pink on the first 10 lines produces:

Cicero,Pink
Central Park,Pink
18th,Pink
Clark/Lake,Pink

You can assume that there will always be a file p2/lstops.csv from which to retrieve the stops. This means the script does not take the file as a command line argument; instead, you can hardcode the filename in the script. Please note that I’m only showing you the first 10 lines in the examples, but your script should work on the entire file.

Error checking requirements:

  1. The script can take in zero or one command line argument only. If a script receives more than one command line argument then exit 1. No error message is printed.

  2. If the script receives a single command line argument then it must be either: red, blue, brown, green, orange, pink, purple, or yellow. Anything else should make the script exit 1.
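
The two error-checking rules above can be sketched as a single validation step. The helper name valid_args is hypothetical, and it returns a status instead of exiting so that it can be tried out on its own; this is one possible arrangement, not the required one.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when the arguments satisfy both rules,
# and 1 otherwise. It prints nothing either way.
valid_args() {
    # Rule 1: more than one argument is never allowed.
    if [ "$#" -gt 1 ]; then
        return 1
    fi
    # Zero arguments means "list every stop".
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    # Rule 2: a single argument must be one of the eight line names.
    case "$1" in
        red|blue|brown|green|orange|pink|purple|yellow) return 0 ;;
        *) return 1 ;;
    esac
}

# In a script, the check would run before any processing, e.g.:
#   valid_args "$@" || exit 1
```

The case patterns are matched exactly, so Pink or an empty string is rejected just like any other unexpected value.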

Grading

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

  • Problem 1: 50%

  • Problem 2: 50%

There are no automated tests for this homework assignment. We will combine completeness and correctness and manually verify that your code works according to the specification of each problem.

Submission

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Homework #2” assignment page in one of two ways:

  1. Uploading from GitHub directly (recommended way): You can link your GitHub account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop-up window will appear. Click on “GitHub” and then “Connect to GitHub” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master” for this course).

  2. Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

Note

For either option, you must upload the entire directory structure; otherwise, the automated tests will not run correctly and you will be penalized if we have to run the tests manually. Going with the first option will do this automatically for you. You can always add additional directories and files (even inside the starter directories), but the default directory/file structure must not change.

Depending on the assignment, once you submit your work, an “autograder” will run. This autograder should produce the same test results as when you run the code yourself; if it doesn’t, please let us know so we can look into it. A few other notes:

  • You are allowed to make as many submissions as you want before the deadline.

  • Please make sure you have read and understood our Late Submission Policy.

  • Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).

  • Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.