Homework 5: BST and Memory
Due Monday, July 22, 2024 at 11:59pm
In this assignment, you will implement some functions in a binary search tree and use the data structures to implement a simple C program.
- Compile and test frequently
- Read the entire assignment first before you start
- Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
- Do not hesitate to seek help if you are stuck
Synopsis
In this homework, you will work in the hw5
directory.
map.c
: You will implement map_free
, map_get
and map_insert
functions. This file also contain other provided functions implementing the
header file at map.h
.
groups.c
: You will use lists and maps (a.k.a dictionaries) to implement a
C program grouping students with the same hometown together.
Written: You will answer some simple questions at the end.
Learning Objectives:
- Even pointer juggling
- Understand the structure of a binary search tree
- Practice heap allocation and manual memory management in a real program
Getting started
See homework 1.
What is provided?
The hw5
directory contains the following files:
Makefile
is a Makefile that specifies how to build this homework.groups.c
is a C file with starter template to implement thegroups
program.map.h
is the header file for a map implemented as a binary-search tree.map.c
is where you write your implementation of the three core functions in BST.map-test.c
contains unit tests for your implementation.tests/
is a directory containing sample inputs forgroups
.WRITTEN.txt
The directory also contains the following supporting files:
array-list.{h,c}
is a completed module implementing a dynamic-array list.linked-list.h
is the header file for the linked list module.linked-list.c
is currently empty. If you plan to use a linked list in your implementation, you should copy your implementation fromhw4
. I will release my solution once the late deadline has passed.tests.{c,h}
is a unit test framework that I use for this class. You can read them if you are curious.- Please ignore
memcheck.supp
.
Compiling
A Makefile is provided, containing the following rules.
make groups
makes the groups
program, depending on groups.c
,
array-list.c
, linked-list.c
, and map.c
. If you plan to use additional
files, you can do so by adding the name of the source in Makefile.
make map-test
makes the map-test
, the unit test for your map
implementation.
make all
, or simply make
, makes both of the above.
MacOS users who wish to run Address Sanitizer can also use the following two
commands to check for memory leaks in groups
and map-test
, respectively.
Specification
This homework is divided into two parts. You should implement map
first,
ensuring that all test cases pass before moving on to groups
.
Part I: BST
In this part, you will implement three core functions for a binary-search tree.
A map is represented by the following struct
.
struct map {
int (*cmp)(void *, void *); /* function pointer to compare keys */
struct tree_node *root; /* pointer to the root of the BST */
};
A variable whose type is struct map
contains a pointer to the root of the
tree (NULL
if the tree is empty), and a function that compares two keys.
Conventionally, a comparison function returns a negative value if the first
argument is less than the second, 0 if the first is equal to the second, and
positive if the first is greater than the second.
To use the comparison function stored in a struct
, you should use ordinary
field-access operators.
For example, the following code snippet branches based on the comparison results
between two keys key1
and key2
.
struct map *m /* = ... */;
void *key1, *key2;
if (m->cmp(key1, key2) < 0) {
// do something when key1 < key2
} else if (m->cmp(key1, key2) == 0) {
// do something when key1 == key2
} else {
// do something when key1 > key2
}
A tree node is represented as the following struct
.
struct tree_node {
void *key; /* the key in this node */
void *value; /* the value in this node */
struct tree_node *left; /* pointer to the left subtree */
struct tree_node *right; /* pointer to the right subtree */
};
key
s, a generic pointer. The ordering
between keys is given by the comparison function mentioned above.
Another generic pointer, value
, is associated with each key
.
Just like in linked-list
, the memory held by key
and value
is owned by the
user of the BST and should not be freed when the BST is freed.
map_free
deallocates all memory held by m
. The memory held by m
includes
the structure that m
points to (of type struct map
) as well as the entire
tree whose root is m->root
.
map_get
retrieves the value associated with a given key. If the key does not
exist in the binary search tree, map_get
should return NULL
.
map_insert
inserts a key-value pair into the tree. If the key is already in
the tree, the old value associated with the key is overwritten and map_insert
should return that old value. If the key is not in the tree, the new key-value
pair should be inserted in the tree, and map_insert
should return NULL
.
The detailed specifications of map_free
, map_get
, and map_insert
are
documented in the header file map.h
.
In addition, the starter code provides completed implementations for the following functions. Some of them are useful for debugging, and others provide hints for your implementation.
map_remove
removes a key from the tree, returning the value associated with
the key. If the key is not in the tree, this function does nothing and returns
NULL
.
map_walk
takes a function pointer that applies it to all key-value pairs
stored in the tree in ascending order of the keys. data
stores any
additional data that the function may need in its body. In pseudocode,
Finally, map
also implements two print functions.
map_print
prints all key-value pairs in the tree.
map_print_internal
additionally prints the structure of the tree for debugging
purposes.
Running unit tests
See homework 4.
Hints
A recursive solution is much easier than an iterative one, especially for
map_free
.
But you might need to define additional helper functions to perform recursion on
tree nodes --- all functions take struct map *
as arguments, which is not
helpful for a recursive definition.
For bonus points, implement map_free
, map_get
and map_insert
iteratively
(i.e. do not use recursion).
- Iterative
map_free
: +5% - Iterative
map_insert
: +3% - Iterative
map_get
: +2%
Part II. Groups
In this part, you will write a program called groups
, which when given a list
of names and their keys (e.g. hometowns) identifies groups of the names that
share the same key.
Your input is given by a command-line argument. If the input is stored in a file
input.txt
, we will invoke your program by ./groups input.txt
. The output
of your program should be on standard output stdout
.
The input is a sequence of lines. Each line begins with a key, followed by a
single tab character '\t'
, and followed by a name.
See files in tests/
directory for examples.
The following function is provided to read a line from a file.
false
.
The output of the program is specified as follows:
- If a key is associated with exactly one name, ignore the key (and the name);
- If a key is associated with two or more names, the key and the names constitute a group.
For each group, your program should print in the following format:
- The first line is the key of the group, immediately followed by a colon
:
. - Print each name in the group in its own line.
- Print one additional blank line at the end.
The groups are printed in ascending order of their keys, and names within a group are printed in the order in which they appear in the input file.
A reference implementation in Python is provided in groups.py
.
Example
An example input:
F Bolotin Ramsayer
C Defrancisco Nunn
F Worman Nilakantan
F Bradwell Viehweg
D Kensler Callaghan
A Donatelli Morocz
E Leisy Caskey
A Ruud Roots
A Stidham Kushnir
A Hamre Carrillo
A Applin Solman
Assuming the above content is stored in a file named example.txt
, running
./groups example.txt
should produce the following to the standard output:
A:
Donatelli Morocz
Ruud Roots
Stidham Kushnir
Hamre Carrillo
Applin Solman
F:
Bolotin Ramsayer
Worman Nilakantan
Bradwell Viehweg
Hints
Suggested Algorithm
You can implement this program in any way you like, as long as the program follows the above output specification. The Python implementation contains the same high-level algorithm that I used in my C implementation.
To summarize,
I use a map
with char *
as keys and a list of char *
s (either list would
work) as the values. Every time I see a key-value pair, I retrieve the list from
the map
, and append the name to the list. At the end, I use an in-order travel
to visit all key-value pairs, and print out the names if the list of name (i.e.
the "value" of a key-value pair) has more than one element.
Generic Pointers and Memory Management
All data structures in this course use a boxed approach, including the
provided array-list
module. (See the discussion in homework
4 for
details)
The data structures only manage the structure of data but not their lifetimes.
In other words, the data structures merely store pointers, and the users are
responsible to free the content behind those pointers if needed.
To avoid memory leaks, before starting your implementation, you should sketch out all heap-allocated data in your program and their lifetimes (Who allocates the data? Who is responsible for freeing them?)
In addition, the data structures do not know (or care) about the types of
pointers that you store in them --- almost all functions take or return void *
pointers.
But to access the memory, you must cast those pointers to the correct pointer
type.
struct llist *
, the above code casts the returned
pointer, whose type is void *
, to the correct type.
Casting void *
to any pointer type, or casting any pointer type to void *
,
is valid in C.
The compiler is not going to warn you when you cast a pointer to the "wrong"
type.
It is your responsibility to ensure that pointers have the correct types before
dereferencing them.
It is also a good idea to write down what all void *
pointers are supposed to
be before starting your implementation.
When in doubt, run valgrind
or Address Sanitizer.
Silencing unused variable
warnings
When you define a function with an unused argument, the compiler produces a warning. Sometimes, the argument needs to be there as a placeholder to match the type of a signature or a function pointer. You can let the compiler know that you acknowledge the unused variable by writing the following in the body of a function:
Written
You need to answer some questions in hw5/WRITTEN.txt
.
Submission checklist
In hw5/
:
make
produce no errors and produce executables,groups
andmap-test
.
All changes are committed and pushed to your github repository. Submit your program to Gradescope by selecting your coursework directory and the correct branch.
Grading
Percentage | |
---|---|
Correctness | 70% |
Style | 20% |
Written | 10% |
Warning: If your program cannot be compiled using the commands above without error or warning, you will receive 0 points in correctness since there is no executables for us to run.