Skip to content

Homework 5: BST and Memory

Due Monday, July 22, 2024 at 11:59pm

In this assignment, you will implement some functions in a binary search tree and use the data structures to implement a simple C program.

  • Compile and test frequently
  • Read the entire assignment first before you start
  • Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
  • Do not hesitate to seek help if you are stuck

Synopsis

In this homework, you will work in the hw5 directory.

map.c: You will implement map_free, map_get and map_insert functions. This file also contain other provided functions implementing the header file at map.h.

groups.c: You will use lists and maps (a.k.a dictionaries) to implement a C program grouping students with the same hometown together.

Written: You will answer some simple questions at the end.

Learning Objectives:

  • Even pointer juggling
  • Understand the structure of a binary search tree
  • Practice heap allocation and manual memory management in a real program

Getting started

See homework 1.

What is provided?

The hw5 directory contains the following files:

  • Makefile is a Makefile that specifies how to build this homework.
  • groups.c is a C file with starter template to implement the groups program.
  • map.h is the header file for a map implemented as a binary-search tree.
  • map.c is where you write your implementation of the three core functions in BST.
  • map-test.c contains unit tests for your implementation.
  • tests/ is a directory containing sample inputs for groups.
  • WRITTEN.txt

The directory also contains the following supporting files:

  • array-list.{h,c} is a completed module implementing a dynamic-array list.
  • linked-list.h is the header file for the linked list module.
  • linked-list.c is currently empty. If you plan to use a linked list in your implementation, you should copy your implementation from hw4. I will release my solution once the late deadline has passed.
  • tests.{c,h} is a unit test framework that I use for this class. You can read them if you are curious.
  • Please ignore memcheck.supp.

Compiling

A Makefile is provided, containing the following rules.

make groups makes the groups program, depending on groups.c, array-list.c, linked-list.c, and map.c. If you plan to use additional files, you can do so by adding the name of the source in Makefile.

make map-test makes the map-test, the unit test for your map implementation.

make all, or simply make, makes both of the above.

MacOS users who wish to run Address Sanitizer can also use the following two commands to check for memory leaks in groups and map-test, respectively.

make memcheck-groups
make memcheck-test

Specification

This homework is divided into two parts. You should implement map first, ensuring that all test cases pass before moving on to groups.

Part I: BST

In this part, you will implement three core functions for a binary-search tree.

A map is represented by the following struct.

struct map {
    int (*cmp)(void *, void *); /* function pointer to compare keys */
    struct tree_node *root;     /* pointer to the root of the BST   */
};

A variable whose type is struct map contains a pointer to the root of the tree (NULL if the tree is empty), and a function that compares two keys. Conventionally, a comparison function returns a negative value if the first argument is less than the second, 0 if the first is equal to the second, and positive if the first is greater than the second.

To use the comparison function stored in a struct, you should use ordinary field-access operators. For example, the following code snippet branches based on the comparison results between two keys key1 and key2.

struct map *m /* = ... */;
void *key1, *key2;
if (m->cmp(key1, key2) < 0) {
    // do something when key1 < key2
} else if (m->cmp(key1, key2) == 0) {
    // do something when key1 == key2
} else {
    // do something when key1 > key2
}

A tree node is represented as the following struct.

struct tree_node {
    void *key;               /* the key in this node   */
    void *value;             /* the value in this node */
    struct tree_node *left;  /* pointer to the left subtree  */
    struct tree_node *right; /* pointer to the right subtree */
};
The binary-search tree is ordered by keys, a generic pointer. The ordering between keys is given by the comparison function mentioned above. Another generic pointer, value, is associated with each key. Just like in linked-list, the memory held by key and value is owned by the user of the BST and should not be freed when the BST is freed.

void map_free(struct map *m);
map_free deallocates all memory held by m. The memory held by m includes the structure that m points to (of type struct map) as well as the entire tree whose root is m->root.

void *map_get(struct map *m, void *key);
map_get retrieves the value associated with a given key. If the key does not exist in the binary search tree, map_get should return NULL.

void *map_insert(struct map *m, void *key, void *value);
map_insert inserts a key-value pair into the tree. If the key is already in the tree, the old value associated with the key is overwritten and map_insert should return that old value. If the key is not in the tree, the new key-value pair should be inserted in the tree, and map_insert should return NULL.

The detailed specifications of map_free, map_get, and map_insert are documented in the header file map.h.

In addition, the starter code provides completed implementations for the following functions. Some of them are useful for debugging, and others provide hints for your implementation.

void *map_remove(struct map *m, void *key);
map_remove removes a key from the tree, returning the value associated with the key. If the key is not in the tree, this function does nothing and returns NULL.

void map_walk(struct map *m,
                void (*visit)(void *key, void *value, void *data),
                void *data);
map_walk takes a function pointer that applies it to all key-value pairs stored in the tree in ascending order of the keys. data stores any additional data that the function may need in its body. In pseudocode,
map_walk(m, visit, data):
    for k, v in ascending order of keys in m:
        visit(k, v, data)

Finally, map also implements two print functions.

void map_print(struct map *m, FILE *fp);
map_print prints all key-value pairs in the tree.

void map_print_internal(struct map *m, FILE *fp);
map_print_internal additionally prints the structure of the tree for debugging purposes.

Running unit tests

See homework 4.

Hints

A recursive solution is much easier than an iterative one, especially for map_free. But you might need to define additional helper functions to perform recursion on tree nodes --- all functions take struct map * as arguments, which is not helpful for a recursive definition.

For bonus points, implement map_free, map_get and map_insert iteratively (i.e. do not use recursion).

  • Iterative map_free: +5%
  • Iterative map_insert: +3%
  • Iterative map_get: +2%

Part II. Groups

In this part, you will write a program called groups, which when given a list of names and their keys (e.g. hometowns) identifies groups of the names that share the same key.

Your input is given by a command-line argument. If the input is stored in a file input.txt, we will invoke your program by ./groups input.txt. The output of your program should be on standard output stdout.

The input is a sequence of lines. Each line begins with a key, followed by a single tab character '\t', and followed by a name. See files in tests/ directory for examples. The following function is provided to read a line from a file.

bool read_record(FILE *file, struct record *rec_p);
A reference is passed to receive the result. When there is no more lines to read in the file, the function returns false.

The output of the program is specified as follows:

  • If a key is associated with exactly one name, ignore the key (and the name);
  • If a key is associated with two or more names, the key and the names constitute a group.

For each group, your program should print in the following format:

  • The first line is the key of the group, immediately followed by a colon :.
  • Print each name in the group in its own line.
  • Print one additional blank line at the end.

The groups are printed in ascending order of their keys, and names within a group are printed in the order in which they appear in the input file.

A reference implementation in Python is provided in groups.py.

Example

An example input:

F   Bolotin Ramsayer
C   Defrancisco Nunn
F   Worman Nilakantan
F   Bradwell Viehweg
D   Kensler Callaghan
A   Donatelli Morocz
E   Leisy Caskey
A   Ruud Roots
A   Stidham Kushnir
A   Hamre Carrillo
A   Applin Solman

Assuming the above content is stored in a file named example.txt, running ./groups example.txt should produce the following to the standard output:

A:
Donatelli Morocz
Ruud Roots
Stidham Kushnir
Hamre Carrillo
Applin Solman

F:
Bolotin Ramsayer
Worman Nilakantan
Bradwell Viehweg


Hints

Suggested Algorithm

You can implement this program in any way you like, as long as the program follows the above output specification. The Python implementation contains the same high-level algorithm that I used in my C implementation.

To summarize, I use a map with char * as keys and a list of char *s (either list would work) as the values. Every time I see a key-value pair, I retrieve the list from the map, and append the name to the list. At the end, I use an in-order travel to visit all key-value pairs, and print out the names if the list of name (i.e. the "value" of a key-value pair) has more than one element.

Generic Pointers and Memory Management

All data structures in this course use a boxed approach, including the provided array-list module. (See the discussion in homework 4 for details) The data structures only manage the structure of data but not their lifetimes. In other words, the data structures merely store pointers, and the users are responsible to free the content behind those pointers if needed.

To avoid memory leaks, before starting your implementation, you should sketch out all heap-allocated data in your program and their lifetimes (Who allocates the data? Who is responsible for freeing them?)

In addition, the data structures do not know (or care) about the types of pointers that you store in them --- almost all functions take or return void * pointers. But to access the memory, you must cast those pointers to the correct pointer type.

struct llist *list = map_get(m, "alice");
As an example, if we know the values in the map are supposed to be struct llist *, the above code casts the returned pointer, whose type is void *, to the correct type.

Casting void * to any pointer type, or casting any pointer type to void *, is valid in C. The compiler is not going to warn you when you cast a pointer to the "wrong" type. It is your responsibility to ensure that pointers have the correct types before dereferencing them. It is also a good idea to write down what all void * pointers are supposed to be before starting your implementation.

When in doubt, run valgrind or Address Sanitizer.

Silencing unused variable warnings

When you define a function with an unused argument, the compiler produces a warning. Sometimes, the argument needs to be there as a placeholder to match the type of a signature or a function pointer. You can let the compiler know that you acknowledge the unused variable by writing the following in the body of a function:

(void) variable_name;

Written

You need to answer some questions in hw5/WRITTEN.txt.

Submission checklist

In hw5/:

  • make produce no errors and produce executables, groups and map-test.

All changes are committed and pushed to your github repository. Submit your program to Gradescope by selecting your coursework directory and the correct branch.

Grading

Percentage
Correctness 70%
Style 20%
Written 10%

Warning: If your program cannot be compiled using the commands above without error or warning, you will receive 0 points in correctness since there is no executables for us to run.