Skip to content

Homework 6: Hash Table and Tagged Union

Due Monday, July 29, 2024 at 11:59pm

In this assignment, you will implement some functions in a hash table and implement a module that fuses two implementations of dictionaries together.

  • Compile and test frequently
  • Read the entire assignment first before you start
  • Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
  • Do not hesitate to seek help if you are stuck

Synopsis

In this homework, you will work in the hw6 directory.

table.c: You will implement table_free, table_get and table_insert functions. This file also contain other provided functions implementing the header file at table.h.

dict.c: You will implement an abstract interface in dict.h that allows the user to choose either implementation of a dictionary dynamically, using tagged unions.

groups.c: My implementation of groups from Homework 5 that uses the abstract interface dict.h. This file will be released after the late deadline for Homework 5.

Written: You will answer some simple questions at the end.

Getting started

See homework 1.

What is provided?

The hw6 directory contains the following files:

  • Makefile is a Makefile that specifies how to build this homework.
  • table.h is the header file for a map implemented as a hash table.
  • table.c is where you write your implementation of the three core functions in hash tables.
  • table-test.c contains unit tests for your implementation.
  • groups.c is my implementation of groups, released later.
  • hash.{h,c} contains functions that hash and compare strings.
  • tests/ is a directory containing sample inputs for groups.
  • WRITTEN.txt

The directory also contains the following supporting files:

  • array-list.{h,c} is a completed module implementing a dynamic-array list.
  • linked-list.{h,c} is a completed module implementing a linked list.
  • tests.{c,h} is a unit test framework that I use for this class. You can read them if you are curious.
  • Please ignore memcheck.supp.

Compiling

A Makefile is provided, containing the following rules.

make table-test makes the table-test, the unit test for your table implementation.

make groups makes the groups program.

make all, or simply make, makes both of the above.

MacOS users who wish to run Address Sanitizer can also use the following two commands to check for memory leaks in groups and table-test, respectively.

make memcheck-groups
make memcheck-test

Specification

This homework is divided into two parts. You should implement table first, ensuring that all test cases pass before moving on to dict.c.

Part I: Hash Table

You will implement a hash table that resolves collisions by chaining.

A table is represented by the following struct.

struct table {
    int size; /* the number of buckets in this table */
    int length; /* the number of key-value pairs */
    int (*cmp)(void *, void *); /* comparison between two keys */
    uint64_t (*hash)(void *);   /* hash a key */
    struct bucket *buckets[];   /* a variable array member for buckets */
};

A table contains two function pointers, cmp and hash, and two attributes size, which is the number of buckets, and length, which is the number of key-value pairs in the table. The table struct contains an array of buckets in the same memory block as the other fields as a variable array member.

A bucket is a linked list of key-value pairs.

struct bucket {
    void *key;
    void *value;
    struct bucket *next;
};

Given a key void *key; and table struct table *t;, the index of a bucket can be obtained by the following:

int idx = t->hash(key) % t->size;

To look up the value associated to a key, you should search through the list of buckets at index idx.

To insert a key-value, you should first check if the key is already in the list of buckets. If so, the old value is replaced with the new value, and the old value is returned. Otherwise, you should prepend the new key-value pair to the list of buckets. Note that NULL signifies an empty linked-list. Don't forget to update the length when inserting.

The functions table_free, table_get, and table_insert implement the same interface as map_free, map_get, and map_insert, respectively.

Variable Array Member

Since the buckets are not allocated separately from the table struct itself, you do not need to free t->buckets array separately. (However, you do need to free each linked-list in the buckets.) To access a bucket at index idx, you can just use the conventional array subscripts, i.e. t->buckets[idx].

Running unit tests

See homework 4.

Part II. Tagged Union

In this part, you will implement a generic dictionary interface using tagged unions, which handles multiple underlying data structures. This interface creates a unified structure, struct dict, which can either be a hash table or a map, based on which "constructor" creates it.

enum dict_type {
    TABLE,
    MAP,
};

union table_or_map {
    struct table *tbl;
    struct map *map;
};

struct dict {
    enum dict_type type;
    union table_or_map data;
};

To review, union allows a programmer to choose how to interpret the bits at run time, and enum informs the programmer which one of the variants in the union is the "correct" interpretation.

The dict_walk function is provided as an example of using a tagged union.

void dict_walk(struct dict *dict,
        void (*visit)(void *key, void *value, void *data),
        void *data)
{
    if (dict->type == TABLE) {
        table_walk(dict->data.tbl, visit, data);
    } else {
        map_walk(dict->data.map, visit, data);
    }
}
On entry, the function checks which tag dict has, and dispatches the correct, concrete function that implements walk with the correct interpretation in the union.

To construct a tagged union, you should set the appropriate tag and call the constructor of the requested implementation (map_create or table_create).

You should implement all functions in dict.h by invoking the correct concrete implementations of a dictionary. After finishing dict.c, you should read through groups.c and check if groups, which uses the abstract dictionary, compiles and works as expected.

All variants in a union occupy the same memory space. dict->data.map and dict->data.tbl is the same piece of memory, but the . selector tells the compiler how that piece of memory should be interpreted.

Written

You need to answer some questions in hw6/WRITTEN.txt.

Submission checklist

In hw6/:

  • make produce no errors and produce executables, groups and table-test.

All changes are committed and pushed to your github repository. Submit your program to Gradescope by selecting your coursework directory and the correct branch.

Grading

Percentage
Correctness 70%
Style 20%
Written 10%

Warning: If your program cannot be compiled using the commands above without error, you will receive 0 points in correctness since there is no executables for us to run.