Lecture 4 - Arrays and Strings

Target budget for a 150-minute session. HW1 (Build Your Own String Library) is released today - this lecture is its preparation, so strings are first contact, not review. Segment 5 (the <string.h> motivation) onward is lighter; exercise B2 and the stretch problems are the designated overflow. We end on a deliberate teaser - "why does scanf("%s", word) take no &?" - that opens Lecture 5 on pointers.

0. Recap and framing

  • So far every variable has held one value: an int, a char, a double.
  • Two new ideas today, and they are the same idea twice:
      • an array groups many values of the same type under one name;
      • a string is the special case of an array of char used to hold text.
  • This is exactly what HW1, released today, is built on: you will write your own small string library from scratch. Everything you need is in this lecture.

1. Arrays in C

An array is a fixed-length, contiguous run of elements that all have the same type. You get at the elements by index.

Declaring and indexing

int scores[5];          /* five ints, indices 0..4, values are garbage for now */
scores[0] = 90;
scores[1] = 75;
int first = scores[0];  /* read element 0 */
  • Indices start at 0. An array of size n has valid indices 0 through n - 1. The last element of scores is scores[4], not scores[5].
  • The elements sit next to each other in memory, in order. That is what makes indexing fast: the machine computes the address as base + index * (bytes per element).

Initializing

int a[5] = {10, 20, 30, 40, 50};   /* all five spelled out          */
int b[5] = {1, 2};                 /* rest are zero-filled: 1 2 0 0 0 */
int c[]  = {1, 2, 3};              /* size inferred as 3             */
int z[100] = {0};                  /* a quick way to zero everything */

Two rules that bite

  • The size is fixed at compile time. int a[5]; is always five ints. You cannot grow or shrink it at run time, and (for now) the size must be a constant the compiler knows. Resizable storage waits for dynamic allocation later in the course.
  • There is no bounds checking. Writing scores[5] (or scores[-1]) compiles and runs, scribbling on memory that is not yours. This is undefined behavior: maybe a crash, maybe silent corruption. Staying in 0 .. n-1 is your job.

Worked example - how indexing finds an element

The elements sit in consecutive memory, so the machine reaches any one of them by arithmetic, never by searching. Suppose int a[4] = {10, 20, 30, 40}; lands at address 1000, and each int takes 4 bytes:

index       0     1     2     3
address  1000  1004  1008  1012
value      10    20    30    40
  • To read a[2], the machine computes 1000 + 2 * 4 = 1008 and goes straight there. In general the address is base + index * (bytes per element).
  • That formula explains two things at once: indexing is fast (one multiply and add, no matter how big the array), and a[0] is the first element because its offset is 0 * 4 = 0.
  • It also shows why a[4] here is dangerous. The formula cheerfully computes 1000 + 4 * 4 = 1016, an address just past the array, and reads or writes whatever lives there. The hardware does no checking - that is the "no bounds checking" rule above, seen from the memory side.
  • Hold onto the word base address: an array's name turns out to be that address, which is the thread we pick up at the end of class.

2. Iterating over an array, and the length problem

The standard pattern is a counting for loop from 0 up to n - 1:

int a[5] = {10, 20, 30, 40, 50};
int n = 5;                 /* we declared 5 elements, so we know n is 5 */

for (int i = 0; i < n; i++) {
    printf("%d\n", a[i]);
}
  • The array does not know its own length. Nothing is stored that says "I have 5 elements." You have to carry n yourself and never let i reach it.
  • Foreshadowing: strings solve this differently. A text string puts a special marker, the null terminator '\0', at the end, so a loop can stop when it sees the marker instead of counting. We get there in segment 4.

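What happens when we go out of bounds?

#include <stdio.h>

int main(void) {
    int scores[5];   /* only indices 0..4 are ours, and they are uninitialized */

    /* Deliberately wrong: the loop runs far past the end of the array.
       This is undefined behavior - expect garbage values, then probably a crash. */
    for (int i = 0; i < 1000000; i++) {
        printf("%d: %d\n", i, scores[i]);
    }

    return 0;
}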

In-class exercise break - largest and smallest

Handout: Part A, Exercise A1 - on the computer.

  • Given an array of integers, find the largest and smallest value in one pass. Seed min and max from element 0, then loop from index 1.

3. Passing arrays to functions

To work on an array in a function, you pass it as a parameter:

int sum(int a[], int n) {       /* "int a[]" - an array of int */
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

int main(void) {
    int nums[4] = {3, 1, 4, 1};
    printf("%d\n", sum(nums, 4));   /* 9 */
    return 0;
}
  • You must pass the length too. Notice sum takes both a and n. The function receives only a reference to where the array starts - the base address from segment 1 - not how many elements follow it. There is no way to recover the count from a alone, so the caller has to hand it over separately.
  • The function sees the original array, not a copy. If sum wrote a[0] = 0, the caller's array would change. This is unlike a plain int parameter, which is copied. (Arrays are special here - hold that thought for the end of class.)
  • int a[] and int a[100] as a parameter mean the same thing; the number in the brackets is ignored for parameters. Many people write int a[].

In-class exercise break - a sum function

Handout: Part A, Exercise A2 - on the computer. Use the Lecture 2 multi-file and Makefile workflow if you like.

  • Write int sum(int a[], int n) and call it from main. Then add double average(int a[], int n) that reuses sum. Watch the integer-division trap from Lecture 3: cast to get a real average.

4. Strings: arrays of characters

Now the special case that HW1 is about. A string in C is just an array of char that ends with the null terminator '\0' - a single byte whose value is 0.

  • "cat" is stored as four bytes, not three:
index   0   1   2   3
char    c   a   t   \0
  • The '\0' is the end marker. Every string operation relies on it to know where the text stops - there is no separate length stored anywhere. This is the answer to the "length problem" from segment 2.

Declaring space for a string

char word[256];                 /* room for up to 255 chars, plus the '\0' */
char greeting[] = "hi";         /* size inferred as 3: 'h' 'i' '\0'        */
  • A string literal in your source, like "hi", already includes the '\0' for you - that is why greeting needs three slots, not two.

Walking a string

The fundamental loop - go until the terminator:

for (int i = 0; word[i] != '\0'; i++) {
    printf("%c\n", word[i]);
}

This single pattern is how you implement essentially everything: length, search, transform. No length variable needed - the '\0' tells you when to stop.

Reading and printing

  • Print a whole string with %s: printf("%s\n", word);
  • Read a string one whitespace-delimited word at a time: scanf("%s", word);
  • scanf("%s", ...) stops at the first space, tab, or newline, so it reads a single word, not a whole line. That is all we need today.
  • Note: no & here, unlike scanf("%d", &n). Tuck that oddity away - we explain it at the very end of class.

char is a small integer (ASCII)

The one idea carried straight over from Lecture 3's "everything is numbers": characters are just small integers under the ASCII encoding.

  • 'A' is 65, 'a' is 97 - the two cases differ by exactly 32.
  • Idioms worth putting on the board:
      • Is it a lowercase letter? c >= 'a' && c <= 'z'
      • Lower to upper: c - 'a' + 'A' (or just c - 32)
      • Alphabet position, 0 to 25: c - 'a'
  • So you transform text with plain arithmetic and comparison - no special library required. That is the whole premise of HW1.

Live-coding demo - "measure a word"

Read a word, find its length by hand, then exercise the ASCII arithmetic on a single character.

#include <stdio.h>

int main(void) {
    char word[256];

    printf("Enter a word: ");
    scanf("%s", word);

    /* walk to the '\0' to find the length - this is your own strlen */
    int len = 0;
    while (word[len] != '\0') {
        len++;
    }
    printf("\"%s\" has %d characters\n", word, len);

    /* char arithmetic on the first letter - the ASCII idioms in action */
    char first = word[0];
    if (first >= 'a' && first <= 'z') {
        printf("first letter '%c' -> upper '%c', alphabet position %d\n",
               first, first - 'a' + 'A', first - 'a');
    }
    return 0;
}

Sample run:

Enter a word: hello
"hello" has 5 characters
first letter 'h' -> upper 'H', alphabet position 7
  • Point out: with nothing but a loop and arithmetic we found a string's length by hand (your own strlen) and converted between a letter, its uppercase form, and its 0..25 position. HW1 is these same two tools - the walk and the arithmetic - pushed across whole strings. We stop short of doing that here so there is something left to build.

In-class exercise break - count uppercase and lowercase

Handout: Part B, Exercise B1 - on the computer.

  • Read a word and count how many characters are uppercase letters and how many are lowercase. One pass, the case tests above.

5. The string library, and why HW1 hides it from you

C's standard library ships these in <string.h>:

Function            Does
strlen(s)           number of characters before the '\0'
strcpy(dst, src)    copy src (including '\0') into dst
strcat(dst, src)    append src onto the end of dst
strcmp(a, b)        0 if equal; otherwise negative or positive, decided by the first differing char
  • Each is just a loop over chars of the kind we wrote in the demo. There is no magic inside.
  • HW1 deliberately forbids <string.h> (and <ctype.h>). The point of the assignment is to build your own str_length, to_upper, reverse, and friends, so you understand what the library is actually doing. Knowing these names tells you what to build - now go build them.
  • One caution to flag for later: strcpy and strcat do no bounds checking - they happily write past the end of a too-small destination array. That class of bug (the buffer overflow) is a recurring theme later in the course.

In-class exercise break - redact the letters (overflow)

Handout: Part B, Exercise B2 - on the computer.

  • Walk a word and overwrite every letter with # in place, leaving digits and punctuation alone. The new move is assigning back into the array (word[i] = '#';) as you go - the same in-place shape HW1's transforms use.

6. The teaser - why no & on a string?

Back to the oddity from segment 4:

scanf("%d", &n);        /* an int needs the & */
scanf("%s", word);      /* a string does not  */
  • The one-line answer: in most expressions, an array's name evaluates to the address of its first element (the operands of sizeof and & are the exceptions). So word already is the address scanf wants, and no & is needed. That is also why a function receiving int a[] can modify the caller's array (segment 3), and why it never learns the size.
  • This thing - "a value that is an address" - is called a pointer, and it is the entire subject of Lecture 5. Today you have already been using pointers without the name. Next time we give them the name, the * and & operators, and the memory model (scope, lifetime, and the call stack) that goes with them.

7. Wrap-up

  • Array: fixed-size, same-type, 0-indexed, contiguous; no bounds checking and it does not know its own length, so you carry n yourself.
  • Passing an array to a function passes a reference to the original (not a copy), and that reference carries no length, so you pass the length too.
  • String: an array of char ending in '\0'; loop until the terminator; no stored length. chars are small integers, so text is transformed with plain arithmetic - the foundation for HW1.
  • The standard <string.h> functions are just such loops; HW1 has you write your own.
  • Cliffhanger: an array name is an address - a pointer - which is where Lecture 5 begins.