Lecture 4 - Arrays and Strings
Target budget for a 150-minute session. HW1 (Build Your Own String Library) is
released today - this lecture is its preparation, so strings are first contact,
not review. Segment 5 (the <string.h> motivation) onward is lighter; exercise
B2 and the stretch problems are the designated overflow. We end on a deliberate
teaser - "why does scanf("%s", word) take no &?" - that opens Lecture 5 on
pointers.
0. Recap and framing
- So far every variable has held one value: an int, a char, a double.
- Two new ideas today, and they are the same idea twice:
  - an array groups many values of the same type under one name;
  - a string is the special case of an array of char used to hold text.
- This is exactly what HW1, released today, is built on: you will write your own small string library from scratch. Everything you need is in this lecture.
1. Arrays in C
An array is a fixed-length, contiguous run of elements that all have the same type. You get at the elements by index.
Declaring and indexing
int scores[5]; /* five ints, indices 0..4, values are garbage for now */
scores[0] = 90;
scores[1] = 75;
int first = scores[0]; /* read element 0 */
- Indices start at 0. An array of size n has valid indices 0 through n - 1. The last element of scores is scores[4], not scores[5].
- The elements sit next to each other in memory, in order. That is what makes indexing fast: the machine computes the address as base + index * (bytes per element).
Initializing
int a[5] = {10, 20, 30, 40, 50}; /* all five spelled out */
int b[5] = {1, 2}; /* rest are zero-filled: 1 2 0 0 0 */
int c[] = {1, 2, 3}; /* size inferred as 3 */
int z[100] = {0}; /* a quick way to zero everything */
Two rules that bite
- The size is fixed at compile time. int a[5]; is always five ints. You cannot grow or shrink it at run time, and (for now) the size must be a constant the compiler knows. Resizable storage waits for dynamic allocation later in the course.
- There is no bounds checking. Writing scores[5] (or scores[-1]) compiles and runs, scribbling on memory that is not yours. This is undefined behavior: maybe a crash, maybe silent corruption. Staying in 0 .. n-1 is your job.
Worked example - how indexing finds an element
The elements sit in consecutive memory, so the machine reaches any one of them by
arithmetic, never by searching. Suppose int a[4] = {10, 20, 30, 40}; lands
at address 1000, and each int takes 4 bytes:
| index | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| address | 1000 | 1004 | 1008 | 1012 |
| value | 10 | 20 | 30 | 40 |
- To read a[2], the machine computes 1000 + 2 * 4 = 1008 and goes straight there. In general the address is base + index * (bytes per element).
- That formula explains two things at once: indexing is fast (one multiply and add, no matter how big the array), and a[0] is the first element because its offset is 0 * 4 = 0.
- It also shows why a[4] here is dangerous. The formula cheerfully computes 1000 + 4 * 4 = 1016, an address just past the array, and reads or writes whatever lives there. The hardware does no checking - that is segment 1's "no bounds checking" seen from the memory side.
- Hold onto the word base address: an array's name turns out to be that address, which is the thread we pick up at the end of class.
2. Iterating over an array, and the length problem
The standard pattern is a counting for loop from 0 up to n - 1:
int a[5] = {10, 20, 30, 40, 50};
int n = 5; /* we declared 5 elements, so we know n is 5 */
for (int i = 0; i < n; i++) {
    printf("%d\n", a[i]);
}
- The array does not know its own length. Nothing is stored that says "I have 5 elements." You have to carry n yourself and never let i reach it.
- Foreshadowing: strings solve this differently. A text string puts a special marker, the null terminator '\0', at the end, so a loop can stop when it sees the marker instead of counting. We get there in segment 4.
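What happens when we go out of bounds? The program below is wrong on purpose: it keeps reading far past the end of scores. That is undefined behavior, so expect garbage values and, on most systems, an eventual crash:
#include <stdio.h>

int main(void) {
    int scores[5];   /* never initialized, so even scores[0..4] hold garbage */
    /* BUG ON PURPOSE: valid indices are 0..4, but i runs up to 999999 */
    for (int i = 0; i < 1000000; i++) {
        printf("%d: %d\n", i, scores[i]);
    }
    return 0;
}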
In-class exercise break - largest and smallest
Handout: Part A, Exercise A1 - on the computer.
- Given an array of integers, find the largest and smallest value in one pass. Seed min and max from element 0, then loop from index 1.
3. Passing arrays to functions
To work on an array in a function, you pass it as a parameter:
int sum(int a[], int n) { /* "int a[]" - an array of int */
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

int main(void) {
    int nums[4] = {3, 1, 4, 1};
    printf("%d\n", sum(nums, 4)); /* 9 */
    return 0;
}
- You must pass the length too. Notice sum takes both a and n. The function receives only a reference to where the array starts - the base address from segment 1 - not how many elements follow it. There is no way to recover the count from a alone, so the caller has to hand it over separately.
- The function sees the original array, not a copy. If sum wrote a[0] = 0, the caller's array would change. This is unlike a plain int parameter, which is copied. (Arrays are special here - hold that thought for the end of class.)
- int a[] and int a[100] as a parameter mean the same thing; the number in the brackets is ignored for parameters. Many people write int a[].
In-class exercise break - a sum function
Handout: Part A, Exercise A2 - on the computer. Use the Lecture 2 multi-file and Makefile workflow if you like.
- Write int sum(int a[], int n) and call it from main. Then add double average(int a[], int n) that reuses sum. Watch the integer-division trap from Lecture 3: cast to get a real average.
4. Strings: arrays of characters
Now the special case that HW1 is about. A string in C is just an array of
char that ends with the null terminator '\0' - a single byte whose value is
0.
- "cat" is stored as four bytes, not three:
| index | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| char | c | a | t | \0 |
- The '\0' is the end marker. Every string operation relies on it to know where the text stops - there is no separate length stored anywhere. This is the answer to the "length problem" from segment 2.
Declaring space for a string
char word[256]; /* room for up to 255 chars, plus the '\0' */
char greeting[] = "hi"; /* size inferred as 3: 'h' 'i' '\0' */
- A string literal in your source, like "hi", already includes the '\0' for you - that is why greeting needs three slots, not two.
Walking a string
The fundamental loop starts at index 0 and goes until the terminator: for (int i = 0; s[i] != '\0'; i++) { ... }, where s is the array holding the string.
This single pattern is how you implement essentially everything: length, search,
transform. No length variable needed - the '\0' tells you when to stop.
Reading and printing
- Print a whole string with %s: printf("%s\n", word);
- Read a string one whitespace-delimited word at a time: scanf("%s", word);
- scanf("%s", ...) stops at the first space, tab, or newline, so it reads a single word, not a whole line. That is all we need today.
- Note: no & here, unlike scanf("%d", &n). Tuck that oddity away - we explain it at the very end of class.
char is a small integer (ASCII)
The one idea carried straight over from Lecture 3's "everything is numbers": characters are just small integers under the ASCII encoding.
- 'A' is 65, 'a' is 97 - the two cases differ by exactly 32.
- Idioms worth putting on the board:
  - Is it a lowercase letter? c >= 'a' && c <= 'z'
  - Lower to upper: c - 'a' + 'A' (or just c - 32)
  - Alphabet position, 0 to 25: c - 'a'
- So you transform text with plain arithmetic and comparison - no special library required. That is the whole premise of HW1.
Live-coding demo - "measure a word"
Read a word, find its length by hand, then exercise the ASCII arithmetic on a single character.
#include <stdio.h>

int main(void) {
    char word[256];
    printf("Enter a word: ");
    scanf("%s", word);

    /* walk to the '\0' to find the length - this is your own strlen */
    int len = 0;
    while (word[len] != '\0') {
        len++;
    }
    printf("\"%s\" has %d characters\n", word, len);

    /* char arithmetic on the first letter - the ASCII idioms in action */
    char first = word[0];
    if (first >= 'a' && first <= 'z') {
        printf("first letter '%c' -> upper '%c', alphabet position %d\n",
               first, first - 'a' + 'A', first - 'a');
    }
    return 0;
}
- Point out: with nothing but a loop and arithmetic we found a string's length by hand (your own strlen) and converted between a letter, its uppercase form, and its 0..25 position. HW1 is these same two tools - the walk and the arithmetic - pushed across whole strings. We stop short of doing that here so there is something left to build.
In-class exercise break - count uppercase and lowercase
Handout: Part B, Exercise B1 - on the computer.
- Read a word and count how many characters are uppercase letters and how many are lowercase. One pass, the case tests above.
5. The string library, and why HW1 hides it from you
C's standard library ships these in <string.h>:
| Function | Does |
|---|---|
| strlen(s) | number of characters before the '\0' |
| strcpy(dst, src) | copy src (including '\0') into dst |
| strcat(dst, src) | append src onto the end of dst |
| strcmp(a, b) | 0 if equal; negative/positive by first differing char |
- Each is just a loop over chars of the kind we wrote in the demo. There is no magic inside.
- HW1 deliberately forbids <string.h> (and <ctype.h>). The point of the assignment is to build your own str_length, to_upper, reverse, and friends, so you understand what the library is actually doing. Knowing these names tells you what to build - now go build them.
- One caution to flag for later: strcpy and strcat do no bounds checking - they happily write past the end of a too-small destination array. That class of bug (the buffer overflow) is a recurring theme later in the course.
In-class exercise break - redact the letters (overflow)
Handout: Part B, Exercise B2 - on the computer.
- Walk a word and overwrite every letter with # in place, leaving digits and punctuation alone. The new move is assigning back into the array (word[i] = '#';) as you go - the same in-place shape HW1's transforms use.
6. The teaser - why no & on a string?
Back to the oddity from segment 4: scanf("%d", &n) needs the &, but scanf("%s", word) does not.
- The one-line answer: an array's name, used in an expression, evaluates to the address of its first element. So word already is an address - scanf gets what it needs without an &. That is also why a function receiving int a[] can modify the caller's array (segment 3), and why it never learns the size.
- This thing - "a value that is an address" - is called a pointer, and it is the entire subject of Lecture 5. Today you have already been using pointers without the name. Next time we give them the name, the * and & operators, and the memory model (scope, lifetime, and the call stack) that goes with them.
7. Wrap-up
- Array: fixed-size, same-type, 0-indexed, contiguous; no bounds checking and it does not know its own length, so you carry n yourself.
- Passing an array to a function passes a reference to the original (not a copy), so you also pass the length.
- String: an array of char ending in '\0'; loop until the terminator; no stored length. chars are small integers, so text is transformed with plain arithmetic - the foundation for HW1.
- The standard <string.h> functions are just such loops; HW1 has you write your own.
- Cliffhanger: an array name is an address - a pointer - which is where Lecture 5 begins.