Eroxl's Notes
CPSC 213

Static Variable

Static variables are variables for which the compiler allocates memory for the entire run of the program. Static variables are placed in the data segment of a program's memory and are usually allocated and initialized before the program starts executing.
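
A minimal C sketch (the function and variable names are made up for illustration) of a static variable that lives in the data segment and keeps its value across calls:

#include <stdio.h>

// `calls` is allocated in the data segment for the entire run of the program
static int calls = 0;

void count() {
    calls++;                       // retains its value between calls
    printf("count() has been called %d times\n", calls);
}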

Endianness

Endianness refers to the order in which the bytes of a multi-byte value are stored in memory or transmitted over a data communication medium.

Big-Endian

Big-endian works by transmitting / storing the most significant byte first.

Little-Endian

Little-endian works by transmitting / storing the least significant byte first.

Example

Consider the number 0x12345678. If we were to write this to memory starting at address i, we would get the following memory values:

Big-endian:

Value Address
0x12  i
0x34  i+1
0x56  i+2
0x78  i+3

Little-endian:

Value Address
0x78  i
0x56  i+1
0x34  i+2
0x12  i+3

Notice how in both cases the order of bits within each individual byte is unchanged.
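
A small C sketch (not from the notes; just an illustration) that checks the machine's endianness at runtime by looking at the first byte of a multi-byte value:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t value = 0x12345678;
    uint8_t *bytes = (uint8_t *) &value;   // view the same 4 bytes individually

    if (bytes[0] == 0x12)
        printf("big-endian (first byte is 0x%02x)\n", bytes[0]);
    else
        printf("little-endian (first byte is 0x%02x)\n", bytes[0]);

    return 0;
}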

Byte Extension

Byte extension describes the process by which bytes are added when a stored value's memory size needs to be increased (ie. from a byte to an int). There are two types of byte extension: Sign Extension and Zero Extension. Both methods of byte extension pad the start of the value instead of the end.

Sign Extension

Sign extension extends the bytes with copies of the sign bit (for example, -11 would be padded with 1's and 11 would be padded with 0's). In C, sign extension is used for all signed data types.

Example
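
A minimal C sketch (values chosen for illustration) showing -11 being sign-extended from a signed char to an int:

#include <stdio.h>

int main() {
    signed char small = -11;              // stored as 0xF5 in one byte
    int wide = small;                     // sign extension pads with 1's: 0xFFFFFFF5
    printf("%d -> 0x%08X\n", wide, (unsigned int) wide);   // prints: -11 -> 0xFFFFFFF5
    return 0;
}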

Zero Extension

Zero extension extends the bytes with zeroes, no matter what the sign of the data is. In C, zero extension is used for all unsigned data types.

Example
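
A minimal C sketch (values chosen for illustration) showing an unsigned char being zero-extended to an unsigned int:

#include <stdio.h>

int main() {
    unsigned char small = 0xF5;           // 245 stored in one byte
    unsigned int wide = small;            // zero extension pads with 0's: 0x000000F5
    printf("%u -> 0x%08X\n", wide, wide); // prints: 245 -> 0x000000F5
    return 0;
}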

Byte Truncation

Byte truncation describes the process by which bytes are removed when a stored value's memory size needs to be decreased (ie. from an int to a byte). Bytes are always removed from the start (most significant end) of the value; this removal is called truncation.

Example
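
A minimal C sketch (values chosen for illustration) showing an int being truncated to a single byte; only the least significant byte survives:

#include <stdio.h>

int main() {
    int wide = 0x12345678;
    unsigned char small = (unsigned char) wide;   // high-order bytes are dropped
    printf("0x%08X -> 0x%02X\n", wide, small);    // prints: 0x12345678 -> 0x78
    return 0;
}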

Memory

Accessing Memory

Memory can be accessed by the CPU, and must be accessed in contiguous, power-of-two-sized chunks of bytes.

Areas of Memory

The memory of a computer is usually split up into 4 main sections.

Code Segment

The code segment contains the compiled machine instructions and is read only to ensure you can't accidentally overwrite your code at runtime.

Data Segment

The data segment contains any global and static variables.

Stack

The stack stores local variables, function parameters, and return addresses. The stack typically grows downward towards lower addresses in memory and is automatically managed, (ie. the addresses are freed when functions return).

Heap

Heap Memory

The heap is used for dynamically allocated memory (ie. malloc in C and new in C++). The heap typically grows upwards towards higher addresses in memory and is manually managed, (ie. you must free the memory when you're done with it).

Pitfalls

Dangling Pointers

When a program accesses memory that has already been freed, the contents at that location are undefined because the system may have reallocated that memory for other purposes. Pointers pointing to this freed memory are called dangling pointers.

Example

int *i = new int;
*i = 5;
delete i;

cout << *i << endl;

Here, i points to memory that has been deallocated. Accessing or writing to it can produce unexpected results.

Avoiding Dangling Pointers

To prevent using freed memory, you can set the pointer to NULL after deletion. This way, dereferencing it will trigger a crash (e.g., segmentation fault), making the bug easier to detect:

int *i = new int;
*i = 5;
delete i;
i = NULL;

cout << *i << endl; // Seg fault

Memory Leak

When a program loses access to allocated space before it has been freed (ie. by reassigning a pointer), the space can no longer be referenced or freed. This memory therefore remains marked as allocated for the lifetime of the program. Typically memory leaks occur with memory on the heap.

int *i = new int;
int b = 2;

*i = 5;
i = &b;

The original address stored in i was lost, so the memory it pointed to can no longer be freed, causing a memory leak.

Memory Address

A memory address describes the location in memory where a given variable or identifier stores data.

Special Addresses

NULL Address

The null address is 0x0000 and represents an empty value. Trying to dereference this address will throw an error as it doesn't refer to accessible memory.

Pointer

A pointer is a variable which stores a memory address.

Declaring Pointer Variables

C and C++

Pointer variables are defined as follows:

int *ptr;

where int can be any data type and defines the type of the value at that pointer's address. Additionally, ptr is just the name of the variable and can be any valid name.

Pointer Operations

C and C++

Address Operator

The address operator gets the address of an existing variable and is expressed by placing a & before the variable.

Example

int x = 23;
int *p = &x;

The variable p now stores the address of x.

Dereferencing Operator

The dereferencing operator gets the value at an address and is expressed by placing a * before the pointer.

int x = 23;
int *p = &x;

*p = 50;
cout << x << endl; // Outputs 50

C++

new Keyword

The new keyword allocates new space on the heap and returns the first address of the allocated space.

int *b = new int;

delete Keyword

The delete keyword releases the memory at the address referenced by the pointer variable passed to it.

int *b = new int;
*b = 2;

delete b;

For arrays allocated on the heap the delete[] keyword can be used:

int *b = new int[3]{1, 2, 3};

delete[] b;

Stacking Pointers

Because pointers can "point" to any address in memory, we can create pointers to pointers:

int x = 5;
int* p = &x;          // Memory address of `x`
int** q = &p;         // Memory address of `p`
int*** r = &q;        // Memory address of `q`

cout << **r << endl;  // Memory address of `x` (value of `p`)
cout << **q << endl;  // Value of `x`
cout << ***r << endl; // Value of `x`

Address Alignment

Address alignment describes the process by which addresses are allocated such that they are multiples of the object's size. Aligned addresses are faster to access on most CPUs, and on some CPUs unaligned accesses are not supported at all.

Structure Alignment

Structures are aligned differently from basic data types because they are composites; their alignment follows the following 3 rules:

  1. Fields of primitive types are aligned according to their size.
  2. Fields whose types are structures or arrays are aligned to the largest of their fields’ alignments
  3. The size of a structure must be a multiple of its alignment

When padding is introduced it must always be added between or after fields, ensuring that each field's alignment requirement is still satisfied.

Examples

Example 1

Given the following struct, determine the offset of b as well as the total size of the struct and its alignment.

struct ExampleOne {
    char  a;
    int   b;
    short c;
};

The individual fields have sizes of 1, 4, and 2 bytes respectively, so the overall alignment of the struct must be 4 bytes. The fields themselves total 7 bytes, but b must sit at a 4-byte-aligned offset and the total size must be a multiple of the alignment, so the struct is padded up to 12 bytes: 3 bytes of padding after a and 2 bytes of padding after c.

Field    Size (Bytes) Offset
char a;  1            0
padding  3
int b;   4            4
short c; 2            8
padding  2
Total:   12
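
A quick C check (a sketch; it assumes a typical compiler that follows the alignment rules above) confirming the offsets and total size:

#include <stdio.h>
#include <stddef.h>

struct ExampleOne {
    char  a;
    int   b;
    short c;
};

int main() {
    printf("offset of b: %zu\n", offsetof(struct ExampleOne, b));  // 4
    printf("offset of c: %zu\n", offsetof(struct ExampleOne, c));  // 8
    printf("total size:  %zu\n", sizeof(struct ExampleOne));       // 12
    return 0;
}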

Reference Counting

Reference counting is a method of managing memory that decreases the risk of common memory pitfalls like memory leaks and dangling pointers. Reference counting follows 4 basic rules:

  1. Initialize the reference count to one (as the caller has a reference)
  2. Any procedure that stores a reference increments the count
  3. Any procedure that discards a reference decrements the count
  4. Never free an object directly; instead, the object is freed when the count goes to zero

General Examples

When a Reference is a Parameter

When a reference is passed to a procedure as a parameter, the caller necessarily must maintain it for the duration of the call, so the callee doesn't need to call keep_ref unless it saves the pointer somewhere that will outlast the call (ie. a global variable) or returns the pointer to the caller.

When a Reference is a Return Value

When a reference is returned from a procedure, the callee must have a reference to the value it returns to the caller. When it returns, it transfers that reference to the caller; it's then the caller's responsibility to call free_ref when it no longer needs the reference.

When a Reference is Stored in a Local Variable

References stored in local variables necessarily go away when the procedure returns, so the procedure must call free_ref before it returns.

Implementations

Concrete Type

If the struct you're reference counting has a concrete type, reference counting can be implemented as follows:

struct buffer *rc_malloc() {
	struct buffer *buf = malloc(sizeof(struct buffer));
	buf->ref_count = 1;

	return buf;
}

void rc_keep_ref(struct buffer *buf) {
	buf->ref_count++;
}

void rc_free_ref(struct buffer *buf) {
	buf->ref_count--;
	
	if (buf->ref_count == 0) free(buf);
}

General Type

A more general method can be used which works for any input type. This is done by reserving 8 bytes before the actual allocated memory for the ref count.

void* rc_malloc(int num_bytes) {
	// Add 8 bytes to store the ref_count
	int *ref_count = malloc(num_bytes + 8);
	*ref_count = 1;
	
	return ((void*) ref_count) + 8;
}

void rc_keep_ref(void* ptr) {
	int *ref_count = ptr - 8;
	(*ref_count)++;
}

void rc_free_ref(void* ptr) {
	int *ref_count = ptr - 8;
	(*ref_count)--;
	
	if (*ref_count == 0) free(ref_count);
}
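
A hedged usage sketch of the general-purpose functions above, walking through the four rules; the buffer, the save_buffer helper and the saved global are made up for illustration:

#include <stddef.h>

void *rc_malloc(int num_bytes);
void rc_keep_ref(void *ptr);
void rc_free_ref(void *ptr);

static char *saved;                 // hypothetical long-lived reference

void save_buffer(char *b) {
    rc_keep_ref(b);                 // rule 2: storing a reference increments the count
    saved = b;
}

void example() {
    char *buf = rc_malloc(64);      // rule 1: count starts at one (the caller's reference)
    save_buffer(buf);               // count is now two
    rc_free_ref(buf);               // rule 3: the caller discards its reference; count is one
    rc_free_ref(saved);             // count reaches zero, so the buffer is freed (rule 4)
    saved = NULL;
}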

Conditional (Computer Science)

A conditional is a type of control flow that branches based on one or more expressions.

int a;
scanf("%d", &a);

if (a % 2 == 0) {
	printf("%d is even", a);
} else {
	printf("%d is odd", a);
}

Example of a conditional statement that checks whether a number is even or odd

Low Level Implementation

At the assembly level, conditionals are implemented using jumps along with logic for the condition. The compiler translates the boolean expression of the condition into arithmetic and branch instructions, depending on the architecture.

<init>
'conditional setup
goto then if <condition>
else:
	'alternative body
    goto end_if
then:
	'consequent body
end_if:

Example

Conditional (SM213)

Given the C code:

int a = 5;
int b = 2;

if (a > b) {
	// consequent
} else {
	// alternative
}

write an equivalent conditional in assembly

.pos 0x1000
	# <init>
	ld $a, r0         # r0 = &a
	ld (r0), r0       # r0 = a
	ld $b, r1         # r1 = &b
	ld (r1), r1       # r1 = b
	
	mov r1, r2        # r2 = b
	not r2            # r2 = ~b
	inc r2            # r2 = -b
	
	add r0, r2        # r2 = a-b
	
	bgt r2, then      # a-b > 0 => goto then
	
	else: 
		# alternative
		br end_if
	
	then:
		# consequent

	end_if: 

.pos 0x2000
	a:      .long 5
	b:      .long 2

For Loop

A for loop is a type of control flow statement that repeatedly runs a section of code until a condition is satisfied.

for (int i = 0; i < 3; ++i) {
    printf("%d", i);
}

Example of a for loop that prints "012" in C

Low Level Implementation

At the assembly level, for loops are implemented using conditional and unconditional jumps. The compiler translates the initialization, condition check, increment, and loop body into equivalent jump-based control flow.

<init>
loop_start:
    'evaluate condition
    goto end_loop if not <continue_condition>

    'loop body
    ...

    'increment and repeat
    <step>
    goto loop_start
end_loop:

Example

For Loop (SM213)

Given the C code:

for (int i = 0; i < 3; ++i) {
	// ...
}

write an equivalent for loop in assembly

.pos 0x1000
	# <init>
	ld $i, r0         # r0 = &i
	ld (r0), r0       # r0 = i
	ld $-3, r1        # r1 = -3
	
	loop_start: 
		mov r0, r2    # r2 = i
		add r1, r2    # r2 = i - 3 (allows us to use beq)
		
		# evaluate condition
		beq r2, end_loop # exit loop when i - 3 == 0
		
		# loop body
		
		# increment and repeat
		inc r0
		br loop_start # go to start of loop

	end_loop: 

.pos 0x2000
	i:      .long 0

Program Counter

The program counter is a special register that stores the memory address of the next instruction to be executed; it is continually incremented as instructions are fetched. Control transfers, like jumping or branching, can change the program counter in ways other than the standard incrementation.

When a subroutine is called, the branch also stores the current contents of the program counter somewhere, so that when the subroutine returns the saved value can be placed back into the program counter and sequential execution can resume after the call.

Procedure

A procedure is the higher-level generalization of a subroutine, with its own name, arguments and local scope. Procedures can be called, which causes their enclosed code to run and possibly a result to be bound when the procedure returns.

int fib(int n) {
	if (n == 0 || n == 1) return n;

	return fib(n-1) + fib(n-2);
}

Example of a procedure that returns the nth value of the Fibonacci sequence in C.

Low Level Implementation

Call Stack

Procedures can be implemented using a call stack to store and retrieve required information about the procedure such as the return address and any arguments.

Call Stack

The call stack is a specific implementation of a stack which stores relevant information for a procedure to execute.

The call stack is typically at the highest memory address possible for a given program and then grows downwards (to lower memory addresses) towards the heap.

Stack Frame

Each time a procedure is called a new entry is created (called a "stack frame"). Each stack frame stores local variables of the procedure, the return address and any arguments passed to the procedure.

Additionally, the stack frame might include saved registers if the procedure calls another procedure. When a procedure is called, the address of the start of this stack frame is usually placed in a designated register.

Allocation

When a new stack frame is allocated, space is reserved on the call stack for all data the procedure needs during its execution. This process typically involves:

  • Adjusting the Stack Pointer: The stack pointer register is moved (decremented) to allocate enough space for the new frame.
  • Saving the Return Address: The address of the next instruction (the point where the program should continue after the procedure returns) is pushed onto the stack.
  • Saving the Old Frame Pointer: The previous frame pointer is pushed onto the stack so it can be restored when returning.
  • Setting the New Frame Pointer: The frame pointer is updated to the current top of the stack, marking the base of the new frame. This allows local variables and parameters to be accessed using fixed offsets.
  • Allocating Space for Local Variables: The compiler reserves space for all local variables defined within the procedure.
  • Saving Registers (Optional): If the procedure will call other procedures or use certain registers, it may save the current register values in the stack frame to preserve their state.

Deallocation

When a procedure finishes executing, its stack frame must be removed from the call stack so control can return to the calling procedure. This process, known as stack frame deallocation or stack unwinding, typically involves the following steps:

  • Restoring Saved Registers (if any): Any registers saved at the beginning of the procedure are restored to their previous values to ensure the caller's state is unchanged.
  • Releasing Local Variable Space: The stack pointer is adjusted (incremented) to free the memory that was used for local variables and temporary data.
  • Restoring the Old Frame Pointer: The previously saved frame pointer is popped from the stack, restoring the caller's frame context.
  • Returning to the Caller: The return address is popped from the stack, and control is transferred back to that address, resuming execution in the calling procedure.

Dynamic Control Flow

Dynamic control flow refers to control flow where the target of a jump or call is determined at runtime rather than at compile time. This contrasts with static control flow, where the compiler knows the exact destination address at compile time.

Types of Dynamic Control Flow

Function Pointers

Function pointers allow procedures to be called indirectly by storing procedure addresses in variables and jumping to those addresses at runtime.

Polymorphic Dispatch

In object-oriented languages like Java, method calls on objects use polymorphic dispatch to determine which implementation to call based on the object's actual type at runtime.

Jump Tables

Jump tables enable efficient implementation of multi-way branches, such as switch statements, by storing destination addresses in an array indexed by the condition value.

Implementation

Dynamic control flow requires hardware support for indirect jumps: jump instructions whose target address is read from a register or from memory at runtime, rather than being fixed in the instruction itself.

Function Pointer

A function pointer is a variable that stores the address of a procedure. Function pointers enable dynamic control flow by allowing the target of a procedure call to be determined at runtime.

Declaration in C

<return-type> (*<var-name>)(<argument-list>);

Example

void ping() {
    printf("ping\n");
}

void foo() {
    void (*aFunc)();    // Declare function pointer
    aFunc = ping;       // Assign function address
    aFunc();            // Call through pointer (calls ping)
}

Applications

Function pointers enable several important programming patterns:

Polymorphism

In C, function pointers can implement polymorphic behavior similar to object-oriented languages:

struct Animal {
    void (*makeSound)(void*);
};
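
A hedged sketch of how the struct above can be used (the struct is repeated so the example is self-contained; the dogSound and catSound functions are invented for illustration):

#include <stdio.h>

struct Animal {
    void (*makeSound)(void*);
};

void dogSound(void *self) { printf("Bark\n"); }
void catSound(void *self) { printf("Meow\n"); }

int main() {
    struct Animal dog = { dogSound };
    struct Animal cat = { catSound };
    struct Animal *animals[2] = { &dog, &cat };

    // The same call site dispatches to a different implementation for each object
    for (int i = 0; i < 2; i++)
        animals[i]->makeSound(animals[i]);

    return 0;
}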

Generic Functions

Function pointers allow writing generic functions that can perform arbitrary operations:

void map(int (*fn)(int, int), int n, int *a, int *b, int *result) {
    for (int i = 0; i < n; i++)
        result[i] = fn(a[i], b[i]);
}

Callbacks

Function pointers enable event-driven programming by registering handlers to be called later:

void diskRead(char* buf, int blockNum, int numBytes, 
              void (*whenComplete)(char*, int, int));

Polymorphism

Polymorphism is the ability of different types of objects to respond to the same method call in different ways. In object-oriented programming, polymorphism allows a single interface to represent different underlying forms (data types).

Types of Polymorphism

Subtype Polymorphism (Runtime Polymorphism)

The most common form in OOP, where a variable of a parent type can refer to objects of any subtype, and method calls are dispatched to the appropriate implementation based on the object's actual type at runtime.

class Animal {
    void makeSound() { System.out.println("Some sound"); }
}

class Dog extends Animal {
    void makeSound() { System.out.println("Bark"); }
}

class Cat extends Animal {
    void makeSound() { System.out.println("Meow"); }
}

Animal a = new Dog();
a.makeSound();  // Prints "Bark" (not "Some sound")

Polymorphic Dispatch

Polymorphic dispatch (also called dynamic dispatch) is the mechanism that determines which method implementation to call at runtime:

  1. The variable has a static apparent type (e.g., Animal)
  2. The object has a dynamic actual type (e.g., Dog)
  3. The actual type must be a subtype of the apparent type
  4. Method calls invoke the actual type's implementation

Jump Table

A jump table is an array of addresses that enables efficient multi-way branching. Jump tables are commonly used to implement switch statements by indexing into the table based on the condition value and jumping to the corresponding address.

Structure

A jump table is simply an array where each element stores the address of a code label:

void *jt[4] = { &&L0, &&L1, &&L2, &&L3 };

In SM213 Assembly:

jmptable:
    .long case0    # Address of case 0
    .long case1    # Address of case 1
    .long case2    # Address of case 2
    .long default  # Address of default

Switch Statement Implementation

C Code

switch (i) {
    case 0: j = 10; break;
    case 1: j = 11; break;
    case 2: j = 12; break;
    default: j = 14; break;
}

With Goto and Jump Table

static const void* jt[3] = { &&L0, &&L1, &&L2 };
if (i < 0 || i > 2) goto DEFAULT;
goto *jt[i];

L0: j = 10; goto CONT;
L1: j = 11; goto CONT;
L2: j = 12; goto CONT;
DEFAULT: j = 14;
CONT:

In Assembly

    ld $i, r0           # r0 = &i
    ld (r0), r0         # r0 = i
    ld $-2, r1          # r1 = -2
    add r0, r1          # r1 = i - 2
    bgt r1, default     # if i > 2, goto default
    bgt r0, in_range    # if i > 0, lower bound ok
    beq r0, in_range    # if i == 0, lower bound ok
    br default          # otherwise i < 0, goto default
in_range:
    ld $jmptable, r1    # r1 = &jmptable
    j *(r1, r0, 4)      # Jump to jmptable[i]

Handling Gaps and Offsets

Gaps in case Labels

If case labels have gaps, the jump table includes the default address for missing cases:

switch (i) {
    case 20: j = 10; break;
    case 21: j = 11; break;
    case 23: j = 13; break;  // Note: case 22 missing
    default: j = 14; break;
}

Jump table for cases 20-23:

void *jt[4] = { &&L20, &&L21, &&DEFAULT, &&L23 };
goto *jt[i - 20];  // Normalize to 0-based index

Normalization

Case labels are normalized by subtracting the minimum case value:

if (i < 20 || i > 23) goto DEFAULT;
goto *jt[i - 20];  // Maps 20→0, 21→1, 22→2, 23→3

Performance Tradeoffs

Jump Table Advantages

  • Constant time: O(1) lookup regardless of number of cases
  • Efficient for dense cases: When case values are close together

Jump Table Disadvantages

  • Space overhead: Requires memory for all values in range
  • Inefficient for sparse cases: Wastes space for missing values

When to Use

The compiler chooses between jump tables and if-else chains based on:

Scenario Best Approach
Dense cases (few gaps) Jump table
Sparse cases (many gaps) If-else chain
Few cases (< 4-5) If-else chain
Large range (e.g., 1000-9000) If-else chain
Moderate range with most values Jump table

Switch Statement

A switch statement is a multi-way branch control flow construct that selects one of many code blocks to execute based on the value of an expression. Switch statements provide a more readable and potentially more efficient alternative to long chains of if-else statements.

Syntax in C

switch (expression) {
    case constant1:
        // code block 1
        break;
    case constant2:
        // code block 2
        break;
    default:
        // default code block
        break;
}

Restrictions

Switch statements have specific restrictions that enable efficient implementation:

  • Case labels must be constants: Values must be known at compile time
  • Cardinal types only: Case values must be integers or characters (Java 7 later added support for strings)
  • Equality comparison: Only tests equality, not ranges or complex conditions
  • Single expression: All cases test the same expression

These restrictions allow the compiler to implement switches using jump tables for efficient O(1) branching.

Implementation Strategies

If-Else Chain

For sparse cases or few options, the compiler uses sequential comparisons:

if (i == 20) {
    j = 10;
} else if (i == 21) {
    j = 11;
} else if (i == 42) {
    j = 12;
} else {
    j = 14;
}

Jump Table

For dense cases, the compiler uses a jump table:

// Original switch
switch (i) {
    case 20: j = 10; break;
    case 21: j = 11; break;
    case 22: j = 12; break;
    default: j = 14; break;
}

// Equivalent with jump table
void *jt[3] = { &&L20, &&L21, &&L22 };
if (i < 20 || i > 22) goto DEFAULT;
goto *jt[i - 20];

L20: j = 10; goto CONT;
L21: j = 11; goto CONT;
L22: j = 12; goto CONT;
DEFAULT: j = 14;
CONT:

Assembly Implementation

In SM213 Assembly, jump tables enable efficient switches:

    ld $i, r0           # r0 = &i
    ld (r0), r0         # r0 = i
    
    # Check bounds: if i < 20 or i > 22, goto default
    ld $-20, r1
    add r0, r1          # r1 = i - 20
    ld $-2, r2
    add r1, r2          # r2 = (i - 20) - 2
    bgt r2, default     # if i > 22, goto default
    bgt r1, in_range    # if i - 20 > 0, lower bound ok
    beq r1, in_range    # if i - 20 == 0, lower bound ok
    br default          # otherwise i < 20, goto default
    
in_range:
    # Jump through table
    ld $jmptable, r2
    j *(r2, r1, 4)      # jmptable[i - 20]

case20:
    ld $10, r1
    br done
case21:
    ld $11, r1
    br done
case22:
    ld $12, r1
    br done
default:
    ld $14, r1
    br done

done:
    ld $j, r0
    st r1, (r0)

jmptable:
    .long case20
    .long case21
    .long case22

Compiler Optimization

The compiler chooses the implementation strategy based on:

Cases Strategy Reason
Sparse (large gaps) If-else chain Jump table would waste memory
Few cases (<4-5) If-else chain Simpler, similar performance
Dense (small gaps) Jump table O(1) lookup, worth the space
Large range If-else chain Jump table too large

Fall-through Behavior

Without break, execution continues to the next case:

switch (i) {
    case 1:
        printf("One ");
        // falls through
    case 2:
        printf("or Two\n");
        break;
}

This is rarely desired and often causes bugs, which is why compilers warn about missing break statements.

Generic Programming

Generic programming is a programming paradigm where algorithms and data structures are written in terms of types that are specified later, allowing the same code to work with different data types. Generic code increases code reuse and reduces duplication.

In C Using Function Pointers

C achieves generic programming through function pointers and void* pointers, allowing functions to operate on any data type:

Generic Iteration

void map(int (*fn)(int, int), int n, int *s0, int *s1, int *dest) {
    for (int i = 0; i < n; i++)
        dest[i] = fn(s0[i], s1[i]);
}

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

map(add, 3, array1, array2, result);  // Add arrays
map(mul, 3, array1, array2, result);  // Multiply arrays

Generic Sorting

The C standard library's qsort function is fully generic:

void qsort(void *base, size_t nel, size_t width,
           int (*compare)(const void*, const void*));

Usage examples:

// Sort integers
int cmpInt(const void *a, const void *b) {
    int ia = *(int*)a;
    int ib = *(int*)b;
    return ia - ib;
}
qsort(intArray, n, sizeof(int), cmpInt);

// Sort strings
int cmpString(const void *a, const void *b) {
    char *sa = *(char**)a;
    char *sb = *(char**)b;
    return strcmp(sa, sb);
}
qsort(strArray, n, sizeof(char*), cmpString);

Using typedef for Clarity

The typedef statement creates type aliases that make generic code more readable:

typedef void* element_t;

int partition(element_t* array, int left, int right, 
              int (*cmp)(element_t, element_t)) {
    element_t pivotValue, t;
    // ... implementation
}

Without typedef, this would use confusing void** and void* throughout.

Generic Data Structure Operations

Function pointers enable generic operations on data structures:

struct node {
    int key;
    int value;
    struct node *next;
};

void traverse_list(struct node *list, void (*fn)(int, int)) {
    while (list) {
        fn(list->key, list->value);
        list = list->next;
    }
}

void print_element(int key, int value) {
    printf("%d: %d\n", key, value);
}

int total = 0;

void sum_values(int key, int value) {
    total += value;
}

traverse_list(myList, print_element);  // Print all
traverse_list(myList, sum_values);     // Sum all

Trade-offs

Advantages

  • Code reuse: Write once, use with many types
  • Maintainability: Changes to algorithm affect all uses
  • Abstraction: Separates algorithm from data types

Disadvantages

  • Type safety: void* loses compile-time type checking
  • Complexity: More difficult to read and understand
  • Debugging: Harder to trace through function pointer calls
  • Performance: Indirect calls have slight overhead

In Other Languages

Other languages provide better generic programming support:

  • C++: Templates with compile-time type checking
  • Java: Generics with type erasure
  • Racket: Higher-order functions like map, foldl, filter

Relationship to Polymorphism

Generic programming is related to polymorphism but distinct:

  • Generic programming: Same code for different types (parametric)
  • Polymorphism: Different code for different types (subtype-based)

Both use function pointers in C but serve different purposes.

Memory Bus

The memory bus is a data and control path that connects the CPU, main memory, and the I/O bus. The memory bus operates at high speeds to match the performance requirements of the CPU and memory system.

I-O Bus

The I/O bus is a data and control path that connects the memory bus to I/O controllers. The I/O bus operates at a slower speed than the memory bus since I/O devices are typically much slower than the CPU and main memory.

I-O Controller

An I/O controller is a specialized processor that runs firmware to manage one specific type of I/O device. The controller connects the I/O device to the I/O bus, handling the translation between the device's native protocols and the bus communication protocols.

Each I/O controller is typically assigned a range of I/O-mapped memory addresses at boot time, allowing the CPU to communicate with it using standard load and store instructions.

I-O Device

An I/O device is a hardware mechanism that generates or consumes data, allowing a computer system to interact with the external world. I/O devices connect to the system through I/O controllers.

Common examples include:

  • Input devices: keyboard, mouse, touchscreen
  • Output devices: monitor, printer, speakers
  • Storage devices: hard disk, SSD, USB drive
  • Network devices: Ethernet adapter, Wi-Fi card
  • Imaging devices: camera, scanner

I-O Mapped Memory

I/O-mapped memory refers to memory addresses that are beyond the end of main memory and are instead mapped to I/O controllers. Each controller is assigned a specific range of addresses, typically configured at boot time.

When the CPU performs load or store instructions to I/O-mapped memory addresses, these operations are translated into messages sent across the I/O bus to the corresponding controller. This mechanism allows the CPU to communicate with I/O devices using the same instructions it uses for regular memory access.

For example, an address like 0x80000000 might be mapped to a disk controller's control registers, allowing the CPU to send commands by writing to that address.

Programmed Input-Output

Programmed I/O (PIO) is a data transfer method where the CPU directly transfers data one byte at a time between itself and an I/O controller. The CPU uses standard load and store instructions to I/O-mapped memory addresses to perform these transfers. PIO is always initiated by the CPU, not by the I/O controller.
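
A hedged C sketch of PIO output; the device, its register addresses and its register layout are entirely made up for illustration:

#include <stdint.h>

// Hypothetical I/O-mapped registers for a simple output device
#define DEV_STATUS ((volatile uint32_t *) 0x80000000)   // non-zero when ready
#define DEV_DATA   ((volatile uint32_t *) 0x80000004)   // write one byte here

void pio_write(const char *buf, int n) {
    for (int i = 0; i < n; i++) {
        while (*DEV_STATUS == 0) {}        // CPU waits for the (slow) controller
        *DEV_DATA = (uint32_t) buf[i];     // ordinary store moves one byte over the bus
    }
}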

Limitations

PIO has several significant limitations:

  • CPU overhead: The CPU must actively transfer each byte, wasting cycles that could be used for computation
  • Speed mismatch: I/O devices and controllers are much slower than the CPU, forcing the CPU to wait frequently
  • Unidirectional initiation: Only the CPU can initiate transfers; controllers cannot initiate communication even when they have data available

These limitations make PIO inefficient for transferring large amounts of data or handling asynchronous device events, leading to the development of DMA and interrupts.

Polling (Computer Science)

Polling is a technique where the CPU repeatedly checks an I/O controller to determine if data is available or if the device is ready. The CPU uses PIO to read status registers from the controller in a loop until the desired condition is met.

Example

int pollDeviceForValue() {
    volatile int *ready = (volatile int *) 0x80000100;
    volatile int *value = (volatile int *) 0x80000104;
    while (!*ready) {}  // Busy-wait until ready
    return *value;
}

When Polling is Acceptable

Polling is acceptable when:

  • The poll operation has low overhead
  • There is a high probability that data will be available on each poll
  • The device responds quickly

Problems with Polling

  • CPU waste: Reading I/O-mapped memory is slow, and the CPU wastes cycles waiting
  • Inefficient for rare events: If data is rarely available, most polls are wasted
  • Better alternatives exist: Interrupts allow the controller to signal the CPU only when needed, eliminating busy-waiting

Direct Memory Access

Direct Memory Access (DMA) is a capability that allows an I/O controller to directly transfer data to and from main memory without CPU involvement. DMA enables the CPU and I/O controllers to work in parallel, with controllers autonomously moving data while the CPU performs other tasks.

Operation

The CPU configures DMA transfers using PIO to send the controller:

  • The memory address for the transfer
  • The number of bytes to transfer
  • The direction (read from or write to memory)

Once configured, the controller performs the transfer independently. When complete, the controller typically signals the CPU using an interrupt.
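
A hedged C sketch of configuring a DMA transfer via PIO; the controller's register addresses and command encoding are invented for illustration:

#include <stdint.h>

// Hypothetical I/O-mapped registers on a disk controller
#define DMA_ADDR ((volatile uint32_t *) 0x80000100)   // memory address for the transfer
#define DMA_LEN  ((volatile uint32_t *) 0x80000104)   // number of bytes to transfer
#define DMA_CMD  ((volatile uint32_t *) 0x80000108)   // 1 = read a block into memory

void start_disk_read(char *buf, uint32_t num_bytes) {
    *DMA_ADDR = (uint32_t)(uintptr_t) buf;   // tell the controller where to put the data
    *DMA_LEN  = num_bytes;                   // and how much to move
    *DMA_CMD  = 1;                           // start the transfer
    // The CPU is now free to do other work; the controller raises an
    // interrupt when the transfer completes.
}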

Advantages

  • Parallelism: The CPU can continue executing instructions while the controller transfers data
  • Efficiency: Large data transfers don't consume CPU cycles
  • Performance: The system can overlap computation with I/O operations

DMA is essential for high-performance I/O operations such as disk reads/writes and network packet transfers.

Interrupt (Hardware)

A hardware interrupt is a signal from an I/O controller to the CPU that requests immediate attention. Interrupts allow controllers to initiate communication with the CPU, signaling events such as I/O completion, error conditions, or incoming data.

Hardware Implementation

The CPU includes special registers for interrupt handling:

  • isDeviceInterrupting: Set by the I/O controller when it needs attention
  • interruptControllerID: Identifies which controller is interrupting
  • interruptVectorBase: Points to the interrupt vector table (initialized at boot)

The CPU checks isDeviceInterrupting on every clock cycle—a fast hardware poll. When an interrupt is detected, the CPU:

  1. Saves its current state on the stack
  2. Jumps to the appropriate Interrupt Service Routine (ISR) using the interrupt vector table
  3. Executes the ISR
  4. Returns to the interrupted program using a special RTI (Return from Interrupt) instruction

Interrupt Vector Table

The interrupt vector table is an array of function pointers, indexed by controller ID, that points to each controller's ISR. This table is initialized by the operating system at boot time.
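
A hedged C sketch of an interrupt vector table as an array of function pointers indexed by controller ID; the handler names and table size are made up for illustration:

#define NUM_CONTROLLERS 16

typedef void (*isr_t)();               // one interrupt service routine per controller

void disk_isr()     { /* handle disk completion */ }
void keyboard_isr() { /* handle a key press */ }

isr_t interrupt_vector[NUM_CONTROLLERS];

// Run by the operating system at boot time
void init_interrupt_vector() {
    interrupt_vector[0] = disk_isr;
    interrupt_vector[1] = keyboard_isr;
}

// Conceptually what the CPU does when isDeviceInterrupting is set
void dispatch_interrupt(int interruptControllerID) {
    interrupt_vector[interruptControllerID]();
}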

Advantages

  • Efficiency: Eliminates polling overhead by allowing controllers to signal only when needed
  • Asynchrony: Controllers can signal unexpected events (e.g., mouse clicks, network packets)
  • Parallelism: The CPU doesn't waste cycles waiting for I/O; it can perform useful work until interrupted

Asynchronous Operation

An asynchronous operation is one that is initiated but does not complete immediately, allowing the program to continue executing while the operation proceeds in the background. The operation's completion is signaled later through some notification mechanism.

Comparison to Synchronous Operations

In a synchronous operation, the program waits (blocks) until the operation completes:

// Synchronous - program waits here
char data = readDiskBlock(blockNum);
processData(data);  // Runs after read completes

In an asynchronous operation, the program continues immediately:

// Asynchronous - program continues immediately
readDiskBlock(blockNum, processDataCallback);
doOtherWork();  // Runs while disk read is in progress
// processDataCallback runs later when read completes

Challenges

Asynchronous operations break the natural sequential flow of programs:

  1. Order of execution: Code written sequentially may execute in a different order
  2. Data dependencies: Values computed asynchronously aren't immediately available
  3. Control flow: The program must be restructured around callbacks or event handlers

Example Problem

// This code appears sequential but won't work!
char checksumDiskData(int blockNum, int numBytes) {
    char buf[numBytes];
    char xsum = 0;
    requestDiskRead(buf, blockNum, numBytes);  // Asynchronous!
    for (int i = 0; i < numBytes; i++)
        xsum ^= buf[i];  // buf not yet filled!
    return xsum;
}

The disk read is asynchronous, so the for loop executes before the data arrives. The code must be restructured to use event handlers or other asynchronous patterns.

Asynchronous operations are necessary for performance when dealing with slow I/O devices, but they make programs significantly harder to write and debug. Modern systems use interrupts and DMA to enable asynchronous I/O.

Event-Driven Programming

Event-driven programming is a programming paradigm where the flow of the program is determined by events—occurrences such as I/O completions, user interactions, or timer expirations. Code is structured around event handlers (callbacks) that execute asynchronously when their associated events occur.

Key Concepts

  • Event: An occurrence that triggers code execution (e.g., disk read completion, mouse click)
  • Event Handler: A function that executes when a specific event occurs
  • Event Registration: Associating a handler with an event
  • Event Firing: Triggering the execution of registered handlers

Example

void computeChecksum(char *buf, int blockNum, int numBytes) {
    char xsum = 0;
    for (int i = 0; i < numBytes; i++)
        xsum ^= buf[i];
    printf("Checksum: %d\n", xsum);
    free(buf);
}

void checksumDiskData(int blockNum, int numBytes) {
    char *buf = malloc(numBytes);
    // Register handler and initiate read
    diskRead(buf, blockNum, numBytes, computeChecksum);
    // Function returns immediately; handler runs later
}

Challenges

Event-driven code is inherently more difficult than sequential code:

  • Callback hell: Deeply nested callbacks make code hard to read and maintain
  • Control flow: The execution order is non-obvious from reading the code
  • Debugging: Stack traces are fragmented across multiple callback invocations
  • State management: Sharing data between handlers requires careful management

Why Use Event-Driven Programming

Despite the difficulties, event-driven programming is necessary when dealing with asynchronous operations like I/O. The CPU can execute millions of instructions while waiting for a single disk read; event-driven code allows useful work during these waits.

Event-driven programming is common in:

  • GUI applications (handling user input events)
  • Web programming (JavaScript, Node.js)
  • Operating systems (handling interrupts)
  • Network programming (handling incoming connections and data)

Thread

A thread is an abstraction that allows a program to execute multiple independent streams of control concurrently on a single physical processor. Threads can be thought of as lightweight processes that share the same address space and resources.

Threads enable programs to handle asynchronous operations, such as input and output, without blocking the entire program. Instead of waiting for an operation to complete, a thread can be paused and resumed, allowing other threads to run in the meantime.

States

A thread can be in one of several states:

  • Nascent: Newly created but not yet runnable
  • Runnable: Ready to execute but waiting for CPU time
  • Running: Currently executing on the CPU
  • Blocked: Waiting for an event (e.g., I/O completion)
  • Dead: Finished execution

Transitions between states are managed by the operating system or thread library.

Operations

Creation

Threads are created by a function which typically takes a procedure and arguments. This "forks" the control flow, starting a new thread that runs concurrently.

Blocking and Unblocking

  • Block: Saves the thread's state and stops it until it is unblocked.
  • Unblock: Moves a blocked thread back to the runnable state.

Joining

Joining a thread blocks the current thread until the joined thread completes, optionally retrieving its return value. This provides synchronous behavior.
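
A hedged sketch using POSIX threads (an assumption; the notes don't name a specific thread library) showing creation and joining:

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    int n = *(int *) arg;
    printf("worker running with argument %d\n", n);
    return NULL;
}

int main() {
    pthread_t t;
    int arg = 42;

    pthread_create(&t, NULL, worker, &arg);   // fork a new stream of control
    pthread_join(t, NULL);                    // block until the thread completes
    return 0;
}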

Yielding

Yielding temporarily pauses the current thread, allowing others to run. Notably the thread remains runnable.

Switching

Thread switching occurs at the instruction level:

  1. Save current thread's registers to its stack
  2. Save stack pointer to thread control block
  3. Load new thread's stack pointer from its thread control block
  4. Restore registers from the new stack

The program counter is handled implicitly via the stack, as the return address is saved/restored.

Scheduling Policies

Scheduling policies determine which runnable thread is selected to run next on the CPU when multiple threads are ready. They balance competing goals such as fairness (equal CPU time), responsiveness (quick reaction to events), priority (important tasks first), and efficiency (maximizing throughput). Policies can be preemptive (interrupting running threads) or non-preemptive (threads run until they yield or block).

Round-robin

In round-robin scheduling, threads take turns running for a fixed time slice (typically 10-100ms). When a thread's time expires, it is preempted and placed back in the runnable queue, allowing the next thread to run. This ensures fair sharing of CPU time among threads of equal priority, preventing any single thread from monopolizing the processor.

Round robin may not prioritize urgent tasks and can introduce overhead from frequent context switches.

Priority-based

Priority-based scheduling assigns each thread a priority level, with higher-priority threads running before lower ones. Within the same priority, round-robin is often used. This is useful for systems where certain tasks (e.g., user interface updates or real-time processing) must respond quickly.

Priorities can be static (fixed) or dynamic (adjusted based on behavior). A drawback is starvation, where low-priority threads may never run if high-priority ones dominate.

Preemption

Preemption allows the scheduler to interrupt a running thread to switch to another, improving responsiveness. It occurs in two main ways: priority preemption (when a higher-priority thread becomes runnable) and time-based preemption (at the end of a time slice). This is implemented using timer interrupts that periodically check if a switch is needed.

Preemptive scheduling is essential for interactive and real-time systems but adds complexity, as threads must handle being interrupted at any point.

Lock

A lock is a primitive for controlling synchronization between concurrent programs. Locks prevent shared state from being accessed by multiple threads at the same time. A thread can perform two operations on a lock: attempting to acquire it, or releasing it. When attempting to acquire a lock that is already "held" by another thread, the acquiring thread enters a waiting state until the lock is released by the holding thread.
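
A hedged sketch of protecting shared state with a lock, using a POSIX mutex (pthreads assumed; the counter is made up for illustration):

#include <pthread.h>

static int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void increment() {
    pthread_mutex_lock(&lock);     // acquire; waits if another thread holds the lock
    counter++;                     // only one thread at a time touches the shared state
    pthread_mutex_unlock(&lock);   // release so a waiting thread can acquire it
}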

Asynchronous Condition Variable

An asynchronous condition variable is a primitive for controlling synchronization between concurrent programs; it pauses the execution of waiting threads until it receives a signal to continue their execution.

A thread can perform two operations on an asynchronous condition variable: it can either wait for the condition variable to be signalled, or broadcast, which releases all waiting threads.
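
A hedged sketch of wait and broadcast using POSIX condition variables (pthreads assumed; note that POSIX pairs each condition variable with a mutex):

#include <pthread.h>

static int ready = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void wait_for_signal() {
    pthread_mutex_lock(&m);
    while (!ready)
        pthread_cond_wait(&cond, &m);   // releases m and blocks until signalled
    pthread_mutex_unlock(&m);
}

void signal_all() {
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_broadcast(&cond);      // releases all waiting threads
    pthread_mutex_unlock(&m);
}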