Static variables are variables for which the compiler allocates memory for the entire run of the program. Static variables are placed in the data segment of a program's memory and are usually allocated before the program is executed.
Endianness refers to the order in which the bytes of a multi-byte value are stored in memory or transmitted over a communication channel.
Big-endian works by transmitting / storing the most significant byte first.
Little-endian works by transmitting / storing the least significant byte first.
Consider the number 0x12345678. If we were to write this value to memory starting at address i, the two orderings look as follows.
Big-endian:
| Value | Address |
|---|---|
| 0x12 | i |
| 0x34 | i+1 |
| 0x56 | i+2 |
| 0x78 | i+3 |
Little-endian:
| Value | Address |
|---|---|
| 0x78 | i |
| 0x56 | i+1 |
| 0x34 | i+2 |
| 0x12 | i+3 |
Notice how in both cases the order of bits within each individual byte is unchanged.
Byte extension describes the process by which bytes are added when a stored value's memory size needs to be increased (i.e. from a byte to an int). There are two types of byte extension: sign extension and zero extension. Both methods pad the start of the value instead of the end.
Sign extension pads the value with copies of its sign bit (for example, -11 would be padded with 1's and 11 would be padded with 0's). In C, sign extension is used for all signed data types.
Zero extension pads the value with zeroes regardless of the sign of the data. In C, zero extension is used for all unsigned data types.
Memory is accessed by the CPU in contiguous, power-of-two-sized chunks of bytes.
The memory of a computer is usually split up into 4 main sections.
The code segment contains the compiled machine instructions and is read only to ensure you can't accidentally overwrite your code at runtime.
The data segment contains any global and static variables.
The stack stores local variables, function parameters, and return addresses. The stack typically grows downward towards lower addresses in memory and is automatically managed, (ie. the addresses are freed when functions return).
The heap is used for dynamically allocated memory (ie. malloc in C and new in C++). The heap typically grows upwards towards higher addresses in memory and is manually managed, (ie. you must free the memory when you're done with it).
When a program accesses memory that has already been freed, the contents at that location are undefined because the system may have reallocated that memory for other purposes. Pointers pointing to this freed memory are called dangling pointers.
int *i = new int;
*i = 5;
delete i;
cout << *i << endl;
Here, i points to memory that has been deallocated. Accessing or writing to it can produce unexpected results.
To prevent using freed memory, you can set the pointer to NULL after deletion. This way, dereferencing it will trigger a crash (e.g., segmentation fault), making the bug easier to detect:
int *i = new int;
*i = 5;
delete i;
i = NULL;
cout << *i << endl; // Seg fault
When a program loses access to allocated space before it has been freed (i.e. by reassigning a pointer), the space can no longer be referenced or freed. This memory therefore remains marked as allocated for the lifetime of the program. Typically memory leaks occur with memory on the heap.
int *i = new int;
int b = 2;
*i = 5;
i = &b;
The original address stored in i was lost and can no longer be freed, causing a memory leak.
A memory address describes the location in memory where a given variable or identifier stores data.
The null address is 0x0000 and represents an empty pointer value. Trying to dereference this address will cause an error, as no accessible memory exists there.
A pointer is a variable which stores a memory address.
Pointer variables are defined as follows:
int *ptr;
where int can be any data type and defines the type of the value at the pointer's address. Additionally, ptr is just the name of the variable and can be any valid identifier.
The address operator gets the address of an existing variable and is expressed by placing a & before the variable.
int x = 23;
int *p = &x;
The variable p now stores the address of x.
The dereferencing operator gets the value at an address and is expressed by placing a * before the pointer.
int x = 23;
int *p = &x;
*p = 50;
cout << x << endl; // Outputs 50
The new keyword allocates new space on the heap and returns the first address of the allocated space.
int *b = new int;
int *b = new int;
*b = 2;
delete b;
For arrays allocated on the heap, the delete[] keyword can be used:
int *b = new int[3]{1, 2, 3};
delete[] b;
The delete keyword releases the memory at the address referenced by the pointer variable passed to it.
Because pointers can "point" to any address in memory, we can create pointers to pointers:
int x = 5;
int* p = &x; // Memory address of `x`
int** q = &p; // Memory address of `p`
int*** r = &q; // Memory address of `q`
cout << **r << endl; // Memory address of `x` (value of `p`)
cout << **q << endl; // Value of `x`
cout << ***r << endl; // Value of `x`
Address alignment describes the process by which addresses are allocated such that they are multiples of the object's size. Aligned addresses are faster to access on most CPUs, and some CPUs do not support unaligned accesses at all.
Structures are aligned differently from basic data types as they are composites. Their alignment follows three rules:
1. Each field is placed at an offset that is a multiple of that field's own alignment.
2. The alignment of the struct as a whole is the largest alignment of any of its fields.
3. The total size of the struct is padded up to a multiple of the struct's alignment.
When padding is introduced it must always be added between or after fields, ensuring that the alignment of the field before the padding is maintained.
Given the following struct, determine the offset of b, the total size of the struct, and its alignment.
struct ExampleOne {
char a;
int b;
short c;
};
The individual fields have sizes of 1, 4, and 2 bytes respectively, so the overall alignment of the struct must be 4 bytes. The fields total 7 bytes, so the struct must be padded up to 12 bytes: 3 bytes of padding after a (so that b is 4-byte aligned) and 2 bytes after c (so the total size is a multiple of 4).
| Field | Size (Bytes) | Offset |
|---|---|---|
| char a; | 1 | 0 |
| padding | 3 | 1 |
| int b; | 4 | 4 |
| short c; | 2 | 8 |
| padding | 2 | 10 |
| Total | 12 | |
Reference counting is a method of managing memory that decreases the risk of common memory pitfalls like memory leaks and dangling pointers. Reference counting follows a few basic rules:
When a reference is passed to a procedure as a parameter, the caller necessarily maintains it for the duration of the call, so the callee does not need to call keep_ref unless it saves the pointer somewhere that will outlast the call (i.e. a global variable) or returns the pointer to the caller.
When a reference is returned from a procedure, the callee must have a reference to the value it returns. Returning transfers that reference to the caller; it is then the caller's responsibility to call free_ref when it no longer needs the reference.
References stored in local variables necessarily go away when the procedure returns, so the procedure must call free_ref before it returns.
If the struct you're reference counting has a concrete type, reference counting can be implemented as follows:
struct buffer *rc_malloc() {
struct buffer *buf = malloc(sizeof(struct buffer));
buf->ref_count = 1;
return buf;
}
void rc_keep_ref(struct buffer *buf) {
buf->ref_count++;
}
void rc_free_ref(struct buffer *buf) {
buf->ref_count--;
if (buf->ref_count == 0) free(buf);
}
A more general method works for any type: reserve 8 bytes before the actual allocated memory to hold the reference count.
void* rc_malloc(int num_bytes) {
// Add 8 bytes to store the ref_count
int *ref_count = malloc(num_bytes + 8);
*ref_count = 1;
return ((void*) ref_count) + 8;
}
void rc_keep_ref(void* ptr) {
int *ref_count = ptr - 8;
(*ref_count)++;
}
void rc_free_ref(void* ptr) {
int *ref_count = ptr - 8;
(*ref_count)--;
if (*ref_count == 0) free(ref_count);
}
A conditional is a type of control flow that branches based on one or more expressions.
int a;
scanf("%d", &a);
if (a % 2 == 0) {
printf("%d is even", a);
} else {
printf("%d is odd", a);
}
Example of a conditional statement that checks whether a number is even or odd
At the assembly level, conditionals are implemented using jumps together with logic that evaluates the condition. The compiler translates the boolean condition into arithmetic and branch instructions appropriate to the architecture.
<init>
'conditional setup
goto then if <condition>
else:
'alternative body
goto end_if
then:
'consequent body
end_if:
Given the C code:
int a = 5;
int b = 2;
if (a > b) {
// consequent
} else {
// alternative
}
write an equivalent conditional in assembly
.pos 0x1000
# <init>
ld $a, r0 # r0 = &a
ld (r0), r0 # r0 = a
ld $b, r1 # r1 = &b
ld (r1), r1 # r1 = b
mv r1, r2 # r2 = b
not r2 # r2 = !b
inc r2 # r2 = -b
add r0, r2 # r2 = a-b
bgt r2, then # a-b > 0 => goto then
else:
# alternative
br end_if
then:
# consequent
end_if:
.pos 0x2000
a: .long 5
b: .long 2
A for loop is a type of control flow statement that repeatedly runs a section of code as long as a condition is satisfied.
for (int i = 0; i < 3; ++i) {
printf("%d", i);
}
Example of a for loop that prints "012" in C
At the assembly level, for loops are implemented using conditional and unconditional jumps. The compiler translates the initialization, condition check, increment, and loop body into equivalent jump-based control flow.
<init>
loop_start:
'evaluate condition
goto end_loop if not <continue_condition>
'loop body
...
'increment and repeat
<step>
goto loop_start
end_loop:
Given the C code:
for (int i = 0; i < 3; ++i) {
// ...
}
write an equivalent for loop in assembly
.pos 0x1000
# <init>
ld $i, r0 # r0 = &i
ld (r0), r0 # r0 = i
ld $-3, r1 # r1 = -3
loop_start:
mv r0, r2 # r2 = i
add r1, r2 # r2 = i - 3 (allows us to use beq)
# evaluate condition
beq r2, end_loop # i == 3: go to just after loop
# loop body
# increment and repeat
inc r0 # i++
br loop_start # go to start of loop
end_loop:
.pos 0x2000
i: .long 0
The program counter is a special register that stores the memory address of the next instruction to be executed; it is incremented after each instruction is fetched. Control transfers, like jumping or branching, can change the program counter in ways other than the standard incrementation.
When a subroutine is called, the processor branches to it and stores the current contents of the program counter somewhere, so that when the subroutine returns, the saved value can be placed back into the program counter and sequential execution can resume after the call.
A procedure is the higher-level generalization of a subroutine, with its own name, arguments, and local scope. Procedures can be called, which causes their enclosed code to be run and possibly a result to be bound when the procedure returns.
int fib(int n) {
if (n == 0 || n == 1) return n;
return fib(n-1) + fib(n-2);
}
Example of a procedure that returns the nth value of the Fibonacci sequence in C.
Procedures can be implemented using a call stack to store and retrieve required information about the procedure such as the return address and any arguments.
The call stack is a specific implementation of a stack which stores relevant information for a procedure to execute.
The call stack is typically at the highest memory address possible for a given program and then grows downwards (to lower memory addresses) towards the heap.
Each time a procedure is called a new entry is created (called a "stack frame"). Each stack frame stores local variables of the procedure, the return address and any arguments passed to the procedure.
Additionally, the stack frame might include saved registers if the procedure calls another procedure. When a procedure is called usually the start of this stack frame is written to a common register before invocation.
When a new stack frame is allocated, space is reserved on the call stack for all data the procedure needs during its execution. This typically involves decrementing the stack pointer by the size of the frame and saving the return address and any callee-saved registers into the frame.
When a procedure finishes executing, its stack frame must be removed from the call stack so control can return to the calling procedure. This process, known as stack frame deallocation, typically involves restoring any saved registers, incrementing the stack pointer to release the frame, and jumping to the saved return address.
Dynamic control flow refers to control flow where the target of a jump or call is determined at runtime rather than at compile time. This contrasts with static control flow, where the compiler knows the exact destination address at compile time.
Function pointers allow procedures to be called indirectly by storing procedure addresses in variables and jumping to those addresses at runtime.
In object-oriented languages like Java, method calls on objects use polymorphic dispatch to determine which implementation to call based on the object's actual type at runtime.
Jump tables enable efficient implementation of multi-way branches, such as switch statements, by storing destination addresses in an array indexed by the condition value.
Dynamic control flow requires jump and call instructions whose target address comes from a register or memory (indirect jumps), along with the ability to store code addresses in data.
A function pointer is a variable that stores the address of a procedure. Function pointers enable dynamic control flow by allowing the target of a procedure call to be determined at runtime.
<return-type> (*<var-name>)(<argument-list>);
void ping() {
printf("ping\n");
}
void foo() {
void (*aFunc)(); // Declare function pointer
aFunc = ping; // Assign function address
aFunc(); // Call through pointer (calls ping)
}
Function pointers enable several important programming patterns:
In C, function pointers can implement polymorphic behavior similar to object-oriented languages:
struct Animal {
void (*makeSound)(void*);
};
Function pointers allow writing generic functions that can perform arbitrary operations:
void map(int (*fn)(int, int), int n, int *a, int *b, int *result) {
for (int i = 0; i < n; i++)
result[i] = fn(a[i], b[i]);
}
Function pointers enable event-driven programming by registering handlers to be called later:
void diskRead(char* buf, int blockNum, int numBytes,
void (*whenComplete)(char*, int, int));
Polymorphism is the ability of different types of objects to respond to the same method call in different ways. In object-oriented programming, polymorphism allows a single interface to represent different underlying forms (data types).
The most common form in OOP, where a variable of a parent type can refer to objects of any subtype, and method calls are dispatched to the appropriate implementation based on the object's actual type at runtime.
class Animal {
void makeSound() { System.out.println("Some sound"); }
}
class Dog extends Animal {
void makeSound() { System.out.println("Bark"); }
}
class Cat extends Animal {
void makeSound() { System.out.println("Meow"); }
}
Animal a = new Dog();
a.makeSound(); // Prints "Bark" (not "Some sound")
Polymorphic dispatch (also called dynamic dispatch) is the mechanism that determines which method implementation to call at runtime:
In the example above, the method that runs is chosen by the object's dynamic type (Dog), not by the variable's static type (Animal).
A jump table is an array of addresses that enables efficient multi-way branching. Jump tables are commonly used to implement switch statements by indexing into the table based on the condition value and jumping to the corresponding address.
A jump table is simply an array where each element stores the address of a code label:
void *jt[4] = { &&L0, &&L1, &&L2, &&L3 };
In SM213 Assembly:
jmptable:
.long case0 # Address of case 0
.long case1 # Address of case 1
.long case2 # Address of case 2
.long default # Address of default
switch (i) {
case 0: j = 10; break;
case 1: j = 11; break;
case 2: j = 12; break;
default: j = 14; break;
}
static const void* jt[3] = { &&L0, &&L1, &&L2 };
if (i < 0 || i > 2) goto DEFAULT;
goto *jt[i];
L0: j = 10; goto CONT;
L1: j = 11; goto CONT;
L2: j = 12; goto CONT;
DEFAULT: j = 14;
CONT:
ld $i, r0 # r0 = &i
ld (r0), r0 # r0 = i
bgt r0, chk_upper # if i > 0, check the upper bound
beq r0, do_jump # if i == 0, in range
br default # i < 0, goto default
chk_upper:
ld $-3, r1 # r1 = -3
add r0, r1 # r1 = i - 3
bgt r1, default # if i > 3, goto default (table entry 3 is default)
do_jump:
ld $jmptable, r1 # r1 = &jmptable
j *(r1, r0, 4) # Jump to jmptable[i]
If case labels have gaps, the jump table includes the default address for missing cases:
switch (i) {
case 20: j = 10; break;
case 21: j = 11; break;
case 23: j = 13; break; // Note: case 22 missing
default: j = 14; break;
}
Jump table for cases 20-23:
void *jt[4] = { &&L20, &&L21, &&DEFAULT, &&L23 };
goto *jt[i - 20]; // Normalize to 0-based index
Case labels are normalized by subtracting the minimum case value:
if (i < 20 || i > 23) goto DEFAULT;
goto *jt[i - 20]; // Maps 20→0, 21→1, 22→2, 23→3
The compiler chooses between jump tables and if-else chains based on:
| Scenario | Best Approach |
|---|---|
| Dense cases (few gaps) | Jump table |
| Sparse cases (many gaps) | If-else chain |
| Few cases (< 4-5) | If-else chain |
| Large range (e.g., 1000-9000) | If-else chain |
| Moderate range with most values | Jump table |
A switch statement is a multi-way branch control flow construct that selects one of many code blocks to execute based on the value of an expression. Switch statements provide a more readable and potentially more efficient alternative to long chains of if-else statements.
switch (expression) {
case constant1:
// code block 1
break;
case constant2:
// code block 2
break;
default:
// default code block
break;
}
Switch statements have specific restrictions that enable efficient implementation: the controlling expression must have an integer type, and each case label must be a unique compile-time integer constant rather than a range or arbitrary expression.
These restrictions allow the compiler to implement switches using jump tables for efficient O(1) branching.
For sparse cases or few options, the compiler uses sequential comparisons:
if (i == 20) {
j = 10;
} else if (i == 21) {
j = 11;
} else if (i == 42) {
j = 12;
} else {
j = 14;
}
For dense cases, the compiler uses a jump table:
// Original switch
switch (i) {
case 20: j = 10; break;
case 21: j = 11; break;
case 22: j = 12; break;
default: j = 14; break;
}
// Equivalent with jump table
void *jt[3] = { &&L20, &&L21, &&L22 };
if (i < 20 || i > 22) goto DEFAULT;
goto *jt[i - 20];
L20: j = 10; goto CONT;
L21: j = 11; goto CONT;
L22: j = 12; goto CONT;
DEFAULT: j = 14;
CONT:
In SM213 Assembly, jump tables enable efficient switches:
ld $i, r0 # r0 = &i
ld (r0), r0 # r0 = i
# Check bounds: if i < 20 or i > 22, goto default
ld $-20, r1 # r1 = -20
add r0, r1 # r1 = i - 20
bgt r1, chk_upper # if i > 20, check the upper bound
beq r1, do_jump # if i == 20, in range
br default # i < 20, goto default
chk_upper:
mv r1, r2 # r2 = i - 20
ld $-2, r3 # r3 = -2
add r3, r2 # r2 = i - 22
bgt r2, default # if i > 22, goto default
do_jump:
# Jump through table
ld $jmptable, r2 # r2 = &jmptable
j *(r2, r1, 4) # Jump to jmptable[i - 20]
case20:
ld $10, r1
br done
case21:
ld $11, r1
br done
case22:
ld $12, r1
br done
default:
ld $14, r1
br done
done:
ld $j, r0
st r1, (r0)
jmptable:
.long case20
.long case21
.long case22
The compiler chooses the implementation strategy based on:
| Cases | Strategy | Reason |
|---|---|---|
| Sparse (large gaps) | If-else chain | Jump table would waste memory |
| Few cases (<4-5) | If-else chain | Simpler, similar performance |
| Dense (small gaps) | Jump table | O(1) lookup, worth the space |
| Large range | If-else chain | Jump table too large |
Without break, execution continues to the next case:
switch (i) {
case 1:
printf("One ");
// falls through
case 2:
printf("or Two\n");
break;
}
This is rarely desired and often causes bugs, which is why compilers warn about missing break statements.
Generic programming is a programming paradigm where algorithms and data structures are written in terms of types that are specified later, allowing the same code to work with different data types. Generic code increases code reuse and reduces duplication.
C achieves generic programming through function pointers and void* pointers, allowing functions to operate on any data type:
void map(int (*fn)(int, int), int n, int *s0, int *s1, int *dest) {
for (int i = 0; i < n; i++)
dest[i] = fn(s0[i], s1[i]);
}
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }
map(add, 3, array1, array2, result); // Add arrays
map(mul, 3, array1, array2, result); // Multiply arrays
The C standard library's qsort function is fully generic:
void qsort(void *base, size_t nel, size_t width,
int (*compare)(const void*, const void*));
Usage examples:
// Sort integers
int cmpInt(const void *a, const void *b) {
int ia = *(int*)a;
int ib = *(int*)b;
return ia - ib;
}
qsort(intArray, n, sizeof(int), cmpInt);
// Sort strings
int cmpString(const void *a, const void *b) {
char *sa = *(char**)a;
char *sb = *(char**)b;
return strcmp(sa, sb);
}
qsort(strArray, n, sizeof(char*), cmpString);
The typedef statement creates type aliases that make generic code more readable:
typedef void* element_t;
int partition(element_t* array, int left, int right,
int (*cmp)(element_t, element_t)) {
element_t pivotValue, t;
// ... implementation
}
Without typedef, this would use confusing void** and void* throughout.
Function pointers enable generic operations on data structures:
struct node {
int key;
int value;
struct node *next;
};
void traverse_list(struct node *list, void (*fn)(int, int)) {
while (list) {
fn(list->key, list->value);
list = list->next;
}
}
void print_element(int key, int value) {
printf("%d: %d\n", key, value);
}
int total = 0; // Global accumulator for sum_values
void sum_values(int key, int value) {
total += value;
}
traverse_list(myList, print_element); // Print all
traverse_list(myList, sum_values); // Sum all
A drawback of void* is that it loses compile-time type checking. Other languages provide better generic programming support:
C++ offers templates, Java offers generics, and functional languages provide higher-order functions like map, foldl, and filter. Generic programming is related to polymorphism but distinct:
Both use function pointers in C but serve different purposes.
The memory bus is a data and control path that connects the CPU, main memory, and the I/O bus. The memory bus operates at high speeds to match the performance requirements of the CPU and memory system.
The I/O bus is a data and control path that connects the memory bus to I/O controllers. The I/O bus operates at a slower speed than the memory bus since I/O devices are typically much slower than the CPU and main memory.
An I/O controller is a specialized processor that runs firmware to manage one specific type of I/O device. The controller connects the I/O device to the I/O bus, handling the translation between the device's native protocols and the bus communication protocols.
Each I/O controller is typically assigned a range of I/O-mapped memory addresses at boot time, allowing the CPU to communicate with it using standard load and store instructions.
An I/O device is a hardware mechanism that generates or consumes data, allowing a computer system to interact with the external world. I/O devices connect to the system through I/O controllers.
Common examples include:
I/O-mapped memory refers to memory addresses that are beyond the end of main memory and are instead mapped to I/O controllers. Each controller is assigned a specific range of addresses, typically configured at boot time.
When the CPU performs load or store instructions to I/O-mapped memory addresses, these operations are translated into messages sent across the I/O bus to the corresponding controller. This mechanism allows the CPU to communicate with I/O devices using the same instructions it uses for regular memory access.
For example, an address like 0x80000000 might be mapped to a disk controller's control registers, allowing the CPU to send commands by writing to that address.
Programmed I/O (PIO) is a data transfer method where the CPU directly transfers data one byte at a time between itself and an I/O controller. The CPU uses standard load and store instructions to I/O-mapped memory addresses to perform these transfers. PIO is always initiated by the CPU, not by the I/O controller.
PIO has several significant limitations: it transfers only one byte (or word) per instruction, the CPU is occupied for the entire duration of the transfer, and transfers can only be initiated by the CPU, never by the device.
These limitations make PIO inefficient for transferring large amounts of data or handling asynchronous device events, leading to the development of DMA and interrupts.
Polling is a technique where the CPU repeatedly checks an I/O controller to determine if data is available or if the device is ready. The CPU uses PIO to read status registers from the controller in a loop until the desired condition is met.
int pollDeviceForValue() {
volatile int *ready = (volatile int *)0x80000100; // I/O-mapped status register
volatile int *value = (volatile int *)0x80000104; // I/O-mapped data register
while (!*ready) {} // Busy-wait until ready
return *value;
}
Polling is acceptable when the expected wait is very short, when the CPU has nothing else useful to do, or when the simplicity of the code matters more than the wasted cycles.
Direct Memory Access (DMA) is a capability that allows an I/O controller to directly transfer data to and from main memory without CPU involvement. DMA enables the CPU and I/O controllers to work in parallel, with controllers autonomously moving data while the CPU performs other tasks.
The CPU configures DMA transfers by using PIO to send the controller the memory address of the buffer, the number of bytes to transfer, and the operation to perform (e.g. which disk block to read or write).
Once configured, the controller performs the transfer independently. When complete, the controller typically signals the CPU using an interrupt.
DMA is essential for high-performance I/O operations such as disk reads/writes and network packet transfers.
A hardware interrupt is a signal from an I/O controller to the CPU that requests immediate attention. Interrupts allow controllers to initiate communication with the CPU, signaling events such as I/O completion, error conditions, or incoming data.
The CPU includes special registers for interrupt handling, such as isDeviceInterrupting, which controllers set to request attention, and a register identifying which controller raised the interrupt.
The CPU checks isDeviceInterrupting on every clock cycle (a fast hardware poll). When an interrupt is detected, the CPU saves the current program counter, looks up the interrupting controller's interrupt service routine (ISR) in the interrupt vector table, and jumps to it; when the ISR returns, the saved state is restored and normal execution resumes.
The interrupt vector table is an array of function pointers, indexed by controller ID, that points to each controller's ISR. This table is initialized by the operating system at boot time.
An asynchronous operation is one that is initiated but does not complete immediately, allowing the program to continue executing while the operation proceeds in the background. The operation's completion is signaled later through some notification mechanism.
In a synchronous operation, the program waits (blocks) until the operation completes:
// Synchronous - program waits here
char data = readDiskBlock(blockNum);
processData(data); // Runs after read completes
In an asynchronous operation, the program continues immediately:
// Asynchronous - program continues immediately
readDiskBlock(blockNum, processDataCallback);
doOtherWork(); // Runs while disk read is in progress
// processDataCallback runs later when read completes
Asynchronous operations break the natural sequential flow of programs:
// This code appears sequential but won't work!
char checksumDiskData(int blockNum, int numBytes) {
char buf[numBytes];
char xsum = 0;
requestDiskRead(buf, blockNum, numBytes); // Asynchronous!
for (int i = 0; i < numBytes; i++)
xsum ^= buf[i]; // buf not yet filled!
return xsum;
}
The disk read is asynchronous, so the for loop executes before the data arrives. The code must be restructured to use event handlers or other asynchronous patterns.
Asynchronous operations are necessary for performance when dealing with slow I/O devices, but they make programs significantly harder to write and debug. Modern systems use interrupts and DMA to enable asynchronous I/O.
Event-driven programming is a programming paradigm where the flow of the program is determined by events—occurrences such as I/O completions, user interactions, or timer expirations. Code is structured around event handlers (callbacks) that execute asynchronously when their associated events occur.
void computeChecksum(char *buf, int blockNum, int numBytes) {
char xsum = 0;
for (int i = 0; i < numBytes; i++)
xsum ^= buf[i];
printf("Checksum: %d\n", xsum);
free(buf);
}
void checksumDiskData(int blockNum, int numBytes) {
char *buf = malloc(numBytes);
// Register handler and initiate read
diskRead(buf, blockNum, numBytes, computeChecksum);
// Function returns immediately; handler runs later
}
Event-driven code is inherently more difficult than sequential code: state must be passed explicitly between handlers, control flow is scattered across callbacks, and errors are harder to trace back to their cause.
Despite the difficulties, event-driven programming is necessary when dealing with asynchronous operations like I/O. The CPU can execute millions of instructions while waiting for a single disk read; event-driven code allows useful work during these waits.
Event-driven programming is common in graphical user interfaces, network servers, and operating system I/O handling.
A thread is an abstraction that allows a program to execute multiple independent streams of control concurrently on a single physical processor. Threads can be thought of as lightweight processes that share the same address space and resources.
Threads enable programs to handle asynchronous operations, such as input and output, without blocking the entire program. Instead of waiting for an operation to complete, a thread can be paused and resumed, allowing other threads to run in the meantime.
A thread can be in one of several states: running (currently executing on the CPU), runnable (ready to run but waiting for the CPU), or blocked (waiting for an event such as I/O completion or a lock).
Transitions between states are managed by the operating system or thread library.
Threads are created by a function which typically takes a procedure and arguments. This "forks" the control flow, starting a new thread that runs concurrently.
Joining a thread blocks the current thread until the joined thread completes, optionally retrieving its return value. This provides synchronous behavior.
Yielding temporarily pauses the current thread, allowing others to run. Notably the thread remains runnable.
Thread switching occurs at the instruction level: the current thread's register values and stack pointer are saved, and the next thread's are restored. The program counter is handled implicitly via the stack, as the return address is saved/restored.
Scheduling policies determine which runnable thread is selected to run next on the CPU when multiple threads are ready. They balance competing goals such as fairness (equal CPU time), responsiveness (quick reaction to events), priority (important tasks first), and efficiency (maximizing throughput). Policies can be preemptive (interrupting running threads) or non-preemptive (threads run until they yield or block).
In round-robin scheduling, threads take turns running for a fixed time slice (typically 10-100ms). When a thread's time expires, it is preempted and placed back in the runnable queue, allowing the next thread to run. This ensures fair sharing of CPU time among threads of equal priority, preventing any single thread from monopolizing the processor.
Round robin may not prioritize urgent tasks and can introduce overhead from frequent context switches.
Priority-based scheduling assigns each thread a priority level, with higher-priority threads running before lower ones. Within the same priority, round-robin is often used. This is useful for systems where certain tasks (e.g., user interface updates or real-time processing) must respond quickly.
Priorities can be static (fixed) or dynamic (adjusted based on behavior). A drawback is starvation, where low-priority threads may never run if high-priority ones dominate.
Preemption allows the scheduler to interrupt a running thread to switch to another, improving responsiveness. It occurs in two main ways: priority preemption (when a higher-priority thread becomes runnable) and time-based preemption (at the end of a time slice). This is implemented using timer interrupts that periodically check if a switch is needed.
Preemptive scheduling is essential for interactive and real-time systems but adds complexity, as threads must handle being interrupted at any point.
A lock is a primitive for controlling synchronization between concurrent programs. Locks prevent shared state from being accessed by multiple threads at the same time. A thread can perform two operations on a lock: acquire and release. If a thread attempts to acquire a lock that is already held by another thread, the acquiring thread enters a waiting state until the lock is released by the holding thread.
An asynchronous condition variable is a primitive for controlling synchronization between concurrent programs; it pauses the execution of certain threads until they receive a signal to continue.
A thread can perform two main operations on an asynchronous condition variable: it can wait on the condition variable, or broadcast, which releases all threads waiting on it.