Big-O notation is a mathematical concept commonly used to describe the limiting behaviour of functions with very large inputs. Formally, it can be defined as follows:

$f(x)$ is said to be in $O(g(x))$ if there exist a real number $c > 0$ and a natural number $n_0$ such that $|f(x)| \le c \cdot g(x)$, for all $x \ge n_0$.
Big omega notation, written as $f(x) \in \Omega(g(x))$, holds exactly when $g(x) \in O(f(x))$.

Alternatively, it can be defined independently from Big-O notation as follows:

$f(x)$ is said to be in $\Omega(g(x))$ if there exist a real number $c > 0$ and a natural number $n_0$ such that $f(x) \ge c \cdot g(x)$, for all $x \ge n_0$.
Big theta notation, written as $f(x) \in \Theta(g(x))$, holds exactly when $f(x) \in O(g(x))$ and $f(x) \in \Omega(g(x))$.

Alternatively, it can be defined independently from Big-O notation as follows:

$f(x)$ is said to be in $\Theta(g(x))$ if there exist two real numbers $c_1 > 0$ and $c_2 > 0$ and a natural number $n_0$ such that $c_1 \cdot g(x) \le f(x) \le c_2 \cdot g(x)$, for all $x \ge n_0$.
In computer science, big-O notation describes a generalization of the mathematical big-O notation to computer programs and is used to describe their behaviour with large input sizes.

Loops don't always take the single iteration approach where the integer starts at 0 and increments by 1 until it reaches the input size; the following examples show how different loop structures affect the runtime analysis.
int findInd(vector<int>& arr, int q) {
for (int i = 0; i < arr.size(); i++) {
if (arr[i] == q) return i;
}
return -1;
}
Treating each operation as a constant time operation, this function runs in $O(n)$ time in the worst case.

We arrive at this conclusion because all our operations are treated as constant time operations (thus only adding or multiplying by constants), and our for loop grows proportionally to our input size.
bool hasDuplicate(int arr[], int size) {
for (int i = 0; i < size-1; i++) {
for (int j = i+1; j < size; j++) {
if (arr[i] == arr[j]) return true;
}
}
return false;
}
Treating each comparison as a constant time operation, this function runs in $O(n^2)$ time in the worst case.

This can be determined by again looking at the for loops: in the worst case the outer loop runs from 0 to size-2, and for each of those iterations the inner loop runs from i+1 to size-1. Since the inner loop runs every time the outer loop does, we can multiply the time demands, giving us $O(n \cdot n) = O(n^2)$.
void candyApple(int n) {
for (int i = 1; i < n; i *= 3) {
cout << "iteration: " << i << endl;
}
}
Here, the loop variable i starts at 1 and is multiplied by 3 every iteration. That means the loop keeps running while $3^k < n$, where $k$ counts the iterations performed so far.

Taking logarithms: $k < \log_3 n$.

So the runtime grows proportionally to $\log n$, i.e. the loop is $O(\log n)$ (the base of the logarithm only changes the constant factor).
void caramelCorn(int n) {
for (int i = 0; i * i < 6 * n; i++) {
cout << "iteration: " << i << endl;
}
}
Here, the loop condition is i * i < 6n. The loop continues until $i^2 \ge 6n$, i.e. until $i \ge \sqrt{6n}$.

So the number of iterations is proportional to $\sqrt{n}$, and the loop is $O(\sqrt{n})$.
An array is a data structure which contains a "collection" of elements stored contiguously and accessed by index.

Array data can be allocated either statically or dynamically, depending on how the constituting variables are initialized.
int data[] = {1, 2, 3, 4, 5};
printf("%d", data[0]); // 1
printf("%d", data[3]); // 4
int *data = malloc(sizeof(int) * 5);
data[0] = 1;
data[1] = 2;
data[2] = 3;
data[3] = 4;
data[4] = 5;
printf("%d", data[0]); // 1
printf("%d", data[3]); // 4
free(data);
Loop invariants are properties of the algorithm or structure that are always true at particular points in the program. Within a loop's body these properties may become violated briefly, but the rest of the loop's instructions fix these properties for the next iteration.
Loop invariants are used in an inductive proof to formally validate the "correctness" of an algorithm. For a loop invariant to be used in a proof features of the code being analyzed must be used to demonstrate that any violations of the loop invariant are restored before the next iteration.
To determine the loop invariant of a function, we can roughly follow these steps: read through the code to understand what it computes, run through a few small examples by hand, and then state the property that holds at the start of every iteration. Consider the following example:
int minIndex(vector<int>& arr, int a) {
int m = a;
for (int i = a+1; i < arr.size(); i++) {
if (arr[i] < arr[m]) {
m = i;
}
}
return m;
}
Firstly, we can read through the code and see that it only updates m to be the current index if the element at the current index is smaller than the element at index m. We can roughly determine then that minIndex returns the index of the smallest element at or after a, or a itself if the element at a is already the smallest.
Running through a few examples we can see that m is always the index of the smallest element of the processed array so far.
Loop Invariant: m is the index of the minimum element in arr[a..i-1] at the start of iteration i, for all $a + 1 \le i \le$ arr.size().
Base Case: i = a+1

Before the first iteration, `int m = a;` has executed, so m contains the index of the smallest element of arr[a..a]. The invariant holds.

Inductive Step: i = k, a + 1 <= k < arr.size()

Assume the invariant holds at the start of iteration k, i.e. m contains the index of the smallest element of arr[a..k-1].

- If arr[k] < arr[m], then arr[k] is smaller than every element of arr[a..k-1]. m is set to k and so contains the index of the smallest element of arr[a..k]; the loop invariant holds after processing index k.
- Otherwise, arr[k] is not smaller than the smallest element of arr[a..k-1], so m already contains the index of the smallest element of arr[a..k]; the loop invariant holds after processing index k.

Termination: i = arr.size()

By the invariant, m contains the index of the smallest element of arr[a..arr.size()-1], i.e. the minimum of the entire subarray from a to arr.size()-1. Since arr.size() is finite, and i increments each iteration towards arr.size(), the program must terminate.
A linked list is a type of dynamic data structure that consists of nodes linked together. Each node contains two parameters, data and the address of the next node.
The head is a property of a linked list that represents the first node in the list. When accessing elements the head node is the first one (ie arr.get(0)).
The tail is a property of a linked list that represents the last node in the list.
We can define a linked list abstractly as follows:
template <class T>
struct Node {
    T data;
    Node * next;
    Node(T ndata, Node * nx = nullptr) : data(ndata), next(nx) {}
};

template <class T>
struct LinkedList {
private:
    Node<T>* head;
    Node<T>* tail;
    int length;
public:
    LinkedList();
    // ... Omitted
};
To traverse a linked list, maintain a pointer to the current node and dereference it each iteration. If the pointer is NULL, the whole linked list has been traversed; otherwise, process the node and then update the pointer to the node's next parameter.
void iterateLinkedList(Node<int>* startingNode) {
Node<int>* p = startingNode;
    while (p != nullptr) {
std::cout << p->data << std::endl;
p = p->next;
}
}
To insert into a linked list, maintain a pointer to the node to insert and pointers to the nodes before and after the insertion point. Set the next parameter of the node before to the address of the new node, and the next parameter of the new node to the address of the node after it.
void insertNodeIntoLinkedList(
Node<int>* previousNode,
Node<int>* nextNode,
Node<int>* newNode
) {
previousNode->next = newNode;
    newNode->next = nextNode;
}
To delete from a linked list set the previous node's next to be the node after the one to delete, and then free the memory of the node to delete
void deleteNodeFromLinkedList(
Node<int>* previousNode,
Node<int>* nodeToDelete,
Node<int>* nextNode
) {
previousNode->next = nextNode;
    delete nodeToDelete;
}
| Operation | Time Complexity |
|---|---|
| Insertion at Beginning | $O(1)$ |
| Insertion at End | $O(n)$ |
| Insertion at Position | $O(n)$ |
| Deletion at Beginning | $O(1)$ |
| Deletion at End | $O(n)$ |
| Deletion at Position | $O(n)$ |
| Searching for Element | $O(n)$ |
If a tail pointer is also maintained, insertion at the end improves:

| Operation | Time Complexity |
|---|---|
| Insertion at End | $O(1)$ |
A doubly linked list is a variation of the singly linked list where each node also contains a pointer to its previous node as well as its next node. This additional reference can improve performance for insertions and deletions at the cost of additional memory.
| Operation | Time Complexity |
|---|---|
| Insertion at Beginning | $O(1)$ |
| Insertion at End | $O(1)$ |
| Insertion at Position | $O(n)$ |
| Deletion at Beginning | $O(1)$ |
| Deletion at End | $O(1)$ |
| Deletion at Position | $O(n)$ |
| Searching for Element | $O(n)$ |
Insertion sort is a simple sorting algorithm which splits the input array into two parts, a sorted part and an unsorted part. Each element of the sorted part that is bigger than the first element of the unsorted part is shifted forwards one spot, until a spot opens up for that first unsorted element to be placed within the sorted part.
// Assumes arr[0..p-1] are in sorted order
void slide(vector<int>& arr, int p) {
int temp = arr[p];
int j = p;
while (j > 0 && arr[j-1] > temp) {
arr[j] = arr[j-1];
j--;
}
arr[j] = temp;
}
void insertionSort(vector<int>& arr) {
for (int i = 0; i < arr.size(); i++) {
slide(arr, i);
}
}
Looking at the two loops, we can see that the outer loop always iterates from 0 to arr.size(), while slide iterates backwards from i towards 0. Using this knowledge we can construct a formula for the worst case runtime:

$T(n) = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} \in O(n^2)$

where $n =$ arr.size().
Loop Invariant: Before iteration i of the for loop, arr[0..i-1] contains the first i elements of arr in ascending order.

Base Case: i = 0

When i = 0, the invariant states that arr[0..-1] contains the first 0 elements of arr in ascending order. Since arr[0..-1] contains no elements, the invariant holds in the base case.

Inductive Step: i = k, 0 ≤ k < arr.size()

Assume the invariant holds at the start of iteration k, i.e. arr[0..k-1] is in ascending order. slide(arr, k) requires arr[0..k-1] to be sorted, which is guaranteed by the invariant. By the correctness of slide, we know that afterwards arr[0..k] is sorted. Thus after iteration k the loop invariant holds for i = k+1.

Termination: i = arr.size()

By the invariant, arr[0..arr.size()-1] contains all the elements of arr in ascending order, so the array is sorted. Since arr.size() is finite, and i increments each iteration towards arr.size(), the program must terminate.
Merge sort is a sorting algorithm that uses a divide-and-conquer approach to sort an array. The algorithm works by recursively dividing the array into two halves until each subarray contains only one element, then merging these subarrays back together in sorted order.
template <class T>
void mergeSort(vector<T>& arr) {
mergeSortRecursive(arr, 0, arr.size() - 1);
}
template <class T>
void mergeSortRecursive(vector<T>& arr, int low, int high) {
if (low >= high) return;
int mid = (low + high) / 2;
// Sort the left half
mergeSortRecursive(arr, low, mid);
// Sort the right half
mergeSortRecursive(arr, mid + 1, high);
// Merge the sorted halves
merge(arr, low, mid, high);
}
template <class T>
void merge(vector<T>& arr, int low, int mid, int high) {
vector<T> temp;
int a = low,
b = mid + 1;
for (int k = low; k <= high; k++) {
if (a <= mid && (b > high || arr[a] < arr[b])) {
temp.push_back(arr[a++]);
continue;
}
temp.push_back(arr[b++]);
}
for (int k = low; k <= high; k++) {
arr[k] = temp[k - low];
}
}
We can notice that at every recursion step the subarrays are divided into two equally sized subarrays, each of length $n/2$.

This means that for each level of recursion, the work performed across all the subarrays on that level totals $O(n)$.

When looking at the merge operation we can see we only iterate over each element one time, giving us a runtime of $O(n)$ per level.

Combining these facts, and noting that the array can only be halved $\log_2 n$ times before every subarray has one element, we can determine the runtime of our algorithm: $T(n) = 2T(n/2) + O(n)$, which gives $\log_2 n$ levels of $O(n)$ work each.

Using big-O notation we can express this as follows: $T(n) \in O(n \log n)$.
Selection sort is a simple sorting algorithm which splits the input array into two parts, a sorted part and an unsorted part. The unsorted part is iterated through to find its smallest element, which is then swapped with the element at the start of the unsorted part.
void swap(int &a, int &b) {
int temp = a;
a = b;
b = temp;
}
void selectionSort(vector<int>& arr) {
for (int i = 0; i < arr.size(); i++) {
int m = i;
for (int j = m; j < arr.size(); j++) {
if (arr[j] < arr[m]) {
      m = j;
}
}
swap(arr[i], arr[m]);
}
}
Selection sort can sometimes be useful in systems where reads are very quick but writes are very expensive (it performs at most $n - 1$ swaps) and where there isn't a large amount of data.
Looking at the two for loops, we can see that the outer loop always iterates from 0 to arr.size(), while the inner loop iterates from i to arr.size(). Using this knowledge we can construct a formula for the runtime:

$T(n) = \sum_{i=0}^{n-1} (n - i) = \frac{n(n+1)}{2} \in O(n^2)$

where $n =$ arr.size().

In asymptotic terms we can see that the quantity of swap operations in this algorithm is modelled by $O(n)$, since exactly one swap is performed per iteration of the outer loop.
Firstly, we can read through the code and see that the inner loop has the form of the minIndex function whose invariant was already proven above. We can thus assume that m is always the index of the smallest unsorted element. Using this knowledge we can see that the function swaps the element at index i with the element at index m, the index of the smallest unsorted element.

Running through a few examples we can see that arr[0..i-1] always contains the i smallest elements of arr in ascending order.
Loop Invariant: Before iteration i of the loop, arr[0..i-1] contains the i smallest elements of arr in ascending order.
Base Case: i = 0

When i = 0, the invariant states that arr[0..-1] contains the 0 smallest elements of arr in ascending order. Since arr[0..-1] contains no elements, the invariant holds in the base case.

Inductive Step: i = k, 0 <= k < arr.size()

Assume the invariant holds at the start of iteration k, i.e. arr[0..k-1] contains the k smallest elements of arr in ascending order. The inner loop finds m, the index of the smallest element of arr[k..arr.size()-1], and the swap places that element at index k. Since every element of arr[k..arr.size()-1] is at least as large as every element of arr[0..k-1], arr[0..k] now contains the k+1 smallest elements of arr in ascending order, and the loop invariant holds.

Termination: i = arr.size()

By the invariant, arr[0..arr.size()-1] contains the arr.size() smallest elements of arr in ascending order, i.e. the array is sorted. Since arr.size() is finite, and i increments each iteration towards arr.size(), the program must terminate.
A stack is a collection data type in which elements are removed in the reverse of the order in which they were inserted (the most recently added element is the first to be removed, i.e. last-in first-out). Stacks define two main operations:

- push: insert an element onto the top of the stack.
- pop: remove and return the element on top of the stack.
Additionally, a peek operation can be implemented which returns the last value added without removing it.
template <class T>
class Stack {
public:
Stack();
void push(const T& element); // Insert into the top of the stack
T pop(); // Remove and return from the top of the stack
};
Using a singly linked list with a head pointer, new elements are inserted and deleted at the head of the linked list to ensure $O(1)$ push and pop operations.

Singly linked lists offer better worst case performance than resizable arrays as they always maintain $O(1)$ push and pop, with no occasional resizing cost.
A resizable array can be used to create a stack by maintaining an integer that stores the index of the last element, which is treated as the top of the stack; elements are then inserted at the end of the array, resizing as necessary.

In the best and average case, stacks implemented with resizable arrays offer $O(1)$ push and pop operations (amortized $O(1)$, since an occasional push triggers an $O(n)$ resize).
A queue is a collection data type in which elements are removed in the order that they were inserted (first-in first-out: the first element inserted is the first removed). Queues define two main operations:

- enqueue: insert an element at the back of the queue.
- dequeue: remove and return the element at the front of the queue.
Additionally, a peek operation can be implemented which returns the first value added without removing it.
template <class T>
class Queue {
public:
Queue();
// Insert into the back of the queue
void enqueue(const T& element);
// Remove and return from the front of the queue
T dequeue();
};
Using a singly linked list with head and tail pointers, the head is treated as the front of the queue. New elements are inserted at the tail of the linked list and elements are removed from the head to ensure $O(1)$ enqueue and dequeue operations.
A circular array can be used to create a queue by maintaining two integers that store the indices of the front and the back of the queue.

When elements of the queue are removed, instead of having to move all the elements in the array backwards one index (an $O(n)$ operation), the front index is simply incremented, wrapping around to the start of the array using the modulo operator.

If the queue's max size is unknown, the circular array can be combined with a resizable array to ensure the queue can grow as necessary; otherwise a fixed size can be decided, with older elements being overwritten as required.
A tree is a type of dynamic data structure that consists of nodes, each linked to a single parent node (except for the root node) and possibly linked to multiple child nodes. Each node contains "three" parameters: the data, some data structure storing the addresses of its child nodes, and the address of its parent node.
The tree data structure can be thought of as an implementation of a finite tree. Trees can alternatively be thought of as a subset of graphs which have no cycles. Because trees have no cycles, recursion is a useful technique for traversing them.

Trees are a type of graph and thus the same terminology also applies; each connection between a parent and a child is an edge.

Tree traversal refers to the process by which you access nodes in the tree. Each traversal defines a specific order, and each node is visited only once.
Depth first traversal is a method of traversing trees which explores as far along a given branch as possible before exploring the next branch.
There are 2 general types of depth-first traversal, each of which determines the order in which a node is visited relative to its subtrees.
In preorder traversal, the node is processed before its subtrees. This traversal is useful for creating a copy of the tree or generating a prefix notation of an expression tree.
class Node:
def __init__(self, value, children=None):
self.value = value
self.children = children or []
def preorder_traverse(root, visit):
if root is None:
return
visit(root.value)
for child in root.children:
preorder_traverse(child, visit)
In postorder traversal, the node is processed after its subtrees. This traversal is useful for tasks such as deleting the tree or generating a postfix notation of an expression tree. It works for general trees as well.
class Node:
def __init__(self, value, children=None):
self.value = value
self.children = children or []
def postorder_traverse(root, visit):
if root is None:
return
for child in root.children:
postorder_traverse(child, visit)
visit(root.value)
Breadth first traversal is a method of traversing trees which explores all the nodes at a given level before descending to the next.
The standard implementation uses a queue where elements are pulled from the front of the queue to be processed and then their children are pushed to the end of the queue.
from collections import deque
class Node:
def __init__(self, value, children=None):
self.value = value
self.children = children or []
def breadth_first_traversal(root, visit):
if root is None:
return
queue = deque([root])
while queue:
current_node = queue.popleft()
visit(current_node.value)
        for child in current_node.children:
            queue.append(child)
A binary tree is a type of tree where each node can have at most 2 children. The children of a node in a binary tree are usually referred to as the left and right nodes of the parent.
A perfect binary tree is any binary tree in which every non-leaf node is full (that is, has exactly 2 children), and all the leaf nodes are on the same level.
Each level of a perfect binary tree has exactly $2^\ell$ nodes, where $\ell$ is the depth of the level (the root is at depth 0), so a perfect binary tree of height $h$ contains $2^{h+1} - 1$ nodes in total.
A complete binary tree is any binary tree in which the following conditions are met:

- Every level, except possibly the last, is completely filled.
- All nodes in the last level are as far left as possible.
A full binary tree is any binary tree in which all nodes either have two children or are leaf nodes.
In-order traversal is a type of depth-first traversal for binary trees in which the left subtree is visited and then the current node is visited and finally the right subtree is visited.
class BinaryTreeNode:
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
def in_order_traverse(root, visit):
if root is None:
return
in_order_traverse(root.left, visit)
visit(root.value)
in_order_traverse(root.right, visit)
A balanced binary tree is any binary search tree whose leaves are all about the same distance from the root (the exact distance depends on the algorithm used).

Auto balancing trees are algorithms for ensuring binary trees stay balanced across insertions and deletions.
AVL trees are a type of self-balancing binary tree where each node's left and right subtrees differ in height by at most 1. AVL trees perform their auto balancing by performing "rotations" on insertion and deletion depending on where affected nodes were/are located.

To support quick inserts and removals, AVL trees typically store the height of each node alongside the node's data. This height is then recalculated and updated for a few specific nodes after rotations.

After a node is inserted or deleted from the tree, its ancestors can become "imbalanced" (the difference between the heights of a node's left and right children becomes greater than 1).
With the assumption that the tree was balanced before an operation there can only be 4 different types of imbalance.
A left-left imbalance occurs when a node’s left subtree is taller than its right subtree by more than 1, and the extra height comes from the left child’s left subtree. A left-left imbalance can be fixed by performing a single right rotation on the root node.
Example of a subtree with a left-left imbalance
A right-right imbalance occurs when a node’s right subtree is taller than its left subtree by more than 1, and the extra height comes from the right child’s right subtree. A right-right imbalance can be fixed by performing a single left rotation on the root node.
Example of a subtree with a right-right imbalance
A left-right imbalance occurs when a node’s left subtree is taller than its right subtree by more than 1, and the extra height comes from the left child’s right subtree. A left-right imbalance can be fixed by performing a left rotation on the left child, turning the tree into a left-left imbalance which can be resolved with a right rotation at the subtrees new root.
Example of a subtree with a left-right imbalance
A right-left imbalance occurs when a node’s right subtree is taller than its left subtree by more than 1, and the extra height comes from the right child’s left subtree. A right-left imbalance can be fixed by performing a right rotation on the right child, turning the tree into a right-right imbalance which can be resolved with a left rotation at the subtrees new root.
Example of a subtree with a right-left imbalance
Rotations are operations that can be performed on AVL trees which change the shape of the tree while maintaining the properties of a binary search tree.
Given a subtree's root with a right child, a left rotation can be performed in the following steps:

1. Store the root's right child in a temporary pointer temp.
2. Set the root's right child to temp's left child (if it exists).
3. Set temp's left child to be the old root.
4. Update the heights (and parent pointers) of the old root and temp accordingly.

Given a subtree's root with a left child, a right rotation can be performed in the following steps:

1. Store the root's left child in a temporary pointer temp.
2. Set the root's left child to temp's right child (if it exists).
3. Set temp's right child to be the old root.
4. Update the heights (and parent pointers) of the old root and temp accordingly.
A B-tree is a type of self-balancing tree which maintains sorted data. The B-tree can be thought of as a generalization of the binary search tree, allowing each node to store multiple keys rather than just one. The structure of a B-tree ensures that all leaf nodes are at the same depth.

B-trees allow for $O(\log n)$ searches, insertions, and deletions.

B-tree nodes must always have one less key than they have children; additionally, in a B-tree of order $m$, all non-root nodes must have between $\lceil m/2 \rceil - 1$ and $m - 1$ keys.
B-trees are especially efficient compared to binary search trees for large datasets where data may need to be stored and retrieved from secondary storage instead of memory; this property is important and commonly utilized for creating efficient databases.

Insertion into a B-tree follows a specific sequence of operations to maintain the balance and properties of the tree. The key idea is to always insert into a leaf node while ensuring that no node ever exceeds its maximum of $m - 1$ keys.

Firstly the B-tree is searched for the first leaf node that can contain the key (even if it already has the maximum amount of keys). Next the key is inserted into the node. If the node was already full, the node is split.

All the keys less than the median are placed in one of the new child nodes and all keys greater are placed in the other. The median is moved up into the parent node, and if the parent then overflows as well the process is repeated again for the parent node.
A hash function is a function that is used to map arbitrary length data to a single fixed length number. Hash functions always need to be fast, deterministic (meaning that for a given input they always produce the same output) and uniform (meaning their outputs are evenly distributed among their output range).
A hash table is a data structure that maps a given key to a specific value. Given a key, a hash table uses a hash function to compute an index into either a regular array or an array of buckets. Hash tables provide incredibly quick accesses ($O(1)$ on average), regardless of how many entries they contain.
A hash table collision occurs when two distinct keys hash to the same index in the table.
The load factor of a hash table refers to how "full" the table is and is calculated as

$\text{load factor} = \frac{n}{k}$

where $n$ is the number of entries occupied in the hash table and $k$ is the number of buckets.
Open addressing is a method of resolving hash collisions by probing or searching through alternative locations in the table until either the target entry is found or an unused slot is found.
When entries are removed from a hash table that uses open addressing, the empty space makes the probing / searching algorithms terminate early, so entries inserted past the deleted slot can no longer be found.
One solution to prevent this is to mark deleted elements as "tombstones" which allows the searching algorithm to continue when it encounters one instead of failing. After a lot of removals "tombstones" can clog up the hash table so periodically the hashes for the whole table should be recomputed.
Linear probing is a method of open addressing in which the index is incremented by one every time a collision is found, until the record or an open slot is found.

For example, if there is a collision at index $h(k)$, the indices $h(k)+1$, $h(k)+2$, $h(k)+3, \ldots$ are tried in turn, wrapping around at the end of the table.
Quadratic probing is a method of open addressing in which the offset from the original index grows quadratically every time a collision is found, until the record or an open slot is found.

For example, if there is a collision at index $h(k)$, the indices $h(k)+1^2$, $h(k)+2^2$, $h(k)+3^2, \ldots$ are tried in turn, wrapping around at the end of the table.
Double hashing is a method of open addressing in which the index is incremented by a second hash function of the key when a collision is encountered. The second hash function has the additional rule that it must never return 0.
A priority queue is a collection data type similar to a queue or stack except that elements are removed by their priority (either the largest or smallest depending on the implementation). Priority queues define two main operations:

- insert: add an element with an associated priority.
- extract maximum / extract minimum: remove and return the element with the highest (or lowest) priority.

Typically a single priority queue implementation will only provide either extract maximum or extract minimum, but not both.
template <class T>
class PriorityQueue {
public:
PriorityQueue();
// Insert an element with associated priority
void insert(const T& element);
// Remove and return the highest priority element
T extractMax();
// Return the highest priority element without removing
T peek();
};
A binary heap provides an efficient implementation of a priority queue. Elements are stored in a complete binary tree structure that maintains the heap property where each parent node has a priority greater than (max heap) or less than (min heap) its children.
Using a binary heap, both insert and extract maximum/minimum operations can be performed in $O(\log n)$ time.

The heap is typically stored using an array representation where for an element at index $i$, its children are at indices $2i + 1$ and $2i + 2$ and its parent is at index $\lfloor (i - 1) / 2 \rfloor$.
A binary heap is a complete binary tree that satisfies the heap property. Notably, two heaps can contain the same data but with items in different positions; the heap property allows multiple valid arrangements. Additionally, heaps can store the same value in multiple positions at the same time.

Binary heaps implement the same operations as a priority queue, which is why they are typically used in their implementation. Specifically, binary heaps implement the following operations:
template <class T>
class MinHeap {
public:
    MinHeap(int initialCapacity);
    // Insert an element
    void insert(const T& element);
    // Remove and return from the root (minimum value)
    T removeMin();
};
template <class T>
class MaxHeap {
public:
    MaxHeap(int initialCapacity);
    // Insert an element
    void insert(const T& element);
    // Remove and return from the root (maximum value)
    T removeMax();
};
Binary heaps, despite being a representation of a tree, are usually implemented using a resizable array, where for a given node at index $i$ its children are stored at indices $2i + 1$ and $2i + 2$.

Given a node at index $i$, its parent is found at index $\lfloor (i - 1) / 2 \rfloor$.
When a new element is inserted into a binary heap, first the algorithm finds the first available leftmost leaf on the bottom level (preserving the completeness of the tree).
The partial ordering is fixed by comparing the new value to its parent and recursively swapping them until the value is no longer greater than its parent (in a max heap; in a min heap, until it is no longer smaller). This operation is called heapify up.
The worst case performance for an insertion operation is $O(\log n)$, since the new value may travel from a leaf all the way up to the root.
When we remove the root (i.e. in a removeMin or removeMax operation) we start by copying the root so we can return it; we then replace the root node with the right-most leaf node on the bottom level (and remove that leaf node from its current position).

The partial ordering is fixed by comparing the new root to its children and recursively swapping it with the largest child (in a max heap; the smallest in a min heap) until the value is no longer smaller than either of its children (this operation is called heapify down). Finally the copied original root data is returned to the caller.

Similarly to insertion, removing the root node is also $O(\log n)$ in the worst case.
A set is a data structure which contains a "collection" of elements stored in no specific order, and which can only store each value once. Unlike other collection data structures, sets are typically only added to and then used to test for membership. The set data structure can be thought of as an implementation of a finite set. Sets usually define the following operations:

- add: insert an element into the set.
- remove: delete an element from the set.
- contains: test whether an element is a member of the set.
Additionally, some set implementations may provide operations for the following:

- union, intersection, and difference of two sets.
template <class T>
class Set {
public:
Set();
// Add an element to the set
void add(const T& element);
// Remove an element from the set
void remove(const T& element);
// Check if the set contains an element
bool contains(const T& element) const;
};
A common implementation of a set is using a hash table, where each element is stored in an array based on the hash value of the element. This allows for average-case $O(1)$ add, remove, and contains operations.
A disjoint set is a data structure that keeps track of a partition of a set into disjoint subsets. It defines three primary operations:

- makeSet: create a new subset containing a single element.
- find: return the representative of the subset containing a given element.
- union: merge the subsets containing two given elements.
template <class T>
class DisjointSet {
public:
DisjointSet();
// Create a new set with a single element
void makeSet(const T& element);
// Find the representative of the set containing the element
T find(const T& element);
// Union the sets containing element1 and element2
void unionSets(const T& element1, const T& element2);
};
A common implementation of the disjoint set data structure is using a forest of trees. Each tree represents a subset, and the root of the tree is the representative of that subset. The Find operation traverses up the tree to find the root, while the Union operation links the roots of two trees together.
To improve the efficiency of the disjoint set operations, two optimizations are commonly used:

- Path compression: during a find, make every traversed node point directly at the root.
- Union by rank (or size): during a union, attach the shorter (or smaller) tree under the root of the taller (or larger) one.
With these optimizations, the amortized time complexity for each operation is nearly constant, specifically $O(\alpha(n))$, where $\alpha$ is the extremely slow-growing inverse Ackermann function.
A graph is a structure consisting of a pair of sets $G = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges connecting pairs of vertices.
Connectivity describes whether vertices in a graph can reach each other through a sequence of edges.
A graph is connected if there exists a path between every pair of vertices. In other words, you can travel from any vertex to any other vertex by following edges.
Formally, a graph $G = (V, E)$ is connected if for every pair of vertices $u, v \in V$ there exists a path from $u$ to $v$.
In a directed graph, a graph is weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph.
In a directed graph, a graph is strongly connected if there is a directed path from every vertex to every other vertex.
A graph is considered sparse if the number of edges is much less than the maximum number of possible edges. Conversely, a graph is dense if the number of edges is close to the maximum number of possible edges.
Formally, for an undirected graph with $|V|$ vertices, the maximum number of edges is $\binom{|V|}{2} = \frac{|V|(|V| - 1)}{2}$. A sparse graph has $|E|$ much less than this maximum, while a dense graph has $|E|$ close to it.
Edges are the connections between vertices in a graph, defined as an unordered pair of vertices $\{u, v\}$.

A directed edge is a specific type of edge which indicates a direction as well as a connection. Unlike normal edges, directed edges are defined as an ordered pair, meaning the edge $(u, v)$ is distinct from the edge $(v, u)$.
Example of a directed graph with 5 vertices and 6 edges.
A directed graph is a type of graph which replaces the edges with directed edges.
Example of a directed graph with 5 vertices and 6 edges.
A weighted edge is a type of edge that has an associated numerical value, called its weight. This weight typically represents a cost, distance, time, capacity, or any other quantitative measure relevant to the graph application. Formally, a weighted edge is defined as a pair together with a weight: $(\{u, v\}, w)$ with $w \in \mathbb{R}$.
A weighted graph is a type of graph which replaces the edges (or directed edges) with weighted edges. Weighted graphs can be either undirected or directed depending on the use case.
A graph is a data structure that stores a "web" of elements as nodes connected to one another. The graph data structure can be thought of as an implementation of a finite graph. Graphs typically define the following operations:
adjacent(a, b): test whether there is an edge from vertex a to vertex b.
incidentEdges(a): returns all edges which are connected to a.
insertEdge(a, b, e): adds an edge from a to b.
removeEdge(a, b): removes the edge from a to b.
addVertex(a): adds the vertex a to the graph.
removeVertex(a): removes the vertex a from the graph.
Additionally, if the graph is modelled as a directed graph we can define the following additional operations:
origin(e): returns the vertex from which the directed edge e originates.
destination(e): returns the vertex to which the directed edge e points.
An edge list stores a graph as a collection (typically a list/array) of all edges in the graph. Each edge is stored as an unordered pair {u, v} (for undirected graphs) or an ordered pair (u, v), meaning "from u to v" (for directed graphs). Additional data (like weights or labels) can be stored alongside each pair.
Use case: very simple to implement, good for sparse graphs where fast adjacency tests are not needed.
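For instance, a small undirected graph with vertices 0–3 and edges {0,1}, {1,2}, {2,3}, {0,2} could be stored and queried like this (the helper names are our own):

```python
# Edge list for an undirected graph: one tuple per edge.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

def adjacent(edges, a, b):
    # O(m): must scan the whole edge list; either orientation counts.
    return (a, b) in edges or (b, a) in edges

def incident_edges(edges, a):
    # O(m): collect every edge touching vertex a.
    return [e for e in edges if a in e]
```

Note that every query walks the entire list, which is why the edge list is a poor fit when fast adjacency tests are required.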
An adjacency list stores, for every vertex, a list of the vertices to which it is adjacent.
For a vertex u, Adj[u] contains all vertices v such that there is an edge u → v (or {u,v} in undirected graphs).
Use case: most common representation in practice; excellent for DFS/BFS, graph algorithms, and sparse graphs.
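A sketch of the same idea using a dict-of-lists for a small undirected graph (each undirected edge appears in both endpoints' lists; the helper name is our own):

```python
# Adjacency list: vertex -> list of neighbours.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [1, 3, 0],
    3: [2],
}

def adjacent(adj, a, b):
    # O(deg(a)): only a's neighbour list is scanned.
    return b in adj[a]
```

This is the representation assumed by the traversal code later in the text.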
An adjacency matrix for a graph with n vertices is an n x n matrix M in which entry M[a][b] is 1 (or the edge weight) if there is an edge from a to b, and 0 otherwise.
Use case: dense graphs, or algorithms that need extremely fast adjacency checks.
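A sketch for a small 4-vertex undirected graph (for undirected graphs the matrix is symmetric, since M[a][b] and M[b][a] describe the same edge):

```python
# Adjacency matrix: M[a][b] == 1 iff vertices a and b are adjacent.
M = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def adjacent(M, a, b):
    return M[a][b] == 1  # O(1) lookup
```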
Comparing the different implementations of a graph with n vertices and m edges (deg(a) denotes the degree of vertex a; entries are typical worst-case complexities):
| | Edge List | Adjacency List | Adjacency Matrix |
|---|---|---|---|
| Space | O(n + m) | O(n + m) | O(n^2) |
| adjacent(a, b) | O(m) | O(deg(a)) | O(1) |
| incidentEdges(a) | O(m) | O(deg(a)) | O(n) |
| insertEdge(a, b, e) | O(1) | O(1) | O(1) |
| removeEdge(a, b) | O(m) | O(deg(a)) | O(1) |
| addVertex(a) | O(1) | O(1) | O(n^2) |
| removeVertex(a) | O(m) | O(deg(a)) | O(n^2) |
Breadth-first traversal is a generalization of breadth-first tree traversal to all graphs. Instead of starting at a root node, we may start at any vertex and explore all vertices reachable from it.
Because of this, we must keep track of which vertices have already been visited to avoid infinite loops and repeated work.
As with trees, the standard implementation uses a queue to ensure vertices are visited in increasing order of distance (number of edges) from the start vertex.
The algorithm's time complexity is O(n + m) for a graph with n vertices and m edges, since each vertex is enqueued at most once and each edge is examined a constant number of times.
Assume the graph is represented using an adjacency list.
from collections import deque

def breadth_first_traversal(graph, start, visit):
    visited = set()
    queue = deque([start])
    visited.add(start)
    while queue:
        v = queue.popleft()
        visit(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
The visited set prevents revisiting vertices via cycles.

If the graph may be disconnected, we must run breadth-first traversal from every unvisited vertex.
from collections import deque

def breadth_first_traversal_all(graph, visit):
    visited = set()
    for start in graph:
        if start in visited:
            continue
        queue = deque([start])
        visited.add(start)
        while queue:
            v = queue.popleft()
            visit(v)
            for neighbor in graph[v]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
This guarantees that every vertex is visited exactly once and that all connected components are traversed, even when the graph is disconnected.
Depth-first traversal is a method of traversing graphs which explores a path as far as possible before backtracking to explore alternative paths. Depth-first traversal is a generalization of depth-first tree traversal to graphs.
Depth-first traversal is typically implemented using recursion or an explicit stack.
Assume the graph is represented using an adjacency list.
def depth_first_traversal(graph, start, visit):
    visited = set()

    def dfs(v):
        visited.add(v)
        visit(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                dfs(neighbor)

    dfs(start)
This visits exactly the vertices in the connected component containing start.
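The explicit-stack alternative mentioned above can be sketched as follows (same adjacency-list assumption; the reversed call is only there to keep the visit order close to the recursive version when neighbour lists are ordered):

```python
def depth_first_traversal_iterative(graph, start, visit):
    visited = set()
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue  # a vertex may be pushed more than once
        visited.add(v)
        visit(v)
        # Push neighbours in reverse so the first neighbour is popped first.
        for neighbor in reversed(graph[v]):
            if neighbor not in visited:
                stack.append(neighbor)
```

The iterative form avoids Python's recursion depth limit on long paths, at the cost of a little bookkeeping.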
def depth_first_traversal_all(graph, visit):
    visited = set()

    def dfs(v):
        visited.add(v)
        visit(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                dfs(neighbor)

    for v in graph:
        if v not in visited:
            dfs(v)
A spanning tree of a graph is a tree that includes all vertices of the graph and is a subgraph of the original graph. A spanning tree contains the minimum number of edges needed to connect all vertices without creating any cycles.
A spanning tree of a graph G = (V, E) is a subgraph that is a tree containing all of the vertices V and a subset of the edges E.
Key Characteristics:
A graph has a spanning tree if and only if the graph is connected.
For a connected graph with n vertices, every spanning tree has exactly n - 1 edges.
A spanning tree can be constructed from a connected graph using several methods:
Breadth-First Search (BFS)
Depth-First Search (DFS)
Edge Removal
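As an illustration of the BFS method, the following sketch keeps exactly the edges used to discover new vertices; those edges form a spanning tree of the component containing start (adjacency-list representation as in the traversal examples, function name our own):

```python
from collections import deque

def bfs_spanning_tree(graph, start):
    # Each kept edge discovers exactly one new vertex, so a connected
    # graph with n vertices yields n - 1 tree edges.
    visited = {start}
    tree_edges = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for neighbor in graph[v]:
            if neighbor not in visited:
                visited.add(neighbor)
                tree_edges.append((v, neighbor))
                queue.append(neighbor)
    return tree_edges
```

A DFS-based construction works the same way; only the traversal order of the kept edges differs.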
A minimum spanning tree (MST) of a weighted graph is a spanning tree where the sum of edge weights is minimized among all possible spanning trees. The minimum spanning tree connects all vertices in the graph with the minimum total weight while maintaining no cycles.
Weight Minimization: among all spanning trees T of the graph, the minimum spanning tree minimizes the total weight w(T), defined as the sum of w(e) over all edges e in T, where w(e) denotes the weight of edge e.
Key Characteristics: a minimum spanning tree has exactly n - 1 edges, contains no cycles, and is not necessarily unique; when all edge weights are distinct, the MST is unique.
Kruskal's algorithm is a greedy algorithm that finds a minimum spanning tree for a connected, weighted graph. The algorithm sorts edges by weight and iteratively adds the smallest weight edge that does not create a cycle.
The algorithm performs the following steps:
Sort all edges in non-decreasing order of weight.
Take the smallest remaining edge; if its endpoints lie in different components (checked with a union-find structure), add it to the tree and merge the components, otherwise discard it.
Repeat until the tree contains n - 1 edges.
Kruskal's algorithm is particularly effective for sparse graphs where the number of edges is much smaller than the maximum possible number of edges.
When to use Kruskal's: the graph is sparse, the edges are already available as a list that can be sorted by weight, or a union-find structure is already being maintained.
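A compact sketch of the whole algorithm, assuming each edge is given as a (weight, u, v) tuple over vertices 0..n-1 and using a simplified inline union-find with path halving (names are our own):

```python
def kruskal(n, edges):
    # edges: list of (weight, u, v) tuples.
    parent = list(range(n))

    def find(x):
        # Path halving: shortcut every other link while walking to the root.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):       # 1. consider edges by ascending weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # 2. skip edges that would form a cycle
            parent[ru] = rv             # 3. merge the two components
            mst.append((u, v, w))
            total += w
    return mst, total
```

For a connected input this returns n - 1 edges whose total weight is minimal.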
Looking at the main operations in Kruskal's algorithm:
Sorting the m edges by weight takes O(m log m). Each edge then requires a find operation (twice) and possibly a unionSets operation on the union-find data structure. With path compression and union by rank optimizations, each union-find operation takes O(α(n)) amortized time, where α is the inverse Ackermann function.
Processing all m edges therefore takes O(m α(n)).
Combining these operations: O(m log m + m α(n)) = O(m log m), since the sort dominates.
Since m ≤ n^2, we have log m = O(log n), so the overall complexity is commonly written as O(m log n).