Intro
This document lists and explains popular Graph algorithms commonly used in ComputerScience and SoftwareDevelopment. // I’m sure they’re popular with DataScience and MachineLearning too.
Depth First Search
DFS is a Trees (and graph) traversal algo that starts at the Root node and goes as deep as possible down each branch before backtracking and switching to the next branch.
- All children of a node are searched before moving to the next sibling.
- Utilizes a Stack (Stacks) to keep track of vertices to process.
How it works
- Start at root node.
- Travel down its children. // just pick one, often left to right…
- When a vertex is visited, add it to the visited stack.
- Make a recursive call to the DFS function and push the vertex to the stack.
- When you hit a dead end, pop the vertex off the stack; the next vertex to be processed is the one now at the top of the stack (the one preceding it).
Pseudo code:
Algorithm Depth_First_Search(Graph G)
    // Graph G = {V, E}
    // initialize visited to false for all vertices
    for each vertex v in V
        if v has not been visited
            dfs_helper(v)

function dfs_helper(Vertex v)
    mark v as visited
    // here we do "something" when visiting the node
    visit node v
    for each vertex w in V adjacent to v
        if w has not been visited
            dfs_helper(w)
Super simplified:
def DFS(node):
    # mark node as visited, do something
    for each neighbor of node:
        if neighbor is not visited:
            DFS(neighbor)
    return
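For reference, a runnable sketch of the explicit-stack version described under “How it works” above (assuming the graph is a plain dict mapping each vertex to a list of neighbors; the names here are illustrative, not from any particular library):

def dfs_iterative(graph, start):
    # DFS with an explicit stack instead of recursion.
    # graph is assumed to be a dict: vertex -> list of neighbors.
    visited = set()
    stack = [start]
    order = []                       # vertices in the order they were visited
    while stack:
        v = stack.pop()              # take the most recently pushed vertex
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        for w in graph[v]:           # unvisited neighbors get explored depth-first
            if w not in visited:
                stack.append(w)
    return order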
Info
Typical Results
- List of vertices in order visited
- List of vertices in order of “dead-ends” (when popped from stack)
DFS Tree
The resulting tree from a DFS traversal is called a DFStree; it contains all the edges used to visit the vertices. // Edges of the original graph that are not used are called “Back edges”.
Common problems
You’ll often encounter the following and similar problems when discussing DFS:
- Find a spanning tree of a graph
- Find a path between two vertices v and u
- Find a path out of a maze
- Determine if a graph has a cycle
- Find all the connected components of a graph
- Search the state-space of problems for a solution (AI)
Efficiency
The basic operation in DFS is the loop checking whether a neighbor has been visited. This loop runs recursively, whether in DFS itself or in a helper function, and will examine ALL the neighbors of every neighbor… The efficiency depends on the DataStructure used to represent the graph, so the efficiency is:
- Θ(|V|²) for Adjacency Matrix
- Θ(|V| + |E|) for Adjacency Lists
Backtracking
A common pattern that many, myself included, struggle to fully grasp at times. #BackTracking is an extension of DFS: instead of traversing an existing structure like a graph or tree, you build the structure as you explore options.
This is relevant for problems involving decisions that dynamically generate the structure of the tree. // Think of combinatorial problems.
Scenario:
- You don’t have enough info to make an informed decision at each step/node.
- Each decision will create a new set of choices.
- Some sequences of choices (potentially more than one) may be the best solution.
IDEA
- Construct solutions one component at a time
- If a partial solution can be developed further without violating constraints → choose the first legitimate option for the next component.
- If there is no option for the next component → backtrack and replace the last component of the partial solution.
How it works
- At each node make a decision on how to proceed based on the problem. Each decision forms a branch.
- After each decision, you either get a valid solution or a dead-end.
- If it’s a dead-end, backtrack to the previous decision, undo it, and try another path (see the sketch below).
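A minimal sketch of this decide / explore / undo loop, using permutations as a stand-in combinatorial problem (the helper and variable names are mine, purely for illustration):

def permutations(items):
    # Build every ordering of items one decision at a time.
    results = []
    path = []                          # the partial solution so far
    used = [False] * len(items)

    def backtrack():
        if len(path) == len(items):    # a complete, valid solution
            results.append(path[:])
            return
        for i, item in enumerate(items):
            if used[i]:                # constraint: each element used once
                continue
            used[i] = True             # make a choice (a branch)
            path.append(item)
            backtrack()                # explore that branch
            path.pop()                 # dead-end or done: undo and try the next
            used[i] = False

    backtrack()
    return results

# permutations([1, 2]) -> [[1, 2], [2, 1]]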
Solutions as Trees
The solutions can be organized in a tree.
- Each node is a state at one stage in the process of building a solution.
- The root represents the initial state before the search for a solution begins.
- Nodes at the first level represent the first choice, nodes at the second level the second choice, and so on.
Examples
// Try LeetCode problem 17: Letter Combinations of a Phone Number (a sketch follows after this list)
- n-Queens Problem
- Hamiltonian Cycles
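A sketch of the LeetCode 17 problem mentioned above, solved with the same pattern (the mapping and function names are illustrative):

PHONE = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
         "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def letter_combinations(digits):
    # Each digit is one decision level; each letter on that key is a branch.
    if not digits:
        return []
    results = []
    path = []

    def backtrack(i):
        if i == len(digits):           # one letter chosen per digit -> a solution
            results.append("".join(path))
            return
        for letter in PHONE[digits[i]]:
            path.append(letter)        # choose a letter
            backtrack(i + 1)           # move on to the next digit
            path.pop()                 # undo and try the next letter

    backtrack(0)
    return results

# letter_combinations("23") -> ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"]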
Breadth First Search
BFS also starts at the Root BUT explores all the vertices at the current depth level before going deeper or switching to another branch. BFS is great for finding the Shortest path in Unweighted graphs, e.g. shortest path in a grid.
How it works
- Start at the root node
- Visit all neighbors at the same distance from the node, i.e. the same depth level: immediate neighbors first, then their neighbors, and so on…
- Add visited vertices to a queue (Queues).
- While the queue is not empty
    - visit all nodes adjacent to the queue’s head.
    - If not visited, visit and add to the queue.
    - remove the head from the queue.
Pseudo code:
Algorithm Breadth_First_Search(Graph G)
    // Graph G = {V, E}
    initialize visited to false for all vertices
    for each vertex v in V
        if v has not been visited
            bfs_helper(v)

function bfs_helper(Vertex v)
    mark v as visited
    visit node v
    initialize a queue Q
    add v to Q
    while Q is not empty
        for each w adjacent to Q.head
            if w has not been visited
                mark w as visited
                visit node w
                add w to Q
        Q.dequeue()
Or:
function BFS(start_node)
    mark start_node visited
    create queue and enqueue start_node
    while queue not empty
        current = dequeue queue   // to get head of queue
        for each neighbor of current
            if neighbor not visited
                mark as visited
                enqueue neighbor
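And a runnable Python version of the same idea, assuming a dict-of-adjacency-lists graph (the names are illustrative):

from collections import deque

def bfs(graph, start):
    # BFS with a FIFO queue; graph is assumed to be a dict: vertex -> neighbors.
    visited = {start}
    queue = deque([start])
    order = []                          # vertices in the order they were visited
    while queue:
        current = queue.popleft()       # head of the queue
        order.append(current)
        for neighbor in graph[current]:
            if neighbor not in visited:
                visited.add(neighbor)   # mark when enqueued to avoid duplicates
                queue.append(neighbor)
    return order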
Info
The edges from the original graph that are not included in the BFStree are called “Cross Edges”.
Common problems
Frankly you can use BFS for anything that calls for DFS, but BFS is the better fit for problems that require finding the shortest path in an unweighted graph (see the grid sketch below).
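For example, a shortest-path-in-a-grid sketch (assuming 0 = open cell, 1 = wall; the grid/start/goal conventions here are just one possible setup):

from collections import deque

def shortest_path_in_grid(grid, start, goal):
    # Returns the number of steps on the shortest path, or -1 if unreachable.
    rows, cols = len(grid), len(grid[0])
    visited = {start}
    queue = deque([(start, 0)])              # (cell, distance from start)
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return -1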
Efficiency
Same efficiency as DFS:
- Adjacency matrix: Θ(|V|²)
- Adjacency list: Θ(|V| + |E|)
Priority Queue (Heap)
Heaps are common implementations of priority queues and are almost synonymous with “top-K values” or “least-k values” type problems.
Heaps are trees where the elements (nodes) are organized in a specific manner.
- Min Heap: smallest value is the root and each parent is smaller than both of its children.
- Max Heap: Largest value is the root and each parent is larger than both of its children.
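In Python, the heapq module gives you a min heap out of the box:

import heapq

nums = [3, 6, 1, 5, 9, 12, 8]
heapq.heapify(nums)              # rearranges the list into a min heap in place
smallest = heapq.heappop(nums)   # the root of a min heap is the smallest value (1)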
Heap problem twist
To find the K largest values we need to use a Min Heap, whereas for the K smallest we need a Max Heap! Example: To find the three smallest values in [3,6,1,5,9,12,8] we build a Max Heap of size 3. Go through the array and add the first three values; from the 4th value on, check whether the current value is smaller than the largest value kept so far (the root).
- If it is, replace the root (and re-heapify).
- Otherwise, skip it. Once every value has been processed, the heap holds the k smallest values (see the sketch below).
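A sketch of that twist with heapq (Python only ships a min heap, so values are negated to simulate a max heap; the function name is mine):

import heapq

def k_smallest(values, k):
    # Keep the k smallest values using a max heap of size k (negated values).
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)       # fill the heap with the first k values
        elif v < -heap[0]:                 # smaller than the largest value kept so far
            heapq.heapreplace(heap, -v)    # replace the root and re-heapify
    return sorted(-x for x in heap)

# k_smallest([3, 6, 1, 5, 9, 12, 8], 3) -> [1, 3, 5]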