Welcome to CSCC73
TA name: William Granados
Tutorial: TUT002 - Tuesday 4-5pm, BV264
Office hours: Friday 3-4pm, IC402, Rotational (for all TAs)
Email Policy: Before emailing me, consider attending my practical or posting the question on Piazza. My email address is william.granadosmatamoros@mail.utoronto.ca. When emailing me, use your utoronto email and in the subject include "CSCC73".
Expectations: Content covered may appear on assignments or exams, so please be on time.
Good luck on finals! Today we went over Johnson's Algorithm
- Johnson's Algorithm
- Challenge: All pairs shortest path, but faster than Floyd Warshall for sparse graphs
- Considerations: Running Dijkstras $N$ times, which works well on sparse graphs.
- Issues: Dijkstras doesn't work with negative weight edges, requires changing graph while preserving shortest paths.
- Solution: For each edge $(u,v) \in E$, define unique $X_u, X_v$ value for each node, and consider $wt'(u,v) = wt(u,v) + x_u-x_v$, where $wt'(u,v) \geq 0$ for Dijkstras to work. Above equivalent to a set of difference constraints as $x_u - x_v \leq wt(u,v)$ which is solved with Bellman Ford, as long as no negative wt cycle exists. With set of $s \to u$ paths given from dist array in bellman ford, treat them as your $x_u$'s, and create $wt'$ as discussed, then run Dijkstras $N$ times on $wt'$, to obtain $\delta(u,v) = \text{shortest } u \to v \text{ path}$, finally remove value of $x_u-x_v$, for each $u \to v$ solution.
- Justification:Turns out for any $s \to t \text{ path } p$, the values are a telescoping sum and $wt'(path) = wt(path) + x_s - x_t$, so after solving problem, can just remove these values.
Today I handed back remarks, any uncollected midterms have been left with Dr. Bretscher.
Here are some extra notes on solving LPs, using slack variables.
- Example 1
- Variables: X,Y
- Objective Function: $max \{X+Y\}$
- Constraints: $a \cdot X + b \cdot Y \leq 1$, $x \geq 0 $, $y \geq 0$
- Problem: Find $a,b \in \mathbb{R}$ that satisfy the following.
- When LP is infeasible
- When the LP is unbounded
- When the LP has a unique solution.
- Example 2
- Variables: X,Y
- Objective Function: $max \{5 \cdot X+ 3 \cdot Y\}$
- Constraints: $5 \cdot X - 2 \cdot Y \geq 0$, $x + y \leq 7$, $0 \leq x \leq 5$, $y \geq 0$
- Problem: Solve the above LP, graph it, identify verticies, and optimal solution, then verify it.
This week we're going over circulation problem with lower bound, and the optimum bandwidth allocation.
Midterms will be handed back this week.
- Circulation Problem (lower bounds)
- Reduced to basic ciruclation problem, by converting vertices with lower bounds into a set of vertices with different demands similar to the classical problem.
- Bandwith Allocation
- Went over example from DPV 7.1.3
This week we covered a famous theorem used in Network Flow problems, more resources here.
- Halls theorem
- Thm: Given a bipartite graph $G = (V_1, V_2, E)$, and $|V_1| = |V_2|$. Then $G$ has a perfect matching iff $\forall S \subset V_1 , |S| \leq |N(S)|$.
- Proof Idea:We can model a bipartite graph as a network flow problem my mapping the given input graph into a flow graph, from this flow graph we can apply FF to give a lower bound on the max flow (or min cut) of the graph, from their we can apply breakdown the problem into some set S containing $V_1$, and $\bar S$ containing $V_2$. We calculate the flow on each half, and bound it above by the properties we know.
Today we went over the solutions for the midterm.
- Q1
- Problem: Given processing times $p_1, \cdots, p_n$, find ordering of $\pi$, s.t $ p_{\pi(1)}, \dots p_{\pi(n)}$ minimizes the cost function $C(j) = \sum_{i=0}^{j} p_{\pi(i)}, $ for the average completion time, $ \frac{1}{n} \cdot \sum_{j=0}^{n} C(j)$
- Solution: Sort by processing time.
- Proof: Let $G = g_1, \cdots, g_n$ be the output from our greedy solution, and $O = o_1, \cdots, o_n$ be any optimal solution such that, $G \neq O$. Choose $r > 0$ such that $o_r \neq g_r$, then $o_r > g_r$ by way G is constructed. Consider swapping $o_r$ with $min\{o_k|k > r\}$, lets call this $O'$, then $\sum_{j=r}^{k} C(j) \text{ on } O' \leq \sum_{j=r}^{k} C(j) \text{ on } O$, because $(k-r) \cdot o_k \leq (k-r) \cdot o_r$. This does not affect the processing times of $C(j-1), C(k+1)$, and reduces the average running time by $\frac{(k-r)*(o_r-o_k)}{n}$ and can be done repeatedly to create a better solution similar to $G$, no worse than $O$.
- Q2
- Problem: Given N lines bounded by a rectangular box, starting from the bottom of the box, and ending at the top of the box. Count the number of intersections.
- Possible Solution: Consider labeling the lines base on their left to right order on the top of the box, from 1..n, resulting in a set S. Now order this set based on each lines left to right order on the bottom of the box. Define an inversion on S as a pair $i,j$ such that $i < j, S(i) > S(j)$. Now count the the inversions on this list using the $O(nlogn)$ Divide and Conquer algorithm described in KT 5.3.
- Q3
- Problem: Find largest strictly increasing then strictly decreasing subsequence in an array A.
- Solution: Observe that a strictly decreasing subsequence is simply an increasing subsequence if we look backwards from the array. Calculate LIS forwards and backwards on A. Inductively, since we've defined LIS to be the longest subsequence ending at i, we can run through both arrays and take the maximum of $LIS_{fwd}+LIS_{bwd}-1$
- Q4
- Problem: Calculate max IDS on a graph connected with triangular components.
- Solution: let i be the ith triangular component, $a_i$ be the node connecting components $i \to i+1$, $b_i, c_i$ by other nodes in the component. If we select $a_i$, then we cannot select component $i-1$, $b_i$, or $c_i$, so must select $a_i$ and best IDS upto compontent $i-2$. If we do not select $a_i$, we should take $max(b_i, c_i)$, since these nodes are indepedent from component $i-1$ we can simply select the best IDS upto component $i-1$.
Today we covered Maximum Independent Set (Path)
- Problem: Given a graph G=(V,E), a path is $v_1, v_2, \dots, v_n $ s.t $v_i, v_j$ are connected if $|i-j|=1$. Each vertex has a weight $w_i$. Find set in a path G, with largest weight and no adjacent nodes.
- Example: Given a path [1,8,6,3,6], MIS = 14 (8+6)
- Approach 1: Greedily choose highest value node, delete adjacent nodes, repeat. Does this work? No. Counter example: [7,8,7,1]
- Approach 2: Greedily choose all odd or even indecies since maximum number of values. Does this work? No. Check first example.
- Approach 3: Find best IDS on $x_1, \dots, x_{i-1}$, looking ahead if $w_i$ improves solution, then must require best IDS from $x_1, \dots, x_{i-2}$, otherwise don't include and IDS is max IDS from $x_1, \dots, x_{i-1}$
- DP Recurrence: $DP[i] = max(DP[i-1], DP[i-2] + w_i)$
- Runtime: $O(N)$
- Note: The problem for complete graphs is NP-hard, so unlikely that a polynomial algorithm exists.
Today we covered the classic LCS and LIS problems, resources.
LIS
- LCS
- Problem: Given two strings $X,Y$, find the longest common subsequence of characters.
- Example: X,Y = TGACTA, GTGCATG, LCS = len(TCGA) = 4
- Idea: Find best LCS on $x_1, \dots, x_{i-1}$ and $y_1, \dots , y_{j-1}$, try to improve LCS by ending on $x_i, y_j$
- DP Recurrence: $DP[i][j] = \text{LCS from } x_1, \dots, x_i \text{ and } y_1 , \dots y_j $ $DP[i][j] =\begin{cases} DP[i-1][j-1]+1, & \text{if $x_i=x_j$}.\\ max \{DP[i-1][j] , DP[i][j-1]\}, & \text{otherwise} \end{cases}$
- Runtime: $O(N \cdot M)$, in terms of length of both strings
- Problem: Given an unsorted array A, find the longest increasing subsequence of characters.
- Example: A=[7,3,8,5,2,6], LIS = len([3,4,6]) = 3
- Approach 1: Reduce to LCS by sorting S=A, then running LCS(A, S)
- Approach 2: Calculate LIS ending at $a_i$, then LIS must be some subset $\{a_1, \dots a_{i-1} \} \cup a_i$
- DP Recurrence: $\text{DP[i] = longest LIS ending at } a_i$ $DP[i] = 1 + max(\{DP[j] | 1 \leq j \leq i \land a_j < a_i\})$
- Runtime: $O(N^2)$, for looping through array and finding best LIS on $a_1, \dots , a_{i-1}$
Today we covered Chain Matrix Multiplication
Overview
- Problem: Compute chain matrix multiplication with minimum number of scalar multiplications.
- Implicit Tree: Can represent computation as tree structure, where leaves are matricies, and intermediate nodes are results, and root is final result.
- Idea: From implicit tree structure, notice we choose $k, s.t \, i \leq k \leq j$, as our tree spliter.
- Observation: Can represent matricies dimensions as unique $m_0, m_1, \dots , m_{n-1}, m_n$, where $A_1 = \mathbb{R}^{m_0 \cdot m_1}$, $A_2 = \mathbb{R}^{m_1 \cdot m_2}, etc.$
- DP Recurrence: $DP[i][j] = \text{min # mults in } A_i, \dots, A_j, \text{where } i < j$ $DP[i][j] =\begin{cases} 0, & \text{if $i=j$}.\\ min_{i \leq k \leq j} {DP[i][k] + DP[k+1][j] + m_{i-1}\cdot m_k \cdot m_j}, & \text{if i < j}. \end{cases}$
Today we covered some more solving different recurrences, and designing a D&C solution for a problem.
Recurrences
Master Theorem
Challenge Problem
- $T(n) = 4 \cdot T(\frac{n}{8}) + c \cdot n, T(1) = 1$, we proved $T(n) \in O(n)$ using back substituion.
- $T(n) = 2 \cdot T(\frac{n}{4}) + \log{n}, T(1) = 1$, we proved $T(n) \in O(\sqrt{n})$ using back substituion.
- Solves recurrences of the form $T(n) = a \cdot T(\frac{n}{b}) + c \cdot n^d$, $T(1) = \text{some constant}$
- if $a < b^d \implies T(n) = \theta(n^d)$
- if $a = b^d \implies T(n) = \theta(n^d\log{n})$
- if $a > b^d \implies T(n) = \theta(n^{\log_{b}{n}})$
- Solved the Maximum subarray problem.
This week discussed approaches for finding runtime in Divide & Conquer Algorithms (KT 5.1-5.2).
Merge sort
Similar D&C Algorithms, that create more nodes per level in the recursion tree.
Similar D&C Algorithms, only 1 node per level in recursion tree.
- Recurrence: $T(n) = 2 \cdot T(\frac{n}{2}) + c \cdot n $
- Approach 1: Unrolling recursion, we examined the recursion tree of the algorithm.
- We calculated the amount of work done per node to be $\frac{c \cdot n}{2^{i}}$
- There were $2^{i}$ nodes per level, so overall work was $2^{i} \cdot \left( \frac{c \cdot n }{2^{i}} \right) = c \cdot n$
- There are $\log{n}$ levels since it's a perfect binary tree, so overal time complexity was $\sum_{i=0}^{\log{n}} c \cdot n = O(n\log{n})$
- Approach 2: Backsubstituion, we took a wild guess that the algorithm would run in $O(n\log(n))$, then proved this through induction. Important step in proof involved logarithm identities.
- Recurrence: $T(n) = q \cdot T(\frac{n}{2}) + c \cdot n, q > 2$
- Approach: Unrolling recursion, we examagined the recursion tree of the algorithm, same idea as above, but we used the identity for geometric sums to simpify the equation. Resulting in algorithms running in $O(n^{log{q}})$
- Problems: You're encouraged to apply similar techniques when evaluating Closest Pair of Points, and Karatsuba Multiplication
- Recurrence: $T(n) = T(\frac{n}{2}) + c \cdot n$
- Approach:Unrolling recursion, similar idea as above, using geometric sum, resulting in algorithms running in $O(n)$
This week we covered the MST problem (KT Pg.142-149), and how it can be solved with Prims and Kruskals, along with the correctness proof for these.
Prim's Algorithm
Kruskal's Algorithm
Correctness
Challenge Problems
- Idea: Push edges from an arbitrary on a min heap, iteratively select the smallest weighted edge which is a part of our solution.
- Time Complexity: $O(|E| \cdot \log(V))$ with an adjacency list and binomial heap, varies with implementation.
- Fun Fact The graph must be connected, otherwise will only find MST in one of the components.
- Idea: Sort edges by least weight, use a disjoint set data structure to iteratively select the smallest weighted edge that is a part of our solution.
- Time Complexity: $O(|E| \cdot \log(V))$, depending on DS implemention, can assume we use path compression and union by rank.
- Fun Fact: If the graph isn't connected, will generate a minimum spanning forest (a collection of trees!).
- Tree theorem: Need to prove that an MST is a tree
- Cut Theorem: Need to prove that greedily choosing the smallest edge at each iteration of the algorithm, will result in an MST.
There are no tutorials this week. Instead here's a link to a funny comic XKCD - Algorithms.