Lemma 3.4
Two claims:
- Space: The total memory used at any point during the algorithm is $O(k^2)$.
- Update time: Processing each new edge from the stream takes $O(1)$ time (constant, independent of $n$).
Key dependencies:
| Lemma/Fact | What it provides |
|---|---|
| Lemma 3.2 | Constructing the reduced subgraph takes time and space $O(\Vert H\Vert + k^2)$ |
| Lemma 3.3 | Each $G_s^f$ has at most $4k^2$ edges |
| Algorithm (Figure 1) | The structure being analyzed (steps 1-6) |
Step 1 picks $\lceil\log(1/\epsilon)\rceil$ hash functions from a universal family. Since $\epsilon$ is a fixed constant, only $O(1)$ hash functions are used.
Step 2 sets $G_0^f = \emptyset$ for each hash function $f$. Since there are $O(1)$ hash functions, this is also $O(1)$.
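Steps 1–2 can be sketched as follows. This is a minimal sketch under stated assumptions: the pairwise-independent family $h(x) = ((ax + b) \bmod p) \bmod m$, the prime $p$, the range $m = 2$, and the base-2 logarithm are all illustrative choices, since the excerpt does not specify the paper's universal family.

```python
import math
import random

def make_universal_hash(p, m):
    # Carter-Wegman style universal hash: h(x) = ((a*x + b) mod p) mod m
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

eps = 0.1  # epsilon is a fixed constant, so the count below is O(1)
num_hashes = math.ceil(math.log2(1 / eps))  # ceil(log(1/eps)) hash functions
p = (1 << 31) - 1  # a prime larger than the assumed key universe
hashes = [make_universal_hash(p, m=2) for _ in range(num_hashes)]

# Step 2: initialize an empty reduced subgraph G_0^f for each hash function f
reduced = {f: set() for f in range(num_hashes)}
```

Since `num_hashes` depends only on the constant `eps`, both the list of hash functions and the dictionary of empty subgraphs occupy $O(1)$ space, matching the analysis of steps 1–2 above.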
Step 3 simply reads the first batch of edges (up to $4k^2$ of them) and stores them directly in memory, using $O(k^2)$ space.
Step 4 is the main processing loop. At any given moment, the algorithm has four subgraphs in memory:
| Subgraph | What it is | Size bound | Why |
|---|---|---|---|
| $G_{\hat{s}}^f$ | Previous reduced compact subgraph | $\leq 4k^2$ edges | Lemma 3.3 |
| $G_{\hat{s}+1,s}$ | Previous batch of raw edges | $\leq 4k^2$ edges | By definition (batches are size $4k^2$) |
| $G_s^f$ | New reduced compact subgraph (being computed) | $\leq 4k^2$ edges | Lemma 3.3 |
| $G_{s+1,s'}$ | Current batch of raw edges (being read) | $\leq 4k^2$ edges | $s' \leq s + 4k^2$ |
Total edges stored: at most $4 \times 4k^2 = 16k^2 = O(k^2)$.
By Lemma 3.2, constructing $\mathcal{R}_f$ on a graph of size $m$ takes space $O(m + k^2)$. Here $m = O(k^2)$, so the space is $O(k^2 + k^2) = O(k^2)$.
Step 5.1: Computes the final reduced subgraph $G_f$ from $G_{\hat{s}}^f \cup G_{\hat{s}+1,s}$ – same computation as step 4.1, same $O(k^2)$ space.
Step 5.2: Constructs a maximum weighted $k$-matching in $G_f$. Since $G_f$ has at most $4k^2$ edges (by Lemma 3.3), solving the matching problem on this small graph takes $O(k^2)$ space.
Update Time
Step 3: For the first batch, the algorithm simply reads and stores each edge, which takes $O(1)$ time per edge.
Step 4 relies on an interleaving trick:
By Lemma 3.2, step 4.1 (the reduced subgraph computation on $G_{\hat{s}}^f \cup G_{\hat{s}+1,s}$, of size $O(k^2)$) runs in total time $O(k^2)$.
Since step 4.2 reads exactly $4k^2$ edges per batch, we can divide the $O(k^2)$ work of step 4.1 into $4k^2$ equal segments – each segment costs $O(k^2) / 4k^2 = O(1)$.
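Spelling out the constant-factor arithmetic: if step 4.1 performs at most $c \cdot k^2$ operations in total for some constant $c$ (Lemma 3.2 with $m = O(k^2)$), then each of the $4k^2$ segments performs at most

$$\frac{c \cdot k^2}{4k^2} = \frac{c}{4} = O(1)$$

operations, a constant independent of both $n$ and $k$.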
The interleaving then works as:
- Read one edge from the stream into the buffer $G_{s+1,s'}$ – $O(1)$
- Execute one segment of the $\mathcal{R}_f$ computation from step 4.1 – $O(1)$
- Total per edge: $O(1)$
By the time all $4k^2$ edges of the new batch have been read, the $\mathcal{R}_f$ computation from the previous batch is exactly complete.
The Interleaved Execution Pattern
Time ------------------------------------------------------->
+------+------+------+------+------+------+----------+------+
|Read |Comp |Read |Comp |Read |Comp | |Comp |
|edge |seg |edge |seg |edge |seg | ... |seg |
|s+1 | 1 |s+2 | 2 |s+3 | 3 | |4k^2 |
+------+------+------+------+------+------+----------+------+
O(1) O(1) O(1) O(1) O(1) O(1) O(1)
Each "tick" (Read + Compute segment) = O(1) + O(1) = O(1)
Why This Works: The Numbers Line Up
This works because the number of edges to read ($4k^2$) exactly matches the number of computation segments ($4k^2$):
- Step 4.2 reads up to $4k^2$ edges (one batch)
- Step 4.1’s $O(k^2)$ computation divides into $4k^2$ segments of $O(1)$
- One edge read per one computation segment = perfect interleaving
By the time we’ve read all $4k^2$ edges in the new batch, the $\mathcal{R}_f$ computation from the previous batch is also complete.
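The interleaving loop can be sketched in code. This is a toy sketch, not the paper's actual algorithm: the reduction below (keep the first `cap` edges) is a hypothetical stand-in for the real $\mathcal{R}_f$ computation, and the names `reduce_in_segments` and `interleaved_round` are invented for illustration. The point is the scheduling: one $O(1)$ segment of the previous batch's reduction runs after each $O(1)$ edge read.

```python
def reduce_in_segments(edges, cap, num_segments):
    """Toy stand-in for the R_f reduction (keep at most `cap` edges),
    with its work split into `num_segments` O(1) chunks."""
    result = []
    n = len(edges)
    for s in range(num_segments):
        for e in edges[s * n // num_segments:(s + 1) * n // num_segments]:
            if len(result) < cap:
                result.append(e)
        # Yield the finished subgraph on the last segment, None otherwise.
        yield result if s == num_segments - 1 else None

def interleaved_round(prev_reduced, prev_batch, stream, batch_size):
    """Read one new batch of edges while finishing the previous reduction."""
    segments = reduce_in_segments(prev_reduced + prev_batch,
                                  cap=batch_size, num_segments=batch_size)
    new_batch, new_reduced = [], None
    for _ in range(batch_size):
        edge = next(stream, None)       # one O(1) edge read
        if edge is None:
            break                       # stream ended mid-batch
        new_batch.append(edge)
        out = next(segments, None)      # one O(1) computation segment
        if out is not None:
            new_reduced = out
    for out in segments:                # finish up if the stream ended early
        if out is not None:
            new_reduced = out
    return new_reduced, new_batch

k = 2
batch_size = 4 * k * k                  # 16 edges per batch
stream = iter(range(100))               # pretend edges are just integers
reduced, batch = interleaved_round(list(range(100, 120)), [], stream, batch_size)
```

The two loops stay in lockstep precisely because the number of edge reads per batch ($4k^2$) equals the number of computation segments, which is the counting argument above.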
Why Interleaving is Necessary
Without interleaving, the algorithm would:
- Read $4k^2$ edges ($O(1)$ each)
- Compute $\mathcal{R}_f$ – $O(k^2)$ all at once (not reading any edges!)
- Read next $4k^2$ edges ($O(1)$ each)
- Compute $\mathcal{R}_f$ – $O(k^2)$ all at once again
This would give an update time of $O(k^2)$ for the last edge before each $\mathcal{R}_f$ computation, because the algorithm would stall while computing $\mathcal{R}_f$.
The interleaving ensures we never stall – we spread the $O(k^2)$ computation evenly across the $4k^2$ edge reads, giving $O(1)$ amortized AND worst-case update time per edge.
Space Summary
| Component | Space |
|---|---|
| Hash functions (Steps 1–2) | $O(1)$ |
| Initial buffer $G_{1,s}$ (Step 3) | $O(k^2)$ |
| Four subgraphs in main loop (Step 4) | $O(k^2)$ |
| $\mathcal{R}_f$ computation workspace (Step 4.1) | $O(k^2)$ |
| Query computation (Step 5) | $O(k^2)$ |
| Total | $O(k^2)$ |
Update Time Summary
| Phase | Edges processed | Time per edge | Reason |
|---|---|---|---|
| Step 3 (initial buffer) | First $\leq 4k^2$ | $O(1)$ | Just read and store |
| Step 4 (main loop) | All remaining | $O(1)$ | Interleaving trick |
| Overall | All edges | $O(1)$ | Worst case over both phases |