Hi everyone,
Time to solve last week’s AtCoder problem.
Check out the post from last week for a refresher.
You’ll recall that we used memoization to design a correct solution, but one that was too slow to pass all of the test cases. I hinted at the end of the post that we were spending too much time processing the effects of large jump distance intervals. Here, we’re going to develop an alternate solution that runs extremely quickly.
A shoutout here to my colleague Stephen Babb for his help with this problem. I had a solution in mind that I was going to post, but it did not follow the format of the examples in Algorithmic Thinking as closely as I’d hoped. I mentioned this to Stephen, ignored this problem… but soon I had an email from him where he provided the idea developed here… which I think fits right in with what I did in the book. Thank you, Stephen!
In our previous solution, we used a memo
array to store the result of each subproblem. Specifically, for each j
, memo[j]
stored the number of ways that Tak can get from cell 1 to cell j
.
Remember this test case?
8 2
2 4
5 5
Here’s what the memo
array would look like, starting with index 0:
[0, 1, 1, 2, 4, 8]
This tells us that there are 0 ways to get to cell 0, 1 way to get to cell 1, 1 way to get to cell 2, 2 ways to get to cell 3, 4 ways to get to cell 4, and 8 ways to get to cell 5.
Now we’re going to change what the memo
array stores. Focus again on element memo[j]
. Rather than store the number of ways to get from cell 1 to cell j
, we’re going to store the number of ways to get from cell 1 to cell 2, plus the number of ways to get from cell 1 to cell 3, etc., etc., plus the number of ways to get from cell 1 to cell j
.
So instead of this:
[0, 1, 1, 2, 4, 8]
we have this:
[0, 1, 2, 4, 8, 16]
For example, for index 4, we’re storing $0+1+1+2+4 = 8$. And for index 5, we’re storing $0+1+1+2+4+8 = 16$.
I owe you answers to two questions at this point.
The first question is: how are we supposed to use this memo
array now? It contains wrong answers! Like, memo[5]
is 16, and the answer isn’t 16… it’s 8! So clearly we can’t just return memo[5]
and be done with it. But I’d like you to recall the prefix sum stuff from page 228 of the book. We have a prefix sum array here! memo[5]
is the sum of the solutions for the first 5 cells. memo[4]
is the sum of the solutions for the first 4 cells. To extract the number of ways to get from cell 1 to cell 5, then, we can just subtract these: $16-8 = 8$. Aha! So while it is true that memo
doesn’t literally contain the solutions anymore, we can still recover the solution with a simple subtraction.
The second question is: how does this weird new memo array even help us? I mean, sure, we can recover the solution that we want from it, and that’s pretty cool, but why do this prefix sum dance at all?
The reason is that we can now process the effects of a jump distance interval, no matter how big the interval, by simply considering the endpoints of the interval. For example, suppose that we’ve filled in the entire memo
array except for index 5:
[0, 1, 2, 4, 8, x]
We want to figure out the value for memo[5]
now.
How many solutions does the 2-4 jump distance interval give us? What we need here is the sum of the elements of memo
from memo[1]
to memo[3]
. And how do we figure out this sum? Just subtract: memo[3] - memo[0]
. This is a really small interval, so the savings here aren’t impressive, but notice that the technique wouldn’t change one bit if the interval spanned thousands of jump distances.
Consider the other test case from last week:
50000 1
100 48000
Suppose we’re solving for memo[50000]
. For this 100-48000 interval, all we’d have to do to get its sum is
memo[50000-100] - memo[50000-48000-1]
.
Here’s the updated code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define N 200000 // Max number of cells
#define MOD 998244353
#define MAX_INTERVALS 10 // Max number of jump distance intervals
// Last time, this function returned the number of ways to
// get from cell 1 to (exactly) cell n.
// This time, the function returns the number of ways
// to get to cell 1, plus the number of ways to get to cell 2,
// plus the number of ways to get to cell 3, ...,
// plus the number of ways to get to cell n.
int solve(int jump_intervals[][2], int n, int k, int memo[]) {
int i, total;
if (n <= 0)
return 0;
if (memo[n] != -1)
return memo[n];
if (n == 1) {
memo[n] = 1;
return memo[n];
}
total = solve(jump_intervals, n - 1, k, memo);
for (i = 0; i < k; i++) {
total = (total + solve(jump_intervals, n - jump_intervals[i][0], k, memo)) % MOD;
total = (total - solve(jump_intervals, n - jump_intervals[i][1] - 1, k, memo));
while (total < 0)
total += MOD;
}
memo[n] = total;
return memo[n];
}
int main(void) {
int n, k, i, start, end, res;
static int jump_intervals[MAX_INTERVALS][2];
static int memo[N + 1];
scanf("%d%d", &n, &k);
for (i = 0; i < k; i++) {
scanf("%d%d", &start, &end);
jump_intervals[i][0] = start;
jump_intervals[i][1] = end;
}
for (i = 1; i <= n; i++)
memo[i] = -1;
res = solve(jump_intervals, n, k, memo) - solve(jump_intervals, n - 1, k, memo);
while (res < 0)
res += MOD;
printf("%d\n", res);
return 0;
}
Let’s look at the main
function first (40). It’s almost the same as last time, except now we have an array for the jump distance intervals (42). This is a two-dimensional array, where jump_intervals[j][0]
contains the smallest distance in interval j
and jump_intervals[j][1]
contains the biggest. Rather than record each jump distance in this range, now our code stores only the two endpoints (49-50). The only other notable change to main
is in how we recover the solution from what solve
gives us. We now need to solve the problem for n
and subtract from it the solution for n - 1
(56).
Now for the solve
function (16). There’s a new parameter here, k
, which tells us how many jump distance intervals there are.
What’s the solution for memo[n]
? First of all, we have to include all of the solutions up to and including n - 1
(28). Without that, we’d be counting only the ways to get to cell n
, when what we want here is the number of ways to get to any cell up to and including n
.
Next, we go through each jump distance interval (30), and use its endpoints to add the number of solutions for the current interval.
Don’t forget to correctly mod the result! When adding, this is easy: just use the mod operator directly (31). When subtracting, however, we can’t use the C mod operator in the same way, because if the first operand is negative then it will give us a negative result. So we just manually add MOD
until we get a number that’s not negative (33-34).
This solution is rocket fast because there are at most 10 jump distance intervals to consider. So to solve for some problem size j
, we need only process 10 intervals, each of which takes us only a few steps. We therefore have an $O(n)$ algorithm here, compared to the $O(n^2)$ algorithm from last week. And now we’re good to go with AT Coder, as we’ll pass all of the test cases within the time limit.
On my Windows laptop here, this test case no longer works:
50000 1
100 48000
A running program is granted a call stack of a specific size. Unfortunately – and this is a general caveat of code that uses memoization – if the recursion goes too deep then the stack may not be adequate. One option is to increase the amount of stack space. For example, this works for me on Windows:
gcc -Wl,--stack,16777216 abc179_d.c
(That’s a 16 megabyte stack. You might need to increase even further for even bigger examples.) A more satisfying solution is to convert the code from memoization to dynamic programming. Do give that a try if you’d like more practice with that!
Thank you all! I’m really enjoying engaging with you, my reader, and truly appreciate your support.
Just FYI, I maintain a very low-traffic newsletter where I send occasional updates related to the book and what’s going on here on the blog. Please see my Contact and Newsletter page if you’re interested. Thanks!
Written on January 9th, 2021 by Daniel Zingaro