Hello, algorithms people!
I love finding challenging problems that use hash tables or recursion or dynamic programming or graphs or binary search or segment trees or… all of the stuff that I go over in my Algorithmic Thinking book.
But you know? Sometimes there’s a problem that doesn’t use any of those things – and, still, it’s so neat that I can’t help posting about it.
I recently found one of those!
It’s this Alkemija COCI problem on DMOJ. Please read through the problem description.
An ad hoc problem is one where we need a problem-specific solution strategy to solve it. We don’t use a well-known solution strategy like dynamic programming; we don’t model it using a well-known modeling strategy like graphs. We come up with something new that solves only this problem.
One reason I didn’t include an Ad Hoc Problems chapter in the book is because there’s not much that we can transfer from one such problem to another. Compare that to graphs, where I can teach you how to work with graphs, and then you can go around solving lots of different graphs problems.
But this doesn’t mean that solving ad hoc problems is frivolous. There’s a different kind of thinking required to solve ad hoc problems, a kind that we may not exercise when using our catalog of well-known data structures or algorithms. We need to come up with something new – a new way to model the data, a problem-specific algorithm — that may have little to do with other problems we’ve solved.
Are ad hoc problems easier than other problem types? Harder? It depends completely on the problem itself. They can sometimes be “easy” because they don’t require any prior knowledge of specific data structures or algorithms. At the same time, they can be “hard” because you can’t just throw dynamic programming or a graph at them. (Not to say that using dynamic programming or graphs is easy. But, hey, at least you have a starting point!)
I tend to find ad hoc problems quite challenging and, therefore, fun and worthwhile.
Today, I want to just solve this problem correctly. Spoiler: our solution is not going to be fast enough, but at least we’ll have something. It will also give us a benchmark implementation that we can use to test our faster solutions and make sure we’re still getting the right answers.
Every chemical reaction has a before set of substances and an after set of substances. If we already own all of the before substances, we can create and therefore own all of the after substances.
We don’t know how many substances are in each reaction’s before set or after set. We do know that each such set has at most 100000 substances, but it would be wasteful to create arrays of 100000 elements for each before set and each after set.
Actually, wait. We can make a stronger claim than saying that each set has at most 100000 substances. The problem specification states that the sum of the values of all before or after substances in a set is at most 100000. The way to pack the most substances in a set, minding this restriction, is to fill it with the smallest-valued ones: 1, 2, 3, and so on. There’s a useful equation we can use here: the sum of the first $n$ integers is $n(n+1)/2$. The sum of the first 500 integers is already more than 100000, so 500 elements is already bigger than any of these sets can be.
But, still: we may have some very small before or after sets. Why create arrays of 500 elements when only very few may be used?
Instead of using arrays, I’ve decided to use linked lists, which can grow to the precise size we need for each set. (I also use linked lists in Chapter 1 of my Algorithmic Thinking book, if you need a refresher.)
Here’s the code for my first attempt. Read through the code, and then we’ll discuss.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_SUBSTANCES 100000
#define MAX_REACTIONS 100000
typedef struct substance_node {
int substance;
struct substance_node *next;
} substance_node;
// Return 1 if all substances starting from node are in is_owned,
// 0 otherwise.
int can_react(substance_node *node, int is_owned[]) {
while (node) {
if (!is_owned[node->substance])
return 0;
node = node->next;
}
return 1;
}
int main(void) {
static int is_owned[MAX_SUBSTANCES + 1];
static substance_node *before[MAX_REACTIONS];
static substance_node *after[MAX_REACTIONS];
int num_substances, num_owned, num_reactions, i, j, substance;
int num_before, num_after, changed;
substance_node *node;
scanf("%d%d", &num_substances, &num_owned);
for (i = 0; i < num_owned; i++) {
scanf("%d", &substance);
is_owned[substance] = 1;
}
scanf("%d", &num_reactions);
for (i = 0; i < num_reactions; i++) {
scanf("%d%d", &num_before, &num_after);
for (j = 0; j < num_before; j++) {
node = malloc(sizeof(substance_node));
if (node == NULL) {
fprintf(stderr, "malloc error\n");
exit(1);
}
scanf("%d", &node->substance);
node->next = before[i];
before[i] = node;
}
for (j = 0; j < num_after; j++) {
node = malloc(sizeof(substance_node));
if (node == NULL) {
fprintf(stderr, "malloc error\n");
exit(1);
}
scanf("%d", &node->substance);
node->next = after[i];
after[i] = node;
}
}
changed = 1;
while (changed) {
changed = 0;
for (i = 0; i < num_reactions; i++)
if (can_react(before[i], is_owned)) {
substance_node *node = after[i];
while (node) {
if (!is_owned[node->substance]) {
is_owned[node->substance] = 1;
num_owned++;
changed = 1;
}
node = node->next;
}
}
}
printf("%d\n", num_owned);
for (i = 1; i <= num_substances; i++)
if (is_owned[i])
printf("%d ", i);
printf("\n");
return 0;
}
Each substance_node
(8) contains a substance number and a pointer to the next substance in the linked list. The before
(27) and after
(28) arrays will point to the start of the linked lists.
One other data structure that I’m maintaining here is is_owned
(26). If is_owned[i]
is 0
, then substance i
isn’t currently owned; otherwise, it is. At the beginning, before any reactions take place, the owned substances are the ones we read from the input (38).
The core of the program is a loop that continues as long as changed
is 1
(70). In this loop, we first set changed
to 0
(71), and then go through all of the reactions to see which ones can be used given the substances that we already own. I have a helper function that I call (73) for each reaction to determine whether we can use it. If there’s a reaction that we can use, and it yields a new substance that we didn’t already own, then we set changed
back to 1
(79) so that the loop will iterate again. After all, we have at least one new substance; maybe this made new reactions available to us.
When the main loop (70) terminates, how do we know for sure that we can’t create any new substances? Well, for changed
to be 0
, it means that we checked through all of the reactions and none of them was able to do anything for us in terms of new substances. Nothing changed! We’ve reached a steady state: if a reaction doesn’t apply now, then it will never apply, because no reaction applies now.
If you submit our first attempt to DMOJ, you’ll get some but not all of the points. Our solution is too slow.
One bottleneck here is that we check every reaction whenever there’s a change in the substances that we own. For example, say we’re running along, mixing chemicals and making progress, and we eventually own substance 5. Now we’re going to check every reaction again, even those that have nothing to do with substance 5!
Here’s the start of a test case that bodes poorly for us:
100000 1
1
40000
1 2
1
1 2
2 3
1 2
1 2 3
3 4
1 2 3
1 2 3 4
4 5
1 2 3 4
1 2 3 4 5
5 6
1 2 3 4 5
1 2 3 4 5 6
6 7
1 2 3 4 5 6
1 2 3 4 5 6 7
7 8
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
9 10
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
... continue the pattern
If we follow this pattern through, many of the reactions are going to have hundreds of substances, all of which will have to be checked when we want to determine whether a given reaction can be used. To make matters even worse, we could reverse the order of the reactions in the test case, so that we pick up only one new reaction on each iteration of the main loop.
Time to speed this thing up. Give it a try! Here’s a hint: wouldn’t it be nice if we could keep track of only the reactions that are ready to be used? And what if we track not the specific substances required to use each reaction, but simply the number of substances that we don’t yet own from each reaction?
Thank you. Until next time!
Written on April 17th, 2021 by Daniel Zingaro