Hi everyone,
Last week, we started solving a segment tree problem from the 2010 Canadian Computing Olympiad (CCO). To get up to speed:
Last week, I presented some initial code for solving this thing. We used a segment tree in a cool way to be able to handle each of the three operations quickly, and to respond correctly to the “Q” operations.
You’ll recall though that we were stuck with three issues:
We couldn’t support the “M” (modify rating) operation at all, because we didn’t know how to go from a friend’s name to their current rating. Without their current rating, we couldn’t find that old rating in the tree and zero it out. We need to add something to our code that lets us retrieve the rating for a specified friend; that is, to go from a friend to their rating.
We could only partially support the “Q” operation, because we were outputting the $k$th-highest rating, not the friend with that rating. The problem was that we had no way to go from a rating to the friend with that rating. We need to add something to our code that lets us retrieve the friend that has a specified rating; that is, to go from a rating to the friend with that rating. Wait wait – what if there are multiple friends with the same rating? No, no: the problem description disallows that.
We could only support ratings between 1 and 1 million. We need to support ratings up to 100 million – but we couldn’t, because, the way we were doing it, the segment tree would outstrip the available memory.
Let’s address the first two of these issues first. Then we’ll come back for the third.
To solve the first two issues, we need a way to go from a friend to their rating, and a way to go from a rating to the friend with that rating. We’re going to add two arrays to do this: friend_to_rating
to go from a friend to their rating, and rating_to_friend
to go the other way (from a rating to the corresponding friend). We’ll keep these arrays up-to-date whenever a rating changes, and we’ll use them to fully implement the “M” and “Q” operations.
Here’s the updated code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_RATING 1000000
#define MAX_FRIENDS 1000000
typedef struct segtree_node {
int left, right;
int num_ratings; // Number of people with ratings in the range left..right
} segtree_node;
void init_segtree(segtree_node segtree[], int node,
int left, int right) {
int mid;
segtree[node].left = left;
segtree[node].right = right;
if (left == right)
return;
mid = (left + right) / 2;
init_segtree(segtree, node * 2, left, mid);
init_segtree(segtree, node * 2 + 1, mid + 1, right);
}
int query_segtree(segtree_node segtree[], int node, int k) {
int in_right;
if (segtree[node].left == segtree[node].right)
return segtree[node].left;
in_right = segtree[node * 2 + 1].num_ratings;
if (in_right >= k)
return query_segtree(segtree, node * 2 + 1, k);
else
return query_segtree(segtree, node * 2, k - in_right);
}
void bump_rating_segtree(segtree_node segtree[], int node, int rating,
int up_down) {
int left_max_index;
segtree[node].num_ratings += up_down;
if (segtree[node].left == segtree[node].right)
return;
left_max_index = segtree[node * 2].right;
if (rating <= left_max_index)
bump_rating_segtree(segtree, node * 2, rating, up_down);
else
bump_rating_segtree(segtree, node * 2 + 1, rating, up_down);
}
int main(void) {
static segtree_node segtree[MAX_RATING * 4 + 1];
static int friend_to_rating[MAX_FRIENDS + 1];
static int rating_to_friend[MAX_RATING + 1];
int n, i, friend, rating, k, old_rating, result;
char code;
init_segtree(segtree, 1, 1, MAX_RATING);
scanf("%d", &n);
for (i = 0; i < n; i++) {
scanf(" %c", &code);
if (code == 'N') { // New friend
scanf("%d%d", &friend, &rating);
friend_to_rating[friend] = rating;
rating_to_friend[rating] = friend;
bump_rating_segtree(segtree, 1, rating, 1);
} else if (code == 'M') { // Modify friend rating
scanf("%d%d", &friend, &rating);
old_rating = friend_to_rating[friend];
bump_rating_segtree(segtree, 1, old_rating, -1);
bump_rating_segtree(segtree, 1, rating, 1);
friend_to_rating[friend] = rating;
rating_to_friend[old_rating] = 0;
rating_to_friend[rating] = friend;
} else { // Query kth highest rating
scanf("%d", &k);
result = query_segtree(segtree, 1, k);
printf("%d\n", rating_to_friend[result]);
}
}
return 0;
}
All of the new action is in the main
function. Our new arrays are defined near the beginning of the function (57-58). The rating_to_friend
array has one index per rating, so we have another problem here if we try to support the full 100 million ratings (rather than only a million ratings). Don’t worry about that for now… remember that we’re going to solve that particular issue later.
Now let’s talk about the changes that we needed to make to the code for each operation.
For operation “N” (new friend), we don’t have much new. We just have to update the two new arrays (69-70). That’s because we need to associate the new friend with their initial rating, and that initial rating with this new friend.
For operation “M” (modify rating), we’ve done quite a bit of new stuff. First, now we have a way to figure out the old/previous rating for this friend by using our new friend_to_rating
array (74).
With that old rating in hand, we can bump it down by 1, from 1 to 0, in the segment tree, thereby removing that old rating (75). Now that old rating is gone from the segment tree.
Next, we can bump up the new rating in the segment tree (76). Beautiful: that’s how we support modifying a rating in the segment tree! All that’s left is to keep our new arrays in sync with this change in rating (77-79).
For operation “Q” (query), we were close before. The only thing new is to translate from the retrieved rating to the corresponding friend (82).
OK! Two issues down, one to go.
(I flippin’ love this technique…)
It’s true that the ratings for friends can be between 1 and 100 million. Someone could have a rating of 1. Someone else could have a rating of 100 million. But – here’s the leverage we have – only up to 1 million of those ratings can actually be used. That’s because there are at most 1 million operations in the test case, and each operation can refer to at most one new rating.
So, only one million out of a possible 100 million ratings can actually be used; that is, a maximum of 1 per cent of the ratings. That’s very sparse usage of the ratings. Trying to support 100 million unique ratings, even if we could somehow do it, is therefore overkill.
Let’s go back to the test case from last week:
8
N 3 700
N 1 500
Q 2
N 7 1000
Q 2
M 3 45000
Q 2
Q 1
The only ratings in use here are 700, 500, 1000, and 45000. Only four ratings! Yet, with the techniques developed to this point, we’d need a segment tree to support the ratings from 1 to 45000. All but four of those ratings would go unused!
There’s no reason, however, for us to be stuck with those huge ratings. All that matters is that we keep the ratings the same relative to each other. As long as we know that 500 is the smallest rating, 700 is next, 1000 is next, and 45000 is the biggest, we’ll have lost nothing. It doesn’t matter what we call the ratings, as long as their order is maintained.
The idea, then, is to assign our own small rating to each of the original ratings. Our ratings will start at 1. We’ll use 1 for the smallest rating, 2 for the second-smallest rating, 3 for the third-smallest, and so on. Here, we’ll assign 1 to rating 500, 2 to rating 700, 3 to rating 1000, and 45000 to rating 4. Then we’ll translate as needed between the original, big ratings and our new, small ratings. Someone wants to change the 45000 rating? No — they’re really referring to our small, 4 rating. That is, we will translate a big rating from the outside to our own small rating that we use internally.
One problem: how do we know what all of the used ratings are? We need them all, up front, so that we can assign a small rating (1, 2, 3, etc.) to each of them. But we read each operation, one at a time, and then that operation’s info goes away. What do we do?
Simple: we process each operation twice. The first time we process the operations, we are just looking through the stuff to identify all of the ratings that are in play. We store the operations so that we can run through them again. (If we didn’t store them, then they’d be gone: there’s certainly no way to back up and read the operations again from the input.) The second time we process the operations, we actually perform them, translating between big ratings and small ratings as needed.
We’re ready at last to fully solve this problem. Here’s the final code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_FRIENDS 1000000
#define MAX_OPERATIONS 1000000
typedef struct segtree_node {
int left, right;
int num_ratings; // Number of people with ratings in the range left..right
} segtree_node;
typedef struct operation {
char code;
int arg1, arg2;
} operation;
void init_segtree(segtree_node segtree[], int node,
int left, int right) {
int mid;
segtree[node].left = left;
segtree[node].right = right;
if (left == right)
return;
mid = (left + right) / 2;
init_segtree(segtree, node * 2, left, mid);
init_segtree(segtree, node * 2 + 1, mid + 1, right);
}
int query_segtree(segtree_node segtree[], int node, int k) {
int in_right;
if (segtree[node].left == segtree[node].right)
return segtree[node].left;
in_right = segtree[node * 2 + 1].num_ratings;
if (in_right >= k)
return query_segtree(segtree, node * 2 + 1, k);
else
return query_segtree(segtree, node * 2, k - in_right);
}
void bump_rating_segtree(segtree_node segtree[], int node, int rating,
int up_down) {
int left_max_index;
segtree[node].num_ratings += up_down;
if (segtree[node].left == segtree[node].right)
return;
left_max_index = segtree[node * 2].right;
if (rating <= left_max_index)
bump_rating_segtree(segtree, node * 2, rating, up_down);
else
bump_rating_segtree(segtree, node * 2 + 1, rating, up_down);
}
int compare(const void *v1, const void *v2) {
const int n1 = *(const int *)v1;
const int n2 = *(const int *)v2;
return n1 - n2;
}
int main(void) {
static segtree_node segtree[MAX_OPERATIONS * 4 + 1];
static int friend_to_rating[MAX_FRIENDS + 1];
static int rating_to_friend[MAX_OPERATIONS + 1];
static operation operations[MAX_OPERATIONS + 1];
static int all_ratings[MAX_OPERATIONS];
int n, i, friend, rating, k, result, total_ratings;
int rating_location, old_rating_location;
init_segtree(segtree, 1, 1, MAX_OPERATIONS);
scanf("%d", &n);
total_ratings = 0;
// Learn about the ratings used. Store the operations for later
all_ratings[total_ratings++] = 0;
for (i = 0; i < n; i++) {
scanf(" %c", &operations[i].code);
if (operations[i].code == 'N') {
scanf("%d%d", &operations[i].arg1, &operations[i].arg2);
all_ratings[total_ratings++] = operations[i].arg2;
} else if (operations[i].code == 'M') {
scanf("%d%d", &operations[i].arg1, &operations[i].arg2);
all_ratings[total_ratings++] = operations[i].arg2;
} else {
scanf("%d", &operations[i].arg1);
}
}
qsort(all_ratings, total_ratings, sizeof(int), compare);
// Now we do the operations for real
for (i = 0; i < n; i++) {
if (operations[i].code == 'N') { // New friend
friend = operations[i].arg1;
rating = operations[i].arg2;
rating_location = (int *)bsearch(&rating, all_ratings, total_ratings,
sizeof(int), compare) - all_ratings;
friend_to_rating[friend] = rating_location;
rating_to_friend[rating_location] = friend;
bump_rating_segtree(segtree, 1, rating_location, 1);
} else if (operations[i].code == 'M') { // Modify friend rating
friend = operations[i].arg1;
rating = operations[i].arg2;
old_rating_location = friend_to_rating[friend];
bump_rating_segtree(segtree, 1, old_rating_location, -1);
rating_location = (int *)bsearch(&rating, all_ratings, total_ratings,
sizeof(int), compare) - all_ratings;
bump_rating_segtree(segtree, 1, rating_location, 1);
friend_to_rating[friend] = rating_location;
rating_to_friend[old_rating_location] = 0;
rating_to_friend[rating_location] = friend;
} else { // Query kth highest rating
k = operations[i].arg1;
result = query_segtree(segtree, 1, k);
printf("%d\n", rating_to_friend[result]);
}
}
return 0;
}
We have a new type to store each operation as we read it (13-16). We use code
for the operation’s code (N
, M
, or Q
). We use arg1
and arg2
for the second and (for N
and M
) third piece of information from the operation’s line.
Again, the new stuff is in the main
function, so let’s jump there now. Instead of one pass through the operations, we now have two. The first pass stores the operations as it reads them in the new operations
array, and stores each rating in the new all_ratings
array (85). That all_ratings
array starts with a 0
rating (80), just so each real rating gets placed at an index that starts at 1 (not 0).
Now we’ve got all of the ratings stored in the all_ratings
array. Our next step is to sort them (94). This gives us an array where each original rating in sorted order is assigned to the indices 1, 2, 3, and so on. The location (index) of an original rating in this array tells us the small rating that it corresponds to. For example, if all_ratings
were [0, 500, 700, 1000, 45000]
, then the original rating of 1000 would correspond to our new, small rating of 3.
Now for the second pass through the ratings, this time performing them for real. Let’s see how the code for each operation has changed.
For operation “N”, we’re given a rating for the new friend (100). But that’s not the rating we’re going to use. Instead, we use the new, small rating, which is the index of the rating in all_ratings
. And how do we find that index? The all_ratings
array is sorted, so we can use binary search here. Rather than implement it myself, I’ve chosen to use the built-in C bsearch
function (101). It’s similar to qsort
in that it requires a comparison function to determine the relationship between two values; I’ve provided one earlier in the code (59-63).
For operation “M”, again we need to translate from an original rating – the new rating of the friend – to our small rating (111). Why didn’t we have to do that for the friend’s old rating? Because we already have the correct, small rating (109).
For operation “Q”, there is no change!
In one way, the techniques we used today (an array to go from friend to rating, an array to go from rating to friend, a binary search on a sorted array of ratings) are nothing new. But combined with a segment tree, they enable us to solve a very challenging problem. Really, today’s material had nothing to do specifically with a segment tree; the data structure knowledge was already deployed last week. So if you’re ever struggling with a problem that a data structure seems to almost solve, think about whether you can just augment it a little to get it to do what you want. Of course, it’s possible that it’s the wrong data structure, but don’t throw away what you have until you’re sure!
Thank you for joining me! Until next time…
Written on December 13th, 2020 by Daniel Zingaro