Design an online judge like leetcode.
Candidate: I’m familiar with such systems. Can we focus on designing the
code-focused scenarios?
Interviewer: Can you list the scenarios?
Candidate: Here’s what I was thinking:
Not in scope (lower priority):
Interviewer: Ya, that looks ok to me.
Candidate: Do you have an estimate for the number of submissions coming in
per second during peak times?
Interviewer: We should be able to handle a peak load of about 100 submits
per second.
Candidate: Ok. Let us decide how the evaluation is done by the online judge.
The basic idea behind evaluation is this:
- We keep a test_cases_file.txt and a correct_answers.txt file for each problem.
- We feed the test cases to the submission via stdin and capture its output:
  python user_submitted_file.py < test_cases_file.txt > user_solution.txt
- We then compare correct_answers.txt and user_solution.txt to see if all outputs match.
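To make this concrete, here is a minimal sketch of that evaluation step. The judge() helper name, the fixed time limit, and the verdict strings are illustrative assumptions (the verdicts mirror the status values used later in the schema), not a fixed part of the design.

```python
import subprocess

def judge(submission_path, test_cases_path, answers_path, time_limit_s=10):
    """Run the user's code against the test cases and diff against the correct answers."""
    try:
        with open(test_cases_path) as stdin_file:
            result = subprocess.run(
                ["python", submission_path],
                stdin=stdin_file,
                capture_output=True,
                text=True,
                timeout=time_limit_s,  # crude time-limit enforcement for the sketch
            )
    except subprocess.TimeoutExpired:
        return "TLE"

    if result.returncode != 0:
        return "RUNTIME_ERROR"

    with open(answers_path) as f:
        expected = f.read().strip()

    return "ACCEPTED" if result.stdout.strip() == expected else "WRONG_ANSWER"
```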
Interviewer: Hmmm, another option is to have the coder implement a specific
class with a given method name and arguments. But your proposal looks good to me for now.
Candidate: (just thinking out loud) One of the main parts of the flow is
the code-submit part, where the coder uploads their submission. This needs to be
stored somewhere. A distributed filesystem is a natural choice here, as we can
just store the user-submissions as files. The distributed filesystem can take
care of replicating the data under the hood.
Another thing to note here is that it takes a lot of time to evaluate the submission.
So, having the server synchronously compile, run and evaluate the user code is not
ideal. We’ll have to do this asynchronously.
With those things in mind, here’s what the high-level design might look like.
Interviewer: Why do you need the controller and the queue to append your
work items?
Candidate: We’d need that to prevent all workers from constantly hitting the
persistent store for pending work. If the controller is not present, the alternative
approach would be:
- Each worker keeps polling the persistent store, looking for a PENDING item.
- Upon finding a PENDING item, the worker sets it to PROCESSING and starts evaluating it.

The above approach creates a bottleneck on the persistent store. With the controller
approach, there is exactly one component polling for pending items. After selecting
all PENDING items, we can use the queue + multiple-workers setup to parallelize
the processing of those pending items.
Of course, this setup adds additional components, which require us to handle
failure scenarios. We will get to that in a bit.
Interviewer: Ok.
Candidate: Let us go over the responsibilities of each component in the design.
Submit service: accepts the coder’s submission. Submit service uploads the code
to the distributed filesystem (DFS). If this step succeeds, Submit service records
the submission in the persistent store with status PENDING.
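As a rough sketch of that path (dfs_client and db are hypothetical interfaces standing in for the DFS and the persistent store; the path layout and use of UUIDs are illustrative):

```python
import time
import uuid

def handle_submit(dfs_client, db, user_id: str, question_id: str, code: str) -> str:
    """Submit-service handler: store the code in DFS, then record a PENDING row."""
    submission_id = str(uuid.uuid4())
    code_path = f"/submissions/{user_id}/{submission_id}.py"

    # 1. Upload the raw code to the distributed filesystem (hypothetical client).
    dfs_client.write(code_path, code)

    # 2. Only after the upload succeeds, record the work-item in the persistent store.
    db.insert(
        "UserSubmissions",
        {
            "submission_id": submission_id,
            "user_id": user_id,
            "question_id": question_id,
            "submission_timestamp": int(time.time()),
            "path_to_code_contents": code_path,
            "status": "PENDING",
        },
    )
    return submission_id
```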
Now that the submission has been recorded in the persistent store, we need to
detect the new submission and work on it.
Controller: this is the component that:
- polls the persistent store for PENDING work-items (i.e., detects new submissions),
- enqueues them onto the queue and updates their status to ENQUEUED.
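A minimal sketch of the controller loop, run only by the elected master (db and queue are the same hypothetical interfaces as above; the poll interval is an assumption):

```python
import time

def controller_loop(db, queue, poll_interval_s: float = 1.0):
    """Master controller: move PENDING submissions onto the queue."""
    while True:
        pending = db.query("UserSubmissions", status="PENDING")  # hypothetical query API
        for item in pending:
            queue.enqueue(item["submission_id"])  # hand the work-item to the workers
            db.update(
                "UserSubmissions",
                submission_id=item["submission_id"],
                status="ENQUEUED",
            )
        time.sleep(poll_interval_s)
```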
Queue(s): These contain the pending work-items (code submissions) that need to be evaluated.
The main benefits of the queue are:
- it decouples detecting new submissions (controller) from evaluating them (workers),
- it lets us parallelize evaluation by simply adding more workers,
- workers no longer need to poll the persistent store directly.
Worker: The worker does the following:
- de-queues a work-item from the queue,
- downloads the submitted code and the problem’s test data from the DFS,
- runs the submission against the test cases and compares its output with the correct answers,
- updates the submission’s status in the persistent store (ACCEPTED, WRONG_ANSWER, TLE, …) and deletes the item from the queue.
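Putting it together, a worker’s main loop could be sketched as follows, reusing the judge() helper from earlier (the queue, db, and dfs_client interfaces remain hypothetical):

```python
def worker_loop(queue, db, dfs_client):
    """Worker: pull an item, evaluate the submission, record the verdict."""
    while True:
        msg = queue.dequeue()  # item stays invisible to other workers while leased
        if msg is None:
            continue

        sub = db.get("UserSubmissions", submission_id=msg.body)
        question = db.get("Questions", question_id=sub["question_id"])

        # Copy code + test data from DFS to local disk; returns local paths (hypothetical helper).
        code_path = dfs_client.download(sub["path_to_code_contents"])
        tests_path = dfs_client.download(question["path_to_test_cases_file"])
        answers_path = dfs_client.download(question["path_to_solutions_file"])

        verdict = judge(code_path, tests_path, answers_path)  # ACCEPTED / WRONG_ANSWER / TLE / ...

        db.update("UserSubmissions", submission_id=msg.body, status=verdict)
        queue.delete(msg)  # acknowledge only after the result is persisted
```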
For the design to work properly, we’d need the queue to satisfy certain properties:
- messages must be persisted, so enqueued work-items are not lost,
- an item picked up by one worker should stay hidden from other workers while it is being processed (a “lease” on the item),
- at-least-once delivery is acceptable, since re-processing a submission does not affect correctness (more on this below).

A simple choice for a queue with those properties could be Amazon SQS. It persists
messages, and allows us to set a “visibility timeout” on a per-item basis. This
would be similar to the “lease” mechanism described above.
We’ll have the following entities:
| User table |
|---|
| user_id (PK) |
| name |

| Questions table |
|---|
| question_id (PK) |
| path_to_content_file |
| path_to_test_cases_file |
| path_to_solutions_file |

| UserSubmissions table |
|---|
| user_id (FK) |
| submission_id (PK) |
| question_id (FK) |
| submission_timestamp (indexed) |
| path_to_code_contents |
| status (PENDING, ENQUEUED, WRONG_ANSWER, ACCEPTED, TLE, …) |
| last_update_timestamp |
The Controller and Worker components would mainly be working on the UserSubmissions
table, as it contains all the “work-item” information.
Given that the number of submits per second is going to be ~100, it seems ok to
have a MySQL database for our system.
An alternative setup would be to have the UserSubmissions table in a NoSQL database,
as that is the table which receives the most writes. One possible way to set
up the key for the NoSQL db would be:
Key: <user_id>-<submission_timestamp>-<question_id>
Value (column-families): status, path_to_code_contents, last_update_timestamp
With the above key, we can do a quick range-scan for a user’s submissions over a
time-period. That helps with efficiently showing the submission history.
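As an illustration of that key layout (the range_scan client call is hypothetical, and the zero-padding of the timestamp is an assumption beyond the key scheme above, added only so lexicographic order matches chronological order):

```python
def submission_key(user_id: str, submission_timestamp: int, question_id: str) -> str:
    """Compose the row key so a range scan returns one user's submissions in time order."""
    # Zero-pad the timestamp so lexicographic ordering matches chronological ordering.
    return f"{user_id}-{submission_timestamp:012d}-{question_id}"

# Submission-history query: scan all keys for user "u42" within a time window.
start_key = f"u42-{1700000000:012d}"
end_key = f"u42-{1700604800:012d}"
# rows = nosql_client.range_scan(start_key, end_key)  # hypothetical client call
```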
Having said that, I would lean towards having MySQL as my choice for DB, given the
low write-QPS.
In order to be more resilient to controller failures, we should have 3 (or 5) replicas
of the controller running. At any given time, the “master” controller is elected
via master election. If the master fails, we can run another round of election
to choose a new master. That way, the new controller can resume operations
quickly. Only the master will be able to enqueue items to the queue.
If an “enqueue” on the queue fails, the controller can retry with exponential
backoff. If it still fails after multiple attempts, we can update the persistent
store status of this work item to INTERNAL_ERROR
and have alerts on these items.
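A rough sketch of that retry logic (enqueue_fn stands for whatever enqueue call the queue client exposes; the attempt count and backoff base are illustrative):

```python
import random
import time

def enqueue_with_backoff(enqueue_fn, item, max_attempts: int = 5) -> bool:
    """Retry a failing enqueue with exponential backoff plus jitter.

    Returns True on success; the caller marks the item INTERNAL_ERROR
    (and alerts) if this returns False.
    """
    for attempt in range(max_attempts):
        try:
            enqueue_fn(item)
            return True
        except Exception:
            # Sleep 1s, 2s, 4s, ... plus a little jitter to avoid retry storms.
            time.sleep((2 ** attempt) + random.random())
    return False
```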
Our choice of queue is SQS, which guarantees at-least-once delivery. This means
that there could be duplicate items inserted into the queue. But this is ok
as long as there are not too many of these duplicates. The worst that could happen
is that we end up re-processing the same work-item multiple times. Given that the
evaluation process in the worker is idempotent, correctness will not be
affected.
We will use the “visibility timeout” feature of the queue to make sure that a
work-item de-queued by one worker will not be “visible” to other workers.
After a worker picks up an item from the queue, it runs a background thread to
keep “extending the lease” by small amounts (say, 10 minutes). This serves 2 purposes:
- as long as the worker is still actively processing the item, the item stays invisible to other workers, even if the evaluation takes longer than a single visibility-timeout window;
- if the worker crashes, it stops extending the lease, the visibility timeout expires, and the item becomes visible again so another worker can pick it up.
NOTE: the “lease” can be seen as just manipulating the “visibility timeout” of the
message in the queue.
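With SQS specifically, extending the lease is just a periodic change_message_visibility call; a rough sketch (the queue URL, interval, and extension length are placeholders):

```python
import threading

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/submissions"  # placeholder

def keep_lease_alive(receipt_handle: str, stop_event: threading.Event,
                     extend_by_s: int = 600, interval_s: int = 300):
    """Background thread: extend the message's visibility timeout until processing finishes."""
    while not stop_event.wait(interval_s):
        sqs.change_message_visibility(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=extend_by_s,  # lease extended by another ~10 minutes
        )

# Usage inside the worker, after receiving a message:
# stop = threading.Event()
# threading.Thread(target=keep_lease_alive, args=(msg["ReceiptHandle"], stop), daemon=True).start()
# ... evaluate the submission ...
# stop.set()
# sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```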
In order to monitor the overall end-to-end latency and other failure cases, we
can have an optional service that reads the persistent store, computes reports,
and exports metrics for end-to-end latency, failures, etc.
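As a sketch of what that reporting job might compute (the db query interface, the status filter, and the percentile choice are hypothetical):

```python
def export_latency_metrics(db):
    """Reporting job: derive end-to-end latency and failure counts from the persistent store."""
    # Finished items only (hypothetical query API on the UserSubmissions table).
    rows = db.query("UserSubmissions", status_not_in=["PENDING", "ENQUEUED"])
    latencies = sorted(r["last_update_timestamp"] - r["submission_timestamp"] for r in rows)
    p99 = latencies[int(0.99 * (len(latencies) - 1))] if latencies else 0
    failures = sum(1 for r in rows if r["status"] == "INTERNAL_ERROR")
    return {"p99_latency_seconds": p99, "internal_errors": failures, "total_submissions": len(rows)}
```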
Each worker should run in a sandbox environment for the following reasons:
- the submitted code is untrusted and could be malicious,
- a buggy or hostile submission could otherwise hog CPU, memory, or disk and affect other evaluations on the same host.
This is a whole topic in itself, and we may not be able to go too deep into it here.
But some high-level options could be:
- SIGALRM for time-limit-exceeded.
- setrlimit for resource limits (available in C/C++).
A minimal sketch of how these could look for our Python-based runner is shown below.
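Since the judge invokes submissions as `python user_submitted_file.py`, here is a rough Python sketch applying those two mechanisms to the child process (the limits, helper name, and use of preexec_fn are illustrative assumptions, not a hardened sandbox):

```python
import resource
import signal
import subprocess

def run_limited(cmd, stdin_path, time_limit_s=2, mem_limit_bytes=256 * 1024 * 1024):
    """Run a submission with CPU-time and address-space limits plus a wall-clock alarm."""

    def apply_limits():
        # setrlimit: cap CPU seconds and memory for the child process.
        resource.setrlimit(resource.RLIMIT_CPU, (time_limit_s, time_limit_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_limit_bytes, mem_limit_bytes))
        # SIGALRM: the pending alarm survives exec and kills the child as a wall-clock backstop.
        signal.alarm(time_limit_s + 1)

    with open(stdin_path) as stdin_file:
        return subprocess.run(
            cmd,                      # e.g. ["python", "user_submitted_file.py"]
            stdin=stdin_file,
            capture_output=True,
            text=True,
            preexec_fn=apply_limits,  # runs in the child just before exec (POSIX only)
        )
```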
If you see any issues with the existing approach, or would like to suggest
improvements, please send an email to contact@sys-design-interview.com, or comment
below. Always ready to accept feedback and improve!