Serving PyTorch models

IMPLEMENTATION

We implement this by writing two functions. The model runner function starts at the beginning and runs forever. Whenever we need to run the model, it assembles a batch of inputs, runs the model in a second thread (so other things can happen), and returns the result.
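
To make this concrete, here is a minimal sketch of such a runner on Python's asyncio. The names are our own assumptions, not fixed by the book: a shared app object that carries the event loop (app.loop), a plain list app.queue of work items, an asyncio.Event app.needs_processing, and the model itself (app.model).

import asyncio
import torch

MAX_BATCH_SIZE = 8  # assumed batch limit; tune for your model and hardware

async def model_runner(app):
    """Runs forever; assembles a batch and runs the model in a second thread."""
    while True:
        await app.needs_processing.wait()   # sleep until work is signaled
        app.needs_processing.clear()
        if not app.queue:                   # woken with nothing to do
            continue
        # everything up to the next await runs uninterrupted on the event loop
        items = app.queue[:MAX_BATCH_SIZE]
        del app.queue[:MAX_BATCH_SIZE]
        if app.queue:                       # more work left: schedule another run
            app.needs_processing.set()
        batch = torch.stack([item['input'] for item in items])
        # running the model in a second thread keeps the event loop responsive
        result = await app.loop.run_in_executor(None, app.model, batch)
        for item, output in zip(items, result):
            item['result'] = output
            item['done_event'].set()        # wake the waiting request processor

Everything between two awaits runs without interruption, which is exactly the block structure shown in figure 15.2.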

The request processor then decodes the request, enqueues inputs, waits for the processing to be completed, and returns the output with the results. In order to appreciate what asynchronous means here, think of the model runner as a wastepaper basket. All the figures we scribble for this chapter can be quickly disposed of to the right of the desk. But every once in a while, either because the basket is full or because it is time to clean up in the evening, we need to take all the collected paper out to the trash can. Similarly, we enqueue new requests, trigger processing if needed, and wait for the results before sending them out as the answer to the request. Figure 15.2 shows our two functions in the blocks we execute uninterrupted before handing back to the event loop.
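
A matching sketch of the request processor might look as follows; get_request_data, decode_image_to_tensor, and encode_in_response are the placeholder names from figure 15.2, while the per-item done_event and time stamp are our own additions for waiting and for the timer discussed next.

import asyncio

async def handle_request(app, request):
    """Called once per request; enqueues the input and awaits the result."""
    d = get_request_data(request)            # placeholder from figure 15.2
    im = decode_image_to_tensor(d)           # placeholder from figure 15.2
    work_item = {'input': im,
                 'time': app.loop.time(),    # used by the max-wait timer below
                 'done_event': asyncio.Event()}
    app.queue.append(work_item)
    schedule_processing_if_needed(app)       # sketched after the next paragraph
    await work_item['done_event'].wait()     # yield until our batch has run
    return encode_in_response(work_item['result'])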

A slight complication relative to this picture is that there are two occasions when we need to process events: if we have accumulated a full batch, we start right away; and when the oldest request reaches the maximum wait time, we also want to run. We solve this by setting a timer for the latter.⁵
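
A sketch of that scheduling helper, using the event loop's call_at timer; MAX_WAIT, the needs_processing_timer attribute, and the function name are our assumptions (MAX_BATCH_SIZE is from the runner sketch above).

MAX_WAIT = 0.5  # seconds; assumed maximum time a request may wait in the queue

def schedule_processing_if_needed(app):
    """Run immediately on a full batch; otherwise arm a timer for the oldest item."""
    if len(app.queue) >= MAX_BATCH_SIZE:
        app.needs_processing.set()            # full batch: process right away
    elif app.queue and app.needs_processing_timer is None:
        # fire when the oldest queued request hits its maximum wait time
        app.needs_processing_timer = app.loop.call_at(
            app.queue[0]['time'] + MAX_WAIT, app.needs_processing.set)

In a fuller version, the model runner would cancel and reset this timer whenever it takes a batch off the queue, so a stale timer does not fire for requests that have already been processed.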

Request processor (called for each request)

    d = get_request_data()
    im = decode_image_to_tensor(d)
    work_item['input'] = im
    add_to_queue(work_item)
    schedule_next_processor_run()
    wait_for_ready(work_item)
    im_out = work_item['result']
    return encode_in_response(im_out)

Event loop

Model runner (runs forever, with pauses)

    while True:
        wait_for_work()
        batch = get_batch_from_queue()
        if more_work_left:
            schedule_next_processor_run()
        result = launch_model_in_other_thread(batch)
        extract_result_and_signal_ready()

Model execution (launched by the model runner; runs in another thread so as not to block)

    run_model_in_jit()  # no GIL once in JIT
    # when done, signals the event loop (and thus the model runner)

Figure 15.2 Our asynchronous server consists of three blocks: request processor, model runner, and model execution. These blocks are a bit like functions, but the first two will yield to the event loop in between.
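
The "no GIL once in JIT" annotation is what makes the second thread worthwhile: a TorchScript model releases Python's global interpreter lock while it executes, so the worker thread can compute while the event loop keeps handling requests. A minimal sketch, with a stand-in model of our own choosing:

import torch

# Stand-in model for illustration; any nn.Module works here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())

# TorchScript execution does not hold the GIL, so a thread running this model
# leaves the event loop free; this is the model the runner above would call.
scripted_model = torch.jit.script(model)

When the executor thread finishes, awaiting the future returned by run_in_executor resumes the model runner on the event loop, which is the signal shown at the bottom of the figure.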

⁵ An alternative might be to forgo the timer and just run whenever the queue is not empty. This would potentially run smaller “first” batches, but the overall performance impact might not be so large for most applications.
