Robust automatic retries in Celery tasks

Published October 22, 2018 in programming

The Celery distributed task queue introduced automatic retries for known exception types in version 4.0, and added some very useful exponential backoff retry settings in version 4.2. Exponential backoff is beneficial because it spaces retries out at exponentially increasing intervals, which gives a failing service time to recover or restart. I turned off jitter in this example to make the backoff intervals easier to follow, but in practice randomizing the retry intervals also helps avoid hammering workers with closely spaced retry requests; retry jitter is enabled by default in Celery.
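Concretely, these are options on the task decorator. Below is a minimal sketch of how they fit together; the exception type and task body are placeholders, but autoretry_for, retry_backoff, retry_backoff_max, retry_jitter and max_retries are the real Celery option names.

    from celery import Celery

    app = Celery("tasks")   # broker/backend configuration shown further down

    @app.task(
        autoretry_for=(ConnectionError,),  # retry automatically on this known exception type (Celery 4.0+)
        retry_backoff=2,                   # first retry after 2 s, then 4 s, 8 s, ... (Celery 4.2+)
        retry_backoff_max=600,             # never wait more than 600 s between retries
        retry_jitter=False,                # jitter is on by default; disabled here so the intervals stay exact
        max_retries=4,                     # give up after 4 retry attempts
    )
    def fetch_remote(url):
        # Placeholder body: any exception listed in autoretry_for triggers a retry.
        ...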

Automatically retrying a failed task is useful, but the process usually needs additional error handling to be robust. Logging the error is the obvious action to take on failure, but if the task code is not idempotent and it changes system state, an exception handler may also need to run recovery steps before the retry.
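As a sketch of that idea (push_record and roll_back_record are hypothetical helpers standing in for whatever non-idempotent work the task performs, and app is the Celery app from the snippet above), the except block can log the failure, undo the partial change, and then re-raise so the automatic retry still happens:

    import logging

    logger = logging.getLogger(__name__)

    def push_record(record_id, payload):
        ...  # hypothetical: writes to an external system, may leave partial state on failure

    def roll_back_record(record_id):
        ...  # hypothetical: undoes a partial write

    @app.task(autoretry_for=(IOError,), retry_backoff=2, max_retries=4)
    def update_record(record_id, payload):
        try:
            push_record(record_id, payload)
        except IOError:
            logger.exception("update_record failed for %s", record_id)
            roll_back_record(record_id)   # recovery step so the retry starts from a clean state
            raise                         # re-raise so autoretry_for schedules the retry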

I wanted to see what exception handling in a task with automatic retry would look like, so I put together a simple proof of concept (GitHub project) that runs in Docker Compose with RabbitMQ as the broker and a MongoDB backend.
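The wiring for that is just the broker and result backend URLs on the Celery app. Something along these lines, where the hostnames are assumed docker-compose service names rather than the project's actual ones:

    from celery import Celery

    # "rabbitmq" and "mongo" are assumed docker-compose service names; adjust to match the compose file.
    app = Celery(
        "retry_poc",
        broker="amqp://guest:guest@rabbitmq:5672//",
        backend="mongodb://mongo:27017/celery_results",
    )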

The test_failure task calls a function that raises a known exception. Ultimately, the task will fail for good after 4 retry attempts. Since jitter is turned off, the exponential backoff retry interval starts at 2 seconds and doubles on each attempt (2, 4, 8, then 16 seconds) until the maximum number of retries is reached.
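Without reproducing the whole project, the task is shaped roughly like this; the exception class and the failing helper are simplified stand-ins for the real code, and app and logger come from the earlier snippets:

    class KnownException(Exception):
        """Stands in for the known exception type the task is allowed to retry on."""

    def unreliable_call():
        raise KnownException("simulated failure")

    @app.task(autoretry_for=(KnownException,), retry_backoff=2,
              retry_jitter=False, max_retries=4)
    def test_failure():
        try:
            unreliable_call()
        except KnownException:
            logger.exception("test_failure hit a known exception")
            raise   # if the handler swallowed this, the task would be recorded as successful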

Re-raising the exception in the except block is what triggers the retry; if the handler swallowed it, Celery would record the task as successful and never retry it. The exception message and retry attempts are traceable in the Celery worker logs:

[Screenshot: failing task log output, highlighted.]

The backoff times work as expected. Both the Celery worker and the code calling the task have access to the exception traceback:

[Screenshot: exception traceback from the task.]

Tags: celery, docker, python
