
Useful Airflow on Kubernetes Features

Published June 7, 2019 in data, devops

KubernetesExecutor

The KubernetesExecutor sets up Airflow to run on a Kubernetes cluster. This executor runs each task instance in its own pod, created from the same Airflow Docker image the KubernetesExecutor itself uses, unless configured otherwise (more on that at the end). Getting Airflow deployed with the KubernetesExecutor to a cluster is not a trivial task. I used this article and the Helm chart it recommends as a starting point. Check out this tutorial if you’re not familiar with Helm. I think the KubernetesExecutor is worth the setup effort if you’re not using a managed Airflow service, because it leverages all the benefits Kubernetes offers: monitoring, logging, pod restarts, volume management, and so on.

KubernetesPodOperator

The KubernetesPodOperator runs a task in a Kubernetes pod using any Docker image you choose. It’s a powerful tool for writing robust task instances, with the added advantage that task code lives in containers. And as long as you have a Kubernetes cluster available, the KubernetesPodOperator can be used with executors other than the KubernetesExecutor.
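Here’s a rough sketch of what a KubernetesPodOperator task can look like. The image, namespace, and task names are placeholders, and I’m assuming a `dag` object defined elsewhere (import path is from the Airflow 1.10 contrib package):

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Runs in its own pod, using whatever image we specify.
process_data = KubernetesPodOperator(
    task_id="process-data",
    name="process-data",
    namespace="default",
    image="python:3.7",  # any Docker image, not just an Airflow image
    cmds=["python", "-c"],
    arguments=["print('processing data')"],
    get_logs=True,  # stream the pod's stdout into the Airflow task logs
    dag=dag,  # assumes a DAG object named `dag` defined elsewhere
)
```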

Storage

It’s very common for Docker containers to use storage to read and write files. Kubernetes builds on Docker’s storage capabilities with persistent volumes and persistent volume claims. Airflow task instances can take advantage of this storage by mounting Kubernetes volumes through the operator that describes the task.

Volumes

Suppose we have configured a persistent volume named “my-volume”, backed by a persistent volume claim also named “my-volume”, and we want to mount it at “/usr/local/tmp” in the containers of our task pods. A KubernetesPodOperator can mount the volume as shown:
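A sketch of what that can look like, using the contrib Volume and VolumeMount helpers (the task and image names are placeholders, and a `dag` object is assumed to be defined elsewhere):

```python
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Reference the persistent volume claim named "my-volume".
volume = Volume(
    name="my-volume",
    configs={"persistentVolumeClaim": {"claimName": "my-volume"}},
)

# Mount the volume at /usr/local/tmp inside the task's container.
volume_mount = VolumeMount(
    name="my-volume",
    mount_path="/usr/local/tmp",
    sub_path=None,
    read_only=False,
)

write_file = KubernetesPodOperator(
    task_id="write-file",
    name="write-file",
    namespace="default",
    image="python:3.7",
    cmds=["python", "-c"],
    arguments=["open('/usr/local/tmp/hello.txt', 'w').write('hello')"],
    volumes=[volume],
    volume_mounts=[volume_mount],
    get_logs=True,
    dag=dag,
)
```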

Operators other than the KubernetesPodOperator can also request Kubernetes resources when running on the KubernetesExecutor, using the executor_config parameter. The same volume mount is set up through executor_config as shown:
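A sketch of the same mount on a hypothetical PythonOperator task. The dict layout here follows the Kubernetes API field names, which is what my Airflow version accepted; double-check it against the version you’re running:

```python
from airflow.operators.python_operator import PythonOperator

def write_file():
    with open("/usr/local/tmp/hello.txt", "w") as f:
        f.write("hello")

write_file_task = PythonOperator(
    task_id="write-file",
    python_callable=write_file,
    executor_config={
        "KubernetesExecutor": {
            # Plain dicts keep the whole config JSON serializable.
            "volumes": [
                {
                    "name": "my-volume",
                    "persistentVolumeClaim": {"claimName": "my-volume"},
                }
            ],
            "volume_mounts": [
                {"name": "my-volume", "mountPath": "/usr/local/tmp"}
            ],
        }
    },
    dag=dag,  # assumes a DAG object named `dag` defined elsewhere
)
```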

The executor_config settings for the KubernetesExecutor need to be JSON serializable.

Custom Airflow Images

It’s also possible to run operators other than the KubernetesPodOperator in an Airflow Docker image other than the one used by the KubernetesExecutor. For example, I could build a new Airflow image named airflow:test with a different Python setup, or with potentially risky code that I want to test. I can run a task instance described by any operator that accepts the executor_config parameter in an airflow:test container as shown:
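A sketch under those assumptions (the module baked into airflow:test is hypothetical, and a `dag` object is assumed to be defined elsewhere):

```python
from airflow.operators.python_operator import PythonOperator

def run_risky_code():
    # Hypothetical package that only exists in the airflow:test image.
    import my_experimental_module
    my_experimental_module.run()

test_task = PythonOperator(
    task_id="test-risky-code",
    python_callable=run_risky_code,
    # Run this task's pod from the airflow:test image instead of the
    # image the rest of the KubernetesExecutor deployment uses.
    executor_config={"KubernetesExecutor": {"image": "airflow:test"}},
    dag=dag,
)
```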
