A number of misconfigured Apache Airflow instances have exposed credentials for popular services, including cloud hosting providers, payment processors, and social media platforms. Apache Airflow is an open-source tool that helps organizations automate workflows. With it, users can create scheduled workflows, such as moving data to and from AWS buckets and parsing it along the way. The misconfigurations expose not only credentials but also sensitive data. One of the most prevalent issues was hard-coded credentials in DAG code, a core part of Airflow. As described in the tool’s documentation, “A DAG (Directed Acyclic Graph) is the core concept of Airflow, collecting tasks together, organized with dependencies and relationships to say how they should run.” Credentials can also be exposed through the tool’s “Variable” feature. Another alarming issue is that unauthenticated users can run database queries in versions of Airflow prior to v1.10. The results of such queries can expose proprietary database information, jeopardizing compliance with many federal and state privacy laws. In Airflow versions prior to 1.10.13, credentials entered through the tool’s CLI are logged in plaintext, as described in CVE-2020-17511.
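To illustrate the hard-coded credential problem, the following is a minimal sketch, not Airflow's own API, of reading a secret from the process environment instead of embedding it in DAG source. The helper name get_secret and the variable name EXAMPLE_AWS_SECRET_KEY are hypothetical and chosen only for this example.

```python
import os

# Anti-pattern: a secret written into DAG source is visible to anyone who can
# read the file, including through an exposed web interface.
# AWS_SECRET_KEY = "hard-coded-secret"  # do not do this


def get_secret(name: str) -> str:
    """Return a secret from the process environment, failing loudly if unset.

    Hypothetical helper for illustration; it is not part of Airflow.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required secret {name!r} is not set")
    return value


if __name__ == "__main__":
    # EXAMPLE_AWS_SECRET_KEY is an assumed name; set it outside the code base.
    os.environ.setdefault("EXAMPLE_AWS_SECRET_KEY", "value-injected-at-deploy-time")
    secret = get_secret("EXAMPLE_AWS_SECRET_KEY")
    print("secret loaded, length:", len(secret))
```

Failing when the variable is missing, rather than falling back to a default, keeps a placeholder value from quietly ending up in the source file.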
The main recommendation for this issue is to update Airflow beyond version 2.0 as soon as possible. Beyond that, hard-coded credentials are a common problem in development generally. Secure coding practices are paramount to data integrity and should not be left to a lone developer or team. A code review process or, more specifically, an Application Security team dedicated to verifying the security of products can save an enterprise millions of dollars in losses associated with data leaks and compromise.
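As one example of what an automated step in such a review process might look like, here is a rough sketch of a scanner that flags assignments of string literals to secret-looking names. The pattern and the function name find_hardcoded_secrets are assumptions made for illustration. A heuristic like this will miss many cases and raise false positives, so it supplements human review rather than replacing it.

```python
import re

# Hypothetical heuristic: a name containing a secret-like keyword that is
# assigned a non-empty quoted string literal on the same line.
SECRET_ASSIGNMENT = re.compile(
    r"""
    \b(\w*(?:password|passwd|secret|token|api_key|access_key)\w*)  # variable name
    \s*=\s*                                                        # assignment
    (["'])[^"']+\2                                                 # quoted literal
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each suspicious assignment."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            findings.append((lineno, match.group(1)))
    return findings


if __name__ == "__main__":
    sample = (
        'aws_secret_key = "abc123"\n'
        'password = os.environ["DB_PASSWORD"]\n'
        "API_TOKEN = 'xyz'\n"
    )
    for lineno, name in find_hardcoded_secrets(sample):
        print(f"line {lineno}: possible hard-coded credential in {name}")
```

In the sample, the second line is not flagged because its value comes from the environment rather than from a string literal.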