A number of misconfigured Apache Airflow instances have exposed credentials for popular services, including cloud hosting providers, payment processors, and social media platforms. Apache Airflow is an open-source tool that helps organizations automate workflows. With it, users can create scheduled workflows, such as moving data to and from AWS buckets and parsing it. The misconfigurations expose both credentials and other sensitive data. One of the most prevalent issues was hard-coded credentials in DAG code, a core part of Airflow. As described in the tool’s documentation, “A DAG (Directed Acyclic Graph) is the core concept of Airflow, collecting tasks together, organized with dependencies and relationships to say how they should run.” Credentials can also be exposed through the tool’s “Variable” feature. Another alarming issue is that unauthenticated users can run database queries in versions of Airflow prior to v1.10. The results of such queries can expose proprietary database information, jeopardizing compliance with many federal and state privacy laws. In Airflow versions prior to 1.10.13, credentials entered through the tool’s CLI are logged in plaintext, as described in CVE-2020-17511.
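One way to audit DAG files for hard-coded credentials of the kind described above is a simple pattern scan over the source text. The following is a minimal sketch under stated assumptions, not the researchers' method: the names `SECRET_PATTERNS`, `find_hardcoded_secrets`, and `SAMPLE_DAG` are hypothetical, the two regular expressions are illustrative rather than exhaustive, and the sample credentials are placeholders. It uses only the Python standard library and does not import Airflow.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive or official list).
SECRET_PATTERNS = [
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    # A variable whose name suggests a secret, assigned a quoted literal.
    ("literal_secret", re.compile(
        r"(?i)\b\w*(?:password|passwd|secret|token|api_key)\w*"
        r"\s*=\s*['\"][^'\"]{4,}['\"]"
    )),
]


def find_hardcoded_secrets(source: str):
    """Return (line_number, pattern_name) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical DAG source, held as a string so nothing here is executed.
SAMPLE_DAG = '''\
from airflow import DAG
aws_key = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2-not-real"
bucket = "reports"
'''

if __name__ == "__main__":
    for lineno, name in find_hardcoded_secrets(SAMPLE_DAG):
        print(f"line {lineno}: possible {name}")
```

A scan like this only flags candidates for manual review. It will miss secrets that are built up at runtime and can flag harmless strings, so it complements rather than replaces moving credentials out of DAG code.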
By Akshay Rohatgi and Randy Pargman
About this Student Research Project
Binary Defense’s mission is