Ben Dickson

31 August 2021 at 11:05 UTC

Updated: 31 August 2021 at 11:44 UTC

Developers revoke YAML support to protect against exploitation

The team behind TensorFlow, Google’s popular open source Python machine learning library, has revoked support for YAML due to an arbitrary code execution vulnerability.

YAML is a general-purpose format used to store data and pass objects between processes and applications. Many Python applications use YAML to serialize and deserialize objects.

According to an advisory on GitHub, TensorFlow and Keras, a wrapper library for TensorFlow, used an unsafe function to deserialize YAML-encoded machine learning models.

A proof-of-concept shows the vulnerability being exploited to return the contents of a sensitive system file.

“Given that YAML format support requires a significant amount of work, we have removed it for now,” the maintainers of the library said in their advisory.

Deserialization insecurity

“Deserialization bugs are a great attack surface for codes written in languages like Python, PHP, and Java,” Arjun Shibu, the security researcher who discovered the bug, told The Daily Swig.

“I searched for Pickle and PyYAML deserialization patterns in TensorFlow and, surprisingly, I found a call to the dangerous function .”

The function loads YAML input directly without sanitizing it, which makes it possible for an attacker to embed malicious code in the data and have it executed during deserialization.
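To make the difference concrete, here is a minimal sketch using PyYAML, a third-party package. It does not reproduce the TensorFlow code or the researcher's proof-of-concept. The sample documents and the helper name `load_untrusted` are assumptions made for illustration, and `os.getcwd` stands in for whatever call an attacker would actually choose.

```python
import os

import yaml  # PyYAML, a third-party package (assumed to be installed)

# A YAML document that asks the loader to call a Python function.
# os.getcwd is harmless; it stands in for an attacker's real payload.
UNTRUSTED = "!!python/object/apply:os.getcwd []"

# A plain-data document of the kind a model configuration might contain
# (hypothetical keys, not a real Keras schema).
PLAIN = "layers: [64, 10]\nactivation: relu\n"


def load_untrusted(text):
    """Parse YAML with the safe loader, which only builds plain data types.

    Raises ValueError if the document uses tags the safe loader rejects.
    """
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError("rejected YAML input") from exc


def is_rejected(text):
    """Return True if the safe loader refuses the document."""
    try:
        load_untrusted(text)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(load_untrusted(PLAIN))   # plain data parses normally
    print(is_rejected(UNTRUSTED))  # the function-call tag is refused
    # An unsafe loader runs the embedded call while parsing the document.
    print(yaml.unsafe_load(UNTRUSTED) == os.getcwd())
```

The safe loader only constructs basic types such as dictionaries, lists, strings and numbers, so a document that tries to name a Python callable fails to parse instead of running it.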

Unfortunately, insecure deserialization is a common practice.

“Researching further using code searching applications like, I saw thousands of projects/libraries deserializing python objects without validation,” Shibu said. “Most of them were ML specific and take user input as parameters.”
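The same pattern applies to Pickle, which Shibu also mentions. The sketch below uses only the Python standard library and follows the "restricting globals" approach described in the `pickle` documentation. The class and function names (`Exploit`, `RestrictedUnpickler`, `restricted_loads`) are hypothetical, and the harmless `os.getcwd` call again stands in for a real payload.

```python
import io
import os
import pickle


class Exploit:
    """Object whose pickled form tells the loader to call os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allow-list."""

    ALLOWED = frozenset()  # empty: only built-in containers and scalars load

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data):
    """Deserialize bytes while blocking references to arbitrary callables."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def is_rejected(data):
    """Return True if the restricted unpickler refuses the payload."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


payload = pickle.dumps(Exploit())
plain = pickle.dumps({"layers": [64, 10], "activation": "relu"})

if __name__ == "__main__":
    print(restricted_loads(plain))               # plain data still loads
    print(is_rejected(payload))                  # the callable is blocked
    print(pickle.loads(payload) == os.getcwd())  # unrestricted load runs it
```

An allow-list narrows what a payload can reach but does not make Pickle a safe format for untrusted input; the Python documentation still advises against unpickling data from sources that are not trusted.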

Impact on machine learning applications

Serialization is very common in machine learning applications. Training models is a costly and slow process, so developers often use pre-trained models that have been stored in YAML or other formats supported by ML libraries such as TensorFlow.

“Since ML applications usually accept model configuration from users, I guess the availability of the vulnerability is common, making a large proportion of products at risk,” Shibu said.

Regarding the YAML vulnerability, Pin-Yu Chen, chief scientist at…
