Check Point CloudGuard Spectral exposes new obfuscation techniques for malicious packages on PyPI

November 9, 2022

Highlights: 

  • Check Point Research (CPR) detects a new and unique malicious package on PyPI, the leading package index used by developers for the Python programming language

  • The new malicious package was designed to hide code in images and infect through open-source projects on GitHub

  • CPR responsibly disclosed this information to PyPI, who removed the packages immediately

 

New obfuscation techniques for malicious packages – hiding code inside images 

The Check Point CloudGuard Spectral Data Science team recently detected a new and unique malicious package on PyPI, the software repository for the Python programming language. The malicious package was designed to hide code in images (image-based code obfuscation, or steganography) and infect PyPI users through open-source projects on GitHub. These findings reflect careful planning by a threat actor and show that obfuscation techniques on PyPI have evolved.

 

Common malicious package structure

Malicious packages in the open-source domain commonly include three main components:

  1. The malicious code: responsible for downloading and running a virus executable, opening a remote shell to the attacker, or simply collecting all the PII it can find and publishing it elsewhere.
  2. The carrier code: responsible for sneaking in the malicious code. Commonly it is a legitimate package with the malicious snippet bundled into the installation code (such as setup.py in PyPI or post-install scripts in NPM). The snippet is hidden through obfuscation or downloaded dynamically during installation from sources such as pastebin.com.
  3. The infecting package: attracts victims to install the malicious package in the first place. A common technique is name-squatting: choosing a package name that resembles a common, legitimate one.
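The name-squatting idea in the third component can be illustrated with a simple similarity check. This is a minimal sketch, not a tool used by the researchers: the `POPULAR_PACKAGES` list, the `lookalike_of` helper, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical, tiny list of popular package names. A real check would use a
# much larger list, for example the most-downloaded projects on PyPI.
POPULAR_PACKAGES = ("requests", "numpy", "pandas", "urllib3", "colorama")


def lookalike_of(name, known=POPULAR_PACKAGES, cutoff=0.8):
    """Return the popular package that `name` closely resembles, or None."""
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None  # exact match: this is the real package name
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ("reqeusts", "requests", "apicolor"):
        print(candidate, "->", lookalike_of(candidate))
```

A niche name such as apicolor does not resemble any entry in the list, so a check like this would not flag it. That is the visibility tradeoff discussed next.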

Attackers commonly face a package naming tradeoff. Choosing a package name that is too bold or too common can lead to high visibility and quick detection. Choosing a niche name, however, can lead to few downloads and lower the potential number of successful infections. The threat actor would have to fill this gap by actively engaging with potential users to make them install the infected package. In most cases attackers seem to favor scale: they mimic common package names, assuming that high download numbers will guarantee at least some infections even if the package has a shorter lifespan. Some cases, such as apicolor, include more unique and non-trivial malicious code design choices.

 

Apicolor

The malicious package researchers detected was named ‘apicolor’. At first glance, it seemed like one of the many in-development packages on PyPI. It was quite new (initially published on October 31), with a general description and a vague header stating that it is a ‘core lib for REST API’. Apicolor was almost invisible to those who routinely watch for malicious packages.

After taking a deeper look into the package installation script, researchers noticed a strange, non-trivial code section at the beginning. It starts by manually installing extra requirements (rather than through the more common requirements section). It then downloads a picture from the web, uses the newly installed package to process the picture, and runs the output of that processing with the exec command. This snippet differs greatly from what is commonly seen in setup.py installation scripts.
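A reviewer can look for this kind of pattern without running the script. The sketch below parses a setup script with Python's `ast` module and lists calls that deserve a closer look. The `SUSPICIOUS_CALLS` set, the `suspicious_calls` helper, and the sample script are hypothetical illustrations, not the actual apicolor code or the detection logic used by the researchers.

```python
import ast

# Call names worth a manual review when they appear in an installation script.
# This set is an assumption for illustration and is far from exhaustive.
SUSPICIOUS_CALLS = {"exec", "eval", "system", "popen"}


def suspicious_calls(source):
    """Return sorted (line, name) pairs for reviewable calls in `source`.

    The source is only parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls expose `.id` (exec), attribute calls `.attr` (os.system).
            name = getattr(func, "id", None) or getattr(func, "attr", None)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical setup script fragment, written only to exercise the checker.
SAMPLE = '''
import os
os.system("pip install some-helper")
exec(decoded)
'''

if __name__ == "__main__":
    for line, name in suspicious_calls(SAMPLE):
        print(f"line {line}: call to {name}()")
```

A hit is only a prompt for manual review, since legitimate installation scripts occasionally use these calls as well.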

The two packages being manually installed are requests (a popular helper package for API usage) and judyb. The judyb package initially looks like an ‘in progress’ package, with an empty description and a vague header stating that it is ‘a pure Python judyb module’. A deeper look revealed that judyb was first released around the same time as apicolor.

The judyb code turned out to be a steganography module, responsible for hiding and revealing messages inside pictures. Check Point Research suspected that the image downloaded during the apicolor installation might contain a hidden payload.
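To show how a steganography module can hide and reveal data, here is a minimal least-significant-bit (LSB) sketch. It is not the judyb implementation, whose internals are not described here. The `hide` and `reveal` functions, the 4-byte length prefix, and the use of a plain byte buffer in place of decoded image pixels are all assumptions made to keep the example self-contained.

```python
def hide(pixels, message):
    """Return a copy of `pixels` with `message` stored in the lowest bits.

    `pixels` is any bytes-like buffer of channel values; `message` is bytes.
    A 4-byte big-endian length prefix is stored first (an assumed format).
    """
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this carrier")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return out


def reveal(pixels):
    """Recover the message previously stored by `hide`."""
    def read_bytes(start, count):
        value = bytearray()
        for b in range(count):
            byte = 0
            for i in range(8):
                byte = (byte << 1) | (pixels[start + b * 8 + i] & 1)
            value.append(byte)
        return bytes(value)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)  # message bits follow the 32 length bits


if __name__ == "__main__":
    carrier = bytearray(range(256)) * 2  # stand-in for raw pixel data
    stego = hide(carrier, b"print('hi')")
    print(reveal(stego))
```

Each carrier value changes by at most one, which is why an image altered this way can look entirely normal to a viewer.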

Returning to the apicolor installation code, the first step was to inspect the picture downloaded from the web. It seemed legitimate, with nothing unusual in it.

Applying the judyb ‘reveal’ method to this image uncovered a hidden message. The message appears to contain base64-obfuscated Python code, a common way for malicious packages to hide their malicious code.

Decoding that snippet from base64 revealed the following common malicious code pattern: download a malicious exe from the web and run it locally.

After finding the malicious and carrier parts of the apicolor package, it was time to expose the infector part.
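When analyzing a payload like this, the important point is to decode it for reading and never pass it to exec. The following is a minimal sketch of that step. The `decode_for_review` helper is a hypothetical name, and the sample string is a harmless stand-in, not the payload found in the image.

```python
import base64


def decode_for_review(encoded):
    """Decode a base64 string to text for reading; nothing is executed."""
    raw = base64.b64decode(encoded, validate=True)
    # Replace undecodable bytes so that binary payloads still print safely.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Harmless stand-in for an obfuscated snippet.
    sample = base64.b64encode(b"print('hello')").decode("ascii")
    print(decode_for_review(sample))
```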

 

Actively infecting 

The immediate place to investigate such packages is GitHub. Researchers searched for code projects using these packages to better understand the infection technique: whether anyone had mistakenly installed them and, if so, how it happened. This search made it apparent that apicolor and judyb are quite niche, with low usage in GitHub projects.

Only three GitHub users seem to include these packages in their code, adding them as (entirely redundant) requirements to their publicly accessible GitHub projects. Not surprisingly, these three users turned out to be new GitHub users.
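Before installing a project found on GitHub, its requirements file can be checked for packages already known to be malicious. The sketch below assumes a simple pip-style requirements format. The `flagged_requirements` helper and the sample file contents (including the version number) are hypothetical; only the two package names come from this report.

```python
import re

# Package names reported as malicious in this article.
FLAGGED = frozenset({"apicolor", "judyb"})


def flagged_requirements(text, flagged=FLAGGED):
    """Return flagged package names found in pip-style requirements text."""
    hits = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[\s<>=!~\[;@]", line, maxsplit=1)[0].lower()
        if name in flagged:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical requirements file contents.
    sample = "requests>=2.0\napicolor\njudyb==0.1  # helper\n"
    print(flagged_requirements(sample))
```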

Hidden infection 

The infection process goes as follows:

While searching the web for legitimate projects, a user comes across these open-source GitHub projects and installs them locally, not knowing that doing so brings in a malicious package import. It’s important to note that the code seems to work. In some cases, there are empty malicious packages. From the installer's point of view, they are trying an open-source project from GitHub, unaware that it hides a malicious trojan inside.

Careful users will consider only open-source projects with a reputation, looking at aspects such as the number of stars and forks. The project in question appears to meet these criteria, with dozens of stars and hundreds of forks. However, a deeper look shows that its reputation is synthetically generated: it has only a single forking account, and most of the starring accounts seem to star only these accounts' projects, probably as a supporting part of this campaign.

 

Responsible disclosure to PyPI

Once these packages were identified, Check Point Research alerted PyPI to their existence, and PyPI removed the apicolor package.

Check Point CloudGuard Spectral users remain protected against such malicious packages

Supply chain attacks are designed to exploit trust relationships between an organization and external parties. These relationships could include partnerships, vendor relationships, or the use of third-party software. Cyber threat actors will compromise one organization and then move up the supply chain, taking advantage of these trusted relationships to gain access to other organizations’ environments. Such attacks have become more frequent and more damaging in recent years. It is therefore essential that developers keep their actions safe by double-checking every software ingredient in use, especially components downloaded from external repositories and code they did not create themselves.

At Spectralops.io, now a Check Point company, our mission is to create a secure development process that ensures developers are doing the right things security-wise. As part of this effort, we constantly scan PyPI for malicious packages to prevent such supply chain attack risks.

 

The next level of attacks is here

Researchers are seeing a new type of organized attack. Threat actors have progressed beyond the ‘mimic a common package and slightly hide your malicious code’ technique. They are creating organized campaigns that directly target certain types of users. Moving the infection phase from the highly watched PyPI platform to a more crowded domain, such as GitHub, makes detecting malicious packages more difficult. These types of attacks seem to target users working from home, likely individuals who use their corporate machines for side projects. Threat actors and hackers are evolving, always searching for new domains and techniques to act upon. New malicious attacks are out there, and users must stay alert and aware.