Invisible C2, thanks to AI-powered techniques
Just about every cyberattack needs a Command and Control (C2) channel: a way for attackers to send instructions to compromised systems and receive stolen data. That dependence also gives defenders a chance to spot the attacks that put us at risk.

Traditionally, C2 traffic might be disguised as normal web traffic or DNS queries, or routed through known platforms like Slack or Telegram. Now, LLMs are helping attackers hide C2 communications in plain sight even more effectively, and even to automate decision-making on the attacker's behalf, reducing or eliminating the need for a traditional C2 channel. In this post, I explore how AI models can be used to obfuscate C2, making detection extremely challenging. Building on the last post about polymorphic attacks, I look at real examples, explain why these updated approaches evade network security detection, and conclude by briefly examining how adaptive models might help. Once again, this is not intended to be a DeepTempo pitch. Rather, I am examining advanced and emerging attack patterns in some depth. It is only by understanding today's rapidly innovating adversaries that we can respond effectively.
The Emergence of AI-Covert Channels
Using well-known legitimate services for C2 is not new: "domain fronting" and "living off the land" have been around for a while. What LLMs add is flexibility and improved believability:

Using AI APIs as C2: As detailed in my last blog, some malware authors have realized they can piggyback on AI APIs, including OpenAI's, to fetch instructions or code. To a defender, traffic to api.openai.com doesn't raise an eyebrow, whereas traffic to an unknown IP would. The malware can ask the AI for instructions in natural language, effectively turning the AI into an unwitting proxy for the attacker.

Natural Language Steganography: LLMs can generate human-like text that includes hidden instructions. For example, an attacker could prompt an LLM to produce an innocuous-looking email or document containing encoded commands, such as the first letter of each sentence or another pattern that the malware knows to parse. To a human or a standard scanner, the text looks legitimate (even meaningful). Only the malware knows that, say, "The weather is lovely today in Paris. News reports show markets are up." actually encodes a command like "DOWNLOAD UPDATES". This is textual steganography with the help of AI fluency (a minimal sketch of this encoding follows this list).

LLM-driven Autonomy: In an even more advanced twist, threat actors are experimenting with malware that delegates decision-making to an LLM. Instead of a human operator manually typing commands, the malware can consult an AI model for what to do next. A fascinating proof-of-concept by researchers at Deep Instinct showed an LLM acting as the brains of the operation: the malware would send the LLM a description of the situation, and the LLM would respond with an action to execute, effectively becoming an AI controller. This not only obfuscates C2, since the "controller" is an AI chat interface, but also speeds up attacks, since there is no need to wait for human input.

These approaches result in C2 communications that are highly atypical for traditional detection: they might look like benign chatbot conversations or random text data, with no fixed signatures.
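To make the steganography idea concrete, here is a minimal sketch of a first-letter encoding scheme hiding a short command inside innocuous prose. The cover sentences and the decode routine are hypothetical illustrations, not taken from any real malware.

```python
# Minimal sketch of first-letter steganography: a command is spread across
# the first letters of consecutive sentences in otherwise innocuous text.
# The sentences below are hypothetical stand-ins for LLM-generated prose.

def decode_first_letters(text: str) -> str:
    """Recover the hidden string from the first letter of each sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "".join(s[0].upper() for s in sentences)

cover_text = (
    "Rain is expected over the weekend. "
    "Umbrellas are selling quickly downtown. "
    "News reports show markets are up."
)

print(decode_first_letters(cover_text))  # prints "RUN"
```

Nothing about the cover text is suspicious on its own; the scheme only matters to software that already knows where to look.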
A useful pattern: BlackMamba’s ChatGPT C2
We introduced BlackMamba earlier as polymorphic malware; it's also a poster child for AI-obfuscated C2. Here are a few ways that BlackMamba's command and control works, and we are seeing similar approaches in the wild:

Dynamic Payload Delivery: When BlackMamba wants to log keystrokes, it doesn't carry a keylogger binary. Instead, it sends a query to ChatGPT's API asking for a snippet of code to capture keystrokes (the exact query was likely crafted by the malware author). ChatGPT responds with a fresh code snippet, likely in Python or PowerShell, to perform keylogging. BlackMamba then executes that snippet. The next time, for data exfiltration, it might ask ChatGPT for code to send data over HTTP. Each of these interactions is a command exchange, but it appears as normal Q&A with an AI (the sketch after this list shows what such a request looks like on the wire).

The hardest C2 to identify is no C2: In this case, there is no attacker IP or server domain to trace. The attacker interacts with ChatGPT, or a jailbroken equivalent, only initially, for instance by setting up the prompt engineering so that the malware's queries trigger certain responses. Essentially, the AI service acts as a broker between the attacker and the malware.

Relay servers: The idea of using a legitimate API such as ChatGPT's can, of course, be extended to relay servers, and we've seen that technique in the wild recently. An attacker could broaden their operation by building a series of relays out of legitimate home servers or IoT controllers and routing C2 through them. This also eludes rules and behavior-based indicators.

Another conceptual example: What if an attacker uses something like Twitter in combination with ChatGPT for C2? For instance, malware in your environment posts a tweet that looks like a random quote; meanwhile, an LLM monitors the Twitter feed and, whenever it sees those patterns, responds with a direct message containing commands encoded in a story. This is the sort of creativity unlocked by mixing AI with existing platforms. Similar approaches have already been seen using steganography to hide messages and patterns in attacks.
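To illustrate why this kind of exchange blends in, here is a rough sketch of what a request to a public LLM API looks like on the wire. The endpoint and payload shape follow OpenAI's public chat completions API, but the prompt and key are placeholders, and nothing here is taken from BlackMamba itself; the point is only that, from a network monitor's perspective, the exchange is ordinary TLS to api.openai.com.

```python
# Sketch only: the request shape an LLM-brokered exchange would produce.
# From a network monitor's perspective this is just HTTPS to api.openai.com;
# the model, prompt, and key below are benign placeholders for illustration.
import requests

API_KEY = "sk-..."  # placeholder key

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize today's weather."}],
    },
    timeout=30,
)

# Whatever comes back is dynamic natural language or code with no fixed
# byte signature, so there is nothing stable for an IDS rule to match on.
print(response.json()["choices"][0]["message"]["content"])
```

Swap the prompt and you have the same traffic pattern as any legitimate application calling the same API, which is exactly the problem.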
Why LLM-enabled C2 Confounds Traditional Defenses
Obfuscated C2 via LLMs creates a perfect storm for traditional network and host-based defenses:

Encryption and Trusted Domains: Most AI API traffic is HTTPS, so security tools don't see the payload. Instead, many tools rely on domain reputation, and they see OpenAI, Azure, or a similar public service as benign.

Lack of Signatures: There is no fixed indicator like "the malware connects to 192.168.1.100 on port 4444." The AI domain could be one of many, and the content is dynamic natural language or code with no repetitive byte patterns. Network anomaly detection might pick up a machine suddenly talking to an API it never used before, but in an age where many apps call out to cloud APIs, this typically wouldn't raise immediate flags unless the anomaly detection could look far enough back in time to see additional odd behaviors by that entity or by some class of entities. Since machine learning systems today rarely look back beyond a handful of events, they are very unlikely to trigger such an alert (a sketch of a longer-lookback baseline follows this list).

IDS/IPS Evasion: Traditional intrusion detection signatures might look for known C2 protocols, such as specific HTTP patterns or plaintext keywords in traffic. Today's AI-adapted C2 does not match these patterns. For example, an IDS may have a rule to alert on the powershell keyword over HTTP; however, that keyword might appear only inside an encrypted AI response. Also, perhaps even simpler, the commands could be phrased differently each time, throwing off pattern-matching systems.

Traffic Volume: While traditional rules and pattern-matching ML often look for unexpected increases in traffic, more intelligent AI-enabled C2 can produce less traffic. Since the malware only calls the AI when it needs something, the traffic can be sporadic and low-bandwidth. This "low and slow" approach is hard to differentiate from normal user behavior.

Endpoint Blind Spots: Some threat hunters prefer to start at the network and work out to the endpoint, and some move in the opposite direction. So, what could an EDR see? A process, such as a script or an Office macro, makes outbound HTTPS calls to an allowed service and then spawns some system commands. Some EDRs might flag the system commands as suspicious, for example asking "why is Word spawning cmd.exe?". But if the malware injects into another process or uses a common process for execution, such as running commands through a trusted service binary, it gets harder to discern. Without understanding the longer context, namely that an AI is instructing these actions, the EDR might just see a series of legitimate operations.

No Operator Signatures: In many intrusions, human-driven C2 has timing patterns or mistakes, such as commands typed with certain typos or activity clustered around working hours. AI-driven C2 can be consistent and continuous, or unpredictable in timing. This removes some of the indicators that threat hunters look for, such as clusters of activity that suggest a human operator in a certain timezone.

Another angle: If an organization tries to block misuse of AI services, it faces a dilemma. Blocking them outright might not be feasible if users rely on them for work. And how do you differentiate malicious from legitimate usage? It gets very tricky unless you have content inspection or advanced AI to analyze the conversations, and to analyze the conversations you have to be able to see them, over time, and potentially across contexts.
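As a counterpoint to the lookback problem mentioned above, here is a hedged sketch of a longer-lookback baseline: flag the first time a host contacts an AI API domain it has never been seen talking to before. The flow-record format and the domain watchlist are assumptions for illustration, not a description of any particular product.

```python
# Sketch: per-host baselining of external destinations over the full log history.
# Assumes flow/DNS logs are already parsed into (timestamp, host, domain) tuples;
# the schema and the AI-domain watchlist below are illustrative assumptions.
from collections import defaultdict

AI_API_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def first_time_ai_contacts(flows):
    """Yield an alert the first time a host is seen contacting an AI API domain."""
    seen = defaultdict(set)  # host -> destination domains observed so far
    for ts, host, domain in sorted(flows):  # sorted by timestamp
        if domain in AI_API_DOMAINS and domain not in seen[host]:
            yield {
                "time": ts,
                "host": host,
                "domain": domain,
                "reason": "first observed contact with an AI API",
            }
        seen[host].add(domain)

# Hypothetical usage:
# alerts = list(first_time_ai_contacts(parsed_flow_records))
```

Even this simple first-seen check requires retaining and revisiting far more history than the handful of events most deployed systems consider, which is the point being made above.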
Defensive Outlook: Fighting AI with AI (again)
Countering AI-obfuscated C2 likely requires behavioral and AI-assisted detection, and more specifically deep learning:

Network Anomaly Detection: Tools could baseline what kind of external services each host typically talks to. If a desktop that never used AI APIs starts doing so, that's worth an investigation, especially if combined with other context. Obviously, existing anomaly detection systems have a reputation for enormous false positive rates.

Endpoint Analysis of AI API Usage: On endpoints, enterprises might instrument or alert on specific library calls, for example, if a process loads an OpenAI SDK or if a PowerShell script starts making web requests to known AI hostnames. These could be high-fidelity signals of something unusual, since few business apps would dynamically query an AI from a macro or script. That said, each permutation will require another signature-based response.

AI-enabled SOC, a little overhyped? A Security Operations Center could deploy its own LLMs and AI SOC tools to assist analysts. An analyst might ask an AI assistant: "I see process Y making this series of web requests and then executing commands; does this pattern match any known AI-based attack?" The AI might correlate it with known cases like BlackMamba or others from threat reports, accelerating threat hunting. That said, the innocuous phrase "does this pattern match" is trickier to make real than it appears. Will your LLM try to match on IPs? How exactly will this matching occur? At DeepTempo, we do exactly this sort of matching; to do so, we first embed the logs in a "hyperspace" and compare across 768 dimensions and over 3k events. Users can then simply ask for the top 50 sequences like a given sequence, and the system will find those that look most similar. There is no querying involved, and crafting such a query in a traditional SIEM would be nearly impossible in any case (a toy illustration of this kind of similarity search follows below).

The broader defensive strategy will likely involve adaptive policy. If AI-based C2 becomes rampant, organizations might restrict servers from accessing external AI services or require the use of internal, approved AI with logging. Zero-trust principles should also be applied: just because a call is to OpenAI doesn't mean it's safe, so we can enforce that only certain apps and users can make such calls. And if we get as creative in imagining defenses as the attackers are in imagining attacks, then what about a deep learning solution that learns from behaviors across all environments? I am reminded of Bentham's panopticon. We are far from that utopia / dystopia. Today, defenders attempt to share our understanding of what to look out for via rules and human-engineered ML systems. What if we also added a layer of deep learning that learned from all of us what is legitimate, and hence could quickly discern what is problematic?
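To make the "top 50 sequences like this sequence" idea concrete, here is a toy similarity-search sketch over sequence embeddings. The embeddings here are random stand-ins and the functions are hypothetical; this is not DeepTempo's implementation, only an illustration of nearest-neighbor search with cosine similarity.

```python
# Toy sketch of similarity search over event-sequence embeddings.
# A real system would produce the vectors with a trained deep model
# (the post mentions 768-dimensional embeddings over ~3k-event sequences);
# here they are random placeholders purely to show the search mechanics.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_similar(query_vec: np.ndarray, catalog: dict, k: int = 50):
    """Return the k catalog sequences whose embeddings are closest to the query."""
    scored = [(seq_id, cosine_similarity(query_vec, vec)) for seq_id, vec in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical usage with placeholder embeddings:
rng = np.random.default_rng(0)
catalog = {f"seq-{i}": rng.normal(size=768) for i in range(1000)}
query = rng.normal(size=768)
print(top_k_similar(query, catalog, k=5))
```

The analyst never writes a query in the SIEM sense; they hand over one sequence and get back the most similar ones, which is what makes this workflow different from rule or signature matching.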
What to do now?
To counter obfuscated AI C2, security teams should update their threat models. Maybe more importantly, you need to start a process now to wean yourself from sole reliance on rules-based indicators. Experts from DARPA and elsewhere have warned that relying on human-written rules leaves us vulnerable to more advanced attacks. And surveys now indicate that security professionals are increasingly concerned about AI-enabled attacks; for example, one study indicates that 74% believe they are already under attack, and a similar percentage admit they are likely unprepared (gca.isa.org/blog/most-cybersecurity-teams-are-unprepared-for-ai-cyberattacks). As one of our larger users put it, we need to counter AI with deep learning and collective defense. And of course we agree! In the next blog, I plan to dive further into the ways AI is being used to obscure attacks and evade legacy approaches. Your feedback and suggestions are deeply appreciated. Thank you for reading.
Invisible C2, thanks to AI-powered techniques was originally published in DeepTempo on Medium, where people are continuing the conversation by highlighting and responding to this story.