Beating XLoader at Speed: Generative AI as a Force Multiplier for Reverse Engineering

November 3, 2025

Research by: Alexey Bukhteyev

Key takeaways

XLoader remains one of the most challenging malware families to analyze. Its code decrypts only at runtime and is protected by multiple layers of encryption, each locked with a different key hidden somewhere else in the binary. Even sandboxes are no help: evasions block malicious branches, and the real C2 (command and control) domains are buried among dozens of fakes. With new versions released faster than researchers can investigate, analysis is almost always a (losing) race against time.
Generative AI flips the balance. Instead of spending days on painstaking manual analysis and writing decryption routines and reverse-engineering scripts by hand, researchers can now use AI to examine complex functions, identify algorithms, and generate working tools in just hours, accelerating the path to decrypted code, strings, and IoCs.
Check Point Research (CPR) demonstrated a new way to use ChatGPT for malware analysis directly from the web interface. By exporting IDA data and analyzing it in the ChatGPT cloud, we showed that deep static reverse engineering with AI is possible without relying on Model Context Protocol (MCP) or a live disassembler session. This approach not only removes the dependency on local heavy tooling, but also makes the results reproducible, easier to share, and more collaborative across research teams.
In this research we use a fresh XLoader 8.0 sample to demonstrate how cloud-based static analysis with ChatGPT can be combined with MCP for runtime key extraction and live debugging validation. We documented the time spent on key tasks and included real prompt examples, showing how the workflow progresses from unpacking a fully encrypted binary to recovering hidden C2 domains.

Introduction

XLoader is a widely observed malicious loader with information-stealing capabilities. It first surfaced in 2020 as a rebrand of the FormBook code base, a well-known and capable information stealer, and has since undergone substantial hardening and feature growth. In addition to the Windows variant, its developers also marketed a macOS build, though it appears far less prevalent in the wild.

XLoader is a prime example of malware that is extremely difficult to analyze. It combines several layers of protection: customized encryption with additional mixing steps, encrypted blocks disguised as valid but meaningless assembly code, obfuscated API calls, injections into system processes, and a wide set of sandbox evasion techniques. In addition, XLoader encrypts its network traffic and hides its real C2 addresses among dozens of decoys and fake domains.

An important feature of XLoader is its ongoing development. The authors release new versions regularly, changing internal mechanisms beyond recognition and adding new anti-analysis methods. As a result, previous research quickly becomes outdated. In earlier versions, extracting the configuration required pulling out a few keys using intricate algorithms. At the same time, obtaining the decrypted data only required peeling off two layers of obfuscation and encryption. Version 5 introduced a built-in packer, and in versions 6 and 7 analysts had to work through dozens of chained functions that decrypt each other, extracting intermediate keys at every stage. For someone new to XLoader, the entry barrier has become very high: on top of the analysis itself, extra time is needed for onboarding. By the time one research cycle is completed, the next iteration of the malware may already be out – and if there are significant changes, another time-consuming investigation is required.

When we began this research, XLoader version 8.0 had just been discovered. It seemed the XLoader developers were winning the race. But with the rise of generative models, we asked ourselves: can AI change the rules of the game and help us analyze such complex malware more quickly? To explore this, we applied generative AI assistance in two ways: by directly integrating with our analysis tools through a live MCP connection, and by leveraging ChatGPT’s project and file-upload capabilities to work from exported data. Each approach turned out to have distinct benefits, and together they allow us to solve reverse engineering tasks more effectively.

In this study, we focus on the second approach and show how ChatGPT without MCP can be effectively used for reverse engineering tasks, using one of the latest XLoader samples as an example.

Motivation

To defend against XLoader, it is critical to extract up-to-date Indicators of Compromise (IoCs) from each new version — real C2 domains and URLs, cryptographic keys, and version identifiers. These IoCs feed into detection signatures and help track active campaigns. The primary way to obtain IoCs is by extracting and decrypting the malware’s configuration data from samples.

The challenge is that XLoader’s constantly shifting tactics break automated extraction tools and scripts almost as soon as they’re developed. The malware authors frequently tweak encryption schemes and packing methods specifically to thwart these efforts. An automated config extractor that worked yesterday might fail today, meaning each major version demands a fresh reverse-engineering cycle.

Sandboxes offer little relief:

Aggressive evasion: XLoader checks for signs of virtual machines and analysis tools. If it detects them, the malicious branch may never run at all.
Just-in-time decryption: Critical functions and data remain encrypted in memory until just before use and revert to encrypted form shortly afterward. A sandbox might never capture them in their decrypted state. Even if you manage to dump the process memory in a sandbox, you often end up with an incoherent snapshot: pieces of encrypted and decrypted data jumbled together, missing whatever wasn’t in memory at that exact time.
Useless memory dumps: A memory dump taken at the exact moment certain API functions, such as NtResumeThread or NtAllocateVirtualMemory, are called contains code that is still almost entirely encrypted.
C2 camouflage: The malware’s real command-and-control domains are obscured among many decoys. Captured network traffic is usually incomplete and contains a lot of noise.

In short, a sandbox does not solve the problem. It does not provide a reproducible dump or a complete set of IoCs.

The most reliable method is still static analysis: unpack everything, function by function, decrypt the config, and extract the IoCs. The downside is that doing this manually for each new version is slow and painstaking. This is where we hoped generative AI could act as a force multiplier.

Two approaches to AI-assisted analysis

In recent months, many reverse engineers began integrating LLMs with IDA Pro via the Model Context Protocol (MCP) to create an AI-assisted workflow. This agentic approach allows a model to interface directly with the disassembler and debugger, but it has its own practical challenges. For example, some MCP client setups lack certain ChatGPT interface features (like Projects or file uploads), and they still rely on maintaining a live IDA session and stable connection.

We explored two complementary workflows to apply GPT-5 to unraveling XLoader:

Live MCP integration: Using MCP, we gave the LLM direct access to our analysis tools (IDA Pro, x64dbg, and a VMware instance). This allowed the AI to query the disassembler, inspect memory, and even control the debugger in real time.
“Offline” data pipeline with ChatGPT: We exported the IDA Pro database (disassembly, decompilations, strings, etc.) and the malware binary to the ChatGPT environment. We then asked ChatGPT to perform static analysis on this data and generate, refine, and execute its own Python scripts against the binary, all within its cloud sandbox. No live connection to our tools was needed.

Each approach has its own strengths. MCP offers an agentic, interactive workflow, whereas the offline pipeline provides a self-contained analysis that’s easy to share and reproduce. These approaches aren’t mutually exclusive — you can use both, picking the appropriate tool for each task.

MCP: Agentic analysis with live tools

The idea of hooking an LLM into IDA isn’t new. For example, researchers at Cisco Talos demonstrated an IDA integration with an LLM acting as a “reverse engineering sidekick”. In our setup, we used MCP to bridge ChatGPT with IDA Pro and also interface with the x64dbg debugger and a VMware virtual machine. This gave the LLM a live window into the malware’s execution.

Figure 1 – Integration of an LLM with the reverse engineering environment through MCP.

Beyond static analysis and annotating the IDA database, this live integration enabled the AI to perform the following actions:

Pull live data on demand. Set a breakpoint at a critical function to grab a decrypted buffer or a cryptographic key from memory at runtime.
Perform “experiment and observe” cycles. Hypothesize about what a function does, then substitute real runtime data from the debugger and compare the output, adjusting the analysis based on the results.
Assist with unpacking in real time. If the sample self-decrypts or unpacks code, work through those routines, dumping out intermediate values or decrypted code as soon as they appear.

However, the MCP approach isn’t without drawbacks:

Setup and resource requirements: This requires a running instance of IDA Pro and other tools. The analyst’s machine effectively becomes part of the loop, and that environment must remain up and stable.
Single-task focus: Standard IDA doesn’t support multiple independent analyses in one interface. If we want to work on two samples in parallel with AI assistance, we need two separate IDA sessions and MCP connections.
Network dependency: The workflow hinges on a reliable internet connection. A drop in connectivity or an MCP glitch can disrupt the analysis mid-stream.
Limited ChatGPT UI features: When using custom MCP clients with API-based access, we can’t utilize some of the conveniences of ChatGPT’s own interface, such as long-term project history or easy file management.

For many scenarios, these issues are manageable and the benefits of live interaction outweigh the hassles. Some solutions, such as the MCP SuperAssistant browser extension, reduce friction by bringing the ChatGPT interface and MCP connectivity together. Recently, ChatGPT introduced a Developer Mode that can use MCP directly, without third-party plugins. Regardless of whether you use a plugin or the built-in mode, the workflow still depends on a live MCP session tied to a running toolchain and stable connection.

If any of the requirements listed above are difficult to fulfill, for example, you can’t keep IDA running constantly, or you need to easily share analysis progress with a colleague who doesn’t have the same setup, then a different approach might be preferable. That’s why we developed the “offline” data pipeline as an alternative.

Offline IDA export pipeline: reverse engineering with AI in the cloud

Our second approach ditches the live connection entirely. Here the AI acts as a self-reliant analyst working from a full static snapshot of the sample.

The workflow is straightforward: we exported everything we could from our IDA Pro database into a structured format (JSON and text). This includes the disassembly and decompiler output of every function, the list of cross-references, the readable strings, and even the original binary itself. We uploaded the .zip file to ChatGPT.

For example, our export bundle included these files:

```
ida_export.zip
├── meta.json           # basic info (sample name, hashes, image base, etc.)
├── index.json          # lookup tables mapping names/EAs to function indices
├── functions.jsonl     # NDJSON: disassembly, xrefs, bytes, prototypes, etc.
├── strings.jsonl       # list of strings in the binary and their references
├── data.jsonl          # globals, arrays, named data references
├── decomp/             # decompiled pseudocode for functions (if available)
│   ├── func_or_sub_XXXXXXXX.c
│   ├── func_or_sub_YYYYYYYY.c
│   └── ...
└── sample.bin          # the malware sample itself
```
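Because functions.jsonl is NDJSON, it can be indexed one line at a time. The sketch below is our illustration, not the exporter itself: it builds the EA and name lookups that index.json describes, assuming each line carries at least the `ea`, `name` and optional `decomp_path` fields from the schema in our prompt.

```python
import json
from typing import Dict, Iterable, Tuple


def build_lookups(lines: Iterable[str]) -> Tuple[Dict[int, int], Dict[str, int], int, int]:
    """Index functions.jsonl without loading it all at once.

    Returns (by_ea, by_name, total, with_decomp):
      by_ea       maps a function start EA to its line index in functions.jsonl,
      by_name     maps a function name to its start EA,
      with_decomp counts records that point to a decomp/*.c file.
    """
    by_ea: Dict[int, int] = {}
    by_name: Dict[str, int] = {}
    with_decomp = 0
    for idx, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines but keep the original line numbering
        rec = json.loads(line)
        ea = int(rec["ea"], 16)  # "0x401000" -> 0x401000
        by_ea[ea] = idx
        by_name[rec["name"]] = ea
        if rec.get("decomp_path"):
            with_decomp += 1  # count only; the .c file itself is not opened
    return by_ea, by_name, len(by_ea), with_decomp
```

Passing an open file handle (`with open("functions.jsonl", encoding="utf-8") as fh: build_lookups(fh)`) streams the file, which mirrors the "do NOT eagerly load" instruction in the INIT step.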

In practice, it is better to upload the archive to a ChatGPT project. Files attached only in the chat can disappear after a session restart, while files in a project stay available for the whole engagement, and can be reused in different chats.

We also wrote an initial prompt explaining how the data is organized and how the AI should format its outputs (for example, proposing new function names and comments in a machine-readable JSON that we could import back into IDA). Essentially, we taught the AI how to read the phonebook we gave it, and how we wanted its notes recorded.

Below is an approximation of our prompt:

You are my reverse-engineering copilot.

I will upload a ZIP produced by an IDA Pro 9 exporter. It contains:
- meta.json
- index.json
```json
  {
    "by_name": { "<funcName>": "0x40XXXXXX", ... },
    "by_ea":   { "0x40XXXXXX": <line_index_in_functions_jsonl>, ... }
  }
```
- functions.jsonl (NDJSON; one function per line, with mnemonics/operands already plain text)
```json
  {
    "ea": "0x40XXXXXX",
    "name": "func_or_sub_XXXXXXXX",
    "prototype": "int __cdecl ...",                // if available
    "ranges": [["0xstart","0xend"]],               // function address range(s)
    "xrefs_in": ["0x...","0x..."],                 // callers (function start)
    "xrefs_out": [{"ea":"0x...","name":"..."},...],// callees (from call sites)
    "comments": [{"ea":"0x...","kind":"...","text":"..."}],
    "bb": [{"start":"0x...","end":"0x...","succ":["0x..."]},...], // basic blocks
    "insn": [
      {"ea":"0x...","bytes":"8BEC","mnem":"mov","opstr":"...","size":2,"cmt":null},
      ...
    ],
    "bytes_concat": "....",                        // all function bytes hex, no spaces
    "decomp_path": "decomp/<name>_<EA>.c"          // if Hex-Rays available
  }
```
- decomp/*.c                                         // optional

// Optional:
- strings.jsonl                                      // readable strings with code xrefs
- data.jsonl                                         // named globals/arrays
- data_index.json
```json
  {
    "by_name": { "g_DomainKeys": "0x40YYYYYY", "var_X": "0x40ZZZZZZ", ... }
  }
```
- sample.bin   // malware sample binary


## On upload (INIT)
1) Parse meta.json & index.json.
2) Stream functions.jsonl just enough to build fast lookups by EA and by name, and to count functions; do NOT eagerly load all decomp/*.c.
3) Reply with an INIT REPORT:
   - file_name, imagebase, hashes (MD5/SHA256/CRC32), compiler (if present)
   - total function count and number with decomp_path
   - confirm you’ll use Canvas artifacts for tracking changes (see below)

...

## Live suggestions (function-level, stored on Canvas)
Keep human-readable **suggestions.json** (full JSON, not JSONL) with only proposed renames/comments (no auto-apply).

Schema:
```json
{
  "meta": { "file_name":"<from meta.json>", "imagebase":"0xXXXXXXXX", "input_sha256":"<SHA256>" },
  "changes": [
    {
      "ea": "0xXXXXXXXX",               // function start EA (required)
      "name": "sub_XXXXXXXX",           // current name (optional precondition)
      "new_name": "ai_better_name",     // MUST start with "ai_"
      "comments": [                     // only new/changed comments (optional)
        { "kind":"func"|"func_rep"|"anterior"|"repeatable",
          "text":"...", "ea":"0xYYYYYYYY", "mode":"set"|"append" }
      ]
    }
  ]
}
```
...
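The schema is strict enough to be checked mechanically before anything is imported back into IDA. Below is a minimal validator sketch; `validate_suggestions` is a hypothetical helper, the checks only mirror the rules stated in the schema above (the `ai_` prefix, the allowed `kind` and `mode` values, a matching sample hash), and the IDA-side import script is not shown.

```python
from typing import List

ALLOWED_KINDS = {"func", "func_rep", "anterior", "repeatable"}
ALLOWED_MODES = {"set", "append"}


def validate_suggestions(doc: dict, expected_sha256: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks importable."""
    problems: List[str] = []
    meta_hash = doc.get("meta", {}).get("input_sha256", "")
    if meta_hash.lower() != expected_sha256.lower():
        problems.append("meta.input_sha256 does not match the analyzed sample")
    for i, change in enumerate(doc.get("changes", [])):
        where = f"changes[{i}]"
        try:
            int(change["ea"], 16)  # EA must be a hex string such as "0x401000"
        except (KeyError, TypeError, ValueError):
            problems.append(f"{where}: missing or invalid 'ea'")
        new_name = change.get("new_name")
        if new_name is not None and not new_name.startswith("ai_"):
            problems.append(f"{where}: new_name {new_name!r} must start with 'ai_'")
        for j, cmt in enumerate(change.get("comments", [])):
            if cmt.get("kind") not in ALLOWED_KINDS:
                problems.append(f"{where}.comments[{j}]: unknown kind {cmt.get('kind')!r}")
            if cmt.get("mode", "set") not in ALLOWED_MODES:
                problems.append(f"{where}.comments[{j}]: unknown mode {cmt.get('mode')!r}")
    return problems
```

Rejecting the whole file on any problem, rather than importing the valid entries, keeps a human in the loop, which matches the "no auto-apply" rule in the prompt.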

After it was set up, this pipeline allowed ChatGPT to perform deep static analysis entirely within its own environment. We could ask it to find cryptographic algorithms, trace complex control flows, or even write and execute a Python script to decrypt some data from sample.bin. Many such tasks can be done without any new information from us – the AI works off the data we provided, verifying its logic by running Python scripts as needed. If there is an error, it fixes the script and reruns the tests, repeating this until the result converges. Compared to the MCP approach, all these steps (analysis, code, test, correction) run in a single loop without dozens of local MCP calls. In our experience, this works best with GPT-5 in “Thinking” mode.

This approach had several clear benefits:

No persistent local session needed: If our IDA crashed or we closed our laptop, it didn’t matter as ChatGPT could access everything it needed from the cloud. We didn’t need to babysit a live connection.
Easily repeatable and shareable: Because the entire state of the analysis was captured in our export, anyone with the archive and the prompt could reproduce the analysis. We can even run multiple ChatGPT sessions on the same data in parallel (to explore different questions or use different prompt strategies) without interference.
Better use of ChatGPT’s features: Inside the ChatGPT interface, we can take advantage of file uploads, persistent chat history, and the editable canvas.
Collaboration-friendly: Several researchers can work on the same sample independently and later merge their results by sharing suggestions.json files, without the need to diff IDBs.
Safe script execution: ChatGPT can prototype and test analysis scripts directly on the “live” sample in a secure cloud environment and deliver a working output to the analyst.
Extremely fast onboarding: No need to set up an MCP server or configure complex integrations. The exported data and results can even be shared with colleagues who don’t have IDA installed and cannot open your IDB.
Broad applicability: The concept isn’t tied to x86 binaries or IDA. It can be adapted to virtually any platform or technology stack. For example, Go, .NET, or JavaScript samples can all be analyzed in the same way. The main challenge is properly preparing the data and providing a tailored prompt that explains how to work with it. The analysis process remains the same.

That said, the offline approach isn’t a universal magic wand. There were cases where we still needed to resort to actual debugging (and therefore MCP), for example, to confirm a guessed key or to dump something that our static analysis missed. In addition, while analyzing other malware families, we encountered situations that required continuous work in IDA with constant modifications to the live database. With the offline pipeline, we would have had to re-export the database after every round of changes, so in those cases the MCP-based approach was the more convenient alternative.

What went wrong and how we fixed it

Unsurprisingly, using an AI with an offline IDA export wasn’t without hiccups. We encountered a few issues with the AI’s behavior and solved them by adding strict rules to the prompt.

Guessing missing values.

Sometimes the model tried to invent missing data, for example, encryption keys that were computed dynamically at runtime. To prevent such “hallucinations”, we enforced an evidence-first rule: every numeric value and every algorithm must be backed by a quote from the export (functions.jsonl, decomp/*.c, or data.jsonl) with the exact EA address. If the data is not there, the model must produce a not-found report that explains where it looked and why nothing was found.
## Provenance & no-fabrication
- Any *specific* numeric/structural claim (modulus, key length, magic multipliers like 0x66666667, loop bounds) MUST be backed by direct evidence from the uploaded data:
  - Quote the exact line(s) from `functions.jsonl` (insn/mnem/opstr/bytes) or from `decomp/*.c`, and cite EA(s).
  - If the claim is not literally present, mark it **UNPROVEN** and offer a concrete verification plan.
- If you revise a claim, explicitly state what changed and show the new evidence quote. No silent edits.
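Part of this rule can be enforced on the analyst’s side as well. The hypothetical helper below checks that a quoted operand or byte string actually occurs at the cited EA in a `functions.jsonl` record. It does a literal substring match, so a claim written as `0x66666667` will not match IDA’s `66666667h` spelling unless the two are normalized first.

```python
def find_evidence(func_record: dict, ea: str, needle: str) -> bool:
    """Return True if the instruction at `ea` contains `needle`.

    `needle` is matched case-insensitively against "mnem opstr" and, with
    spaces removed, against the instruction's hex bytes. Literal match only.
    """
    target = int(ea, 16)
    for insn in func_record.get("insn", []):
        if int(insn["ea"], 16) != target:
            continue
        text = f'{insn.get("mnem", "")} {insn.get("opstr", "")}'.lower()
        hex_bytes = insn.get("bytes", "").upper()
        return needle.lower() in text or needle.replace(" ", "").upper() in hex_bytes
    return False  # no instruction at the cited EA: the quote is unsupported
```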

Shaping output to match expectations.

For example, a string-decryption routine was expected to return printable text, but due to a mistake in extracting the key, the output was corrupted. To make the output “look right,” the model applied Base64. We banned any cosmetic transformations (such as Base64) used just to make results look valid. Instead, the model must find the actual error in the keys or in the algorithm and rerun the tests until the output is correct.
## Verification contract
   * Define acceptance criteria from the task (properties/invariants).
   * Run self-checks (lengths, wrap-around, bounds monotonicity, step counts, round-trip where applicable).
   * Do NOT transform outputs to “look right”; if a check fails, proceed to the recovery loop.
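A self-check of this kind can be very small. The sketch below is an assumed acceptance test for a decrypted string candidate (the thresholds are arbitrary choices, not values from our prompt): it rejects corrupted output instead of re-encoding it, which is exactly what the contract demands.

```python
import string

PRINTABLE = set(string.printable.encode())  # ASCII letters, digits, punctuation, whitespace


def accept_decrypted_string(data: bytes, min_len: int = 3, min_ratio: float = 0.95) -> bool:
    """Acceptance check for a decrypted string candidate.

    Strips one trailing NUL terminator, then requires the rest to be long
    enough and almost entirely printable ASCII. The bytes are only inspected,
    never transformed (no Base64 or similar) to make them pass.
    """
    if data.endswith(b"\x00"):
        data = data[:-1]
    if len(data) < min_len:
        return False
    printable = sum(b in PRINTABLE for b in data)
    return printable / len(data) >= min_ratio
```

A failed check is a signal to go back to the key or the algorithm, as in the recovery loop, not to post-process the output.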

Asking the user for data that is already in the archive.

Early on, the model sometimes requested data we had already provided. We fixed this with a local-first rule: the model must search the archive files first and produce a not-found report only if the data is truly missing.
## Local-first data usage
- Treat the uploaded dataset as the primary source of truth.
- Before requesting any bytes/strings/keys from the user, attempt to obtain them from the uploaded files.
- Never ask the user for blobs that are present in data.jsonl/strings.jsonl or are trivially recoverable from functions.jsonl.
- Only if a needed EA/function is absent from the snapshot, say so and propose next steps (e.g., MCP call).

With these precautions in place, our AI “assistant” became a reliable analyst for the static portions of the work. In the next sections, we show how it performed on real challenges in XLoader 8.0, such as payload decryption and API resolution, with occasional MCP-powered dynamic checks.

GPT-5 in Practice: Analyzing XLoader’s Built-in Crypter

When working with older ChatGPT models such as o3, getting the right result required splitting the task into many small steps and explicitly telling the model what to do, down to pointing out exact code addresses and the algorithms to apply. Without this level of detail, the output was unpredictable. This approach was closer to “text-based programming” and required deep engagement on our side.

With GPT-5, however, we can pose broader and more abstract tasks. Below we show an example of XLoader’s built-in crypter analysis with a mixed approach: using the IDA export as the main data source, and MCP+x64dbg for result verification.

For this task we took a recently discovered XLoader sample with SHA256: 77db3fdccda60b00dd6610656f7fc001948cdcf410efe8d571df91dd84ae53e1. For the entire process we used GPT-5 in the “Thinking” mode.

After we gave the AI assistant the instructions for processing the data, it returned a short report:

Figure 2 – IDA export initial report.

Next, we deliberately formulated the tasks for the AI assistant as if we knew nothing at all about the sample under analysis, assuming this would reflect the actions of someone unfamiliar with XLoader.

The first prompt was written in the most abstract way possible:

Perform an initial analysis of the sample starting from the entry point and provide a short report.

Processing this simple prompt took 8 minutes and 46 seconds. As a result, our assistant correctly identified the RC4 implementation and concluded that the sample was packed. Notably, based only on the data available to it, the model suggested that the sample looked similar to XLoader, even though nothing in the archive or in the initial prompt explicitly pointed to this.

Figure 3 – Initial analysis report: Entry point analysis.

In addition, it detected API call obfuscation. While the assistant was not fully able to deobfuscate all API calls during the quick triage, in some cases it inferred the function being called from the context and its signature.

Figure 4 – Initial analysis report: Presumed call to the VirtualProtectEx function.

It also successfully identified the point where execution is handed over to the decrypted code.

Figure 5 – Initial analysis report: Call to the original entry point in the decrypted code.

At this stage, our priority was to reach the payload as quickly as possible. We therefore focused on this goal by first asking the assistant to find all cryptographic function calls, and then to analyze how exactly the payload was decrypted.

The analysis showed that the main payload block goes through two rounds of RC4: first, an RC4 decryption of the entire buffer, and then a second pass in 256-byte chunks using a different key.

Figure 6 – Initial analysis report: Description of the two rounds of RC4 encryption.

In addition, the assistant managed to collect:

The virtual address of the original entry point in the decrypted code (0x00430CB3).
The offset of the encrypted block in the binary (0x3143).
The encrypted blob size (0x44A00).

The next step was to obtain the keys at runtime and verify the result. At this point, we turned to MCP. In one of the steps, we also asked the assistant to read a section of decrypted data to validate the correctness of the static decryption.

As a result, it obtained the following keys:

Stage-1 Key	20EBC3439E2A201E6FC943EE95DACC6250A8A647
Stage-2 Key	86908CFE6813CB2E532949B6F4D7C6E6B00362EE

It also obtained a section of decrypted code:

Figure 7 – AI-controlled debugging in x64dbg.
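To make the two-round scheme concrete, here is a minimal Python sketch using textbook RC4. It rests on assumptions that the text above does not confirm: that 0x3143 is a raw file offset, that the Stage-1 key decrypts the whole buffer and the Stage-2 key the 256-byte chunks, and that the RC4 state is re-initialized for every chunk. `decrypt_payload` is a hypothetical name, not the script the assistant produced.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # pseudo-random generation, XORed into the data
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_payload(image: bytes, offset: int, size: int,
                    key1: bytes, key2: bytes, chunk: int = 256) -> bytes:
    blob = image[offset:offset + size]
    # Round 1: RC4 over the whole buffer with the Stage-1 key.
    stage1 = rc4(key1, blob)
    # Round 2: RC4 over consecutive chunks with the Stage-2 key.
    # Assumption: a fresh RC4 state per chunk.
    return b"".join(rc4(key2, stage1[k:k + chunk]) for k in range(0, len(stage1), chunk))
```

Under those assumptions, it would be called on the raw bytes of sample.bin as `decrypt_payload(image, 0x3143, 0x44A00, bytes.fromhex("20EBC3439E2A201E6FC943EE95DACC6250A8A647"), bytes.fromhex("86908CFE6813CB2E532949B6F4D7C6E6B00362EE"))`, and the result checked against the decrypted bytes captured in the debugger before trusting it.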

After the final keys were read from memory, we asked the assistant to identify where and by which algorithms the keys were generated, and to verify the analysis was correct using the real-time data obtained in the previous step.

Find how the Stage-1 and Stage-2 RC4 keys are calculated: source key material and algorithms. Please note that ALL the required data is available to you in IDA export. 
Check your assumptions using the captured realtime values. Start with the Stage-1 key.

In the end, AI produced a working script that unpacked the analyzed sample. Unfortunately, the script was not universally applicable, as the patterns it used to locate the keys were tightly bound to this particular sample. As a result, it failed when we tried to apply it to samples from other versions, requiring further manual fine-tuning.

Excluding the final step of creating a generic unpacker, the entire analysis took about 40 minutes and required 39 MCP calls. The table below lists the prompts we used and the time spent on each analysis step.

Prompt	Time consumed	Number of MCP calls
Initial prompt (instructions).	37s	0
Perform an initial analysis of the sample starting from the entry point and provide a short report.	8m 46s	0
Find and carefully inspect all calls to the found cryptographic functions.	4m 13s	0
Analyze main payload blob decryption. Make sure you inspected all touches of the encrypted blob across different functions.	4m 13s	0
Set breakpoints on required addresses and capture the required data. Also, set a breakpoint before calling the OEP and capture a small decrypted block at OEP for using it later for the decryption verification. Before start, please provide a plan.	9m 7s	39
Find how the Stage-1 and Stage-2 RC4 keys are calculated: source key material and algorithms. Please note that ALL the required data is available to you in IDA export. Check your assumptions using the captured realtime values. Start with the Stage-1 key.	6m 56s	0
Move to Stage-2 key derivation.	4m 40s	0
Implement an offline reproducer for Stage-1 and Stage-2 keys. Then implement a complete static decryptor that works directly with sample.bin: extracts the encrypted payload from the binary and decrypts it. Verify it works correctly using the captured data (OEP bytes).	1m 13s	0
Total	39m 8s	39

Analysis of the unpacked sample

We opened the unpacked sample in IDA, manually created a function at the original entry point (OEP, 0x00430CB3), named it oep_start, and applied the export script again. We also created a new ChatGPT project to analyze the unpacked sample.

Even before starting a deep dive, it was clear that IDA failed to recognize a large portion of the code, and many of the identified functions did not look valid.

Figure 8 – Unpacked XLoader sample in IDA.

This may indicate that the code is obfuscated in some way, or that the functions are encrypted. In fact, we know that XLoader uses on-the-fly function decryption, as we mentioned in the introduction. For some functions, multi-layer encryption is applied.

For the sake of the experiment, we wanted to see if the AI assistant could determine all this on its own, without our guidance. We started the analysis with the same type of prompt we used when analyzing the packed sample.

Perform an initial analysis of the sample starting from the `oep_start` (`0x00430CB3`) and provide a short report.

The initial analysis identified:

Use of obfuscated API calls.
Use of RC4 encryption with additional modifications before and after using RC4.
A “stage-1 builder” (a decryptor for encrypted functions).

Figure 9 – Triage report and the “Stage-1 builder” function (a function decryptor stub).

Function decryption scheme I

Next, we asked the assistant to focus on the logic around the so-called stage-1 builder (sub_429143) and to locate cross-references of the functions involved. The AI assistant identified 90 similar functions. These functions derive 6-byte head and tail markers, use those markers to locate a target region in memory, overwrite the markers with six NOP instructions (90 90 90 90 90 90), transform the region, and then transfer execution to a hardcoded address unique to each wrapper.

Figure 10 – Results of the function encryption scheme analysis, together with the renamed functions from the analysis.
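The wrapper behavior described above fits in a few lines. In this sketch the marker values and the region transform are parameters, because they differ per wrapper and the real transform (modified RC4 with extra mixing steps) is not reproduced here. We also assume that the transformed region lies strictly between the two markers. `decrypt_marked_region` is a hypothetical helper.

```python
from typing import Callable, Optional, Tuple

NOP6 = b"\x90" * 6  # six single-byte NOP instructions


def decrypt_marked_region(image: bytearray, head: bytes, tail: bytes,
                          transform: Callable[[bytes], bytes],
                          start: int = 0) -> Optional[Tuple[int, int]]:
    """Find head...tail, transform the bytes between them in place, NOP both markers.

    Returns (region_start, region_end) of the transformed bytes, or None if
    either marker is missing.
    """
    if len(head) != 6 or len(tail) != 6:
        raise ValueError("markers are expected to be 6 bytes long")
    h = image.find(head, start)
    if h < 0:
        return None
    t = image.find(tail, h + 6)
    if t < 0:
        return None
    region_start, region_end = h + 6, t
    plain = transform(bytes(image[region_start:region_end]))
    if len(plain) != region_end - region_start:
        raise ValueError("transform must preserve length")
    image[region_start:region_end] = plain
    image[h:h + 6] = NOP6  # patch the head marker
    image[t:t + 6] = NOP6  # patch the tail marker
    return region_start, region_end
```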

The assistant also implemented inline Python snippets and decrypted one of the encrypted functions, providing us with all the data and keys, as well as part of the decrypted code:

Figure 11 – Report on the successful decryption of one of the functions (0x00418DB3).

Interestingly, in this case, the use of MCP wasn’t even necessary, as the validity of the extracted keys can be easily verified by AI: if it’s possible to locate the start and end markers of the code after decryption, it means the keys and the algorithm were recovered correctly. Additionally, we can see that the decrypted data doesn’t appear to be random (it contains sequences like 00 00 and ff ff), which suggests the function was indeed decrypted correctly.
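The "does not look random" argument can be quantified with Shannon entropy, since RC4 output sits close to 8 bits per byte while x86 code is noticeably lower. This is a heuristic sketch: the 7.0 bits/byte threshold is our assumption, and entropy is only meaningful on buffers of at least a few hundred bytes.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits per byte, from 0.0 (constant data) to 8.0 (uniform byte distribution)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_decrypted(data: bytes, max_entropy: float = 7.0) -> bool:
    """Heuristic: real code tends to contain 00 00 / FF FF runs and has lower
    entropy than cipher output. Both conditions must hold."""
    has_runs = b"\x00\x00" in data or b"\xff\xff" in data
    return has_runs and shannon_entropy(data) < max_entropy
```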

The AI performed very well in reimplementing the algorithms, including the modified RC4 with additional tweaks, as well as in locating the keys within the provided sample. It also successfully implemented functions for detecting 6-byte markers.

However, it was unable to fully implement a universal script capable of decrypting all functions without human assistance. The issue arose in locating all the XOR modifiers required to construct the 20-byte effective RC4 key.

The challenge lies in the fact that the effective RC4 key is derived by XOR-ing its 4-byte components with a 4-byte modifier, which is unique for each encrypted function and is calculated this way:

seed_external ^ seed_internal ^ 0x6CFC3E60

Figure 12 – Calculation of the 4-byte XOR modifier for the key required to decrypt the functions.
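In code, the modifier and its application look roughly as follows. How seed_external and seed_internal are located is deliberately not shown, and we assume the five 4-byte components of the 20-byte key are little-endian DWORDs; both function names are hypothetical.

```python
import struct

XOR_CONST = 0x6CFC3E60  # constant from the modifier formula


def xor_modifier(seed_external: int, seed_internal: int) -> int:
    """seed_external ^ seed_internal ^ 0x6CFC3E60, kept to 32 bits."""
    return (seed_external ^ seed_internal ^ XOR_CONST) & 0xFFFFFFFF


def effective_key(base_key: bytes, modifier: int) -> bytes:
    """XOR each 4-byte component of a 20-byte key with the modifier.

    Assumption: components are little-endian DWORDs.
    """
    if len(base_key) != 20:
        raise ValueError("expected a 20-byte key")
    dwords = struct.unpack("<5I", base_key)
    return struct.pack("<5I", *(d ^ modifier for d in dwords))
```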

While seed_internal is always located within the wrapper function near the markers, the assistant was unable to implement a universal method for finding seed_external (see Figure 13 below), as it could be placed in various locations within the calling function and might be deliberately mixed with other constants.

Figure 13 – Report on the limitations encountered while creating the function decryption script.

We had to manually modify the script to ensure it could correctly locate all external seeds. Additionally, we modified the rules for locating the remaining constants to make the script truly robust and capable of working with other samples as well. Therefore, the AI significantly reduced the time required for analysis and script development, but at this stage, it could not fully replace a human.

It is clear that the creators of XLoader deliberately complicated the key construction process by scattering crucial constants across multiple functions, to the extent that even the AI was unable to develop an algorithm to locate them. We are not disclosing how we derive the keys, so as not to give the XLoader developers any advantage.

Finally, after applying the script, we obtained 51 functions decrypted in the first pass. Many of the decrypted functions also contained similar calls to encrypted functions. Applying the script three times in a row, we got a total of 77 decrypted functions out of the 90 initially found.

Figure 14 – Decrypted function example with a patched 6-byte head marker (six NOP instructions).

After loading the resulting sample into IDA, we can see that a significant number of code blocks are still unrecognized:

Figure 15 – Significant number of code blocks remain unrecognized by IDA.

During a quick review, we also identified several functions that still remain encrypted:

Figure 16 – Functions that remained encrypted after applying the decryption script.

It’s also worth noting that in the scheme described above, encrypted functions are located using 6-byte sequences, which are replaced with six NOP instructions after decryption. This implies that the function must have a valid, unencrypted prologue. At the same time, in Figure 16, on the right we can see an encrypted function that lacks a valid prologue. This likely indicates that a different decryption method was used.
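The marker-driven patching described for scheme I can be sketched as follows. This illustrates the idea under stated assumptions and is not XLoader's code:

- The plaintext head and tail markers are already known, because the wrapper decrypts them with the per-function key.
- The encrypted body lies strictly between the two markers.
- `decrypt` is a placeholder for the modified RC4 routine, which is not reproduced here.

```python
from typing import Callable

NOP = b"\x90"

def decrypt_marked_function(image: bytearray, head: bytes, tail: bytes,
                            decrypt: Callable[[bytes], bytes]) -> int:
    """Decrypt the block between two plaintext markers and NOP the markers.

    Returns the offset of the head marker, or -1 if either marker is missing.
    """
    start = image.find(head)
    if start < 0:
        return -1
    body_start = start + len(head)
    end = image.find(tail, body_start)
    if end < 0:
        return -1
    body = decrypt(bytes(image[body_start:end]))
    if len(body) != end - body_start:
        raise ValueError("decryption must preserve length")
    image[body_start:end] = body
    # Overwrite both 6-byte markers with NOPs, as the wrapper does.
    image[start:body_start] = NOP * len(head)
    image[end:end + len(tail)] = NOP * len(tail)
    return start
```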

Function decryption scheme II

We re-exported the database and loaded it into ChatGPT. We then uploaded the decryption log of the 77 functions and initiated the analysis with the following prompt:

In the analyzed scheme, encrypted functions are located using 6-byte sequences, which are replaced with six NOP instructions after decryption.
This implies that the function must have a valid, unencrypted prologue. Some of the encrypted functions do not have a valid prologue (e.g., `sub_407293`, `sub_411053`, `sub_415343`). 
This likely indicates that a different method is used for it. Try to find it.

It’s worth noting that we used a little trickery by pointing out the absence of a valid prologue in the prompt. Without this observation, the AI assistant was unable to identify the additional decryptors.

Figure 17 – Second decryption/patching scheme discovered.

Instead of 6-byte tags, the newly discovered scheme computes 4-byte head and tail markers using XOR. For example:

Figure 18 – Calculation of the head marker using XOR.

The head marker anchors one byte before the real entry. After decryption, the code writes a canonical prologue (55 8B EC) at the buffer pointer plus one byte and patches the tail with 90 90 90 90.

Figure 19 – Patching the head and tail markers with valid instructions.
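The repair step can be modeled on a byte buffer, as shown below. The sketch assumes the function body has already been decrypted and the plaintext 4-byte markers are still in place. `patch_scheme2` is a hypothetical helper name.

```python
PROLOGUE = bytes.fromhex("558BEC")  # push ebp; mov ebp, esp
TAIL_PATCH = b"\x90" * 4            # four NOPs

def patch_scheme2(image: bytearray, head_marker: bytes, tail_marker: bytes) -> int:
    """Write the canonical prologue and NOP the tail marker.

    Returns the offset of the repaired entry point, or -1 if a marker is missing.
    """
    head = image.find(head_marker)
    if head < 0:
        return -1
    tail = image.find(tail_marker, head + len(head_marker))
    if tail < 0:
        return -1
    entry = head + 1  # the head marker anchors one byte before the real entry
    image[entry:entry + len(PROLOGUE)] = PROLOGUE
    image[tail:tail + len(TAIL_PATCH)] = TAIL_PATCH
    return entry
```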

The function body is decrypted in two layers using ai_xform_sub_rc4_sub.

Layer 1 key: Constructed by ai_rc4key20_build (0x00404543) and then XOR-ed with a 1-byte modifier (0x36 in the example below):

Figure 20 – Calculation of the Layer 1 key for function decryption.

Layer 2 key: Constructed by concatenating the head marker with 16 zero bytes.

Example:

Wrapper 0x00415943 decrypts function 0x00415343 using:

Markers: 4c ed 39 65, a7 32 ca 6a
Key-1: 3f548513b8c7d376ec59d1a03e3313aaf6cd4262 (20B from sub_404543, then XOR 0x36)
Key-2: 4ced396500000000000000000000000000000000
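The key material in this example can be checked with a few lines of Python. The 20-byte builder in sub_404543 is not reproduced here. The sketch only shows the single-byte tweak and the Layer 2 key layout, using the values listed above.

```python
KEY1_TWEAK = 0x36  # 1-byte modifier from the example above

def xor_byte(key: bytes, tweak: int) -> bytes:
    """XOR every byte of the key with a single tweak byte."""
    return bytes(b ^ tweak for b in key)

def scheme2_layer2_key(head_marker: bytes) -> bytes:
    """Layer 2 key: the 4-byte head marker followed by 16 zero bytes."""
    if len(head_marker) != 4:
        raise ValueError("head marker must be 4 bytes")
    return head_marker + bytes(16)

# Values for the wrapper at 0x00415943 (function 0x00415343)
key1 = bytes.fromhex("3f548513b8c7d376ec59d1a03e3313aaf6cd4262")
key2 = scheme2_layer2_key(bytes.fromhex("4ced3965"))
# XOR with 0x36 is self-inverse, so this recovers the pre-tweak builder output
builder_output = xor_byte(key1, KEY1_TWEAK)
```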

Function decryption scheme III

In the latest scheme, 4-byte markers are also used to find the start and end of a function. As in the previous method, two layers of encryption are applied, and the same key is used to decrypt the first layer. However, the 20-byte key for the second layer is embedded within the wrappers (unique for each encrypted function) and is modified using a 4-byte value.

Example:

Wrapper 0x00418dc3 decrypts function 0x0040d543 using:

Markers: 9c da 5e e8, 0e e7 b2 36
Key-1: 3f548513b8c7d376ec59d1a03e3313aaf6cd4262 (20B from sub_404543, then XOR 0x36)
Key-2: c2b2622cf0608327d4e542bc4ac3d2f709e092dc (calculated in the wrapper)

Additionally, a separate function may be used to handle the decryption of the second layer and the patching process.

Figure 21 – Function used to handle the decryption of the second layer and the patching process.

We identified three distinct function decryption schemes in XLoader:

Scheme 1 – Most common. In this scheme, encrypted functions already begin with a valid prologue. A global 20-byte base key is reused across all functions, but each function also has a unique 4-byte XOR modifier. The base key and modifier are combined to derive a per-function RC4 key. A modified RC4 routine is then used with this key to decrypt two 6-byte markers that define the boundaries of the encrypted block, and the same key is subsequently applied to the function body itself. Once decrypted, the two markers are overwritten with NOPs.
Scheme 2 – Uses 4-byte markers and two encryption layers. The first layer uses a 20-byte key produced by a dedicated key-builder function, then XOR-tweaked with a single byte. The second-layer key is constructed from the 4-byte head marker concatenated with sixteen zero bytes. After decryption, the wrapper repairs the function by patching 55 8B EC at the prologue and filling the tail with NOPs.
Scheme 3 – Similar to Scheme 2, but the second-layer key is not derived from the head marker. Instead, each wrapper embeds its own 20-byte constant (five DWORDs XORed with a salt), which is used to decrypt the prefix up to a sentinel value before the same prologue/tail patching is applied.

It is worth noting that to implement universal static decryptors, we still had to break the task down into smaller steps ourselves: locating wrapper functions, extracting 20-byte keys, recovering 4-byte modifiers, and identifying and calculating marker positions. We combined these steps into a single decryptor only after confirming that each one worked reliably and produced correct data for every function. At the same time, AI significantly reduced the time we spent writing regular expressions (even though they had to be adjusted manually), as well as the time spent analyzing and reimplementing the cryptographic algorithms.

With each decryption iteration we obtained a new batch of decrypted functions, some of which contained the keys required for decrypting additional functions. By applying all three decryptors sequentially over four iterations, we ultimately succeeded in decrypting 101 functions.
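The iterative approach amounts to running the decryptors repeatedly until a full pass produces nothing new. The sketch below assumes each decryptor is a hypothetical callable that does two things:

- patches the image in place;
- returns the addresses of the functions it decrypted in that pass.

```python
from typing import Callable, Iterable, List, Set

# A decryptor patches the image in place and returns the EAs it decrypted.
Decryptor = Callable[[bytearray], Set[int]]

def decrypt_until_fixed_point(image: bytearray, decryptors: Iterable[Decryptor],
                              max_passes: int = 10) -> Set[int]:
    """Run every decryptor repeatedly until a full pass finds nothing new."""
    chain: List[Decryptor] = list(decryptors)
    decrypted: Set[int] = set()
    for _ in range(max_passes):
        found: Set[int] = set()
        for decryptor in chain:
            found |= decryptor(image)
        found -= decrypted
        if not found:
            break  # fixed point: no decryptor unlocked anything new
        decrypted |= found
    return decrypted
```

Functions decrypted in one pass can expose the keys for others, so the order of the decryptors does not need to match the dependency order.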

Unfortunately, it was not possible to accurately measure the time spent on this task, as it required a considerable amount of additional work and manual corrections. However, this stage turned out to be the most complex and time-consuming for both us and the LLM.

We also got suggestions.json with the suggested names for the analyzed functions. This is very useful: we can keep this file for other analysis sessions and easily import it into the current IDB. We can even import it into a new IDB (after decrypting the functions) without having to diff it against the old database.
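Importing the names back could look like the IDAPython sketch below. The article does not describe the format of suggestions.json. The code therefore assumes a flat JSON object mapping hexadecimal addresses to names, for example {"0x404913": "ai_dec_func_0"}. Inside IDA the sketch renames through idc.set_name. Outside IDA, a callback can be passed instead to test the logic.

```python
import json
from typing import Callable, Dict, Optional

def parse_suggestions(raw: dict) -> Dict[int, str]:
    """Convert {"0x404913": "name", ...} into {0x404913: "name", ...}."""
    return {int(ea, 16) if isinstance(ea, str) else int(ea): str(name)
            for ea, name in raw.items()}

def load_suggestions(path: str) -> Dict[int, str]:
    with open(path, encoding="utf-8") as fh:
        return parse_suggestions(json.load(fh))

def apply_suggestions(suggestions: Dict[int, str],
                      rename: Optional[Callable[[int, str], bool]] = None) -> int:
    """Apply the suggested names and return how many renames succeeded."""
    if rename is None:
        import idc  # only available inside IDA

        def rename(ea: int, name: str) -> bool:
            return bool(idc.set_name(ea, name, idc.SN_NOWARN | idc.SN_NOCHECK))
    applied = 0
    for ea, name in sorted(suggestions.items()):
        if rename(ea, name):
            applied += 1
    return applied
```

Inside IDA, the call would be `apply_suggestions(load_suggestions("suggestions.json"))`.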

We now have a fully decrypted sample, which allows us to continue the analysis using the same methodology.

API Calls deobfuscation

Now that we have a fully decrypted sample, we can apply the same approach to it. We first load suggestions.json and then perform an export. As stated earlier, even during the very first analysis of the sample (before we had obtained the decrypted functions) the assistant pointed out the presence of obfuscated API calls. The import table in this sample is empty and there are no plaintext strings that might contain library or function names.

As we created a new clean session by uploading the decrypted sample, we decided to test how reproducible the analysis results were.

Therefore, in the very first prompt, without providing any hints, we asked the assistant to identify the API call obfuscation mechanisms. As in the previous case, we specified that the analysis should begin at the OEP rather than at the start function, so that the AI assistant would not get bogged down analyzing the packer.

The IAT is empty, no plaintext strings in the sample.
Determine how the sample resolves and invokes Windows APIs. Start the analysis from `oep_start` (`0x00430CB3`).

Four minutes later, we received a description of the algorithm:

Figure 22 – Description of one of the API resolution algorithms.

Interestingly, during the first analysis (when part of the functions was still encrypted) and the second analysis, our AI assistant identified different functions responsible for API hash decryption. In the first case, it pointed to sub_404603 (later renamed to ai_apiid_decrypt_salt), while in the second case it identified only sub_4045B3 (ai_apiid_decrypt).

Next, we used the following simple prompt to generate a script for API call deobfuscation:

Implement an IDAPython script for deobfuscating API calls. Annotate the resolver and every call site with Module!Function, original ID, and EA; IDA 9+. Log all deobfuscation attempts.

We got a script with the following functionality:

Figure 23 – Description of the IDA Python script for API call deobfuscation.

As the assistant did not have access to IDA, we had to test the script manually. If there were errors, we sent the results back to the chat and asked for corrections. It took five iterations and about 20 additional minutes to obtain a fully working version.

Because the first session had pointed to a different routine (sub_404603) as the hash decryptor, we also asked the assistant to analyze this alternative path:

Analyze routine `sub_404603` as an alternate API-hash decrypter. Recover its algorithm. Find call sites. Extend the IDAPython deobfuscator.

It took another five iterations (sending back errors and corrections) and 14 minutes before we obtained a fully functional script.

XLoader uses the same hash-decryption mechanism to look for sandbox artifacts, virtual machines, and processes typical of a researcher’s environment. While fixing issues, we also added dictionary-based hash brute forcing (loading the wordlist from a separate file), which let us automatically annotate not only functions but also certain strings corresponding to specific evasion techniques:

Figure 24 – Deobfuscated API function and string identifiers.
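Dictionary-based brute forcing comes down to hashing every candidate name once and looking up the decrypted IDs in the resulting table. In the sketch below, CRC32 is only a placeholder for XLoader's hash function, which we do not reproduce. The IDs are assumed to have already been decrypted by the routines described above.

```python
import zlib
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

HashFn = Callable[[bytes], int]

def crc32_hash(data: bytes) -> int:
    """Placeholder hash; XLoader's real hash function differs."""
    return zlib.crc32(data) & 0xFFFFFFFF

def load_wordlist(path: str) -> List[str]:
    """Read one candidate name (API, process, module...) per line."""
    return Path(path).read_text(encoding="utf-8", errors="ignore").splitlines()

def build_hash_table(words: Iterable[str], hash_fn: HashFn) -> Dict[int, str]:
    """Hash every candidate once; the first name wins on collisions."""
    table: Dict[int, str] = {}
    for word in words:
        word = word.strip()
        if word:
            table.setdefault(hash_fn(word.encode("ascii", errors="ignore")), word)
    return table

def lookup(api_hash: int, table: Dict[int, str]) -> Optional[str]:
    """Return the matching name, or None if the wordlist has no candidate."""
    return table.get(api_hash)
```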

As a bonus, we received a summary of how API resolution works, describing two different methods:

Figure 25 – Description of the two methods for decrypting hashes and resolving API functions.

Overall, it took roughly one hour of the AI assistant’s work to go from the first prompt to a fully functional API deobfuscation script. This figure does not include local testing or the time spent writing prompts. For this task, the human effort was minimal.

The table below summarizes the prompts we used and the time required for each step:

Prompt	Time consumed	Notes
Initial prompt (instructions)	1m 26s
The IAT is empty, with no plaintext strings in the sample. Determine how the sample resolves and invokes Windows APIs. Start the analysis from `oep_start` (`0x00430CB3`).	4m 28s	Found only one path (ai_apiid_decrypt).
Implement an IDAPython script for deobfuscating API calls. Annotate the resolver and every call site with Module!Function, original ID, and EA; IDA 9+. Log all deobfuscation attempts.	9m 44s	The script did not work.
Multiple	20m	Local testing of the script and sending error reports, 5 iterations.
Analyze routine `sub_404603` as an alternate API-hash decrypter. Recover its algorithm. Find call sites. Extend the IDAPython deobfuscator.	4m 49s	Described the second API resolution path through ai_apiid_decrypt_salt (formerly sub_404603). The updated script did not work.
Multiple	14m 35s	Local testing of the script, sending error reports and notes on incorrect behavior, 5 iterations.
Please provide a summary on two algorithms `ai_apiid_decrypt_salt`, `ai_apiid_decrypt` -> `ai_resolve_export`	17s
Total time	55m 2s

Additional protection of critical API calls

While reviewing the API call deobfuscation results, we noticed that some functions are invoked through an interesting wrapper which was originally hidden among the encrypted functions at address 0x0040AC93 (ai_dec_func_16).

This function acts as a secure-call trampoline: it temporarily encrypts nearly the entire image before invoking a function pointer and then decrypts those same regions once the call returns. Only a tiny “island” (the space between the call-site’s return address and a marker) remains unencrypted by the per-call XOR so that execution can proceed.

Figure 26 – “Secure-call trampoline” decompiled function explained by LLM.
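The core idea of the trampoline can be simulated on a plain buffer. This is a simplified model, not XLoader's implementation. It uses a single-byte XOR key and treats the whole buffer as one region. While the callee runs, only the island stays in plaintext.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def xor_outside_island(image: bytearray, key: int, island: range) -> None:
    """XOR every byte of the image except those inside the island (in place)."""
    for i in range(len(image)):
        if i not in island:
            image[i] ^= key

def secure_call(image: bytearray, key: int, island: range, fn: Callable[[], T]) -> T:
    """Encrypt the image around the call to fn, then restore it afterwards."""
    xor_outside_island(image, key, island)      # hide the code during the call
    try:
        return fn()
    finally:
        xor_outside_island(image, key, island)  # XOR again to decrypt
```

Any memory dump taken inside `fn` sees only the island in plaintext. Once the call returns, the image is identical to what it was before.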

What makes this mechanism notable? Because the function stays encrypted nearly the entire time, it is difficult to even detect its existence without static decryption. In memory dumps, it appears only in encrypted form.

At the same time, if some security software or a sandbox hooks API calls protected by this wrapper and tries to analyze or dump the process memory at the time of the call, the mechanism effectively shields the malicious code.

In total, 20 functions are protected this way, including NTAPI routines related to processes, threads, memory, and file operations, as well as several WinSock functions. The full list is shown in the image below:

Figure 27 – The list of API calls protected by “secure-call trampoline.”

String decryption

The sample contains no readable strings. At the same time, the code features a series of short routines that share the same skeleton: the prologue allocates a small stack frame and initializes a local array (its size varies by function). This is followed by a recurring trio of calls with the same order and identical argument signatures:

Figure 28 – Example of an encrypted string in XLoader code.

Let’s analyze one of these functions. Because we started a fresh session for this task, we instructed the AI assistant to trust the existing comments that briefly describe each function’s behavior, so it doesn’t reanalyze routines that were already covered.

Analyze the functionality starting from `sub_405773`. Recurse into its callees. Trust the comments in the disassembly.

As a result, we determined that this function decrypts a string using the algorithm implemented in one of the previously decrypted routines (ai_dec_func14). Note how the AI built the call graph and described each routine’s behavior from a single short prompt, relying solely on the data in the previously prepared archive:

Figure 29 – Call graph and description of the string decryptor stub.

With a simple prompt, we readily obtain the encryption keys and the decrypted string.

Decrypt the string from `sub_405773`.

Figure 30 – String decryption result with derived key and ciphertext.

We are now confident that the AI assistant understands how the encrypted strings are stored and how the key is derived. It has also already analyzed and correctly reimplemented the decryption algorithm, verifying it on real data. We can therefore move on to implementing a script to decrypt the remaining strings.

This time we used a slightly more detailed prompt because we wanted specific information to appear both in the comments and in the console output:

Implement an IDAPython script (IDA 9+) that decrypts strings. Requirements:
- Find and annotate every call to the decrypter function `sub_4050F3` with the decrypted string.
- For each string, output debug info: encrypted buffer, XOR tweak byte (`BL`), length, and decrypted bytes.
- Print all binary data in hex.

This time we were lucky and immediately got a working script which we used to decrypt 175 strings:

Figure 31 – Decrypted strings in IDA.

The total time required for the analysis was about 20 minutes:

Prompt	Time consumed
Initial prompt (instructions)	1m 26s
Analyze the functionality starting from sub_405773. Recurse into its callees. Trust the comments in the disassembly.	7m 45s
Decrypt the string from sub_405773	5m 10s
Implement an IDAPython script (IDA 9+) that decrypts strings. Requirements: Find and annotate every call to the decrypter function sub_4050F3 with the decrypted string. For each string, output debug info: encrypted buffer, XOR tweak byte (BL), length, and decrypted bytes. Print all binary data in hex.	5m 30s
Total time	19m 51s

We now have a decrypted sample with deobfuscated strings and API calls that we can analyze like a regular binary. However, XLoader is not that simple: some data remains encrypted even at this stage.

Decrypting domain names

Extracting lists of Indicators of Compromise (IoCs) is always a critical task in malware analysis. Network indicators, such as domain names and URLs, are especially important because they help detect and classify malware through traffic analysis. That is why extracting domains is essential — even though some may be decoys or bait, or currently inactive but intended for later use.

Among the recovered strings, we see 64 Base64-encoded entries. Looking at the version history of XLoader, we find that starting from version 2.8 it began storing encrypted domain names in Base64 form. Without a doubt, these 64 Base64 strings represent domain names that we must decrypt. As early as version 4, XLoader added two additional layers of modified RC4 encryption with different keys, making the decryption more complicated. In later versions, this process became even more complex. In total, to reach the decrypted domain names we need to peel off at least five layers after first identifying where and how the keys are initialized: decrypt the functions that initialize encrypted strings, decrypt the strings themselves (which we already did earlier), base64-decode the results, and apply two more layers of decryption.

At the same time, obtaining the keys for each layer is the most difficult part, as the different pieces of data needed to generate them are scattered across multiple functions, making them hard to locate.

Before moving on, we updated our string deobfuscation script so that it also renames the functions responsible for retrieving decrypted strings and assigns a prefix ai_dec_domain_{NN} to all functions handling Base64-encoded strings. We then exported the database to prepare for domain decryption.

All calls to the ai_dec_domain_{NN} functions occur inside a single function, ai_dec_func_0 (0x00404913):

Figure 32 – Domain generation function.

We start the analysis of this function using the simplest prompt:

Analyze `ai_dec_func_0` (0x00404913)

The analysis produced a detailed description showing that ai_dec_func_0 (later renamed to ai_dec_func_0_domain_tag_generate) is XLoader’s stage-1 domain builder. For a given domain index (1..64), it pulls the matching seed string from ai_dec_domain_NN, base64-decodes it, and then runs a keyed RC4-with-diff transform. The key is a 20-byte secret stored at ctx+0x23D0, XORed byte-wise with the domain index. Here ctx is a global structure that stores keys, function addresses, and other data. The result is re-encoded to base64 and written to the output buffer. This is an intermediate artifact, not the final ASCII domain.

Figure 33 – Domain generation function analysis report.

A special “token” branch, enabled when a5 == 222, generates a path string for a URL. It emits a short pattern /<4-byte token>/ from a static table and applies the same keyed transform.

Next, we asked the assistant to reproduce the transformations implemented in ai_dec_func_0_domain_tag_generate. Because our script already retrieved the decrypted strings and added them as comments in the previous step, we requested that these strings also be used alongside the repeated string decryption:

Comments in the disassembly of ai_dec_domain_NN contain base64-encoded encrypted domain names.

Take the one from `ai_dec_domain_01`, then transform it with `ai_dec_func_0` and return the final string.

As a result, our assistant was unable to locate the key at ctx+0x23D0 on its own:

Figure 34 – AI was unable to locate the key.

This was not surprising. None of the exported data contained references to the structure’s fields, so the key can only be located by its offset in the structure or captured in a debugger.

Figure 35 – Initialization of the RC4 key for decrypting the first layer.

We therefore had to manually locate where this key was initialized and provide that information to the AI assistant.

You can find `ctx+0x23D0` initialization in `ai_dec_func_4` (`0x00407293`), also check `sub_404453`.

After the keys were extracted, we tried again:

Now you have the required key.
Take the sting from `ai_dec_domain_01`, then transform it with `ai_dec_func_0` and return the final string.

As a result, we obtained an intermediate value for the string returned by ai_dec_domain_01 (1Qo1bG/xbpI2gGY8lCzWBw==) after the first decryption layer. This value was re-encoded in Base64 as Qvm75Acm5NpYTbnYXdcvBw==:

Figure 36 – Result of reproducing the domain generation function’s behavior for index 2.

Because the string is still encrypted, and we do not yet know what happens to it next, we need to investigate further:

The string you returned is still encrypted. Trace the complete transformation chain from `ai_dec_func_0` (case 2, string `Qvm75Acm5NpYTbnYXdcvBw==`) to the final ASCII domain. 
Discover any remaining layers, locate and derive all required keys/parameters from the context/initializers, and cite function names with EAs used. 
Output the final domain, and a concise step-by-step pipeline, print all keys/IVs as hex. If any value is missing, state exactly what it is and where to read it.

As a result, we discovered that the obtained Base64 string is decoded again and then decrypted by a second layer using a 20-byte key generated inside the function ai_dec_func_11 (0x004095F3). This key is additionally XORed with SALT_DWORD. However, the initialization of SALT_DWORD is missing from ai_dec_func_11, and the assistant was therefore unable to retrieve it on its own.

Figure 37 – AI was unable to locate the Stage-2 key.

As in the previous case, we manually recovered the missing value. The complication was that instead of offset 0x25C8, the base 0x2000 was used, with 0x5C8 added later, which made the search a bit more difficult.

We provided not only the address of the function where the initialization of SALT_DWORD was assumed to occur, but also the relevant code snippets:

You can find SALT_DWORD in `ai_dec_func_20` (`0x00411053`). Please verify and continue:
.text:004111B3 81 C7 00 20 00 00                       add     edi, 2000h
...
.text:0041182B C7 87 C8 05 00 00 00 A6                 mov     dword ptr [edi+5C8h], 0C6EA600h

During the analysis, our AI assistant confirmed that the key was correct and successfully decrypted the domain corresponding to ai_dec_domain_01:

Figure 38 – Result of successful domain decryption.
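The full chain recovered so far can be summarized in a short Python sketch, with several caveats:

- Standard RC4 stands in for XLoader's modified RC4-with-diff transform, which we do not reproduce, so the code will not decrypt real samples as-is.
- The base keys are placeholders.
- `index` is the value passed to ai_dec_func_0 (1..64 in the report).
- The layer 2 salt in this sample is 0x0C6EA600, the value written at ctx+0x25C8 (0x2000 + 0x5C8) in the snippet above. We assume it is XORed into each little-endian DWORD of the key.

```python
import base64
import struct
from typing import Callable

Transform = Callable[[bytes, bytes], bytes]  # (key, data) -> data

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4, used only as a stand-in for XLoader's modified RC4."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def xor_dwords(key: bytes, salt: int) -> bytes:
    """XOR each little-endian DWORD of a 20-byte key with a 32-bit salt."""
    return struct.pack("<5I", *(d ^ salt for d in struct.unpack("<5I", key)))

def decrypt_domain(seed_b64: str, index: int, key1_base: bytes, key2_base: bytes,
                   salt_dword: int, transform: Transform = rc4) -> str:
    # Layer 1: ctx+0x23D0 key XORed byte-wise with the domain index.
    key1 = bytes(b ^ (index & 0xFF) for b in key1_base)
    intermediate = base64.b64encode(transform(key1, base64.b64decode(seed_b64)))
    # Layer 2: ai_dec_func_11 key XORed with SALT_DWORD.
    key2 = xor_dwords(key2_base, salt_dword)
    return transform(key2, base64.b64decode(intermediate)).decode("ascii")
```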

Now the AI assistant can automatically decrypt all the domains. Let’s verify this by asking it to decrypt the first 16 domains:

Decrypt domains from `ai_dec_domain_00` to `ai_dec_domain_15`. Output: table (index, src_base64, final_domain). Take per-function base64 from `ai_dec_domain_NN` disassembly comments.

In the end, we obtained a table with the fully decrypted domain names:

Index	Base64 string	Domain
00	vzdmFbhw7yOvxdFkCmo=	taxi-in[.]online
01	1Qo1bG/xbpI2gGY8lCzWBw==	taskcomputer[.]xyz
02	pl3FzS0/qAShr9lpPklP	synergydrop[.]xyz
03	HjGg/gTdG4wotWUMc2FdaQM4	royal-bet-king[.]xyz
04	DfIa6p4SMpuTl9pz88upxCs=	goldenspoon[.]click
05	j5pycQ0Z6wgJNE5bRsgHZg==	spark-stack[.]shop
06	kVA0/yW+XRq9ONJ2DHOKcHWmYC5B	lecerisierenfleur[.]net
07	gqIXlaPvhlfqj/rC45qHCivuyg==	shhiajtdaz9bhau[.]top
08	zax+KZqwt6q9KWiv9Ao=	allslotvip[.]vip
09	VnrxgJfuzU1AL7YRsrhn	debatevxtlm[.]sbs
10	qjZURKL4uqdfIR6t/6ZRdBY/s9E/qw==	theexcelconundrum[.]info
11	LiJnz4B0FuWt3hyZKCg=	runsociety[.]org
12	L/EwBoA7HTg=	avvo[.]com
13	zV9rEdBGpqvEo3v3cAnEkw==	botbuilders[.]team
14	vk923CIZnNuJkgaN4qXyZA==	hawkingonsol[.]xyz
15	Q97vXR4iHn+C7yQMHTBY5Jk=	streamingsite[.]xyz

Recall that at the very beginning of the analysis we discovered a separate branch of ai_dec_func_0 that activates when the argument a5 equals 222. In this case, a 4-character tag is generated, which later becomes part of the URL. Previously, this tag was the same for all domains and unique to a malware campaign. Now, however, each domain has its own tag.

We tried to decrypt the tags for the first 16 domains using the following prompt:

Reproduce the output of `ai_dec_func_0` for a5=222 and domain index 0..15.

In response, the assistant described how the malware generates these 4-character tags and provided a table with the first 16 tags.

Figure 39 – Description of the domain generation function in “token” mode.

Index	Source bytes (hex, 4B LE)	Output bytes (hex, 6B)	Output string
0	00 f6 76 09	2f 7a 62 75 75 2f	/zbuu/
1	62 64 68 53	2f 6e 74 64 35 2f	/ntd5/
2	2a bd 03 94	2f 78 65 72 73 2f	/xers/
3	82 7e 4e 51	2f 6c 30 6d 6c 2f	/l0ml/
4	0a 7c 87 3c	2f 38 67 6e 35 2f	/8gn5/
5	e1 5c f0 cb	2f 65 32 63 71 2f	/e2cq/
6	27 21 aa e4	2f 33 67 6c 64 2f	/3gld/
7	ca ca b2 c7	2f 6b 6d 39 34 2f	/km94/
8	51 7c 8a 6b	2f 7a 6b 33 32 2f	/zk32/
9	30 d3 9f 6a	2f 7a 74 6b 72 2f	/ztkr/
10	c8 9a be 30	2f 35 67 6d 69 2f	/5gmi/
11	2a 30 1c e3	2f 69 76 37 6e 2f	/iv7n/
12	51 7b 47 41	2f 37 74 6e 61 2f	/7tna/
13	d1 d5 0d d2	2f 6f 67 61 67 2f	/ogag/
14	8d 34 e0 6f	2f 71 30 73 66 2f	/q0sf/
15	df 4a 1e b6	2f 65 63 69 73 2f	/ecis/

The table below summarizes the prompts we used and the time required for each step:

Prompt	Time consumed	Notes
Initial prompt (instructions)	1m 31s
Analyze `ai_dec_func_0` (0x00404913)	5m 22s
Comments in the disassembly of `ai_dec_domain_NN` contain base64-encoded encrypted domain names. Take the one from `ai_dec_domain_01`, then transform it with `ai_dec_func_0` and return the final string.	11m 33s	Failed to obtain the layer 1 RC4 key
You can find `ctx+0x23D0` initialization in `ai_dec_func_4` (`0x00407293`), also check `sub_404453`.	1m 34s
Now you have the required key. Take the sting from `ai_dec_domain_01`, then transform it with `ai_dec_func_0` and return the final string.	5m 14s
The string you returned is still encrypted. Trace the complete transformation chain from `ai_dec_func_0` (case 2, string `Qvm75Acm5NpYTbnYXdcvBw==`) to the final ASCII domain. Discover any remaining layers, locate and derive all required keys/parameters from the context/initializers, and cite function names with EAs used. Output the final domain, and a concise step-by-step pipeline, print all keys/IVs as hex. If any value is missing, state exactly what it is and where to read it.	10m 2s	Failed to obtain SALT_DWORD
You can find SALT_DWORD in `ai_dec_func_20` (`0x00411053`). Please verify and continue: .text:004111B3 81 C7 00 20 00 00 add edi, 2000h … .text:0041182B C7 87 C8 05 00 00 00 A6 mov dword ptr [edi+5C8h], 0C6EA600h	3m 20s
Decrypt domains from `ai_dec_domain_00` to `ai_dec_domain_15`. Output: table (index, src_base64, final_domain). Take per-function base64 from `ai_dec_domain_NN` disassembly comments.	5m 36s
Reproduce the output of `ai_dec_func_0` for a5=222 and domain index 0..15.	3m 42s
Total time	47m 54s

Conclusion

From its initial appearance, XLoader has always been a moving target, with each new version raising the bar for security analysts and defenders. XLoader began as a two-layer puzzle but evolved into a maze of nested decryptors, scattered key material, and runtime-only code. For years, this meant that by the time researchers fully unraveled a sample, attackers were already one step ahead with the next version.

Generative AI changes this balance. By combining cloud-based analysis with occasional MCP-assisted runtime checks, we delegated a large part of the mechanical reverse engineering to an LLM. Instead of spending hours rewriting decryption routines by hand, we asked our AI model to do it and received working prototypes in minutes.

The use of AI doesn’t eliminate the need for human expertise. XLoader’s most sophisticated protections, such as scattered key derivation logic and multi-layer function encryption, still require manual analysis and targeted adjustments. But the heavy lifting of triage, deobfuscation, and scripting can now be accelerated dramatically. What once took days can now be compressed into hours.

For defenders, this is more than a productivity boost. Faster turnaround means fresher IoCs, quicker detection updates, and a shorter window of opportunity for attackers. For researchers, it lowers the entry barrier to analyzing some of the most complex malware families in the wild.

Our research shows that with the right workflows, generative models can already serve as a force multiplier — helping security defenders keep pace with threats that were once considered prohibitively time-consuming to analyze.

However, it’s too soon to declare victory, as we expect malware authors to adapt their techniques in response to AI-assisted analysis. And in turn, we’ll need to come up with the next game-changer.

Protection

Check Point Threat Emulation and Harmony Endpoint provide comprehensive coverage of attack tactics, file types, and operating systems and protect against the attacks and threats described in this report.
