Joe Desimone

How we caught the Axios supply chain attack

Joe Desimone tells the story of how he uncovered the Axios supply chain attack with a prototype tool built in one afternoon.

Preamble

Last Monday night I was working late when a Slack alert came in from a monitoring tool I had built three days earlier: Axios, one of the most popular npm packages in the world, had been compromised.

My heart started racing; I knew every second mattered if we were going to respond and limit the damage. But honestly, it was so crazy that I thought it must be a false positive. I checked and rechecked everything a few times even though it seemed very obviously malicious.

It wasn't a false positive. It was one of the largest supply chain compromises ever on npm, with presumed attribution to DPRK state actors. We caught it with a proof of concept I hacked together on a Friday afternoon, running on my laptop, powered by AI reading diffs.

I want to share the whole story. How we got here, what I built, and why I think sharing it openly makes everyone a little safer.

I've been worried about supply chain security for a while

Some recent supply chain incidents have genuinely had me up at night. Supply chain compromise is a hard problem. At Elastic we have so many developers, and our security customers are trusting us to protect them. It has been clear that the status quo is broken, and we need some new technology or procedures to help. I had some ideas around a more trusted, AI-vetted ecosystem, building on app control principles while limiting cost and friction.

But the Trivy compromise was really where I took notice. On March 19th, a group called TeamPCP compromised the aquasecurity/trivy-action GitHub Action (the one for the popular Trivy security scanner, yes, a security tool). They injected a credential stealer that harvested secrets from CI/CD pipelines. A massive amount of credentials were stolen.

That cascaded fast. On March 24th, LiteLLM got hit. TeamPCP had stolen LiteLLM's PyPI publishing credentials through the poisoned Trivy pipeline, and used them to push malicious versions that were aggressive credential stealers. SSH keys, cloud creds, API keys, wallet data, everything.

LiteLLM is a package I had used myself. So you could say at that point I was fully "up at night."

I knew that with all the credentials leaked from the Trivy breach, there were definitely going to be more attacks. We needed to do something to stay ahead of them, both for our customers and to protect Elastic.

Friday, after the red-eye

I had just flown back from RSAC 2026 in San Francisco. Red-eye flight Thursday night. If you've done a red-eye after four days of conference, you know the state I was in. However, I was excited as ever for a new project, so I sat down and hammered out v0.0.1.

The idea: monitor changes as they get pushed to package repos. Run a diff to see what changed. Use AI/LLM to determine if the changes are malicious. That's basically it.

The pipeline looks like:

  1. Poll PyPI's changelog API and npm's CouchDB _changes feed for new releases
  2. Filter against a watchlist of the top 15,000 packages by download count
  3. Download the old and new versions directly from the registry (no pip install, no npm install, no code execution)
  4. Diff them into a markdown report
  5. Send the diff to an LLM: "is this malicious?"
  6. If yes, alert to Slack
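
The steps above can be sketched in a few lines. This is a minimal illustration, not the actual tool's code; the function names (diff_versions, classify, alert) and the tiny stand-in watchlist are assumptions for the example.

```python
# Minimal sketch of the monitoring pipeline described above.
# The helper names are illustrative, not the real tool's API.

WATCHLIST = {"axios", "requests", "telnyx"}  # stand-in for the top-15k list


def should_analyze(package: str, watchlist=WATCHLIST) -> bool:
    """Step 2: only spend tokens on packages attackers are likely to target."""
    return package in watchlist


def run_pipeline(event, diff_versions, classify, alert):
    """Steps 3-6: download and diff the versions, ask the LLM, alert on malicious."""
    if not should_analyze(event["package"]):
        return None
    report = diff_versions(event["package"], event["old"], event["new"])
    verdict = classify(report)  # expected: "malicious" or "benign"
    if verdict == "malicious":
        alert(f"🚨 Supply Chain Alert: {event['package']} {event['new']}")
    return verdict
```

The real pipeline runs the download/diff/classify steps against live registry feeds; here they are injected as callables so the control flow is easy to see.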

I wanted to focus mainly on top packages since that's most likely where attackers would go anyway, and it would be much less costly in terms of tokens and compute. It was completely manageable to run on my laptop.

Why Cursor

There are a lot of agent harnesses out there. I've written my own for projects like AI malware reverse engineering. But I was very short on time, so I chose to harness up Cursor since it's one of my main dev tools. The Agent CLI lets you invoke it programmatically: pass a workspace, an instruction, and a model. I run it in ask mode (read-only) so it can only read the diff, never modify anything. The whole analysis step is a single subprocess call.

The prompt is simple. I tell it what to look for (obfuscated code, base64, exec/eval, unexpected network calls, steganography, persistence mechanisms, lifecycle script abuse) and ask it to respond with Verdict: malicious or Verdict: benign. Parse the verdict, act on it.
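
Parsing the verdict can be done with a strict pattern so a malformed model response never silently passes. A minimal sketch (the "inconclusive" fallback is my assumption about sensible defaults, not necessarily the tool's behavior):

```python
import re

# The prompt asks the model to end with "Verdict: malicious" or
# "Verdict: benign"; anything else is treated as inconclusive.
VERDICT_RE = re.compile(r"^\s*Verdict:\s*(malicious|benign)\s*$",
                        re.IGNORECASE | re.MULTILINE)


def parse_verdict(model_output: str) -> str:
    """Return 'malicious', 'benign', or 'inconclusive' for anything else."""
    matches = VERDICT_RE.findall(model_output)
    if not matches:
        return "inconclusive"
    return matches[-1].lower()  # trust the final verdict line
```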

On model selection

I normally use Opus 4.6 or GPT 5.4 for most things. Opus especially for cybersecurity-focused tasks. But I wanted to keep costs down for something that needs to analyze dozens of releases per hour.

There have been some really good blog posts from the Cursor team lately, one on fast regex search for agent tools and another on their real-time RL approach where they use actual production inference tokens as training signals and deploy improved checkpoints roughly every five hours. Genuinely impressive engineering.

So I wanted to give Composer 2 a shot. I used fast mode, which is truly fast. Perfect for a real-time use case. Low cost, fast, and effective (in my testing).

Testing on Telnyx

You have to test these things to know they'll actually work. Usually that means tweaking prompts a bunch.

I got lucky (or unlucky) with timing. On the same Friday I was building this, the telnyx PyPI package got compromised by TeamPCP. They injected 74 lines of malicious code into _client.py: payloads hidden inside WAV audio files (steganography), base64 obfuscation, a Windows persistence implant disguised as msbuild.exe, and exfiltration to a hardcoded C2.

I used the diff between the legitimate and malicious telnyx package to build out the initial prompt. The model was very good at identifying malicious changes like this. I also wanted to know immediately when a compromise was detected, so I added Slack alerting.

Monday night

I let it run over the weekend. It churned through releases, everything coming back benign.

I never got a single false positive, which is honestly strange if you've ever done detection work in cybersecurity; we're usually drowning in FPs. I intentionally instructed the LLM to alert only on "high confidence" supply chain compromises, since LLMs are generally trigger-happy out of the box. It still caught the Telnyx test case, with no FPs. That could be overfitting with such a low sample size, but there was no time to build something more robust.

Then Monday night, working late, the Slack alert came in.

🚨 Supply Chain Alert: axios 0.30.4
Verdict: MALICIOUS
npm: https://www.npmjs.com/package/axios/v/0.30.4

Did it really just find one of the biggest supply chain compromises in recent memory?

I checked the analysis. Rechecked it. Checked it again. The attackers had compromised a maintainer's npm account, changed the email to a ProtonMail account they controlled, and published two malicious versions (1.14.1 and 0.30.4). They didn't inject code directly into Axios. Instead they added a phantom dependency called plain-crypto-js that ran a postinstall hook deploying cross-platform malware. It was obviously malicious.

The response

I reached out immediately to our infosec team and research team at Elastic to get them spun up. I knew every second mattered. It turns out that when I contacted them, they had already received Elastic Defend alerts on a host that had installed the malicious package and were actively responding. But at that point nobody had realized the extent of the issue or had a root cause understanding of how the machine became infected. The monitoring tool provided that missing context.

I tried sending an email to security@npmjs and got a bounce back. Tried submitting to their security portal and got an error. I tweeted out in desperation to get a hold of a human. I also quickly opened a security issue on the axios repo itself.

Later, I saw a tweet from another researcher who had observed the compromise, and I realized I was handling this more as a vulnerability than a supply chain incident. With a vulnerability you coordinate quietly. With an active compromise that is installing malware on people's machines right now, going wide and open is the right call. So I immediately shared all the details I had compiled to X.

We even started getting alerts from our telemetry showing impacted orgs in the wild. The thing was actively running.

Fortunately, the Axios team jumped on it and pulled the packages pretty quickly. Also, the attacker's C2 server was getting so many requests that it was falling over. It could have been a lot worse.

Our team at Elastic Security Labs published full technical write-ups on the compromise. The first covers the end-to-end attack chain, the cross-platform malware, and the C2 protocol: Inside the Axios supply chain compromise - one RAT to rule them all. The second covers hunting and detection rules across Linux, Windows, and macOS: Elastic releases detections for the Axios supply chain compromise.

Where we go from here

The state of things right now is not great, and we need to do better as a whole software ecosystem, not just as a security industry.

In two weeks in March:

  • Trivy (a security scanner) was compromised to steal CI/CD secrets
  • LiteLLM was compromised using those stolen secrets
  • Telnyx was compromised in the same campaign
  • Axios, one of the most depended-upon packages in npm, was compromised by a suspected DPRK actor
  • and more

Package registries are critical infrastructure. The teams running PyPI and npm are doing great work, but the threat has moved past what current trust models can handle. We need better automated monitoring of package changes. Not just signature scanning but actually understanding what code does. LLMs are genuinely good at this, as this project shows. And we need credential rotation after breaches to happen faster. The Trivy to LiteLLM to Telnyx cascade happened because stolen creds weren't rotated quickly enough.

One practical thing you can do right now: don't pull in package updates immediately. Add a soak time. Let new versions sit for a period before your builds pick them up. We do this with our CI/CD systems at Elastic in response to shai-hulud. It won't stop everything, but it gives the community time to catch compromises before they hit your CI/CD pipelines and developer machines. The good news is that many package managers have added native support for this. For example, to enforce a 7-day delay:

npm config set min-release-age 7
pnpm config set minimum-release-age 10080
yarn config set npmMinimumReleaseAge 10080
uv --exclude-newer "7 days ago"

We're open sourcing this

We're releasing the tool: supply-chain-monitor

I want to be upfront. It's a proof of concept. I built it in an afternoon on no sleep. I don't expect anyone to run it at a production level. It requires a Cursor subscription for the LLM analysis, it processes releases sequentially, and the watchlists are static.

But the approach works. Diffing package releases in real-time and using AI to classify the changes caught a supply chain attack on one of the most popular packages in npm.

I'm sharing this because it's best for the community to learn from our experiences. If someone takes this idea and builds something better, great. If a package registry team builds it into their pipeline, even better. If it means someone else has a big save next time, this was worth it.

How it works (for the curious)

Monitoring: Two threads poll PyPI (via changelog_since_serial() XML-RPC) and npm (via CouchDB _changes feed). New releases matching the top-N watchlist get queued. State persists to last_serial.yaml so it picks up where it left off.
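
The PyPI side of that polling loop looks roughly like this. PyPI's XML-RPC changelog_since_serial(serial) returns (name, version, timestamp, action, serial) tuples; the filtering helper here is illustrative, not the tool's exact code:

```python
import xmlrpc.client


def new_releases(entries, watchlist):
    """Filter changelog tuples down to watched 'new release' events."""
    hits = []
    for name, version, _ts, action, serial in entries:
        if action == "new release" and name in watchlist:
            hits.append((name, version, serial))
    return hits


def poll_pypi(last_serial, watchlist):  # network call; sketch only
    client = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
    entries = client.changelog_since_serial(last_serial)
    return new_releases(entries, watchlist)
```

Persisting the highest serial you've seen (the last_serial.yaml mentioned above) is what lets the poller resume without gaps.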

Diffing: Old and new versions downloaded directly from registry APIs. No pip/npm install, no code execution. Archives extracted, files hashed, unified diff report generated in markdown.
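
The core of the diff step, once both versions are extracted to file trees, is standard-library difflib. A sketch that diffs two in-memory trees (path-to-text maps are an assumption for the example; the real tool works on extracted archives):

```python
import difflib

# Compare two package versions file by file into one unified-diff report.
# The packages are only extracted, never installed or executed.


def diff_trees(old_files: dict, new_files: dict) -> str:
    """old_files/new_files map relative path -> file text."""
    report = []
    for path in sorted(set(old_files) | set(new_files)):
        a = old_files.get(path, "").splitlines(keepends=True)
        b = new_files.get(path, "").splitlines(keepends=True)
        diff = list(difflib.unified_diff(a, b, f"a/{path}", f"b/{path}"))
        if diff:
            report.append("".join(diff))
    return "\n".join(report)
```

Added files (like a phantom dependency's payload) show up as all-plus hunks, which makes them stand out to both humans and the model.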

Analysis: Diff report goes to Cursor Agent CLI in read-only mode. Prompt asks it to look for supply chain indicators. Output parsed for the verdict.

Alerting: Malicious verdict fires a Slack message with the package name, rank, registry link, and analysis summary.
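
The alert itself is a plain Slack incoming-webhook POST. A sketch (the webhook URL is a placeholder you'd supply; Slack incoming webhooks accept a JSON body with a "text" field):

```python
import json
import urllib.request


def build_alert(package: str, version: str, registry_url: str) -> dict:
    """Build the Slack payload for a malicious verdict."""
    return {"text": (f"🚨 Supply Chain Alert: {package} {version}\n"
                     f"Verdict: MALICIOUS\n{registry_url}")}


def send_alert(webhook_url: str, payload: dict):  # network call; sketch only
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```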

AI in security, beyond this project

Supply chain security is a big issue, but we aren’t powerless. AI gives us new tools to defend at scale at machine speed. This project is one example of using AI to help with a security problem, but we've been doing a lot of interesting work with AI across Elastic Security more broadly. One thing I'd highlight: our team recently published a post on using Attack Discovery, Workflows, and Agent Builder to automatically detect and confirm APT-level attacks. This shows the power of the Elastic Platform, delivering agentic security to meaningfully improve the efficiency and efficacy of your SOC in a time when we are collectively drowning in attacks.


The supply-chain-monitor project is available at github.com/elastic/supply-chain-monitor.

Thanks to the Elastic Infosec team for the rapid incident response, the axios maintainers for the quick takedown, and the security community for the collective effort that limited the blast radius.