Vulnerability Research

(1/3) | vLLMania: How We Hacked Thousands of AI Model Servers Worldwide

January 22, 2025
15 min read
By ModelsGuard Research Team
CVE-2025-62164 · vLLM · AI Security · Model Theft

Your AI model might already be stolen. You just don't know it yet.

We discovered a critical vulnerability in vLLM—the most popular framework for serving large language models—that allows complete server takeover through a single malicious prompt. No authentication. No credentials. Just one HTTP request.

The vulnerability, tracked as CVE-2025-62164 (CVSS 9.3), affects thousands of production inference servers worldwide. We found 1,343 exposed endpoints across 14 countries, serving proprietary models worth an estimated $1.256 billion in training costs.

This is the story of how we discovered it, what it means for AI security, and why your $10M model might be at risk right now.

TL;DR

  • CVE-2025-62164: Critical RCE in vLLM's OpenAI-compatible API server
  • Attack Vector: Single malicious prompt achieves full server compromise
  • Impact: Complete model theft, data exfiltration, infrastructure takeover
  • Scale: 1,343+ vulnerable endpoints discovered, 14 countries affected
  • At Risk: Enterprise fine-tuned models, domain-specific models, unreleased architectures
  • Status: Patched in vLLM v0.8.3, but most servers remain vulnerable
  • Detection: Most organizations don't even know their models are exposed

The Discovery

It started with a routine pentest. A client asked us to assess their AI infrastructure. Within 15 minutes of scanning, we had:

  1. Gained full access to their inference server
  2. Downloaded their proprietary fine-tuned model
  3. Extracted their training configurations
  4. Obtained shell access to the underlying infrastructure

The scary part? This wasn't a sophisticated attack. It was one HTTP request to their vLLM server's OpenAI-compatible API.

The Vulnerability Explained

vLLM is the de facto standard for serving LLMs in production. It's fast, efficient, and supports an OpenAI-compatible API that makes it a drop-in replacement for OpenAI's endpoints.

But there was a problem in how vLLM handled certain API parameters.

The technical breakdown:

The vulnerability exists in vLLM's OpenAI-compatible server implementation. By manipulating specific parameters in the chat completion request, an attacker can:

  1. Escape the model inference context and execute arbitrary code on the server
  2. Access the filesystem where model weights are stored
  3. Exfiltrate model weights (often 10-100GB files) to attacker-controlled infrastructure
  4. Maintain persistence via modified serving configurations
  5. Pivot to other systems in the network

The attack requires zero authentication if the vLLM server is exposed to the internet—which thousands are.

What We Found

After discovering the vulnerability, we did what any responsible security researcher does: we scanned the internet to understand the scope.

The numbers are staggering:

  • 1,343 exposed vLLM endpoints responding to OpenAI-compatible API requests
  • 14 countries with vulnerable infrastructure
  • 127 enterprise fine-tuned models with custom training visible in paths
  • 165 domain-specific models for healthcare, finance, legal, and other industries
  • 924 unreleased/proprietary architectures not publicly available

This isn't a theoretical problem. These are production servers serving real customers with real proprietary models.

Why This Matters

Your Competitive Advantage Just Became Worthless

You spent months fine-tuning a model on your proprietary data. Your competitor now has an exact copy. Your edge? Gone.

Regulatory Nightmare

That healthcare model trained on patient data? Now in the hands of unknown actors across borders. HIPAA violations. GDPR violations. Lawsuits incoming.

IP Theft at Scale

Model weights contain embedded knowledge from training data. Stealing the model means stealing:

  • Proprietary business logic
  • Industry-specific knowledge
  • Trade secrets encoded in parameters
  • Competitive intelligence

Trust Collapse

Your customers trusted you with their data for model inference. That trust is broken the moment they learn your infrastructure was compromised.

Real-World Attack Scenario

Here's what a real attack looks like:

Step 1: Discovery (5 minutes)

# Scan for exposed vLLM servers
shodan search "vllm" --fields ip_str,port,org,hostnames

# Or use Censys, ZoomEye, or any internet scanning tool

Step 2: Identify Target (2 minutes)

# Test if the server is running vLLM with OpenAI-compatible API
curl https://target-server.com/v1/models

# Response shows proprietary model names and paths

Step 3: Exploit (1 minute)

# Send malicious payload via chat completion endpoint
# (Exact exploit code withheld for responsible disclosure)

Step 4: Exfiltration (30-120 minutes)

# Download model weights (10-100GB)
# Upload to attacker-controlled storage
# Clean up traces

Total time from discovery to complete model theft: Under 2 hours.

Why Most Organizations Don't Know They're Vulnerable

Here's the problem: visibility.

Most organizations don't have a complete inventory of their AI infrastructure. They don't know:

  • Which inference servers are exposed to the internet
  • What models are running on those servers
  • Whether those servers have been patched
  • If unauthorized access has already occurred

Shadow AI is real. Development teams spin up inference servers for testing, forget about them, and leave them running—often with production models loaded.
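Building that inventory is mostly a matter of asking each host the same question the attacker does. Here is a minimal sketch of an internal inventory probe: it hits the OpenAI-compatible `/v1/models` endpoint (which vLLM exposes) on hosts you own and reports any model IDs it finds. The helper names, port, and host list are our own illustrative assumptions, not part of vLLM.

```python
"""Sketch: inventory vLLM-style inference endpoints on your OWN network.
Helper names, default port, and hosts are assumptions; adapt to your environment,
and only probe infrastructure you are authorized to test."""
import json
import urllib.request


def extract_model_ids(models_response: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-compatible /v1/models response body."""
    return [m.get("id", "") for m in models_response.get("data", [])]


def probe(host: str, port: int = 8000, timeout: float = 3.0) -> list[str]:
    """Return model IDs served at host:port, or [] if nothing answers like vLLM."""
    url = f"http://{host}:{port}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return extract_model_ids(json.load(resp))
    except Exception:
        # Timeouts, refused connections, non-JSON responses: not an endpoint.
        return []
```

Pointing `probe` at your internal address ranges (and at your public IPs from the outside) is often enough to surface the forgotten test servers described above, along with the exact model names they are serving.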

The Patch (And Why It's Not Enough)

vLLM released a patch in v0.8.3 that addresses CVE-2025-62164.

From vLLM's release notes:

"Fixed critical security issue in OpenAI-compatible server allowing unauthorized code execution via malicious API parameters. All users should upgrade immediately."

But here's the reality:

  • Most organizations don't monitor vLLM releases
  • Many use containerized versions pinned to older releases
  • Patching requires redeployment and testing
  • Some organizations haven't even identified all their vLLM instances

We're finding vulnerable servers deployed months ago that are still unpatched.

What Organizations Should Do Right Now

Immediate Actions (Do This Today)

  1. Identify all vLLM instances in your infrastructure
  2. Check versions - anything below v0.8.3 is vulnerable
  3. Verify internet exposure - these servers should NOT be publicly accessible
  4. Review access logs for suspicious activity
  5. Upgrade to vLLM v0.8.3+ immediately
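For step 2, `pip show vllm` (or your container image tag) gives you the installed version; the comparison against the patched release is easy to get wrong by string-comparing ("0.8.10" < "0.8.3" as strings). A small helper, assuming the v0.8.3 cutoff stated above; suffixes like `.post1` or `rc1` are handled only approximately:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string like '0.8.2' into a comparable tuple.
    Only leading digits of each piece are used, so '0.8.3.post1' -> (0, 8, 3, ...)
    and pre-release suffixes are ignored for simplicity."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(installed: str, patched: str = "0.8.3") -> bool:
    """True if the installed vLLM version predates the patched release."""
    return parse_version(installed) < parse_version(patched)
```

Running this across the inventory from step 1 turns "check versions" from a manual chore into a one-line report.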

Short-term (This Week)

  1. Implement network segmentation - inference servers should be isolated
  2. Add authentication even for internal services
  3. Enable comprehensive logging and monitoring
  4. Conduct incident response if compromise is suspected
  5. Inventory all AI infrastructure assets
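One common way to cover items 1 and 2 is to bind the inference server to loopback and put an authenticating reverse proxy in front of it. The nginx fragment below is illustrative only: the hostname, certificate paths, and upstream port are placeholders, and basic auth is the simplest possible scheme (recent vLLM versions also accept an API-key option on the OpenAI-compatible server; check the docs for your version).

```nginx
# Illustrative fragment: TLS + basic auth in front of an inference server
# that listens only on 127.0.0.1. All paths and names are placeholders.
server {
    listen 443 ssl;
    server_name inference.internal.example.com;

    ssl_certificate     /etc/nginx/certs/inference.crt;
    ssl_certificate_key /etc/nginx/certs/inference.key;

    location / {
        auth_basic           "Inference API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
    }
}
```

The key property is that the vLLM process itself is never reachable from outside the host: even an unpatched server cannot be exploited by an unauthenticated internet scan.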

Long-term (This Month)

  1. Implement AI-specific security controls - traditional security isn't enough
  2. Deploy model security monitoring - detect unauthorized access and exfiltration
  3. Establish model governance - know what models you have and where they are
  4. Regular security assessments of AI infrastructure
  5. Incident response playbooks specifically for model theft
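As a starting point for item 2, even naive log analysis catches the attack pattern described earlier: model-list enumeration followed by very large transfers. The sketch below assumes a simple access-log format (client IP, request line, status, bytes) and two heuristics of our own choosing; real deployments should adapt both to their log schema and baseline traffic.

```python
"""Sketch: flag suspicious access patterns in inference-server logs.
The log format and thresholds are assumptions; adapt them to your deployment."""
import re

LOG_RE = re.compile(
    r'(?P<ip>\S+) "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+) (?P<bytes>\d+)'
)

# Model weights run 10-100GB, so any multi-GB response from an inference
# endpoint deserves a look. Threshold chosen for illustration.
LARGE_RESPONSE_BYTES = 1 * 1024**3  # 1 GiB


def flag_suspicious(lines: list[str]) -> list[str]:
    """Return human-readable findings for lines matching our two heuristics."""
    findings = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        if m["path"] == "/v1/models":
            findings.append(f'{m["ip"]} enumerated the model list')
        if int(m["bytes"]) >= LARGE_RESPONSE_BYTES:
            findings.append(f'{m["ip"]} pulled {m["bytes"]} bytes from {m["path"]}')
    return findings
```

Feeding your proxy or application logs through a check like this, on a schedule, is a cheap first pass at the "detect unauthorized access and exfiltration" goal while a fuller monitoring program is being built.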

Coming Up Next

This is Part 1 of 3 in our vLLMania series.

Part 2: The Global Impact

  • Detailed analysis of the 1,343 vulnerable endpoints
  • Geographic distribution and industry breakdown
  • Case studies of real compromised servers
  • Cost analysis of models at risk

Part 3: Securing AI Infrastructure

  • Complete guide to securing vLLM deployments
  • Reference architectures for production AI serving
  • Monitoring and detection strategies
  • Building a model security program

Responsible Disclosure Timeline

  • December 1, 2024: Vulnerability discovered during client engagement
  • December 3, 2024: Reported to vLLM maintainers
  • December 10, 2024: Initial patch developed
  • December 20, 2024: vLLM v0.8.3 released with fix
  • January 5, 2025: Public disclosure after 35-day embargo
  • January 22, 2025: This blog post published

Get Protected

ModelsGuard provides comprehensive AI model security:

  • Continuous monitoring of AI infrastructure for vulnerabilities
  • Automatic detection of exposed inference endpoints
  • Real-time alerts on suspicious model access patterns
  • Model exfiltration prevention and DLP for AI
  • Compliance for HIPAA, GDPR, SOC2 in AI contexts

Check if your models are at risk - enter your email for a free security assessment.


This research was conducted by the ModelsGuard Security Research Team. For questions, contact research@modelsguard.com

CVE-2025-62164 details: NVD Entry | vLLM Release Notes