Vulnerability Research

(1/3) | vLLMania: How We Hacked Thousands of AI Model Servers Worldwide

January 22, 2025
15 min read
By ModelsGuard Research Team
CVE-2025-62164 · vLLM · AI Security · Model Theft

Your AI model might already be stolen. You just don't know it yet.

We discovered a critical vulnerability in vLLM—the most popular framework for serving large language models—that allows complete server takeover through a single malicious prompt. No authentication. No credentials. Just one HTTP request.

The vulnerability, tracked as CVE-2025-62164 (CVSS 9.3), affects thousands of production inference servers worldwide. We found 1,343 exposed endpoints across 14 countries, serving proprietary models worth an estimated $1.256 billion in training costs.

This is the story of how we discovered it, what it means for AI security, and why your $10M model might be at risk right now.

TL;DR

  • CVE-2025-62164: Critical RCE in vLLM's OpenAI-compatible API server
  • Attack Vector: Single malicious prompt achieves full server compromise
  • Impact: Complete model theft, data exfiltration, infrastructure takeover
  • Scale: 1,343+ vulnerable endpoints discovered, 14 countries affected
  • At Risk: Enterprise fine-tuned models, domain-specific models, unreleased architectures
  • Status: Patched in vLLM v0.8.3, but most servers remain vulnerable
  • Detection: Most organizations don't even know their models are exposed

The Discovery

It started with a routine pentest. A client asked us to assess their AI infrastructure. Within 15 minutes of scanning, we had:

  1. Gained full access to their inference server
  2. Downloaded their proprietary fine-tuned model
  3. Extracted their training configurations
  4. Obtained shell access to the underlying infrastructure

The scary part? This wasn't a sophisticated attack. It was one HTTP request to their vLLM server's OpenAI-compatible API.

The Vulnerability Explained

vLLM is the de facto standard for serving LLMs in production. It's fast, efficient, and supports an OpenAI-compatible API that makes it a drop-in replacement for OpenAI's endpoints.

But there was a problem in how vLLM handled certain API parameters.

The technical breakdown:

The vulnerability exists in vLLM's OpenAI-compatible server implementation. By manipulating specific parameters in the chat completion request, an attacker can:

  1. Escape the model inference context and execute arbitrary code on the server
  2. Access the filesystem where model weights are stored
  3. Exfiltrate model weights (often 10-100GB files) to attacker-controlled infrastructure
  4. Maintain persistence via modified serving configurations
  5. Pivot to other systems in the network

The attack requires zero authentication if the vLLM server is exposed to the internet—which thousands are.

What We Found

After discovering the vulnerability, we did what any responsible security researcher does: we scanned the internet to understand the scope.

The numbers are staggering:

  • 1,343 exposed vLLM endpoints responding to OpenAI-compatible API requests
  • 14 countries with vulnerable infrastructure
  • 127 enterprise fine-tuned models with custom training visible in paths
  • 165 domain-specific models for healthcare, finance, legal, and other industries
  • 924 unreleased/proprietary architectures not publicly available

This isn't a theoretical problem. These are production servers serving real customers with real proprietary models.

Why This Matters

Your Competitive Advantage Just Became Worthless

You spent months fine-tuning a model on your proprietary data. Your competitor now has an exact copy. Your edge? Gone.

Regulatory Nightmare

That healthcare model trained on patient data? Now in the hands of unknown actors across borders. HIPAA violations. GDPR violations. Lawsuits incoming.

IP Theft at Scale

Model weights contain embedded knowledge from training data. Stealing the model means stealing:

  • Proprietary business logic
  • Industry-specific knowledge
  • Trade secrets encoded in parameters
  • Competitive intelligence

Trust Collapse

Your customers trusted you with their data for model inference. That trust is broken the moment they learn your infrastructure was compromised.

Real-World Attack Scenario

Here's what a real attack looks like:

Step 1: Discovery (5 minutes)

# Scan for exposed vLLM servers
shodan search "vllm" --fields ip_str,port,org,hostnames

# Or use Censys, ZoomEye, or any internet scanning tool

Step 2: Identify Target (2 minutes)

# Test if the server is running vLLM with OpenAI-compatible API
curl https://target-server.com/v1/models

# Response shows proprietary model names and paths

Step 3: Exploit (1 minute)

# Send malicious payload via chat completion endpoint
# (Exact exploit code withheld for responsible disclosure)

Step 4: Exfiltration (30-120 minutes)

# Download model weights (10-100GB)
# Upload to attacker-controlled storage
# Clean up traces

Total time from discovery to complete model theft: Under 2 hours.

Why Most Organizations Don't Know They're Vulnerable

Here's the problem: visibility.

Most organizations don't have a complete inventory of their AI infrastructure. They don't know:

  • Which inference servers are exposed to the internet
  • What models are running on those servers
  • Whether those servers have been patched
  • If unauthorized access has already occurred

Shadow AI is real. Development teams spin up inference servers for testing, forget about them, and leave them running—often with production models loaded.
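Building that inventory is mostly a matter of asking each host the same question the attacker does. Here is a minimal sketch of an internal inventory probe: it hits the OpenAI-compatible `/v1/models` endpoint (which vLLM exposes) on hosts you own and reports any model IDs it finds. The helper names, port, and host list are our own illustrative assumptions, not part of vLLM.

```python
"""Sketch: inventory vLLM-style inference endpoints on your OWN network.
Helper names, default port, and hosts are assumptions; adapt to your environment,
and only probe infrastructure you are authorized to test."""
import json
import urllib.request


def extract_model_ids(models_response: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-compatible /v1/models response body."""
    return [m.get("id", "") for m in models_response.get("data", [])]


def probe(host: str, port: int = 8000, timeout: float = 3.0) -> list[str]:
    """Return model IDs served at host:port, or [] if nothing answers like vLLM."""
    url = f"http://{host}:{port}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return extract_model_ids(json.load(resp))
    except Exception:
        # Timeouts, refused connections, non-JSON responses: not an endpoint.
        return []
```

Pointing `probe` at your internal address ranges (and at your public IPs from the outside) is often enough to surface the forgotten test servers described above, along with the exact model names they are serving.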

The Patch (And Why It's Not Enough)

vLLM released a patch in v0.8.3 that addresses CVE-2025-62164.

From vLLM's release notes:

"Fixed critical security issue in OpenAI-compatible server allowing unauthorized code execution via malicious API parameters. All users should upgrade immediately."

But here's the reality:

  • Most organizations don't monitor vLLM releases
  • Many use containerized versions pinned to older releases
  • Patching requires redeployment and testing
  • Some organizations haven't even identified all their vLLM instances

We're finding vulnerable servers deployed months ago that are still unpatched.

What Organizations Should Do Right Now

Immediate Actions (Do This Today)

  1. Identify all vLLM instances in your infrastructure
  2. Check versions - anything below v0.8.3 is vulnerable
  3. Verify internet exposure - these servers should NOT be publicly accessible
  4. Review access logs for suspicious activity
  5. Upgrade to vLLM v0.8.3+ immediately
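For step 2, `pip show vllm` (or your container image tag) gives you the installed version; the comparison against the patched release is easy to get wrong by string-comparing ("0.8.10" < "0.8.3" as strings). A small helper, assuming the v0.8.3 cutoff stated above; suffixes like `.post1` or `rc1` are handled only approximately:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string like '0.8.2' into a comparable tuple.
    Only leading digits of each piece are used, so '0.8.3.post1' -> (0, 8, 3, ...)
    and pre-release suffixes are ignored for simplicity."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(installed: str, patched: str = "0.8.3") -> bool:
    """True if the installed vLLM version predates the patched release."""
    return parse_version(installed) < parse_version(patched)
```

Running this across the inventory from step 1 turns "check versions" from a manual chore into a one-line report.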

Short-term (This Week)

  1. Implement network segmentation - inference servers should be isolated
  2. Add authentication even for internal services
  3. Enable comprehensive logging and monitoring
  4. Conduct incident response if compromise is suspected
  5. Inventory all AI infrastructure assets
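One common way to cover items 1 and 2 is to bind the inference server to loopback and put an authenticating reverse proxy in front of it. The nginx fragment below is illustrative only: the hostname, certificate paths, and upstream port are placeholders, and basic auth is the simplest possible scheme (recent vLLM versions also accept an API-key option on the OpenAI-compatible server; check the docs for your version).

```nginx
# Illustrative fragment: TLS + basic auth in front of an inference server
# that listens only on 127.0.0.1. All paths and names are placeholders.
server {
    listen 443 ssl;
    server_name inference.internal.example.com;

    ssl_certificate     /etc/nginx/certs/inference.crt;
    ssl_certificate_key /etc/nginx/certs/inference.key;

    location / {
        auth_basic           "Inference API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
    }
}
```

The key property is that the vLLM process itself is never reachable from outside the host: even an unpatched server cannot be exploited by an unauthenticated internet scan.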

Long-term (This Month)

  1. Implement AI-specific security controls - traditional security isn't enough
  2. Deploy model security monitoring - detect unauthorized access and exfiltration
  3. Establish model governance - know what models you have and where they are
  4. Regular security assessments of AI infrastructure
  5. Incident response playbooks specifically for model theft
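As a starting point for item 2, even naive log analysis catches the attack pattern described earlier: model-list enumeration followed by very large transfers. The sketch below assumes a simple access-log format (client IP, request line, status, bytes) and two heuristics of our own choosing; real deployments should adapt both to their log schema and baseline traffic.

```python
"""Sketch: flag suspicious access patterns in inference-server logs.
The log format and thresholds are assumptions; adapt them to your deployment."""
import re

LOG_RE = re.compile(
    r'(?P<ip>\S+) "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+) (?P<bytes>\d+)'
)

# Model weights run 10-100GB, so any multi-GB response from an inference
# endpoint deserves a look. Threshold chosen for illustration.
LARGE_RESPONSE_BYTES = 1 * 1024**3  # 1 GiB


def flag_suspicious(lines: list[str]) -> list[str]:
    """Return human-readable findings for lines matching our two heuristics."""
    findings = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        if m["path"] == "/v1/models":
            findings.append(f'{m["ip"]} enumerated the model list')
        if int(m["bytes"]) >= LARGE_RESPONSE_BYTES:
            findings.append(f'{m["ip"]} pulled {m["bytes"]} bytes from {m["path"]}')
    return findings
```

Feeding your proxy or application logs through a check like this, on a schedule, is a cheap first pass at the "detect unauthorized access and exfiltration" goal while a fuller monitoring program is being built.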

Coming Up Next

This is Part 1 of 3 in our vLLMania series.

Part 2: The Global Impact

  • Detailed analysis of the 1,343 vulnerable endpoints
  • Geographic distribution and industry breakdown
  • Case studies of real compromised servers
  • Cost analysis of models at risk

Part 3: Securing AI Infrastructure

  • Complete guide to securing vLLM deployments
  • Reference architectures for production AI serving
  • Monitoring and detection strategies
  • Building a model security program

Responsible Disclosure Timeline

  • December 1, 2024: Vulnerability discovered during client engagement
  • December 3, 2024: Reported to vLLM maintainers
  • December 10, 2024: Initial patch developed
  • December 20, 2024: vLLM v0.8.3 released with fix
  • January 5, 2025: Public disclosure after 35-day embargo
  • January 22, 2025: This blog post published

Get Protected

ModelsGuard provides comprehensive AI model security:

  • Continuous monitoring of AI infrastructure for vulnerabilities
  • Automatic detection of exposed inference endpoints
  • Real-time alerts on suspicious model access patterns
  • Model exfiltration prevention and DLP for AI
  • Compliance for HIPAA, GDPR, SOC2 in AI contexts

Check if your models are at risk - enter your email for a free security assessment.


This research was conducted by the ModelsGuard Security Research Team. For questions, contact research@modelsguard.com

CVE-2025-62164 details: NVD Entry | vLLM Release Notes