Pypi/Vllm | GitLab Advisory Database

vLLM vulnerable to DoS with incorrect shape of multimodal embedding inputs

Users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page). The issue has existed ever since we added support for image embedding inputs, i.e. #6613 (released in v0.5.5)

vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`

The /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests

vLLM deserialization vulnerability leading to DoS and potential RCE

A memory corruption vulnerability that leading to a crash (denial-of-service) and potentially remote code execution (RCE) exists in vLLM versions 0.10.2 and later, in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks …

vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server

A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the chat_template and chat_template_kwargs parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources.

vLLM is vulnerable to timing attack at bearer auth

The API key support in vLLM performed validation using a method that was vulnerable to a timing attack. This could potentially allow an attacker to discover a valid API key using an approach more efficient than brute force.

vLLM is vulnerable to Server-Side Request Forgery (SSRF) through `MediaConnector` class

A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target hosts. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources. This vulnerability is particularly critical in containerized environments like llm-d, where a compromised vLLM pod could …

vLLM has remote code execution vulnerability in the tool call parser for Qwen3-Coder

An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call.

vllm API endpoints vulnerable to Denial of Service Attacks

A Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user.

vLLM vulnerable to Regular Expression Denial of Service

A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking.

vLLM Tool Schema allows DoS via Malformed pattern and type Fields

The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted.

vLLM Tool Schema allows DoS via Malformed pattern and type Fields

The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted.

vLLM has a Weakness in MultiModalHasher Image Hashing Implementation

In the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This …

vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the file vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py of the vLLM project. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable.

vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the file vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py of the vLLM project. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable.

vLLM DOS: Remotely kill vllm over http with invalid JSON schema

Hitting the /v1/completions API with a invalid json_schema as a Guided Param will kill the vllm server

vLLM DOS: Remotely kill vllm over http with invalid JSON schema

Hitting the /v1/completions API with a invalid json_schema as a Guided Param will kill the vllm server

vLLM allows clients to crash the openai server with invalid regex

A denial of service bug caused the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg, but for regex instead of a JSON schema. Issue with more details: https://github.com/vllm-project/vllm/issues/17313

vLLM allows clients to crash the openai server with invalid regex

A denial of service bug caused the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg, but for regex instead of a JSON schema. Issue with more details: https://github.com/vllm-project/vllm/issues/17313

Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

vLLM Allows Remote Code Execution via PyNcclPipe Communication Service

vLLM supports the use of the PyNcclPipe class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the PyNcclCommunicator class, while CPU-side control message passing is handled via the send_obj and recv_obj methods on the CPU side. A remote code execution vulnerability exists in the PyNcclPipe service. Attackers can exploit this by sending malicious serialized data to gain server control …

Remote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration

In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a SUB ZeroMQ socket and connect to an XPUB socket on the primary vLLM host.

vLLM Vulnerable to Remote Code Execution via Mooncake Integration

vLLM integration with mooncake is vaulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. This is a similar to GHSA - x3m8 - f7g5 - qhm7, the problem is in

phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service

A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs.

phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service

A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs.

Data exposure via ZeroMQ on multi-node vLLM deployment

In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an XPUB ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts. Any client with network access to this host can connect to this XPUB socket unless its port is blocked by a …

CVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0

https://github.com/vllm-project/vllm/security/advisories/GHSA-rh4j-5rhw-hr54 reported a vulnerability where loading a malicious model could result in code execution on the vllm host. The fix applied to specify weights_only=True to calls to torch.load() did not solve the problem prior to PyTorch 2.6.0. PyTorch has issued a new CVE about this problem: https://github.com/advisories/GHSA-53q9-r3pm-6pq6 This means that versions of vLLM using PyTorch before 2.6.0 are vulnerable to this problem.

vLLM vulnerable to Denial of Service by abusing xgrammar cache

This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3 The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 …

vLLM deserialization vulnerability in vllm.distributed.GroupCoordinator.recv_object

vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability.

vLLM Deserialization of Untrusted Data vulnerability

vllm-project vllm version v0.6.2 contains a vulnerability in the MessageQueue.dequeue() API function. The function uses pickle.loads to parse received sockets directly, leading to a remote code execution vulnerability. An attacker can exploit this by sending a malicious payload to the MessageQueue, causing the victim's machine to execute arbitrary code.

vLLM allows Remote Code Execution by Pickle Deserialization via AsyncEngineRPCServer() RPC server entrypoints

vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.

vLLM denial of service via outlines unbounded cache on disk

The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server.

vLLM Allows Remote Code Execution via Mooncake Integration

When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP will allow attackers to execute remote code on distributed hosts.

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior.

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.

vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator

The vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It use torch.load function and weights_only parameter is default value False. There is a security warning on https://pytorch.org/docs/stable/generated/torch.load.html, when torch.load load a malicious pickle data it will execute arbitrary code during unpickling.

Advisories for Pypi/Vllm package

vLLM vulnerable to DoS with incorrect shape of multimodal embedding inputs

vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`

vLLM deserialization vulnerability leading to DoS and potential RCE

vLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server

vLLM is vulnerable to timing attack at bearer auth

vLLM is vulnerable to Server-Side Request Forgery (SSRF) through `MediaConnector` class

vLLM has remote code execution vulnerability in the tool call parser for Qwen3-Coder

vllm API endpoints vulnerable to Denial of Service Attacks

vLLM vulnerable to Regular Expression Denial of Service

vLLM Tool Schema allows DoS via Malformed pattern and type Fields

vLLM Tool Schema allows DoS via Malformed pattern and type Fields

vLLM has a Weakness in MultiModalHasher Image Hashing Implementation

vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`

vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in `pythonic_tool_parser.py`

vLLM DOS: Remotely kill vllm over http with invalid JSON schema

vLLM DOS: Remotely kill vllm over http with invalid JSON schema

vLLM allows clients to crash the openai server with invalid regex

vLLM allows clients to crash the openai server with invalid regex

Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

vLLM Allows Remote Code Execution via PyNcclPipe Communication Service

Remote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration

vLLM Vulnerable to Remote Code Execution via Mooncake Integration

phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service

phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service

Data exposure via ZeroMQ on multi-node vLLM deployment

CVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0

vLLM vulnerable to Denial of Service by abusing xgrammar cache

vLLM deserialization vulnerability in vllm.distributed.GroupCoordinator.recv_object

vLLM Deserialization of Untrusted Data vulnerability

vLLM allows Remote Code Execution by Pickle Deserialization via AsyncEngineRPCServer() RPC server entrypoints

vLLM denial of service via outlines unbounded cache on disk

vLLM Allows Remote Code Execution via Mooncake Integration

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator

vLLM denial of service vulnerability

vLLM Denial of Service via the best_of parameter