Skip to content

Add opt-in prompt injection detection middleware with pluggable strategyΒ #3080

@dgenio

Description

@dgenio

Summary

As MCP servers increasingly handle untrusted input from LLM-generated tool calls, there's growing need for built-in safety mechanisms. This issue proposes an opt-in middleware for detecting potential prompt injection attacks before tool execution.

Problem

MCP servers may receive malicious inputs designed to:

  • Override system instructions ("ignore previous instructions and...")
  • Exfiltrate data through tool calls
  • Escalate privileges by manipulating tool arguments

Currently, FastMCP provides no built-in protection against these attacks. Each server author must implement their own detection logic.

Proposed Solution

Add a PromptInjectionMiddleware with a pluggable detection strategy:

from fastmcp.server.middleware import PromptInjectionMiddleware
from fastmcp.server.middleware.safety import HeuristicDetector, InjectionDetector

# Default: heuristic-based detection (fast, offline, no dependencies)
mcp = FastMCP("secure-server")
mcp.add_middleware(PromptInjectionMiddleware())

# Custom: LLM-based detection (more accurate, requires API key)
class LLMDetector(InjectionDetector):
    async def detect(self, input_text: str) -> DetectionResult:
        # Call classifier LLM
        ...

mcp.add_middleware(PromptInjectionMiddleware(detector=LLMDetector()))

Detection Strategy Protocol

from typing import Protocol
from dataclasses import dataclass

@dataclass
class DetectionResult:
    is_suspicious: bool
    confidence: float  # 0.0-1.0
    reason: str | None = None
    matched_pattern: str | None = None

class InjectionDetector(Protocol):
    async def detect(self, input_text: str) -> DetectionResult: ...

# Built-in heuristic detector
class HeuristicDetector:
    """Pattern-based detection using known attack signatures."""
    
    PATTERNS = [
        r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions?",
        r"disregard\s+(everything|all)\s+(above|before)",
        r"you\s+are\s+now\s+(a|an)\s+",
        r"new\s+instructions?:\s*",
        r"system\s*:\s*",
        # ... more patterns
    ]
    
    async def detect(self, input_text: str) -> DetectionResult:
        for pattern in self.PATTERNS:
            if re.search(pattern, input_text, re.IGNORECASE):
                return DetectionResult(
                    is_suspicious=True,
                    confidence=0.7,
                    matched_pattern=pattern
                )
        return DetectionResult(is_suspicious=False, confidence=0.0)

Middleware Configuration

PromptInjectionMiddleware(
    detector=HeuristicDetector(),  # Pluggable strategy
    action="block",                # "block" | "warn" | "log"
    confidence_threshold=0.6,      # Minimum confidence to trigger action
    scan_arguments=True,           # Scan tool arguments
    scan_resources=False,          # Scan resource URIs
)

Scope (v1)

This first iteration focuses on:

  • βœ… Input detection only (not output sanitization - that's a follow-up)
  • βœ… Tool arguments (not resource content or prompt messages)
  • βœ… Heuristic default (LLM-based detector as example, not built-in)

Benefits

  • Opt-in: Zero overhead for servers that don't need it
  • Pluggable: Enterprise users can provide sophisticated detectors
  • Extensible: Community can contribute detector implementations
  • Configurable: Adjustable sensitivity and actions

Implementation Notes

  • Ship with HeuristicDetector as default (no external dependencies)
  • Document known bypass limitations of heuristic approach
  • Provide example LLMDetector implementation in docs
  • Consider publishing pattern database as separate updatable resource

Related

  • OWASP LLM Top 10: LLM01 Prompt Injection
  • Follows middleware patterns established in FastMCP
  • Complements auth middleware for defense in depth

This issue was identified during an architecture review of the FastMCP codebase.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementImprovement to existing functionality. For issues and smaller PR improvements.proposalA proposal for a feature or enhancement either requiring or seeking comments on its design.serverRelated to FastMCP server implementation or server-side functionality.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions