Pulling on the thread of MCP risks

Back in November, Anthropic introduced the Model Context Protocol (MCP) as an open standard for connecting AI assistants to data sources and tools. The goal was to give LLMs the data context they need to be effective at scale.

MCP has been described as the "USB-C of AI agents," connecting AI agents to data and tools through a common interface. Instead of one-off integrations, MCP provides a consistent communication pattern, enabling flexible tool use and smarter systems.

Yet, as with every new technology, fundamental risks have emerged, and they are now being actively exploited.

Since the release in November, the AI and security communities have detailed various risks arising from MCP, from rug pulls (silent tool redefinitions) to WhatsApp MCP server attacks enabling message-history exfiltration. As with smart homes, connected cars, and other waves of connected technology, old risks are resurfacing in a new context.

In the long list of possible attacks, there's a consistent thread tying many risks together: the lack of authentication (authn) and authorization (authz).

Nearly every resource on securing MCP recommends the same things: access controls, least privilege, service-identity authentication, and other access-related precautions.

Fundamentally, the risk is that an agent can, either unintentionally or through malicious coercion, take action on data beyond its intended scope. At the end of the day, MCP is about data access, and data-access risk by an LLM is the prevalent theme.

How this played out with the GitHub MCP vulnerability

On May 26, Invariant Labs disclosed a vulnerability in the GitHub MCP integration where an attacker can hijack a user's agent via a malicious GitHub issue and prompt it into leaking data from private repositories.

How does it work?

In summary, the vulnerability is an example of prompt injection: an attacker embeds malicious instructions in content the model processes, causing it to make unintended tool calls.

In this case, an attacker creates a malicious issue containing a prompt injection on a user's public GitHub repo. When the repo owner queries their agent, it fetches the malicious issue, and the injected instructions coerce the agent into pulling private-repo data into context. The data is then leaked through an auto-created PR in the public repo.

Why could it happen?

At its core, what we're seeing is a lack of granular permissions.

Connecting Claude Desktop to GitHub involves adding a JSON block to your User Settings (JSON) file:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-github"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "github_pat_..."
      }
    }
  }
}

Here, Claude Desktop authenticates to GitHub via a personal access token (PAT). The PAT's permissions, which you set when creating it, determine which actions the agent is authorized to take.
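As a rough sketch of what that authentication looks like on the wire (the token value is a placeholder, not a real credential), every API call the MCP server makes carries the PAT in an Authorization header, and GitHub enforces only the scopes attached to that token:

```python
def github_request_headers(pat: str) -> dict:
    """Headers the GitHub REST API expects for PAT-authenticated calls."""
    return {
        "Authorization": f"Bearer {pat}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }

# Every proxied or direct call the agent makes carries this token;
# GitHub can enforce nothing finer than the PAT's own permission scopes.
headers = github_request_headers("github_pat_example")
```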

In the case of the GitHub MCP vulnerability, if your PAT grants access to every repository in your account, a prompt injection can potentially be used to access data from any repo. However, if the PAT only grants access to public repos, a prompt injection attempting to access private repos would fail.

Why permissions aren't enough

The challenge is that you can only apply permissions at the level of granularity the base system (e.g., GitHub) exposes. What makes prompt injection exploitable is the inability to scope permissions tightly enough.

In GitHub, when you create a PAT or GitHub App, you can choose from three levels of access per permission: No access, Read-only, and Read and Write.

GitHub PAT permission levels showing No access, Read-only, and Read and Write options

When you give write access to the contents permission, for example, you provide authorization across multiple endpoints.

GitHub API endpoints accessible via the contents permission

Most GitHub Apps or PAT use cases (such as connecting Claude to the GitHub MCP server) need a combination of permissions across various endpoints. A security code-review bot that commits a fix, opens a PR, and posts a summary comment needs read and write access to both contents and pull requests.

The bot gains the ability to make commits through the contents endpoint.

GitHub contents endpoint showing commit creation capability

It gets access to create a PR through the pull requests endpoint.

GitHub pull requests endpoint showing PR creation access

One risk is that you also have to grant the ability to merge PRs, since merging goes through the contents endpoint.

GitHub contents endpoint showing PR merge capability

However, given there's no way to break out these permissions, an organization must accept the risk or implement additional mitigating controls (which often come with limitations).
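To make that coupling concrete, here's a hypothetical sketch (the action names and the mapping are ours, following the article's description, not GitHub's API) of which coarse permission grant each of the review bot's actions requires. The point: the grants needed for the intended actions also unlock merging.

```python
# Illustrative mapping of agent actions to the coarse GitHub permission
# grant each requires. Names and mapping are hypothetical, mirroring the
# review-bot example above; GitHub's real permission model is coarser
# than per-action.
REQUIRED_PERMISSION = {
    "create_commit": ("contents", "write"),
    "open_pr": ("pull_requests", "write"),
    "comment_on_pr": ("pull_requests", "write"),
    "merge_pr": ("contents", "write"),  # same grant as committing
}

def grants_needed(actions):
    """Distinct permission grants a PAT must carry for these actions."""
    return {REQUIRED_PERMISSION[a] for a in actions}

# The bot's intended actions...
intended = ["create_commit", "open_pr", "comment_on_pr"]
# ...require grants that also unlock merging, which we never intended:
assert REQUIRED_PERMISSION["merge_pr"] in grants_needed(intended)
```

Because "create_commit" and "merge_pr" sit behind the same grant, there is no PAT configuration that allows one without the other.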

Applying this back to the GitHub vulnerability: prompt injection can expose data from private repos, and the way to prevent that particular exposure is to permission repo access correctly.

Preventing a prompt injection that coerces the agent into merging PRs, however, is not possible through GitHub's permission system without severely limiting the agent's functionality.

What can we do?

A solution to a limited permission system is a proxy layer that enforces an additional set of permissions on top of what the system natively provides. This way, you're not limited by GitHub's granularity and can fully implement least privilege.
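As a sketch of what such a proxy could enforce, here's a minimal allow/deny check over proxied GitHub API calls. The rule format is invented for illustration (it is not Formal's actual policy language), but the GitHub endpoint paths are real: PR creation is a POST to `/repos/{owner}/{repo}/pulls`, and merging is a PUT to `/repos/{owner}/{repo}/pulls/{n}/merge`.

```python
import re

# Hypothetical deny rules a permission proxy could apply between the
# MCP server and GitHub: block PR merges while letting everything else
# the PAT allows pass through.
DENY_RULES = [
    ("PUT", re.compile(r"^/repos/[^/]+/[^/]+/pulls/\d+/merge$")),
]

def allowed(method: str, path: str) -> bool:
    """Return False if any deny rule matches the proxied API call."""
    return not any(
        method == m and pattern.match(path) for m, pattern in DENY_RULES
    )

# Creating a PR still passes through to GitHub...
assert allowed("POST", "/repos/acme/app/pulls")
# ...but merging one is stopped at the proxy, regardless of PAT scope.
assert not allowed("PUT", "/repos/acme/app/pulls/42/merge")
```

The key property is that the deny decision happens outside GitHub, so it can be as granular as the API's paths and methods rather than GitHub's permission buckets.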

Take for example a PAT which has read and write access to contents and pull requests.

GitHub PAT configuration with read and write access to contents and pull requests

We set up the GitHub MCP server for Claude with the following JSON:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-github"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "github_pat_...",
        "GITHUB_API_URL": "https://formal-proxy.example.com/github"
      }
    }
  }
}

With the GitHub MCP server configured, we asked Claude to create a new PR to update the README with MCP security details (an approved action).

Claude Desktop chat requesting creation of a new PR with MCP security details Claude Desktop response showing PR creation in progress

And it was successful!

Successful PR creation confirmation in Claude Desktop

Then we asked Claude to merge a PR. This is an action we don't want it to take, but one it is able to perform given the PAT's permissions.

Claude Desktop chat requesting to merge an open PR Claude Desktop confirming it can merge the PR with given permissions

One proxy to rule them all

In a normal GitHub MCP setup, Claude has access to the MCP server, which in turn uses the PAT to access GitHub to fetch data and take actions.

Architecture diagram showing normal MCP flow: Claude to MCP Server to GitHub

With Formal, you put a proxy between the GitHub MCP server and GitHub. In this scenario, the GitHub MCP server accesses GitHub through a resource defined in Formal. This lets you place additional policy controls on the actions Claude can take and gives you visibility into what agents are doing.

Architecture diagram showing Formal proxy flow: Claude to MCP Server to Formal Proxy to GitHub

With this, we built a policy that lets the agent take actions on GitHub but specifically blocks the ability to merge PRs.

Formal policy configuration blocking PR merge capabilities

We then prompted Claude to take the same action as before: merging an open PR. This time, the agent lacked the granular permission to merge the PR!

Claude Desktop attempting to merge a PR with Formal policy enforcement active Formal policy preventing the unauthorized PR merge action

The value here is that the agent can still create new PRs, since the policy allows it. The goal is not to limit the agent's abilities but to provide the right guardrails within which it can operate.

Takeaways

MCP is becoming the standard way for AI agents to access data sources. Yet the push for granular permissions doesn't truly mitigate risk because of the inherent limitations of each underlying system. Even with OAuth, the agent simply inherits the permissions of the user, which are often too broad.

We believe that there's a need for a centralized agent permission control system that allows users to granularly implement least privilege beyond the capabilities defined by a system.

This is why our focus at Formal has always been to decouple access from the underlying system itself. We're excited for the next evolution of data connections for LLMs, and we believe that no matter what shape of data security risks come next, Formal will be able to secure the flow of data.