March 26, 2026

AWS Keeps Breaking Its Own Trust Boundaries

blast-radius, iam, privilege-escalation, lateral-movement

AWS publishes security bulletins when vulnerabilities are found in their services, SDKs, and open-source projects. They're typically short, technical, and narrowly scoped: here's the CVE, here's the fix, upgrade now.

We read all 20 bulletins published between October 2025 and March 2026. Individually, they look like isolated bugs. Read together, a pattern emerges that should concern anyone running workloads on AWS.

The dominant theme isn't memory corruption or cryptographic weakness. It's trust boundary failures that chain a minor foothold into major privilege escalation: the same blast radius problem we keep seeing in real-world breaches.

The pattern

A trust boundary failure occurs when a system grants access, privileges, or capabilities beyond what was intended because it doesn't properly verify who is asking or what they're allowed to reach. In AWS, this typically manifests as:

An IAM trust policy that's broader than intended
An API that exposes sensitive data to callers who shouldn't have it
A credential that's accessible from a context where it shouldn't be
A service that executes code on behalf of a principal without validating the source

Across 20 bulletins, we found at least 7 that clearly follow this pattern. Here are the most instructive.

Case 1: Any principal with sts:AssumeRole becomes admin

Bulletin: AWS-2025-031 (CVE-2025-14503): Overly Permissive Trust Policy in Harmonix on AWS

Harmonix is an AWS-published open-source developer platform built on CNCF Backstage. When it provisions EKS environments, it creates an IAM role named *-eks-*-provisioning-role with administrative privileges for cluster setup. The trust policy on that role looked something like this:

{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
  "Action": "sts:AssumeRole"
}

This is a subtlety that catches even experienced AWS engineers. In IAM, the account root principal (arn:aws:iam::ACCOUNT:root) isn't just the root user. It's a delegation to the account's IAM policies. Any principal in the account whose identity policy allows sts:AssumeRole on this role (or on *) can assume it. That includes Lambda execution roles, ECS task roles, EC2 instance profiles, and developer IAM users. In effect, the role's trust boundary is defined by every identity policy in the account that grants sts:AssumeRole, not by the trust policy itself.

The attack chain:

Any IAM principal with sts:AssumeRole
  → Assume *-eks-*-provisioning-role
    → Administrative privileges over EKS cluster
      → Kubernetes cluster-admin
        → Secrets, workloads, service accounts

This is a textbook lateral movement path, the kind an automated assessment would find in seconds by attempting AssumeRole against every role in the account, which is exactly what our agent does.

AWS fixed it in Harmonix v0.4.2 by scoping the trust policy to specific service roles rather than the account root. But here's the uncomfortable question: how many roles in your account trust the account root? You can check:

aws iam list-roles --query 'Roles[?AssumeRolePolicyDocument.Statement[?Principal.AWS==`arn:aws:iam::YOUR_ACCOUNT_ID:root`]].[RoleName,Arn]' --output table

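If the output is empty, don't assume you're safe. The filter only matches when Principal.AWS is a single string, so a statement that lists several principals slips through. The IAM API itself also returns trust policy documents URL-encoded, so any tooling that doesn't decode them first can't query them reliably. A more thorough approach is to iterate with get-role per role, or use a tool like access-undenied or our own agent to enumerate assumable roles by actually testing the AssumeRole calls.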
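As a minimal sketch of a more thorough static check, the Python below scans trust policy documents that have already been decoded into dictionaries. It assumes the roles are shaped like the entries boto3's list_roles returns (a RoleName plus a decoded AssumeRolePolicyDocument). The function name roles_trusting_account_root and the sample data are illustrative and do not come from the bulletin.

```python
import re

# Matches the account-root principal ARN for any 12-digit account ID.
ROOT_ARN = re.compile(r"^arn:aws:iam::(\d{12}):root$")


def _as_list(value):
    """IAM allows either a single value or a list in most policy fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def roles_trusting_account_root(roles):
    """Return (role_name, account_id) pairs for Allow statements that let
    an account root principal call sts:AssumeRole."""
    findings = []
    for role in roles:
        doc = role.get("AssumeRolePolicyDocument", {})
        for stmt in _as_list(doc.get("Statement")):
            if stmt.get("Effect") != "Allow":
                continue
            actions = _as_list(stmt.get("Action"))
            if not any(a in ("sts:AssumeRole", "sts:*", "*") for a in actions):
                continue
            principal = stmt.get("Principal", {})
            if not isinstance(principal, dict):
                continue  # "Principal": "*" is a separate (worse) problem
            for arn in _as_list(principal.get("AWS")):
                match = ROOT_ARN.match(arn)
                if match:
                    findings.append((role["RoleName"], match.group(1)))
    return findings


# Illustrative input: one role trusts the account root inside a list of
# principals, which a simple string-equality filter would miss.
sample_roles = [
    {
        "RoleName": "demo-eks-provisioning-role",
        "AssumeRolePolicyDocument": {
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": [
                    "arn:aws:iam::123456789012:role/ci",
                    "arn:aws:iam::123456789012:root",
                ]},
                "Action": "sts:AssumeRole",
            }]
        },
    },
    {
        "RoleName": "lambda-exec",
        "AssumeRolePolicyDocument": {
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }]
        },
    },
]

findings = roles_trusting_account_root(sample_roles)
```

The sketch ignores Condition blocks, so a flagged statement may be narrower than it looks. Treat the output as a list of roles to review, not as proof that a role is assumable.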

Case 2: A Describe call that enables code execution

Bulletin: 2026-004-AWS (CVE-2026-1777, CVE-2026-1778): Security Findings in SageMaker Python SDK

This bulletin contains two separate findings, and both are trust boundary failures.

CVE-2026-1777: SageMaker's remote functions feature lets you decorate a local Python function with @remote and have it execute as a SageMaker training job. The function arguments and return values are serialized using cloudpickle, which extends Python's pickle protocol, and pickle can execute arbitrary code during deserialization. The only thing preventing a malicious payload from executing was an HMAC signature, and the per-job HMAC key was stored as a training job environment variable.

The problem: the DescribeTrainingJob API returns environment variables in its response. Anyone with sagemaker:DescribeTrainingJob permission could extract the HMAC key. But extracting the key alone isn't enough. The attacker also needs s3:PutObject on the training job's output bucket to overwrite the serialized result with their forged payload. The full chain requires both read and write permissions:

DescribeTrainingJob (read-only API call)
  → Response includes Environment.REMOTE_FUNCTION_SECRET_KEY
    + s3:PutObject to the job's output bucket
      → Forge cloudpickle payload with valid HMAC
        → Overwrite serialized result in S3
          → Arbitrary code execution when caller deserializes

Affected versions: SageMaker Python SDK v3 (before v3.2.0) and v2 (before v2.256.0).
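To see why a leaked key defeats the signature check, here is a minimal sketch using Python's standard hmac module. It assumes HMAC-SHA256 and uses made-up key and payload values; the SDK's actual signing scheme and payload format are not reproduced here, and no pickle data is involved.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the payload."""
    return hmac.compare_digest(sign(key, payload), tag)


# Hypothetical values standing in for the per-job key and the stored result.
job_key = b"per-job-secret-from-environment"
legit_payload = b"serialized-result"
legit_tag = sign(job_key, legit_payload)

# Without the key, a swapped payload fails verification.
forged_payload = b"attacker-controlled-bytes"
wrong_tag = sign(b"guessed-key", forged_payload)
blocked = verify(job_key, forged_payload, wrong_tag)

# With the leaked key, the forged payload verifies exactly like the original.
forged_tag = sign(job_key, forged_payload)
accepted = verify(job_key, forged_payload, forged_tag)
```

An HMAC only proves that the signer knew the key. Once the key is readable through a Describe call, the signature no longer distinguishes the legitimate job from anyone who can write to the bucket.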

CVE-2026-1778: The same SDK's Triton inference backend globally disabled SSL certificate verification (verify=False) on all HTTPS connections within the container. This means any HTTPS call made from that container (to S3, to external APIs, to internal services) was vulnerable to man-in-the-middle interception. It's a single line of code that collapses the TLS trust boundary for the entire process.
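The effect of disabling verification can be illustrated with Python's standard ssl module. This is a sketch of the general setting, not the SDK's code: the two contexts below stand in for a default client and for a client configured the way verify=False behaves.

```python
import ssl

# Default client context: certificates and hostnames are both verified.
strict = ssl.create_default_context()

# Rough equivalent of verify=False. Order matters: check_hostname must be
# turned off before verify_mode can be set to CERT_NONE.
lax = ssl.create_default_context()
lax.check_hostname = False
lax.verify_mode = ssl.CERT_NONE


def describe(ctx: ssl.SSLContext) -> dict:
    """Summarise the two settings that decide whether a peer is authenticated."""
    return {
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


strict_summary = describe(strict)
lax_summary = describe(lax)
```

With both checks off, the connection is still encrypted, but the client will accept any certificate from any host, which is what makes interception possible.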

DescribeTrainingJob looks like a harmless read operation. It appears in dozens of AWS-managed policies. It's the kind of permission that gets granted with a wildcard and never revisited. But through the HMAC chain, it became a path to arbitrary code execution. And the TLS finding shows how a trust boundary can be silently disabled deep inside an SDK where nobody notices.

This is what makes blast radius analysis hard. The danger isn't in any single permission. It's in the transitive chain of what that permission unlocks. Static IAM policy review would never flag DescribeTrainingJob as dangerous. You need to trace the actual runtime path from permission to impact.

Case 3: Low-privilege database user escalates to superuser

Bulletin: AWS-2025-028 (CVE-2025-12967): Privilege Escalation in Aurora PostgreSQL Wrappers

Aurora PostgreSQL's connection wrappers (JDBC, Go, Node.js, Python, ODBC) had a vulnerability where a low-privilege authenticated database user could create a crafted function that executed with the permissions of other RDS users, escalating all the way to rds_superuser.

The bulletin doesn't detail the exact mechanism, but the recommended workaround (removing public from the search_path) strongly suggests it exploits PostgreSQL's function name resolution. When the AWS wrappers execute internal functions, they resolve names using the default search_path, which includes the public schema. A low-privilege user with CREATE permission on public (the default before PostgreSQL 15, though security-hardened environments may revoke it) could create a function that shadows an internal wrapper function. When a higher-privileged user connects via the wrapper, it would execute the attacker's function instead:

Low-privilege database user
  → CREATE FUNCTION in public schema (shadows internal function)
    → Higher-privileged user connects via AWS wrapper
      → Wrapper resolves attacker's function first via search_path
        → Function executes with caller's privileges
          → Escalate to rds_superuser

Affected versions: AWS JDBC Wrapper (< 2.6.5), AWS Go Wrapper (< 2025-10-17), AWS NodeJS Wrapper (< 2.0.1), AWS Python Wrapper (< 1.4.0), AWS PGSQL ODBC Driver (< 1.0.1).
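The shadowing idea can be shown with a toy model of name resolution. This is a deliberate simplification written in Python, not PostgreSQL's real algorithm (which also considers argument types and always searches pg_catalog). The schema name wrapper_internal and the function name get_topology are hypothetical placeholders.

```python
def resolve(function_name, search_path, schemas):
    """Return (schema, implementation) for the first schema on the search
    path that defines function_name, or None if no schema does."""
    for schema in search_path:
        functions = schemas.get(schema, {})
        if function_name in functions:
            return schema, functions[function_name]
    return None


# Hypothetical layout: the real helper lives in a schema searched after public.
schemas = {
    "public": {},
    "wrapper_internal": {"get_topology": "trusted implementation"},
}

default_path = ["public", "wrapper_internal"]
hardened_path = ["wrapper_internal"]  # public removed, as in the workaround

before = resolve("get_topology", default_path, schemas)

# A low-privilege user with CREATE on public plants a same-named function.
schemas["public"]["get_topology"] = "attacker implementation"

shadowed = resolve("get_topology", default_path, schemas)
hardened = resolve("get_topology", hardened_path, schemas)
```

In this model the unqualified call picks up the attacker's function as soon as it exists in an earlier schema, and removing public from the path restores the trusted one. Schema-qualifying every internal call would have the same effect.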

An application service account connected to Aurora with minimal query permissions could, through this vulnerability, become a superuser with access to every database, table, and management operation on the instance.

This matters for blast radius because it breaks the assumption that database-level least privilege contains the damage. You might have carefully scoped your application's database user to a handful of tables, but if the escalation path to superuser exists, the effective blast radius is every table on the instance.

Case 4: Fake IMDS serves real credentials

Bulletin: AWS-2025-021: IMDS Impersonation

AWS SDKs and CLI resolve credentials through a credential provider chain, a fixed sequence of locations checked in order: environment variables, shared credentials file, web identity token, SSO, ECS container credentials, and finally the EC2 Instance Metadata Service (IMDS) at 169.254.169.254. If none of the earlier providers return credentials, the SDK queries IMDS as a last resort, even when the code isn't running on EC2.

This means that on non-EC2 compute (on-premises servers, other cloud VMs, GitHub Actions runners, developer laptops), an attacker with a privileged network position can stand up a fake IMDS endpoint on 169.254.169.254 and serve arbitrary AWS credentials to any AWS tool running on that network:

AWS CLI/SDK running on non-EC2 compute
  → Credential provider chain exhausts env/file/SSO/ECS
    → Falls through to IMDS at 169.254.169.254
      → Attacker-controlled IMDS responds with crafted credentials
        → Tool uses attacker-provided credentials
          → Actions attributed to attacker's chosen identity
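The fall-through behaviour can be sketched as a small provider chain in Python. The provider names and lambdas are hypothetical stand-ins, and real SDKs implement this with more providers and caching. The one real detail is the AWS_EC2_METADATA_DISABLED environment variable, which the AWS CLI and several SDKs honour to skip IMDS entirely.

```python
import os


def resolve_credentials(providers, env=None):
    """Walk the providers in order and return (name, credentials) from the
    first one that answers, mirroring a credential provider chain."""
    env = os.environ if env is None else env
    for name, provider in providers:
        # AWS_EC2_METADATA_DISABLED=true tells the tooling to skip IMDS.
        if name == "imds" and env.get("AWS_EC2_METADATA_DISABLED", "").lower() == "true":
            continue
        credentials = provider()
        if credentials is not None:
            return name, credentials
    return None


# Hypothetical non-EC2 host: nothing is configured, so every provider
# returns None until the chain reaches a (fake) IMDS endpoint.
providers = [
    ("environment", lambda: None),
    ("shared_credentials_file", lambda: None),
    ("ecs_container", lambda: None),
    ("imds", lambda: {"AccessKeyId": "ATTACKER-SUPPLIED"}),
]

injected = resolve_credentials(providers, env={})
skipped = resolve_credentials(providers, env={"AWS_EC2_METADATA_DISABLED": "true"})
```

On hosts that are never supposed to reach IMDS, disabling the last provider turns a silent credential injection into an explicit "no credentials found" error.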

The twist: this isn't about stealing credentials. It's about injecting them. The attacker provides credentials for a role they control, and the victim's tool starts making API calls as that role. If that tool is running a CDK deploy, it deploys attacker-controlled infrastructure. If it's pushing to ECR, the attacker can inject a malicious container image. The blast radius depends entirely on what the tool was about to do.

Note that IMDSv2's session-based tokens don't help here. The fake IMDS can implement the full IMDSv2 PUT/GET flow. We covered IMDSv2's limitations in the context of the Capital One breach, and this bulletin reinforces the point: IMDS protections are about the real metadata service, not about preventing a fake one.

AWS's fix was documentation and SIGMA detection rules for monitoring IMDS traffic on non-EC2 hosts, rather than a code change, because this is an architectural trust boundary issue. The credential provider chain assumes that if IMDS responds, it's legitimate.

Case 5: S3 bucket squatting enables code injection at build time

Bulletin: 2026-008-AWS (CVE-2026-4269): Improper S3 Ownership Verification in Bedrock AgentCore Starter Toolkit

The Bedrock AgentCore Starter Toolkit didn't verify S3 bucket ownership during the build process. An attacker who created an S3 bucket with the expected name before the legitimate user could inject arbitrary code that would execute in the AgentCore Runtime. Only builds performed after September 24, 2025 on toolkit versions before v0.1.13 are affected.

Attacker creates S3 bucket with predictable name
  → Legitimate build process pulls from attacker's bucket
    → Malicious code injected during build
      → Code executes in AgentCore Runtime
        → Access to runtime's IAM role and environment

If this sounds familiar, it's because we wrote about the broader S3 namespace problem two weeks ago. AWS's March 2026 account-regional namespace fix addresses the root cause for new buckets, but this bulletin, published the same month, shows the pattern was still appearing in AWS's own tooling. The fix is simple: verify that the bucket is owned by the expected account before pulling build artifacts. But until someone thinks to add that check, the trust boundary between "S3 bucket exists with the right name" and "S3 bucket is legitimate" doesn't exist.
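One way to express that ownership check is S3's ExpectedBucketOwner request parameter, which makes S3 reject the call when the bucket belongs to a different account. The sketch below is not the toolkit's actual fix. The function name fetch_build_artifact is illustrative, and FakeS3 is a stand-in so the example runs without AWS access. A real boto3 client would raise a ClientError rather than PermissionError.

```python
import io


def fetch_build_artifact(s3_client, bucket, key, expected_owner):
    """Fetch an object only if the bucket belongs to expected_owner."""
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        ExpectedBucketOwner=expected_owner,  # S3 rejects a mismatched owner
    )
    return response["Body"].read()


class FakeS3:
    """Stand-in for a boto3 S3 client, for illustration only."""

    def __init__(self, bucket_owners):
        self.bucket_owners = bucket_owners

    def get_object(self, Bucket, Key, ExpectedBucketOwner):
        if self.bucket_owners.get(Bucket) != ExpectedBucketOwner:
            raise PermissionError("bucket owned by a different account")
        return {"Body": io.BytesIO(b"artifact bytes")}


# Hypothetical account IDs: 111111111111 is ours, 999999999999 is the squatter.
client = FakeS3({
    "build-artifacts-legit": "111111111111",
    "build-artifacts-squatted": "999999999999",
})

artifact = fetch_build_artifact(client, "build-artifacts-legit", "app.zip", "111111111111")

try:
    fetch_build_artifact(client, "build-artifacts-squatted", "app.zip", "111111111111")
    squatted_rejected = False
except PermissionError:
    squatted_rejected = True
```

The design point is that the bucket name alone is never treated as proof of identity: the caller states which account it expects, and the request fails closed when that expectation is wrong.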

The blast radius of this attack is whatever the AgentCore Runtime's IAM role can reach. For an AI agent runtime that interacts with Bedrock, S3, and potentially other AWS services, that's likely a broad set of permissions, making this a particularly high-value target for supply chain compromise.

The pattern across all 20 bulletins

Not every bulletin fits the trust boundary pattern. Some are genuine memory safety issues (FreeRTOS ICMPv6 buffer over-reads), cryptographic flaws (AWS-LC PKCS7 verification bypasses), or application bugs (Wickr audio stream not terminating). But the breakdown is telling:

Category	Count	Examples
Trust boundary / privilege escalation	7	Harmonix, SageMaker, Aurora, IMDS, AgentCore, WorkSpaces, MCP Server
Memory safety / input validation	5	FreeRTOS, Ion-C, Ion-Dotnet, runc, Firecracker
Cryptographic issues	3	AWS-LC (3 CVEs), S3 encryption key commitment
Application logic	2	Wickr audio, RES desktop preview
Code execution (dev tools)	2	Kiro IDE (2 separate vulns)
Build / supply chain	1	React Server Components RCE

Some bulletins span categories (the S3 encryption key commitment issue is both cryptographic and a trust boundary problem), but even conservatively, over a third involve a trust boundary gap that creates a privilege escalation or lateral movement path.

What this means for your environment

These bulletins are about AWS's own code. AWS found and fixed these issues through internal review, external research, and responsible disclosure. That's the system working as intended.

But consider what the pattern implies:

These are code bugs, not configuration mistakes, and there's a meaningful difference. But the trust boundary patterns behind them are the same ones that show up as configuration issues in customer environments. AWS engineers write an overly broad trust policy in Harmonix. Your team writes an overly broad trust policy on a deployment role. The bug is different. The blast radius problem is identical.

The Harmonix trust policy is a mistake any team could make. Trusting the account root in a role trust policy is a common pattern that feels scoped ("it's only principals in this account") but is effectively open to any identity in the account whose policies allow sts:AssumeRole on that role. The SageMaker HMAC exposure is the kind of transitive risk that doesn't show up in a policy review. It requires tracing the actual data flow from API call to impact.

The patterns that AWS keeps finding in its own code are the same patterns that exist in real environments:

Roles with overly broad trust policies that allow lateral movement across the account
Read-only permissions that chain into write access or code execution through credential exposure
Service identities with permissions that far exceed their operational needs
Resource ownership assumptions that don't hold under adversarial conditions

Key takeaways

Trust boundary failures are the most common vulnerability class in the 20 AWS security bulletins we reviewed. Over a third of them (7 of 20) involve a trust boundary gap that enables privilege escalation or lateral movement. This isn't a coincidence. It reflects the fundamental complexity of IAM and trust relationships in cloud environments.
"Read-only" doesn't mean "harmless." The SageMaker bulletin demonstrates that a Describe* API call can be the first step in a chain to code execution. Static policy review that categorises permissions by action type misses these transitive risks.
Audit your role trust policies. The Harmonix pattern, trusting the account root, is common and dangerous. Every role that trusts arn:aws:iam::ACCOUNT:root is a potential lateral movement target for any principal in the account whose identity policy allows sts:AssumeRole on it. Enumerate them. Tighten them.
Trace the blast radius, not just the permission. A permission's danger isn't determined by its IAM action name. It's determined by what that permission can reach through credential chains, resource-based policies, and service integrations. Automated assessment that follows these paths finds what static analysis cannot.
If AWS ships these bugs, assume you have them too. The consistent pattern across AWS's own tools and SDKs should calibrate your expectations for your own infrastructure. Regular, automated blast radius validation isn't paranoia. It's the only way to keep up with the rate at which trust boundaries silently erode.
