Cloud-Native Security
Cloud-native architecture — containers, Kubernetes, serverless, CI/CD pipelines — changes everything about how you secure applications. This module covers the full cloud-native security stack from container images to multi-cloud posture management.
Cloud-Native Architecture & Security Implications
Cloud-native isn't just "running in the cloud." It's a fundamentally different architecture: microservices instead of monoliths, containers instead of VMs, orchestration instead of manual deployment, and infrastructure defined as code rather than configured by hand. Each of these architectural shifts creates new attack surfaces — and new opportunities for security.
Monolith vs Microservices Security
| Aspect | Monolith | Microservices |
|---|---|---|
| Attack surface | Single application, fewer endpoints, simpler perimeter | Dozens to hundreds of services, each with its own API, each a potential entry point |
| Lateral movement | Compromise the app = compromise everything in it | Compromise one service ≠ automatic access to others (if properly segmented) |
| Authentication | Single auth system, session management | Service-to-service auth (mTLS, JWT, API keys), identity propagation across services |
| Secrets management | Config files, environment variables | Centralized secrets management (Vault, AWS Secrets Manager), rotation across many services |
| Patching | Patch one deployment | Patch potentially hundreds of independently deployed services |
| Monitoring | Single log stream, straightforward tracing | Distributed tracing (Jaeger, Zipkin), log aggregation across services, correlation complexity |
The security paradox of microservices: The attack surface is larger (more services, more APIs, more network communication), but the blast radius of a single compromise is smaller (proper segmentation isolates each service). The net security outcome depends entirely on how well you implement service isolation, authentication, and monitoring. Done well, microservices are more secure than monoliths. Done poorly, they're a nightmare — hundreds of unmonitored, unauthenticated services communicating over flat networks.
The Cloud-Native Security Stack
- Container security: Image scanning, runtime protection, rootless containers (Lesson 02)
- Orchestration security: Kubernetes RBAC, network policies, pod security (Lesson 03)
- Pipeline security: CI/CD hardening, artifact signing, supply chain integrity (Lesson 04)
- Infrastructure as Code: Policy-as-code, drift detection, secure state management (Lesson 05)
- Serverless security: Function hardening, event injection prevention (Lesson 06)
- Service mesh: mTLS everywhere, API gateway patterns (Lesson 07)
- Cloud-native monitoring: eBPF, runtime detection, container forensics (Lesson 08)
- Multi-cloud posture: CSPM, CNAPP, unified policy (Lesson 09)
Container Security
Containers are the building blocks of cloud-native applications. A container packages an application with all its dependencies into a standardized unit that runs consistently across environments. But containers also package vulnerabilities, misconfigurations, and potential security issues — and they deploy much faster than traditional infrastructure, which means bad configurations propagate at container speed.
Container Threat Model
| Attack vector | Risk | Control |
|---|---|---|
| Vulnerable base image | OS-level CVEs in the base image (Ubuntu, Alpine) propagate to every container built on it | Minimal base images (distroless, Alpine), regular rebuilds, image scanning in CI/CD |
| Vulnerable dependencies | Application libraries with known CVEs packaged into the image | SCA scanning (Trivy, Snyk), dependency pinning, automated rebuild on CVE disclosure |
| Embedded secrets | API keys, passwords, tokens hardcoded in image layers | Secret scanning in CI/CD, never put secrets in Dockerfiles, use runtime secret injection |
| Running as root | Container processes running as root can escalate to host if combined with a container escape vulnerability | Run as non-root user (USER directive), read-only filesystem, drop all capabilities |
| Container escape | Exploiting kernel vulnerability to break out of container isolation and access the host | Keep host kernel patched, use gVisor or Kata Containers for stronger isolation, seccomp profiles |
| Malicious images | Pulling images from untrusted registries that contain backdoors or cryptominers | Private registry only, image signing and verification (cosign/Notary), admission controllers that reject unsigned images |
Image Scanning Pipeline
Scan at four points:
1. Build time: Scan in CI/CD pipeline before pushing to registry. Gate on critical/high CVEs. Tools: Trivy, Snyk Container, Docker Scout.
2. Registry: Continuous scanning of images in your registry. New CVEs are discovered daily — an image that was clean yesterday might have a critical CVE today.
3. Admission: Kubernetes admission controller rejects pods with unscanned or non-compliant images. Only verified, scanned images can run in production.
4. Runtime: Monitor running containers for anomalous behavior — unexpected network connections, file modifications, process execution. Tools: Falco, Sysdig Secure, Aqua.
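The build-time gate can be a few lines of script that parses the scanner's JSON report and fails the pipeline on blocking severities. The sketch below assumes a simplified version of Trivy's JSON report shape (a top-level Results list, each with a Vulnerabilities list); check your scanner's actual schema before relying on it.

```python
def gate_on_severity(trivy_report: dict, blocked=frozenset({"CRITICAL", "HIGH"})) -> list:
    """Return the vulnerabilities that should fail the build."""
    findings = []
    for result in trivy_report.get("Results", []):
        # Vulnerabilities may be absent or null for clean targets
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocked:
                findings.append((vuln["VulnerabilityID"], vuln["Severity"]))
    return findings

# Mock report shaped like a (simplified) `trivy image --format json` output
report = {"Results": [{"Target": "app:latest", "Vulnerabilities": [
    {"VulnerabilityID": "CVE-2024-0001", "Severity": "HIGH"},
    {"VulnerabilityID": "CVE-2024-0002", "Severity": "LOW"},
]}]}

blockers = gate_on_severity(report)
if blockers:
    print(f"FAIL: {len(blockers)} blocking CVEs: {blockers}")
```

In a real pipeline this runs after the scan step and exits non-zero when blockers is non-empty, which is what gates the push to the registry.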
Secrets Management for Containers
Never put secrets in container images, environment variables visible in process lists, or Docker Compose files committed to Git. Instead:
- Kubernetes Secrets: Native but base64-encoded (not encrypted). Better than environment variables but not great. Encrypt at rest with a KMS provider.
- HashiCorp Vault: Dynamic secrets, automatic rotation, audit logging. The gold standard for container secrets management. Integrates with Kubernetes via sidecar injector or CSI driver.
- Cloud provider secrets: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager. Good for single-cloud deployments. Use CSI Secret Store driver for Kubernetes integration.
- Sealed Secrets / SOPS: Encrypt secrets in Git (GitOps-friendly). Decrypted only in-cluster. Good for teams practicing GitOps.
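The "base64 is not encryption" point is worth seeing concretely. A Kubernetes Secret stores its values base64-encoded in the data field; anyone who can read the Secret object (or etcd) recovers the plaintext in one call. The manifest below is illustrative:

```python
import base64

# Illustrative Secret manifest: values in `data` are base64-encoded, nothing more
secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-creds"},
    "data": {"password": base64.b64encode(b"hunter2").decode()},
}

# Anyone with read access to the object trivially reverses the encoding
plaintext = base64.b64decode(secret_manifest["data"]["password"]).decode()
print(plaintext)  # encoding, not encryption
```

This is why encrypting etcd at rest with a KMS provider (or using Vault/external secret stores) is listed as a baseline control rather than an optional hardening step.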
Kubernetes Security
Kubernetes is the dominant container orchestration platform — and one of the most complex systems a security team will encounter. Its power comes from flexibility, but that flexibility means Kubernetes is insecure by default. Nearly every security-relevant setting must be explicitly configured.
Kubernetes Attack Surface
The API server is the crown jewel. Every Kubernetes operation goes through the API server — deploying pods, reading secrets, modifying configurations. Compromise the API server and you control the entire cluster. Secure it: restrict network access (private endpoint), strong authentication (no anonymous auth), RBAC for authorization, audit logging enabled, TLS everywhere.
etcd stores all cluster state. Every secret, every configuration, every pod specification is stored in etcd. If etcd is compromised, the attacker has everything — including all Kubernetes Secrets in plaintext (they're only base64-encoded). Encrypt etcd at rest, restrict network access, enable TLS client certificates.
Essential Kubernetes Security Controls
| Control | What it does | Default |
|---|---|---|
| RBAC | Role-Based Access Control — who can do what in the cluster. Assign minimum permissions to users, service accounts, and applications. | Enabled but often over-permissioned. Audit and restrict. |
| Network Policies | Firewall rules for pod-to-pod communication. By default, every pod can talk to every other pod — no segmentation. | No policies = all traffic allowed. Must be explicitly defined. |
| Pod Security Standards | Three levels: Privileged (unrestricted), Baseline (prevents known escalations), Restricted (hardened). Applied per namespace. | Privileged (no restrictions). Set Baseline minimum, Restricted for production. |
| Admission Controllers | Intercept API requests before persistence. Can enforce policies: no privileged containers, require resource limits, reject unsigned images. | Several built-in enabled. Add OPA Gatekeeper or Kyverno for custom policies. |
| Secrets Encryption | Encrypt Kubernetes Secrets at rest in etcd using a KMS provider. | Secrets stored base64-encoded (NOT encrypted) by default. |
| Audit Logging | Log all API server requests — who did what, when, from where. | Disabled by default on many distributions. Must be explicitly enabled and configured. |
Network Policies — The Most Neglected Control
By default, Kubernetes allows all pod-to-pod communication within a cluster. This means a compromised pod in the web frontend can directly access the database pod, the secrets management pod, and every other service. Network policies create segmentation:
- Default deny: Start by denying all ingress and egress traffic, then explicitly allow only what's needed. This is the zero trust approach applied to Kubernetes networking.
- Namespace isolation: Pods in the "production" namespace shouldn't communicate with pods in "development" unless explicitly allowed.
- Requires a CNI that supports network policies: Calico, Cilium, Antrea. The default kubenet does not enforce network policies — they silently do nothing.
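A default-deny policy is short enough to show in full. Below is the manifest expressed as the Python dict you would serialize and apply (namespace name is hypothetical): an empty podSelector selects every pod in the namespace, and declaring both policy types with no accompanying allow rules denies all traffic until more specific policies open it up.

```python
# Namespace-wide default deny: empty podSelector = all pods; both policyTypes
# declared with no ingress/egress rules = nothing allowed by this policy.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-all", "namespace": "production"},
    "spec": {
        "podSelector": {},                     # selects every pod in the namespace
        "policyTypes": ["Ingress", "Egress"],  # deny both directions by default
    },
}
```

From this baseline, each allowed flow (frontend to API, API to database) gets its own explicit policy, which is the zero trust model described above.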
Tesla's Kubernetes cluster was cryptojacked in 2018. Attackers found an unsecured Kubernetes dashboard (no authentication), gained cluster access, deployed cryptocurrency miners, and accessed Tesla's AWS credentials stored in Kubernetes. The cluster had no RBAC restrictions on the dashboard, no network policies isolating the dashboard, and AWS credentials stored as plain Kubernetes Secrets (base64, not encrypted). Every one of these was a configuration default that should have been changed.
CI/CD Pipeline Security
Your CI/CD pipeline is the most privileged system in your organization. It has credentials to deploy to production, access to source code, secrets for external services, and the ability to modify running infrastructure. Compromising the pipeline is often easier and more impactful than compromising the production environment directly.
CI/CD Attack Vectors
| Vector | Attack | Example |
|---|---|---|
| Compromised dependency | Malicious code in a package pulled during build | event-stream (npm), ua-parser-js — attackers gained maintainer access and injected malware |
| Dependency confusion | Attacker publishes a malicious package with same name as internal package on public registry | Researcher Alex Birsan demonstrated this against Apple, Microsoft, PayPal (2021) |
| Pipeline poisoning | Attacker modifies pipeline configuration to inject malicious steps | Codecov (2021): modified Bash Uploader script exfiltrated CI/CD environment variables |
| Secret extraction | Pipeline secrets (API keys, deploy tokens) exfiltrated via modified build steps | Exposed GitHub Actions secrets via workflow modification in pull requests |
| Artifact tampering | Build artifacts modified between build and deployment | SolarWinds: malicious code inserted in build pipeline, distributed as legitimate update |
GitHub Actions Hardening
1. Pin action versions to the full commit SHA: Use actions/checkout@&lt;full 40-character commit SHA&gt;, not actions/checkout@v4. Version tags can be moved; commit SHAs cannot. A compromised action published over the @v4 tag affects everyone using it.
2. Minimize secrets scope: Use environment-level secrets, not repository-level. Require approval for deployments to production environments. Never log secrets — even accidentally.
3. Restrict pull request workflows: Don't run workflows that access secrets on pull requests from forks. Use pull_request_target with extreme caution — it runs with the base branch's secrets.
4. Enable branch protection: Require PR reviews before merge to main. Require status checks to pass. No force pushes to protected branches.
5. Use OIDC for cloud auth: Instead of storing long-lived cloud credentials as secrets, use OpenID Connect federation. GitHub proves identity to AWS/Azure/GCP without sharing credentials.
6. Audit workflow changes: Monitor changes to .github/workflows/ files. Any modification to the pipeline is a security-relevant event.
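SHA pinning (point 1) is easy to audit mechanically. This is a heuristic checker, not an official GitHub tool: it flags any uses: reference in a workflow file that is pinned to a tag or branch instead of a full 40-character commit SHA (the SHA in the sample is illustrative).

```python
import re

USES_RE = re.compile(r"uses:\s*([\w./-]+)@([\w.-]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")

def unpinned_actions(workflow_text: str) -> list:
    """Return action references not pinned to a full commit SHA."""
    return [f"{action}@{ref}"
            for action, ref in USES_RE.findall(workflow_text)
            if not SHA_RE.match(ref)]

workflow = """
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65
"""
print(unpinned_actions(workflow))  # flags only the tag-pinned action
```

Run over every file in .github/workflows/ in CI, this turns the pinning rule into an enforced check rather than a convention.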
SLSA Framework (Supply Chain Levels for Software Artifacts)
SLSA (pronounced "salsa") defines four levels of supply chain security maturity:
| Level | Requirements | What it prevents |
|---|---|---|
| SLSA 1 | Build process documented, provenance generated | Prevents undocumented builds. You know what was built and by what process. |
| SLSA 2 | Hosted build service, authenticated provenance | Prevents tampering after build. Provenance is signed and verifiable. |
| SLSA 3 | Hardened build platform, non-falsifiable provenance | Prevents build platform compromise. Provenance generated by the platform, not the build script. |
| SLSA 4 | Two-person review, hermetic builds, reproducible | Prevents insider threats. No single person can modify the build without review. |
Most organizations should target SLSA 2-3. Level 1 is table stakes. Level 4 is for high-security environments (national security, financial infrastructure).
Infrastructure as Code Security
Infrastructure as Code (IaC) means defining your infrastructure — servers, networks, databases, permissions — in version-controlled configuration files rather than manual console clicks. Terraform, CloudFormation, Pulumi, and Ansible are the most common tools. IaC is a security gift (reproducible, auditable, reviewable) and a security risk (misconfigurations deployed at scale, secrets in state files).
IaC Security Scanning
Scan IaC before deployment, not after. A misconfigured S3 bucket in a Terraform file is a known risk before it's deployed. Finding it in a PR review costs nothing. Finding it after deployment (via CSPM) means the misconfiguration was live — potentially exposing data.
Tools:
- Checkov: Open source, supports Terraform/CloudFormation/Kubernetes/Docker. 1000+ built-in policies. Integrates into CI/CD.
- tfsec: Terraform-specific scanner. Fast, focused, good for Terraform-heavy shops. Now part of Trivy.
- Terrascan: Multi-framework scanner with OPA-based policy engine. Good for organizations that want custom policies.
- Snyk IaC: Commercial option with developer-friendly output and IDE integration.
What they catch: Public S3 buckets, overly permissive IAM policies, unencrypted databases, missing logging, security groups allowing 0.0.0.0/0, missing encryption at rest, default credentials.
Policy-as-Code with OPA
Open Policy Agent (OPA) lets you define security policies as code — in Rego, a declarative policy language. Instead of documenting that "all S3 buckets must be encrypted" in a wiki, you encode it as a policy that automatically rejects non-compliant Terraform plans. OPA integrates with Terraform (via conftest), Kubernetes (via Gatekeeper), CI/CD pipelines, and API gateways. One policy language, enforced everywhere.
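Real OPA policies are written in Rego; the Python sketch below shows the same decision logic a conftest-style check would apply to a (deliberately simplified) terraform show -json plan structure — the real plan schema is richer than this.

```python
def deny_public_buckets(plan: dict) -> list:
    """Reject any planned S3 bucket with a public ACL (simplified plan schema)."""
    violations = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if rc.get("type") == "aws_s3_bucket" and after.get("acl") in ("public-read", "public-read-write"):
            violations.append(f"{rc['address']}: public ACL {after['acl']!r}")
    return violations

plan = {"resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"after": {"acl": "public-read"}}},
    {"address": "aws_s3_bucket.data", "type": "aws_s3_bucket",
     "change": {"after": {"acl": "private"}}},
]}
print(deny_public_buckets(plan))  # only the public bucket is flagged
```

A non-empty violation list fails the pipeline, which is exactly the "wiki rule becomes enforced policy" shift the paragraph describes.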
State File Security
Terraform state files contain the complete current state of your infrastructure — including sensitive values like database passwords, API keys, and private IPs. They're often stored in S3 buckets or cloud storage. If the state file is compromised, the attacker has a blueprint of your entire infrastructure plus embedded secrets.
- Encrypt state at rest: Use encrypted S3 buckets or cloud storage with KMS encryption.
- Lock state: Use DynamoDB (AWS) or similar for state locking to prevent concurrent modifications.
- Restrict access: State files should be accessible only to the CI/CD pipeline and infrastructure team. Not all developers need state access.
- Avoid secrets in state: Use vault references instead of inline secrets. Terraform's sensitive attribute only redacts values from CLI output — they remain plaintext in the state file.
Drift Detection
Configuration drift occurs when the actual infrastructure state diverges from the IaC definition — someone made a manual change in the console that bypasses code review. Drift is a security risk because it means undocumented, unreviewed changes exist. Detect drift with: terraform plan in CI/CD (alerts on any delta), CSPM tools that compare running config to IaC, and automated remediation that reverts unauthorized changes.
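At its core, drift detection is a structural diff between the IaC-defined attributes and the live state. A minimal sketch (attribute names are hypothetical):

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare IaC-defined attributes against live state; return the deltas."""
    drift = {}
    for key in desired.keys() | actual.keys():
        if desired.get(key) != actual.get(key):
            drift[key] = {"desired": desired.get(key), "actual": actual.get(key)}
    return drift

# e.g. someone flipped public access on in the console, bypassing code review
desired = {"encryption": "aws:kms", "public_access": False, "logging": True}
actual  = {"encryption": "aws:kms", "public_access": True,  "logging": True}
print(detect_drift(desired, actual))
```

Tools like terraform plan and CSPM scanners do this comparison at scale; the security-relevant output is any non-empty delta on an access-control or encryption attribute.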
Serverless Security
Serverless computing (AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers) shifts more infrastructure responsibility to the cloud provider. You don't manage servers, OS patching, or scaling. But "serverless" doesn't mean "securityless" — the attack surface shifts from infrastructure to application logic, event triggers, and IAM permissions.
Serverless Attack Surface
| Attack vector | How it works | Defense |
|---|---|---|
| Event injection | Malicious data in event triggers (API Gateway, S3 events, SQS messages, CloudWatch events). The function processes untrusted input. | Input validation on every event source. Never trust event data — sanitize and validate. |
| Over-permissioned IAM | Function has broader permissions than needed. lambda:* or s3:* when it only needs to read one bucket. | Least-privilege IAM per function. Each function gets its own role with minimum permissions. |
| Dependency vulnerabilities | Vulnerable libraries in deployment packages. Same risk as containers but often overlooked because "it's just a function." | SCA scanning in CI/CD. Pin dependency versions. Minimal deployment packages. |
| Cold start data leakage | Lambda execution contexts can be reused between invocations. Data from previous execution (temp files, global variables) may persist. | Don't store sensitive data in global scope or /tmp without cleanup. Assume execution context is shared. |
| Function chaining abuse | Attacker triggers a chain of functions to amplify impact — one compromised function invokes others, each with different permissions. | Service-to-service authentication even within serverless workflows. Don't rely on "it's internal" as a security control. |
Serverless IAM — The Critical Control
In serverless, IAM is your primary security control. There's no network perimeter, no firewall rules, no host-based security. The function's IAM role determines what it can access. Over-permissioned functions are the #1 serverless security issue.
Principle of least privilege per function: Each Lambda/Cloud Function gets its own IAM role. A function that reads from one S3 bucket should have s3:GetObject on that specific bucket ARN — not s3:* on *. A function that writes to one DynamoDB table should have dynamodb:PutItem on that table — not dynamodb:*.
Why it's hard: Developers use broad permissions during development ("it works now, I'll tighten it later") and never tighten them. Tooling helps: AWS IAM Access Analyzer identifies unused permissions, and tools like Repokid can automatically right-size IAM policies based on actual usage.
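A scoped policy versus a wildcard policy is easy to check for programmatically. The policy document below follows the standard AWS IAM JSON shape (the bucket name is hypothetical), and the helper flags the wildcard patterns the text warns about:

```python
# Least-privilege example: one action, one bucket ARN (bucket name hypothetical)
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-invoices-bucket/*",
    }],
}

def uses_wildcards(policy: dict) -> bool:
    """Flag `*` or `service:*` actions, or `*` resources."""
    for s in policy["Statement"]:
        actions = [s["Action"]] if isinstance(s["Action"], str) else s["Action"]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
        if s.get("Resource") == "*":
            return True
    return False

print(uses_wildcards(scoped_policy))  # scoped policy passes
```

A check like this in CI won't replace IAM Access Analyzer, but it catches the "s3:* on *" anti-pattern before it ever reaches an account.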
A fintech company's Lambda function for processing payment webhooks had an IAM role with s3:*, dynamodb:*, and sqs:* across all resources. An event injection vulnerability in the webhook parser allowed an attacker to execute arbitrary commands within the Lambda context. Because the IAM role was over-permissioned, the attacker read customer financial data from S3, modified transaction records in DynamoDB, and injected malicious messages into the processing queue. With least-privilege IAM, the blast radius would have been limited to the single DynamoDB table the function actually needed.
Service Mesh & API Security
In a microservices architecture, services communicate constantly — hundreds or thousands of API calls per second between dozens of services. Securing this "east-west" traffic (service-to-service, within the cluster) is as important as securing "north-south" traffic (external requests coming in). A service mesh provides the infrastructure layer for this.
What a Service Mesh Provides
| Capability | Security value | Implementation |
|---|---|---|
| mTLS everywhere | All service-to-service communication encrypted and mutually authenticated. No plaintext traffic within the cluster. | Automatic certificate provisioning and rotation. Services don't need to manage TLS themselves. |
| Authorization policies | Fine-grained access control: "Service A can call Service B's /api/orders endpoint with GET only." | Policy defined in mesh configuration, enforced by sidecar proxies. |
| Observability | Every service call is logged with source, destination, latency, status code. Full visibility into communication patterns. | Distributed tracing, metrics, access logs — all without application code changes. |
| Rate limiting | Per-service rate limits prevent abuse and contain cascading failures. | Configured at the mesh level, enforced per-service or per-endpoint. |
| Traffic control | Canary deployments, circuit breaking, fault injection for testing resilience. | Mesh-level traffic management without application awareness. |
Service Mesh Options
- Istio: Most feature-rich, most complex. Good for large enterprises with dedicated platform teams. Steep learning curve. Based on Envoy proxy.
- Linkerd: Simpler, lighter, Rust-based data plane. Faster to deploy, lower resource overhead. Good for mid-market. CNCF graduated project.
- Cilium: eBPF-based networking and security. No sidecar proxies needed (lower overhead). Growing rapidly. Good for organizations already using Cilium for networking.
- AWS App Mesh / GCP Traffic Director: Cloud-native mesh options. Good if you're single-cloud and want managed infrastructure.
API Gateway for Microservices
The API gateway is the single entry point for external traffic into your microservices. It handles: authentication (validate JWT tokens, API keys), rate limiting (protect backend services from abuse), request validation (schema enforcement, size limits), routing (direct requests to appropriate services), and TLS termination (handle HTTPS at the edge).
The gateway is not a replacement for service-level security. It handles north-south traffic. The service mesh handles east-west. Both are needed. A request that passes the gateway still needs authorization checks at the service level — the gateway verifies "is this a valid user?" but each service verifies "can this user access this specific resource?"
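The two-layer split can be reduced to two distinct questions in code. This toy sketch (token store, user names, and order IDs are all hypothetical) shows why passing the gateway is necessary but not sufficient:

```python
VALID_TOKENS = {"tok-alice": "alice"}                    # gateway-side: who is this?
ORDER_OWNERS = {"order-17": "alice", "order-42": "bob"}  # service-side: whose resource?

def gateway_authenticate(token: str):
    """North-south check at the edge: valid user, or None (reject with 401)."""
    return VALID_TOKENS.get(token)

def service_authorize(user: str, order_id: str) -> bool:
    """Per-resource check inside the service: may THIS user touch THIS order?"""
    return ORDER_OWNERS.get(order_id) == user

user = gateway_authenticate("tok-alice")
print(service_authorize(user, "order-17"))  # owns it: allowed
print(service_authorize(user, "order-42"))  # authenticated, but not authorized
```

The second call is the case a gateway-only architecture misses: a perfectly valid user requesting someone else's resource.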
Cloud-Native Monitoring & Detection
Traditional security monitoring was designed for persistent servers with stable IP addresses and predictable behavior. Cloud-native environments are ephemeral — containers spin up and down in seconds, serverless functions exist for milliseconds, and IP addresses are meaningless. Your monitoring strategy must adapt.
eBPF — The Game Changer
eBPF (extended Berkeley Packet Filter) allows running sandboxed programs inside the Linux kernel without modifying kernel source code or loading kernel modules. For security, this means: deep visibility into system calls, network traffic, and process behavior at kernel level — without the performance overhead of userspace agents.
Security applications: Runtime container monitoring (detect unexpected process execution, file access, network connections), network policy enforcement without iptables overhead, file integrity monitoring, and syscall filtering. eBPF sees everything a container does at the kernel level — it can't be evaded by userspace techniques.
Tools: Cilium Tetragon (runtime enforcement), Falco (runtime detection), Tracee (Aqua Security), and Isovalent Enterprise for full eBPF-based security.
Falco — Runtime Threat Detection
Falco (a CNCF graduated project) monitors containers and hosts for anomalous behavior using rules that define what's normal and alert on deviations:
- Unexpected process execution: "A shell was spawned inside a container that should only run a web server." This catches container escapes, reverse shells, and post-exploitation activity.
- Sensitive file access: "A process read /etc/shadow" or "a container modified /etc/passwd." Detects privilege escalation attempts.
- Anomalous network activity: "A container established an outbound connection to a new IP on port 4444." Detects command-and-control communication.
- Kubernetes API abuse: "A service account listed all secrets in the cluster." Detects reconnaissance and privilege escalation within Kubernetes.
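Real Falco rules are YAML with their own condition syntax; this Python sketch just shows the shape of the decision for the first rule above, "shell spawned in a container whose image shouldn't run one" (the event fields are simplified stand-ins for what the runtime actually reports):

```python
def shell_in_container(event: dict) -> bool:
    """Alert when a shell starts inside a container not expected to run one."""
    return (
        event.get("container") is True
        and event.get("proc_name") in {"bash", "sh", "zsh"}
        and event.get("image_allows_shell") is False
    )

event = {"container": True, "proc_name": "bash", "image_allows_shell": False}
print(shell_in_container(event))  # alert fires
```

The interesting part is the last condition: the rule encodes per-image expectations, which is why behavioral rules catch post-exploitation activity that no CVE scanner would flag.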
Container Forensics
When a container is compromised, traditional forensics tools don't apply — there's no persistent disk to image, no memory to dump (once the container is gone). Cloud-native forensics requires: pre-configured audit logging (container runtime, Kubernetes API, cloud provider), runtime snapshots before container termination, network flow data from the CNI or service mesh, and eBPF-captured syscall traces. Plan your forensic capability before you need it — during an incident is too late to enable logging.
A cryptocurrency exchange detected anomalous CPU usage in their Kubernetes cluster. Investigation found a cryptominer running inside a container that had been deployed through a compromised CI/CD pipeline. The attacker modified a GitHub Actions workflow to inject a cryptominer into the Docker image during build. Without runtime monitoring (Falco), the miner would have run indefinitely — it didn't trigger any vulnerability scans because it wasn't a known vulnerability, just unauthorized software. The detection came from a Falco rule alerting on unexpected process execution: "binary not in the original container image was executed."
Multi-Cloud Security Posture
Most organizations use multiple cloud providers — AWS for compute, Azure for Microsoft integration, GCP for data/AI. Each has its own security model, its own IAM, its own logging, and its own misconfigurations. Your job as CISO is maintaining a consistent security posture across all of them.
CSPM, CWPP, and CNAPP — The Alphabet Soup
| Tool category | What it does | Key vendors |
|---|---|---|
| CSPM (Cloud Security Posture Management) | Continuously scans cloud configurations for misconfigurations, compliance violations, and drift. The "audit scanner" for cloud. | Wiz, Orca, Prisma Cloud, Lacework, native (AWS Security Hub, Azure Defender, GCP SCC) |
| CWPP (Cloud Workload Protection Platform) | Protects workloads: VMs, containers, serverless. Vulnerability scanning, runtime protection, file integrity. | CrowdStrike, SentinelOne, Aqua, Sysdig |
| CNAPP (Cloud-Native Application Protection Platform) | Converges CSPM + CWPP + CIEM + pipeline security into a single platform. The "everything platform" for cloud security. | Wiz, Prisma Cloud, Orca, Sysdig, Lacework |
| CIEM (Cloud Infrastructure Entitlement Management) | Manages and audits cloud IAM: finds over-permissioned identities, unused permissions, cross-account access risks. | Wiz, Ermetic (now part of Tenable), Zscaler, CrowdStrike |
Unified Multi-Cloud Security
The challenge: Each cloud provider names things differently (Security Groups vs NSGs vs Firewall Rules), structures IAM differently (AWS IAM vs Azure Entra ID vs GCP Cloud IAM), and logs differently (CloudTrail vs Activity Log vs Audit Logs). Your security team needs to understand all three — or use tooling that normalizes them.
Two approaches:
Cloud-native per provider: Use each provider's native security tools (AWS Security Hub, Azure Defender, GCP SCC). Pros: deep integration, included in licensing. Cons: no cross-cloud visibility, three different dashboards, three different alert formats.
Third-party CNAPP: Single platform that spans all clouds. Pros: unified visibility, normalized alerts, cross-cloud correlation. Cons: additional cost, potential lag behind cloud provider features, another vendor to manage.
For organizations with 2+ cloud providers, a CNAPP is generally worth the investment. For single-cloud shops, native tools are usually sufficient.
The Multi-Cloud Security Baseline
- Identity: Federated identity across all clouds from a single IdP (Azure AD/Entra ID or Okta). SSO for all cloud consoles. MFA enforced everywhere. No long-lived access keys.
- Logging: All cloud audit logs forwarded to a central SIEM. CloudTrail + Activity Log + Audit Logs → Splunk/Sentinel/Elastic. Retention: minimum 12 months hot, 24 months cold.
- Network: Consistent segmentation model across clouds. No default-allow security groups. Private endpoints for all data services. No public S3 buckets/Blob containers/GCS buckets.
- Encryption: Customer-managed keys (CMK/CMEK) for all data at rest. TLS 1.2+ for all data in transit. Key management centralized (or at minimum, consistent policy across providers).
- Posture: CSPM scanning all accounts/subscriptions/projects continuously. Alerts on critical misconfigurations within 1 hour. Automated remediation for known-safe fixes.
AI in Cloud-Native Security
Cloud-native environments generate massive volumes of security data — container events, Kubernetes API calls, network flows, cloud audit logs, IaC scan results. The scale exceeds human analysis capacity. AI and ML are increasingly essential for making sense of this data, detecting anomalies, and automating response.
AI-Assisted Cloud Security
| Application | How AI helps | Maturity |
|---|---|---|
| Misconfiguration detection | ML models learn "normal" cloud configurations and flag deviations. More context-aware than rule-based scanning — understands that a public bucket is fine for a static website but dangerous for a data lake. | Production-ready |
| IAM anomaly detection | Baseline normal permission usage patterns, alert when an identity accesses resources it's never accessed before. Catches compromised credentials and insider threats. | Production-ready |
| Container behavioral analysis | Learn the expected behavior of each container image (processes, network connections, file access) and alert on deviations. Catches zero-day container escapes. | Production-ready |
| Automated remediation | AI triages alerts, determines confidence level, and auto-remediates high-confidence issues: revoke exposed credentials, isolate compromised containers, block suspicious network connections. | Emerging |
| Attack path analysis | Graph-based AI maps all possible attack paths through your cloud environment: "if this EC2 instance is compromised, the attacker can reach the database via this IAM role." Prioritizes fixes by attack path impact. | Production-ready (Wiz, Orca) |
| IaC fix suggestions | AI suggests specific code changes to fix IaC misconfigurations. Instead of "this S3 bucket is public," it provides the exact Terraform change to fix it. | Emerging |
AIOps for Cloud Security
AIOps applies ML to IT operations data to reduce alert noise, correlate events, and predict issues before they occur. For cloud security, this means:
Alert correlation: Instead of 50 separate alerts about IAM changes, network anomalies, and S3 access patterns, AIOps correlates them into one incident: "possible account compromise — unusual IAM activity followed by data access anomaly."
Noise reduction: ML learns which alert combinations are false positives in your environment and auto-suppresses them. Reduces SOC workload by 40-60%.
Predictive security: Based on configuration patterns, predict which resources are likely to be misconfigured or compromised next. Proactive hardening instead of reactive response.
A SaaS company running 200+ microservices on Kubernetes across AWS and GCP deployed Wiz as their CNAPP. Within the first scan, Wiz's attack path analysis identified a critical chain: an internet-facing pod with a known CVE could reach the Kubernetes API server (no network policy), which had a misconfigured RBAC role allowing secret enumeration, which included database credentials for their production customer database. No single finding was critical alone — the vulnerability was medium severity, the RBAC misconfiguration was a common pattern, and the secret was "properly" stored in Kubernetes. But the combination created a direct attack path from the internet to customer data. The fix took 30 minutes. Finding it without AI-powered attack path analysis would have required a manual audit that might never have connected these three dots.
CNAPP and Cloud Security Posture Management (CSPM)
As organizations scale multi-cloud operations, managing separate tools for infrastructure security, workload protection, and permissions becomes unmanageable. Cloud-Native Application Protection Platforms (CNAPP) emerged to solve this by consolidating disparate point solutions into a single integrated view from "code to cloud."
The Triad of Cloud Security (The Building Blocks of CNAPP)
A true CNAPP isn't just a marketing term; it requires the deep integration of three foundational pillars. When these pillars share a data lake, they can calculate "toxic combinations" of risk that isolated tools miss.
| Pillar | Full Name | Core Purpose | Example Finding |
|---|---|---|---|
| CSPM | Cloud Security Posture Management | Evaluates cloud control plane configurations against benchmarks (CIS, NIST). | "This S3 bucket is publicly readable." |
| CWPP | Cloud Workload Protection Platform | Secures the actual compute layers (VMs, containers, serverless) from runtime threats and vulnerabilities. | "Log4j vulnerability found in running EKS pod." |
| CIEM | Cloud Infrastructure Entitlement Management | Manages identities and access rights, enforcing least privilege across cloud providers. | "This IAM role has AdministratorAccess but only uses s3:GetObject." |
The real value of CNAPP is finding the intersection of risks. A vulnerability (CWPP finding) on an internal, air-gapped server is low risk. An over-permissive IAM role (CIEM finding) on a secure server is medium risk.
But when CNAPP analyzes the graph, it finds a Toxic Combination: An EC2 instance exposed to the internet (CSPM) running an OS with a critical remote code execution vulnerability (CWPP), and attached to an IAM role with wildcards `s3:*` allowing deletion of production backups (CIEM). CNAPP flags this exact intersection as a SEV1 priority, filtering out the noise of thousands of isolated low-risk alerts.
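Stripped of the graph machinery, the toxic-combination logic is an intersection over findings per asset. A minimal sketch (asset names and findings are hypothetical):

```python
# Each finding alone is noise; an asset stacking all three pillars is the SEV1.
findings = [
    {"asset": "ec2-web-1", "pillar": "CSPM", "detail": "internet-exposed"},
    {"asset": "ec2-web-1", "pillar": "CWPP", "detail": "critical RCE CVE"},
    {"asset": "ec2-web-1", "pillar": "CIEM", "detail": "s3:* on prod backups"},
    {"asset": "ec2-int-2", "pillar": "CWPP", "detail": "critical RCE CVE"},
]

def toxic_assets(findings: list) -> list:
    """Return assets with findings in all three pillars (the toxic combination)."""
    pillars_by_asset = {}
    for f in findings:
        pillars_by_asset.setdefault(f["asset"], set()).add(f["pillar"])
    return [a for a, p in pillars_by_asset.items() if {"CSPM", "CWPP", "CIEM"} <= p]

print(toxic_assets(findings))  # the internal server never surfaces
```

Production CNAPPs do this over an attack graph with reachability analysis rather than a flat intersection, but the prioritization principle is the same: correlate across pillars, then alert.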
Auto-Remediation Strategies
Detecting misconfigurations is easy; fixing them without breaking production is the challenge. CSPM deployments should be rolled out in phases:
- Phase 1: Visibility & Notification: The CSPM reads the environment and generates alerts. Sent to Jira/Slack. No automated action.
- Phase 2: Automated "Guardrails" (Preventative): Shifting left. The CNAPP integrates with CI/CD pipelines to block Terraform or CloudFormation templates that violate policy before they are deployed.
- Phase 3: Auto-Remediation (Reactive): For highly mature environments, the CSPM detects state changes via EventBridge (e.g., an S3 bucket goes public), triggers a Lambda function, and forcibly removes the public ACL within seconds.
When deploying a CNAPP, do not turn on all rules at once. You will generate 50,000 alerts on day one and the engineering teams will ignore you. Start by selecting 10 critical "paved road" rules (e.g., "No public RDS instances", "No root user login without MFA"). Enforce those ruthlessly. Once the baseline is clean, slowly dial up the strictness of the policies.
Self-Check Quiz
Test your understanding of Module 06. Select the best answer for each question.