OCI Container Runtimes: containerd vs CRI-O vs Docker Engine
What a Container Runtime Does
A container runtime takes an image and runs it as a container: it sets up namespaces and cgroups, invokes networking hooks, and manages storage drivers. This is the low-level work that turns a container image into a running process.
Two layers exist in practice: the high-level runtime (containerd, CRI-O, Docker Engine) that the orchestrator talks to, and the low-level runtime (runc, crun) that actually executes containers. Most discussion is about the high-level layer. The OCI Runtime Specification defines what any compliant runtime must implement.
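To make the split concrete: the high-level runtime hands the low-level runtime a bundle, which is a root filesystem plus a config.json in the format the OCI Runtime Specification describes. The sketch below builds a small config of that shape. The function name, the default values, and the choice of namespaces are illustrative assumptions; a config generated by a real tool such as `runc spec` contains many more fields.

```python
import json


def minimal_oci_config(args, rootfs="rootfs", hostname="demo"):
    """Build a small OCI runtime config (an illustrative subset only)."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # Path is relative to the bundle directory.
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,
        "linux": {
            # Each entry asks the low-level runtime for a new namespace.
            "namespaces": [
                {"type": "pid"},
                {"type": "network"},
                {"type": "ipc"},
                {"type": "uts"},
                {"type": "mount"},
            ],
        },
    }


config = minimal_oci_config(["sh", "-c", "echo hello"])
config_json = json.dumps(config, indent=2)
```

The point of the sketch is the division of labor: the high-level runtime decides what goes into this document, and the low-level runtime only has to honor it.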
containerd: The Default in Most Places
containerd is the most widely deployed high-level runtime. It started as Docker’s internal runtime, was donated to CNCF, and is now the default in EKS, GKE, AKS, and most Kubernetes distributions. The containerd documentation covers configuration, plugins, and the CRI integration.
Pros: stable, broadly supported, performant. Cons: minimal opinions, so day-2 operational concerns (image scanning, signing, policy) need other tools layered on top.
CRI-O: Built for Kubernetes
CRI-O was built specifically to be the runtime for Kubernetes. The scope is narrower than containerd’s — no support for use cases outside Kubernetes — which makes the surface area smaller.
Pros: tightly integrated with Kubernetes, well-aligned with OpenShift, smaller attack surface. Cons: less broadly used than containerd outside the Red Hat ecosystem.
Docker Engine: The Past Default
Docker Engine was the original Kubernetes runtime, connected through the dockershim adapter. Kubernetes 1.24 removed dockershim, so Docker Engine no longer works as a Kubernetes runtime out of the box; the separately maintained cri-dockerd adapter restores that integration where it is still needed.
Docker remains useful for developer machines and standalone container hosts. For Kubernetes nodes, it’s no longer the path of least resistance.
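To see which runtime a node actually uses, each Node object reports it in .status.nodeInfo.containerRuntimeVersion (the CONTAINER-RUNTIME column of `kubectl get nodes -o wide`), in a form like containerd://1.7.2. The sketch below parses that string and flags nodes still on Docker Engine. The node names and version numbers are made-up sample data, and both function names are hypothetical.

```python
def parse_runtime_version(value):
    """Split a string such as 'containerd://1.7.2' into (runtime, version)."""
    runtime, sep, version = value.partition("://")
    if not sep or not runtime or not version:
        raise ValueError(f"unexpected runtime version string: {value!r}")
    return runtime, version


def nodes_using_docker(nodes):
    """Return the names of nodes whose reported runtime is Docker Engine."""
    return [
        name
        for name, reported in nodes.items()
        if parse_runtime_version(reported)[0] == "docker"
    ]


# Hypothetical sample data; on a real cluster these values come from
# .status.nodeInfo.containerRuntimeVersion on each Node object.
sample_nodes = {
    "node-a": "containerd://1.7.2",
    "node-b": "docker://20.10.21",
    "node-c": "cri-o://1.27.1",
}
```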
Low-Level Runtime Choices
runc is the OCI reference low-level runtime and the default under containerd, CRI-O, and Docker.
crun is a faster, lighter alternative written in C. It is the default in some RHEL releases and opt-in elsewhere. Worth evaluating if you’re optimizing for container startup latency.
gVisor (a user-space kernel that intercepts syscalls), Kata Containers (lightweight VMs), and Firecracker (a microVM monitor) provide stronger isolation at the cost of performance. They are useful for multi-tenant workloads where namespace isolation alone is insufficient.
Related Reading
- See our deeper guide at /containers/container-registry-comparison/.
Image Build Tools
The runtime is one half of the story; build tooling is the other. Docker’s BuildKit, Buildah, Kaniko, and ko all build OCI-compatible images with different tradeoffs.
BuildKit is the broadly compatible choice. Buildah works without a daemon, useful for rootless or CI environments. Kaniko builds inside Kubernetes pods without privileged access. ko is specific to Go applications and builds without a Dockerfile. Pick by use case; the resulting images are interchangeable.
Image Layers and Storage Drivers
The container runtime’s storage driver affects performance and disk usage. overlay2 is the standard on modern Linux distributions and works well in most cases.
For specific workloads (frequent image pulls, large images, container-heavy nodes), evaluating storage driver behavior is worthwhile. Tools like dive show layer contents and help understand image size and pull behavior.
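One reason layer contents matter: layers are content-addressed, so a layer shared by several images is stored and pulled once per node. The sketch below sums layer sizes from OCI image manifests, counting each digest only once. The manifests are hypothetical, and the sizes in a real manifest are the sizes of the (usually compressed) layer blobs, not on-disk usage after extraction.

```python
def total_layer_bytes(manifest):
    """Sum the layer sizes listed in one image manifest."""
    return sum(layer["size"] for layer in manifest["layers"])


def unique_layer_bytes(manifests):
    """Sum layer sizes across manifests, counting each digest once."""
    seen = {}
    for manifest in manifests:
        for layer in manifest["layers"]:
            seen[layer["digest"]] = layer["size"]
    return sum(seen.values())


# Hypothetical manifests: both images share the same base layer.
base = {"digest": "sha256:" + "a" * 64, "size": 30_000_000}
app_one = {"layers": [base, {"digest": "sha256:" + "b" * 64, "size": 5_000_000}]}
app_two = {"layers": [base, {"digest": "sha256:" + "c" * 64, "size": 8_000_000}]}
```

Here the two images list 35 MB and 38 MB of layers, but a node running both pulls 43 MB in total, because the 30 MB base layer is shared.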
Container Security Considerations
Default container isolation uses Linux namespaces and cgroups. This is sufficient for most workloads but doesn’t provide hard isolation between mutually distrusting tenants.
For stronger isolation, sandboxed runtimes (Kata Containers, gVisor, Firecracker) add a lightweight VM or syscall interception layer. The performance cost is real (10-30% in many benchmarks, though it varies considerably by workload, so measure your own); the isolation benefit is meaningful.
Use sandboxed runtimes selectively for workloads with specific isolation requirements. Don’t apply them across the board — the overhead doesn’t justify itself for trusted internal workloads.
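In Kubernetes, selective use is expressed with a RuntimeClass: the cluster defines a class that maps to a handler configured in the node's runtime, and only the pods that need the sandbox set runtimeClassName. The sketch below builds both manifests as plain dictionaries. The handler name "runsc", the class name, and the image reference are assumptions for illustration; the handler must match whatever name your containerd or CRI-O configuration actually registers.

```python
import json


def runtime_class(name, handler):
    """Build a RuntimeClass manifest that maps a class name to a handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed_pod(name, image, runtime_class_name):
    """Build a Pod manifest that opts in to a specific RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical names: "gvisor" class backed by a "runsc" handler.
gvisor_class = runtime_class("gvisor", "runsc")
untrusted_pod = sandboxed_pod(
    "untrusted-job", "registry.example.com/untrusted-job:1.0", "gvisor"
)
manifests_json = json.dumps([gvisor_class, untrusted_pod], indent=2)
```

Pods that omit runtimeClassName keep using the node's default runtime, which is what keeps the overhead limited to the workloads that need it.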
Migration Considerations
Migrating between container runtimes (containerd to CRI-O, for example) is rarely necessary. The runtime is largely an implementation detail; OCI image compatibility means workloads don’t change.
Where migration does happen, the usual triggers are a distribution change (for example, moving to a platform whose default is CRI-O), a specific feature requirement (rootless containers, GPU support quirks), or operational standardization across mixed environments.
The migration path is straightforward: drain nodes, change the runtime configuration, restart kubelet, return nodes to service. The complexity is in testing — different runtimes have subtle behavior differences.
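The per-node sequence can be written down as a plan before anyone touches a cluster. The sketch below only builds a list of steps as strings; it does not execute anything. The socket paths are common defaults that may differ on your distribution, the function name is hypothetical, and the plan assumes the target runtime is already installed and running on the node.

```python
# Common default CRI socket paths; verify them for your distribution.
RUNTIME_SOCKETS = {
    "containerd": "unix:///run/containerd/containerd.sock",
    "cri-o": "unix:///var/run/crio/crio.sock",
}


def migration_plan(node, target_runtime):
    """Return the ordered steps for moving one node to another runtime."""
    socket = RUNTIME_SOCKETS[target_runtime]
    return [
        # 1. Evict workloads so nothing is running during the switch.
        f"kubectl drain {node} --ignore-daemonsets --delete-emptydir-data",
        # 2. Manual step: point the kubelet at the new runtime's socket.
        f"# on {node}: set the kubelet container runtime endpoint to {socket}",
        # 3. Restart the kubelet so it picks up the new endpoint.
        "sudo systemctl restart kubelet",
        # 4. Allow scheduling on the node again.
        f"kubectl uncordon {node}",
    ]


plan = migration_plan("node-a", "cri-o")
```

Running this for one canary node first, then comparing workload behavior before rolling on, is where the testing effort mentioned above goes.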
Container Image Provenance
Knowing where your container images come from is foundational. Pinning by digest (not by tag) gives immutability. Signing with Sigstore or Notary provides authenticity.
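A tag can be moved to point at different content, while a digest is the SHA-256 of the manifest bytes and so names exactly one image. The sketch below shows a simple check for digest-pinned references and how a digest is derived. The registry name and helper names are hypothetical, and the regular expression only covers sha256 digests.

```python
import hashlib
import re

# Matches references that end in @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref):
    """Return True if the reference is pinned by sha256 digest."""
    return bool(DIGEST_RE.search(image_ref))


def sha256_digest(blob):
    """Compute a digest string in the 'sha256:<hex>' form registries use."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def pin(repository, manifest_bytes):
    """Build a digest-pinned reference from a repository and manifest bytes."""
    return f"{repository}@{sha256_digest(manifest_bytes)}"


by_tag = "registry.example.com/app:1.4.2"
by_digest = pin("registry.example.com/app", b'{"schemaVersion": 2}')
```

A check like is_pinned is the kind of rule that fits naturally into CI or an admission policy, so that tag-only references never reach production manifests.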
Build provenance — recording how the image was built, from what source, by which CI system — adds an additional layer. SLSA attestations capture this in a standardized format.
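As a rough picture of what such an attestation contains, the sketch below assembles a provenance statement as a dictionary. The field layout follows my reading of the in-toto Statement v1 and SLSA v1.0 provenance formats and should be checked against those specifications. The build type URI, builder ID, repository, and commit are hypothetical values, and a real attestation would be produced and signed by the CI system rather than hand-built.

```python
def provenance_statement(image_name, image_sha256, builder_id, source_uri, commit):
    """Assemble an unsigned provenance statement (illustrative layout)."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # What was built: the image, identified by digest.
        "subject": [{"name": image_name, "digest": {"sha256": image_sha256}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            # How it was built and from what source.
            "buildDefinition": {
                "buildType": "https://example.com/build-types/container/v1",
                "externalParameters": {"source": source_uri, "commit": commit},
            },
            # Which CI system performed the build.
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


statement = provenance_statement(
    image_name="registry.example.com/app",
    image_sha256="a" * 64,  # placeholder digest
    builder_id="https://ci.example.com/runners/build-1",
    source_uri="https://git.example.com/team/app",
    commit="0123456789abcdef0123456789abcdef01234567",
)
```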
For organizations subject to executive orders or regulatory frameworks requiring software supply chain controls, provenance becomes mandatory rather than optional. Building the practice into normal CI early is cheaper than retrofitting under audit pressure.
Observability for Kubernetes Workloads
Standard observability for Kubernetes includes: pod metrics (cAdvisor exposed via kubelet), node metrics (node-exporter), API server and controller metrics, and application metrics via service annotations or ServiceMonitor.
The kube-prometheus-stack Helm chart bundles all of this with pre-built dashboards and alerts. Most clusters that want quick observability install it and customize from there. For deeper observability — distributed tracing across pods, application-level instrumentation — OpenTelemetry layers on top.
Logs follow a similar pattern: an agent such as Fluent Bit or Vector runs on each node and ships logs to a centralized store (Loki, Elasticsearch, CloudWatch). Per-pod metadata enrichment makes logs searchable by deployment, namespace, and pod labels.
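The enrichment step is conceptually simple: attach the pod's identity to every record so the log store can filter on it. Real agents do this inside their own pipelines; the sketch below only illustrates the idea, and the record layout, field names, and function names are hypothetical rather than any agent's actual schema.

```python
def enrich(record, pod_metadata):
    """Return a copy of a log record with pod metadata attached."""
    enriched = dict(record)
    enriched["kubernetes"] = {
        "namespace": pod_metadata["namespace"],
        "pod": pod_metadata["name"],
        "labels": dict(pod_metadata.get("labels", {})),
    }
    return enriched


def matches(record, namespace=None, labels=None):
    """Check whether an enriched record matches a namespace and labels."""
    k8s = record.get("kubernetes", {})
    if namespace is not None and k8s.get("namespace") != namespace:
        return False
    for key, value in (labels or {}).items():
        if k8s.get("labels", {}).get(key) != value:
            return False
    return True


# Hypothetical record and pod metadata.
raw = {"message": "payment accepted", "level": "info"}
pod = {"namespace": "payments", "name": "api-7d9f", "labels": {"app": "api"}}
record = enrich(raw, pod)
```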
Capacity Planning and Right-Sizing
Kubernetes capacity planning has two layers: cluster capacity (how many nodes, what types) and workload capacity (resource requests and limits). Both deserve attention.
For cluster capacity, observe peak utilization and plan headroom. 70-80% peak utilization is a healthy target — below that, you’re paying for idle capacity; above that, autoscaling lag and burst patterns can cause issues.
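The headroom arithmetic is worth writing out. The sketch below computes how many nodes keep peak utilization at or below a target, using CPU cores as the only dimension. The numbers are made up, and a real plan would repeat the calculation for memory and account for system reservations and failure-domain spare capacity.

```python
import math


def nodes_needed(peak_cores, cores_per_node, target_utilization=0.75):
    """Nodes required so that peak demand stays at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_cores / (cores_per_node * target_utilization))


def peak_utilization(peak_cores, cores_per_node, nodes):
    """Fraction of total cluster cores in use at peak."""
    return peak_cores / (cores_per_node * nodes)


# Hypothetical cluster: 180 cores of peak demand on 16-core nodes.
needed = nodes_needed(peak_cores=180, cores_per_node=16, target_utilization=0.75)
utilization = peak_utilization(180, 16, needed)
```

With these inputs the result is 15 nodes at exactly 75% peak utilization; running the same demand on 12 nodes would put the peak near 94%, which is where autoscaling lag starts to hurt.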
For workload capacity, right-sizing tools that compare resource requests against observed usage surface candidates. Schedule quarterly right-sizing reviews. Service growth and traffic pattern changes mean yesterday’s right-size is today’s waste or saturation.
Image Optimization for Production
Beyond best practices in the Dockerfile, image optimization at the repository level pays back across many services. Standardize on a small set of base images, share optimization patterns across teams, and centralize the security-update process for those base images.
Internal base images that wrap upstream images with organization-specific additions (corporate certs, common tools, security agents) reduce per-service complexity. Build them with the same discipline as application images — pinned dependencies, signed, scanned.
Image size impacts pull time, which impacts pod startup, which impacts autoscaling responsiveness and rolling deploy duration. The end-to-end effect is larger than the per-image savings suggest.
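A back-of-the-envelope model shows how the effect compounds. The sketch below estimates pull time from image size and bandwidth, then multiplies through a rolling deploy done in batches. It is a deliberately crude model: it assumes every pod lands on a node with a cold image cache, that batches run strictly one after another, and that bandwidth is constant. All numbers and function names are hypothetical.

```python
def pull_seconds(image_bytes, bandwidth_bytes_per_s):
    """Estimated time to pull an image with a cold cache."""
    return image_bytes / bandwidth_bytes_per_s


def rollout_seconds(replicas, max_surge, per_pod_seconds):
    """Estimated rollout time when pods start in batches of max_surge."""
    batches = -(-replicas // max_surge)  # ceiling division
    return batches * per_pod_seconds


BANDWIDTH = 100_000_000  # 100 MB/s, an assumed effective pull rate
APP_START = 5.0          # seconds from image present to pod ready (assumed)

large = pull_seconds(1_200_000_000, BANDWIDTH) + APP_START  # 1.2 GB image
small = pull_seconds(300_000_000, BANDWIDTH) + APP_START    # 300 MB image

large_rollout = rollout_seconds(replicas=20, max_surge=5, per_pod_seconds=large)
small_rollout = rollout_seconds(replicas=20, max_surge=5, per_pod_seconds=small)
```

Under these assumptions a 9-second saving per pull becomes a 36-second saving per rollout (68 versus 32 seconds), and the same per-pod saving applies every time the autoscaler adds capacity on a fresh node.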
Operational Recommendations
For teams running production Kubernetes workloads, a small set of disciplines pays back across nearly every dimension of cluster operation. Define resource requests and limits for all production workloads. Establish a network policy posture that defaults to deny. Run regular cluster upgrades on a defined cadence. Monitor cluster health alongside application health.
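Two of these disciplines are easy to express as code that a template or CI check could reuse. The sketch below builds a default-deny NetworkPolicy for a namespace and lists containers that lack resource requests or limits. The policy name, namespace, and pod spec are hypothetical, and the check is a minimal one that does not look at init containers or LimitRange defaults.

```python
def default_deny_policy(namespace):
    """Build a NetworkPolicy that denies all ingress and egress by default."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector selects every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def containers_missing_resources(pod_spec):
    """Return names of containers lacking resource requests or limits."""
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            missing.append(container["name"])
    return missing


# Hypothetical pod spec: one container is fully specified, one is not.
pod_spec = {
    "containers": [
        {
            "name": "api",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"cpu": "500m", "memory": "512Mi"},
            },
        },
        {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
    ]
}
policy = default_deny_policy("payments")
```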
These aren’t novel recommendations — they appear in every Kubernetes best-practices guide. They’re rarely fully implemented in production clusters that grew organically. The work of bringing existing clusters to this baseline is significant but worthwhile.
For new clusters, build these in from the start. Templates and operators can enforce the baseline; documentation captures the intent. Each new service onboarded gets the right defaults rather than requiring later remediation.
Operational maturity in Kubernetes is incremental. Pick the next improvement, implement it, move on. The compounded effect over time is what separates well-operated clusters from clusters that work but feel fragile.
Key Takeaways
The most important point throughout this guide: practical engineering decisions depend on specific context. Best-practice recommendations are starting points, not destinations. The right answer for your team depends on your scale, your existing tooling investment, your team’s experience, and the specific constraints you face.
Three principles are worth carrying forward regardless of specific tool choices. First, measure what you change. Engineering improvements without measurement become folklore: claims without evidence. Track the metrics that show whether interventions are working.
Second, default to simpler architectures and tools. Complexity has cost. Each additional moving part is something to monitor, debug, upgrade, and eventually replace. Choose the simplest thing that meets your actual requirements, not the most sophisticated thing you could build.
Third, invest continuously in the boring foundations. Reliable CI, good observability, sensible access controls, and clear documentation pay back across every project. Skipping these for short-term feature velocity accumulates debt that eventually consumes the velocity it was supposed to enable.
The teams that operate well over the long term are usually not the teams with the most exotic tooling. They’re the teams with disciplined fundamentals, deliberate decision-making, and continuous incremental improvement.
Frequently Asked Questions
Should I switch from containerd?
Probably not. containerd is the broadly supported default; switching adds risk without clear benefit for most workloads.
Is Docker really gone from Kubernetes?
Docker Engine isn’t a default Kubernetes runtime anymore. Container images built with Docker work fine — image format is OCI-standardized.
When should I look at gVisor or Kata?
Multi-tenant clusters running untrusted workloads. The performance cost is real; only worth it when isolation requirements demand it.
Does the runtime choice affect image compatibility?
No. All OCI-compliant runtimes run OCI-compliant images. The standardization has held up well.