Kyverno Chainsaw: declarative end-to-end testing for Kubernetes

Kyverno Chainsaw lets you write Kubernetes end-to-end tests as declarative YAML instead of Go boilerplate or brittle bash. What it does, who runs it in production, and where it falls short.

If you build Kubernetes operators, controllers, or Helm charts, Kyverno Chainsaw is the strongest tool today for writing end-to-end tests as declarative YAML instead of Go boilerplate or brittle bash. It runs your manifests against a real cluster, asserts on actual cluster state with JMESPath, and is already wired into the production CI of the OpenTelemetry Operator, Crossplane's testing tool uptest, and Kyverno itself. The docs explain the what in abstract terms; this article covers the why, the how, and the parts nobody puts on the landing page.

TL;DR

Chainsaw is a declarative end-to-end testing framework for Kubernetes. You describe tests as Test custom resources in YAML, and it runs them against a live cluster (kind, k3s, whatever your CI spins up).
Its standout feature is assertion trees: subset matching plus JMESPath expressions, so you assert only the fields you care about and can express conditions like "ready replicas equals desired replicas" inline.
Real adopters include Kyverno, kyverno/policies, Crossplane/uptest, the OpenTelemetry Operator, and the OT-CONTAINER-KIT Redis operator.
It is the de facto choice for declarative operator testing, but it does not replace Go-based integration tests (Ginkgo + envtest) and it needs a real cluster, which makes CI slower.
Latest release is v0.2.15 (6 May 2026); it is a sub-project of Kyverno, which reached CNCF Graduated status in March 2026.

Table of contents
What Chainsaw is, and the gap it fills
Where Chainsaw came from
What a Chainsaw test actually looks like
Assertion trees: the feature that earns the switch
Who actually runs Chainsaw in production
Chainsaw versus the alternatives
When Chainsaw is not the right tool

What Chainsaw is, and the gap it fills

Chainsaw tests Kubernetes the way you operate Kubernetes: declaratively. A test is a Test resource (apiVersion: chainsaw.kyverno.io/v1alpha1) that lists steps. Each step applies manifests, waits, and asserts that the cluster reached the state you expected. No Go, no test harness to compile, no in-process fake API server pretending to be Kubernetes.

The gap it fills is specific. If you write a controller, you have three bad options and one good one. You can write hundreds of lines of Go using Ginkgo, Gomega, and envtest, which is powerful but heavy and tied to a Go toolchain. You can use KUTTL, which is declarative but limited in what it can assert. You can string together bash and kubectl, which works until it doesn't and then produces failure output nobody can read. Chainsaw is the fourth option: declarative like KUTTL, expressive like code, and readable months later.

The official introduction states the original purpose plainly: it "was primarily developed to run end-to-end tests in Kubernetes clusters, meant to test Kubernetes operators by running a sequence of steps and asserting various conditions." That framing matters. Chainsaw is an end-to-end tool. It exercises the real reconcile loop against a real API server, so it catches the bugs that only show up when an actual controller acts on an actual resource.

Where Chainsaw came from

Chainsaw exists because the Kyverno team got tired of fighting their test tooling. They started on KUTTL for Kyverno's own e2e tests, hit its limits, and forked it. The fork diverged so far that, in their words, "the changes we were making was simply too large to have a chance to be incorporated upstream." So in 2023 they rebuilt it from scratch as Chainsaw, deliberately keeping the test file format close to KUTTL's so existing suites could migrate (introduction docs).

The lead author is Charles-Edouard Brétéché (@eddycharly), a Senior Staff Engineer at Nirmata and the architect of the assertion-tree model. The current maintainer set, per the Kyverno community MAINTAINERS.md, is Brétéché, Shubham Gupta (@shubham-cmyk), and Mariam Fahmy (@MariamFahmy98) of Cloudflare. Chainsaw is an explicit sub-project of Kyverno under the kyverno GitHub organization, which means it sits under the umbrella of a project that reached CNCF Graduated status on 16 March 2026. That is the highest CNCF maturity level, and it is a reasonable governance signal if you are betting infrastructure tooling on a project's longevity.

It ships often. The releases page shows 87 tagged releases, with v0.2.15 out on 6 May 2026, adding status subresource patch support and faster namespace deletion. For a testing tool, frequent releases against real-world bug reports matter more than a big star count.

What a Chainsaw test actually looks like

A test directory contains a chainsaw-test.yaml file. Chainsaw scans recursively, runs each test in its own ephemeral namespace by default, and tears it down afterward. Here is a complete test that applies a Deployment and asserts it becomes fully ready, with a catch block that dumps diagnostics if the assertion fails. It assumes a deployment.yaml manifest for a Deployment named nginx sits in the same directory:

# yaml-language-server: $schema=https://raw.githubusercontent.com/kyverno/chainsaw/main/.schemas/json/test-chainsaw-v1alpha1.json
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: deployment-becomes-ready
spec:
  steps:
    - try:
        - apply:
            file: deployment.yaml          # your manifest under test
        - assert:
            resource:
              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: nginx
              status:
                (readyReplicas == replicas): true   # JMESPath expression
      catch:                                  # only runs if try fails
        - describe:
            apiVersion: apps/v1
            kind: Deployment
        - podLogs:
            selector: app=nginx
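
The test above points at a deployment.yaml that is not shown. A minimal sketch of what that file could contain follows; the replica count, image tag, and port are assumptions chosen for illustration, and only the name nginx and the app=nginx label are implied by the test itself.

```yaml
# deployment.yaml (hypothetical manifest under test)
# No namespace is set, so Chainsaw can place it in the test's ephemeral namespace.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                # must match the name used in the assert
spec:
  replicas: 2                # assumption: any small number works
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx           # matched by the podLogs selector in catch
    spec:
      containers:
        - name: nginx
          image: nginx:1.27  # assumption: pin whatever image you actually ship
          ports:
            - containerPort: 80
```

With both files in one directory, running chainsaw test against that directory should pick the test up.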

The four blocks a step can hold are worth knowing. try is the happy path. catch runs only when a try operation fails, which is where you put describe, events, and podLogs to make failures debuggable instead of cryptic. finally always runs after the step. cleanup is deferred until the whole test finishes (step docs).
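
To make the four blocks concrete, here is a hedged sketch of a single step that uses all of them. The file names, the step name, and the scratch ConfigMap are hypothetical, and the scripts assume kubectl is on the runner's PATH and that Chainsaw exposes the test namespace to scripts as $NAMESPACE.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: step-blocks-sketch
spec:
  steps:
    - name: deploy-and-check
      try:                         # happy path
        - apply:
            file: deployment.yaml
        - assert:
            file: expected-deployment.yaml   # hypothetical expected state
      catch:                       # runs only if an operation in try fails
        - events: {}
        - describe:
            apiVersion: apps/v1
            kind: Deployment
        - podLogs:
            selector: app=nginx
      finally:                     # always runs when the step ends
        - script:
            content: kubectl get pods -n $NAMESPACE
      cleanup:                     # deferred until the whole test finishes
        - script:
            content: kubectl delete configmap scratch -n $NAMESPACE --ignore-not-found
```

Keeping diagnostics in catch rather than try means a passing run stays quiet and a failing run carries its own context.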

The operation set is broad. Core operations are apply, create, assert, error, delete, update, patch, script, command, and sleep. On top of those sit kubectl-style helpers usable in any block: describe, events, get, podLogs, proxy, and wait (operations reference). Negative testing is the area KUTTL never had a clean answer for. The error operation is the inverse of assert: it passes only when no resource matching the given state exists. To prove that applying a manifest gets rejected, you attach an expect check to apply or create and assert on the $error binding. If you write admission policies, that pair is how you prove a bad resource gets blocked, which is exactly what you want to verify on a cluster running Kyverno policies.
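
A minimal sketch of a negative test, assuming a hypothetical bad-pod.yaml that an admission policy should reject. As I read the Chainsaw docs, rejection is checked with an expect block on apply (the $error binding is set when the operation fails), while error asserts that no matching resource exists afterwards.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: bad-pod-is-rejected
spec:
  steps:
    - try:
        # Applying the manifest must fail: the check passes only if an error occurred.
        - apply:
            file: bad-pod.yaml       # hypothetical manifest that violates a policy
            expect:
              - check:
                  ($error != null): true
        # Belt and braces: no Pod with this name should exist in the namespace.
        - error:
            resource:
              apiVersion: v1
              kind: Pod
              metadata:
                name: bad-pod        # hypothetical name used in bad-pod.yaml
```

The two operations answer different questions: the first proves the API server said no, the second proves nothing slipped through anyway.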

Timeouts are granular and cascade from global to test to step. There are six independent types, set as keys under a timeouts block: apply, assert, cleanup, delete, error, and exec (timeout docs). That split is deliberate. Asserting that a Deployment rolls out can take far longer than deleting a ConfigMap, and a single global timeout forces you to pick one bad number for both.
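
The cascade can be sketched as two files: a global configuration and a test that overrides one value at test level and again at step level. The durations and file names are illustrative assumptions, and the short key names under timeouts (apply, assert, and so on) are how I understand the v1alpha1 schema.

```yaml
# .chainsaw.yaml: global defaults for every test (values are illustrative)
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Configuration
metadata:
  name: configuration
spec:
  timeouts:
    apply: 10s
    assert: 30s
    cleanup: 30s
    delete: 15s
    error: 30s
    exec: 10s
---
# chainsaw-test.yaml: a separate file, shown here for comparison
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: slow-rollout
spec:
  timeouts:
    assert: 2m               # overrides the global 30s for this test only
  steps:
    - timeouts:
        assert: 5m           # overrides the test-level 2m for this step only
      try:
        - apply:
            file: deployment.yaml
        - assert:
            file: expected-deployment.yaml   # hypothetical expected state
```

Only the slow assertion gets the long budget; deletes and scripts keep their short global defaults.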

The tooling around the file is good for a project this size. chainsaw lint validates test files, chainsaw create scaffolds new ones, and chainsaw export schemas writes JSON schemas you can wire into your editor. Drop a # yaml-language-server: $schema=... comment at the top of a test and VS Code gives you autocompletion and validation as you type (JSON schema docs). There is also a chainsaw migrate kuttl command that converts existing KUTTL tests and config, which is the on-ramp for teams leaving KUTTL.

Assertion trees: the feature that earns the switch

Everything above is convenient. Assertion trees are the reason to actually switch. The model comes from kyverno-json: at every node of the YAML you assert against, Chainsaw can apply a JMESPath projection before descending, and a leaf node triggers a comparison (assertion trees quick-start).

In practice that gives you three things KUTTL's structural equality cannot.

First, subset matching by default. Your assertion only lists the fields you care about. Every other field on the live resource is ignored. You are not forced to reproduce an entire Deployment spec just to check one status field.

Second, expressions at any node. Wrap a key in parentheses and it becomes a JMESPath expression. Want to assert that replicas land in a range rather than equal an exact number?

spec:
  (replicas > `1` && replicas < `4`): true

Third, filtering and iteration over arrays, which is where real controller status lives. Assert that the Ready condition is True without caring about the order or the other conditions:

(conditions[?type == 'Ready']):
- status: 'True'

Or fan an assertion out across every element of an array, for example that every container carries a securityContext:

~.(containers):
  securityContext: {}

That last pattern is the kind of check you want when you are validating that a mutating policy actually injected a field across all containers, not just the first one. With KUTTL you would write a separate assertion per container or drop to a script. With Chainsaw it is one node. The Kyverno blog post on assertion trees walks through the model in more depth, and it is worth reading before you write anything non-trivial, because the syntax, with (expression) for JMESPath keys, ~. for iteration, and ->binding for variables, does not read intuitively the first time.
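
To show where such fragments sit inside a full assertion, here is a hedged sketch against a Deployment. The resource name nginx is an assumption, and the Available condition stands in for Ready because that is the condition type Deployments report.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: assertion-tree-sketch
spec:
  steps:
    - try:
        - assert:
            resource:
              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: nginx            # assumption: the Deployment under test
              spec:
                # expression at a node: replicas must fall inside a range
                (replicas > `1` && replicas < `4`): true
              status:
                # filter the conditions array, then subset-match what remains
                (conditions[?type == 'Available']):
                  - status: 'True'
```

Everything not listed, from labels to the pod template, is ignored by subset matching.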

Who actually runs Chainsaw in production

This is the part the documentation underplays, and it is the part that should drive your decision. Chainsaw's 576 GitHub stars badly understate its reach, because adoption shows up in CI pipelines, not stars.

Kyverno itself. Chainsaw is Kyverno's primary e2e testing tool, the canonical dogfood case. In kyverno/kyverno issue #12065, opened by Brétéché in February 2025, he states: "Currently Kyverno uses Chainsaw as the primary testing tool, which executes end-to-end tests on a real cluster." The same issue is candid that the suite "takes a long time to be executed," which is a useful tell about the trade-off, and one I will come back to.

The Kyverno policy catalog. The kyverno/policies repository uses Chainsaw to validate the official policy library, with an umbrella issue tracking migration of all sample-policy tests to it.

Crossplane and Upbound. Crossplane's end-to-end testing tool, uptest, runs on Chainsaw under the hood and generates Chainsaw test cases from input. Upbound migrated uptest from KUTTL to Chainsaw and documented why: "while kuttl meets the basic requirements for Uptest so far, chainsaw is a more up-to-date and maintainable tool, with responsive maintenance, detailed documentation, and frequent releases." Every Crossplane provider tested through uptest is therefore tested on Chainsaw.

The OpenTelemetry Operator. This is the cleanest example of production CI. The repo has a committed .chainsaw.yaml, e2e tests under tests/e2e*/, and a GitHub Actions workflow that runs chainsaw test across a matrix of Kubernetes versions (from 1.25 up to 1.33), emitting JUnit XML reports. If you want a reference implementation to copy, this is it.

The OT-CONTAINER-KIT Redis operator. It runs all e2e tests through Chainsaw under tests/e2e-chainsaw/. It is also the source of the most honest data point in this whole article: an October 2024 issue noting the Chainsaw tests are "quite flakey and take a significant amount of time to run." More on that below.

There are first-person accounts too. Maram El-Sayed, a Quality Coach at OutSystems, adopted Chainsaw to test 15+ Flux-based GitOps repositories that previously had no automated testing, and spoke about it in April 2025: "Chainsaw helped us catch Kubernetes issues early and saved us hours of debugging." And a Medium write-up by Shivam Tiwarri describes replacing 200+ lines of Go test code for a custom controller with roughly four lines of Chainsaw YAML, citing flaky tests, opaque logic, and hard onboarding as what pushed the move.

Chainsaw versus the alternatives

KUTTL. Still actively maintained (v0.26.0 shipped on 11 May 2026), and the tool Chainsaw was originally forked from. What it lacks is the expressiveness: no JMESPath assertions, no clean way to prove that the API server rejects a resource, no catch/finally debugging hooks, and no bindings. If your tests are simple structural checks, KUTTL is still fine. The moment you need conditional assertions, negative tests, or readable failure output, Chainsaw is the stronger tool, and the migration command exists precisely because so many teams cross that line.

Ginkgo/Gomega with envtest. This is the kubebuilder-recommended path for integration tests, and it is genuinely more powerful for in-process controller logic. But envtest spins up an API server and etcd with no scheduler and no kubelet, so controllers that depend on real node behavior will pass tests and fail in production. It also demands Go fluency, which excludes a lot of platform teams. controller-runtime's own docs warn that fake-client tests "gradually re-implement poorly-written impressions of a real API server." Chainsaw runs against a real cluster, so it catches scheduler and node behavior that envtest structurally cannot. The trade-off is speed: a real cluster is slower than an in-process one.

Bash plus kubectl. Fine for a smoke test, miserable at scale. No retry or backoff on assertions, no structured cleanup, and failure output that takes longer to read than the code took to write. Every team I have seen move to Chainsaw cites escaping exactly this.

Terratest and operator-sdk's scaffolding. Terratest is a Go library aimed at Terraform and cloud resources; it can poke a cluster but offers no Kubernetes-native assertion model. operator-sdk scaffolds Ginkgo/envtest by default and explicitly documents Chainsaw as a valid alternative e2e path, which is a quiet endorsement.

The honest summary: for declarative, YAML-native e2e testing of operators, controllers, and charts, Chainsaw is the de facto choice. It has not displaced Ginkgo/envtest in Go-centric projects, and it is not trying to. They test different layers. A mature operator project often runs both: fast Go integration tests for reconcile logic, Chainsaw for full e2e against a real cluster.

When Chainsaw is not the right tool

The biggest constraint is structural: Chainsaw needs a live cluster. Spinning up kind or k3s adds minutes per CI run, and that overhead is unavoidable for any meaningful operator test. Kyverno's own maintainer flagged this in issue #12065, noting the e2e suite is slow and is even used for "very basic/simple cases" where a faster in-process layer would be better. v0.2.15 added fast namespace deletion to take some of the edge off, but the cluster requirement is the cost of testing real behavior. If you want millisecond unit tests of reconcile logic, Chainsaw is the wrong layer.

Flakiness is real, as the Redis operator issue shows. This is not unique to Chainsaw; every cluster-backed e2e suite is timing-sensitive. But the retry and timeout model does not fully shield you from flakes in a resource-contended CI runner, and you will spend time tuning timeouts.

The JMESPath syntax has a learning curve. The expressiveness that makes assertion trees great also makes them opaque on day one. Engineers new to both Chainsaw and JMESPath face a double climb. Brétéché himself has publicly discussed the difficulty of driving adoption, which tells you the friction is acknowledged, not hidden.

It is e2e only, and it is Kubernetes only. Chainsaw has no opinion about your application's HTTP responses, database state, or message queues. When YAML assertions run out, you fall back to script and command blocks running arbitrary bash, which reintroduces the imperative fragility Chainsaw was built to remove. Use it for the Kubernetes resource graph; do not stretch it into a general integration harness.
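
For a sense of what that fallback looks like, here is a hedged sketch of a script operation with a check on its output. The Service name nginx is hypothetical, and the sketch assumes kubectl is available on the runner and that Chainsaw provides $NAMESPACE to the script along with the $error and $stdout bindings to the check.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: script-fallback-sketch
spec:
  steps:
    - try:
        - script:
            # Imperative escape hatch: read one field with kubectl.
            content: |
              kubectl get service nginx -n $NAMESPACE -o jsonpath='{.spec.clusterIP}'
            check:
              ($error == null): true             # the command exited cleanly
              (length($stdout) > `0`): true      # and printed something
```

The check keeps the pass/fail decision declarative, but the command itself is ordinary shell, with all the quoting and portability problems that implies.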

Two smaller notes for anyone making a long-term bet. Test-suite organization is thin: there is no first-class concept of named suites or tags beyond regex filtering on test names, with an open discussion requesting it, so large repos lean on directory structure. And the maintainer pool is three people, two of them at Nirmata, which is a normal bus-factor profile for a CNCF sub-project this size but worth weighing if Chainsaw becomes load-bearing for your release pipeline. For a tool that turns operator testing from a Go project into a folder of readable YAML, those are trade-offs I would take.
