Running and debugging tests
Quick start
The most common development workflows:
# Unit tests
make test
# Integration tests
make test-integration
# E2E: build image and run
# Note: PLATFORM=linux/arm64 is only needed on Apple Silicon (arm64).
# On amd64, run: make kind-image-build test-e2e-baseline
PLATFORM=linux/arm64 make kind-image-build test-e2e-baseline
# E2E: build, run, and keep the kind cluster alive between runs
E2E_MODE=dev PLATFORM=linux/arm64 make kind-image-build test-e2e-baseline
Quick filter reference
# Focus on tests matching a name pattern
GINKGO_ARGS="--focus=Scheduler" make test-integration
GINKGO_ARGS="--focus='Creating a Pod requesting TAS'" make test-e2e-baseline
# Filter integration tests by label
INTEGRATION_FILTERS="--label-filter=controller:workload" make test-integration
INTEGRATION_FILTERS="--label-filter=area:jobs" make test-integration
# Filter e2e tests by label
GINKGO_ARGS="--label-filter=feature:jobset" make test-e2e-extended
# Run only a specific integration test directory
INTEGRATION_TARGET='test/integration/singlecluster/scheduler' make test-integration
For the full label taxonomy and more examples, see Running a subset of tests.
Running unit tests
To run all unit tests:
make test
To run unit tests for webhooks:
go test ./pkg/webhooks
To run tests whose names match the regular expression TestValidateClusterQueue:
go test ./pkg/webhooks -run TestValidateClusterQueue
Running unit tests with race detection
Use -race to enable Go’s built-in race detector:
go test ./pkg/scheduler/preemption/ -race
Running unit tests with stress
To run unit tests in a loop and collect failures:
# install go stress
go install golang.org/x/tools/cmd/stress@latest
# compile tests (you can add -race for race detection)
go test ./pkg/scheduler/preemption/ -c
# it loops and reports failures
stress ./preemption.test -test.run TestPreemption
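If installing the stress tool is not an option, a plain shell loop gives a rough approximation of the same idea: run the compiled test binary a fixed number of times and count the failing runs. This is only a minimal sketch. It runs sequentially and discards output, unlike stress, and the function name count_failures is hypothetical.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
  runs="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Output is discarded; only the exit status of each run is counted.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (commented out so that pasting this block does not run any tests):
# count_failures 20 ./preemption.test -test.run TestPreemption
```

A non-zero result means at least one run failed. Rerun the binary directly to see the failure output.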
Running integration tests
make test-integration
For running a subset of tests, see Running subset of tests.
Running e2e tests
E2E tests build a Kueue image and load it into a local Kind cluster. The build step must run before the test target:
make kind-image-build
On Apple Silicon (arm64), set PLATFORM:
PLATFORM=linux/arm64 make kind-image-build
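The choice of whether to set PLATFORM can be scripted from the machine architecture. The sketch below is an illustration under assumptions: the helper name platform_for_arch is hypothetical, and treating aarch64 (as reported by arm64 Linux hosts) the same as arm64 is an assumption not stated in this guide.

```shell
# Hypothetical helper: print the PLATFORM value for a machine architecture
# (defaults to the output of `uname -m`). Prints nothing when no override
# is needed, for example on amd64.
platform_for_arch() {
  case "${1:-$(uname -m)}" in
    arm64 | aarch64) echo "linux/arm64" ;; # aarch64 handling is an assumption
    *) : ;;                                # no PLATFORM override needed
  esac
}

# Example (commented out so that pasting this block does not start a build):
# platform="$(platform_for_arch)"
# if [ -n "$platform" ]; then
#   PLATFORM="$platform" make kind-image-build
# else
#   make kind-image-build
# fi
```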
Then run the desired test target:
make test-e2e-baseline
make test-e2e-extended
make test-e2e-sequential-baseline
make test-e2e-sequential-extended
make test-e2e-certmanager
make test-e2e-kueueviz
make test-tas-e2e-baseline
make test-tas-e2e-extended
make test-multikueue-e2e-main
make test-multikueue-e2e-sequential
You can specify the Kubernetes version:
E2E_K8S_FULL_VERSION=1.35.0 make test-e2e-baseline
For running a subset of tests, see Running subset of tests.
DEV mode (keep the cluster)
Use E2E_MODE=dev to create-or-reuse a kind cluster, rebuild/redeploy Kueue, run tests, and keep the cluster running for fast reruns and post-test investigation:
# Create if missing, otherwise reuse cluster. Rebuild image, run tests, keep the cluster.
E2E_MODE=dev make kind-image-build test-e2e-baseline
# MultiKueue dev mode
E2E_MODE=dev make kind-image-build test-multikueue-e2e-main
# Loop a suite (until it fails) while keeping the cluster
E2E_MODE=dev GINKGO_ARGS="--until-it-fails" make kind-image-build test-e2e-baseline
# Skip reinstallation of kueue (works only in dev mode)
E2E_MODE=dev E2E_SKIP_REINSTALL=true make kind-image-build test-e2e-baseline
E2E_MODE=dev E2E_SKIP_REINSTALL=true make kind-image-build test-multikueue-e2e-main
# Skip re-pulling dependency images and re-importing them into kind when already present (dev mode only)
E2E_MODE=dev E2E_SKIP_IMAGE_RELOAD=true make kind-image-build test-e2e-baseline
Note
When reusing a kept cluster in E2E_MODE=dev, external operators (MPI, KubeRay, etc.) are installed only once.
To force re-installing them on every run, set E2E_ENFORCE_OPERATOR_UPDATE=true.
Set E2E_SKIP_IMAGE_RELOAD to a truthy value (for example true) to skip docker pull for dependency images
that are already in your local Docker cache, and to skip loading an image into kind worker nodes when that
image reference is already present in the node containerd store.
That makes repeat runs faster, especially on multi-node clusters.
The Kueue controller image is always reloaded into the cluster unless E2E_SKIP_REINSTALL=true, because you may
rebuild it with make kind-image-build under the same tag.
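The dev-mode variables above can be combined in a single invocation. As an illustration only, the sketch below prints (but does not run) the resulting command line for a given make target. The helper name e2e_dev_command and its positional arguments are hypothetical; only the environment variables and make targets come from this guide.

```shell
# Hypothetical helper: print an e2e dev-mode command line without running it.
# Usage: e2e_dev_command <make-target> [skip-reinstall] [skip-image-reload]
# The two optional arguments are "true" or "false" (default: "false").
e2e_dev_command() {
  target="$1"
  cmd="E2E_MODE=dev"
  if [ "${2:-false}" = "true" ]; then
    cmd="$cmd E2E_SKIP_REINSTALL=true"
  fi
  if [ "${3:-false}" = "true" ]; then
    cmd="$cmd E2E_SKIP_IMAGE_RELOAD=true"
  fi
  echo "$cmd make kind-image-build $target"
}

# Print the command for a fast rerun that skips both reinstall and image reload.
e2e_dev_command test-e2e-baseline true true
```

Printing the command first makes it easy to review which toggles are active before starting a long e2e run.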
To delete the kept cluster(s) afterwards:
- For regular e2e tests, run:
kind delete clusters kind
- For MultiKueue tests, run:
kind delete clusters kind kind-manager kind-worker1 kind-worker2
Using a released or staging image
To use a released or staging Kueue image instead of building from source (no kind-image-build needed), pass IMAGE_TAG:
# Released version
E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-e2e-baseline
E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-multikueue-e2e-main
# Staging image (e.g. from a PR or nightly)
E2E_MODE=dev IMAGE_TAG=us-central1-docker.pkg.dev/k8s-staging-images/kueue/kueue:main make test-e2e-baseline
Using a released version with matching manifests: The e2e framework deploys CRDs and other resources from the repo’s config and overrides only the controller image via IMAGE_TAG. To run e2e against a specific release with manifests that match that image:
- Check out that version’s tag (e.g. git checkout v0.16.0). The CRD and deployment config in the repo are committed at each release, so no make manifests step is needed.
- Run the command above with the same image tag, e.g. E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-e2e-baseline.
This is useful to reproduce issues on a specific released version (e.g. for on-call debugging). For installing a released version into a real cluster (not e2e), see Install a released version.
Legacy: interactive attach mode
Run E2E_RUN_ONLY_ENV=true make kind-image-build test-multikueue-e2e-main and wait for the prompt Do you want to cleanup? [Y/n] to appear (CI-style behavior).
The cluster is ready, and now you can run tests from another terminal:
<your_kueue_path>/bin/ginkgo --json-report ./ginkgo.report -focus "MultiKueue when Creating a multikueue admission check Should run a jobSet on worker if admitted" -r
or from VSCode.
Running subset of integration or e2e tests
Use label filters for integration tests
Integration tests are labeled by controller, job type, feature, and area to enable targeted test execution. You can use INTEGRATION_FILTERS with --label-filter to run specific test subsets:
Label Taxonomy:
- Controllers:
controller:workload, controller:localqueue, controller:clusterqueue, controller:admissioncheck, controller:resourceflavor, controller:provisioning
- Job Types:
job:batch, job:pod, job:jobset, job:pytorch, job:tensorflow, job:mpi, job:paddle, job:xgboost, job:jax, job:train, job:ray, job:appwrapper, job:sparkapplication
- Features:
feature:tas, feature:multikueue, feature:provisioning, feature:fairsharing, feature:admissionfairsharing
- Areas:
area:core, area:jobs, area:admissionchecks, area:multikueue
Examples:
# Run only LocalQueue tests
INTEGRATION_FILTERS="--label-filter=controller:localqueue" make test-integration
# Run all job tests
INTEGRATION_FILTERS="--label-filter=area:jobs" make test-integration
# Run PyTorch job tests
INTEGRATION_FILTERS="--label-filter=job:pytorch" make test-integration
# Run all tests except slow
INTEGRATION_FILTERS="--label-filter=!slow" make test-integration
# Run core tests except slow
INTEGRATION_FILTERS="--label-filter=area:core && !slow" make test-integration
# Run TAS-related tests
INTEGRATION_FILTERS="--label-filter=feature:tas" make test-integration
# Run FairSharing tests
INTEGRATION_FILTERS="--label-filter=feature:fairsharing" make test-integration
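Longer filters such as area:core && !slow are easy to mistype. As a sketch, the labels can be joined programmatically with Ginkgo's && (AND) operator. The helper name and_labels is hypothetical and not part of the Kueue tooling.

```shell
# Hypothetical helper: join label expressions with Ginkgo's AND operator.
# Usage: and_labels <label> [<label>...]
and_labels() {
  out=""
  for label in "$@"; do
    if [ -z "$out" ]; then
      out="$label"
    else
      out="$out && $label"
    fi
  done
  printf '%s\n' "$out"
}

# Example (commented out so that pasting this block does not start a test run):
# INTEGRATION_FILTERS="--label-filter=$(and_labels area:core '!slow')" make test-integration
```

Single quotes around '!slow' keep interactive shells from treating the exclamation mark as history expansion.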
Use label filters for e2e singlecluster tests
SingleCluster tests are labeled by feature and area. You can use GINKGO_ARGS with --label-filter to run specific tests:
Label Taxonomy:
- Features:
appwrapper,certs,deployment,job,fairsharing,jaxjob,jobset,kuberay,kueuectl,leaderworkerset,metrics,pod,pytorchjob,statefulset,tas,trainjob,visibility,e2e_v1beta1,ha
Examples:
# Run only appwrapper tests
GINKGO_ARGS="--label-filter=feature:appwrapper" make test-e2e-extended
# Run only deployment tests with helm
GINKGO_ARGS="--label-filter=feature:deployment" make test-e2e-baseline-helm
# Run only jobset and trainjob tests
GINKGO_ARGS="--label-filter=feature:jobset,feature:trainjob" make test-e2e-extended
Use label filters for e2e sequential tests
Sequential tests (Baseline and Extended) are labeled by feature. You can use GINKGO_ARGS with --label-filter to run specific tests:
Label Taxonomy (Baseline):
- Features:
admissionfairsharing, certs, failurerecoverypolicy, localqueuemetrics, objectretentionpolicies, podintegrationautoenablement, reconcile, visibility, waitforpodsready
Label Taxonomy (Extended):
- Features:
managejobswithoutqueuename, spark
Examples:
# Run only admissionfairsharing tests (Baseline)
GINKGO_ARGS="--label-filter=feature:admissionfairsharing" make test-e2e-sequential-baseline
# Run only spark tests (Extended)
GINKGO_ARGS="--label-filter=feature:spark" make test-e2e-sequential-extended
Use Ginkgo --focus arg
GINKGO_ARGS="--focus=Scheduler" make test-integration
GINKGO_ARGS="--focus='Creating a Pod requesting TAS'" make test-e2e-baseline
Use ginkgo.FIt
If you want to focus on specific tests, you can change
ginkgo.It to ginkgo.FIt for these tests.
The other tests will then be skipped.
For more details, see here.
For example, you can change
ginkgo.It("Should place pods based on the ranks-ordering", func() {
to
ginkgo.FIt("Should place pods based on the ranks-ordering", func() {
and then run
# build and pull image
make test-tas-e2e-baseline
make test-tas-e2e-extended
to test a particular TAS e2e test.
Use INTEGRATION_TARGET
INTEGRATION_TARGET='test/integration/singlecluster/scheduler' make test-integration
Flaky integration/e2e tests
You can use the --until-it-fails or --repeat=N arguments to Ginkgo to run tests repeatedly, such as:
GINKGO_ARGS="--until-it-fails" make test-integration
GINKGO_ARGS="--repeat=10" make test-e2e-baseline
See more here
Adding stress
You can run the stress tool to increase CPU load during tests. For example, if you’re on Debian-based Linux:
# install stress:
sudo apt install stress
# run stress alongside tests
/usr/bin/stress --cpu 80
Analyzing logs
Kueue runs as a regular pod on a worker node, and in e2e tests there are 2 replicas running. The Kueue logs are located in kind-worker/pods/kueue-system_kueue-controller-manager*/manager and kind-worker2/pods/kueue-system_kueue-controller-manager*/manager folders.
For each log message, you can see which file and line the message comes from:
2025-02-03T15:51:51.502425029Z stderr F 2025-02-03T15:51:51.502117824Z LEVEL(-2) cluster-queue-reconciler core/clusterqueue_controller.go:341 ClusterQueue update event {"clusterQueue": {"name":"cluster-queue"}}
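When scanning many log lines, the file and line can be pulled out with grep. The sketch below assumes the source location always appears as a path ending in .go followed by a colon and a line number, as in the sample above. The helper name log_source_location is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and print each
# "<path>.go:<line>" source location found in them.
log_source_location() {
  grep -oE '[A-Za-z0-9_/.-]+\.go:[0-9]+'
}

# The sample log line shown above.
sample='2025-02-03T15:51:51.502425029Z stderr F 2025-02-03T15:51:51.502117824Z LEVEL(-2) cluster-queue-reconciler core/clusterqueue_controller.go:341 ClusterQueue update event {"clusterQueue": {"name":"cluster-queue"}}'

printf '%s\n' "$sample" | log_source_location
# prints: core/clusterqueue_controller.go:341
```

Piping the output through sort | uniq -c gives a quick count of which code locations log most often.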
Advanced
Running presubmission verification tests
make verify
Increase logging verbosity
TEST_LOG_LEVEL controls test logging uniformly for all targets:
- go test, make test (unit tests)
- make test-integration (integration tests)
- make test-*-e2e (e2e tests)
Use more negative values for more verbose logs and higher (positive) values for quieter logs. For example:
TEST_LOG_LEVEL=-5 make test-integration # more verbose
TEST_LOG_LEVEL=-1 make test # less verbose than default
Default is TEST_LOG_LEVEL=-3.
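Because the scale is inverted (lower numbers mean more output), it can help to spell out the comparison against the default. The following sketch only illustrates that rule. The helper name describe_log_level is hypothetical and it does not inspect how Kueue consumes the variable.

```shell
# Default documented above.
DEFAULT_TEST_LOG_LEVEL=-3

# Hypothetical helper: describe an integer TEST_LOG_LEVEL value
# relative to the default.
describe_log_level() {
  level="$1"
  if [ "$level" -lt "$DEFAULT_TEST_LOG_LEVEL" ]; then
    echo "more verbose than default"
  elif [ "$level" -gt "$DEFAULT_TEST_LOG_LEVEL" ]; then
    echo "less verbose than default"
  else
    echo "default verbosity"
  fi
}

describe_log_level -5
# prints: more verbose than default
```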
Debug tests in VSCode
It is possible to debug unit and integration tests in VSCode.
You need to have the Go extension installed.
You will then see run test | debug test buttons above lines like
func TestValidateClusterQueue(t *testing.T) {
You can click on the debug test to debug a specific test.
For integration tests, an additional step is needed. In settings.json, you need to add two variables inside go.testEnvVars:
- Run ENVTEST_K8S_VERSION=1.35 make envtest && ./bin/setup-envtest use $ENVTEST_K8S_VERSION -p path and assign the path to the KUBEBUILDER_ASSETS variable
- Set KUEUE_BIN to the bin directory within your cloned Kueue repository
"go.testEnvVars": {
"KUBEBUILDER_ASSETS": "<path from output above>",
"KUEUE_BIN": "<path-to-your-kueue-folder>/bin",
},
For e2e tests, you can also use Ginkgo Test Explorer. You need to add the following variables to settings.json:
"ginkgotestexplorer.testEnvVars": {
"KIND_CLUSTER_NAME": "kind",
"WORKER1_KIND_CLUSTER_NAME": "kind-worker1",
"MANAGER_KIND_CLUSTER_NAME": "kind-manager",
"WORKER2_KIND_CLUSTER_NAME": "kind-worker2",
"KIND": "<your_kueue_path>/bin/kind",
},
and then you can use the Ginkgo Test Explorer GUI to run individual tests, provided you have started a kind cluster (see here for the instructions).
Debugging metrics with Prometheus
To provision a Kind cluster with Prometheus pre-configured for metrics debugging:
E2E_MODE=dev GINKGO_ARGS="--label-filter=feature:prometheus" make kind-image-build test-e2e-baseline
For more details, see Setup Dev Monitoring.
See also
- Kubernetes testing guide
- Integration Testing in Kubernetes
- End-to-End Testing in Kubernetes
- Flaky Tests in Kubernetes