Running and debugging tests

Quick start

The most common development workflows:

# Unit tests
make test

# Integration tests
make test-integration

# E2E: build image and run
# Note: PLATFORM=linux/arm64 is only needed on Apple Silicon (arm64).
# On amd64, run: make kind-image-build test-e2e-baseline
PLATFORM=linux/arm64 make kind-image-build test-e2e-baseline

# E2E: build, run, and keep the kind cluster alive between runs
E2E_MODE=dev PLATFORM=linux/arm64 make kind-image-build test-e2e-baseline

Quick filter reference

# Focus on tests matching a name pattern
GINKGO_ARGS="--focus=Scheduler" make test-integration
GINKGO_ARGS="--focus='Creating a Pod requesting TAS'" make test-e2e-baseline

# Filter integration tests by label
INTEGRATION_FILTERS="--label-filter=controller:workload" make test-integration
INTEGRATION_FILTERS="--label-filter=area:jobs" make test-integration

# Filter e2e tests by label
GINKGO_ARGS="--label-filter=feature:jobset" make test-e2e-extended

# Run only a specific integration test directory
INTEGRATION_TARGET='test/integration/singlecluster/scheduler' make test-integration

For the full label taxonomy and more examples, see Running subset of integration or e2e tests.


Running unit tests

To run all unit tests:

make test

To run unit tests for webhooks:

go test ./pkg/webhooks

To run only the tests whose names match the regular expression TestValidateClusterQueue:

go test ./pkg/webhooks -run TestValidateClusterQueue

Running unit tests with race detection

Use -race to enable Go’s built-in race detector:

go test ./pkg/scheduler/preemption/ -race

Running unit tests with stress

To run unit tests in a loop and collect failures:

# install go stress
go install golang.org/x/tools/cmd/stress@latest
# compile tests (you can add -race for race detection)
go test ./pkg/scheduler/preemption/ -c
# it loops and reports failures
stress ./preemption.test -test.run TestPreemption
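
If the stress tool is not available, a plain shell loop can approximate the same idea. The sketch below is only an illustration: run_repeatedly is a hypothetical helper, not part of the Kueue tooling, and it only counts failing runs rather than saving their output.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Usage: run_repeatedly <count> <command> [args...]
run_repeatedly() {
  count="$1"
  shift
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    # Output is discarded for brevity; redirect to a file to keep failure logs.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "failures: $failures/$count"
}

# Example (assumes ./preemption.test was built with `go test -c` as above):
# run_repeatedly 20 ./preemption.test -test.run TestPreemption
```

Unlike stress, this runs the binary sequentially, so it is less likely to expose timing-dependent failures.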

Running integration tests

make test-integration

To run a subset of tests, see Running subset of integration or e2e tests.

Running e2e tests

E2E tests build a Kueue image and load it into a local Kind cluster. The build step must run before the test target:

make kind-image-build

On Apple Silicon (arm64), set PLATFORM:

PLATFORM=linux/arm64 make kind-image-build
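
The choice of PLATFORM can be scripted. The following is a minimal sketch, assuming that `uname -m` reports arm64 on Apple Silicon; treating aarch64 the same way is an additional assumption, and platform_for_arch is a hypothetical helper name. It prints the build command instead of running it.

```shell
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to the PLATFORM value used above. Prints nothing for other architectures,
# where PLATFORM can be left unset.
platform_for_arch() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "" ;;
  esac
}

# Dry run: print the build command for this machine instead of executing it.
platform="$(platform_for_arch "$(uname -m)")"
if [ -n "$platform" ]; then
  echo "PLATFORM=$platform make kind-image-build"
else
  echo "make kind-image-build"
fi
```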

Then run the desired test target:

make test-e2e-baseline
make test-e2e-extended
make test-e2e-sequential-baseline
make test-e2e-sequential-extended
make test-e2e-certmanager
make test-e2e-kueueviz
make test-tas-e2e-baseline
make test-tas-e2e-extended
make test-multikueue-e2e-main
make test-multikueue-e2e-sequential

You can specify the Kubernetes version:

E2E_K8S_FULL_VERSION=1.35.0 make test-e2e-baseline

To run a subset of tests, see Running subset of integration or e2e tests.

DEV mode (keep the cluster)

Use E2E_MODE=dev to create-or-reuse a kind cluster, rebuild/redeploy Kueue, run tests, and keep the cluster running for fast reruns and post-test investigation:

# Create if missing, otherwise reuse cluster. Rebuild image, run tests, keep the cluster.
E2E_MODE=dev make kind-image-build test-e2e-baseline

# MultiKueue dev mode
E2E_MODE=dev make kind-image-build test-multikueue-e2e-main

# Loop a suite (until it fails) while keeping the cluster
E2E_MODE=dev GINKGO_ARGS="--until-it-fails" make kind-image-build test-e2e-baseline

# Skip reinstallation of kueue (works only in dev mode)
E2E_MODE=dev E2E_SKIP_REINSTALL=true make kind-image-build test-e2e-baseline
E2E_MODE=dev E2E_SKIP_REINSTALL=true make kind-image-build test-multikueue-e2e-main

# Skip re-pulling dependency images and re-importing them into kind when already present (dev mode only)
E2E_MODE=dev E2E_SKIP_IMAGE_RELOAD=true make kind-image-build test-e2e-baseline

To delete the kept cluster(s) afterwards:

  • For regular e2e tests, run:
    kind delete clusters kind
    
  • For MultiKueue tests, run:
    kind delete clusters kind kind-manager kind-worker1 kind-worker2
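
The two cleanup cases can be wrapped in a small helper. This is a sketch only: kept_clusters is a hypothetical function, and the cluster names are the ones listed above.

```shell
# Hypothetical helper: print the kind cluster names kept by dev mode for a
# given suite type ("regular" or "multikueue"), as listed above.
kept_clusters() {
  case "$1" in
    regular) echo "kind" ;;
    multikueue) echo "kind kind-manager kind-worker1 kind-worker2" ;;
    *) echo "unknown suite type: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the cleanup command; remove the echo to actually delete.
echo "kind delete clusters $(kept_clusters multikueue)"
```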
    

Using a released or staging image

To use a released or staging Kueue image instead of building from source (no kind-image-build needed), pass IMAGE_TAG:

# Released version
E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-e2e-baseline
E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-multikueue-e2e-main

# Staging image (e.g. from a PR or nightly)
E2E_MODE=dev IMAGE_TAG=us-central1-docker.pkg.dev/k8s-staging-images/kueue/kueue:main make test-e2e-baseline

Using a released version with matching manifests: The e2e framework deploys CRDs and other resources from the repo’s config and overrides only the controller image via IMAGE_TAG. To run e2e against a specific release with manifests that match that image:

  1. Check out that version’s tag (e.g. git checkout v0.16.0). The CRD and deployment config in the repo are committed at each release, so no make manifests step is needed.
  2. Run the command above with the same image tag, e.g. E2E_MODE=dev IMAGE_TAG=registry.k8s.io/kueue/kueue:v0.16.0 make test-e2e-baseline.
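
The two steps above can be sketched as a short script. It assumes that the git tag has the same name as the image tag (as in the v0.16.0 example); image_version is a hypothetical helper, and the script only prints the commands rather than running them.

```shell
# Hypothetical helper: extract the version (the part after the last ':')
# from an image reference such as registry.k8s.io/kueue/kueue:v0.16.0.
image_version() {
  echo "${1##*:}"
}

IMAGE_TAG="registry.k8s.io/kueue/kueue:v0.16.0"
version="$(image_version "$IMAGE_TAG")"

# Dry run: print the two steps rather than executing them.
echo "git checkout $version"
echo "E2E_MODE=dev IMAGE_TAG=$IMAGE_TAG make test-e2e-baseline"
```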

This is useful to reproduce issues on a specific released version (e.g. for on-call debugging). For installing a released version into a real cluster (not e2e), see Install a released version.

Legacy: interactive attach mode

Run E2E_RUN_ONLY_ENV=true make kind-image-build test-multikueue-e2e-main and wait for the Do you want to cleanup? [Y/n] prompt to appear (CI-style behavior).

At that point the cluster is ready, and you can run tests from another terminal:

<your_kueue_path>/bin/ginkgo --json-report ./ginkgo.report -focus "MultiKueue when Creating a multikueue admission check Should run a jobSet on worker if admitted" -r

or from VSCode.

Running subset of integration or e2e tests

Use label filters for integration tests

Integration tests are labeled by controller, job type, feature, and area to enable targeted test execution. You can use INTEGRATION_FILTERS with --label-filter to run specific test subsets:

Label Taxonomy:

  • Controllers: controller:workload, controller:localqueue, controller:clusterqueue, controller:admissioncheck, controller:resourceflavor, controller:provisioning
  • Job Types: job:batch, job:pod, job:jobset, job:pytorch, job:tensorflow, job:mpi, job:paddle, job:xgboost, job:jax, job:train, job:ray, job:appwrapper, job:sparkapplication
  • Features: feature:tas, feature:multikueue, feature:provisioning, feature:fairsharing, feature:admissionfairsharing
  • Areas: area:core, area:jobs, area:admissionchecks, area:multikueue

Examples:

# Run only LocalQueue tests
INTEGRATION_FILTERS="--label-filter=controller:localqueue" make test-integration

# Run all job tests
INTEGRATION_FILTERS="--label-filter=area:jobs" make test-integration

# Run PyTorch job tests
INTEGRATION_FILTERS="--label-filter=job:pytorch" make test-integration

# Run all tests except slow
INTEGRATION_FILTERS="--label-filter=!slow" make test-integration

# Run core tests except slow
INTEGRATION_FILTERS="--label-filter=area:core && !slow" make test-integration

# Run TAS-related tests
INTEGRATION_FILTERS="--label-filter=feature:tas" make test-integration

# Run FairSharing tests
INTEGRATION_FILTERS="--label-filter=feature:fairsharing" make test-integration
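
When several labels must all match, the filter expression can be assembled programmatically. The sketch below joins label expressions with &&, as in the "core tests except slow" example; and_filter is a hypothetical helper, and the script prints the resulting command instead of running the suite.

```shell
# Hypothetical helper: join label expressions with "&&" so that a test must
# match all of them.
and_filter() {
  result="$1"
  shift
  for label in "$@"; do
    result="$result && $label"
  done
  echo "$result"
}

# Single quotes keep '!' from being treated specially by interactive shells.
filter="$(and_filter 'area:core' '!slow')"

# Dry run: print the command instead of running the suite.
echo "INTEGRATION_FILTERS=\"--label-filter=$filter\" make test-integration"
```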

Use label filters for e2e singlecluster tests

SingleCluster tests are labeled by feature and area. You can use GINKGO_ARGS with --label-filter to run specific tests:

Label Taxonomy:

  • Features: appwrapper, certs, deployment, job, fairsharing, jaxjob, jobset, kuberay, kueuectl, leaderworkerset, metrics, pod, pytorchjob, statefulset, tas, trainjob, visibility, e2e_v1beta1, ha

Examples:

# Run only appwrapper tests
GINKGO_ARGS="--label-filter=feature:appwrapper" make test-e2e-extended

# Run only deployment tests with helm
GINKGO_ARGS="--label-filter=feature:deployment" make test-e2e-baseline-helm

# Run only jobset and trainjob tests
GINKGO_ARGS="--label-filter=feature:jobset,feature:trainjob" make test-e2e-extended

Use label filters for e2e sequential tests

Sequential tests (Baseline and Extended) are labeled by feature. You can use GINKGO_ARGS with --label-filter to run specific tests:

Label Taxonomy (Baseline):

  • Features: admissionfairsharing, certs, failurerecoverypolicy, localqueuemetrics, objectretentionpolicies, podintegrationautoenablement, reconcile, visibility, waitforpodsready

Label Taxonomy (Extended):

  • Features: managejobswithoutqueuename, spark

Examples:

# Run only admissionfairsharing tests (Baseline)
GINKGO_ARGS="--label-filter=feature:admissionfairsharing" make test-e2e-sequential-baseline

# Run only spark tests (Extended)
GINKGO_ARGS="--label-filter=feature:spark" make test-e2e-sequential-extended

Use Ginkgo --focus arg

GINKGO_ARGS="--focus=Scheduler" make test-integration
GINKGO_ARGS="--focus='Creating a Pod requesting TAS'" make test-e2e-baseline

Use ginkgo.FIt

If you want to focus on specific tests, you can change ginkgo.It to ginkgo.FIt for those tests; the other tests are then skipped. For more details, see here. For example, you can change

ginkgo.It("Should place pods based on the ranks-ordering", func() {

to

ginkgo.FIt("Should place pods based on the ranks-ordering", func() {

and then run

# build and pull image
make test-tas-e2e-baseline
make test-tas-e2e-extended

to test a particular TAS e2e test.

Use INTEGRATION_TARGET

INTEGRATION_TARGET='test/integration/singlecluster/scheduler' make test-integration

Flaky integration/e2e tests

You can pass the --until-it-fails or --repeat=N argument to Ginkgo to run tests repeatedly, for example:

GINKGO_ARGS="--until-it-fails" make test-integration
GINKGO_ARGS="--repeat=10" make test-e2e-baseline

See more here

Adding stress

You can run the stress tool to increase CPU load while the tests run. For example, on Debian-based Linux:

# install stress:
sudo apt install stress
# run stress alongside tests
/usr/bin/stress --cpu 80

Analyzing logs

Kueue runs as a regular pod on a worker node, and in e2e tests there are 2 replicas running. The Kueue logs are located in kind-worker/pods/kueue-system_kueue-controller-manager*/manager and kind-worker2/pods/kueue-system_kueue-controller-manager*/manager folders.

Each log message shows the file and line it comes from:

2025-02-03T15:51:51.502425029Z stderr F 2025-02-03T15:51:51.502117824Z	LEVEL(-2)	cluster-queue-reconciler	core/clusterqueue_controller.go:341	ClusterQueue update event	{"clusterQueue": {"name":"cluster-queue"}}
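
To see which source locations log most often, the location can be extracted from each line. The sketch below assumes the fields are tab-separated as in the sample line above, with the file and line number in the fourth field; log_source is a hypothetical helper.

```shell
# Hypothetical helper: print the source location (4th tab-separated field)
# of each log line read from standard input.
log_source() {
  awk -F '\t' 'NF >= 4 { print $4 }'
}

# Sample line copied from above; printf expands \t in the format to real tabs.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  '2025-02-03T15:51:51.502425029Z stderr F 2025-02-03T15:51:51.502117824Z' \
  'LEVEL(-2)' 'cluster-queue-reconciler' 'core/clusterqueue_controller.go:341' \
  'ClusterQueue update event' '{"clusterQueue": {"name":"cluster-queue"}}' \
  | log_source

# Count messages per source location across the collected logs:
# cat kind-worker/pods/kueue-system_kueue-controller-manager*/manager/* \
#   | log_source | sort | uniq -c | sort -rn
```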

Advanced

Running presubmission verification tests

make verify

Increase logging verbosity

TEST_LOG_LEVEL controls test logging uniformly for all targets:

  • go test, make test (unit tests)
  • make test-integration (integration tests)
  • make test-*-e2e (e2e tests)

Use more negative values for more verbose logs and higher (positive) values for quieter logs. For example:

TEST_LOG_LEVEL=-5 make test-integration   # more verbose
TEST_LOG_LEVEL=-1 make test               # less verbose than default

Default is TEST_LOG_LEVEL=-3.

Debug tests in VSCode

You can debug unit and integration tests in VSCode. This requires the Go extension. Once it is installed, run test | debug test buttons appear above lines like

func TestValidateClusterQueue(t *testing.T) {

Click debug test to debug a specific test.

For integration tests, an additional step is needed. In settings.json, add two variables inside go.testEnvVars:

  • Run export ENVTEST_K8S_VERSION=1.35 && make envtest && ./bin/setup-envtest use $ENVTEST_K8S_VERSION -p path and assign the printed path to the KUBEBUILDER_ASSETS variable. The variable is exported so that it is also set when the shell expands it for the setup-envtest command.
  • Set KUEUE_BIN to the bin directory within your cloned Kueue repository

"go.testEnvVars": {
    "KUBEBUILDER_ASSETS": "<path from output above>",
    "KUEUE_BIN": "<path-to-your-kueue-folder>/bin",
  },

For e2e tests, you can also use Ginkgo Test Explorer. You need to add the following variables to settings.json:

 "ginkgotestexplorer.testEnvVars": {
        "KIND_CLUSTER_NAME": "kind",
        "WORKER1_KIND_CLUSTER_NAME": "kind-worker1",
        "MANAGER_KIND_CLUSTER_NAME": "kind-manager",
        "WORKER2_KIND_CLUSTER_NAME": "kind-worker2",
        "KIND": "<your_kueue_path>/bin/kind",
    },

You can then use the Ginkgo Test Explorer GUI to run individual tests, provided you have started a kind cluster (see here for the instructions).

Debugging metrics with Prometheus

To provision a Kind cluster with Prometheus pre-configured for metrics debugging:

E2E_MODE=dev GINKGO_ARGS="--label-filter=feature:prometheus" make kind-image-build test-e2e-baseline

For more details, see Setup Dev Monitoring.

See also