Overview

In modern CI/CD environments, especially those with strong dependencies on external services (APIs, networks, third-party systems), transient failures are inevitable. Retrying failed pipelines becomes essential to improve reliability and reduce manual intervention.

This article presents a robust and production-ready retry strategy for Jenkins pipelines, based on:

  • Self-triggered pipeline re-execution
  • Parameter propagation across builds
  • Controlled retry logic via Shared Library
  • Compatibility with dynamic parameters (including Active Choices)

Each retry runs as a new build with exactly the parameter values of the failed build, so the execution context is preserved rather than recomputed.


Quick Summary

Advantages

  • Fully automated retry mechanism (no manual intervention)
  • Preserves all parameters across retries
  • Deterministic and reproducible executions
  • Stateless design (no dependency on previous workspace)
  • Works with dynamic parameters (Active Choices, Active Choices Reactive)
  • Centralized logic via Shared Library
  • Clear and traceable build history (each retry = new build)

Disadvantages

  • Re-executes the entire pipeline (including successful stages)
  • Dynamic parameters (Active Choices) are not re-evaluated; the retry reuses the values selected for the original build
  • Requires explicit parameter propagation logic
  • Potential for overlapping builds if concurrency is not controlled
  • Slight delay overhead (sleep + scheduling)
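
One way to limit the overlap risk listed above is to disable concurrent builds for the job, so that a triggered retry waits in the queue until the failed build has finished its post actions. A minimal sketch, assuming a declarative pipeline; the stage body is only a placeholder:

```groovy
pipeline {
    agent any

    options {
        // Queue a new build of this job instead of running it alongside the current one.
        disableConcurrentBuilds()
    }

    stages {
        stage('Main') {
            steps {
                echo 'Placeholder for the real pipeline work'
            }
        }
    }
}
```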

When This Approach Is Useful

This retry strategy is particularly beneficial in scenarios such as:

External Dependency Instability

  • APIs with intermittent failures
  • Third-party services with rate limits or timeouts
  • Cloud provider transient issues

Network-Sensitive Pipelines

  • Artifact downloads (S3, Nexus, Artifactory)
  • Git operations under unstable connectivity
  • Remote deployments over SSH

Flaky Test Environments

  • Integration tests with external systems
  • End-to-end tests with shared environments
  • Race conditions in distributed systems

Long Pipelines Where Manual Retry Is Costly

  • Complex CI/CD pipelines
  • Multi-stage deployments
  • Pipelines triggered overnight or unattended

Stateless Build Environments

  • Ephemeral agents (Kubernetes, cloud runners)
  • Non-persistent workspaces
  • Immutable infrastructure setups

Core Concept

Instead of retrying individual steps or stages, this approach re-executes the entire pipeline as a new build, preserving the original parameters.

This is functionally equivalent to a manual “Build Again”, but automated, controlled, and consistent.
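
For contrast, step-level retrying is what the built-in retry step does: it re-runs a single block inside the same build. The sketch below is only an illustration of that alternative; the stage name and the download URL are hypothetical.

```groovy
pipeline {
    agent any

    stages {
        stage('Download') {
            steps {
                // Built-in step: re-runs only this block, up to 3 attempts, within the same build.
                retry(3) {
                    // Hypothetical download command used purely as an example.
                    sh 'curl -fsS https://example.invalid/artifact.tgz -o artifact.tgz'
                }
            }
        }
    }
}
```

The approach in this article differs in that a failure anywhere in the pipeline leads to a completely new build rather than a repeated block.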


Architecture

The solution is composed of two main parts:

  1. A Shared Library function (retryPipeline)
  2. A pipeline that invokes it in post { failure }

Shared Library Implementation

def call(Map config = [:]) {

    int maxRetries = config.get('maxRetries', 3)
    int baseDelaySeconds = config.get('baseDelaySeconds', 60)

    int currentRetry = 0

    if (params.containsKey('RETRY_COUNT')) {
        currentRetry = params.RETRY_COUNT.toString().toInteger()
    }

    // Avoid retry on manual abort. currentResult is never null and
    // does not require access to the rawBuild object.
    if (currentBuild.currentResult == 'ABORTED') {
        echo "Build was aborted. Skipping retry."
        return
    }

    echo "Internal retry count: ${currentRetry}"

    if (currentRetry < maxRetries) {

        int nextRetry = currentRetry + 1
        // Fixed delay: every retry waits the same number of seconds.
        int delaySeconds = baseDelaySeconds

        currentBuild.displayName = "#${env.BUILD_NUMBER} (retry ${nextRetry})"

        echo "Retry ${nextRetry}/${maxRetries} in ${delaySeconds} seconds..."

        sleep time: delaySeconds, unit: 'SECONDS'

        // Propagate parameters
        def currentParams = params
            .findAll { key, value -> key != 'RETRY_COUNT' }
            .collect { key, value ->
                if (value instanceof Boolean) {
                    return booleanParam(name: key, value: value)
                } else {
                    return string(name: key, value: value.toString())
                }
            }

        currentParams += string(name: 'RETRY_COUNT', value: "${nextRetry}")

        build job: env.JOB_NAME,
              parameters: currentParams,
              wait: false

    } else {
        echo "Max retries reached (${maxRetries}). Not retrying."
    }
}

Source: https://github.com/faustobranco/devops-db/blob/master/infrastructure/lib-utilities/vars/retryPipeline.groovy
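
The library above waits a fixed baseDelaySeconds before every retry. If a growing delay is preferred, one possible variation is exponential backoff. The helper below is a hypothetical sketch in plain Groovy, not part of the original library; the function name and the 900-second cap are assumptions.

```groovy
// Hypothetical helper (not in the original library):
// delay = base * 2^(attempt - 1), capped at maxDelaySeconds.
int backoffDelaySeconds(int baseDelaySeconds, int attempt, int maxDelaySeconds = 900) {
    // Clamp the exponent so the shift can never overflow.
    int exponent = Math.min(Math.max(attempt - 1, 0), 20)
    long delay = baseDelaySeconds * (1L << exponent)
    return Math.min(delay, maxDelaySeconds as long) as int
}

// Delays for retries 1..5 with a 60-second base.
(1..5).each { attempt ->
    println "retry ${attempt}: wait ${backoffDelaySeconds(60, attempt)}s"
}
```

Inside retryPipeline, such a helper could replace the constant assignment to delaySeconds, with nextRetry passed as the attempt number.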


Example Pipeline

@Library('devopsdb-global-lib') _

def default_maxRetries = 3
def default_baseDelaySeconds = 20

properties([
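    parameters([
        // Retry counter: starts at 0 and is overridden by retryPipeline on re-triggered builds.
        string(name: 'RETRY_COUNT', defaultValue: '0'),
        string(name: 'ENV', defaultValue: 'dev'),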
        booleanParam(name: 'RUN_TESTS', defaultValue: true),
        choice(name: 'REGION', choices: ['eu-central-1', 'eu-west-1', 'eu-north-1']),

        [$class: 'CascadeChoiceParameter',
            name: 'SERVICE',
            choiceType: 'PT_SINGLE_SELECT',
            script: [
                $class: 'GroovyScript',
                script: [script: "return ['payments','billing','orders']", sandbox: true],
                fallbackScript: [script: "return ['fallback']", sandbox: true]
            ]
        ],

        [$class: 'CascadeChoiceParameter',
            name: 'ENDPOINT',
            choiceType: 'PT_SINGLE_SELECT',
            referencedParameters: 'REGION',
            script: [
                $class: 'GroovyScript',
                script: [script: """
                    if (REGION == 'eu-central-1') {
                        return ['api.prod.local', 'api2.prod.local']
                    } else {
                        return ['api.dev.local', 'api2.dev.local']
                    }
                """, sandbox: true],
                fallbackScript: [script: "return ['fallback-endpoint']", sandbox: true]
            ]
        ]
    ])
])

pipeline {
    agent any

    stages {
        stage('Main') {
            steps {
                script {
                    echo "=== PARAMETERS PRINTS ==="
                    params.each { k, v ->
                        echo "${k} = ${v} (type=${v?.getClass()?.getName()})"
                    }
                    // Treat a missing RETRY_COUNT as 0 so the first build does not hit a null.
                    if ((params.RETRY_COUNT ?: '0').toString().toInteger() < 2) {
                        error "Failing on purpose (simulate flaky dependency)"
                    }
                }
            }
        }
    }

    post {
        failure {
            retryPipeline(
                maxRetries: default_maxRetries,
                baseDelaySeconds: default_baseDelaySeconds
            )
        }
    }
}

Source: https://github.com/faustobranco/devops-db/tree/master/infrastructure/pipelines/retry-pipeline


How the Retry Mechanism Works

  1. Initial build starts with RETRY_COUNT = 0
  2. On failure, post { failure } triggers the retry
  3. Shared Library:
    • increments retry counter
    • waits (sleep)
    • triggers a new build
  4. All parameters are copied and passed forward
  5. New build runs from scratch
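
The counter logic in these steps can be modelled in plain Groovy, without any Jenkins APIs. This is only a sketch of the decision rule; the function names nextRetryCount and simulateChain are made up for illustration.

```groovy
// Returns the RETRY_COUNT for the next build, or null when no retry should be scheduled.
Integer nextRetryCount(Object rawRetryCount, int maxRetries) {
    int current = rawRetryCount == null ? 0 : rawRetryCount.toString().toInteger()
    if (current < maxRetries) {
        return current + 1
    }
    return null
}

// Lists the RETRY_COUNT of every build that would run if each attempt failed.
List<Integer> simulateChain(int maxRetries) {
    List<Integer> counts = [0]   // the initial build starts at RETRY_COUNT = 0
    Integer next = nextRetryCount(counts.last(), maxRetries)
    while (next != null) {
        counts << next
        next = nextRetryCount(next, maxRetries)
    }
    return counts
}

println simulateChain(3)
```

With maxRetries = 3 this prints [0, 1, 2, 3]: the initial build plus three retries, after which no further build is triggered.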

Parameter Propagation Explained

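The build step does not carry the current build's parameters over to the job it triggers: any parameter that is not passed explicitly falls back to its default value in the new build.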

This solution explicitly:

  • Reads current params
  • Recreates them (string, booleanParam)
  • Overrides RETRY_COUNT
  • Passes them to the next build

This guarantees:

  • identical inputs
  • no fallback to defaults
  • consistent execution
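
The propagation rule can be illustrated in plain Groovy. In this sketch, ordinary maps stand in for the objects that booleanParam(...) and string(...) create in Jenkins; the function name propagate and the type field are assumptions made for the example.

```groovy
// Rebuilds the parameter list for the next build:
// Booleans stay Booleans, everything else becomes a string, RETRY_COUNT is replaced.
List<Map> propagate(Map<String, Object> current, int nextRetry) {
    List<Map> result = current
        .findAll { key, value -> key != 'RETRY_COUNT' }
        .collect { key, value ->
            if (value instanceof Boolean) {
                return [type: 'boolean', name: key, value: value]
            }
            return [type: 'string', name: key, value: value.toString()]
        }
    result << [type: 'string', name: 'RETRY_COUNT', value: nextRetry.toString()]
    return result
}

def forwarded = propagate([ENV: 'dev', RUN_TESTS: true, RETRY_COUNT: '0'], 1)
forwarded.each { println it }
```

ENV and RUN_TESTS are forwarded unchanged, while the old RETRY_COUNT is dropped and replaced by the incremented value.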

Design Philosophy

A retry must reproduce the same execution context, not recompute it.

This means:

  • No dynamic parameter recalculation
  • No dependency on UI logic
  • No reuse of previous workspace

Conclusion

This retry strategy provides a robust and production-ready approach for handling transient failures in Jenkins pipelines.

By prioritizing:

  • consistency
  • reproducibility
  • simplicity

it makes each retry a predictable, fully automated re-run of the failed build with the same inputs.