Fix: Docker Cleanup Fails With Package Not Found

Alex Johnson

The automated Docker cleanup workflow keeps the container registry from filling up with stale images. However, a recurring "Package not found" error marks the workflow as failed even though the old images are in fact deleted. This article traces the error to its root cause, reviews the current state of the registry, and compares several ways to fix it so that the cleanup runs and reports correctly.

Problem

The Docker cleanup workflow at https://github.com/SlavaMelanko/smela-front/actions/workflows/docker-cleanup.yml is scheduled to run weekly. It consistently reports failure, even though it does delete old Docker images. Recent failures include:

  • 2025-11-09: Failed after deleting 39 versions
  • 2025-11-02: Failed after deleting 38 versions
  • 2025-10-26: Failed after 12 seconds

The error message associated with these failures is:

[error]delete version API failed. Package not found.
38 versions deleted till now.

In other words, the deletions succeed but the run is reported as failed. Fixing this requires understanding why the error occurs, so that the workflow status reflects what actually happened.

The error comes from the interaction between the deletion loop and the API calls that manage package versions. When the workflow deletes many versions in quick succession, it can try to delete a version that an earlier call has already removed, and the API responds with "Package not found". The error is technically accurate, but it does not reflect the overall success of the cleanup, so the run is misleadingly marked as failed.

Because the workflow runs weekly, the false failure recurs every week. A durable fix is therefore preferable to a one-off workaround: the workflow should handle this error correctly and should be configured to make the underlying timing conflict less likely.

Current State

The current state of the container registry, accessible at https://github.com/SlavaMelanko/smela-front/pkgs/container/smela-front-ci, reveals the following:

  • 55 total package versions
  • 47 untagged versions (layer manifests from multi-arch builds)
  • 8 tagged versions (dev, main, dev-*, main-*)

These numbers show why the cleanup workflow matters. Untagged versions, which here are manifests left behind by multi-arch builds, make up most of the registry and accumulate with every build. Without regular cleanup, they clutter the registry and consume storage.

Tagged versions, by contrast, are the images that are actually deployed, tied to branches such as dev and main. They must be kept so that the application can be deployed, and rolled back to a previous state when needed. The cleanup workflow must therefore preserve tagged images while removing untagged ones.

Getting this balance right is what keeps the registry healthy. An overly aggressive policy could delete images that are still needed, while an overly conservative one lets storage grow unchecked. The workflow has to tell important images from disposable ones, which depends on the project's tagging conventions and on the role each image plays in the application lifecycle.
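To make the tagged/untagged distinction concrete, here is a minimal Python sketch that splits a list of package versions into tagged versions (to protect) and untagged versions (cleanup candidates). It assumes each version is a dict shaped like the version objects returned by the GitHub Packages REST API, where container tags live under metadata.container.tags; the helper name partition_versions and the sample snapshot are hypothetical.

```python
from typing import Any

Version = dict[str, Any]


def partition_versions(versions: list[Version]) -> tuple[list[Version], list[Version]]:
    """Split package versions into (tagged, untagged) by their container tags."""
    tagged: list[Version] = []
    untagged: list[Version] = []
    for v in versions:
        # Assumed shape: {"id": ..., "metadata": {"container": {"tags": [...]}}}
        tags = v.get("metadata", {}).get("container", {}).get("tags", [])
        (tagged if tags else untagged).append(v)
    return tagged, untagged


# Hypothetical snapshot mirroring the counts above: 8 tagged, 47 untagged.
snapshot = [{"id": i, "metadata": {"container": {"tags": [f"dev-{i}"]}}} for i in range(8)]
snapshot += [{"id": 100 + i, "metadata": {"container": {"tags": []}}} for i in range(47)]
tagged, untagged = partition_versions(snapshot)
print(len(snapshot), len(tagged), len(untagged))  # 55 8 47
```

A version with missing metadata is treated as untagged here, which errs on the side of making it a cleanup candidate; a stricter policy might refuse to classify it instead.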

Root Cause Analysis

The root cause of the failed workflow, despite its successful deletion of images, is a race condition within the actions/delete-package-versions@v5 action. This issue arises from the way the action handles bulk deletions and stale version ID references.

  1. The first cleanup step successfully deletes approximately 38-39 untagged versions.
  2. The action continues iterating through its package list.
  3. It attempts to delete a package version that was already deleted (stale ID).
  4. The GitHub API returns a 404 "Package not found" error.
  5. The workflow incorrectly marks itself as failed, even though the intended cleanup was successful.

This is a known issue with the action: during bulk deletion it can hold stale version IDs and try to delete versions that are already gone, which produces the 404 error and fails the step. In short, the action's internal list of versions falls out of sync with the registry as the deletions proceed.

Two strategies can mitigate this race condition. The first is error handling that lets the workflow continue when it receives "Package not found". The second is limiting how many versions each run deletes, which makes stale version IDs less likely. Managing the deletions this way makes the workflow more resilient to the race and makes its reported status more accurate.
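The race and the error-handling mitigation can be illustrated with a small in-memory simulation. This is a hypothetical model, not the action's actual code: FakeRegistry, PackageNotFound, naive_cleanup, and tolerant_cleanup are invented for illustration. Both cleanup functions work from a list of version IDs fetched up front; when that list contains an ID that has already been deleted, the naive loop aborts while the tolerant loop treats "already gone" as success.

```python
class PackageNotFound(Exception):
    """Stand-in for the API's 404 "Package not found" response."""


class FakeRegistry:
    """Hypothetical in-memory registry that stores version IDs."""

    def __init__(self, ids):
        self.versions = set(ids)

    def delete(self, version_id: int) -> None:
        # Deleting an ID that no longer exists behaves like the 404 in the logs.
        if version_id not in self.versions:
            raise PackageNotFound(version_id)
        self.versions.remove(version_id)


def naive_cleanup(registry: FakeRegistry, ids: list[int]) -> int:
    """Delete every ID in a pre-fetched list; any 404 aborts the run."""
    deleted = 0
    for vid in ids:
        registry.delete(vid)
        deleted += 1
    return deleted


def tolerant_cleanup(registry: FakeRegistry, ids: list[int]) -> tuple[int, int]:
    """Delete every ID, treating a 404 as 'already gone' rather than a failure."""
    deleted = skipped = 0
    for vid in ids:
        try:
            registry.delete(vid)
            deleted += 1
        except PackageNotFound:
            # The version is already absent, which is the desired end state.
            skipped += 1
    return deleted, skipped
```

For example, with a registry holding IDs 1 to 38 and a stale list that repeats ID 5 at the end, naive_cleanup deletes all 38 versions and then raises PackageNotFound, much like the log above, while tolerant_cleanup returns (38, 1). Only the not-found case is swallowed; any other exception still propagates, which is a narrower safety net than a blanket continue-on-error.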

The deletion may also not be reflected by the API immediately. If the action's view of the package versions lags behind the registry, it may try to delete a version that is still listed but is already being removed, which widens the window for the race. A robust fix should therefore tolerate such delays rather than assume that the action's state always matches the registry.

Additional Issue

There's also a logic problem in the second workflow step, which aims to delete old feature branch tagged images:

- name: Delete old feature branch tagged images
  uses: actions/delete-package-versions@v5
  with:
    package-name: 'smela-front-ci'
    package-type: 'container'
    min-versions-to-keep: 0
    ignore-versions: '^(dev|main|dev-.*|main-.*)$'
    delete-only-pre-release-versions: false

This step is ineffective because:

  • ignore-versions protects all dev-* and main-* tags.
  • All of our tagged images follow this pattern.
  • Therefore, nothing gets deleted by this step.

The ignore-versions parameter is meant to exempt specific versions from deletion. Here it is set to ^(dev|main|dev-.*|main-.*)$, and since every tagged image in the registry matches that pattern, the step excludes all of them and deletes nothing. It runs every week without any effect.

The right fix depends on what the step is meant to do. If its purpose is to delete old feature branch images, ignore-versions should protect only the dev and main tags, for example with ^(dev|main)$, so that dev-* and main-* images are no longer shielded from deletion while dev and main themselves stay protected. If its purpose is only to remove untagged images, the step is redundant, because the first step already does that, and it can be removed. Defining the goal of each step clearly keeps the workflow simple and effective.
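A quick way to check what an ignore-versions pattern protects is to test it against sample tags. The Python sketch below compares the current pattern with a narrower one that protects only dev and main. It assumes, as the analysis above does, that the pattern is matched against tag names; Python's re module is used only as an approximation of the action's regex matching, and the tag values and the helper name protected are hypothetical.

```python
import re

CURRENT = r"^(dev|main|dev-.*|main-.*)$"
NARROWED = r"^(dev|main)$"


def protected(pattern: str, tags: list[str]) -> list[str]:
    """Return the tags that the ignore pattern would shield from deletion."""
    rx = re.compile(pattern)
    return [t for t in tags if rx.search(t)]


sample_tags = ["dev", "main", "dev-3f2a1c9", "main-8b7e6d5"]
print(protected(CURRENT, sample_tags))   # ['dev', 'main', 'dev-3f2a1c9', 'main-8b7e6d5']
print(protected(NARROWED, sample_tags))  # ['dev', 'main']
```

With the current pattern every sample tag is protected, which is why the step deletes nothing; the narrowed pattern leaves the dev-* and main-* images eligible for deletion.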

Recommended Solution

Here are several options to resolve the issues discussed:

Option 1: Add continue-on-error (Quick Fix)

- name: Delete old untagged images
  uses: actions/delete-package-versions@v5
  continue-on-error: true  # ← Add this
  with:
    package-name: 'smela-front-ci'
    package-type: 'container'
    min-versions-to-keep: 8
    delete-only-untagged-versions: true

This option adds continue-on-error: true to the first step, so an error from the actions/delete-package-versions@v5 step no longer fails the job. The "Package not found" error becomes harmless, but the underlying race condition remains, and the setting would also hide genuine failures, such as a missing packages: write permission. It is a quick and easy fix, but not the most robust one.

Option 2: Limit Deletion Scope (Safer)

- name: Delete old untagged images
  uses: actions/delete-package-versions@v5
  with:
    package-name: 'smela-front-ci'
    package-type: 'container'
    min-versions-to-keep: 8
    delete-only-untagged-versions: true
    num-old-versions-to-delete: 20  # ← Add this to limit batch size

This option adds num-old-versions-to-delete: 20 to the first step, which caps how many versions the action tries to delete in a single run. Smaller batches make stale version IDs less likely and so reduce the chance of hitting the race condition. It is safer than Option 1 because it addresses the cause rather than hiding the error, but it may not eliminate the "Package not found" error entirely, and a large backlog of untagged versions will take several runs to clear.
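The effect of a cap can be estimated with simple arithmetic. The Python sketch below assumes that each weekly run deletes at most num-old-versions-to-delete eligible versions, and ignores both new versions pushed between runs and any versions retained by min-versions-to-keep; under those simplifications it shows how the 47 untagged versions counted earlier would be spread across runs with a cap of 20. The function name deletions_per_run is hypothetical.

```python
def deletions_per_run(backlog: int, cap: int) -> list[int]:
    """Split a backlog of deletable versions into per-run deletion counts."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    runs: list[int] = []
    while backlog > 0:
        batch = min(cap, backlog)  # each run deletes at most `cap` versions
        runs.append(batch)
        backlog -= batch
    return runs


print(deletions_per_run(47, 20))  # [20, 20, 7]
```

Each run makes fewer delete calls, which narrows the window for stale IDs, at the cost of taking about three weekly runs to clear this backlog.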

Option 3: Simplify Workflow (Recommended)

Remove the ineffective second step and add continue-on-error to the first:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write

    steps:
      - name: Delete old untagged images
        uses: actions/delete-package-versions@v5
        continue-on-error: true
        with:
          package-name: 'smela-front-ci'
          package-type: 'container'
          min-versions-to-keep: 8
          delete-only-untagged-versions: true

This is the recommended option because it deals with both problems. Removing the ineffective second step simplifies the workflow, and continue-on-error: true keeps the known "Package not found" error from failing the run. It works around the race condition rather than fixing it, and, as with Option 1, it can also hide unrelated errors, but it offers the best balance between simplicity and reliability for this workflow.

Action Items

To implement the recommended solution, follow these steps:

  • [ ] Apply the recommended solution (Option 3).
  • [ ] Test the workflow manually via workflow_dispatch.
  • [ ] Verify that old untagged images are still being cleaned up.
  • [ ] Monitor that only the latest dev and main tagged versions are retained.
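The verification items above can be partly automated by comparing registry snapshots taken before and after a run. The Python sketch below assumes each snapshot is a list of version dicts with an id and tags under metadata.container.tags, as returned by the GitHub Packages REST API's list-versions endpoints; fetching the snapshots is left out, and the helper names tags_of and check_cleanup are hypothetical.

```python
from typing import Any

Version = dict[str, Any]


def tags_of(version: Version) -> list[str]:
    # Assumed shape: {"id": ..., "metadata": {"container": {"tags": [...]}}}
    return version.get("metadata", {}).get("container", {}).get("tags", [])


def check_cleanup(before: list[Version], after: list[Version]) -> dict[str, int]:
    """Fail if any tagged version disappeared; otherwise summarize untagged cleanup."""
    after_ids = {v["id"] for v in after}
    lost_tagged = [v["id"] for v in before if tags_of(v) and v["id"] not in after_ids]
    if lost_tagged:
        raise ValueError(f"tagged versions were deleted: {lost_tagged}")
    removed = sum(1 for v in before if not tags_of(v) and v["id"] not in after_ids)
    remaining = sum(1 for v in after if not tags_of(v))
    return {"removed_untagged": removed, "remaining_untagged": remaining}
```

This check matches the untagged-only cleanup of Option 3; if the second step is kept to delete feature branch images, it would need to allow those tagged deletions.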

Impact

Expected impact of the recommended solution:

  • Current: Workflow appears broken but is actually working.
  • After Fix: Workflow shows as successful and continues cleaning up old images.
  • Result: Container registry stays clean with only necessary images retained.

With the false failure gone and the redundant step removed, the workflow's status will match what it actually does, and the registry will keep the images that are needed without untagged manifests piling up.

For more information on Docker container registries, visit the official Docker documentation: https://docs.docker.com/registry/introduction/
