Systemd: Fixing %t Path For Non-Root Podman Containers
Introduction
This article addresses a common issue encountered when running rootless Podman containers under systemd as a system user (UID < 1000). In a system service unit, the %t specifier resolves to /run even when the unit sets User= to a non-root account and defines XDG_RUNTIME_DIR. The result is a permission error: the container process cannot create the files it needs.
The cause is how systemd expands %t. With User= set and XDG_RUNTIME_DIR defined, one would expect %t to resolve to the user's runtime directory (e.g., /run/user/997). Instead, the system manager always expands %t to /run; only a per-user manager (systemd --user) expands it to $XDG_RUNTIME_DIR. This matters for tools like Podman, whose systemd generator emits units containing paths such as %t/%N.cid for the container ID file. The non-root user cannot write to /run, so the service fails with "permission denied." Because running containers rootless is a recommended security practice, this behavior gets in the way exactly where it is most useful: with system users. The sections below describe the problem in detail, show how to reproduce it, and list workarounds.
The Problem: Incorrect %t Resolution for Non-Root Users
The %t specifier stands for the runtime directory root. It is expanded by the service manager that loads the unit, not by the account named in User=, and Environment= only affects the environment of the started process. As a result, in a system unit %t resolves to /run rather than the user's runtime directory (e.g., /run/user/997), even when User= and XDG_RUNTIME_DIR are both set.
Consider the following scenario: You're trying to run a Podman container as a system user (a user with a UID less than 1000) for enhanced security. Podman recommends running containers rootless, and you're using quadlets (Podman's systemd generator) to configure your containers. Your service unit configuration includes:
[Service]
User=containers
Environment=XDG_RUNTIME_DIR=/run/user/997
The intention here is to run the container as the containers user, with its runtime directory set to /run/user/997. However, the systemd generator creates an ExecStart line like this:
ExecStart=/usr/bin/podman run --name=gemstash --cidfile=%t/%N.cid ...
Because %t resolves to /run, Podman attempts to create the CID file at /run/gemstash.cid. Since the containers user doesn't have permission to write to /run, you encounter a "permission denied" error. This defeats the purpose of setting User= and XDG_RUNTIME_DIR.
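The expansion described above can be modeled in a few lines of shell. This is only an illustrative sketch, not systemd's actual code: the function name expand_cidfile is hypothetical, and the fallback to /run/user/<uid> when XDG_RUNTIME_DIR is unset is an assumption made for the example.

```shell
# Simplified model of how %t/%N.cid is expanded (not systemd's real code).
# Usage: expand_cidfile <system|user> <unit name>
expand_cidfile() {
    manager="$1"
    unit="$2"
    case "$manager" in
        # The system manager always uses /run, whatever User= says.
        system) t="/run" ;;
        # A per-user manager uses XDG_RUNTIME_DIR; the fallback below is
        # an assumption of this sketch.
        user)   t="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}" ;;
        *)      echo "unknown manager: $manager" >&2; return 1 ;;
    esac
    # %N is the unit name with the type suffix (.service) removed.
    n="${unit%.*}"
    printf '%s/%s.cid\n' "$t" "$n"
}
```

Calling expand_cidfile system gemstash.service yields the /run/gemstash.cid path from the error, while the user variant with XDG_RUNTIME_DIR=/run/user/997 yields the path the unit author intended.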
This is all the more surprising because the service is explicitly configured to run as a specific user with a defined runtime directory. One might expect %t to follow the user context and point to that user's directory when the service runs with reduced privileges. Instead, the system manager resolves %t to /run regardless of User=, and the unprivileged service cannot start as generated. Working around this requires either changing the paths the unit uses or running the unit under a manager that resolves %t to the user's runtime directory.
Why This Matters: Security and Rootless Containers
Running containers rootless is a crucial security practice because it limits the damage a compromised container can do. If a container runs as root, a successful attack could give the attacker full control over the host system. A rootless container runs with the limited privileges of its non-root user, so an attacker who gains control of it has far less access to the host than with a rootful container.
The issue with %t gets in the way of this security model. The generated unit makes Podman try to write to /run, which requires root privileges, so the service fails when run as a non-root user. The practical risk is that administrators fall back to running the container as root just to make the unit start, which gives up the benefits described above. Pointing the CID file at the user's runtime directory lets the service stay unprivileged and keeps the risk of privilege escalation low.
Reproducing the Problem
To reproduce this issue, follow these steps:
- Create a system user (UID < 1000) named containers.
- Create a quadlet file (e.g., gemstash.container) with the following content:

    [Container]
    Image=my-gemstash-image

    [Service]
    User=containers
    Environment=XDG_RUNTIME_DIR=/run/user/997

- Place the file in /etc/containers/systemd/ and run systemctl daemon-reload so that the Quadlet generator produces gemstash.service.
- Start the service using systemctl start gemstash.service.
- Observe the error message in the logs:

    Error: creating idfile: open /run/gemstash.cid: permission denied
This error occurs because the generated systemd unit uses %t/%N.cid for the --cidfile option, and %t resolves to /run instead of /run/user/997. This means that the Podman process, running as the containers user, attempts to write to /run, which it doesn't have permission to do.
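Before starting the service, it can help to confirm which of the two directories the service user can actually write to. The following is a minimal sketch; the function names can_write_dir and report_cid_dir are hypothetical helpers, not part of Podman or systemd.

```shell
# can_write_dir <directory>
# Succeeds only if the directory exists and the current user may create files in it.
can_write_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# report_cid_dir <directory>
# Prints a one-line verdict for the current user and returns non-zero
# when a CID file could not be created there.
report_cid_dir() {
    if can_write_dir "$1"; then
        echo "$1: writable by uid $(id -u)"
    else
        echo "$1: NOT writable by uid $(id -u)"
        return 1
    fi
}

# Example (run as the service user, e.g. via sudo -u containers):
#   report_cid_dir /run
#   report_cid_dir /run/user/997
```

Run as the containers user, the first call is expected to report /run as not writable, which matches the "permission denied" error above.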
Potential Solutions and Workarounds
While a direct fix within systemd might be ideal, several workarounds can mitigate this issue:
- Override the ExecStart line: Modify the generated systemd unit file to explicitly specify the full path to the CID file within the user's runtime directory. For example:

    ExecStart=/usr/bin/podman run --name=gemstash --cidfile=/run/user/997/gemstash.cid ...

  This ensures that the CID file is created in a location the user can write to.
- Use a wrapper script: Create a wrapper script that sets the correct environment variables and then executes the podman run command. This script can then be called from the ExecStart line.
- Modify the Quadlet: Although the user in the original bug report says that they aren't free to modify the Quadlet, adding an ExecStartPre= command that runs mkdir -p to prepare a writable directory might solve the issue, though it is clumsy.
- Utilize user units: Manage the service as a user unit rather than a system unit. A user unit is loaded by the user's own service manager, which resolves %t to $XDG_RUNTIME_DIR and runs with that user's environment and permissions, so the pathing problem of the system-level configuration does not arise.
- Employ systemd's dynamic user feature: Where feasible, use DynamicUser= so that the service runs under a user that systemd allocates at start time. This provides additional isolation and reduces manual user configuration, which can avoid conflicts over user permissions and runtime directories.
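The wrapper-script workaround could look roughly like the sketch below. The function names cidfile_path and run_container are hypothetical, and the fallback to /run/user/<uid> when XDG_RUNTIME_DIR is unset is an assumption of the example; only podman run with its --name and --cidfile options is taken from the unit shown earlier.

```shell
#!/bin/sh
# Hypothetical wrapper: builds the CID file path from the user's runtime
# directory instead of relying on systemd's %t expansion.

# cidfile_path <container name>
cidfile_path() {
    # Assumption: fall back to /run/user/<uid> if XDG_RUNTIME_DIR is unset.
    runtime_dir="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
    printf '%s/%s.cid\n' "$runtime_dir" "$1"
}

# run_container <container name> [further podman run arguments...]
run_container() {
    name="$1"
    shift
    # exec replaces the shell so that systemd supervises podman directly.
    exec /usr/bin/podman run --name="$name" \
        --cidfile="$(cidfile_path "$name")" "$@"
}

# When saved as a standalone script, end the file with:
#   run_container "$@"
```

With Environment=XDG_RUNTIME_DIR=/run/user/997 in the unit, the wrapper writes the CID file to /run/user/997/gemstash.cid. The remaining podman run arguments are passed through unchanged, so the script does not need to know the image or other options.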
Conclusion
The resolution of %t to /run for non-root users in system service units poses a significant challenge when running rootless Podman containers. It gets in the way of the security benefits of rootless containers and adds unnecessary complexity. While a proper fix within systemd is desirable, the workarounds described above mitigate the problem. By understanding the root cause and applying an appropriate workaround, you can keep your rootless containers running smoothly and securely.
For more information on systemd specifiers, refer to the systemd documentation.