Linux namespaces
| Linux namespaces | |
|---|---|
| Original author | Al Viro |
| Developers | Eric W. Biederman, Pavel Emelyanov, Al Viro, Cyrill Gorcunov et al. |
| Initial release | 2002 |
| Written in | C |
| Operating system | Linux |
| Type | System software |
| License | GPL and LGPL |
Namespaces are a feature of the Linux kernel that partition kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set. The feature works by giving the same names to resources in different sets of processes, with those names referring to distinct underlying resources; a resource may also exist in multiple namespaces. Examples of such resources are process IDs, hostnames, user IDs, file names, some names associated with network access, and inter-process communication.
Namespaces are a required aspect of functioning containers in Linux. The term "namespace" is used both for a specific kind of namespace (e.g., process ID) and for a particular space of names.[1]
A Linux system begins with a single namespace of each type, used by all processes. Processes can create additional namespaces and can also join different namespaces.
History
Linux namespaces were inspired by the wider namespace functionality used heavily throughout Plan 9 from Bell Labs.[2] Linux namespaces originated in 2002 in the 2.4.19 kernel with work on the mount namespace. Additional kinds of namespaces have been added beginning in 2006.[3]
Adequate container support functionality was finished in kernel version 3.8[4][5] with the introduction of user namespaces.[6]
Namespace kinds
Since kernel version 5.6, there are eight kinds of namespaces. Namespace functionality is the same across all kinds: each process is associated with a namespace and can only see or use the resources associated with that namespace, plus descendant namespaces where applicable. This way, each process (or group of processes) can have a unique view of the resources. Which resource is isolated depends on the kind of namespace created for a given process group.
Mount (mnt)
Mount namespaces control mount points. Upon creation, the mounts from the current mount namespace are copied to the new namespace; mount points created afterwards do not propagate between namespaces (though using shared subtrees, it is possible to propagate mount points between namespaces[7]).
The clone flag used to create a new namespace of this type is CLONE_NEWNS, short for "new namespace". The name is not descriptive (it does not say which kind of namespace is created) because mount namespaces were the first kind of namespace and the designers did not anticipate there being any others.
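From the shell, the flag can be exercised via unshare(1) from util-linux; a minimal sketch, assuming the system permits unprivileged user namespaces (the added user namespace supplies the privileges that mount(8) needs):

```shell
# Each process's mount namespace is visible as a symbolic link in /proc.
readlink /proc/$$/ns/mnt    # e.g. mnt:[4026531841]

# Create a new mount namespace (paired with a user namespace so no root
# is needed) and mount a tmpfs inside it; the mount is invisible outside.
unshare --user --map-root-user --mount \
    sh -c 'mount -t tmpfs none /tmp && findmnt /tmp'
```

Outside the new namespace, `findmnt /tmp` would show no such mount, since mount events made inside a private mount namespace do not propagate back.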
Process ID (pid)
The PID namespace provides processes with a set of process IDs (PIDs) independent from other namespaces. PID namespaces are nested: when a new process is created, it has a PID in each namespace from its current namespace up to the initial PID namespace. The initial PID namespace can therefore see all processes, albeit under different PIDs than those used in other namespaces.
The first process created in a PID namespace is assigned process ID 1 and receives most of the same special treatment as the normal init process; most notably, orphaned processes within the namespace are attached to it. This also means that the termination of this PID 1 process immediately terminates all processes in its PID namespace and in any descendant namespaces.[8]
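This can be observed with unshare(1); a sketch assuming unprivileged user namespaces are enabled (--mount-proc remounts /proc so that ps reflects the new namespace rather than the host's):

```shell
# The forked child becomes PID 1 of the new namespace; ps lists only
# the processes inside it, not the rest of the system.
unshare --user --map-root-user --pid --fork --mount-proc ps -e
```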
Network (net)
Network namespaces virtualize the network stack. On creation, a network namespace contains only a loopback interface. Each network interface (physical or virtual) is present in exactly one namespace and can be moved between namespaces.
Each namespace has a private set of IP addresses, its own routing table, socket listing, connection tracking table, firewall, and other network-related resources.
Destroying a network namespace destroys any virtual interfaces within it and moves any physical interfaces within it back to the initial network namespace.
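The ip(8) tool from iproute2 manages named network namespaces; a sketch requiring root privileges (the namespace name "demo" and interface names are illustrative):

```shell
ip netns add demo                  # create a named network namespace
ip netns exec demo ip link show    # it contains only a loopback interface
ip netns exec demo ip link set lo up

# A veth pair acts as a tunnel between namespaces: one end stays in the
# initial namespace, the other is moved into "demo".
ip link add veth0 type veth peer name veth1
ip link set veth1 netns demo

ip netns del demo                  # destroying the namespace also
                                   # destroys the veth pair inside it
```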
Inter-process Communication (ipc)
IPC namespaces isolate processes from SysV-style inter-process communication. This prevents processes in different IPC namespaces from using, for example, the SHM family of functions to establish shared memory between them. Instead, each process can use the same identifier for a shared memory region and obtain a distinct region.
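The isolation can be demonstrated with the System V IPC utilities from util-linux; a sketch assuming unprivileged user namespaces are permitted:

```shell
ipcmk -M 4096    # create a SysV shared memory segment in this namespace
ipcs -m          # the segment is listed here ...

# ... but a new IPC namespace starts with no SysV IPC objects at all.
unshare --user --map-root-user --ipc ipcs -m
```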
UTS
UTS (UNIX Time-Sharing) namespaces allow a single system to appear to have different host and domain names to different processes. When a process creates a new UTS namespace, the hostname and domain of the new UTS namespace are copied from the corresponding values in the caller's UTS namespace.[9]
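A minimal sketch with unshare(1), assuming unprivileged user namespaces are permitted (the user namespace grants the capability that sethostname(2) requires; "container-a" is an arbitrary example name):

```shell
# Change the hostname inside a new UTS namespace; the change is local.
unshare --user --map-root-user --uts \
    sh -c 'hostname container-a && hostname'
hostname    # the host's own name is unaffected
```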
User ID (user)
User namespaces provide both privilege isolation and user identification segregation across multiple sets of processes, available since kernel 3.8.[10] With administrative assistance, it is possible to build a container with seemingly administrative rights without actually giving elevated privileges to user processes. Like PID namespaces, user namespaces are nested, and each new user namespace is considered a child of the user namespace that created it.
A user namespace contains a mapping table converting user IDs from the container's point of view to the system's point of view. This allows, for example, the root user to have user ID 0 in the container while actually being treated as user ID 1,400,000 by the system for ownership checks. A similar table is used for group ID mappings and ownership checks.
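The mapping is visible in /proc/[pid]/uid_map; a sketch using unshare(1)'s one-line root mapping, assuming unprivileged user namespaces are enabled:

```shell
id -u    # an ordinary unprivileged user, e.g. 1000

# Inside a new user namespace with --map-root-user, the same user is
# UID 0; uid_map shows "inside-ID  outside-ID  count", e.g. "0 1000 1".
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'
```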
Control group (cgroup)
The cgroup namespace hides the identity of the control group of which the process is a member. A process in such a namespace, checking which control group any process belongs to, sees a path relative to the control group set at creation time, hiding its true control group position and identity. This namespace type has existed since March 2016 in Linux 4.6.[11][12]
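On a cgroup-v2 system, this can be seen in /proc/self/cgroup; a sketch assuming unprivileged user namespaces are enabled:

```shell
cat /proc/self/cgroup    # e.g. 0::/user.slice/user-1000.slice/session-2.scope

# In a new cgroup namespace, the process sees its creation-time cgroup
# as the root of the hierarchy ("0::/").
unshare --user --map-root-user --cgroup cat /proc/self/cgroup
```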
Time
The time namespace allows processes to see different system times in a way similar to the UTS namespace. It was proposed in 2018 and was released in Linux 5.6, in March 2020.[13]
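unshare(1) from util-linux 2.36 or later exposes the per-namespace clock offsets; a sketch requiring root privileges:

```shell
cat /proc/uptime    # seconds since boot in the initial time namespace

# Offset CLOCK_BOOTTIME by one hour in a new time namespace; the uptime
# reported inside appears roughly 3600 seconds larger.
sudo unshare --time --fork --boottime 3600 cat /proc/uptime
```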
Proposed namespaces
syslog namespace
The syslog namespace was proposed by Rui Xiang, an engineer at Huawei, but was not merged into the Linux kernel.[14] systemd implemented a similar feature called "journal namespace" in February 2020.[15]
Administrative hierarchy
To facilitate privilege isolation of administrative actions, each namespace is owned by the user namespace that was active at the moment of its creation. A user with administrative privileges in the appropriate user namespace is allowed to perform administrative actions within the owned namespace. For example, a process with administrative permission to change the IP address of a network interface may do so as long as its own user namespace is the same as (or an ancestor of) the user namespace that owns the network namespace. Hence, the initial user namespace has administrative control over all namespace types in the system.[16]
Implementation details
Namespaces are represented by virtual file objects within the kernel. An open file descriptor on such a file may be used to associate a process with the corresponding namespace.
Visibility in /proc
The kernel makes the namespaces of each process visible at /proc/pid/ns/kind. Like all non-file resources in /proc, these can be read as symbolic links, yielding kind:[inode_number], or accessed as ordinary files. (These files are unreadable but are useful in other ways. Their inode numbers match the textual numbers yielded by readlink.) These files are in one-to-one correspondence with namespaces in the kernel, so the inode numbers act as unique identifiers.
As of Linux 6.1.0, kind can be any of cgroup, ipc, mnt, net, pid, time, user, uts. Inheritance of some namespaces can be controlled separately from the effective namespace of the process itself, and that is visible as /proc/pid/ns/kind_for_children.
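A quick look, runnable by any user:

```shell
ls -l /proc/$$/ns           # one symbolic link per namespace kind
readlink /proc/$$/ns/uts    # e.g. uts:[4026531838]

# Two processes share a namespace exactly when the inode numbers match;
# here the readlink child shares the shell's network namespace.
[ "$(readlink /proc/$$/ns/net)" = "$(readlink /proc/self/ns/net)" ] \
    && echo "same network namespace"
```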
Syscalls
Three system calls directly manipulate namespaces:
- clone, with flags specifying which new namespaces the new process should be created in;
- unshare, to disassociate parts of a process's or thread's execution context that are currently shared with other processes (or threads); and
- setns, to place the calling process into the namespace identified by a file descriptor.
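util-linux wraps these calls for shell use: unshare(1) over unshare(2) and nsenter(1) over setns(2). A sketch requiring root privileges:

```shell
# unshare(2): run a long-lived process in new UTS and IPC namespaces.
sudo unshare --uts --ipc sleep 60 &
pid=$!    # sudo/unshare exec into sleep, so this PID ends up being it

# setns(2): nsenter opens /proc/<pid>/ns/* and joins those namespaces
# before running the given command.
sudo nsenter --target "$pid" --uts --ipc hostname
sudo kill "$pid"
```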
Destruction
If a namespace is no longer referenced, it is deleted; the handling of the contained resources depends on the namespace kind. A namespace is considered referenced when:
- it has at least one member process;
- it has at least one referenced child namespace; or
- its virtual file (/proc/pid/ns/kind) is in use, including:
  - via an open file descriptor;
  - being a process's current directory;
  - being a process's root directory; or
  - underpinning a bind mount.
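Bind-mounting the virtual file is how named network namespaces are kept alive without a member process (as ip-netns(8) does internally); a sketch requiring root privileges, with /run/mynetns as an arbitrary pin location:

```shell
# Create a namespace, pin it to a file, and let the creating process exit.
sudo touch /run/mynetns
sudo unshare --net=/run/mynetns true   # unshare(1) bind-mounts the ns here

# The namespace persists and can still be entered ...
sudo nsenter --net=/run/mynetns ip link show

# ... until the bind mount, its last reference, is removed.
sudo umount /run/mynetns && sudo rm /run/mynetns
```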
Adoption
Various container software uses Linux namespaces in combination with cgroups to isolate processes, including Docker[17] and LXC.
Other applications, such as Google Chrome, use namespaces to isolate their own processes that are at risk of attack from the internet.[18]
There is also an unshare wrapper in util-linux. An example of its use is:
SHELL=/bin/sh unshare --map-root-user --fork --pid chroot "${chrootdir}" "$@"
References
- ^ Heddings, Anthony (2020-09-02). "What Are Linux Namespaces and What Are They Used for?". How-To Geek. Retrieved 2024-08-22.
- ^ "The Use of Name Spaces in Plan 9". 1992. Archived from the original on 2014-09-06. Retrieved 2016-03-24.
- ^ "Linux kernel source tree". kernel.org. 2016-10-02.
- ^ "LKML: Linus Torvalds: Linux 3.8". lkml.org. Retrieved 2024-03-22.
- ^ "Linux_3.8 - Linux Kernel Newbies". kernelnewbies.org. Retrieved 2024-03-22.
- ^ "Namespaces in operation, part 5: User namespaces [LWN.net]".
- ^ "Namespaces in operation, part 3: PID namespaces". lwn.net. 2013-01-16.
- ^ "uts_namespaces(7) - Linux manual page". www.man7.org. Retrieved 2021-02-16.
- ^ "Namespaces in operation, part 5: User namespaces [LWN.net]".
- ^ Heo, Tejun (2016-03-18). "[GIT PULL] cgroup namespace support for v4.6-rc1". lkml (Mailing list).
- ^ Torvalds, Linus (2016-03-26). "Linux 4.6-rc1". lkml (Mailing list).
- ^ "It's Finally Time: The Time Namespace Support Has Been Added To The Linux 5.6 Kernel - Phoronix". www.phoronix.com. Retrieved 2020-03-30.
- ^ "Add namespace support for syslog [LWN.net]". lwn.net. Retrieved 2022-07-11.
- ^ "journal: add concept of "journal namespaces" by poettering · Pull Request #14178 · systemd/systemd". GitHub. Retrieved 2022-07-11.
- ^ "Namespaces in operation, part 5: User namespaces". lwn.net. 2013-02-27.
- ^ "Docker security". docker.com. Retrieved 2016-03-24.
- ^ "Chromium Linux Sandboxing". Archived from the original on 2019-12-19. Retrieved 2019-12-19.
External links
- namespaces manpage
- Namespaces — The Linux Kernel documentation
- Linux kernel Namespaces and cgroups by Rami Rosen
- Namespaces and cgroups, the basis of Linux containers (including cgroups v2) - slides of a talk by Rami Rosen, Netdev 1.1, Seville, Spain (2016)
- Containers and Namespaces in the Linux Kernel by Kir Kolyshkin
Linux namespaces
View on Grokipediaclone(2), unshare(2), or setns(2) with specific CLONE_NEW* flags, and they can be managed via files in /proc/[pid]/ns/, which serve as handles for joining or querying namespaces.[1] Most namespace types require the CAP_SYS_ADMIN capability, though user namespaces have been unprivileged since Linux 3.8.[1] A namespace persists until its last process exits or it is explicitly unpinned, such as by closing associated file descriptors.[1]
There are eight main types of Linux namespaces, each isolating a distinct set of kernel resources:
- Mount namespace (
CLONE_NEWNS, since Linux 2.4.19): Isolates filesystem mount points, allowing independent mount tables.[2] - UTS namespace (
CLONE_NEWUTS, since Linux 2.6.19): Isolates the hostname and NIS domain name.[2] - IPC namespace (
CLONE_NEWIPC, since Linux 2.6.19): Isolates System V IPC resources and POSIX message queues.[2] - PID namespace (
CLONE_NEWPID, since Linux 2.6.24): Isolates the process ID number space, making processes appear as init (PID 1) in child namespaces.[2] - Network namespace (
CLONE_NEWNET, since Linux 2.6.24): Isolates network devices, IP addresses, ports, routing tables, and firewall rules.[2] - User namespace (
CLONE_NEWUSER, since Linux 3.8): Isolates user and group IDs, enabling unprivileged mapping of root inside the namespace.[2] - Cgroup namespace (
CLONE_NEWCGROUP, since Linux 4.6): Isolates the view of the cgroup root directory and hierarchy.[1] - Time namespace (
CLONE_NEWTIME, since Linux 5.6): Isolates the system boot time and monotonic clocks.[1]
Introduction
Definition and Core Concepts
Linux namespaces are a Linux kernel feature that enable the partitioning of kernel resources, allowing processes in different namespaces to perceive isolated views of global system resources such as processes, network interfaces, mount points, and user identifiers.[1] This abstraction wraps a global system resource in a way that makes it appear to processes within the namespace as if they possess their own private instance, thereby confining changes and interactions to within that namespace.[1] By creating multiple instances of these resources, namespaces facilitate resource separation without duplicating the underlying kernel structures.[3] At their core, Linux namespaces operate by associating processes with specific namespace instances, where each type of namespace targets isolation of a particular resource category.[1] Processes can create new namespaces or join existing ones, typically through kernel interfaces that allow for flexible management of these isolated environments.[3] Namespaces exhibit a hierarchical structure in certain cases, where child namespaces inherit properties from parent namespaces unless explicitly configured for isolation, enabling nested or layered isolation schemes.[1] For instance, in a scenario involving process isolation, a process launched within its own dedicated namespace might perceive itself as process ID 1, viewing only co-namespaced processes and remaining unaware of others on the system, which demonstrates the effectiveness of resource view separation.[3] This capability delivers key benefits, including lightweight virtualization that supports secure multi-tenancy and application containment with minimal overhead compared to full virtual machines, enhancing system security by limiting the scope of untrusted code.[3]Role in Isolation and Virtualization
Linux namespaces play a pivotal role in process isolation by wrapping global system resources in per-process abstractions, allowing processes to perceive private instances of these resources while remaining invisible to those outside the namespace. For example, a process in a dedicated mount namespace sees only its own filesystem hierarchy, preventing interference with the host's mounts, and similarly, network namespaces isolate interfaces and routing tables to avoid cross-process network conflicts. This mechanism ensures that modifications within one namespace do not propagate globally, enhancing security and resource segregation in multi-tenant environments.[1][2] In virtualization, namespaces enable OS-level virtualization, a lightweight alternative to full-system emulation, by partitioning kernel resources without the need for a separate guest kernel. Paired with control groups (cgroups) for resource limiting, namespaces underpin container technologies, permitting multiple isolated user-space instances to run efficiently on a single host kernel, which reduces overhead compared to hypervisor-based systems. This combination supports scalable deployments, as seen in container orchestration platforms where namespaces delineate boundaries for applications.[4][5] Unlike traditional virtualization via hypervisors, which simulates hardware and incurs significant performance costs from running independent OS instances, namespaces achieve isolation at the kernel level, sharing the host OS for greater efficiency and density. 
Virtual machines provide stronger hardware-level separation but at the expense of resource duplication, whereas namespace-based approaches excel in scenarios requiring rapid provisioning and minimal footprint.[6] Container runtimes typically invoke namespaces during process creation to establish isolated views; for instance, spawning a process with PID and network namespaces simulates a standalone system, where the process tree appears rooted at PID 1 and network traffic is confined to virtual interfaces, all while leveraging the host kernel for execution.[1][2]History
Early Development (2000s)
The early development of Linux namespaces during the 2000s focused on enhancing process isolation within the kernel to support lightweight virtualization, addressing the shortcomings of earlier mechanisms like chroot, which offered limited filesystem isolation and was prone to security escapes. This work was motivated by the growing demand for running multiple isolated environments on a single physical machine, enabling efficient resource sharing in server consolidation, high-performance computing, and secure application deployment without the overhead of hypervisors. Influences included FreeBSD's jails for per-process resource views and Sun Microsystems' Solaris Zones for OS-level virtualization, which demonstrated the benefits of namespace-like separation for security and manageability.[3] Eric W. Biederman played a pivotal role in conceptualizing namespaces as multiple instances of global kernel resources, proposing this framework in his 2006 paper "Multiple Instances of the Global Linux Namespaces" presented at the Ottawa Linux Symposium. The proposal aimed to create distinct views of kernel objects—such as process IDs, network stacks, and user IDs—for groups of processes, facilitating container technologies like those in OpenVZ. Biederman's efforts, initially under Linux Networx and later at Red Hat, emphasized unprivileged user isolation to mitigate privilege escalation risks, laying the groundwork for broader kernel adoption. OpenVZ, originating from Virtuozzo's commercial kernel patches since 2005, contributed early implementations of isolation features akin to namespaces, with developers like Pavel Emelyanov pushing for mainline integration to enable virtual private servers (VPS) with shared kernel resources.[3][7] The initial practical implementations emerged with the mount namespace, introduced by Al Viro in Linux kernel 2.4.19 on August 3, 2002, via the CLONE_NEWNS flag in clone(2). 
This allowed processes to maintain independent mount tables, isolating filesystem views and enabling chroot-like environments with greater flexibility, inspired by Plan 9's per-process namespaces. Building on this, the PID namespace was added in Linux kernel 2.6.24, released on January 24, 2008, providing separate process ID numbering to prevent PID conflicts across isolated groups and support nested process trees. Biederman's work on user namespaces began around 2005–2006, with prototype patches discussed in kernel mailing lists by 2008, focusing on mapping user and group IDs to enable unprivileged container roots, though full mainline merging occurred later. These foundational additions, often developed through out-of-tree patches from projects like OpenVZ, gradually converged into the upstream kernel, establishing namespaces as a core isolation primitive.[2][8][7]Major Additions and Kernel Integrations
The development of Linux namespaces accelerated in the late 2000s and 2010s, with several key types integrated into the kernel to support advanced isolation features for containerization. The UTS namespace, which isolates hostname and domain name views, was merged in kernel version 2.6.19 in December 2006, though its practical maturity evolved with subsequent refinements in later releases. Similarly, the IPC namespace, providing isolation for interprocess communication resources such as System V IPC objects and POSIX message queues, was also introduced in Linux 2.6.19. The network namespace followed in kernel 2.6.24 in January 2008, enabling separate network stacks, interfaces, and routing tables per namespace, with fuller integration and usability enhancements completed in Linux 3.8 in February 2013.[2] A pivotal addition was the user namespace, developed starting around 2007 by Eric Biederman to enable unprivileged container creation by mapping user and group IDs across namespaces. Despite ongoing security debates on the kernel mailing lists regarding potential privilege escalation risks and the complexity of capability mappings, it was merged into Linux 3.8 in 2013 after extensive review and patching. This integration marked a significant milestone, allowing non-root users to create namespaces without full system privileges, thereby facilitating safer container deployments. Subsequent expansions included the cgroup namespace in Linux 4.6 in May 2016, which virtualizes the view of control groups to prevent container processes from accessing host cgroup hierarchies. The time namespace arrived in Linux 5.6 in March 2020, offering isolated monotonic and boottime clocks with adjustable offsets, primarily to support checkpoint/restore functionality in containers. 
These additions were largely driven by the rise of container technologies, including the Linux Containers (LXC) project launched in 2008, which relied on namespaces for OS-level virtualization, and Docker's debut in 2013, which popularized namespaces through its lightweight container runtime and influenced kernel discussions on completeness and security. Kernel mailing list threads, particularly around user namespaces, highlighted tensions between isolation benefits and attack surface concerns, leading to iterative security improvements. As of 2025, no major new namespace types have been merged since the time namespace in 2020, with development efforts focusing on refinements such as enhanced user namespace mappings and security mitigations in kernels 5.10 and later. Ongoing proposals, like a syslog namespace for isolating logging resources, remain in discussion without upstream integration.Namespace Types
Mount (mnt) Namespace
The mount namespace, also known as the mnt namespace, provides isolation of the filesystem mount table and tree, ensuring that processes in different mount namespaces perceive distinct sets of mounted filesystems. Each process views only the mounts established within its own namespace, and operations such as mounting or unmounting filesystems affect solely that namespace without impacting others. This isolation allows for independent filesystem hierarchies, where the root directory (/) can vary per namespace, enabling processes to operate within customized root environments.[1][9][2] Key features of the mount namespace include support for bind mounts, which permit remounting a directory subtree at another location within the same or different namespace, facilitating flexible filesystem reconfiguration. It also integrates seamlessly with advanced filesystems like OverlayFS, which leverages mount namespaces to create layered, union-mounted filesystems for read-write overlays on read-only bases, commonly used in container images. Mount propagation mechanisms further enhance control: mounts can be configured as private (MS_PRIVATE, the default), shared (MS_SHARED, for bidirectional propagation), or slave (MS_SLAVE, for unidirectional reception from a master), allowing selective sharing of mount events across related namespaces while maintaining isolation where needed.[9][10][11] A new mount namespace is created by invoking clone(2) or unshare(2) with the CLONE_NEWNS flag, requiring the CAP_SYS_ADMIN capability in the caller's user namespace (except when nested under a user namespace with mapped privileges). To join an existing mount namespace, setns(2) is used with a file descriptor obtained from /proc/[pid]/ns/mnt, which serves as the namespace's identifier and can be bind-mounted to persist it beyond process lifetime. 
The kernel enforces a per-user limit on mount namespaces via /proc/sys/user/max_mnt_namespaces, with creation failing via ENOSPC if exceeded.[1][12][13] In practice, mount namespaces are essential for container runtimes, such as providing a container with its own /proc and /sys mounts populated from the host but isolated to prevent interference, allowing safe introspection of container-specific kernel interfaces without exposing or altering the host's view. However, limitations arise with mount propagation: if a mount is not explicitly set to private (MS_PRIVATE), shared or slave configurations can cause unintended visibility of mounts across namespaces, potentially leaking filesystem changes unless propagation types are carefully managed at namespace creation or via mount(2) flags. Within a user namespace, this isolation extends to unprivileged mounting, where root privileges map to the caller's user ID in the parent namespace, enabling non-root users to establish private mount trees.[2][11][14]Process ID (pid) Namespace
The process ID (PID) namespace provides isolation of the process ID number space, enabling processes in different namespaces to have identical PIDs without interference. In this namespace, each instance maintains its own view of the process hierarchy, where the init process (PID 1) serves as the root, and processes outside the namespace are invisible to those within it. This isolation ensures that process identifiers, such as those used in system calls like kill(2) or wait(4), are confined to the local namespace, preventing cross-namespace process management.[15] Key features include support for nested hierarchies, allowing up to 32 levels of PID namespaces since Linux 3.7, where child namespaces inherit visibility of parent processes but maintain separate PID assignments. The /proc filesystem reflects this isolation, displaying only processes local to the viewing namespace, while signals sent to PIDs are confined within the namespace unless explicitly handled across boundaries. Additionally, orphaned processes in a namespace are reparented to the namespace's init process rather than the global init.[15] PID namespaces are created using the CLONE_NEWPID flag in clone(2) or unshare(2) system calls, which places new processes into a fresh PID space starting from PID 1. Since Linux 5.3, pidfd_open(2) provides a file descriptor-based handle to a process, facilitating namespace-aware management, such as joining via setns(2) or signaling with pidfd_send_signal(2), without relying on /proc paths. 
The /proc/sys/kernel/ns_last_pid file tracks the last allocated PID in the current namespace and can be adjusted with appropriate capabilities (CAP_SYS_ADMIN or CAP_CHECKPOINT_RESTORE since Linux 5.9).[15][16] A representative example is in containerization, where the container's init process runs as PID 1 within its PID namespace, unaware of host processes, allowing the container to manage its own process lifecycle independently while the host views the container processes under higher PIDs.[15] Limitations include the inability to namespace the host's global init process (PID 1 in the root namespace), which remains visible and cannot be isolated, potentially requiring careful signal handling to avoid unintended propagation. Furthermore, joining a PID namespace via setns(2) affects only future children of the calling process, not the caller itself, necessitating forking for full immersion. PID namespaces are often combined with control groups (cgroups) to enforce resource limits on isolated process sets.[15][13]Network (net) Namespace
The network namespace in Linux provides isolation for networking resources, allowing processes within a namespace to have their own set of network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules (such as those managed by iptables), and other related configurations.[17] This separation ensures that network operations in one namespace do not interfere with those in another, enabling independent network environments on the same host.[17] Key isolated views include the /proc/net and /sys/class/net directories, which reflect only the resources visible to the namespace, as well as port numbers (to prevent conflicts) and UNIX domain sockets bound to namespace-local paths.[17] Physical network devices are bound to a single namespace at a time; when a device is moved to a new namespace, it becomes invisible in the original one, and freed devices revert to the initial (root) namespace.[17] Virtual devices, particularly virtual Ethernet (veth) pairs, facilitate communication between namespaces by acting as tunnels: one end resides in one namespace and the other in another, with packets transmitted on one immediately received by its peer.[18] These veth pairs support integration with bridges and VLANs, configurable via tools like ip(8) and brctl(8), allowing complex topologies such as bridging a veth endpoint to a physical interface for external connectivity.[17] Firewall rules, including iptables chains, are also namespace-specific, ensuring that filtering and NAT policies apply only within the isolated stack.[17] A new network namespace is created using the clone(2) system call with the CLONE_NEWNET flag, which requires CAP_SYS_ADMIN privilege and results in the child process inheriting a private network stack.[19] Alternatively, the unshare(2) system call with the same flag can detach the calling process into a new namespace.[12] For user-space management, the ip netns tool from the iproute2 package allows creation (e.g.,ip netns add mynet), 
listing (ip netns list), execution of commands within a namespace (e.g., ip netns exec mynet ip link set lo up), and deletion (ip netns del mynet).[20] veth pairs are created with commands like ip link add veth0 type veth peer name veth1, followed by moving endpoints to namespaces using ip link set veth1 netns mynet.[18]
In a practical example, a container runtime might create a network namespace for a container process, assigning it a private IP address (e.g., 192.168.1.2) via a veth pair connected to a host bridge, ensuring the container has no direct access to the host's network interfaces or routing tables while allowing outbound traffic through the bridge.[17] This setup is commonly used in container networking to provide isolated, virtualized network environments.[17]
Limitations include the fact that physical devices cannot be shared across namespaces simultaneously, and veth pairs are destroyed when their owning namespace is freed.[17] Certain kernel-wide network parameters, such as those in /proc/sys/net (e.g., IPv4/IPv6 configuration inheritance), may propagate from the initial namespace to new ones unless explicitly configured otherwise via sysctls like devconf_inherit_init_net, potentially leading to unintended shared behaviors.[21] Management typically requires the ip netns tool or equivalent, as direct namespace handling is privileged.[20]
Inter-process Communication (ipc) Namespace
The Inter-process Communication (IPC) namespace in Linux isolates System V IPC resources, including shared memory segments (shm), semaphores (sem), and message queues (msg), as well as POSIX message queues, ensuring that these objects are confined to processes within the same namespace.[22] This isolation makes keys and identifiers unique per namespace, preventing processes in different IPC namespaces from accessing or sharing the same IPC objects, even if they use identical keys generated by functions like ftok().[2] For instance, ftok() paths are effectively isolated because the resulting IPC objects remain namespace-bound, avoiding unintended cross-namespace collisions.[22] Additionally, /dev/shm mounts, which often back shared memory, can be namespaced to align with this isolation, providing a separate tmpfs instance per IPC namespace since Linux 2.6.19.[2] Creation of an IPC namespace occurs through system calls such as clone(2) or unshare(2) with the CLONE_NEWIPC flag, which establishes a new namespace for the calling process or its child, requiring the CONFIG_IPC_NS kernel configuration option.[19] This flag affects only System V and POSIX IPC primitives; other communication mechanisms like pipes (isolated via PID namespaces) and Unix domain sockets (handled by network or mount namespaces) remain unaffected.[22] Processes can join an existing IPC namespace using setns(2). 
When the last process in an IPC namespace exits, all associated IPC objects are automatically destroyed, cleaning up resources without leakage to other namespaces.[22] In practice, IPC namespaces enable containers to utilize shared memory and other IPC mechanisms without contaminating the host system or adjacent containers; for example, a containerized application can create a shared memory segment for internal process coordination that remains invisible and inaccessible to the host kernel or other isolated environments.[2] This complements PID namespaces by providing finer-grained isolation for legacy IPC primitives beyond simple process tree separation.[1] However, limitations arise with older applications designed around global IPC assumptions, where processes expect to communicate across what would now be namespace boundaries, potentially requiring configuration adjustments like sharing the host's IPC namespace (e.g., via container runtime flags) to maintain functionality.[23] In some cases, migrating such applications to fully namespaced environments may necessitate recompilation or modifications to avoid reliance on cross-namespace IPC, particularly if they use hardcoded global identifiers.[2] POSIX message queue support, added in Linux 2.6.30, further mitigates issues for modern applications but highlights the evolutionary constraints on legacy code.[22]
UTS Namespace
The UTS namespace provides isolation for system identifiers associated with the Unix Time-sharing System (UTS), specifically the nodename (hostname) and domainname fields returned by the uname(2) system call.[24] This isolation ensures that processes in different UTS namespaces perceive distinct values for these identifiers, enabling independent system identities without global impact.[24] The namespace copies the parent's hostname and NIS (Network Information Service, also known as Yellow Pages or YP) domain name upon creation, allowing subsequent modifications to remain local to the namespace.[24]
Key features include the ability to modify the hostname using sethostname(2) and the domain name using setdomainname(2), with changes visible only to processes sharing the same UTS namespace.[24] This locality is particularly useful for establishing per-process or per-group identities in virtualized environments, such as assigning unique hostnames to isolated workloads.[25] A new UTS namespace is created via the clone(2) or unshare(2) system calls with the CLONE_NEWUTS flag.[24] Inspection occurs through system calls like gethostname(2), getdomainname(2), or uname(2) executed within the namespace, or by examining the namespace handle in /proc/[pid]/ns/uts, which displays the namespace type and inode number (e.g., uts:[4026531838]).[1]
For example, in container orchestration, each container can operate with its own hostname—such as "app-server1"—facilitating service discovery, logging, and application configuration while the host retains its original identity like "prod-host".[25] This per-container hostname isolation simplifies management in multi-tenant setups without requiring full system reconfiguration.[2]
The UTS namespace affects only UTS-specific identifiers, providing isolation for the NIS/YP domain name but not for DNS resolution, which depends on configurations like /etc/resolv.conf managed through mount or network namespaces.[24] Thus, while hostname-based lookups may appear isolated, broader name resolution remains subject to other namespace interactions.[1]
User (uid) Namespace
The user namespace in Linux isolates the user and group ID number spaces, allowing processes within the namespace to perceive a remapped set of user IDs (UIDs) and group IDs (GIDs) distinct from those on the host system. This isolation enables a process to operate as the superuser (UID 0) inside the namespace while being mapped to an unprivileged UID on the host, thereby containing privilege escalations and enhancing security for containerized or sandboxed environments.[14] The primary purpose is to support unprivileged container execution by allocating subsets of the host's UID/GID ranges to the namespace, preventing processes from accessing or modifying resources outside their mapped range.[26] Key features include the configuration of UID and GID mappings through the /proc/[pid]/uid_map and /proc/[pid]/gid_map files, which define how IDs in the child namespace correspond to IDs in the parent namespace; for example, a mapping might specify that the range 0-65535 in the child maps to 100000-165535 on the host.[14] These mappings support sub-UID and sub-GID allocation, often managed via tools like newuidmap and newgidmap, allowing non-root users to delegate portions of their UID range for nested namespaces.[14] User namespaces have supported unprivileged creation since Linux kernel 3.8, provided the kernel is configured with CONFIG_USER_NS=y and the creating process has appropriate sub-UID/GID ranges allocated in /etc/subuid and /etc/subgid.[2]
User namespaces are created using the clone(2) or unshare(2) system calls with the CLONE_NEWUSER flag; when combined with other namespace-creation flags, the user namespace is created first. Unlike the other namespace types, creating a user namespace does not require CAP_SYS_ADMIN; unprivileged creation instead relies on the aforementioned kernel configuration and ID mappings.[14] For instance, a container process with root privileges inside the namespace (UID 0) can be mapped to a host UID such as 1000, ensuring that any attempts to access host resources are restricted to that non-privileged identity and mitigating potential privilege escalations.
Despite these benefits, user namespaces have limitations, including early security vulnerabilities such as symlink attacks exploitable before kernel 4.2 due to incomplete ID mapping enforcement in filesystem operations (e.g., CVE-2013-1858).[27] Additionally, not all system calls and kernel interfaces fully respect UID mappings; for example, early implementations had issues with setuid binaries that were resolved in subsequent kernels through improved capability checks and VFS adjustments.[27] As of 2025, enhancements in Linux kernel 6.x series, including refined idmapped mount support and stricter capability bounding, have bolstered container security by better integrating user namespace mappings with filesystem permissions.[28] User namespaces complement mount namespaces in handling setuid binaries by applying ID remapping to filesystem views.[28]
Control Groups (cgroup) Namespace
The control groups (cgroup) namespace in Linux provides isolation of the view of the cgroup hierarchy for processes within the namespace, ensuring that they perceive only their own subtree as the root of the hierarchy rather than the full host system structure.[29] This virtualization hides the host's cgroup organization from containerized or isolated processes, preventing information leakage about the broader system and enhancing abstraction in environments like containers.[29] By remapping cgroup paths to be relative to the namespace's root, it allows processes to operate as if their local cgroup is the global root, which is particularly useful for security and migration scenarios.[30] Introduced in Linux kernel version 4.6 in 2016, the cgroup namespace supports both cgroup v1 and v2 hierarchies, with the latter having been unified starting in kernel 4.5.[31] Key features include the modification of views in /proc/<pid>/cgroup, which displays cgroup paths relative to the namespace root, and adjustments to /proc/<pid>/mountinfo to reflect only the visible cgroup mountpoints.[29] For instance, a process outside the namespace might see a full path like 0::/user.slice/user-1000.slice/session-1.scope, while inside, it appears as 0::/.[29] This isolation applies to cgroup roots but does not alter the underlying resource controls themselves.[29]
Creation of a cgroup namespace occurs via the clone(2) or unshare(2) system calls using the CLONE_NEWCGROUP flag, where the calling process's current cgroup becomes the root for the new namespace.[29] Joining an existing namespace is possible with setns(2), provided the process has CAP_SYS_ADMIN capability in the target namespace.[29] Upon creation, mountpoints such as /sys/fs/cgroup are affected, showing only the namespace-local view, which may require remounting specific cgroup filesystems (e.g., mount -t cgroup -o freezer none /sys/fs/cgroup/freezer) for full visibility within the isolated context.[29]
In practice, this namespace enables containers to remain unaware of the host's resource controllers; for example, a container process might see /docker/<container_id> as its cgroup root, abstracting away host-level hierarchies like systemd slices and improving portability without exposing sensitive system details.[29] This complements the user namespace by providing a fuller isolation layer for resource-related views, though it requires a kernel configured with CONFIG_CGROUPS.[29]
Limitations include the fact that the namespace does not isolate or modify actual resource limits or accounting, which are handled by the cgroups mechanism itself rather than the namespace virtualization.[29] Additionally, while it supports cgroup v2's unified hierarchy from kernel 4.5 onward, older v1 setups may exhibit inconsistencies in mount visibility without explicit remounting.[32] Processes cannot migrate outside their namespace root, enforcing the isolation but potentially complicating certain administrative tasks.[30]
Time Namespace
The time namespace in Linux isolates specific time-related counters, virtualizing the CLOCK_MONOTONIC (including its COARSE and RAW variants) and CLOCK_BOOTTIME (including ALARM) clocks to provide per-namespace offsets, while the wall-clock time (CLOCK_REALTIME) remains shared globally.[33] This isolation ensures that processes in different namespaces perceive distinct views of monotonic time progression and elapsed boot time, which is particularly valuable for maintaining time consistency during container migration, checkpoint/restore operations, and process freezing without affecting the host system's real-time clock.[33] The feature was merged into the Linux kernel in version 5.6, released in March 2020, and requires the kernel to be configured with the CONFIG_TIME_NS option.[34] Key system calls affected by time namespaces include clock_gettime(2), clock_nanosleep(2), nanosleep(2), timer_settime(2), and timerfd_settime(2), all of which return or use the offset-adjusted time values specific to the calling process's namespace; similarly, the /proc/uptime file reflects namespace-specific uptime.[33] Offsets for these clocks are managed via the /proc/[pid]/timens_offsets file, where each line specifies a clock ID followed by seconds and nanoseconds to add (e.g., "1 172800 0" to offset CLOCK_MONOTONIC by two days), inheriting from the parent namespace upon creation and remaining fixed once the first process enters the namespace.[33] Setting offsets requires the CAP_SYS_TIME capability in the user namespace, with limits ensuring offsets do not result in negative times or exceed approximately 146 years to prevent overflow.[33]
Time namespaces are created by invoking unshare(2) with the CLONE_NEWTIME flag, which places subsequently created child processes into the new namespace while leaving the calling process in its original one; the namespace can also be referenced and joined via the /proc/[pid]/ns/time symlink.[33] For example, in container testing or scheduling scenarios, a namespace might apply a zero offset to CLOCK_MONOTONIC to effectively pause time perception for paused processes, or add a custom offset to simulate accelerated testing environments without altering the host's global time.[33] In checkpoint/restore tools like CRIU, time namespaces enable restoring containers with adjusted boottime and monotonic offsets to match the checkpointed state, supporting seamless migration.[35]
Limitations include the inability to virtualize CLOCK_REALTIME or gettimeofday(2), meaning wall time and certain legacy interfaces remain unisolated, and offsets cannot be modified after namespace population to avoid inconsistencies for existing processes.[33] This design prioritizes safety in multi-process environments but restricts full time virtualization, making it unsuitable for scenarios requiring isolated real-time clocks, such as full virtual machine emulation.[33]
Implementation Details
System Calls and Commands
Linux namespaces are managed primarily through kernel system calls that allow processes to create, join, or unshare namespaces. The clone(2) system call is used to create a new process while specifying one or more new namespaces for the child process via the CLONE_NEW* flags in its flags argument.[1] These flags include CLONE_NEWNS for mount namespaces, CLONE_NEWPID for process ID namespaces, CLONE_NEWNET for network namespaces, CLONE_NEWUTS for UTS namespaces, CLONE_NEWIPC for IPC namespaces, CLONE_NEWUSER for user namespaces, CLONE_NEWCGROUP for cgroup namespaces, and CLONE_NEWTIME for time namespaces (CLONE_NEWTIME is accepted by unshare(2) and clone3(2), but not by clone(2)).[1] Multiple flags can be combined bitwise in a single clone(2) call to create several new namespaces simultaneously for the child process.[1] The unshare(2) system call enables a process to disassociate parts of its execution context from shared resources, effectively moving the calling process into new namespaces without forking a child.[12] It accepts the same CLONE_NEW* flags as clone(2) to specify which namespaces to unshare and enter.[12] For example, unshare(2) with CLONE_NEWNET would create and join a new network namespace for the caller.[36] To enter an existing namespace, the setns(2) system call is employed, which joins the calling process to a namespace specified by a file descriptor obtained from the /proc filesystem.[1] The nstype argument in setns(2) indicates the type of namespace to join, using the same CLONE_NEW* constants for verification.[1] This allows processes to migrate into namespaces created by other processes. Namespaces are exposed in user space through the /proc/[pid]/ns/ directory, where each namespace type appears as a symbolic link or bind-mountable file descriptor representing the namespace's inode.[1] These entries, such as /proc/[pid]/ns/net for the network namespace, can be opened to obtain file descriptors for use with setns(2).[1] Processes sharing the same namespace have identical inode numbers for these entries.[1]
User-space tools facilitate namespace management without direct system call invocation. The unshare command from the util-linux package invokes the unshare(2) system call based on command-line options corresponding to namespace types, then executes a specified program in the new namespaces.[37] For instance, unshare --net --fork /bin/bash creates a new network namespace and forks a shell into it.[37] The nsenter command, also from util-linux, uses setns(2) to enter specified namespaces of a target process or PID and run a command therein.[38] It supports options like -t to target a process and -n for network namespaces.[38]
For network namespaces specifically, the ip netns subcommand from the iproute2 package provides utilities to add, delete, list, and execute commands within named network namespaces. Commands such as ip netns add <name> create a new network namespace, while ip netns exec <name> <cmd> runs the given command inside the named namespace.
Namespace Hierarchy and Joining
Linux namespaces are organized in a hierarchical tree structure for each namespace type, where child namespaces are nested within parent namespaces. This hierarchy ensures that namespaces form a forest across the system, with the global (root) namespace serving as the top-level parent for all types. For PID and user namespaces specifically, the structure is explicitly hierarchical, allowing a namespace to persist as long as it has active child namespaces or, in the case of user namespaces, owns subordinate non-user namespaces.[1] When a new process is created via fork(2), it inherits all of its parent's namespaces by default, maintaining continuity in the hierarchy unless explicitly overridden during creation.[1]
Processes can join existing namespaces to alter their view of system resources, enabling peer relationships outside the default parent-child inheritance. The setns(2) system call facilitates this by allowing a process to reassociate itself with a target namespace, specified by a file descriptor obtained from /proc/[pid]/ns/[type] entries. This fd-based approach enhances safety by avoiding direct path manipulations and permitting atomic joins for multiple namespace types when using a PID file descriptor (available since Linux 5.8). For instance, a process might join a different network namespace while remaining in its original PID namespace, demonstrating how processes can belong to distinct namespaces across types simultaneously—such as operating in the host's PID space but an isolated network environment. However, joining imposes restrictions: for PID namespaces, the target must be a descendant or the same as the caller's; user namespace joins require appropriate capabilities like CAP_SYS_ADMIN in the target.[13][1]
This multi-namespace capability allows fine-grained isolation, where a single process views resources through a combination of inherited and joined namespaces, without requiring a full hierarchical shift. Forked children thus start in the same set as their parent, but subsequent setns(2) calls or unshare(2) operations can create or enter peers, forming branches in the per-type tree.[1][13]
To inspect namespace hierarchies and memberships, tools and interfaces provide visibility into the tree structure and per-process affiliations. The lsns(1) command from util-linux lists all accessible namespaces system-wide, displaying details like namespace ID (inode number), type, number of processes, owner, and command, which helps trace hierarchical relationships by inode comparisons. For per-process inspection, the /proc/[pid]/ns/ directory contains symbolic links for each namespace type (e.g., pid, net), where matching device IDs and inodes indicate shared membership; the /proc/[pid]/status file further reports the PID namespace via the NSpid field. These mechanisms allow administrators to map the overall tree and verify joins without kernel modifications.[39][40]
Creation, Destruction, and Lifecycle
Linux namespaces are primarily created through two mechanisms: implicitly, when a new process is forked using the clone system call with CLONE_NEW* flags specifying the desired namespace types, which allocates fresh namespace instances for the child process; or explicitly, when an existing process calls unshare with the same flags to detach itself from its current namespaces and enter new ones. These operations are handled by the kernel's namespace subsystem, which initializes the appropriate structures based on the flags provided. While most namespace types require the CAP_SYS_ADMIN capability within the creating process's user namespace to prevent unauthorized isolation, user namespaces can be created by unprivileged users since Linux kernel version 3.8, enabling safer experimentation with container-like isolation.[1]
Destruction of namespaces is handled automatically by the kernel through a reference-counting mechanism, ensuring resources are reclaimed only when no entities depend on the namespace. A namespace remains alive as long as at least one process is bound to it, or while it is pinned by open file descriptors—typically obtained from entries in /proc/[pid]/ns/—or by bind mounts of those descriptors. When the final reference is released, such as upon the last process exiting or closing the pinning file descriptor, the kernel decrements the count to zero and frees the namespace's associated data structures. In cases of namespace hierarchies, like those formed by PID or user namespaces, a parent namespace persists until all descendant namespaces and their processes are gone, preventing premature cleanup.[1]
The lifecycle of a namespace is tightly coupled to process management and reference tracking within the kernel. Key events include process creation and termination, which can alter reference counts, and propagation rules that dictate how changes in one namespace (such as mount operations) may affect related namespaces under specific sharing configurations. Zombie or orphaned namespaces—those with no active processes but lingering references—are automatically cleaned up by the kernel upon reference release, avoiding indefinite resource retention. This design ensures efficient memory and kernel object reuse, with the nsfs pseudo-filesystem facilitating visibility into namespace states via /proc.[1]
Management of namespaces during their lifecycle often involves obtaining namespace file descriptors (nsfds) from /proc/[pid]/ns/ entries, which enable operations like joining namespaces or monitoring their status without direct process attachment. Process file descriptors (pidfds) can complement this by allowing signaling of processes within specific namespaces. However, capabilities such as CAP_SYS_ADMIN are typically required for creation, joining, or modification to enforce security boundaries. Unprivileged users face additional constraints: since Linux 4.9, per-user limits on namespace creation (e.g., maximum number of each type) are enforced via tunable files in /proc/sys/user/, charged recursively across nested user namespaces to curb potential denial-of-service from excessive allocations.[1][41]
Adoption and Applications
Container Technologies
Linux namespaces form the foundational isolation mechanism in modern container technologies, enabling lightweight virtualization by segregating processes into distinct views of system resources. Their adoption began with Linux Containers (LXC) in 2008, which first combined namespaces with control groups to create user-space containers that mimic full operating system environments without requiring a separate kernel.[42] Docker significantly popularized this approach in 2013 by introducing an accessible tooling layer atop LXC's primitives, shifting containerization from niche server management to widespread application deployment.[43] By 2025, namespaces underpin over 90% of container-based deployments, reflecting the surge in cloud-native architectures where 89% of organizations report substantial use of such techniques.[44] In Docker, namespaces are integrated through the libcontainer library, which has evolved into the runc runtime under the Open Container Initiative (OCI). Runc employs system calls like clone() (with flags such as CLONE_NEWPID and CLONE_NEWNET) and unshare() to instantiate and manage namespaces, creating isolated scopes for container processes. By default, Docker activates the core namespaces of PID for process IDs, network for interfaces and routing tables, mount for filesystem hierarchies, UTS for hostname and domain details, and IPC for inter-process communication (user namespaces require daemon-level configuration for UID/GID mappings).[45] This setup ensures containers operate in a self-contained environment, with root privileges remapped to non-privileged users on the host for enhanced security when user namespaces are enabled. Configuration flexibility includes flags like --network=host, which bypasses the network namespace entirely, allowing the container to utilize the host's networking stack directly for scenarios requiring low-latency access to host ports. The time namespace is not used by default.
Kubernetes builds on these primitives by incorporating namespaces into pod sandboxes, which provide a secure boundary for co-located containers sharing resources like volumes and networks while isolating them from other pods and the host. Pod sandboxes leverage namespaces for PID, network, IPC, and user isolation—as of Kubernetes v1.33 (April 2025), user namespaces are enabled by default when stack requirements are met—managed through compliant runtimes such as CRI-O—which focuses on OCI standards and uses runc for execution—and containerd, a high-level runtime that handles container lifecycle operations including namespace setup.[46] Kubernetes NetworkPolicies further utilize network namespaces to define fine-grained traffic controls, selecting pods or entire namespaces via labels to permit or deny ingress/egress flows, thereby enforcing isolation between workloads in multi-tenant clusters.[47]
The performance impact of namespaces in container technologies remains negligible, with benchmarks showing less than 1% overhead in CPU and I/O operations compared to bare-metal execution, primarily due to the kernel-level efficiency of namespace switching. Namespaces are typically complemented by control groups for resource limiting, ensuring balanced scalability in production environments.[48]
