Linux 4.14 Released
14 Nov 2017 tags: audit selinux
Linux v4.14 was released on Sunday, November 12th; this is a quick summary of the SELinux and audit changes.
SELinux
- Driven by the increased use of the No New Privileges (NNP) functionality, a new mechanism was introduced which allows domain transitions when NNP is enabled, or when executing applications on a “nosuid” mounted filesystem. This new mechanism supplements the “process” object class with a new “process2” class, which carries two new permissions: “nnp_transition” and “nosuid_transition”. These new permissions allow the policy developer to specify when a domain transition is allowed under NNP or on a nosuid mount, bypassing the bounded relationship requirement. Example SELinux policy is shown below:
allow <old_domain> <new_domain>:process2 { nnp_transition };
allow <old_domain> <new_domain>:process2 { nosuid_transition };
This new functionality is gated by the “nnp_nosuid_transition” policy capability; if the policy capability is disabled, the existing behavior is preserved. You can check the status of this capability in the currently loaded SELinux policy with the following commands:
# cd /sys/fs/selinux/policy_capabilities
# ls
always_check_network  extended_socket_class  nnp_nosuid_transition
cgroup_seclabel       network_peer_controls  open_perms
# cat nnp_nosuid_transition
1
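The same check can be scripted. The sketch below uses a hypothetical helper named policycap_status that reads one file from the policy_capabilities directory and translates its 1/0 value into enabled/disabled. The directory is a parameter only so the function can be exercised against a scratch directory; on a real system the default path is the one used in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether an SELinux policy capability is enabled
# in the loaded policy. Each file under policy_capabilities holds 1 or 0.
# The optional second argument overrides the directory (useful for testing).
policycap_status() {
  local cap=$1
  local dir=${2:-/sys/fs/selinux/policy_capabilities}
  local file="$dir/$cap"

  if [[ ! -r $file ]]; then
    # Kernels that predate the capability, or systems without SELinux
    # mounted at /sys/fs/selinux, simply have no such file.
    echo "unknown"
    return 2
  fi

  # $(<file) reads the file and strips the trailing newline.
  case "$(<"$file")" in
    1) echo "enabled" ;;
    0) echo "disabled" ;;
    *) echo "unknown"; return 2 ;;
  esac
}
```

On the system shown above, `policycap_status nnp_nosuid_transition` would print `enabled`, matching the `1` returned by `cat`.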
For more information, you can read the patch’s description:
From: Stephen Smalley
selinux: Generalize support for NNP/nosuid SELinux domain transitions
As systemd ramps up enabling NNP (NoNewPrivileges) for system services, it is increasingly breaking SELinux domain transitions for those services and their descendants. systemd enables NNP not only for services whose unit files explicitly specify NoNewPrivileges=yes but also for services whose unit files specify any of the following options in combination with running without CAP_SYS_ADMIN (e.g. specifying User= or a CapabilityBoundingSet= without CAP_SYS_ADMIN): SystemCallFilter=, SystemCallArchitectures=, RestrictAddressFamilies=, RestrictNamespaces=, PrivateDevices=, ProtectKernelTunables=, ProtectKernelModules=, MemoryDenyWriteExecute=, or RestrictRealtime= as per the systemd.exec(5) man page.
The end result is bad for the security of both SELinux-disabled and SELinux-enabled systems. Packagers have to turn off these options in the unit files to preserve SELinux domain transitions. For users who choose to disable SELinux, this means that they miss out on at least having the systemd-supported protections. For users who keep SELinux enabled, they may still be missing out on some protections because it isn’t necessarily guaranteed that the SELinux policy for that service provides the same protections in all cases.
commit 7b0d0b40cd78 (“selinux: Permit bounded transitions under NO_NEW_PRIVS or NOSUID.”) allowed bounded transitions under NNP in order to support limited usage for sandboxing programs. However, defining typebounds for all of the affected service domains is impractical to implement in policy, since typebounds requires us to ensure that each domain is allowed everything all of its descendant domains are allowed, and this has to be repeated for the entire chain of domain transitions. There is no way to clone all allow rules from descendants to their ancestors in policy currently, and doing so would be undesirable even if it were practical, as it requires leaking permissions to objects and operations into ancestor domains that could weaken their own security in order to allow them to the descendants (e.g. if a descendant requires execmem permission, then so do all of its ancestors; if a descendant requires execute permission to a file, then so do all of its ancestors; if a descendant requires read to a symbolic link or temporary file, then so do all of its ancestors…). SELinux domains are intentionally not hierarchical / bounded in this manner normally, and making them so would undermine their protections and least privilege.
We have long had a similar tension with SELinux transitions and nosuid mounts, albeit not as severe. Users often have had to choose between retaining nosuid on a mount and allowing SELinux domain transitions on files within those mounts. This likewise leads to unfortunate tradeoffs in security.
Decouple NNP/nosuid from SELinux transitions, so that we don’t have to make a choice between them. Introduce a nnp_nosuid_transition policy capability that enables transitions under NNP/nosuid to be based on a permission (nnp_transition for NNP; nosuid_transition for nosuid) between the old and new contexts in addition to the current support for bounded transitions. Domain transitions can then be allowed in policy without requiring the parent to be a strict superset of all of its children.
With this change, systemd unit files can be left unmodified from upstream. SELinux-disabled and SELinux-enabled users will benefit from retaining any of the systemd-provided protections. SELinux policy will only need to be adapted to enable the new policy capability and to allow the new permissions between domain pairs as appropriate.
NB: Allowing nnp_transition between two contexts opens up the potential for the old context to subvert the new context by installing seccomp filters before the execve. Allowing nosuid_transition between two contexts opens up the potential for a context transition to occur on a file from an untrusted filesystem (e.g. removable media or remote filesystem). Use with care.
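Because systemd can turn on NNP through the options listed in the patch description, without NoNewPrivileges=yes appearing in the unit file, it can help to check a running process directly. Since Linux 4.10 the kernel reports the flag as the NoNewPrivs field of /proc/<pid>/status (see proc(5)). The sketch below uses a hypothetical helper named nnp_flag that prints that field from a status file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the NoNewPrivs flag (0 or 1) from a
# /proc/<pid>/status-format file. Defaults to the calling process.
# Exits non-zero if the field is missing (e.g. kernels older than 4.10).
nnp_flag() {
  local status=${1:-/proc/self/status}
  # The line looks like "NoNewPrivs:<TAB>1"; awk splits on whitespace.
  awk '$1 == "NoNewPrivs:" { print $2; found = 1 } END { exit !found }' "$status"
}
```

For a service, something like `nnp_flag "/proc/$(systemctl show -p MainPID --value example.service)/status"` would report the flag of its main process, where example.service is a placeholder unit name.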
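To make the decision described in the patch easier to follow, here is a small userspace model of it, written as a hypothetical bash function named nnp_nosuid_check. It is not kernel code: it takes 0/1 flags for each condition and prints permit or deny. It assumes that when NNP and nosuid both apply, both permissions must be granted, and it does not model what the kernel actually does on a denial.

```bash
#!/usr/bin/env bash
# Hypothetical model of the NNP/nosuid transition check; not kernel code.
# All arguments are 0/1 flags:
#   $1 nnp          exec happens with no_new_privs set
#   $2 nosuid       the executable is on a nosuid mount
#   $3 same_ctx     the new context equals the old one (no transition)
#   $4 policycap    nnp_nosuid_transition policy capability is enabled
#   $5 nnp_perm     policy allows process2 { nnp_transition } old -> new
#   $6 nosuid_perm  policy allows process2 { nosuid_transition } old -> new
#   $7 bounded      the new domain is typebounded by the old domain
nnp_nosuid_check() {
  local nnp=$1 nosuid=$2 same_ctx=$3 policycap=$4
  local nnp_perm=$5 nosuid_perm=$6 bounded=$7

  # Neither restriction applies, so there is nothing to check.
  if (( !nnp && !nosuid )); then echo permit; return 0; fi

  # No change of context means no domain transition to restrict.
  if (( same_ctx )); then echo permit; return 0; fi

  # New behavior, only with the policy capability enabled: permit when the
  # policy grants the permission for every restriction that applies.
  # (Assumption: if both apply, both permissions are required.)
  if (( policycap )); then
    if (( (!nnp || nnp_perm) && (!nosuid || nosuid_perm) )); then
      echo permit; return 0
    fi
  fi

  # Existing behavior, kept in all cases: bounded transitions are permitted.
  if (( bounded )); then echo permit; return 0; fi

  echo deny
  return 1
}
```

For example, `nnp_nosuid_check 1 0 0 0 1 0 0` prints `deny`: the policy grants nnp_transition, but with the capability disabled only a bounded transition is accepted, which is the "existing behavior is preserved" case.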
- Support for labeling of individual cgroup and cgroup2 files was added using the SELinux genfscon mechanism. In order to use this new functionality, the cgroup or cgroup2 filesystem must be mounted with the “xattr” mount option.
- Fix a bug where AF_UNIX/SOCK_RAW sockets were not properly assigned the “unix_dgram_socket” object class. This should be noticeable to users of libpcap.
- Minor changes to how the kernel allocates SELinux internal memory and how it protects a few internal data structures.
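Since the new per-file cgroup labeling only takes effect when the cgroup or cgroup2 filesystem is mounted with “xattr”, a quick way to spot mounts that would miss out is to scan /proc/mounts. The sketch below uses a hypothetical helper named cgroup_mounts_without_xattr that prints the mount point of every cgroup or cgroup2 mount whose option list lacks xattr.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list cgroup/cgroup2 mount points missing "xattr".
# Input uses the /proc/mounts format: device mountpoint fstype options dump pass.
cgroup_mounts_without_xattr() {
  local mounts=${1:-/proc/mounts}
  awk '
    $3 == "cgroup" || $3 == "cgroup2" {
      # Split the comma-separated option list and look for an exact "xattr".
      n = split($4, opts, ",")
      has_xattr = 0
      for (i = 1; i <= n; i++)
        if (opts[i] == "xattr") has_xattr = 1
      if (!has_xattr) print $2
    }
  ' "$mounts"
}
```

Run with no argument, it checks the live /proc/mounts; an empty result means every cgroup mount already carries the option.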
Audit
- Previous work to make the audit subsystem year 2038 safe in Linux v4.12 resulted in the audit subsystem calling a rather heavyweight clock API in the kernel to generate the audit event timestamp. In this kernel release we return to using a more lightweight clock API, while still ensuring the code remains year 2038 safe.
- The “AVC INITIALIZED” audit KERNEL record seen at boot on SELinux systems was removed. It did not provide any useful information that couldn’t be found in other audit records emitted at boot.
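For context on the year 2038 concern: audit records carry their timestamp as seconds since the Unix epoch in a header of the form msg=audit(SECONDS.MILLISECONDS:SERIAL), and a signed 32-bit seconds counter runs out at 2147483647, which is 2038-01-19 03:14:07 UTC. The hypothetical helper below, audit_record_time, extracts the seconds from such a header and formats them with GNU date. It only illustrates the timestamp format and says nothing about which clock API the kernel uses internally.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert the "audit(SECONDS.MILLIS:SERIAL)" timestamp of
# an audit record to a UTC date string. Requires GNU date (-d @SECONDS).
audit_record_time() {
  local record=$1
  # \( and \) match literal parentheses in the bash (ERE) regex.
  local re='audit\(([0-9]+)\.([0-9]+):([0-9]+)\)'
  if [[ $record =~ $re ]]; then
    local secs=${BASH_REMATCH[1]}
    date -u -d "@$secs" '+%Y-%m-%dT%H:%M:%SZ'
  else
    return 1
  fi
}
```

For example, `audit_record_time 'type=KERNEL msg=audit(2147483648.000:2): ...'` prints `2038-01-19T03:14:08Z` on a system whose date utility uses a 64-bit time_t, one second past the 32-bit limit.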