2017 Year In Review

With 2017 coming to an end in a little over a week, it’s a good time to look back on what the SELinux, audit, and libseccomp projects have accomplished this year, and recognize the contributors that made it all possible.

In 2017 we had five Linux Kernel releases, one SELinux userspace release, ten audit userspace releases, and two libseccomp releases.

An Open Source project is only as good as its contributors, so I want to thank everyone who contributed code in 2017, as well as those who contributed code that hasn’t yet made it into the main repositories (unfortunately not represented in the lists below).

Contributors to the SELinux kernel and userspace code bases (sorted by number of commits).

Nicolas Iooss
Stephen Smalley
James Carter
Petr Lautrbach
Markus Elfring
Alan Jenkins
Daniel Jurgens
Vit Mojzis
Jason Zaman
Richard Haines
Jan Zarsky
Dan Walsh
Steve Lawrence
James Morris
Colin Ian King
Florian Westphal
Patrick Steinhardt
Chenbo Feng
Corentin LABBE
Eric W. Biederman
Antonio Murdaca
Kees Cook
Luis Ressel
Jeff Vander Stoep
Matthias Kaehlcke
Ingo Molnar
Gary Tierney
Dan Cashman
Tom Cherry
Christian Göttsche
Miroslav Grepl
Nick Kralevich
Guido Trentalancia
Greg Kroah-Hartman
Paul Moore
Kyeongdon Kim
Richard Guy Briggs
Arvind Yadav
Michal Hocko
Julien Gomes
Scott Mayhew
Junil Lee
Al Viro
Tetsuo Handa
Eric Biggers
Dan Carpenter
David Ahern
Alexander Potapenko
Alexey Dobriyan
Dave Jiang
Krister Johansen
Casey Schaufler
Yongqin Liu
Lukas Vrabec
Grégoire Colbert
Laurent Bigonville
Bernhard M. Wiedemann
Colin Walters
Nikola Forró
Ville Skyttä
Lokesh Mandvekar
Thomas Petazzoni
Karl MacMillan
Sandeep Patil


Contributors to the audit kernel code base (sorted by number of commits). Unfortunately I’m unable to include the audit userspace contributors as the audit userspace git log is not a reliable source of contributor information for 2017.

Paul Moore
Jan Kara
Richard Guy Briggs
Elena Reshetova
Nicholas Mc Guire
Steve Grubb
Greg Kroah-Hartman
Deepa Dinamani
Casey Schaufler
Geliang Tang
Mel Gorman
Tyler Hicks
Shu Wang
Derek Robson
Johannes Berg


Contributors to the main libseccomp code base as well as the Golang and artwork repositories (sorted by number of commits).

Paul Moore
Tyler Hicks
Matthew Heon
Jay Guo
Tobias Klauser
Luca Bruno
valoq
Vladimir Rutsky
Justin Cormack
NODA, Kai
K.C. Wong
Kyle R. Conway

A big thanks from me to all of you! I hope you have a safe, happy, and exciting 2018.

Linux 4.14 Released

Linux v4.14 was released on Sunday, November 12th; this is a quick summary of the SELinux and audit changes.

SELinux

  • Driven by the increased use of the No New Privileges (NNP) functionality, a new mechanism was introduced which allows domain transitions when NNP is enabled, or when executing applications on a “nosuid” mounted filesystem. This new mechanism extends the “process” object class to the “process2” class, adding two new permissions to “process2”: “nnp_transition” and “nosuid_transition”. These new permissions allow the policy developer to specify when a domain transition is allowed under NNP or on a nosuid mount, bypassing the bounded relationship requirement. Example SELinux policy is shown below:
    allow <old_domain> <new_domain>:process2 { nnp_transition };
    allow <old_domain> <new_domain>:process2 { nosuid_transition };
    

    This new functionality is gated by the “nnp_nosuid_transition” policy capability; if the policy capability is disabled, the existing behavior is preserved. You can check the status of the currently loaded SELinux policy with the following commands:

    # cd /sys/fs/selinux/policy_capabilities
    # ls
    always_check_network  extended_socket_class  nnp_nosuid_transition
    cgroup_seclabel       network_peer_controls  open_perms
    # cat nnp_nosuid_transition
    1
    

    For more information, you can read the patch’s description:

    From: Stephen Smalley

    selinux: Generalize support for NNP/nosuid SELinux domain transitions

    As systemd ramps up enabling NNP (NoNewPrivileges) for system services, it is increasingly breaking SELinux domain transitions for those services and their descendants. systemd enables NNP not only for services whose unit files explicitly specify NoNewPrivileges=yes but also for services whose unit files specify any of the following options in combination with running without CAP_SYS_ADMIN (e.g. specifying User= or a CapabilityBoundingSet= without CAP_SYS_ADMIN): SystemCallFilter=, SystemCallArchitectures=, RestrictAddressFamilies=, RestrictNamespaces=, PrivateDevices=, ProtectKernelTunables=, ProtectKernelModules=, MemoryDenyWriteExecute=, or RestrictRealtime= as per the systemd.exec(5) man page.

    The end result is bad for the security of both SELinux-disabled and SELinux-enabled systems. Packagers have to turn off these options in the unit files to preserve SELinux domain transitions. For users who choose to disable SELinux, this means that they miss out on at least having the systemd-supported protections. For users who keep SELinux enabled, they may still be missing out on some protections because it isn’t necessarily guaranteed that the SELinux policy for that service provides the same protections in all cases.

    commit 7b0d0b40cd78 (“selinux: Permit bounded transitions under NO_NEW_PRIVS or NOSUID.”) allowed bounded transitions under NNP in order to support limited usage for sandboxing programs. However, defining typebounds for all of the affected service domains is impractical to implement in policy, since typebounds requires us to ensure that each domain is allowed everything all of its descendant domains are allowed, and this has to be repeated for the entire chain of domain transitions. There is no way to clone all allow rules from descendants to their ancestors in policy currently, and doing so would be undesirable even if it were practical, as it requires leaking permissions to objects and operations into ancestor domains that could weaken their own security in order to allow them to the descendants (e.g. if a descendant requires execmem permission, then so do all of its ancestors; if a descendant requires execute permission to a file, then so do all of its ancestors; if a descendant requires read to a symbolic link or temporary file, then so do all of its ancestors…). SELinux domains are intentionally not hierarchical / bounded in this manner normally, and making them so would undermine their protections and least privilege.

    We have long had a similar tension with SELinux transitions and nosuid mounts, albeit not as severe. Users often have had to choose between retaining nosuid on a mount and allowing SELinux domain transitions on files within those mounts. This likewise leads to unfortunate tradeoffs in security.

    Decouple NNP/nosuid from SELinux transitions, so that we don’t have to make a choice between them. Introduce a nnp_nosuid_transition policy capability that enables transitions under NNP/nosuid to be based on a permission (nnp_transition for NNP; nosuid_transition for nosuid) between the old and new contexts in addition to the current support for bounded transitions. Domain transitions can then be allowed in policy without requiring the parent to be a strict superset of all of its children.

    With this change, systemd unit files can be left unmodified from upstream. SELinux-disabled and SELinux-enabled users will benefit from retaining any of the systemd-provided protections. SELinux policy will only need to be adapted to enable the new policy capability and to allow the new permissions between domain pairs as appropriate.

    NB: Allowing nnp_transition between two contexts opens up the potential for the old context to subvert the new context by installing seccomp filters before the execve. Allowing nosuid_transition between two contexts opens up the potential for a context transition to occur on a file from an untrusted filesystem (e.g. removable media or remote filesystem). Use with care.

  • Support for labeling of individual cgroup and cgroup2 files was added using the SELinux genfscon mechanism. In order to use this new functionality the cgroup, or cgroup2, filesystem must be mounted with the “xattr” mount option.

  • Fixed a bug where AF_UNIX/SOCK_RAW sockets were not properly assigned the “unix_dgram_socket” object class. This should be noticeable to users of libpcap.

  • Minor changes to how the kernel allocates SELinux internal memory and to how it protects a few internal data structures.
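The nnp_nosuid_transition check shown earlier can also be done programmatically. The Python sketch below assumes only the one-file-per-capability layout of /sys/fs/selinux/policy_capabilities shown in the shell session above, where each file holds "1" or "0". The helper names are hypothetical, and the directory is a parameter so the code can be pointed at a copy for testing.

```python
from pathlib import Path

# Location used in the shell example above; overridable for testing.
DEFAULT_CAP_DIR = Path("/sys/fs/selinux/policy_capabilities")


def policy_capabilities(cap_dir=DEFAULT_CAP_DIR) -> dict:
    """Map each policy capability name to True (1) or False (0)."""
    caps = {}
    for entry in sorted(Path(cap_dir).iterdir()):
        if entry.is_file():
            caps[entry.name] = entry.read_text().strip() == "1"
    return caps


def nnp_nosuid_transition_enabled(cap_dir=DEFAULT_CAP_DIR) -> bool:
    """True if the loaded policy enables the NNP/nosuid transition permissions."""
    # A capability file that is absent (e.g. an older kernel) counts as disabled.
    return policy_capabilities(cap_dir).get("nnp_nosuid_transition", False)
```

On an SELinux system, calling nnp_nosuid_transition_enabled() with no arguments reports the state of the loaded policy. On a system without SELinux, the directory does not exist and the call raises FileNotFoundError.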

Audit

  • Previous work to make the audit subsystem year 2038 safe in Linux v4.12 resulted in the audit subsystem calling a rather heavyweight clock API in the kernel to generate the audit event timestamp. In this kernel release we return to using a more lightweight clock API, while still ensuring the code remains year 2038 safe.

  • The “AVC INITIALIZED” audit KERNEL record seen at boot on SELinux systems was removed. It did not provide any useful information that couldn’t be found in other audit records emitted at boot.
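Some background on the year 2038 item above: a signed 32-bit count of seconds since the Unix epoch runs out at 2^31 - 1. The short Python sketch below shows the last representable second and what a naive 32-bit wraparound turns the following second into. It illustrates the general problem only, not the audit subsystem's code, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """Truncate an integer to a signed 32-bit value (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


last_safe = to_utc(INT32_MAX)                # 2038-01-19 03:14:07 UTC
wrapped = to_utc(wrap_int32(INT32_MAX + 1))  # 1901-12-13 20:45:52 UTC
```

Storing the same count in 64 bits moves the limit far beyond any practical horizon, which is the goal of the year 2038 work.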

The 2017 Linux Security Summit

The past Thursday and Friday were the 2017 Linux Security Summit, and once again I think it was a great success. A round of thanks to James Morris for leading the effort, the program committee for selecting a solid set of talks (we saw a big increase in submissions this year), the presenters, the attendees, the Linux Foundation, and our sponsor - thank you all!

Unfortunately we don’t have recordings of the talks, but I’ve included my notes on each of the presentations below. I’ve also included links to the slides, but not all of the slides were available at the time of writing; check the LSS 2017 slide archive for updates.

ARMv8.3 Pointer Authentication - Mark Rutland, ARM Ltd.

Traditional memory protection mechanisms have focused on preventing code injection; success in this area has pushed attackers toward reusing existing code, e.g. ROP attacks. While code reuse mitigations do exist, they are not widely deployed due to integration difficulties and their impact on performance and debugging. ARM’s pointer authentication is designed to help protect against code reuse attacks while minimizing these negative impacts. It works by combining a pointer value, a 64-bit context, and a 128-bit private key into a Pointer Authentication Code (PAC), which is inserted into a reserved portion of the pointer (a pointer with a 48-bit VA space has a 7-bit PAC); the pointer can later be authenticated before it is dereferenced. Linux Kernel patches have been posted to enable userspace protection with per-process PAC keys. Compiler support is already present in GCC v7 via the “-msign-return-address” option, although GDB support is currently blocked on the acceptance of the kernel ptrace patches.
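The sign/authenticate flow described above can be modeled in a few lines. This is a toy sketch: real hardware computes the PAC with its own cipher (ARM's reference choice is QARMA) using keys held in system registers, and signs and checks pointers with dedicated instructions. Here HMAC-SHA256 truncated to 7 bits stands in for that cipher, and the function names are hypothetical. With a 48-bit VA and the top byte ignored, the 7 PAC bits sit at bits 54..48.

```python
import hashlib
import hmac

# Toy model: HMAC-SHA256 replaces the hardware PAC algorithm.
VA_BITS = 48
PAC_BITS = 7                      # 48-bit VA, top byte ignored: bits 54..48
PAC_SHIFT = VA_BITS
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT


def compute_pac(ptr: int, context: int, key: bytes) -> int:
    """Derive a 7-bit PAC from the pointer, a 64-bit context, and a 128-bit key."""
    if len(key) != 16:
        raise ValueError("expected a 128-bit key")
    msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << PAC_BITS) - 1)


def sign_pointer(ptr: int, context: int, key: bytes) -> int:
    """Insert the PAC into the unused upper bits of a user-space pointer."""
    if ptr >> VA_BITS:
        raise ValueError("expected a canonical 48-bit user-space pointer")
    return ptr | (compute_pac(ptr, context, key) << PAC_SHIFT)


def authenticate_pointer(signed: int, context: int, key: bytes):
    """Strip and check the PAC; return the pointer, or None on a mismatch."""
    ptr = signed & ~PAC_MASK
    if (signed & PAC_MASK) >> PAC_SHIFT != compute_pac(ptr, context, key):
        return None               # real hardware yields a pointer that faults
    return ptr
```

With only 7 bits, a forged pointer passes about 1 time in 128. The protection comes from the fact that a failed check produces a pointer that faults on use.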

Defeating Invisible Enemies: Firmware Based Security in the OpenPOWER Platform - George Wilson, IBM

The past few years have proven firmware based threats to be a real problem, with the various secure and trusted boot solutions providing a method to help eliminate these risks. The OpenPOWER Foundation has been working to map the Trusted Computing Group’s secure boot specifications to the OpenPOWER platform as well as provide a freely available working secure boot implementation for the platform.

Landlock LSM: Towards Unprivileged Sandboxing - Mickael Salaun

Multiple application sandboxing mechanisms exist today; Landlock attempts to do better than most by providing fine-grained access control, with embedded policy, to any application on the system, including unprivileged applications. The Landlock LSM does this by allowing applications to write their own sandbox policy using eBPF. While this may sound similar to the existing seccomp-bpf based approaches, Landlock provides mechanisms to incorporate object information into the sandbox policy, something that is not possible with the existing seccomp-bpf mechanism. Landlock remains a work in progress, but v7 is considered to be a minimum viable product and has been posted upstream for review.

The State of Kernel Self-Protection - Kees Cook, Google

The Kernel Self Protection Project (KSPP) update started with a quick introduction to the project’s motivation and goals, moved on to an overview of some of the bug classes the project is working to eliminate in the Linux Kernel, and concluded with a discussion of some of the challenges facing the project and the broader kernel community. While progress can be slow, it was encouraging to see that ~12 organizations and ~10 unaffiliated individuals have joined the KSPP effort and are currently working on ~20 features.

Confessions of a Security Hardware Driver Maintainer - Gilad Ben-Yossef, ARM Ltd.

The presentation started with an overview of the Android secure boot process before introducing the ARM TrustZone CryptoCell, a security processor which appears similar to a TPM. The presenter went over some of the challenges he faced integrating the CryptoCell into the boot process, including some of the work that went into improving the performance of the solution.

CII Best Practices Badge, 1.5 Years Later - David Wheeler, IDA

The Linux Foundation’s Core Infrastructure Initiative’s (CII) Best Practices Badge Program has been running for approximately one and a half years; this talk provided a basic introduction, observations from the first 18 months, and some recent additions to the badge program. The CII Badge Program has approximately 1000 self-registered projects, with about 100 of those projects earning the “Best Practices” badge. Interestingly, this 10% success rate seems to be holding true, even as the total number of projects grows. The higher level Silver and Gold badges, recent additions to the program, were also discussed.

The Smack 10th Year Update - Casey Schaufler

This year marks Smack’s 10th anniversary, and sees Smack continuing to be part of Tizen and Automotive Grade Linux. Despite the anniversary, the past year was a relatively quiet one for Smack with only one new feature: marking signal delivery as an “append” and not a “write” operation.

Integrity Subsystem Update - Mimi Zohar, IBM

The Integrity subsystem update started with a summary of the subsystem’s functionality before presenting some of the recent additions and ongoing efforts. Recent additions include the ability to pass the IMA measurements across the kexec boundary, deeper embedding into the VFS layer, and support for “appended signatures”. The appended signature feature is initially being used to verify loadable kernel modules, but it could be used for any number of objects that are passed into the kernel as memory buffers and not files. Future work includes performance improvements, namespacing, and UEFI support.
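For a concrete picture of an “appended signature”, signed kernel modules already use one: the module contents, then the PKCS#7 signature, then a 12-byte struct module_signature trailer (five one-byte fields, three bytes of padding, and a big-endian 32-bit signature length), and finally the marker string "~Module signature appended~\n". The Python sketch below splits such a blob, assuming that module layout. It reads only the id_type and sig_len fields, does not verify the PKCS#7 signature itself, and split_appended_signature() is a hypothetical helper.

```python
import struct

MAGIC = b"~Module signature appended~\n"
# struct module_signature: algo, hash, id_type, signer_len, key_id_len,
# 3 bytes of padding, then a big-endian u32 sig_len (12 bytes in total).
TRAILER = struct.Struct(">BBBBB3xI")
PKEY_ID_PKCS7 = 2


def split_appended_signature(blob: bytes):
    """Return (payload, signature); raise ValueError if no valid trailer."""
    if not blob.endswith(MAGIC):
        raise ValueError("no appended signature marker")
    end = len(blob) - len(MAGIC)
    if end < TRAILER.size:
        raise ValueError("truncated signature trailer")
    fields = TRAILER.unpack(blob[end - TRAILER.size:end])
    id_type, sig_len = fields[2], fields[5]
    if id_type != PKEY_ID_PKCS7:
        raise ValueError("unsupported signature type")
    sig_end = end - TRAILER.size
    sig_start = sig_end - sig_len
    if sig_start < 0:
        raise ValueError("signature length exceeds blob size")
    return blob[:sig_start], blob[sig_start:sig_end]
```

Because the signature travels with the data, this format suits objects that reach the kernel as memory buffers rather than files with extended attributes.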

TPM Subsystem Update - Jarkko Sakkinen, Intel Corporation

The Trusted Platform Module (TPM) subsystem update started with an introduction to the subsystem and a history of the TPM before presenting the new work from the past year. New functionality included the addition of a TPM v2.0 resource manager and event log, as well as 64-bit ARM support.

Thursday BoFs

As I was busy participating in both the “extreme” LSM stacking and LSM namespacing BoFs, I wasn’t able to capture much in the way of notes.

Hatching Security: LinuxKit as Security Incubator - Tycho Andersen & Riyaz Faizullabhoy, Docker

LinuxKit was designed to make it easy for people to create their own Linux distribution, with a strong focus on minimal OS installs such as one would use in a container hosting environment. LinuxKit has several features that make it interesting from a security perspective, the most notable being the read-only rootfs which is managed using external tooling. Applications are installed via signed container images. In addition to software, LinuxKit also has a Special Interest Group (SIG) which meets bi-weekly and hosts a number of presentations related to Linux security and hardening.

This talk also spent some time talking about eXclusive Page Frame Ownership (XPFO). XPFO is a mechanism which protects against ret2dir (return to direct mapped memory) attacks; this is important as many of the existing ret2usr (return to userspace memory) mitigations are not successful against ret2dir. The XPFO work remains a work in progress.

Running Linux in a Shielded VM - Michael Kelley, Microsoft

Microsoft Shielded VMs are built on Hyper-V and designed to protect the VM data both at rest and in-flight. Shielded VMs block administrators from the VM data, provide a mechanism for attesting the hypervisor, and guard the hypervisor fabric from attacks. This is accomplished by mixing a variety of technologies; one that is particularly interesting is the “synthetic TPM”. A synthetic TPM is a TPM that is exposed to the guest, but is not backed by a physical TPM as you would expect with a virtual TPM (vTPM). While the synthetic TPM (sTPM?) cannot offer as strong a security assurance as a physical, or even a virtual, TPM, it does offer the ability to migrate the TPM with the guest, a critical requirement for cloud providers wishing to provide a TPM to guests.

Keys Subsystem - David Howells, Red Hat

David Howells presented a quick update on the changes to the kernel keyring over the past year as well as current development efforts. Changes over the past year included protecting big keys by encrypting them with a transient key, restricted keyrings, and support for Diffie-Hellman operations. Current efforts include notifications on keyring changes, use counters, decomposing the setattr operations, and keyring namespacing.
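The Diffie-Hellman support is exposed through the keyctl(2) KEYCTL_DH_COMPUTE operation, which computes base ^ private mod prime from values held in keys (libkeyutils wraps it as keyctl_dh_compute()). The Python sketch below only reproduces that arithmetic on plain integers, using a classic toy prime that offers no real security; dh_compute() is a hypothetical stand-in, not the kernel interface.

```python
def dh_compute(private: int, prime: int, base: int) -> int:
    """Same arithmetic as KEYCTL_DH_COMPUTE: base ** private mod prime."""
    return pow(base, private, prime)


# Toy public parameters, far too small for real use.
PRIME, GENERATOR = 23, 5

alice_private, bob_private = 4, 3
alice_public = dh_compute(alice_private, PRIME, GENERATOR)   # 4
bob_public = dh_compute(bob_private, PRIME, GENERATOR)       # 10

# Each side feeds the peer's public value back in as the base.
alice_shared = dh_compute(alice_private, PRIME, bob_public)  # 18
bob_shared = dh_compute(bob_private, PRIME, alice_public)    # 18
```

Computing a public value and computing a shared secret are the same operation with a different base, which is why a single keyctl operation covers both.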

Protecting VM Register State with AMD SEV-ES - David Kaplan, AMD

The presentation started with a description of AMD’s Secure Encrypted Virtualization (SEV) and the AES cryptographic processor built into the memory controller. SEV-ES is not just a palindrome (!); it is also a mechanism which expands SEV’s protection from memory pages to register state. Basic Secure Memory Encryption (SME) support is in the upstream Linux Kernel, with SEV patches posted to the lists. The associated OVMF/BIOS patches have already been accepted. Initial hardware support is shipping in AMD’s Ryzen Threadripper CPUs.

Proposal of a Method to Prevent Privilege Escalation Attacks for Linux Kernel - Yuichi Nakamura, Hitachi Ltd. & Toshihiro Yamauchi, Okayama University

Unfortunately I missed the start of this presentation, but the goal of this proposal is the detection and prevention of privilege escalation by detecting unexpected credential changes during syscall execution in the kernel. The talk included measurements documenting the minimal performance impact and demos showing how this technology would help mitigate a number of published CVEs. Unfortunately additional work is still needed to protect against smart exploits tricking the new proposed checks.

SELinux in Android O: Separating Policy to Allow for Independent Updates - Daniel Cashman, Google

Android v8.0, aka “Oreo”, is the latest Android release and it includes a number of SELinux related changes. The most significant of these changes is Project Treble, which is a redesign of the Android SELinux policy to enable updating of the general Android code independent of the hardware specific code. The presentation discussed the approach taken with the SELinux policy rework, including the advantages, disadvantages, and various lessons learned during development.

SELinux Subsystem Update - Paul Moore, Red Hat

I delivered the annual “State of SELinux” presentation; the slides are available below.

AppArmor Subsystem Update - John Johansen, Canonical Group, Ltd.

AppArmor has historically carried a rather large patchset in Ubuntu, but over the past year the AppArmor developers have started working on upstreaming this code; this is still a work in progress, but most of the changes have made their way into Linus’ tree. The current AppArmor approach to namespacing was also discussed.

Seccomp Subsystem Update - Kees Cook, Google

The seccomp subsystem update started with an introduction to seccomp, and moved on to the new features added to the kernel subsystem over the past year. Significant changes include the generation of coredumps when a process is killed, a new action to support killing the entire process and not just the offending thread, logging improvements, and better regression testing.
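When more than one filter is attached, every filter runs and the kernel acts on the most severe result. The Python sketch below models that selection using the action values from the uapi <linux/seccomp.h> header, including SECCOMP_RET_KILL_PROCESS and SECCOMP_RET_LOG, both added in Linux v4.14. The action bits are compared as signed 32-bit values, so KILL_PROCESS (0x80000000) sorts below everything else. combine_filter_results() is a hypothetical name, and the model covers action selection only.

```python
# Action values from the uapi <linux/seccomp.h> header.
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_LOG = 0x7FFC0000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000


def action_only(ret: int) -> int:
    """Action bits of a filter result, interpreted as a signed 32-bit value."""
    action = ret & SECCOMP_RET_ACTION_FULL
    return action - (1 << 32) if action & 0x80000000 else action


def combine_filter_results(results) -> int:
    """Pick the result acted on: the lowest signed action wins, ALLOW if none."""
    chosen = SECCOMP_RET_ALLOW
    for ret in results:           # on ties, the earlier result is kept
        if action_only(ret) < action_only(chosen):
            chosen = ret
    return chosen
```

The low 16 data bits travel with the chosen result, so a filter returning SECCOMP_RET_ERRNO | 1 makes the system call fail with EPERM.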

Securing Automated Decryption - Nathaniel McCallum, Red Hat

This presentation described some of the challenges of using typical key escrow models in large scale-out deployments and presented a novel solution that provides an automated way to securely manage keys across a large number of systems. The technology is based on a Diffie-Hellman variant, McCallum-Relyea, which is implemented in a client (Clevis) and a server (Tang). While the Linux Security Summit presentations were not recorded this year, Nathaniel did give a very similar talk at DevConf.cz 2017 which was recorded; the link can be found below.
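The McCallum-Relyea exchange can be sketched with ordinary modular arithmetic. At provisioning time, the client combines the server’s advertised public key S with a fresh client key c to get an encryption key K = S^c, then keeps only C = g^c. At recovery time, the client blinds C with an ephemeral key before sending it to the server, so the server helps rebuild K without ever seeing C or K. This is a toy sketch: Tang uses elliptic curves, whereas the group here is the integers modulo the Mersenne prime 2^127 - 1 with g = 3, and all function names are hypothetical.

```python
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1 (illustration only).
P = 2**127 - 1
G = 3


def rand_exponent() -> int:
    return secrets.randbelow(P - 2) + 1


def server_keygen():
    """Server key pair: s stays secret, S = G**s is advertised."""
    s = rand_exponent()
    return s, pow(G, s, P)


def client_provision(S):
    """Derive K = S**c to encrypt data; keep only C = G**c, discard c and K."""
    c = rand_exponent()
    return pow(G, c, P), pow(S, c, P)


def server_exchange(s, X):
    """The server's only job during recovery: raise the request to s."""
    return pow(X, s, P)


def client_recover(S, C, exchange):
    """Blind C with an ephemeral key, then unblind the server's answer."""
    e = rand_exponent()
    X = (C * pow(G, e, P)) % P        # the server never sees C itself
    Y = exchange(X)                   # X**s = G**(c*s) * G**(e*s)
    Z = pow(S, e, P)                  # S**e = G**(e*s)
    return (Y * pow(Z, -1, P)) % P    # G**(c*s) = K
```

The modular inverse via pow(Z, -1, P) requires Python 3.8 or later.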