The 2017 Linux Security Summit

The 2017 Linux Security Summit took place this past Thursday and Friday, and once again I think it was a great success. A round of thanks to James Morris for leading the effort, the program committee for selecting a solid set of talks (we saw a big increase in submissions this year), the presenters, the attendees, the Linux Foundation, and our sponsor; thank you all!

Unfortunately we don't have recordings of the talks, but I've included my notes on each of the presentations below. I've also included links to the slides, but not all of the slides were available at the time of writing; check the LSS 2017 slide archive for updates.

ARMv8.3 Pointer Authentication - Mark Rutland, ARM Ltd.

Traditional memory protection mechanisms have focused on preventing code injection; success in this area has shifted attacks toward reusing existing code, e.g. ROP attacks. While code reuse mitigations do exist, they are not widely deployed due to difficulties in integration and impacts on performance and debugging. ARM's pointer authentication is designed to help protect against code reuse attacks while minimizing these negative impacts. Pointer authentication works by combining a pointer value, a 64-bit context, and a 128-bit private key into a Pointer Authentication Code (PAC) which is inserted into a reserved portion of the pointer (a pointer with a 48-bit VA space has a 7-bit PAC); the pointer value can later be authenticated before it is dereferenced. Linux Kernel patches have been posted to enable userspace protection with per-process PAC keys. Compiler support is already present in GCC v7 using the "-msign-return-address" option, although GDB support is currently blocked on the acceptance of the kernel ptrace patches.
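
To make the sign/authenticate flow above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: HMAC-SHA256 is a stand-in of my choosing and not the algorithm the hardware uses, the placement of the 7-bit PAC directly above the 48 address bits is assumed for simplicity, and the function names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 48                        # virtual address width from the example above
PAC_BITS = 7                        # PAC width quoted for a 48-bit VA space
PAC_SHIFT = VA_BITS                 # assumption: PAC sits directly above the address bits
ADDR_MASK = (1 << VA_BITS) - 1
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT


def compute_pac(addr: int, context: int, key: bytes) -> int:
    """Derive a PAC_BITS-wide code from the address, a 64-bit context, and the key."""
    msg = addr.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()   # stand-in MAC, not the real one
    return int.from_bytes(digest[:2], "little") & ((1 << PAC_BITS) - 1)


def sign_pointer(ptr: int, context: int, key: bytes) -> int:
    """Insert the PAC into the otherwise unused upper bits of the pointer."""
    addr = ptr & ADDR_MASK
    return addr | (compute_pac(addr, context, key) << PAC_SHIFT)


def authenticate_pointer(signed: int, context: int, key: bytes) -> int:
    """Check the embedded PAC and return the plain address, or raise on mismatch."""
    addr = signed & ADDR_MASK
    embedded = (signed & PAC_MASK) >> PAC_SHIFT
    if embedded != compute_pac(addr, context, key):
        raise ValueError("pointer authentication failed")
    return addr
```

A caller would sign a return address with something like the current stack pointer as the context and authenticate it with the same context just before use. Note that with only 7 bits a forged pointer still has a 1 in 128 chance of passing the check, which is why the PAC width matters.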

Defeating Invisible Enemies: Firmware Based Security in the OpenPOWER Platform - George Wilson, IBM

The past few years have proven firmware based threats to be a real problem, with the various secure and trusted boot solutions providing a method to help eliminate these risks. The OpenPOWER Foundation has been working to map the Trusted Computing Group's secure boot specifications to the OpenPOWER platform as well as provide a freely available working secure boot implementation for the platform.

Landlock LSM: Towards Unprivileged Sandboxing - Mickael Salaun

Multiple application sandboxing mechanisms exist today; Landlock attempts to do better than most by providing fine-grained access control with embedded policy to any application on the system, including unprivileged applications. The Landlock LSM does this by allowing applications to write their own sandbox policy using eBPF. While this may sound similar to the existing seccomp-bpf based approaches, Landlock provides mechanisms to incorporate object information into the sandbox policy, something that is not possible with the existing seccomp-bpf mechanism. Landlock remains a work in progress, but v7 is considered to be a minimum viable product and has been posted upstream for review.

The State of Kernel Self-Protection - Kees Cook, Google

The talk opened with a quick introduction to the motivation and goals of the Kernel Self Protection Project (KSPP) before moving on to an overview of some of the bug classes the project is working to eliminate in the Linux Kernel, and concluded with a discussion of some of the challenges facing the project and the broader kernel community. While progress can be slow, it was encouraging to see that ~12 organizations and ~10 unaffiliated individuals have joined the KSPP effort and are currently working on ~20 features.

Confessions of a Security Hardware Driver Maintainer - Gilad Ben-Yossef, ARM Ltd.

The presentation started with an overview of the Android secure boot process before introducing the ARM TrustZone CryptoCell, a security processor which appears similar to a TPM. The presenter went over some of the challenges he faced integrating the CryptoCell into the boot process, including some of the work that went into improving the performance of the solution.

CII Best Practices Badge, 1.5 Years Later - David Wheeler, IDA

The Linux Foundation's Core Infrastructure Initiative (CII) Best Practices Badge Program has been running for approximately one and a half years; this talk provided a basic introduction, observations from the first 18 months, and some recent additions to the badge program. The CII Badge Program has approximately 1000 self-registered projects, with about 100 of those projects earning the "Best Practices" badge. Interestingly, this 10% success rate seems to be holding true, even as the total number of projects grows. The higher level Silver and Gold badges, recent additions to the program, were also discussed.

The Smack 10th Year Update - Casey Schaufler

This year marks Smack's 10th anniversary, and sees Smack continuing to be part of Tizen and Automotive Grade Linux. Despite the anniversary, the past year was a relatively quiet one for Smack with only one new feature: marking signal delivery as an "append" and not a "write" operation.

Integrity Subsystem Update - Mimi Zohar, IBM

The Integrity subsystem update started with a summary of the subsystem's functionality before presenting some of the recent additions and ongoing efforts. Recent additions include the ability to pass the IMA measurements across the kexec boundary, deeper embedding into the VFS layer, and support for "appended signatures". The appended signature feature is initially being used to verify loadable kernel modules, but it could be used for any number of objects that are passed into the kernel as memory buffers and not files. Future work includes performance improvements, namespacing, and UEFI support.

TPM Subsystem Update - Jarkko Sakkinen, Intel Corporation

The Trusted Platform Module (TPM) subsystem update started with an introduction to the subsystem and a history of the TPM before presenting the new work from the past year. New functionality included the addition of a TPM v2.0 resource manager and event log, as well as 64-bit ARM support.

Thursday BoFs

As I was busy participating in both the "extreme" LSM stacking and LSM namespacing BoFs I wasn't able to capture much in the way of notes.

Hatching Security: LinuxKit as Security Incubator - Tycho Andersen & Riyaz Faizullabhoy, Docker

LinuxKit was designed to make it easy for people to create their own Linux distribution, with a strong focus on minimal OS installs such as one would use in a container hosting environment. LinuxKit has several features that make it interesting from a security perspective, the most notable being the read-only rootfs which is managed using external tooling. Applications are installed via signed container images. In addition to software, LinuxKit also has a Special Interest Group (SIG) which meets bi-weekly and hosts a number of presentations related to Linux security and hardening.

This talk also spent some time talking about eXclusive Page Frame Ownership (XPFO). XPFO is a mechanism which protects against ret2dir (return to direct mapped memory) attacks; this is important as many of the existing ret2usr (return to userspace memory) mitigations are not successful against ret2dir. The XPFO work remains a work in progress.

Running Linux in a Shielded VM - Michael Kelley, Microsoft

Microsoft Shielded VMs are built on Hyper-V and designed to protect the VM data both at rest and in-flight. Shielded VMs block administrators from the VM data, provide a mechanism for attesting the hypervisor, and guard the hypervisor fabric from attacks. This is accomplished by mixing a variety of technologies; one that is particularly interesting is the "synthetic TPM". A synthetic TPM is a TPM that is exposed to the guest, but is not backed by a physical TPM as you would expect with a virtual TPM (vTPM). While the synthetic TPM (sTPM?) cannot offer as strong a security assurance as a physical, or even virtual, TPM, it does offer the ability to migrate the TPM with the guest, a critical requirement for cloud providers wishing to provide a TPM to guests.

Keys Subsystem - David Howells, Red Hat

David Howells presented a quick update on the changes to the kernel keyring over the past year as well as current development efforts. Changes over the past year included protecting big keys by encrypting them with a transient key, restricted keyrings, and support for Diffie-Hellman operations. Current efforts include notifications on keyring changes, use counters, decomposing the setattr operations, and keyring namespacing.

Protecting VM Register State with AMD SEV-ES - David Kaplan, AMD

The presentation started with a description of AMD's Secure Encrypted Virtualization (SEV) and the AES cryptographic processor built into the memory controller. SEV-ES is not just a palindrome (!); it is also a mechanism which expands SEV's protection from memory pages to register state. Basic Secure Memory Encryption (SME) support is in the upstream Linux Kernel, with SEV patches posted to the lists. The associated OVMF/BIOS patches have already been accepted. Initial hardware support is shipping in AMD's Ryzen Threadripper CPUs.

Proposal of a Method to Prevent Privilege Escalation Attacks for Linux Kernel - Yuichi Nakamura, Hitachi Ltd. & Toshihiro Yamauchi, Okayama University

Unfortunately I missed the start of this presentation, but the goal of this proposal is the detection and prevention of privilege escalation by detecting unexpected credential changes during syscall execution in the kernel. The talk included measurements documenting the minimal performance impact and demos showing how this technology would help mitigate a number of published CVEs. Additional work is still needed to prevent smart exploits from tricking the newly proposed checks.
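
The basic idea of checking for unexpected credential changes around a syscall can be sketched as follows. This is my own illustration of the concept and not the presenters' implementation: the Creds and Task classes, the checked_syscall() helper, and the list of syscalls permitted to change credentials are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: an illustrative allowlist of syscalls that may legitimately change credentials.
CRED_CHANGING_SYSCALLS = frozenset(
    {"setuid", "setgid", "setresuid", "setresgid", "execve"}
)


@dataclass(frozen=True)
class Creds:
    uid: int
    gid: int
    euid: int
    egid: int


class Task:
    """Hypothetical stand-in for a kernel task and its credentials."""

    def __init__(self, creds: Creds) -> None:
        self.creds = creds


def checked_syscall(task, name, handler):
    """Run handler(task) and reject credential changes the syscall should not make."""
    saved = task.creds                  # snapshot the credentials on syscall entry
    result = handler(task)              # execute the syscall body
    if task.creds != saved and name not in CRED_CHANGING_SYSCALLS:
        task.creds = saved              # prevention: restore the saved credentials
        raise PermissionError(f"unexpected credential change during {name}")
    return result
```

A sketch this simple also shows the remaining weakness mentioned above: an exploit that can tamper with the saved snapshot, or that triggers from a syscall on the allowlist, would not be caught.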

SELinux in Android O: Separating Policy to Allow for Independent Updates - Daniel Cashman, Google

Android v8.0, aka "Oreo", is the latest Android release and it includes a number of SELinux related changes. The most significant of these changes is project Treble, which is a redesign of the Android SELinux policy to enable updating of the general Android code independent of the hardware specific code. The presentation discussed the approach taken with the SELinux policy rework, including the advantages, disadvantages, and various lessons learned during development.

SELinux Subsystem Update - Paul Moore, Red Hat

I delivered the annual "State of SELinux" presentation; slides are available below.

AppArmor Subsystem Update - John Johansen, Canonical Group, Ltd.

AppArmor has historically carried a rather large patchset in Ubuntu, but over the past year the AppArmor developers have started working on upstreaming this code; this is still a work in progress, but most of the changes have made their way into Linus' tree. The current AppArmor approach to namespacing was also discussed.

Seccomp Subsystem Update - Kees Cook, Google

The seccomp subsystem update started with an introduction to seccomp, and moved to the new features added to the subsystem over the past year. Significant changes include the generation of coredumps when a process is killed, a new action to support killing the entire process and not just the offending thread, logging improvements, and better regression testing.

Securing Automated Decryption - Nathaniel McCallum, Red Hat

This presentation described some of the challenges to using typical key escrow models in large scale-out deployments and presented a novel solution to provide an automated way to securely manage keys across a large number of systems. The technology is based on a Diffie-Hellman algorithm variant, McCallum-Relyea, which is implemented in a client (Clevis) and server (Tang). While the Linux Security Summit presentations were not recorded this year, Nathaniel did give a very similar talk in 2017, which was recorded; the link can be found below.
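
For readers curious how a Diffie-Hellman variant can recover a key without the server ever holding it, here is a toy sketch of the McCallum-Relyea exchange as I understand it from the project's public description. The assumptions are significant: it uses integers modulo a small Mersenne prime purely for brevity, which is not suitable for real use and is not necessarily the group the real implementation uses, and all of the function names are hypothetical.

```python
import secrets

P = 2**127 - 1      # toy-sized prime modulus; NOT suitable for real use
G = 3               # assumption: any base works for demonstrating the algebra


def keygen():
    """Return a (private, public) pair where public = G^private mod P."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def provision(server_pub):
    """Client side, at setup: derive K, then keep only the client public value C."""
    c, client_pub = keygen()
    key = pow(server_pub, c, P)         # K = S^c; used to encrypt, then discarded
    return client_pub, key


def recovery_request(client_pub):
    """Client side, at unlock: blind C with an ephemeral key before sending it."""
    e, eph_pub = keygen()
    return e, (client_pub * eph_pub) % P        # X = C * E


def server_respond(server_priv, blinded):
    """Server side: raise the blinded value to the server's private exponent."""
    return pow(blinded, server_priv, P)         # Y = X^s


def recover_key(server_pub, e, response):
    """Client side: strip the blinding factor, leaving K = C^s = S^c."""
    blind = pow(server_pub, e, P)               # Z = S^e
    return (response * pow(blind, -1, P)) % P   # K = Y / Z
```

The server only ever sees the blinded value X and never learns C or K, which is what allows a client to decrypt automatically when it can reach the server without the server acting as a traditional key escrow.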

Linux 4.13 Released

Linux v4.13 was released this past weekend on Sunday, September 3rd; this is a quick summary of the SELinux and audit changes.


  • The largest SELinux change in Linux v4.13 is the addition of SELinux access controls for Infiniband. This was a large effort that involved a new SELinux policy version (v31), two new object classes (infiniband_pkey and infiniband_endport), the creation of an LSM notification mechanism, and a number of changes to core Infiniband code. Daniel Jurgens, the patchset author, provided an excellent summary of the changes in his cover letter; a portion of it is excerpted below:

    From: Daniel Jurgens

    Infiniband applications access HW from user-space -- traffic is generated directly by HW, bypassing the kernel. Consequently, Infiniband Partitions, which are associated directly with HW transport endpoints, are a natural choice for enforcing granular mandatory access control for Infiniband. QPs may only send or receives packets tagged with the corresponding partition key (PKey). The PKey is not a cryptographic key; it's a 16 bit number identifying the partition.

    Every Infiniband fabric is controlled by a central Subnet Manager (SM). The SM provisions the partitions by assigning each port with the partitions it can access. In addition, the SM tags each port with a subnet prefix, which identifies the subnet. Determining which users are allowed to access which partition keys on a given subnet forms an effective policy for isolating users on the fabric. Any application that attempts to send traffic on a given subnet is automatically subject to the policy, regardless of which device and port it uses. SM software configures the subnet through a privileged Subnet Management Interface (SMI), which is presented by each Infiniband port. Thus, the SMI must also be controlled to prevent unauthorized changes to fabric configuration and partitioning.

    To support access control for IB partitions and subnet management, security contexts must be provided for two new types of objects - PKeys and IB ports.

    A PKey label consists of a subnet prefix and a range of PKey values and is similar to the labeling mechanism for netports. Each Infiniband port can reside on a different subnet. So labeling the PKey values for specific subnet prefixes provides the user maximum flexibility, as PKey values may be determined independently for different subnets. There is a single access vector for PKeys called "access".

    An Infiniband port is labeled by device name and port number. There is a single access vector for IB ports called "manage_subnet".

    Because RDMA allows kernel bypass, enforcement must be done during connection setup. Communication over RDMA requires a send and receive queue, collectively known as a Queue Pair (QP). A QP must be initialized by privileged system calls before it can be used to send or receive data. During initialization the user must provide the PKey and port the QP will use; at this time access control can be enforced.

    Because there is a possibility that the enforcement settings or security policy can change, a means of notifying the ib_core module of such changes is required. To facilitate this a generic notification callback mechanism is added to the LSM. One callback is registered for checking the QP PKey associations when the policy changes. Mad agents also register a callback, they cache the permission to send and receive SMPs to avoid another per packet call to the LSM.

    Because frequent accesses to the same PKey's SID is expected a cache is implemented which is very similar to the netport cache.

    In order to properly enforce security when changes to the PKey table or security policy or enforcement occur ib_core must track which QPs are using which port, pkey index, and alternate path for every IB device. This makes operations that used to be atomic transactional.

  • An important part of the Infiniband work was the creation of an LSM notification mechanism that allows various kernel subsystems to receive notification of LSM events. At present this is limited to just SELinux policy changes, but I expect additional events to be added in the future as they are needed.

  • The SELinux "file:map" permission was added to control memory mapped file access. This allows the SELinux policy to prevent direct memory access to files and ensure that every file access is revalidated. Stephen Smalley provides more information in the patch description:

    From: Stephen Smalley

    Add a map permission check on mmap so that we can distinguish memory mapped access (since it has different implications for revocation). When a file is opened and then read or written via syscalls like read(2)/write(2), we revalidate access on each read/write operation via selinux_file_permission() and therefore can revoke access if the process context, the file context, or the policy changes in such a manner that access is no longer allowed. When a file is opened and then memory mapped via mmap(2) and then subsequently read or written directly in memory, we presently have no way to revalidate or revoke access. The purpose of a separate map permission check on mmap(2) is to permit policy to prohibit memory mapping of specific files for which we need to ensure that every access is revalidated, particularly useful for scenarios where we expect the file to be relabeled at runtime in order to reflect state changes (e.g. cross-domain solution, assured pipeline without data copying).

  • Allow proper per-file labeling for tracefs filesystems using the SELinux genfscon mechanism.

  • Starting with Linux v4.13, whenever SELinux policy is loaded into the kernel we log the SELinux policy capability state to the kernel's ring buffer. An example can be seen below:

    [    2.017308] SELinux:  policy capability network_peer_controls=1
    [    2.017880] SELinux:  policy capability open_perms=1
    [    2.018344] SELinux:  policy capability extended_socket_class=0
    [    2.018919] SELinux:  policy capability always_check_network=0
    [    2.019513] SELinux:  policy capability cgroup_seclabel=0

  • The Linux Kernel does not allow sockets to be opened directly, returning the ENXIO error instead. However, before the kernel ultimately rejects the access, the SELinux policy is checked, and in the case of a socket file descriptor the resulting check can seem a bit odd. The SELinux socket object classes do not contain the "open" permission; they contain the "recvfrom" permission instead, and this difference causes a socket "open" access to appear as a "recvfrom" SELinux denial. Linux v4.13 fixes this by skipping open access checking on sockets and letting the core kernel code handle the denial.

  • Allow the LSM security_sb_clone_mnt_opts() hook to enable or disable the native labeling behavior. This is important for proper SELinux file labeling on NFS v4.2+.

  • Normally valid SELinux labels must be used when labeling files; however, if the process has the CAP_MAC_ADMIN capability it is possible to set an unknown, or invalid, SELinux label on a file. Prior to Linux v4.13 setting an unknown SELinux label on a file would cause the SELinux subsystem to perform the usual SELinux checks, in addition to any other stacked LSM's CAP_MAC_ADMIN checks. Depending on the LSMs that were in use this could result in odd or unexpected behavior. We fix this in Linux v4.13 by only performing the base CAP_MAC_ADMIN capability checks in addition to the SELinux checks; no other LSMs are asked to provide access control decisions.

  • The SELinux internal ebitmap type was converted to use the kmem_cache mechanism. This potentially saves a small amount of memory on some systems and provides better SELinux memory usage statistics.

  • SELinux was converted to use the LSM security_task_alloc() hook instead of the security_task_create() hook. The expectation is that the security_task_create() hook will be deprecated and eventually removed from the Linux Kernel.

  • A clang build warning related to redundant filesystem labeling behavior checks was fixed.
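
The policy capability lines that Linux v4.13 logs at policy load, shown in the list above, are easy to collect from the kernel log. The following is a small sketch assuming the log text has already been captured into a string (for example from the output of dmesg); the helper name is hypothetical and the regular expression is based only on the sample lines shown above.

```python
import re

# Matches lines such as: "[    2.017308] SELinux:  policy capability open_perms=1"
POLCAP_RE = re.compile(r"SELinux:\s+policy capability (\w+)=([01])")


def parse_policy_capabilities(log_text):
    """Return a dict mapping each policy capability name to True/False."""
    caps = {}
    for line in log_text.splitlines():
        match = POLCAP_RE.search(line)
        if match:
            caps[match.group(1)] = match.group(2) == "1"
    return caps


SAMPLE_LOG = """\
[    2.017308] SELinux:  policy capability network_peer_controls=1
[    2.017880] SELinux:  policy capability open_perms=1
[    2.018344] SELinux:  policy capability extended_socket_class=0
[    2.018919] SELinux:  policy capability always_check_network=0
[    2.019513] SELinux:  policy capability cgroup_seclabel=0
"""

if __name__ == "__main__":
    for name, enabled in parse_policy_capabilities(SAMPLE_LOG).items():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```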


  • Linux v4.3 added the concept of ambient capabilities to the Linux Kernel; we now log the ambient capabilities in the audit BPRM_FCAPS and CAPSET records using the "cap_pa", "old_pa", and "pa" fields.

  • Prior to Linux v4.13 file capabilities would only be recorded in the audit PATH record if they were set. Starting with v4.13 the permitted and inheritable file capabilities are always recorded in the PATH record, resulting in a more consistent record format.

  • The "new_<capability>" prefix has been shortened to simply "<capability>" in the audit BPRM_FCAPS record; for example "new_pp" is now "pp".

  • Fixed a race condition where the kernel/auditd connection could be reset shortly after the audit daemon starts and registers itself with the kernel. Fedora BZ #1459326 has more information:

    This issue is partly due to the read-copy nature of RCU, and partly due to how we sync the auditd_connection state across kauditd_thread and the audit control channel. The kauditd_thread thread is always running so it can service the record queues and emit the multicast messages, if it happens to be just past the "main_queue" label, but before the "if (sk == NULL || ...)" if-statement which calls auditd_reset() when the new auditd connection is registered it could end up resetting the auditd connection, regardless of if it is valid or not. This is a rather small window and the variable nature of multi-core scheduling explains why this is proving rather difficult to reproduce.

  • Fixed a use-after-free problem in the audit filesystem watch code. The core problem was improper fsnotify reference counting in the audit subsystem. Jan Kara provides more information in the patch description:

    From: Jan Kara

    audit_remove_watch_rule() drops watch's reference to parent but then continues to work with it. That is not safe as parent can get freed once we drop our reference. The following is a trivial reproducer:

    mount -o loop image /mnt
    touch /mnt/file
    auditctl -w /mnt/file -p wax
    umount /mnt
    auditctl -D
    <crash in fsnotify_destroy_mark()>

    Grab our own reference in audit_remove_watch_rule() earlier to make sure mark does not get freed under us.

  • Ensure we clean up any audit filesystem watch fsnotify marks when a filesystem is unmounted.

  • Ensure that all of the audit records are sent to any multicast listeners, e.g. the systemd journal, when the audit daemon connection is reset. Prior to Linux v4.13 some audit records could be lost when the audit daemon unregistered from the kernel.

  • Fixed a memory leak in the auditd_send_unicast_skb() function that would leak an audit REPLACE record in certain situations.

Kernel Repository Process

It has been over a year since I formally updated the SELinux and audit kernel repository processes, and based on how things have evolved it seems we are due for another update. This time the changes are rather small, and shouldn't surprise anyone who has been following upstream development.


  1. After the merge window closes upstream, patches will be merged into the selinux/next branch up until the merge window reopens. However, it is important to note that large, complicated, or invasive patches sent late in the development cycle may be deferred until the next cycle.

  2. Any patches deemed necessary for the current Linux -rcX releases will be merged into the current selinux/stable-X.Y branch, marked with a signed tag, and a pull request sent against linux/master as soon as it is reasonable to do so.

  3. During the development cycle Fedora Rawhide test kernels will be generated using the selinux/next and most recent selinux/stable-X.Y branches on a weekly basis, if not more often. These kernels will be tested against the SELinux test suite and made available to everyone for additional testing.

  4. Once the merge window opens a decision will be made if there is a need to rebase the current selinux/next branch on top of the recent Linux release. In general rebases should be limited, but rebases will be necessary over time to keep the selinux/next branch reasonably current. If the selinux/next branch is to be rebased, it should be done very early in the merge window and additional care should be taken to ensure that it passes all of the SELinux regression tests.

  5. After any rebases have taken place, the selinux/next branch will be copied to a new branch, selinux/stable-X.Y, and the branch will be marked with a signed tag in the format selinux-pr-YYYYMMDD. A pull request will be sent against the linux/master branch using the signed tag.


  1. After the merge window closes upstream, patches will be merged into the audit/next branch up until the merge window reopens. However, it is important to note that large, complicated, or invasive patches sent late in the development cycle may be deferred until the next cycle.

  2. Any patches deemed necessary for the current Linux -rcX releases will be merged into the current audit/stable-X.Y branch, marked with a signed tag, and a pull request sent against linux/master as soon as it is reasonable to do so.

  3. During the development cycle Fedora Rawhide test kernels will be generated using the audit/next and most recent audit/stable-X.Y branches on a weekly basis, if not more often. These kernels will be tested against the audit test suite and made available to everyone for additional testing.

  4. Once the merge window opens a decision will be made if there is a need to rebase the current audit/next branch on top of the recent Linux release. In general rebases should be limited, but rebases will be necessary over time to keep the audit/next branch reasonably current. If the audit/next branch is to be rebased, it should be done very early in the merge window and additional care should be taken to ensure that it passes all of the audit regression tests.

  5. After any rebases have taken place, the audit/next branch will be copied to a new branch, audit/stable-X.Y, and the branch will be marked with a signed tag in the format audit-pr-YYYYMMDD. A pull request will be sent against the linux/master branch using the signed tag.
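
The branch and tag naming conventions used in the steps above can be captured in a couple of small helpers. This is just a sketch for illustration; the function names are hypothetical and nothing like this is part of the actual tooling.

```python
import datetime
import re


def stable_branch(subsystem, version):
    """Build a stable branch name such as "selinux/stable-4.13"."""
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(f"expected a version in X.Y form, got {version!r}")
    return f"{subsystem}/stable-{version}"


def pull_request_tag(subsystem, day):
    """Build a signed tag name in the <subsystem>-pr-YYYYMMDD format."""
    return f"{subsystem}-pr-{day:%Y%m%d}"


if __name__ == "__main__":
    today = datetime.date.today()
    for subsystem in ("selinux", "audit"):
        print(stable_branch(subsystem, "4.13"), pull_request_tag(subsystem, today))
```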

For reference, the previous process was defined here.

UPDATE: The SELinux kernel process has been updated now that we are basing the tree against Linus' tree and sending pull requests directly to Linus.