Linux 5.6 Released

Linux v5.6 was released on Sunday, March 29, 2020; the SELinux and audit highlights are below:

SELinux

  • We’ve wanted to remove the CONFIG_SECURITY_SELINUX_DISABLE build option for some time, and in Linux v5.6 we are taking the first step by marking it deprecated as explained in the patch description:

    Deprecate the CONFIG_SECURITY_SELINUX_DISABLE functionality. The code was originally developed to make it easier for Linux distributions to support architectures where adding parameters to the kernel command line was difficult. Unfortunately, supporting runtime disable meant we had to make some security trade-offs when it came to the LSM hooks, as documented in the Kconfig help text:

    NOTE: selecting this option will disable the ‘__ro_after_init’ kernel hardening feature for security hooks. Please consider using the selinux=0 boot parameter instead of enabling this option.

    Fortunately it looks as if that the original motivation for the runtime disable functionality is gone, and Fedora/RHEL appears to be the only major distribution enabling this capability at build time so we are now taking steps to remove it entirely from the kernel. The first step is to mark the functionality as deprecated and print an error when it is used (what this patch is doing). As Fedora/RHEL makes progress in transitioning the distribution away from runtime disable, we will introduce follow-up patches over several kernel releases which will block for increasing periods of time when the runtime disable is used. Finally we will remove the option entirely once we believe all users have moved to the kernel cmdline approach.

  • Add new SELinux controls for the kernel lockdown functionality with a new object class, “lockdown”:
    lockdown { integrity confidentiality }
    

    Stephen Smalley, the patch author, provides a good explanation of the new controls in the patch description:

    Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled.

    Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value.

    Sample AVC audit output from denials:

    avc:  denied  { integrity } for pid=3402 comm="fwupd"
    lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0
    tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0
    
    avc:  denied  { confidentiality } for pid=4628 comm="cp"
    lockdown_reason="/proc/kcore access"
    scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
    tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
    tclass=lockdown permissive=0
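When triaging denials like the two above, it can help to split each record into fields so that the lockdown_reason hint can be grouped or filtered. The following is a minimal sketch, not part of any SELinux tooling: the parse_avc name and the returned dictionary layout are assumptions made for illustration, and it only handles records shaped like the samples shown here.

```python
import re
import shlex

# Matches the leading "avc:  denied  { perm ... } for" part of a record.
AVC_PATTERN = re.compile(r"avc:\s+(denied|granted)\s+\{([^}]*)\}\s+for\s+(.*)")


def parse_avc(record):
    """Parse one AVC record (possibly wrapped over several lines) into a dict."""
    # Join wrapped lines into a single logical record.
    text = " ".join(record.split())
    match = AVC_PATTERN.search(text)
    if match is None:
        raise ValueError("not an AVC record")
    result, perms, rest = match.groups()
    fields = {"result": result, "permissions": perms.split()}
    # shlex keeps quoted values such as lockdown_reason="/proc/kcore access"
    # together as one token and strips the quotes.
    for token in shlex.split(rest):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


SAMPLE = (
    'avc:  denied  { confidentiality } for pid=4628 comm="cp" '
    'lockdown_reason="/proc/kcore access" '
    'scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 '
    'tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023 '
    'tclass=lockdown permissive=0'
)
fields = parse_avc(SAMPLE)
print(fields["permissions"], fields["lockdown_reason"])
```

Because the permission is always either integrity or confidentiality, the lockdown_reason field is the part of the record that tells you which specific operation was blocked.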
    
  • Add a new access control for the move_mount(2) syscall which reuses the “file { mounton }” permission.

  • Enable SELinux per-file labeling for binderfs.

  • Add SELinux support for the new netlink VLAN configuration messages RTM_NEWVLAN, RTM_DELVLAN, and RTM_GETVLAN.

  • Improve how we store and access SELinux security labels within the kernel by caching the kernel’s SID-to-string security label translation. Those systems with a large number of MLS/MCS categories in use and applications which often query the kernel for SELinux labels should see the biggest improvement in performance.
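The idea behind the SID-to-string caching can be illustrated with a toy model. This is only a sketch of the general technique (memoizing an expensive conversion), not the kernel's data structures; the SidTable class and its fields are hypothetical names introduced for the example.

```python
class SidTable:
    """Toy SID table that caches the string form of each security context."""

    def __init__(self):
        self._contexts = {}   # SID -> (user, role, type, mls_range)
        self._strings = {}    # SID -> cached label string
        self.conversions = 0  # how many times a string had to be built

    def add(self, sid, user, role, type_, mls_range):
        self._contexts[sid] = (user, role, type_, mls_range)
        # Drop any stale cached string for this SID.
        self._strings.pop(sid, None)

    def sid_to_string(self, sid):
        cached = self._strings.get(sid)
        if cached is not None:
            return cached
        # Slow path: build the string once and remember it.
        self.conversions += 1
        label = ":".join(self._contexts[sid])
        self._strings[sid] = label
        return label


table = SidTable()
table.add(1, "unconfined_u", "unconfined_r", "unconfined_t", "s0-s0:c0.c1023")
for _ in range(3):
    label = table.sid_to_string(1)
print(label, table.conversions)
```

In this model repeated queries for the same SID pay the conversion cost only once, which is why workloads that query labels often, or that use long category sets, stand to benefit most.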

  • Improve the LSM and SELinux kernel build definitions such that the LSM auditing and SELinux InfiniBand code is only built when needed by the kernel’s build time configuration.

  • The LSM stacking changes introduced a SELinux bug when SELinux is disabled at runtime. We attempt to fix this in Linux v5.6 by reordering the SELinux hooks in the LSM configuration such that the data structures are properly managed. The real fix for this is to eventually remove the runtime disable functionality (see the CONFIG_SECURITY_SELINUX_DISABLE deprecation notice above) and use the kernel command line to disable SELinux at boot (e.g. “selinux=0”) if desired.

  • Fix a problem in the SELinux access control enforcement cache where we were not always cleaning up properly on error. In some extreme cases this could effectively shrink the access control cache and impact performance.

  • Multiple key internal SELinux data structures were marked with “__randomize_layout” to help harden the SELinux code in the kernel.

  • Fixed out of date references to the selinuxfs mount point in the kernel documentation. The kernel documentation should now correctly reference selinuxfs as being mounted on “/sys/fs/selinux”.

  • Several cleanups and smaller fixes relating to locking, inode auditing, and caching which should have little impact on most users but help improve the quality of the SELinux kernel code.

Audit

  • Add support for auditing BPF program load and unload operations with the BPF specific program ID. Daniel Borkmann, the patch author, provides a good explanation, complete with an example, in the patch description:

    Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event.

    The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core.

    Raw example output:

    # auditctl -D
    # auditctl -a always,exit -F arch=x86_64 -S bpf
    # ausearch --start recent -m 1334
    ...
    ----
    time->Wed Nov 27 16:04:13 2019
    type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
    type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \
    success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477    \
    pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001    \
    egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf"                \
    exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf"                  \
    subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
    type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
    ----
    time->Wed Nov 27 16:04:13 2019
    type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
    ...
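Since the load and unload records share a prog-id, user space can pair them up to build the timeline the patch description mentions. The sketch below is one hedged way to do that over raw ausearch-style text; the bpf_timeline helper is a hypothetical name, and accepting "type=BPF" alongside "type=UNKNOWN[1334]" is an assumption about how tooling that knows the record type might print it.

```python
import re
from collections import defaultdict

# Matches the BPF record lines shown in the raw output above.
BPF_RECORD = re.compile(
    r"type=(?:BPF|UNKNOWN\[1334\])\s+"
    r"msg=audit\((?P<time>\d+\.\d+):(?P<serial>\d+)\):\s+"
    r"prog-id=(?P<prog_id>\d+)\s+op=(?P<op>LOAD|UNLOAD)"
)


def bpf_timeline(log_text):
    """Group BPF audit records by program ID, in the order they appear."""
    timeline = defaultdict(list)
    for line in log_text.splitlines():
        match = BPF_RECORD.search(line)
        if match:
            event = (match["op"], float(match["time"]), int(match["serial"]))
            timeline[int(match["prog_id"])].append(event)
    return dict(timeline)


LOG = """\
type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
"""
for prog_id, events in bpf_timeline(LOG).items():
    print(prog_id, [op for op, _time, _serial in events])
```

The serial number in each tuple is what ties the LOAD record back to the SYSCALL record of the same event, which carries the process details.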
    
  • In Linux v3.18 the audit code was changed to convert one of the internal audit filter structures into a union, which unfortunately introduced a bug that was only recently discovered by syzbot. Converting the structure into a union was the right thing to do, so we left that in place, but we fixed the associated code to properly handle the data structure. The patch description provides more detail on the problem and the fix:

    Commit 219ca39427bf (“audit: use union for audit_field values since they are mutually exclusive”) combined a number of separate fields in the audit_field struct into a single union. Generally this worked just fine because they are generally mutually exclusive. Unfortunately in audit_data_to_entry() the overlap can be a problem when a specific error case is triggered that causes the error path code to attempt to cleanup an audit_field struct and the cleanup involves attempting to free a stored LSM string (the lsm_str field). Currently the code always has a non-NULL value in the audit_field.lsm_str field as the top of the for-loop transfers a value into audit_field.val (both .lsm_str and .val are part of the same union); if audit_data_to_entry() fails and the audit_field struct is specified to contain a LSM string, but the audit_field.lsm_str has not yet been properly set, the error handling code will attempt to free the bogus audit_field.lsm_str value that was set with audit_field.val at the top of the for-loop.

    This patch corrects this by ensuring that the audit_field.val is only set when needed (it is cleared when the audit_field struct is allocated with kcalloc()). It also corrects a few other issues to ensure that in case of error the proper error code is returned.
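The overlap described above is easy to reproduce outside the kernel. The following sketch uses a simplified stand-in for the union, with only the val and lsm_str members; the FieldValue class is a hypothetical model for illustration, not the real audit_field layout.

```python
import ctypes


class FieldValue(ctypes.Union):
    """Simplified stand-in for the union inside the audit_field struct."""
    _fields_ = [
        ("val", ctypes.c_uint32),      # integer value of the filter field
        ("lsm_str", ctypes.c_void_p),  # pointer to an LSM string
    ]


def lsm_str_looks_set(field):
    # A c_void_p member reads back as None when the pointer is NULL.
    return field.lsm_str is not None


fresh = FieldValue()      # zero-filled, much like memory from kcalloc()
tainted = FieldValue()
tainted.val = 1000        # an integer stored through the .val member

print(lsm_str_looks_set(fresh))    # the pointer member is still NULL
print(lsm_str_looks_set(tainted))  # the same bytes now look like a pointer
```

An error path that frees lsm_str whenever it is non-NULL would, in the second case, be freeing an integer reinterpreted as an address, which is the failure the fix avoids by only setting val when it is actually needed.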

  • Fixed a problem where we were not always properly checking the length of audit netlink messages before parsing the messages in the kernel.

  • Added some missing RCU annotations to internal audit data structures.
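The netlink length fix mentioned above comes down to never trusting a message's length field until it has been checked against the data actually received. The sketch below shows that general pattern from user space using the standard 16-byte netlink message header; the split_netlink_messages function is a hypothetical helper and is not a transcription of the kernel change.

```python
import struct

# struct nlmsghdr: length, type, flags, sequence number, port ID (16 bytes).
NLMSG_HDR = struct.Struct("=IHHII")


def split_netlink_messages(buf):
    """Return (type, payload) pairs, refusing to read past the buffer."""
    messages = []
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        if remaining < NLMSG_HDR.size:
            raise ValueError("truncated netlink header")
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        # The length must cover the header and must fit in what we received.
        if msg_len < NLMSG_HDR.size or msg_len > remaining:
            raise ValueError("netlink length field does not match the buffer")
        messages.append((msg_type, buf[offset + NLMSG_HDR.size:offset + msg_len]))
        # Messages are padded to 4-byte boundaries.
        offset += (msg_len + 3) & ~3
    return messages


payload = b"abcd"
good = NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), 1000, 0, 1, 0) + payload
print(split_netlink_messages(good))
```

A header that claims more bytes than were delivered is rejected before any payload parsing happens, which is the property the kernel fix restores for audit messages.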

Linux 5.5 Released

Linux v5.5 was released on Sunday, January 26, 2020; the SELinux and audit highlights are below:

SELinux

  • Add new SELinux access controls for the perf_event_open(2) syscall to control access to the performance monitoring subsystem in the kernel. A new SELinux object class with six new permissions was created for this purpose:
    perf_event { open cpu kernel tracepoint read write }
    

    The “cpu”, “kernel”, and “tracepoint” permissions correspond to the type of access being requested, while the “open” permission is described by this mailing list post. The “read” and “write” permissions are checked when I/O happens on the file descriptor returned by perf_event_open(2). In order to make use of the new controls some additional configuration is required as described by the patch author, Joel Fernandes:

    To use this patch, we set the perf_event_paranoid sysctl to -1 and then apply selinux checking as appropriate (default deny everything, and then add policy rules to give access to domains that need it). In the future we can remove the perf_event_paranoid sysctl altogether.
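A quick way to confirm the sysctl side of this setup is to read the current value back. The sketch below assumes the usual procfs location for the sysctl, /proc/sys/kernel/perf_event_paranoid; the paranoid_checks_relaxed helper is a hypothetical name, and the path is a parameter so the check can be pointed at a different file.

```python
from pathlib import Path

# Usual procfs location of the sysctl; treated as an assumption here.
PARANOID_SYSCTL = "/proc/sys/kernel/perf_event_paranoid"


def paranoid_checks_relaxed(sysctl_path=PARANOID_SYSCTL):
    """Return True when perf_event_paranoid is set to -1."""
    value = int(Path(sysctl_path).read_text().strip())
    return value == -1
```

Calling paranoid_checks_relaxed() on a running system returns True only when the sysctl has been set to -1 as the patch author describes; any other value means the sysctl is still restricting access in addition to whatever the SELinux policy allows.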

  • Add support for the “greatest lower bound” policy construct, which is defined as the intersection of the MLS ranges of two SELinux labels. The greatest lower bound is described by the patch author, Joshua Brindle:

    A policy developer can now specify glblub as a default_range default and the computed transition will be the intersection of the mls range of the two contexts.

    The glb (greatest lower bound) lub (lowest upper bound) of a range is calculated as the greater of the low sensitivities and the lower of the high sensitivities and the and of each category bitmap.

    This can be used by MLS solution developers to compute a context that satisfies, for example, the range of a network interface and the range of a user logging in.

    Some examples are:

    User Permitted Range    Network Device Label    Computed Label
    s0-s1:c0.c12            s0                      s0
    s0-s1:c0.c12            s0-s1:c0.c1023          s0-s1:c0.c12
    s0-s4:c0.c512           s1-s1:c0.c1023          s1-s1:c0.c512
    s0-s15:c0,c2            s4-s6:c0.c128           s4-s6:c0,c2
    s0-s4                   s2-s6                   s2-s4
    s0-s4                   s5-s8                   INVALID
    s5-s8                   s0-s4                   INVALID
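The computation described by the patch author can be sketched in a few lines. This is an illustrative model rather than the kernel's implementation: the Level and Range types and the parsing helpers are hypothetical, and the parser only understands the simple "sN" and "cN" forms used in the examples above.

```python
from typing import FrozenSet, NamedTuple, Optional


class Level(NamedTuple):
    sens: int
    cats: FrozenSet[int]


class Range(NamedTuple):
    low: Level
    high: Level


def parse_cats(text):
    """Parse a category list such as 'c0.c12' or 'c0,c2' into a set of ints."""
    cats = set()
    for part in text.split(","):
        if "." in part:
            first, last = part.split(".")
            cats.update(range(int(first[1:]), int(last[1:]) + 1))
        else:
            cats.add(int(part[1:]))
    return frozenset(cats)


def parse_level(text):
    sens, _, cats = text.partition(":")
    return Level(int(sens[1:]), parse_cats(cats) if cats else frozenset())


def parse_range(text):
    low, sep, high = text.partition("-")
    low_level = parse_level(low)
    # A single level such as 's0' is both the low and the high level.
    return Range(low_level, parse_level(high) if sep else low_level)


def glblub(a: Range, b: Range) -> Optional[Range]:
    """Greater of the lows, lower of the highs, AND of the category sets."""
    low = Level(max(a.low.sens, b.low.sens), a.low.cats & b.low.cats)
    high = Level(min(a.high.sens, b.high.sens), a.high.cats & b.high.cats)
    if low.sens > high.sens:
        return None  # the ranges do not overlap (INVALID in the examples)
    return Range(low, high)


result = glblub(parse_range("s0-s4"), parse_range("s2-s6"))
print(result.low.sens, result.high.sens)
print(glblub(parse_range("s0-s4"), parse_range("s5-s8")))
```

Returning None for non-overlapping ranges corresponds to the INVALID rows in the examples; a real policy would have to decide how to handle that case.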

  • Allow SELinux file labeling before the policy is loaded into the kernel. This should ease some of the burden when the policy is initially loaded as there is no longer a need to relabel files, as well as help enable new system concepts which dynamically create the root filesystem during boot in the initramfs.

  • Remove the size limit on SELinux policies; the limitation was a lingering vestige and is no longer necessary.

Audit

  • Allow for the auditing of suspicious O_CREAT usage via the new AUDIT_ANOM_CREAT record.

Linux 5.4 Released

Linux v5.4 was released on Monday, November 25, 2019; the SELinux and audit highlights are below:

SELinux

  • Add new SELinux access control hooks for dnotify, inotify, and fanotify. The patch author, Aaron Goidel, provided an excellent commit message describing the new controls:

    As of now, setting watches on filesystem objects has, at most, applied a check for read access to the inode, and in the case of fanotify, requires CAP_SYS_ADMIN. No specific security hook or permission check has been provided to control the setting of watches. Using any of inotify, dnotify, or fanotify, it is possible to observe, not only write-like operations, but even read access to a file. Modeling the watch as being merely a read from the file is insufficient for the needs of SELinux. This is due to the fact that read access should not necessarily imply access to information about when another process reads from a file. Furthermore, fanotify watches grant more power to an application in the form of permission events. While notification events are solely, unidirectional (i.e. they only pass information to the receiving application), permission events are blocking. Permission events make a request to the receiving application which will then reply with a decision as to whether or not that action may be completed. This causes the issue of the watching application having the ability to exercise control over the triggering process. Without drawing a distinction within the permission check, the ability to read would imply the greater ability to control an application. Additionally, mount and superblock watches apply to all files within the same mount or superblock. Read access to one file should not necessarily imply the ability to watch all files accessed within a given mount or superblock.

    In order to solve these issues, a new LSM hook is implemented and has been placed within the system calls for marking filesystem objects with inotify, fanotify, and dnotify watches. These calls to the hook are placed at the point at which the target path has been resolved and are provided with the path struct, the mask of requested notification events, and the type of object on which the mark is being set (inode, superblock, or mount). The mask and obj_type have already been translated into common FS_* values shared by the entirety of the fs notification infrastructure. The path struct is passed rather than just the inode so that the mount is available, particularly for mount watches. This also allows for use of the hook by pathname-based security modules. However, since the hook is intended for use even by inode based security modules, it is not placed under the CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security modules would need to enable all of the path hooks, even though they do not use any of them.

    This only provides a hook at the point of setting a watch, and presumes that permission to set a particular watch implies the ability to receive all notification about that object which match the mask. This is all that is required for SELinux. If other security modules require additional hooks or infrastructure to control delivery of notification, these can be added by them. It does not make sense for us to propose hooks for which we have no implementation. The understanding that all notifications received by the requesting application are all strictly of a type for which the application has been granted permission shows that this implementation is sufficient in its coverage.

    Security modules wishing to provide complete control over fanotify must also implement a security_file_open hook that validates that the access requested by the watching application is authorized. Fanotify has the issue that it returns a file descriptor with the file mode specified during fanotify_init() to the watching process on event. This is already covered by the LSM security_file_open hook if the security module implements checking of the requested file mode there. Otherwise, a watching process can obtain escalated access to a file for which it has not been authorized.

    The selinux_path_notify hook implementation works by adding five new file permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm (descriptions about which will follow), and one new filesystem permission: watch (which is applied to superblock checks). The hook then decides which subset of these permissions must be held by the requesting application based on the contents of the provided mask and the obj_type. The selinux_file_open hook already checks the requested file mode and therefore ensures that a watching process cannot escalate its access through fanotify.

    The watch, watch_mount, and watch_sb permissions are the baseline permissions for setting a watch on an object and each are a requirement for any watch to be set on a file, mount, or superblock respectively. It should be noted that having either of the other two permissions (watch_reads and watch_with_perm) does not imply the watch, watch_mount, or watch_sb permission. Superblock watches further require the filesystem watch permission to the superblock. As there is no labeled object in view for mounts, there is no specific check for mount watches beyond watch_mount to the inode. Such a check could be added in the future, if a suitable labeled object existed representing the mount.

    The watch_reads permission is required to receive notifications from read-exclusive events on filesystem objects. These events include accessing a file for the purpose of reading and closing a file which has been opened read-only. This distinction has been drawn in order to provide a direct indication in the policy for this otherwise not obvious capability. Read access to a file should not necessarily imply the ability to observe read events on a file.

    Finally, watch_with_perm only applies to fanotify masks since it is the only way to set a mask which allows for the blocking, permission event. This permission is needed for any watch which is of this type. Though fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit trust to root, which we do not do, and does not support least privilege.
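The way the hook derives the required permissions from the mask and object type can be modeled as follows. This is a sketch based only on the description above: the Event flags are hypothetical stand-ins for the kernel's FS_* values (their numeric values here are arbitrary), and required_permissions is an illustrative function, not the selinux_path_notify implementation.

```python
from enum import IntFlag, auto


class Event(IntFlag):
    """Hypothetical stand-ins for a few of the kernel's FS_* event flags."""
    ACCESS = auto()         # file was read
    MODIFY = auto()         # file was written
    CLOSE_NOWRITE = auto()  # read-only file was closed
    OPEN_PERM = auto()      # blocking permission event on open
    ACCESS_PERM = auto()    # blocking permission event on read


READ_EVENTS = Event.ACCESS | Event.CLOSE_NOWRITE
PERM_EVENTS = Event.OPEN_PERM | Event.ACCESS_PERM

# Baseline file permission for each kind of object being watched.
BASELINE = {"inode": "watch", "mount": "watch_mount", "superblock": "watch_sb"}


def required_permissions(mask, obj_type):
    """Return the file and filesystem permissions a watch would need."""
    file_perms = {BASELINE[obj_type]}
    if mask & READ_EVENTS:
        file_perms.add("watch_reads")
    if mask & PERM_EVENTS:
        file_perms.add("watch_with_perm")
    # Superblock watches additionally need the filesystem watch permission.
    fs_perms = {"watch"} if obj_type == "superblock" else set()
    return {"file": file_perms, "filesystem": fs_perms}


needed = required_permissions(Event.ACCESS | Event.OPEN_PERM, "superblock")
print(sorted(needed["file"]), sorted(needed["filesystem"]))
```

Note that the baseline permission is always included, which reflects the point that watch_reads and watch_with_perm never imply watch, watch_mount, or watch_sb on their own.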

  • Fix a potential leak of uninitialized kernel memory to userspace when viewing SELinux labels on objects.

  • Improve our network object labeling cache so that we always return the object’s label, even when under memory pressure. Previously we would return an error if we couldn’t allocate a new cache entry; now we always return the label even if we can’t create a new cache entry for it. This should result in fewer errors when applying SELinux security policy to network traffic on a heavily loaded system.

  • Improve the performance of the SELinux label database by carefully removing some of the locking while preserving the database integrity.

  • Fixed a few minor, lingering bugs from the ongoing LSM stacking effort.

  • A number of code cleanups.

Audit

  • Minor kernel internal changes related to filesystem locking.