The SELinux Notebook

Many of you reading this are likely already aware of “The SELinux Notebook” by Richard Haines. For those of you who have not seen it before, “The SELinux Notebook” is a comprehensive guide to SELinux that covers everything from the kernel up through the policy. It is an impressive work, made even more impressive by the fact that Richard has been kind enough to make it freely available.

In the latest 5th edition, Richard opened the book even more by converting it into Markdown and posting the book, in source form, on GitHub. This is a tremendous gift to the SELinux community, and one that I hope we will not squander. My hope is that we can turn “The SELinux Notebook” into a living document that is updated along with the code and the policies so that it continues to be the excellent resource that it is today.

If you are interested in contributing to “The SELinux Notebook”, there are some quick notes in the CONTRIBUTING.md file to help you get started with the project.

Linux 5.7 Released

Linux v5.7 was released on Sunday, May 31, 2020; the SELinux and audit highlights are below:

SELinux

  • Deprecate setting “/sys/fs/selinux/checkreqprot” to 1. This flag was originally created to deal with legacy userspace and the READ_IMPLIES_EXEC personality flag. We changed the default from 1 to 0 back in Linux v4.4, and now we are taking the next step of deprecating it; at some point in the future we will take the final step of rejecting 1.

  • Allow kernfs symlinks to inherit the SELinux label of the parent directory. In order to preserve backwards compatibility this is protected by the “genfs_seclabel_symlinks” SELinux policy capability.

  • Fix a problem where we were not properly handling multiple netlink messages in a single message buffer. Unfortunately this could cause some netlink messages to escape the SELinux access controls. This issue was assigned the CVE number CVE-2020-10751.

  • Enable per-file labeling for the BPF filesystem.

  • Improve how we handle initial SIDs in the kernel and remove a number that were unused.

  • Optimize how we store filename transitions in the kernel, resulting in some significant improvements to policy load times.

  • Improve how we calculate the sizes of the internal hash tables, which improves SELinux policy load times and likely general SELinux performance as well.

  • Ensure that we properly label NFS v4.2 filesystems to avoid a temporary unlabeled condition.

  • Add some missing XFS quota command types to the SELinux quota access controls.

  • Fix a problem where we were not properly handling all read operations in selinuxfs.

  • Convert several linked lists to arrays to help with performance and improve code simplicity.
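
As a rough illustration of the checkreqprot deprecation noted in the list above, the following Python sketch reads the flag and reports whether the deprecated value is still in use. The helper name `checkreqprot_status` and its return strings are invented for this example; only the selinuxfs path comes from the text, and the file is assumed to contain a bare "0" or "1".

```python
from pathlib import Path

# Path taken from the text above; it exists only when selinuxfs is mounted.
CHECKREQPROT = "/sys/fs/selinux/checkreqprot"


def checkreqprot_status(path=CHECKREQPROT):
    """Return a short description of the checkreqprot setting.

    Hypothetical helper for illustration; it only reads the file and
    never writes to it.
    """
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return "unavailable"  # SELinux disabled or selinuxfs not mounted
    if raw == "1":
        return "deprecated"   # the legacy value that is being phased out
    if raw == "0":
        return "ok"
    return "unknown"
```

Calling `checkreqprot_status()` with no argument inspects the live setting; passing another path is mainly useful for testing.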

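The netlink fix above (CVE-2020-10751) comes down to walking every message packed into a buffer rather than looking only at the first one. The sketch below shows that walk in userspace Python; it is not the kernel code. The header layout follows `struct nlmsghdr`, while the helper names (`iter_nlmsgs`, `build_nlmsg`, `buffer_allowed`) and the notion of an "allowed types" set are assumptions made purely for illustration.

```python
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32), in native byte order.
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(length):
    """Round length up to the netlink alignment boundary."""
    return (length + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def iter_nlmsgs(buf):
    """Yield (type, payload) for every netlink message in buf."""
    offset = 0
    while len(buf) - offset >= NLMSG_HDR.size:
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or length > len(buf) - offset:
            raise ValueError("malformed netlink message")
        yield msg_type, buf[offset + NLMSG_HDR.size:offset + length]
        offset += nlmsg_align(length)


def build_nlmsg(msg_type, payload=b""):
    """Build one padded netlink message (hypothetical test helper)."""
    length = NLMSG_HDR.size + len(payload)
    msg = NLMSG_HDR.pack(length, msg_type, 0, 0, 0) + payload
    return msg + b"\0" * (nlmsg_align(length) - length)


def buffer_allowed(buf, allowed_types):
    """Return True only if every message type in buf is in allowed_types."""
    return all(msg_type in allowed_types for msg_type, _ in iter_nlmsgs(buf))
```

A check that inspected only the first header would approve a buffer whose second message carries a type the caller is not permitted to send; `buffer_allowed()` avoids that by consulting every header.
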
Audit

  • Stop logging inode information when updating an audit file watch. Since we are not changing the inode, or the fact that we are watching the associated file, the inode information is just noise that we can do without.

  • Fix a problem where mandatory audit records were missing their accompanying audit records (e.g. SYSCALL records were missing). The missing records often meant that we didn’t have the necessary context to understand what was going on when the event occurred. [UPDATE August 4, 2020: this patch was reverted during the Linux v5.8-rcX phase due to problems; it should reappear at a later date]

  • Fix a problem where we were not properly checking the length of audit records generated by userspace programs allowed to submit audit records due to the CAP_AUDIT_WRITE capability.

Linux 5.6 Released

Linux v5.6 was released on Sunday, March 29, 2020; the SELinux and audit highlights are below:

SELinux

  • We’ve wanted to remove the CONFIG_SECURITY_SELINUX_DISABLE build option for some time, and in Linux v5.6 we are taking the first step by marking it deprecated as explained in the patch description:

    Deprecate the CONFIG_SECURITY_SELINUX_DISABLE functionality. The code was originally developed to make it easier for Linux distributions to support architectures where adding parameters to the kernel command line was difficult. Unfortunately, supporting runtime disable meant we had to make some security trade-offs when it came to the LSM hooks, as documented in the Kconfig help text:

    NOTE: selecting this option will disable the ‘__ro_after_init’ kernel hardening feature for security hooks. Please consider using the selinux=0 boot parameter instead of enabling this option.

    Fortunately it looks as if that the original motivation for the runtime disable functionality is gone, and Fedora/RHEL appears to be the only major distribution enabling this capability at build time so we are now taking steps to remove it entirely from the kernel. The first step is to mark the functionality as deprecated and print an error when it is used (what this patch is doing). As Fedora/RHEL makes progress in transitioning the distribution away from runtime disable, we will introduce follow-up patches over several kernel releases which will block for increasing periods of time when the runtime disable is used. Finally we will remove the option entirely once we believe all users have moved to the kernel cmdline approach.

  • Add new SELinux controls for the kernel lockdown functionality with a new object class, “lockdown”:
    lockdown { integrity confidentiality }
    

    Stephen Smalley, the patch author, provides a good explanation of the new controls in the patch description:

    Implement a SELinux hook for lockdown. If the lockdown module is also enabled, then a denial by the lockdown module will take precedence over SELinux, so SELinux can only further restrict lockdown decisions. The SELinux hook only distinguishes at the granularity of integrity versus confidentiality similar to the lockdown module, but includes the full lockdown reason as part of the audit record as a hint in diagnosing what triggered the denial. To support this auditing, move the lockdown_reasons[] string array from being private to the lockdown module to the security framework so that it can be used by the lsm audit code and so that it is always available even when the lockdown module is disabled.

    Note that the SELinux implementation allows the integrity and confidentiality reasons to be controlled independently from one another. Thus, in an SELinux policy, one could allow operations that specify an integrity reason while blocking operations that specify a confidentiality reason. The SELinux hook implementation is stricter than the lockdown module in validating the provided reason value.

    Sample AVC audit output from denials:

    avc:  denied  { integrity } for pid=3402 comm="fwupd"
    lockdown_reason="/dev/mem,kmem,port" scontext=system_u:system_r:fwupd_t:s0
    tcontext=system_u:system_r:fwupd_t:s0 tclass=lockdown permissive=0
    
    avc:  denied  { confidentiality } for pid=4628 comm="cp"
    lockdown_reason="/proc/kcore access"
    scontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
    tcontext=unconfined_u:unconfined_r:test_lockdown_integrity_t:s0-s0:c0.c1023
    tclass=lockdown permissive=0
    
  • Add a new access control for the move_mount(2) syscall which reuses the “file { mounton }” permission.

  • Enable SELinux per-file labeling for binderfs.

  • Add SELinux support for the new netlink VLAN configuration messages RTM_NEWVLAN, RTM_DELVLAN, and RTM_GETVLAN.

  • Improve how we store and access SELinux security labels within the kernel by caching the kernel’s SID-to-string security label translation. Those systems with a large number of MLS/MCS categories in use and applications which often query the kernel for SELinux labels should see the biggest improvement in performance.

  • Improve the LSM and SELinux kernel build definitions such that the LSM auditing and SELinux InfiniBand code is only built when needed by the kernel’s build time configuration.

  • The LSM stacking changes introduced a SELinux bug when SELinux is disabled at runtime. We attempt to fix this in Linux v5.6 by reordering the SELinux hooks in the LSM configuration such that the data structures are properly managed. The real fix for this is to eventually remove the runtime disable functionality (see the CONFIG_SECURITY_SELINUX_DISABLE deprecation notice above) and use the kernel command line to disable SELinux at boot (e.g. “selinux=0”) if desired.

  • Fix a problem in the SELinux access control enforcement cache where we were not always cleaning up properly on error. In some extreme cases this could effectively shrink the access control cache and impact performance.

  • Multiple key internal SELinux data structures were marked with “__randomize_layout” to help harden the SELinux code in the kernel.

  • Fix out-of-date references to the selinuxfs mount point in the kernel documentation. The kernel documentation should now correctly reference selinuxfs as being mounted on “/sys/fs/selinux”.

  • Several cleanups and smaller fixes relating to locking, inode auditing, and caching which should have little impact on most users but help improve the quality of the SELinux kernel code.
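
Since the list above recommends the kernel command line over runtime disable, a small Python sketch of how one might check for the “selinux=0” parameter follows. The function names are hypothetical, and treating the last “selinux=” occurrence as the effective one is an assumption of this sketch rather than something stated in the text.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line as a string (Linux only)."""
    with open(path) as fh:
        return fh.read()


def selinux_disabled_on_cmdline(cmdline):
    """Return True if the command line asks for SELinux to be disabled.

    Assumption: when "selinux=" appears more than once, the last
    occurrence is the one that counts.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("selinux="):
            value = token[len("selinux="):]
    return value == "0"
```

On a running system this could be used as `selinux_disabled_on_cmdline(read_cmdline())`.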

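The sample lockdown AVC denials quoted above can be picked apart with a few lines of Python, which may help when sorting integrity denials from confidentiality denials. This is a minimal sketch that assumes the record has the shape shown in the samples; `parse_avc` is a made-up helper, not part of any SELinux or audit library.

```python
import re
import shlex

# Matches "avc:  denied  { perm ... } for key=value ..." as in the samples.
AVC_RE = re.compile(
    r"avc:\s+(denied|granted)\s+\{\s*([^}]*?)\s*\}\s+for\s+(.*)", re.S
)


def parse_avc(record):
    """Parse one AVC record into a dict, or return None if it does not match."""
    match = AVC_RE.search(record)
    if match is None:
        return None
    result, perms, rest = match.groups()
    fields = {"result": result, "perms": perms.split()}
    # shlex keeps quoted values such as "/proc/kcore access" in one piece.
    for token in shlex.split(rest):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields
```

For the second sample record, `parse_avc(...)["lockdown_reason"]` yields the full quoted reason string, and the `perms` entry indicates whether the denial was for integrity or confidentiality.
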
Audit

  • Add support for auditing BPF program load and unload operations with the BPF specific program ID. Daniel Borkmann, the patch author, provides a good explanation, complete with an example, in the patch description:

    Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event.

    The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core.

    Raw example output:

    # auditctl -D
    # auditctl -a always,exit -F arch=x86_64 -S bpf
    # ausearch --start recent -m 1334
    ...
    ----
    time->Wed Nov 27 16:04:13 2019
    type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf"
    type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \
    success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477    \
    pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001    \
    egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf"                \
    exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf"                  \
    subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
    type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD
    ----
    time->Wed Nov 27 16:04:13 2019
    type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD
    ...
    
  • In Linux v3.18 the audit code was changed to convert one of the internal audit filter structures into a union, which unfortunately introduced a bug that syzbot only recently discovered. Converting the structure into a union was the right thing to do so we left that in place, but we fixed the associated code to properly handle the data structure. The patch description provides more detail on the problem and the fix:

    Commit 219ca39427bf (“audit: use union for audit_field values since they are mutually exclusive”) combined a number of separate fields in the audit_field struct into a single union. Generally this worked just fine because they are generally mutually exclusive. Unfortunately in audit_data_to_entry() the overlap can be a problem when a specific error case is triggered that causes the error path code to attempt to cleanup an audit_field struct and the cleanup involves attempting to free a stored LSM string (the lsm_str field). Currently the code always has a non-NULL value in the audit_field.lsm_str field as the top of the for-loop transfers a value into audit_field.val (both .lsm_str and .val are part of the same union); if audit_data_to_entry() fails and the audit_field struct is specified to contain a LSM string, but the audit_field.lsm_str has not yet been properly set, the error handling code will attempt to free the bogus audit_field.lsm_str value that was set with audit_field.val at the top of the for-loop.

    This patch corrects this by ensuring that the audit_field.val is only set when needed (it is cleared when the audit_field struct is allocated with kcalloc()). It also corrects a few other issues to ensure that in case of error the proper error code is returned.

  • Fixed a problem where we were not always properly checking the length of audit netlink messages before parsing the messages in the kernel.

  • Added some missing RCU annotations to internal audit data structures.
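
The BPF audit records in the raw example output above carry a prog-id and an op, which is enough to pair each load with its unload. The Python sketch below does that pairing over ausearch-style text. The helper names are invented for this example, and accepting a symbolic “BPF” type name alongside “UNKNOWN[1334]” is an assumption about newer audit userspace, not something shown in the text.

```python
import re

# Matches the BPF record format from the example output. The "BPF"
# alternative is an assumption; the example only shows UNKNOWN[1334].
BPF_RE = re.compile(
    r"type=(?:BPF|UNKNOWN\[1334\]) msg=audit\((\d+\.\d+):(\d+)\): "
    r"prog-id=(\d+) op=(LOAD|UNLOAD)"
)


def bpf_events(text):
    """Return (serial, prog_id, op) tuples in order of appearance."""
    return [
        (int(serial), int(prog_id), op)
        for _timestamp, serial, prog_id, op in BPF_RE.findall(text)
    ]


def bpf_lifetimes(text):
    """Map each prog-id to the audit event serials of its LOAD and UNLOAD."""
    lifetimes = {}
    for serial, prog_id, op in bpf_events(text):
        lifetimes.setdefault(prog_id, {})[op] = serial
    return lifetimes
```

Because the LOAD record shares its event serial with the SYSCALL record, the serial returned for a load can be used to look up which process created the program.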