Linux 5.9 Released

Linux v5.9 was released on Sunday, October 11, 2020; the SELinux and audit highlights are below:

SELinux

  • Allowed reading SELinux labels before the policy is loaded, which enables some more “exotic” initramfs approaches, as described by the patch author, Jonathan Lebon:

    This patch does for ‘getxattr’ what commit 3e3e24b42043 (“selinux: allow labeling before policy is loaded”) did for ‘setxattr’; it allows querying the current SELinux label on disk before the policy is loaded.

    One of the motivations described in that commit message also drives this patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be able to move the root filesystem for example, from xfs to ext4 on RAID, on first boot, at initrd time.

    Because such an operation works at the filesystem level, we need to be able to read the SELinux labels first from the original root, and apply them to the files of the new root. The previous commit enabled the second part of this process; this commit enables the first part.
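As a minimal userspace sketch of the "read the label first" step: on Linux the SELinux label is stored in the `security.selinux` extended attribute, which Python exposes through `os.getxattr`. The helper name `read_selinux_label` is hypothetical, and treating "no attribute" and "xattrs not supported" as "no label" is an assumption made for this example, not something the patch prescribes.

```python
import errno
import os

# Extended attribute that holds the SELinux label on disk.
SELINUX_XATTR = "security.selinux"


def read_selinux_label(path):
    """Return the on-disk SELinux label of `path`, or None if it has none.

    Hypothetical helper for illustration only (Linux-specific).
    """
    try:
        raw = os.getxattr(path, SELINUX_XATTR, follow_symlinks=False)
    except OSError as exc:
        # ENODATA: the file carries no label; ENOTSUP: the filesystem
        # does not support this attribute. Both are treated as "no label".
        if exc.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
    # The stored value is usually NUL-terminated.
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(read_selinux_label("/"))
```

Applying a label that was read this way to a file on the new root would go through the matching `os.setxattr` call with the same attribute name.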

  • Improved the performance of the SELinux policy symbol table by reworking the insert and search functions. The patch author, Ondrej Mosnacek, described the impact of the changes in the commit description:

    With this patch, I measured a speed up in the following areas (measured on x86_64 F32 VM with 4 CPUs):

    1. Policy load (‘load_policy’) - takes ~150 ms instead of ~230 ms.
    2. ‘chcon -R unconfined_u:object_r:user_tmp_t:s0:c381,c519 /tmp/linux-src’ where /tmp/linux-src is an extracted linux-5.7 source tarball - takes ~522 ms instead of ~576 ms. This is because of many symtab_search() calls in string_to_context_struct() when there are many categories specified in the context.
    3. ‘stress-ng --msg 1 --msg-ops 10000000’ - takes 12.41 s instead of 13.95 s (consumes 18.6 s of kernel CPU time instead of 21.6 s). This is thanks to security_transition_sid() being ~43% faster after this patch.
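To put the three measurements on a common scale, the short sketch below converts each before/after pair quoted above into a relative reduction in time. The input numbers are the approximate values from the commit description; the labels and the helper name `reduction_percent` are made up for this example.

```python
def reduction_percent(before, after):
    """Relative reduction in elapsed time, as a percentage of `before`."""
    return (before - after) / before * 100.0


# (before, after) pairs taken from the commit description quoted above.
measurements = {
    "load_policy (ms)": (230, 150),
    "chcon -R (ms)": (576, 522),
    "stress-ng wall time (s)": (13.95, 12.41),
    "stress-ng kernel CPU time (s)": (21.6, 18.6),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% less time")
```

Since the source values are themselves rounded ("~150 ms", "~230 ms"), the resulting percentages should be read as rough figures.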
  • Added support for the CAP_CHECKPOINT_RESTORE capability in the “capability2” object class as “checkpoint_restore”.

  • Fixed a problem where error messages were not properly logged when the required “process” object class, “transition” permission, or “dyntransition” permission were missing from the policy being loaded into the kernel.

  • Fixed some problems with initial SIDs and the script-generated SELinux MDP policy.

Audit

  • Audit records are now generated for nftables configuration change events using the NETFILTER_CFG record, with the “table” field carrying the nftables name and handle information, as seen in this example record provided by the patch author, Richard Guy Briggs:
    type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) :
      table=firewalld:1;filter_FORWARD:85 family=inet entries=101
      op=nft_register_rule pid=396 subj=system_u:system_r:firewalld_t:s0
      comm=firewalld
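A rough sketch of how a log consumer might pull the fields out of the example record above. It assumes that fields are whitespace-separated `key=value` pairs and that the “table” field is a `;`-separated list of `name:handle` pairs, which is how the example reads; the function names are hypothetical and this is not an official audit parsing API.

```python
import re

# The example record from above, joined into a single string.
RECORD = (
    "type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : "
    "table=firewalld:1;filter_FORWARD:85 family=inet entries=101 "
    "op=nft_register_rule pid=396 subj=system_u:system_r:firewalld_t:s0 "
    "comm=firewalld"
)

# A value is either the parenthesised audit(...) stamp, which contains a
# space, or a run of non-whitespace characters.
FIELD_RE = re.compile(r"(\w+)=(audit\([^)]*\)|\S+)")


def parse_record(record):
    """Split an audit record into a dict of key/value strings."""
    return dict(FIELD_RE.findall(record))


def split_table_field(value):
    """Split 'name:handle;name:handle' into a list of (name, handle) pairs."""
    pairs = []
    for part in value.split(";"):
        name, handle = part.rsplit(":", 1)
        pairs.append((name, int(handle)))
    return pairs


fields = parse_record(RECORD)
print(fields["op"], split_table_field(fields["table"]))
```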
    
  • Added a new backlog wait metric to the audit status message; it is intended to help admins determine how long processes have been waiting for the audit backlog queue to clear. The patch author, Max Englander, describes this in more detail in the commit description:

    In environments where the preservation of audit events and predictable usage of system memory are prioritized, admins may use a combination of --backlog_wait_time and -b options at the risk of degraded performance resulting from backlog waiting. In some cases, this risk may be preferred to lost events or unbounded memory usage. Ideally, this risk can be mitigated by making adjustments when backlog waiting is detected.

    However, detection can be difficult using the currently available metrics. For example, an admin attempting to debug degraded performance may falsely believe a full backlog indicates backlog waiting. It may turn out the backlog frequently fills up but drains quickly.

    To make it easier to reliably track degraded performance to backlog waiting, this patch makes the following changes:

    Add a new field backlog_wait_time_total to the audit status reply. Initialize this field to zero. Add to this field the total time spent by the current task on scheduled timeouts while the backlog limit is exceeded. Reset field to zero upon request via AUDIT_SET.
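The accumulate-and-reset behaviour described in the commit message can be illustrated with a toy model. This is not kernel code: the class, its method names, and the unitless time values are assumptions for this example. It only shows why a counter of time actually spent waiting says more than the backlog depth alone.

```python
class BacklogWaitMetric:
    """Toy model of the counter described above (not the kernel code)."""

    def __init__(self):
        # "Initialize this field to zero."
        self.backlog_wait_time_total = 0

    def record(self, backlog, backlog_limit, time_waited):
        """Account for one task; time only counts while over the limit."""
        if backlog > backlog_limit:
            self.backlog_wait_time_total += time_waited

    def reset(self):
        """Models a reset requested via AUDIT_SET."""
        self.backlog_wait_time_total = 0


metric = BacklogWaitMetric()

# The backlog fills up to the limit but drains quickly: nothing waits.
metric.record(backlog=64, backlog_limit=64, time_waited=0)
print(metric.backlog_wait_time_total)

# The limit is exceeded and two tasks sleep on the backlog.
metric.record(backlog=80, backlog_limit=64, time_waited=30)
metric.record(backlog=70, backlog_limit=64, time_waited=12)
print(metric.backlog_wait_time_total)
```

In this model a backlog that merely reaches its limit leaves the counter untouched, which mirrors the commit's point that a full backlog does not by itself indicate backlog waiting.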

  • Fixed a problem where the LSM_AUDIT_DATA_* records were not causing the CWD record to be generated. Prior to this fix, administrators could find it difficult to piece together a complete audit event in some situations.

  • Several small internal kernel fixes and removal of old, outdated code.

Linux 5.8 Released

Linux v5.8 was released on Sunday, August 2, 2020; the SELinux and audit highlights are below:

SELinux

  • Added support for a new SELinux policy version, version 33, which allows for a more space efficient way of storing the filename transitions in the binary policy. Given the default Fedora SELinux policy with the unconfined module enabled, this change drops the policy size from ~7.6MB to ~3.3MB, with policy load times dropping as well.

  • A number of improvements to various SELinux internal kernel data structures to help improve performance and simplify the code. The role transitions were moved into a hash table, and we shifted from hashing the rendered SELinux label string to hashing the context structure itself, when it is valid.

  • Support was added for the new CAP_PERFMON and CAP_BPF capabilities in the “capability2” object class.

  • Several fixes for bugs found by the Clang Static Analyzer; these resolve potential double-free conditions and undefined return values.

  • Some fixes to the error handling code in the policy parser to properly return error codes when things go wrong.

  • Internal changes to the LSM hook responsible for ensuring that the LSM credentials are set correctly for processes when they are executed.

  • Changes to the LSM/SELinux hooks for the kernel keyring.

Audit

  • Binding and unbinding to the audit multicast socket now generates audit records. This is intended to help administrators identify which processes have, or had, access to the information in the audit record stream.

  • Some of the audit error handling was improved to remove the potential for leaking network namespace references in the kernel.

  • The netfilter configuration records were cleaned up and additional information was added to the records.

  • Sadly, the commit that helped enable better support for accompanying records, which was merged for the Linux v5.7 release, had to be reverted due to problems with the implementation. I expect this to come back at a later date once the code is improved.

Libseccomp 2.5.0 Released

On behalf of the libseccomp project I would like to announce libseccomp v2.5.0!

The libseccomp v2.5.0 release is backwards compatible with previous v2.x releases and is a drop-in replacement; no recompilation of applications is required. Applications will need to be restarted to take advantage of the new libseccomp release. While the v2.4.x release stream will be supported for at least one more maintenance release, all users and distributions are encouraged to upgrade to libseccomp v2.5.0.

The core libseccomp library is the work of 56 contributors, and this release is a significant upgrade over the libseccomp v2.4.x release stream. The v2.5.0 release brings new support for RISC-V and seccomp user notifications along with a number of bug fixes and performance improvements. A more detailed list of changes can be seen below:

  • Add support for the seccomp user notifications, see the seccomp_notify_alloc(3), seccomp_notify_receive(3), and seccomp_notify_respond(3) manpages for more information
  • Add support for new filter optimization approaches, including a balanced tree optimization, see the SCMP_FLTATR_CTL_OPTIMIZE filter attribute for more information
  • Add support for the 64-bit RISC-V architecture
  • Performance improvements when adding new rules to a filter thanks to the use of internal shadow transactions and improved syscall lookup tables
  • Properly document the libseccomp API return values and include them in the stable API promise
  • Improvements to the s390 and s390x multiplexed syscall handling
  • Multiple fixes and improvements to the libseccomp manpages
  • Moved from manually maintained syscall tables to an automatically generated syscall table in CSV format
  • Update the syscall tables to Linux v5.8.0-rc5
  • Python bindings and build now default to Python 3.x
  • Improvements to the tests have boosted code coverage to over 93%
  • Enable Travis CI testing on the aarch64 and ppc64le architectures
  • Add code inspection via lgtm.com
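As a small, hedged illustration of the new filter optimization attribute, the sketch below uses the libseccomp Python bindings to build a filter, select the binary tree optimization (value 2 of SCMP_FLTATR_CTL_OPTIMIZE, exposed in the bindings as `Attr.CTL_OPTIMIZE`), and dump the result in PFC form. The filter is never loaded into the kernel. The function name, the choice of `ptrace` as the blocked syscall, and the fallback when the `seccomp` module is not installed are all choices made for this example.

```python
import errno
import tempfile


def build_filter_pfc():
    """Build a small filter and return its PFC dump.

    Returns None when the libseccomp Python bindings are not installed.
    """
    try:
        import seccomp  # libseccomp Python bindings; may be missing
    except ImportError:
        return None

    # Allow everything by default; the filter is only exported, not loaded.
    flt = seccomp.SyscallFilter(seccomp.ALLOW)
    # 2 selects the binary tree optimization added in v2.5.0.
    flt.set_attr(seccomp.Attr.CTL_OPTIMIZE, 2)
    # Illustrative rule: make ptrace(2) fail with EPERM.
    flt.add_rule(seccomp.ERRNO(errno.EPERM), "ptrace")

    # export_pfc() writes through the file descriptor of the given file.
    with tempfile.TemporaryFile() as out:
        flt.export_pfc(out)
        out.seek(0)
        return out.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_filter_pfc())
```

With only one rule the optimization level makes no practical difference; the binary tree approach is aimed at filters with many syscall rules.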