Linux 5.18 Released

Linux v5.18 was released on Sunday, May 22nd; the SELinux and audit highlights are below:

SELinux

  • Add a new policy capability, “ioctl_skip_cloexec”, which allows the FIOCLEX and FIONCLEX ioctls independently of the loaded SELinux policy when enabled.

  • Implement the security_sctp_assoc_established() hook in SELinux to ensure that the SCTP peer labeling behavior is consistent on both the client and server side. The kernel’s SCTP documentation provides more information on the SCTP peer labeling behavior:

    An SCTP socket will only have one peer label assigned to it. This will be assigned during the establishment of the first association. Any further associations on this socket will have their packet peer label compared to the socket’s peer label, and only if they are different will the association permission be validated. This is validated by checking the socket peer sid against the received packets peer sid to determine whether the association should be allowed or denied.

  • Reworked how SELinux processes the filesystem mount contexts in an effort to simplify the kernel code and ensure that memory allocations are not attempted when it is inappropriate, e.g. when a spinlock is held. This work did introduce a new restriction: when using the new mount API, the SELinux policy must be loaded before filesystem contexts are passed to the API.

  • Add SELinux netlink message mappings for RTM_NEWTUNNEL, RTM_DELTUNNEL, RTM_GETTUNNEL, and RTM_SETSTATS. The new tunnel, delete tunnel, and hardware offload stat commands map to the “netlink_route_socket:nlmsg_write” permission while the get tunnel command maps to the “netlink_route_socket:nlmsg_read” permission.

  • Fixed problems in the error handling of the kernel’s SELinux policy loading code.

  • Fixed a problem with stacked LSMs when accessing a filesystem’s superblock.

  • More kernel internal variables and function parameters were marked as constant values to help prevent unintended modification in the SELinux kernel code.

  • Fixed a number of RCU variable marking mismatches.

  • Minor internal style, type casting, and dead code fixes.
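
For readers unfamiliar with the two ioctls named in the “ioctl_skip_cloexec” item above, the sketch below shows what they do from userspace: FIONCLEX clears a file descriptor's close-on-exec flag and FIOCLEX sets it. It is a minimal illustration, assuming a Linux system where Python's termios module exposes both constants; the helper names are hypothetical, and on a system whose SELinux policy denies these ioctls without the capability enabled the calls could fail with a permission error.

```python
import fcntl
import os
import termios


def cloexec_is_set(fd):
    """Return True if the close-on-exec flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def toggle_cloexec_via_ioctl():
    """Clear and then set close-on-exec using the FIONCLEX/FIOCLEX ioctls.

    Returns the flag state observed after each ioctl.
    """
    read_fd, write_fd = os.pipe()  # Python creates these with close-on-exec set
    try:
        fcntl.ioctl(read_fd, termios.FIONCLEX)  # clear close-on-exec
        after_clear = cloexec_is_set(read_fd)
        fcntl.ioctl(read_fd, termios.FIOCLEX)   # set close-on-exec again
        after_set = cloexec_is_set(read_fd)
        return after_clear, after_set
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(toggle_cloexec_via_ioctl())  # (False, True)
```

The same flag can be changed with fcntl(F_SETFD), which is one reason a policy author might choose to exempt these two ioctls from the ioctl permission check.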

Audit

  • Change how the “AUDIT_TIME_*” records are generated such that the time-related records are only generated when they are associated with a corresponding syscall. This should help reduce the time-related noise in the audit logs.

  • Fixed a problem where a task’s audit context might not be properly reset when using io_uring.

Linux 5.17 Released

Linux v5.17 was released on Sunday, March 20th; the SELinux and audit highlights are below:

SELinux

  • Fixed an improper mutex check in the SELinux code which could have resulted in spurious warning messages.

  • Fixed a problem where an internal policy structure field was not properly reset after freeing, potentially leading to a double-free problem on certain error conditions.

  • Internal hardening improvements relating to calculating memory allocation sizes by changing code to use the struct_size() macro.

  • Various “house cleaning” changes to the SELinux filesystem mount hooks: removing dead code, minor code tweaks, and plugging a potential memory leak.

  • Renamed an LSM/SELinux hook responsible for returning the security label of the currently running task to better reflect its behavior.
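
The hardening value of struct_size(), mentioned in the list above, comes from its arithmetic: it computes the size of a structure plus a trailing array of elements and saturates at SIZE_MAX if the calculation would overflow, so an oversized request fails at allocation time instead of silently wrapping to a small number. The sketch below models only that arithmetic in Python; the function names are hypothetical, a 64-bit size_t is assumed, and the real macro is C that derives the sizes from the structure type itself.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def naive_size(header_size, elem_size, count):
    """Open-coded size calculation that wraps around like C unsigned math."""
    return (header_size + elem_size * count) & SIZE_MAX


def saturating_struct_size(header_size, elem_size, count):
    """Model of struct_size(): same sum, but saturate instead of wrapping."""
    total = header_size + elem_size * count
    return total if total <= SIZE_MAX else SIZE_MAX


if __name__ == "__main__":
    # A sane request gives the same answer either way.
    print(naive_size(16, 8, 4), saturating_struct_size(16, 8, 4))  # 48 48
    # A huge count wraps the naive math down to just the header size,
    # while the saturating version reports an impossible size.
    huge = 2**61
    print(naive_size(16, 8, huge))                           # 16
    print(saturating_struct_size(16, 8, huge) == SIZE_MAX)   # True
```

An allocator handed SIZE_MAX will refuse the request, whereas the wrapped value of 16 would succeed and leave the caller writing far past the end of the buffer.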

Audit

  • Fix problems relating to record queuing and system responsiveness when “audit=1” is specified on the kernel command line and the audit daemon is SIGSTOP’d for an extended period of time.

  • Ensure that processes which generate userspace records are not exempt from the kernel’s record throttling when the audit queues are being overrun.

  • Fix a problem when auditing the openat2() syscall which could result in improperly accessing userspace memory.

  • Internal hardening improvements through the use of the struct_size() macro and zero-length array to flexible-array conversions.

Linux 5.16 Released

Linux v5.16 was released on Sunday, January 9th; the SELinux and audit highlights are below:

SELinux

  • Added SELinux access controls for io_uring. While io_uring provides an asynchronous I/O mechanism largely free of syscall overhead, its credential sharing functionality presents a challenge to SELinux as well as other LSMs. The new access controls in Linux v5.16 are designed to give SELinux policy developers the ability to restrict which domains are allowed to make use of these new credential override mechanisms. The commit description by Paul Moore (occasionally I still get to write kernel patches!) describes these controls in more detail:

    This patch implements two new io_uring access controls, specifically support for controlling the io_uring “personalities” and IORING_SETUP_SQPOLL. Controlling the sharing of io_urings themselves is handled via the normal file/inode labeling and sharing mechanisms.

    The io_uring { override_creds } permission restricts which domains the subject domain can use to override it’s own credentials. Granting a domain the io_uring { override_creds } permission allows it to impersonate another domain in io_uring operations.

    The io_uring { sqpoll } permission restricts which domains can create asynchronous io_uring polling threads. This is important from a security perspective as operations queued by this asynchronous thread inherit the credentials of the thread creator by default; if an io_uring is shared across process/domain boundaries this could result in one domain impersonating another. Controlling the creation of sqpoll threads, and the sharing of io_urings across processes, allow policy authors to restrict the ability of one domain to impersonate another via io_uring.

    As a quick summary, this patch adds a new object class with two permissions:

    io_uring { override_creds sqpoll }
    

    These permissions can be seen in the two simple policy statements below:

    allow domA_t domB_t : io_uring { override_creds };
    allow domA_t self : io_uring { sqpoll };
    
  • Unfortunately, the SELinux lockdown implementation was removed; the commit description provided by Paul Moore explains the reasoning behind this change:

    The original SELinux lockdown implementation in 59438b46471a (“security,lockdown,selinux: implement SELinux lockdown”) used the current task’s credentials as both the subject and object in the SELinux lockdown hook, selinux_lockdown(). Unfortunately that proved to be incorrect in a number of cases as the core kernel was calling the LSM lockdown hook in places where the credentials from the “current” task_struct were not the correct credentials to use in the SELinux access check.

    Attempts were made to resolve this by adding a credential pointer to the LSM lockdown hook as well as suggesting that the single hook be split into two: one for user tasks, one for kernel tasks; however neither approach was deemed acceptable by Linus. Faced with the prospect of either changing the subj/obj in the access check to a constant context (likely the kernel’s label) or removing the SELinux lockdown check entirely, the SELinux community decided that removing the lockdown check was preferable.

  • Enable SELinux genfscon policy support for securityfs, allowing for improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA.

  • A number of fixes relating to the SELinux binder access controls to help ensure that the proper credentials are used in access control decisions.

  • Minor rework of the SELinux SCTP access controls to move the labeling from the SCTP endpoints to the SCTP associations. This shouldn’t result in any user visible changes to the SELinux policy or how it is enforced in the kernel. Future kernel versions are expected to build on this and further improve the SELinux SCTP access controls.

  • A number of bug fixes within the kernel to fix problems relating to uninitialized stack variables, failed memory allocations, blocking in an improper context, and race conditions when object labels were used for the first time.

  • The SELinux per-packet access control implementation was simplified with the removal of some unneeded IPv6 wrapper functions.

  • A number of smaller issues were resolved so that the SELinux subsystem now compiles cleanly with the “W=1” build flag.
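
To make the two io_uring policy statements quoted above a little more concrete, the toy model below treats each allow rule as a (source, target, class, permissions) tuple and answers the question "may this source domain use this permission against this target domain?". It is a sketch for illustration only: the AllowRule type and is_allowed() function are hypothetical, and the model ignores attributes, constraints, MLS, and everything else the kernel's real access vector computation takes into account.

```python
from typing import NamedTuple


class AllowRule(NamedTuple):
    source: str       # subject domain
    target: str       # target domain; "self" means "same as source"
    tclass: str       # object class
    perms: frozenset  # permissions granted by this rule


# The two example rules from the commit description.
POLICY = [
    AllowRule("domA_t", "domB_t", "io_uring", frozenset({"override_creds"})),
    AllowRule("domA_t", "self", "io_uring", frozenset({"sqpoll"})),
]


def is_allowed(source, target, tclass, perm, policy=POLICY):
    """Return True if any rule grants perm for (source, target, tclass)."""
    for rule in policy:
        rule_target = rule.source if rule.target == "self" else rule.target
        if (rule.source, rule_target, rule.tclass) == (source, target, tclass):
            if perm in rule.perms:
                return True
    return False


if __name__ == "__main__":
    # domA_t may take on domB_t's credentials for io_uring operations...
    print(is_allowed("domA_t", "domB_t", "io_uring", "override_creds"))  # True
    # ...but the reverse was never granted.
    print(is_allowed("domB_t", "domA_t", "io_uring", "override_creds"))  # False
    # Only domA_t may create an SQPOLL polling thread.
    print(is_allowed("domA_t", "domA_t", "io_uring", "sqpoll"))          # True
    print(is_allowed("domB_t", "domB_t", "io_uring", "sqpoll"))          # False
```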

Audit

  • In addition to the new SELinux access controls for io_uring we also added audit functionality to record io_uring operations in the audit record stream. The io_uring operations are recorded using a new record, URINGOP/1336. An example, taken from the commit description, can be seen below:

    type=URINGOP msg=audit(1631800225.981:37289):
    uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681
    uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
    subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    key=(null)
    

    In the example above the io_uring operation is indicated by the “uring_op” field with the value of “19” indicating a “close” operation. As io_uring operations can be dispatched from several different contexts, it is possible to see an URINGOP record as part of a syscall event as well as a standalone record; in both cases it can be accompanied by various other records, such as PATH records, depending on the operation. Filtering on URINGOP and “uring_op” is also possible using the new AUDIT_FILTER_URING_EXIT audit filter table; see the audit userspace tools for more information.

  • Two new audit records were added for the device mapper subsystem. The records capture information about the creation and destruction of devices as well as any anomalous events such as integrity violations. Michael Weiß provides more information in his commit description (NOTE: corrections have been made to the text below):

    To be able to send auditing events to user space, we introduce a generic dm-audit module. It provides helper functions to emit audit events through the kernel audit subsystem. We claim the AUDIT_DM_CTRL type=1338 and AUDIT_DM_EVENT type=1339 out of the audit event messages range in the corresponding userspace api in ‘include/uapi/linux/audit.h’ for those events.

    AUDIT_DM_CTRL is used to provide information about creation and destruction of device mapper targets which are triggered by user space admin control actions. AUDIT_DM_EVENT is used to provide information about actual errors during operation of the mapped device, showing e.g. integrity violations in audit log.

    Following commits to device mapper targets actually will make use of this to emit those events in relevant cases.

    The audit logs look like this if executing the following simple test:

    # dd if=/dev/zero of=test.img bs=1M count=1024
    # losetup -f test.img
    # integritysetup -vD format --integrity sha256 -t 32 /dev/loop0
    # integritysetup open -D /dev/loop0 --integrity sha256 integritytest
    # integritysetup status integritytest
    # integritysetup close integritytest
    # integritysetup open -D /dev/loop0 --integrity sha256 integritytest
    # integritysetup status integritytest
    # dd if=/dev/urandom of=/dev/loop0 bs=512 count=1 seek=100000
    # dd if=/dev/mapper/integritytest of=/dev/null
    

    audit.log from auditd

    type=UNKNOWN[1338] msg=audit(1630425039.363:184): module=integrity
    op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    type=UNKNOWN[1338] msg=audit(1630425039.471:185): module=integrity
    op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    type=UNKNOWN[1338] msg=audit(1630425039.611:186): module=integrity
    op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    type=UNKNOWN[1338] msg=audit(1630425054.475:187): module=integrity
    op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    
    type=UNKNOWN[1338] msg=audit(1630425073.171:191): module=integrity
    op=ctr ppid=3807 pid=3883 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    
    type=UNKNOWN[1338] msg=audit(1630425087.239:192): module=integrity
    op=dtr ppid=3807 pid=3902 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    
    type=UNKNOWN[1338] msg=audit(1630425093.755:193): module=integrity
    op=ctr ppid=3807 pid=3906 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0
    egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup"
    exe="/sbin/integritysetup" subj==unconfined dev=254:3
    error_msg='success' res=1
    
    type=UNKNOWN[1339] msg=audit(1630425112.119:194): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:195): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:196): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:197): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:198): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:199): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:200): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:201): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:202): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    type=UNKNOWN[1339] msg=audit(1630425112.119:203): module=integrity
    op=integrity-checksum dev=254:3 sector=77480 res=0
    
  • The openat2(2) syscall “how” information is now recorded in an OPENAT2 record, as explained by Richard Guy Briggs in the commit description:

    Since the openat2(2) syscall uses a struct open_how pointer to communicate its parameters they are not usefully recorded by the audit SYSCALL record’s four existing arguments.

    Add a new audit record type OPENAT2 that reports the parameters in its third argument, struct open_how with fields oflag, mode and resolve.

    The new record in the context of an event would look like:

    time->Wed Mar 17 16:28:53 2021
    type=PROCTITLE msg=audit(1616012933.531:184): proctitle=
      73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D
      7465737473756974652D737641440066696C652D6F70656E617432
    type=PATH msg=audit(1616012933.531:184): item=1 name="file-openat2"
      inode=29 dev=00:1f mode=0100600 ouid=0 ogid=0 rdev=00:00
      obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
      cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
    type=PATH msg=audit(1616012933.531:184):
      item=0 name="/root/rgb/git/audit-testsuite/tests"
      inode=25 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00
      obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT
      cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
    type=CWD msg=audit(1616012933.531:184):
      cwd="/root/rgb/git/audit-testsuite/tests"
    type=OPENAT2 msg=audit(1616012933.531:184):
      oflag=0100302 mode=0600 resolve=0xa
    type=SYSCALL msg=audit(1616012933.531:184): arch=c000003e syscall=437
      success=yes exit=4 a0=3 a1=7ffe315f1c53 a2=7ffe315f1550 a3=18
      items=2 ppid=528 pid=540 auid=0 uid=0 gid=0 euid=0 suid=0
      fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="openat2"
      exe="/root/rgb/git/audit-testsuite/tests/syscalls_file/openat2"
      subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
      key="testsuite-1616012933-bjAUcEPO"
    
  • Changes to the audit record queuing mechanism in the kernel to improve the robustness of the queues, especially in the face of a SIGSTOP’d audit daemon. Paul Moore’s patch description provides more information:

    If the audit daemon were ever to get stuck in a stopped state the kernel’s kauditd_thread() could get blocked attempting to send audit records to the userspace audit daemon. With the kernel thread blocked it is possible that the audit queue could grow unbounded as certain audit record generating events must be exempt from the queue limits else the system enter a deadlock state.

    This patch resolves this problem by lowering the kernel thread’s socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks the kauditd_send_queue() function to better manage the various audit queues when connection problems occur between the kernel and the audit daemon. With this patch, the backlog may temporarily grow beyond the defined limits when the audit daemon is stopped and the system is under heavy audit pressure, but kauditd_thread() will continue to make progress and drain the queues as it would for other connection problems. For example, with the audit daemon put into a stopped state and the system configured to audit every syscall it was still possible to shutdown the system without a kernel panic, deadlock, etc.; granted, the system was slow to shutdown but that is to be expected given the extreme pressure of recording every syscall.

  • A minor performance and correctness fix to the audit filtering engine to skip filter rules with a lower priority.
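
The new URINGOP and OPENAT2 records shown above are ordinary key=value audit records, so they can be picked apart with very little code. The sketch below is a rough illustration rather than a replacement for the audit userspace tools: parse_record(), decode_resolve(), and decode_proctitle() are hypothetical helpers, the sample strings are the records from the examples above joined onto single lines, and the RESOLVE_* values are assumed to match the kernel's include/uapi/linux/openat2.h header.

```python
import re

# Bit values assumed from include/uapi/linux/openat2.h.
RESOLVE_FLAGS = {
    0x01: "RESOLVE_NO_XDEV",
    0x02: "RESOLVE_NO_MAGICLINKS",
    0x04: "RESOLVE_NO_SYMLINKS",
    0x08: "RESOLVE_BENEATH",
    0x10: "RESOLVE_IN_ROOT",
}

URINGOP_RECORD = (
    "type=URINGOP msg=audit(1631800225.981:37289): "
    "uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681 "
    "uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 "
    "subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)"
)
OPENAT2_RECORD = (
    "type=OPENAT2 msg=audit(1616012933.531:184): "
    "oflag=0100302 mode=0600 resolve=0xa"
)
PROCTITLE_HEX = (
    "73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D"
    "7465737473756974652D737641440066696C652D6F70656E617432"
)


def parse_record(text):
    """Split one audit record into a dict of its key=value fields."""
    fields = {}
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', text):
        fields[key] = value.strip('"')
    return fields


def decode_resolve(value):
    """Turn the OPENAT2 "resolve" field into a list of RESOLVE_* names."""
    bits = int(value, 16)
    return [name for bit, name in RESOLVE_FLAGS.items() if bits & bit]


def decode_proctitle(hex_text):
    """Decode a hex-encoded PROCTITLE value into its NUL-separated arguments."""
    return [arg.decode() for arg in bytes.fromhex(hex_text).split(b"\0")]


if __name__ == "__main__":
    print(parse_record(URINGOP_RECORD)["uring_op"])  # 19
    print(decode_resolve(parse_record(OPENAT2_RECORD)["resolve"]))
    # ['RESOLVE_NO_MAGICLINKS', 'RESOLVE_BENEATH']
    print(decode_proctitle(PROCTITLE_HEX))
    # ['syscalls_file/openat2', '/tmp/audit-testsuite-svAD', 'file-openat2']
```

For real log analysis the audit userspace tools remain the better choice, as they understand field encodings and multi-record events that this regular expression does not.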