Linux 5.16 Released
10 Jan 2022 tags: audit selinux
Linux v5.16 was released on Sunday, January 9th; the SELinux and audit highlights are below:
SELinux
- Added SELinux access controls for io_uring. While io_uring provides an asynchronous I/O mechanism largely free of syscall overhead, its credential sharing functionality presents a challenge to SELinux as well as other LSMs. The new access controls in Linux v5.16 are designed to give SELinux policy developers the ability to restrict which domains are allowed to make use of these new credential override mechanisms. The commit description by Paul Moore (occasionally I still get to write kernel patches!) describes these controls in more detail:
This patch implements two new io_uring access controls, specifically support for controlling the io_uring “personalities” and IORING_SETUP_SQPOLL. Controlling the sharing of io_urings themselves is handled via the normal file/inode labeling and sharing mechanisms.
The io_uring { override_creds } permission restricts which domains the subject domain can use to override its own credentials. Granting a domain the io_uring { override_creds } permission allows it to impersonate another domain in io_uring operations.
The io_uring { sqpoll } permission restricts which domains can create asynchronous io_uring polling threads. This is important from a security perspective as operations queued by this asynchronous thread inherit the credentials of the thread creator by default; if an io_uring is shared across process/domain boundaries this could result in one domain impersonating another. Controlling the creation of sqpoll threads, and the sharing of io_urings across processes, allow policy authors to restrict the ability of one domain to impersonate another via io_uring.
As a quick summary, this patch adds a new object class with two permissions:
io_uring { override_creds sqpoll }
These permissions can be seen in the two simple policy statements below:
allow domA_t domB_t : io_uring { override_creds };
allow domA_t self : io_uring { sqpoll };
- Unfortunately, the SELinux lockdown implementation was removed; the commit description provided by Paul Moore explains the reasoning behind this change:
The original SELinux lockdown implementation in 59438b46471a (“security,lockdown,selinux: implement SELinux lockdown”) used the current task’s credentials as both the subject and object in the SELinux lockdown hook, selinux_lockdown(). Unfortunately that proved to be incorrect in a number of cases as the core kernel was calling the LSM lockdown hook in places where the credentials from the “current” task_struct were not the correct credentials to use in the SELinux access check.
Attempts were made to resolve this by adding a credential pointer to the LSM lockdown hook as well as suggesting that the single hook be split into two: one for user tasks, one for kernel tasks; however neither approach was deemed acceptable by Linus. Faced with the prospect of either changing the subj/obj in the access check to a constant context (likely the kernel’s label) or removing the SELinux lockdown check entirely, the SELinux community decided that removing the lockdown check was preferable.
- Enable SELinux genfscon policy support for securityfs, allowing for improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA.
- A number of fixes relating to the SELinux binder access controls to help ensure that the proper credentials are used in access control decisions.
- Minor rework of the SELinux SCTP access controls to move the labeling from the SCTP endpoints to the SCTP associations. This shouldn’t result in any user visible changes to the SELinux policy or how it is enforced in the kernel. Future kernel versions are expected to build on this and further improve the SELinux SCTP access controls.
- A number of bug fixes within the kernel to fix problems relating to uninitialized stack variables, failed memory allocations, blocking in an improper context, and race conditions when object labels were used for the first time.
- The SELinux per-packet access control implementation was simplified with the removal of some unneeded IPv6 wrapper functions.
- A number of smaller issues were resolved so that the SELinux subsystem now compiles cleanly with the “W=1” build flag.
Audit
- In addition to the new SELinux access controls for io_uring we also added audit functionality to record io_uring operations in the audit record stream. The io_uring operations are recorded using a new record, URINGOP/1336. An example, taken from the commit description, can be seen below:
type=URINGOP msg=audit(1631800225.981:37289): uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
In the example above the io_uring operation is indicated by the “uring_op” field with the value of “19” indicating a “close” operation. As io_uring operations can be dispatched from several different contexts, it is possible to see an URINGOP record as part of a syscall event as well as a standalone record; in both cases it can be accompanied by various other records, such as PATH records, depending on the operation. Filtering on URINGOP and “uring_op” is also possible using the new AUDIT_FILTER_URING_EXIT audit filter table; see the audit userspace tools for more information.
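As a rough illustration of reading such a record, the sketch below splits the URINGOP example into its key=value fields and maps the numeric "uring_op" value to a name. The opcode table is deliberately partial: the 19/close pairing comes from the text above, while the other two entries are assumed from include/uapi/linux/io_uring.h. The function names are hypothetical and are not part of the audit userspace tools.

```python
import re

# Partial opcode table. 19 -> close is confirmed by the text above; the
# other entries are assumptions taken from include/uapi/linux/io_uring.h.
URING_OP_NAMES = {0: "nop", 18: "openat", 19: "close"}

# key=value pairs; a value is either quoted or a run of non-space characters.
FIELD_RE = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|\S+)""")


def parse_record(line):
    """Return the key=value fields of a single audit record as a dict."""
    return {key: value.strip("\"'") for key, value in FIELD_RE.findall(line)}


def describe_uring_op(record):
    """Map the numeric uring_op field to a name, if the table knows it."""
    opcode = int(record["uring_op"])
    return URING_OP_NAMES.get(opcode, f"unknown({opcode})")


SAMPLE = (
    "type=URINGOP msg=audit(1631800225.981:37289): uring_op=19 success=yes "
    "exit=0 items=0 ppid=15454 pid=15681 uid=0 gid=0 euid=0 suid=0 fsuid=0 "
    "egid=0 sgid=0 fsgid=0 "
    "subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)"
)

record = parse_record(SAMPLE)
print(record["type"], describe_uring_op(record), record["success"])
# prints: URINGOP close yes
```

For real log analysis the audit userspace tools remain the better choice; this only shows how the fields in the example line up.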
- Two new audit records were added for the device mapper subsystem. The records capture information about the creation and destruction of devices as well as any anomalous events such as integrity violations. Michael Weiß provides more information in his commit description (NOTE: corrections have been made to the text below):
To be able to send auditing events to user space, we introduce a generic dm-audit module. It provides helper functions to emit audit events through the kernel audit subsystem. We claim the AUDIT_DM_CTRL type=1338 and AUDIT_DM_EVENT type=1339 out of the audit event messages range in the corresponding userspace api in ‘include/uapi/linux/audit.h’ for those events.
AUDIT_DM_CTRL is used to provide information about creation and destruction of device mapper targets which are triggered by user space admin control actions. AUDIT_DM_EVENT is used to provide information about actual errors during operation of the mapped device, showing e.g. integrity violations in audit log.
Following commits to device mapper targets actually will make use of this to emit those events in relevant cases.
The audit logs look like this if executing the following simple test:
# dd if=/dev/zero of=test.img bs=1M count=1024
# losetup -f test.img
# integritysetup -vD format --integrity sha256 -t 32 /dev/loop0
# integritysetup open -D /dev/loop0 --integrity sha256 integritytest
# integritysetup status integritytest
# integritysetup close integritytest
# integritysetup open -D /dev/loop0 --integrity sha256 integritytest
# integritysetup status integritytest
# dd if=/dev/urandom of=/dev/loop0 bs=512 count=1 seek=100000
# dd if=/dev/mapper/integritytest of=/dev/null
audit.log from auditd
type=UNKNOWN[1338] msg=audit(1630425039.363:184): module=integrity op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425039.471:185): module=integrity op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425039.611:186): module=integrity op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425054.475:187): module=integrity op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425073.171:191): module=integrity op=ctr ppid=3807 pid=3883 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425087.239:192): module=integrity op=dtr ppid=3807 pid=3902 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1338] msg=audit(1630425093.755:193): module=integrity op=ctr ppid=3807 pid=3906 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1
type=UNKNOWN[1339] msg=audit(1630425112.119:194): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:195): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:196): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:197): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:198): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:199): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:200): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:201): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:202): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
type=UNKNOWN[1339] msg=audit(1630425112.119:203): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0
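The records above are shown as UNKNOWN[1338] and UNKNOWN[1339]; per the commit description these numbers correspond to AUDIT_DM_CTRL and AUDIT_DM_EVENT. A minimal sketch of tallying such records by type and "op" might look like the following. The sample lines are shortened copies of the records above (several fields dropped for brevity), and the function name is hypothetical.

```python
import re
from collections import Counter

# Record numbers and names as given in the commit description.
DM_RECORD_NAMES = {1338: "AUDIT_DM_CTRL", 1339: "AUDIT_DM_EVENT"}

TYPE_RE = re.compile(r"type=UNKNOWN\[(\d+)\]")
OP_RE = re.compile(r"\bop=(\S+)")


def summarize_dm_records(lines):
    """Count device mapper audit records by (record name, op)."""
    counts = Counter()
    for line in lines:
        type_match = TYPE_RE.search(line)
        op_match = OP_RE.search(line)
        if not type_match or not op_match:
            continue  # not a record this sketch understands
        name = DM_RECORD_NAMES.get(int(type_match.group(1)))
        if name is None:
            continue  # some other record type
        counts[(name, op_match.group(1))] += 1
    return counts


# Shortened sample records; the real ones carry many more fields.
SAMPLE_LINES = [
    "type=UNKNOWN[1338] msg=audit(1630425039.363:184): module=integrity op=ctr dev=254:3 res=1",
    "type=UNKNOWN[1338] msg=audit(1630425039.471:185): module=integrity op=dtr dev=254:3 res=1",
    "type=UNKNOWN[1339] msg=audit(1630425112.119:194): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0",
]

for (name, op), count in sorted(summarize_dm_records(SAMPLE_LINES).items()):
    print(name, op, count)
```

Here "ctr" and "dtr" are the target creation and destruction events, and "integrity-checksum" is the anomalous event triggered by the final dd read in the test.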
- The openat2(2) syscall “how” information is now recorded in an OPENAT2 record, as explained by Richard Guy Briggs in the commit description:
Since the openat2(2) syscall uses a struct open_how pointer to communicate its parameters they are not usefully recorded by the audit SYSCALL record’s four existing arguments.
Add a new audit record type OPENAT2 that reports the parameters in its third argument, struct open_how with fields oflag, mode and resolve.
The new record in the context of an event would look like:
time->Wed Mar 17 16:28:53 2021
type=PROCTITLE msg=audit(1616012933.531:184): proctitle=73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D7465737473756974652D737641440066696C652D6F70656E617432
type=PATH msg=audit(1616012933.531:184): item=1 name="file-openat2" inode=29 dev=00:1f mode=0100600 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1616012933.531:184): item=0 name="/root/rgb/git/audit-testsuite/tests" inode=25 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1616012933.531:184): cwd="/root/rgb/git/audit-testsuite/tests"
type=OPENAT2 msg=audit(1616012933.531:184): oflag=0100302 mode=0600 resolve=0xa
type=SYSCALL msg=audit(1616012933.531:184): arch=c000003e syscall=437 success=yes exit=4 a0=3 a1=7ffe315f1c53 a2=7ffe315f1550 a3=18 items=2 ppid=528 pid=540 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="openat2" exe="/root/rgb/git/audit-testsuite/tests/syscalls_file/openat2" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="testsuite-1616012933-bjAUcEPO"
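To make the OPENAT2 fields a little more concrete, here is a small sketch that decodes the "oflag" and "resolve" values from the event above, along with the hex-encoded PROCTITLE. The flag tables are assumptions on my part: the RESOLVE_* values are taken from include/uapi/linux/openat2.h and the O_* values are the x86-64 ones from the generic fcntl header (the SYSCALL record shows arch=c000003e). The helper names are hypothetical.

```python
# Assumed from include/uapi/linux/openat2.h.
RESOLVE_FLAGS = {
    0x01: "RESOLVE_NO_XDEV",
    0x02: "RESOLVE_NO_MAGICLINKS",
    0x04: "RESOLVE_NO_SYMLINKS",
    0x08: "RESOLVE_BENEATH",
    0x10: "RESOLVE_IN_ROOT",
    0x20: "RESOLVE_CACHED",
}

# Assumed x86-64 values; only a handful of flags are listed here.
OPEN_FLAGS = {
    0o100: "O_CREAT",
    0o200: "O_EXCL",
    0o1000: "O_TRUNC",
    0o2000: "O_APPEND",
    0o100000: "O_LARGEFILE",
}
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}


def decode_bits(value, table):
    """Name every known bit set in value; report leftover bits as hex."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    known = 0
    for bit in table:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(hex(leftover))
    return names


def decode_oflag(text):
    """Decode the octal oflag field into an access mode plus flag names."""
    value = int(text, 8)
    access = ACCESS_MODES.get(value & 0o3, "O_ACCMODE?")
    return [access] + decode_bits(value & ~0o3, OPEN_FLAGS)


def decode_resolve(text):
    """Decode the hexadecimal resolve field into RESOLVE_* names."""
    return decode_bits(int(text, 16), RESOLVE_FLAGS)


def decode_proctitle(hex_text):
    """PROCTITLE is hex-encoded argv with NUL separators between arguments."""
    return [arg.decode() for arg in bytes.fromhex(hex_text).split(b"\x00")]


PROCTITLE_HEX = (
    "73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D"
    "7465737473756974652D737641440066696C652D6F70656E617432"
)

print(decode_oflag("0100302"))
# prints: ['O_RDWR', 'O_CREAT', 'O_EXCL', 'O_LARGEFILE']
print(decode_resolve("0xa"))
# prints: ['RESOLVE_NO_MAGICLINKS', 'RESOLVE_BENEATH']
print(decode_proctitle(PROCTITLE_HEX))
# prints: ['syscalls_file/openat2', '/tmp/audit-testsuite-svAD', 'file-openat2']
```

The decoded values are consistent with the rest of the event: the PATH record for "file-openat2" has nametype=CREATE, which matches the O_CREAT flag in "oflag".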
- Changes to the audit record queuing mechanism in the kernel to improve the robustness of the queues, especially in the face of a SIGSTOP’d audit daemon. Paul Moore’s patch description provides more information:
If the audit daemon were ever to get stuck in a stopped state the kernel’s kauditd_thread() could get blocked attempting to send audit records to the userspace audit daemon. With the kernel thread blocked it is possible that the audit queue could grow unbounded as certain audit record generating events must be exempt from the queue limits else the system enter a deadlock state.
This patch resolves this problem by lowering the kernel thread’s socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks the kauditd_send_queue() function to better manage the various audit queues when connection problems occur between the kernel and the audit daemon. With this patch, the backlog may temporarily grow beyond the defined limits when the audit daemon is stopped and the system is under heavy audit pressure, but kauditd_thread() will continue to make progress and drain the queues as it would for other connection problems. For example, with the audit daemon put into a stopped state and the system configured to audit every syscall it was still possible to shutdown the system without a kernel panic, deadlock, etc.; granted, the system was slow to shutdown but that is to be expected given the extreme pressure of recording every syscall.
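The core idea, a sender that waits a bounded amount of time instead of blocking forever, can be sketched in userspace terms. This is only a toy model under stated assumptions and does not mirror the kernel code: a bounded queue.Queue stands in for the socket buffer of a stopped audit daemon, the "hold" list is a hypothetical stand-in for the kernel's secondary queues, and the 0.1 second timeout reflects the fact that HZ/10 jiffies is one tenth of a second.

```python
import queue

# HZ jiffies make up one second, so HZ/10 jiffies is 0.1 seconds.
SEND_TIMEOUT_SECONDS = 0.1


def send_all(records, socket_buffer, hold):
    """Try to send every record, giving up on the first send that times out.

    Unsent records are moved to the hold list so the caller keeps making
    progress instead of blocking forever on a receiver that never reads.
    """
    sent = 0
    for index, record in enumerate(records):
        try:
            socket_buffer.put(record, timeout=SEND_TIMEOUT_SECONDS)
            sent += 1
        except queue.Full:
            # The receiver is not draining its buffer; park the rest.
            hold.extend(records[index:])
            break
    return sent


# A "stopped daemon": nothing ever reads from this two-slot buffer.
stopped_daemon_buffer = queue.Queue(maxsize=2)
held_records = []
sent_count = send_all(["r1", "r2", "r3", "r4"], stopped_daemon_buffer, held_records)
print(sent_count, held_records)
# prints: 2 ['r3', 'r4']
```

With an unbounded wait (the equivalent of MAX_SCHEDULE_TIMEOUT) the third put() would never return; with the short timeout the sender notices the problem and can handle it like any other connection failure.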
- A minor performance and correctness fix to the audit filtering engine to skip filter rules with a lower priority.