This story continues the last one, in which we discussed Process Capability Sets in detail. Some of you may be wondering how these capability sets are determined for, or applied to, unprivileged and privileged program binaries. This article is for you.

Before I go into the details of process creation mechanics and Linux capabilities, I'd like to go over two key concepts.

Capability-Aware Applications

Capability-aware applications inspect and manipulate their capability sets with system calls (capget, capset, prctl) after they are loaded. At some point during execution, an application may no longer need certain capabilities. It can then drop them from its effective set to limit its exposure to privileged operations. As long as a capability remains in its permitted set, the application can always raise it back into its effective set.

e.g., runc, ping
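Here is a toy model in POSIX shell that makes the permitted/effective relationship concrete. It does not call capset(2). It only mimics the kernel rule with bitmasks, using the kernel's bit numbers 12 and 13 for cap_net_admin and cap_net_raw. The drop_effective and raise_effective helpers are hypothetical names chosen for illustration.

```sh
#!/bin/sh
# Toy model (not a capset(2) wrapper): a thread may put a capability into
# its effective set only while that capability is in its permitted set.
CAP_NET_ADMIN=12
CAP_NET_RAW=13

permitted=$(( (1 << CAP_NET_ADMIN) | (1 << CAP_NET_RAW) ))
effective=$permitted

# Drop capability bit $1 from the effective set (always allowed).
drop_effective() {
    effective=$(( effective & ~(1 << $1) ))
}

# Raise capability bit $1 into the effective set; fails unless it is permitted.
raise_effective() {
    if [ $(( permitted & (1 << $1) )) -ne 0 ]; then
        effective=$(( effective | (1 << $1) ))
        return 0
    fi
    return 1
}

drop_effective "$CAP_NET_RAW"
printf 'after drop:  effective=0x%x\n' "$effective"
raise_effective "$CAP_NET_RAW"
printf 'after raise: effective=0x%x\n' "$effective"
if ! raise_effective 0; then
    echo 'cap_chown (bit 0) is not permitted, so it cannot be raised'
fi
```

A real capability-aware program performs these steps with capset(2), for example through libcap's cap_set_proc(). The shell version only shows which transitions the kernel allows.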

Capability-Dumb Applications

These applications make no system calls (such as capset) to modify their capabilities. They depend on the capability sets inherited from the parent and constructed when the application is loaded. In other words, they rely on their effective capability set to do their job.

e.g., cat, ls
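A capability-dumb binary succeeds only if the capability it needs is already in its effective set when it starts. The small POSIX shell sketch below tests one bit of a hex mask in the format that /proc/<pid>/status uses for CapEff. Bit 13 is cap_net_raw, and has_cap is a hypothetical helper name.

```sh
#!/bin/sh
# Succeeds if capability bit $2 is set in hex mask $1, where $1 is written
# the way /proc/<pid>/status prints it (16 hex digits, no 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_RAW=13
for mask in 0000000000000000 0000000000002000; do
    if has_cap "$mask" "$CAP_NET_RAW"; then
        echo "CapEff $mask: cap_net_raw is effective"
    else
        echo "CapEff $mask: cap_net_raw is not effective"
    fi
done
```

On a live system you could check the current shell with `has_cap "$(awk '/^CapEff:/ {print $2}' /proc/$$/status)" 13`.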


Unprivileged Program Binary

An unprivileged program binary is an executable with no file capabilities attached. When we load an unprivileged program binary (e.g., ls, cat), two things determine the thread's capabilities after execve(2): the capability sets of the calling thread (the parent) and the file's setuid bit.

For an unprivileged program binary, the ambient capabilities are critical. Consider a non-root thread executing a binary that has neither file capabilities nor a setuid bit: the ambient set is the only way for it to carry capabilities across execve(2).

Let's have a look at how capability sets are determined for an Unprivileged Program Binary after execve(2) under certain conditions.

Capabilities Transition

Explanation

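Ambient capabilities normally come from the bounding set. A capability must be in both the permitted and inheritable sets before it can be raised in the ambient set. A capability that is not already inheritable can be added to the inheritable set only if it is in the bounding set.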
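The sketch below models the two kernel rules behind that statement as plain bitmask checks, following capabilities(7):

- capset(2) lets a thread add a capability to its inheritable set only if the capability is already inheritable or present in the bounding set.
- prctl(PR_CAP_AMBIENT_RAISE) accepts only a capability that is both permitted and inheritable.

This is a simplified model under those assumptions. It ignores the extra CAP_SETPCAP rule and securebits, and the helper names are hypothetical.

```sh
#!/bin/sh
# Toy model of two kernel checks described in capabilities(7):
#  1. capset(2): a capability can be added to the inheritable set only if
#     it is already inheritable or present in the bounding set.
#  2. prctl(PR_CAP_AMBIENT_RAISE): a capability can become ambient only
#     if it is in both the permitted and the inheritable set.
CAP_NET_RAW=13

can_add_inheritable() {   # $1=cap bit, $2=inheritable mask, $3=bounding mask
    [ $(( ($2 | $3) & (1 << $1) )) -ne 0 ]
}

can_raise_ambient() {     # $1=cap bit, $2=permitted mask, $3=inheritable mask
    [ $(( $2 & $3 & (1 << $1) )) -ne 0 ]
}

perm=$(( 1 << CAP_NET_RAW ))

for bnd in $(( 1 << CAP_NET_RAW )) 0; do
    inh=0
    if can_add_inheritable "$CAP_NET_RAW" "$inh" "$bnd"; then
        inh=$(( inh | (1 << CAP_NET_RAW) ))
    fi
    if can_raise_ambient "$CAP_NET_RAW" "$perm" "$inh"; then
        printf 'bounding=0x%x: cap_net_raw can become ambient\n' "$bnd"
    else
        printf 'bounding=0x%x: cap_net_raw cannot become ambient\n' "$bnd"
    fi
done
```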

Use Case #1: Unprivileged Bash Process

An unprivileged user (bash process) uses the ping executable to ping a local server.

Criteria:

Schematic Diagram

Prepare The Environment

# File Ownership: setuid bit != set && owner == root
$ ls -la ping_clone
-rwxr-xr-x ... root root ... ping_clone

# Parent Process: Unprivileged bash process that runs with no
# or limited capabilities
$ capsh --print 
Current: =
Bounding set =cap_chown,cap_dac_override, .....
 .....
uid=1000(ubuntu)
gid=1000(ubuntu)


# Executable Binary: Unprivileged ping binary
$ getcap ping_clone

Demo #1: Using capsh Utility

Use the capsh utility to bootstrap an unprivileged bash process and then ping a local server.

$ sudo capsh --caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--keep=1 --user=ubuntu --addamb="cap_net_admin,cap_net_raw" --print -- -c "./ping_clone -c 1 localhost"
Current: = cap_setgid,cap_setuid,cap_setpcap,cap_net_admin,cap_net_raw+p 
Bounding set = cap_chown,cap_dac_override,cap_dac_read_search,
    cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,
    cap_setpcap,cap_linux_immutable,cap_net_bind_service,
    cap_net_broadcast,cap_net_admin,cap_net_raw,
    cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,
    cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,
    cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,
    cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,
    cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,
    cap_syslog,35,36,37 
Securebits: 020/0x10/5'b10000 
 secure-noroot: no (unlocked) 
 secure-no-suid-fixup: no (unlocked) 
 secure-keep-caps: yes (unlocked) 
uid=1000(ubuntu) 
gid=1000(ubuntu) 
groups=4(adm),10(wheel),190(systemd-journal),991(docker),1000(ubuntu) 
PING localhost (127.0.0.1) 56(84) bytes of data. 
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=255 time=0.033 ms
--- localhost ping statistics --- 
1 packets transmitted, 1 received, 0% packet loss, time 0ms 
rtt min/avg/max/mdev = 0.033/0.033/0.033/0.000 ms

So what's going on here? Let's have a look:
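Here is a minimal sketch of the transition in Demo #1. It assumes the execve(2) rules from capabilities(7) for a non-root user running a binary that has neither file capabilities nor a setuid bit. It also assumes both capabilities are in the shell's inheritable set as well as its ambient set, since the kernel only allows ambient capabilities that are also inheritable. The function name is hypothetical.

```sh
#!/bin/sh
# Capability sets after execve(2) of a binary with no file capabilities and
# no setuid/setgid bit, run by a non-root user (rules from capabilities(7)).
CAP_NET_ADMIN=12
CAP_NET_RAW=13

exec_without_file_caps() {   # $1=P(inheritable) $2=P(ambient) $3=P(bounding)
    f_inh=0; f_perm=0                  # no file capabilities
    new_amb=$2                         # ambient survives: file is not privileged
    new_perm=$(( ($1 & f_inh) | (f_perm & $3) | new_amb ))
    new_eff=$new_amb                   # file effective bit is clear
}

# Demo #1: capsh --addamb="cap_net_admin,cap_net_raw"
amb=$(( (1 << CAP_NET_ADMIN) | (1 << CAP_NET_RAW) ))
exec_without_file_caps "$amb" "$amb" $(( (1 << 38) - 1 ))
printf 'CapPrm: %016x\nCapEff: %016x\nCapAmb: %016x\n' "$new_perm" "$new_eff" "$new_amb"
```

Every file set is empty, so ping_clone's permitted and effective sets end up equal to the ambient set that capsh prepared. That is why the ping succeeds without file capabilities or a setuid bit.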

Demo #2: Using setpriv Utility

You may need to install the setpriv utility.

$ sudo apt install setpriv

We'll use the setpriv utility to run the ping_clone binary as an unprivileged user.

$ sudo setpriv --inh-caps '-all,+net_raw' \
--bounding-set '-all,+net_raw' \
--reuid=ubuntu \
--ambient-caps='+net_raw' \
./ping_clone -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.019 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms

When the --ambient-caps argument isn't supplied, ping_clone fails with 'socket: Operation not permitted'.

So, what exactly is going on here? Let me clarify.
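The same rules apply to the setpriv demo. ping_clone has no file capabilities, so its new permitted and effective sets are just the ambient set it inherits. The loop below is a toy model in POSIX shell that compares the run with --ambient-caps='+net_raw' against the run without it.

```sh
#!/bin/sh
# setpriv demo: inheritable and bounding sets are {cap_net_raw}, UID is ubuntu.
# ping_clone has no file capabilities, so by the capabilities(7) rules
# P'(permitted) = (P(inh) & 0) | (0 & P(bnd)) | P(ambient) = P(ambient)
# and, with the file effective bit clear, P'(effective) = P(ambient).
CAP_NET_RAW=13
net_raw=$(( 1 << CAP_NET_RAW ))

for amb in "$net_raw" 0; do     # with and without --ambient-caps='+net_raw'
    new_perm=$amb
    new_eff=$amb
    if [ $(( new_eff & net_raw )) -ne 0 ]; then
        result='cap_net_raw is effective'
    else
        result="no cap_net_raw ('socket: Operation not permitted')"
    fi
    printf 'ambient=0x%x -> permitted=0x%x effective=0x%x: %s\n' \
        "$amb" "$new_perm" "$new_eff" "$result"
done
```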


Use Case #2: Privileged Bash Process

A privileged user (bash process) pings a local server using an unprivileged ping binary.

Criteria

Schematic Diagram

Prepare The Environment

# File Ownership: setuid bit != set && owner == root.
$ ls -la ping_clone
-rwxr-xr-x ... root root ... ping_clone

# Parent Process: Privileged bash process runs with full capabilities.
$ capsh --print 
Current: = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, ..... 
Bounding set = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, .....
 .....
uid=0(root) 
gid=0(root) 
...

# Executable Binary: Unprivileged ping binary (file capabilities aren't set).
$ getcap ping_clone

Capabilities Transition

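When you log in as root, your Effective User ID is 0, and you can do (nearly) anything you want on the system.

Logging in as root explains everything here.

With an Effective User ID of 0, the bash process is a privileged process. On Linux, the kernel still performs capability checks for root. What makes root powerful is how execve(2) treats it:

- When a program is executed with a real or effective UID of 0, the kernel treats the file's inheritable and permitted sets as all ones.
- With an effective UID of 0, it also treats the file effective bit as set.

The new program therefore starts with every capability in its bounding set (plus any in its inheritable set) in both its permitted and effective sets.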
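The root special case can also be sketched as bitmask arithmetic. This is a simplified model of the rules in capabilities(7). It assumes the SECBIT_NOROOT securebit is not set and the binary has no file capabilities. exec_as is a hypothetical helper, and the 38-bit ALL mask mirrors the CapBnd value (0000003fffffffff) that appears in the /proc output in this article.

```sh
#!/bin/sh
# Toy model of the root special case in capabilities(7), for a binary with
# no file capabilities and with the SECBIT_NOROOT securebit not set:
#  - real or effective UID 0: file inheritable/permitted sets count as all ones;
#  - effective UID 0: the file effective bit counts as set.
ALL=$(( (1 << 38) - 1 ))   # bits 0..37, i.e. CapBnd 0000003fffffffff

exec_as() {   # $1=real UID $2=effective UID $3=P(inh) $4=P(amb) $5=P(bnd)
    f_inh=0; f_perm=0; f_eff=0
    if [ "$1" -eq 0 ] || [ "$2" -eq 0 ]; then
        f_inh=$ALL; f_perm=$ALL
    fi
    if [ "$2" -eq 0 ]; then
        f_eff=1
    fi
    new_perm=$(( ($3 & f_inh) | (f_perm & $5) | $4 ))
    if [ "$f_eff" -eq 1 ]; then new_eff=$new_perm; else new_eff=$4; fi
}

exec_as 0 0 0 0 "$ALL"
printf 'root:   CapPrm %016x CapEff %016x\n' "$new_perm" "$new_eff"
exec_as 1000 1000 0 0 "$ALL"
printf 'ubuntu: CapPrm %016x CapEff %016x\n' "$new_perm" "$new_eff"
```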


Use Case #3: Special Permissions (SUID, SGID)

Set User ID (setuid) and Set Group ID (setgid) are special permissions for executable files.

When these permissions are set on an executable, the process that runs it assumes the privileges of the file's owner (setuid) or group (setgid).

In other words, the setuid bit changes a program's effective UID (EUID) upon execution.

Criteria:

Schematic Diagram

Prepare The Environment

# File Ownership: setuid bit == set && owner == root.
$ ls -la ping_clone
-rwsr-xr-x ... root root ... ping_clone

# Parent Process: Unprivileged bash process (no or limited capabilities).
$ capsh --print 
Current: =
Bounding set =cap_chown,cap_dac_override, .....
 .....
uid=1000(ubuntu)
gid=1000(ubuntu)
...

# Executable Binary: Unprivileged ping binary. (file capabilities aren't set).
$ getcap ping_clone
# setuid bit set
$ ls -la
...
-rwsr-xr-x ... root root ... ping_clone

Capabilities Transition

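Suppose a non-root user executes ping_clone, which is owned by root and has the setuid bit set. The process starts in the root user context (EUID = 0) and stays there until the program itself changes its effective UID during execution. Because the EUID is 0 at execve(2), the root rules from Use Case #2 apply, so ping_clone starts with a full permitted set.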

~$ ping_clone localhost &
[1] 31994
~# PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.027 ms

~$ cat /proc/31994/status
Name:   ping_clone
...
...
Uid:    1000    1000    0       1000
Gid:    1000    1000    1000    1000
...
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
...
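The Cap* fields are hexadecimal bitmaps with one bit per capability. libcap's capsh can translate them with its --decode option. The POSIX shell sketch below does the same by hand, assuming the bit order printed in the capsh bounding sets above (cap_chown is bit 0, cap_net_raw is bit 13). decode_caps is a hypothetical helper.

```sh
#!/bin/sh
# Bit order assumed from capsh's bounding-set listing:
# bit 0 = chown ... bit 37 = audit_read.
CAP_NAMES='chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read'

# Print the capabilities set in hex mask $1 (no 0x prefix, as in /proc/<pid>/status).
# Bits without a name in CAP_NAMES are printed as plain numbers, as capsh does.
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    out=''
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    while [ "$bit" -lt 64 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out $bit"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "${out# }"
}

printf 'CapPrm 0000000000003000: %s\n' "$(decode_caps 0000000000003000)"
printf 'CapEff 0000000000000000: %s\n' "$(decode_caps 0000000000000000)"
```

Running it on the masks above shows that ping_clone kept only cap_net_admin and cap_net_raw in its permitted set, while its effective set is empty.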

So, what's going on here?

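The Uid line lists the real, effective, saved, and filesystem UIDs. By the time we inspect it, ping_clone has already switched its effective UID back to 1000 while keeping 0 as its saved UID. It keeps only cap_net_admin and cap_net_raw in its permitted set (0000000000003000) and has an empty effective set. Let's look at ping_clone from the perspective of system calls. Remember that it is a capability-aware application that can change its capabilities programmatically.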

Take a look at the output of the strace tracing tool.


Privileged Program Binary

A privileged program binary is an executable that has capabilities assigned to it (file capabilities). When we load a privileged program binary (e.g., ping_clone), the file's capability sets play a significant role in determining the thread's capabilities after execve(2).

Use the getcap utility to determine whether a program binary is privileged.

Capabilities Transition

Explanation
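Here are the general execve(2) rules for file capabilities from capabilities(7), written as a toy model in POSIX shell. The model assumes a non-root caller (so the root special case does not apply) and a binary without setuid/setgid bits. exec_caps and its argument order are hypothetical. The usage lines show one consequence worth remembering: executing a binary that has file capabilities clears the ambient set.

```sh
#!/bin/sh
# Toy model of the execve(2) capability rules from capabilities(7) for a
# non-root caller executing a binary that is NOT setuid/setgid:
#   P'(ambient)     = (file has capabilities) ? 0 : P(ambient)
#   P'(permitted)   = (P(inh) & F(inh)) | (F(perm) & P(bnd)) | P'(ambient)
#   P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)
#   P'(inheritable) = P(inheritable)
exec_caps() {  # $1=P(inh) $2=P(amb) $3=P(bnd) $4=F(inh) $5=F(perm) $6=F(eff: 0|1)
    new_amb=$2
    if [ $(( $4 | $5 )) -ne 0 ] || [ "$6" -eq 1 ]; then
        new_amb=0    # executing a privileged binary clears the ambient set
    fi
    new_perm=$(( ($1 & $4) | ($5 & $3) | new_amb ))
    if [ "$6" -eq 1 ]; then new_eff=$new_perm; else new_eff=$new_amb; fi
    new_inh=$1
}

NET_RAW=$(( 1 << 13 ))
CHOWN=$(( 1 << 0 ))
ALL=$(( (1 << 38) - 1 ))
# Caller: non-root shell carrying cap_net_raw in its inheritable and ambient sets.
exec_caps "$NET_RAW" "$NET_RAW" "$ALL" 0 0 0          # binary without file caps
printf 'no file caps: perm=0x%x eff=0x%x amb=0x%x\n' "$new_perm" "$new_eff" "$new_amb"
exec_caps "$NET_RAW" "$NET_RAW" "$ALL" 0 "$CHOWN" 1   # binary with cap_chown+ep
printf 'cap_chown+ep: perm=0x%x eff=0x%x amb=0x%x\n' "$new_perm" "$new_eff" "$new_amb"
```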

Use Case #1: Unprivileged Bash Process

An unprivileged user (bash process) pings a local server using a privileged ping binary.

Criteria:

Schematic Diagram

Prepare The Environment

# setuid bit != set && owner != root
$ ls -la ping_clone
-rwxr-xr-x ... ubuntu ubuntu ... ping_clone

# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+i

# Unprivileged User
$ capsh --print 
Current: =
Bounding set =cap_chown,cap_dac_override, .....
 .....
uid=1000(ubuntu)
gid=1000(ubuntu)
...

Example #1: When File Inheritable Set is set

Condition: make sure that ping_clone has cap_net_raw in its file inheritable set.

Terminal 1

# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+i

$ sudo capsh \
--caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--keep=1 --user=ubuntu --inh="cap_net_raw" \
--print -- -c "./ping_clone localhost"
Current: = cap_net_raw+ip cap_setgid,cap_setuid,cap_setpcap,cap_net_admin+p
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 020/0x10/5'b10000
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: yes (unlocked)
uid=1000(ubuntu)
gid=1000(ubuntu)
groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023

Terminal 2

$ cat /proc/4696/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000002000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

So what's going on here? Let's have a look
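Plugging the Example #1 values into the rules from capabilities(7) reproduces the CapPrm and CapEff lines from Terminal 2. This is a minimal POSIX shell sketch. It assumes the caller's inheritable set holds cap_net_raw (from capsh --inh) and its ambient set is empty.

```sh
#!/bin/sh
# Example #1 inputs (assumed from the capsh command and getcap output):
#   P(inheritable) = {cap_net_raw}   (capsh --inh="cap_net_raw")
#   P(ambient)     = {}              (cleared anyway: the file has capabilities)
#   P(bounding)    = full            (CapBnd 0000003fffffffff)
#   F(inheritable) = {cap_net_raw}, F(permitted) = {}, file effective bit clear
CAP_NET_RAW=13
p_inh=$(( 1 << CAP_NET_RAW ))
p_bnd=$(( (1 << 38) - 1 ))
f_inh=$(( 1 << CAP_NET_RAW ))
f_perm=0
new_amb=0

new_perm=$(( (p_inh & f_inh) | (f_perm & p_bnd) | new_amb ))
new_eff=$new_amb    # file effective bit clear: effective = ambient = empty
printf 'CapPrm: %016x\nCapEff: %016x\n' "$new_perm" "$new_eff"
```

CapEff is empty because the file effective bit is not set. ping_clone is capability-aware, so it presumably raises cap_net_raw from its permitted set only while it creates its socket.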

Example #2: File Permitted Set is set

Here, the file permitted set is limited to cap_net_raw.

Terminal 1

# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+p

$ sudo capsh \
--caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--user=ubuntu \
--print -- -c "./ping_clone localhost"
Current: = 
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 020/0x10/5'b10000
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: yes (unlocked)
uid=1000(ubuntu)
gid=1000(ubuntu)
groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023 ms

Terminal 2

$ cat /proc/4696/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000002000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

So what's going on here? Let's explain:
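With only the file permitted set in play, the new permitted set is F(permitted) masked by the caller's bounding set. The sketch below is a toy model of the capabilities(7) rule in POSIX shell. It computes the result twice: once with the full bounding set from Terminal 2, and once with cap_net_raw hypothetically dropped from the bounding set.

```sh
#!/bin/sh
# Example #2 inputs: P(inheritable) and P(ambient) empty (capsh without --inh),
# F(permitted) = {cap_net_raw} (cap_net_raw+p), file effective bit clear.
CAP_NET_RAW=13
net_raw=$(( 1 << CAP_NET_RAW ))
full=$(( (1 << 38) - 1 ))            # CapBnd 0000003fffffffff from Terminal 2
p_inh=0
f_inh=0
f_perm=$net_raw

# Second value: hypothetical bounding set with cap_net_raw dropped.
for p_bnd in "$full" $(( full & ~net_raw )); do
    new_perm=$(( (p_inh & f_inh) | (f_perm & p_bnd) ))   # ambient cleared: file has caps
    new_eff=0                                             # file effective bit clear
    printf 'CapBnd %016x -> CapPrm %016x CapEff %016x\n' \
        "$p_bnd" "$new_perm" "$new_eff"
done
```

The first line matches Terminal 2. The second shows why the bounding set matters: a file permitted capability that is missing from the caller's bounding set is dropped at execve(2).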

Example #3: When File Effective Bit is set

The file effective bit matters most for capability-dumb binaries such as cat or nice. These never call capget() or capset(), so they cannot change their own effective set. Instead, they rely on an external condition, the file effective bit, to have the kernel copy all the capabilities of the new permitted set into the effective set at execve(2).

Instead of ping_clone, we'll use a top_clone utility for this demonstration.

Terminal 1

# Privileged top_clone binary
$ getcap top_clone
top_clone = cap_chown+ep

$ ./top_clone 
....
uid=1000(ubuntu)
top - 09:44:35 up 13:25,  0 users,  load average: 0.15, 0.05, 0.01
Tasks: 120 total,   2 running,  79 sleeping,   0 stopped,   0 zombie
.....

Terminal 2

CapInh: 0000000000000000
CapPrm: 0000000000000001
CapEff: 0000000000000001
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

Here is what happens to the thread's capabilities:

$ getcap top_clone
top_clone = cap_chown+ep

CapPrm: 0000000000000001
CapEff: 0000000000000001
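The file effective bit is what turns CapPrm into CapEff here. This is a minimal POSIX shell sketch of that last step. It assumes the caller's inheritable and ambient sets are empty and its bounding set is full.

```sh
#!/bin/sh
# Example #3 inputs: cap_chown+ep on top_clone; caller has empty
# inheritable and ambient sets and the full bounding set.
CAP_CHOWN=0
p_inh=0
p_bnd=$(( (1 << 38) - 1 ))
f_inh=0
f_perm=$(( 1 << CAP_CHOWN ))
f_eff=1

new_perm=$(( (p_inh & f_inh) | (f_perm & p_bnd) ))   # ambient cleared: file has caps
if [ "$f_eff" -eq 1 ]; then
    new_eff=$new_perm    # effective bit set: copy permitted into effective
else
    new_eff=0
fi
printf 'CapPrm: %016x\nCapEff: %016x\n' "$new_perm" "$new_eff"
```

top_clone never touches its own sets. The file effective bit alone makes cap_chown usable.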