Config-Model-Systemd

lib/Config/Model/models/Systemd/Common/Exec.pl  view on Meta::CPAN

    CapabilityBoundingSet=CAP_B CAP_C

then C<CAP_A>, C<CAP_B>, and
C<CAP_C> are set. If the second line is prefixed with
C<~>, e.g.,

    CapabilityBoundingSet=CAP_A CAP_B
    CapabilityBoundingSet=~CAP_B CAP_C

then only C<CAP_A> is set.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'AmbientCapabilities',
      {
        'description' => 'Controls which capabilities to include in the ambient capability set for the executed
process. Takes a whitespace-separated list of capability names, e.g. C<CAP_SYS_ADMIN>,
C<CAP_DAC_OVERRIDE>, C<CAP_SYS_PTRACE>. This option may appear more than
once, in which case the ambient capability sets are merged (see the above examples in
C<CapabilityBoundingSet>). If the list of capabilities is prefixed with C<~>,
all but the listed capabilities will be included, i.e. the effect of the assignment is inverted. If the empty string is
assigned to this option, the ambient capability set is reset to the empty capability set, and all prior
settings have no effect. If set to C<~> (without any further argument), the ambient capability
set is reset to the full set of available capabilities, also undoing any previous settings. Note that adding
capabilities to the ambient capability set adds them to the process\'s inherited capability set.

Ambient capability sets are useful if you want to execute a process as a non-privileged user but
still want to give it some capabilities. Note that, in this case, option C<keep-caps>
is automatically added to C<SecureBits> to retain the capabilities over the user
change. C<AmbientCapabilities> does not affect commands prefixed with
C<+>.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'NoNewPrivileges',
      {
        'description' => 'Takes a boolean argument. If true, ensures that the service process and all its
children can never gain new privileges through execve() (e.g. via setuid or
setgid bits, or filesystem capabilities). This is the simplest and most effective way to ensure that
a process and its children can never elevate privileges again. Defaults to false. In case the service
will be run in a new mount namespace anyway and SELinux is disabled, all file systems are mounted with
the C<MS_NOSUID> flag. Also see L<No New Privileges Flag|https://docs.kernel.org/userspace-api/no_new_privs.html>.

Note that this setting only has an effect on the unit\'s processes themselves (or any processes
directly or indirectly forked off them). It has no effect on processes potentially invoked on request
of them through tools such as L<at(1)>,
L<crontab(1)>,
L<systemd-run(1)>, or
arbitrary IPC services.',
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'boolean',
        'write_as' => [
          'no',
          'yes'
        ]
      },
      'SecureBits',
      {
        'description' => 'Controls the secure bits set for the executed process. Takes a space-separated combination of
options from the following list: C<keep-caps>, C<keep-caps-locked>,
C<no-setuid-fixup>, C<no-setuid-fixup-locked>, C<noroot>, and
C<noroot-locked>. This option may appear more than once, in which case the secure bits are
ORed. If the empty string is assigned to this option, the bits are reset to 0. This does not affect commands
prefixed with C<+>. See L<capabilities(7)> for
details.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'SELinuxContext',
      {
        'description' => 'Set the SELinux security context of the executed process. If set, this will override the
automated domain transition. However, the policy still needs to authorize the transition. This directive is
ignored if SELinux is disabled. If prefixed by C<->, failing to set the SELinux
security context will be ignored, but it is still possible that the subsequent
execve() may fail if the policy does not allow the transition for the
non-overridden context. This does not affect commands prefixed with C<+>. See
L<setexeccon(3)>
for details.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'AppArmorProfile',
      {
        'description' => 'Takes a profile name as argument. The process executed by the unit will switch to
this profile when started. Profiles must already be loaded in the kernel, or the unit will fail. If
prefixed by C<->, all errors will be ignored. This setting has no effect if AppArmor
is not enabled. This setting does not affect commands prefixed with C<+>.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'SmackProcessLabel',
      {
        'description' => 'Takes a C<SMACK64> security label as argument. The process executed by the unit
will be started under this label and SMACK will decide whether the process is allowed to run or not, based on
it. The process will continue to run under the label specified here unless the executable has its own
C<SMACK64EXEC> label, in which case the process will transition to run under that label. When not
specified, the label that systemd is running under is used. This directive is ignored if SMACK is
disabled.

The value may be prefixed by C<->, in which case all errors will be ignored. An empty
value may be specified to unset previous assignments. This does not affect commands prefixed with
C<+>.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'LimitCPU',
      {
        'description' => "Set soft and hard limits on various resources for executed processes. See
L<setrlimit(2)> for
details on the process resource limit concept. Process resource limits may be specified in two formats:
either as single value to set a specific soft and hard limit to the same value, or as colon-separated
pair C<soft:hard> to set both limits individually
(e.g. C<LimitAS=4G:16G>).  Use the string C<infinity> to configure no
limit on a specific resource. The multiplicative suffixes K, M, G, T, P and E (to the base 1024) may
be used for resource limits measured in bytes (e.g. C<LimitAS=16G>). For the limits
referring to time values, the usual time units ms, s, min, h and so on may be used (see
L<systemd.time(7)> for
details). Note that if no time unit is specified for C<LimitCPU> the default unit of
seconds is implied, while for C<LimitRTTIME> the default unit of microseconds is
implied. Also, note that the effective granularity of the limits might influence their
enforcement. For example, time limits specified for C<LimitCPU> will be rounded up
implicitly to multiples of 1s. For C<LimitNICE> the value may be specified in two

lib/Config/Model/models/Systemd/Common/Exec.pl  view on Meta::CPAN

normally supported by the per-user instances of the service manager.

This setting is particularly useful in conjunction with
C<RootDirectory>/C<RootImage>, as the need to synchronize the user and group
databases in the root directory and on the host is reduced, as the only users and groups who need to be matched
are C<root>, C<nobody> and the unit\'s own user and group.',
        'replace' => {
          '0' => 'no',
          '1' => 'yes',
          'false' => 'no',
          'true' => 'yes'
        },
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'enum'
      },
      'ProtectHostname',
      {
        'choice' => [
          'no',
          'private',
          'yes'
        ],
        'description' => 'Takes a boolean argument or C<private>. If enabled, sets up a new UTS
namespace for the executed processes. If enabled, a hostname can be optionally specified following a
colon (e.g. C<yes:foo> or C<private:host.example.com>), and the
hostname is set in the new UTS namespace for the unit. If set to a true value, changing hostname or
domainname via sethostname() and setdomainname() system
calls is prevented. If set to C<private>, changing hostname or domainname is allowed
but only affects the unit\'s UTS namespace. Defaults to off.

Note that the implementation of this setting might be impossible (for example if UTS namespaces
are not available), and the unit should be written in a way that does not solely rely on this setting
for security.

Note that when this option is enabled for a service, hostname changes no longer propagate from
the system into the service; it is hence not suitable for services that need to take notice of system
hostname changes dynamically.

Note that this option does not prevent changing the system hostname via hostnamectl.
However, C<User> and C<Group> may be used to run as an unprivileged user
to disallow changing the system hostname. See SetHostname() in
L<org.freedesktop.hostname1(5)>
for more details.',
        'replace' => {
          '0' => 'no',
          '1' => 'yes',
          'false' => 'no',
          'true' => 'yes'
        },
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'enum'
      },
      'ProtectClock',
      {
        'description' => 'Takes a boolean argument. If set, writes to the hardware clock or system clock will
be denied. Defaults to off. Enabling this option removes C<CAP_SYS_TIME> and
C<CAP_WAKE_ALARM> from the capability bounding set for this unit, installs a system
call filter to block calls that can set the clock, and C<DeviceAllow=char-rtc r> is
implied. Note that the system calls are blocked altogether; the filter does not take into account
that some of the calls can be used to read the clock state with some parameter combinations.
Effectively, C</dev/rtc0>, C</dev/rtc1>, etc. are made read-only
to the service. See
L<systemd.resource-control(5)>
for the details about C<DeviceAllow>.

It is recommended to turn this on for most services that do not need to modify the clock or check
its state.',
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'boolean',
        'write_as' => [
          'no',
          'yes'
        ]
      },
      'ProtectKernelTunables',
      {
        'description' => 'Takes a boolean argument. If true, kernel variables accessible through
C</proc/sys/>, C</sys/>, C</proc/sysrq-trigger>,
C</proc/latency_stats>, C</proc/acpi>,
C</proc/timer_stats>, C</proc/fs> and C</proc/irq> will
be made read-only and C</proc/kallsyms> as well as C</proc/kcore> will be
inaccessible to all processes of the unit.
Usually, tunable kernel variables should be initialized only at boot-time, for example with the
L<sysctl.d(5)> mechanism. Few
services need to write to these at runtime; it is hence recommended to turn this on for most services. For this
setting the same restrictions regarding mount propagation and privileges apply as for
C<ReadOnlyPaths> and related calls, see above. Defaults to off.
Note that this option does not prevent indirect changes to kernel tunables effected by IPC calls to
other processes. However, C<InaccessiblePaths> may be used to make relevant IPC file system
objects inaccessible. If C<ProtectKernelTunables> is set,
C<MountAPIVFS=yes> is implied.',
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'boolean',
        'write_as' => [
          'no',
          'yes'
        ]
      },
      'ProtectKernelModules',
      {
        'description' => 'Takes a boolean argument. If true, explicit module loading will be denied. This allows
module load and unload operations to be turned off on modular kernels. It is recommended to turn this on for most
services
that do not need special file systems or extra kernel modules to work. Defaults to off. Enabling this option
removes C<CAP_SYS_MODULE> from the capability bounding set for the unit, and installs a
system call filter to block module system calls; C</usr/lib/modules> is also made
inaccessible. For this setting the same restrictions regarding mount propagation and privileges apply as for
C<ReadOnlyPaths> and related calls, see above. Note that limited automatic module loading due
to user configuration or kernel mapping tables might still happen as a side effect of requested user operations,
both privileged and unprivileged. To disable the module auto-load feature, see the
L<sysctl.d(5)> C<kernel.modules_disabled> mechanism and the
C</proc/sys/kernel/modules_disabled> documentation.',
        'type' => 'leaf',
        'upstream_default' => 'no',
        'value_type' => 'boolean',
        'write_as' => [
          'no',

lib/Config/Model/models/Systemd/Common/Exec.pl  view on Meta::CPAN

L<mount(2)>
for details on mount propagation, and the three propagation flags in particular.

This setting only controls the final propagation setting in effect on all mount
points of the file system namespace created for each process of this unit. Other file system namespacing unit
settings (see the discussion in C<PrivateMounts> above) will implicitly disable mount and
unmount propagation from the unit's processes towards the host by changing the propagation setting of all mount
points in the unit's file system namespace to C<slave> first. Setting this option to
C<shared> does not reestablish propagation in that case.

If not set \x{2013} but file system namespaces are enabled through another file system namespace unit setting \x{2013}
C<shared> mount propagation is used, but \x{2014} as mentioned \x{2014} as C<slave> is applied
first, propagation from the unit's processes to the host is still turned off.

It is not recommended to use C<private> mount propagation for units, as this means
temporary mounts (such as removable media) of the host will stay mounted and thus indefinitely busy in forked
off processes, as unmount propagation events will not be received by the file system namespace of the unit.

Usually, it is best to leave this setting unmodified, and use higher level file system namespacing
options instead, in particular C<PrivateMounts>, see above.",
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'SystemCallFilter',
      {
        'cargo' => {
          'type' => 'leaf',
          'value_type' => 'uniline'
        },
        'description' => "Takes a space-separated list of system call names or system call groups. If this
setting is used, system calls executed by the unit processes except for the listed ones will result
in the system call being denied (allow-listing). If the first character of the list is
C<~>, the effect is inverted: only the listed system calls will be denied
(deny-listing). This option may be specified more than once, in which case the filter masks are
merged. If the empty string is assigned, the filter is reset, all prior assignments will have no
effect.

Commands prefixed with C<+> are not subject to filtering. The
execve(), exit(), exit_group(),
getrlimit(), rt_sigreturn(),
sigreturn() system calls and the system calls for querying time and sleeping are
implicitly allow-listed and do not need to be listed explicitly.

The default action when a system call is denied is to terminate the processes with a
C<SIGSYS> signal. This can be changed using C<SystemCallErrorNumber>,
see below. In addition, deny-listed system calls and system call groups may optionally be suffixed
with a colon (C<:>) and an argument in the same format as
C<SystemCallErrorNumber>, to take this action when the matching system call is made.
This takes precedence over the action specified in C<SystemCallErrorNumber>.

This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp
filtering') and is useful for enforcing a minimal sandboxing environment.

Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn
off alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
option. Specifically, it is recommended to combine this option with
C<SystemCallArchitectures=native> or similar.

Note that strict system call filters may impact execution and error handling code paths of the
service invocation. Specifically, access to the execve() system call is required
for the execution of the service binary \x{2014} if it is blocked, service invocation will necessarily fail.
Also, if execution of the service binary fails for some reason (for example: missing service
executable), the error handling logic might require access to an additional set of system calls in
order to process and log this failure correctly. It might be necessary to temporarily disable system
call filters in order to allow debugging of such failures.

If you specify both types of this option (i.e. allow-listing and deny-listing), the first
encountered will take precedence and will dictate the default action (termination or approval of a
system call). Then the next occurrences of this option will add or delete the listed system calls
from the set of the filtered system calls, depending on its type and the default action. (For
example, if you have started with an allow list rule for read() and
write(), and right after it add a deny list rule for write(),
then write() will be removed from the set.)

As the number of possible system calls is large, predefined groups of system calls are
provided. A group starts with the C<\@> character, followed by the name of the set.

Currently predefined system call sets:

C<\@aio>: Asynchronous I/O (L<io_setup(2)>, L<io_submit(2)>, and related calls)

C<\@basic-io>: System calls for basic I/O: reading, writing, seeking, file descriptor duplication and closing
(L<read(2)>, L<write(2)>, and related calls)

C<\@chown>: Changing file ownership (L<chown(2)>, L<fchownat(2)>, and related calls)

C<\@clock>: System calls for changing the system clock (L<adjtimex(2)>, L<settimeofday(2)>, and related calls)

C<\@cpu-emulation>: System calls for CPU emulation functionality (L<vm86(2)> and related calls)

C<\@debug>: Debugging, performance monitoring and tracing functionality (L<ptrace(2)>, L<perf_event_open(2)> and
related calls)

C<\@file-system>: File system operations: opening, creating files and directories for read and write, renaming and
removing them, reading file properties, or creating hard and symbolic links

C<\@io-event>: Event loop system calls (L<poll(2)>, L<select(2)>, L<epoll(7)>, L<eventfd(2)> and related calls)

C<\@ipc>: Pipes, SysV IPC, POSIX Message Queues and other IPC (L<mq_overview(7)>, L<svipc(7)>)

C<\@keyring>: Kernel keyring access (L<keyctl(2)> and related calls)

C<\@memlock>: Locking of memory in RAM (L<mlock(2)>, L<mlockall(2)> and related calls)

C<\@module>: Loading and unloading of kernel modules (L<init_module(2)>, L<delete_module(2)> and related calls)

C<\@mount>: Mounting and unmounting of file systems (L<mount(2)>, L<chroot(2)>, and related calls)

C<\@network-io>: Socket I/O (including local AF_UNIX): L<socket(7)>, L<unix(7)>

C<\@obsolete>: Unusual, obsolete or unimplemented (L<create_module(2)>, L<gtty(2)>, \x{2026})

C<\@pkey>: System calls that deal with memory protection keys (L<pkeys(7)>)

C<\@privileged>: All system calls which need super-user capabilities (L<capabilities(7)>)

C<\@process>: Process control, execution, namespacing operations (L<clone(2)>, L<kill(2)>, L<namespaces(7)>, \x{2026})

C<\@raw-io>: Raw I/O port access (L<ioperm(2)>, L<iopl(2)>, pciconfig_read(), \x{2026})

C<\@reboot>: System calls for rebooting and reboot preparation (L<reboot(2)>, kexec(), \x{2026})

C<\@resources>: System calls for changing resource limits, memory and scheduling parameters (L<setrlimit(2)>,
L<setpriority(2)>, \x{2026})

C<\@sandbox>: System calls for sandboxing programs (L<seccomp(2)>, Landlock system calls, \x{2026})

C<\@setuid>: System calls for changing user ID and group ID credentials (L<setuid(2)>, L<setgid(2)>,
L<setresuid(2)>, \x{2026})

C<\@signal>: System calls for manipulating and handling process signals (L<signal(2)>, L<sigprocmask(2)>, \x{2026})

C<\@swap>: System calls for enabling/disabling swap devices (L<swapon(2)>, L<swapoff(2)>)

C<\@sync>: Synchronizing files and memory to disk (L<fsync(2)>, L<msync(2)>, and related calls)

C<\@system-service>: A reasonable set of system calls used by common system services, excluding any special purpose
calls. This is the recommended starting point for allow-listing system calls for system services, as it contains what
is typically needed by system services, but excludes overly specific interfaces. For example, the following APIs are
excluded: C<\@clock>, C<\@mount>, C<\@swap>, C<\@reboot>.

C<\@timer>: System calls for scheduling operations by time (L<alarm(2)>, L<timer_create(2)>, \x{2026})

C<\@known>: All system calls defined by the kernel. This list is defined statically in systemd based on a kernel
version that was available when this systemd version was released. It will become progressively more out-of-date as
the kernel is updated.

Note that as new system calls are added to the kernel, additional system calls might be added to the groups
above. Contents of the sets may also change between systemd versions. In addition, the list of system calls
depends on the kernel version and architecture for which systemd was compiled. Use
systemd-analyze\x{a0}syscall-filter to list the actual list of system calls in each
filter.

Generally, allow-listing system calls (rather than deny-listing) is the safer mode of
operation. It is recommended to enforce system call allow lists for all long-running system
services. Specifically, the following lines are a relatively safe basic choice for the majority of
system services:

    [Service]
    SystemCallFilter=\@system-service
    SystemCallErrorNumber=EPERM

Note that various kernel system calls are defined redundantly: there are multiple system calls
for executing the same operation. For example, the pidfd_send_signal() system
call may be used to execute operations similar to what can be done with the older
kill() system call, hence blocking the latter without the former only provides
weak protection. Since new system calls are added regularly to the kernel as development progresses,
keeping system call deny lists comprehensive requires constant work. It is thus recommended to use
allow-listing instead, which offers the benefit that new system calls are by default implicitly
blocked until the allow list is updated.

Also note that a number of system calls are required to be accessible for the dynamic linker to
work. The dynamic linker is required for running most regular programs (specifically: all dynamic ELF
binaries, which is how most distributions build packaged programs). This means that blocking these
system calls (which include open(), openat() or
mmap()) will make most programs typically shipped with generic distributions
unusable.

It is recommended to combine the file system namespacing related options with
C<SystemCallFilter=~\@mount>, in order to prohibit the unit's processes from undoing the
mappings. Specifically these are the options C<PrivateTmp>,
C<PrivateDevices>, C<ProtectSystem>, C<ProtectHome>,
C<ProtectKernelTunables>, C<ProtectControlGroups>,
C<ProtectKernelLogs>, C<ProtectClock>, C<ReadOnlyPaths>,
C<InaccessiblePaths> and C<ReadWritePaths>.",
        'type' => 'list'
      },
      'SystemCallErrorNumber',
      {
        'description' => 'Takes an C<errno> error number (between 1 and 4095) or errno name
such as C<EPERM>, C<EACCES> or C<EUCLEAN>, to
return when the system call filter configured with C<SystemCallFilter> is triggered,
instead of terminating the process immediately. See L<errno(3)> for a
full list of error codes. When this setting is not used, or when the empty string or the special
setting C<kill> is assigned, the process will be terminated immediately when the
filter is triggered.',
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'SystemCallArchitectures',
      {
        'description' => "Takes a space-separated list of architecture identifiers to include in the system call
filter. The known architecture identifiers are the same as for C<ConditionArchitecture>
described in L<systemd.unit(5)>,
as well as C<x32>, C<mips64-n32>, C<mips64-le-n32>, and
the special identifier C<native>. The special identifier C<native>
implicitly maps to the native architecture of the system (or more precisely: to the architecture the system
manager is compiled for). By default, this option is set to the empty list, i.e. no filtering is applied.

If this setting is used, processes of this unit will only be permitted to call native system calls, and
system calls of the specified architectures. For the purposes of this option, the x32 architecture is treated
as including x86-64 system calls. However, this setting still fulfills its purpose, as explained below, on
x32.

System call filtering is not equally effective on all architectures. For example, on x86
filtering of network socket-related calls is not possible, due to ABI limitations \x{2014} a limitation that x86-64
does not have, however. On systems supporting multiple ABIs at the same time \x{2014} such as x86/x86-64 \x{2014} it is hence
recommended to limit the set of permitted system call architectures so that secondary ABIs may not be used to
circumvent the restrictions applied to the native ABI of the system. In particular, setting
C<SystemCallArchitectures=native> is a good choice for disabling non-native ABIs.

System call architectures may also be restricted system-wide via the
C<SystemCallArchitectures> option in the global configuration. See
L<systemd-system.conf(5)> for
details.",
        'type' => 'leaf',
        'value_type' => 'uniline'
      },
      'SystemCallLog',
      {


