Who Watches Watchmen? – Integrating Elixir Applications with Systemd (hauleth.dev)
188 points by todsacerdoti on Jan 17, 2022 | 30 comments


Nice post and +1 for having a small "hardening" section.

I wish that every systemd example/sample/template came with _extensive_ hardening, since I find it quite confusing. I've used systemd-analyze security <SERVICE> to try to figure out what was needed. For Elixir, I've come up with:

    UMask=077
    NoNewPrivileges=yes
    PrivateTmp=yes
    ProtectSystem=full
    ProtectHome=yes
    ProtectControlGroups=yes
    ProtectKernelModules=yes
    ProtectKernelTunables=yes
    RestrictNamespaces=yes
    RestrictSUIDSGID=yes
    LockPersonality=yes
    PrivateDevices=yes
    PrivateUsers=yes
    ProtectClock=yes
    ProtectKernelLogs=yes
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
Plus the use of TemporaryFileSystem and BindPaths to limit the file system.
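
As a rough sketch, limiting the file system that way can look like this (the paths here are placeholders for whatever your release actually needs, not something specific):

    TemporaryFileSystem=/var:ro
    BindPaths=/var/lib/myapp
This mounts an empty read-only tmpfs over /var and bind-mounts back only the directories the service genuinely needs.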


At the beginning I wanted to add all of that information and those options, but I thought it could be overwhelming for this article. I wanted to focus on Erlang <-> systemd communication and the basic options.

However, it may make a nice follow-up article where I describe the full hardening process.


Oh yes, thank you for spreading the good news.

'systemd-analyze security' is a fine thing, just make sure to use the latest version of systemd because it was really buggy in the past. For example, the version that ships in RHEL 8 is so buggy it's practically useless.
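
For anyone who hasn't tried it, you point it at a single unit:

  systemd-analyze security myapp.service
It prints a table of individual checks plus an overall exposure score, from 0.0 (tight) up to 10.0 (wide open).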

I came up with this for most of my services that do require a JIT compiler (so Java, dotnet, etc):

  DynamicUser=yes
  CapabilityBoundingSet=
  DevicePolicy=closed
  InaccessiblePaths=-/usr/bin /usr/sbin /mnt /media /var/www
  LockPersonality=yes
  NoNewPrivileges=yes
  PrivateDevices=yes
  PrivateMounts=yes
  PrivateTmp=yes
  PrivateUsers=yes
  ProtectClock=yes
  ProtectControlGroups=yes
  ProtectHome=yes
  ProtectHostname=yes
  ProtectKernelLogs=yes
  ProtectKernelModules=yes
  ProtectKernelTunables=yes
  ProtectProc=invisible
  ProtectSystem=strict
  RemoveIPC=yes
  RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6
  RestrictNamespaces=yes
  RestrictRealtime=yes
  RestrictSUIDSGID=yes
  SystemCallArchitectures=native
  SystemCallFilter=~@clock @cpu-emulation @privileged @module @raw-io @reboot @mount @obsolete @swap @debug
You might want to throw away InaccessiblePaths if your application calls external binaries. The stuff I typically write shouldn't do it.

Some of these flags are not strictly necessary because they should be enabled by other switches, but I prefer to keep them to make configuration more obvious and to mitigate possible bugs (there were some in the past).
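
Related: if you're unsure what the @-groups in SystemCallFilter actually cover, systemd can expand them for you:

  systemd-analyze syscall-filter @privileged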

If your application needs to store anything locally, add some combination of these:

  RuntimeDirectory=appname        # adds /var/run/appname
  StateDirectory=appname          # adds /var/lib/appname
  CacheDirectory=appname          # adds /var/cache/appname
  LogsDirectory=appname           # adds /var/log/appname
  ConfigurationDirectory=appname  # adds /etc/appname
and you can read the resulting paths from the environment variables RUNTIME_DIRECTORY / STATE_DIRECTORY / CACHE_DIRECTORY / LOGS_DIRECTORY / CONFIGURATION_DIRECTORY if your systemd is new enough.

systemd will make sure that your limited user can read and write these paths, including their content.
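
In an Elixir release you would then read those variables instead of hardcoding paths; a minimal sketch (the fallback path is an assumption for starts outside systemd):

  # Set by systemd when StateDirectory=appname is configured.
  state_dir = System.get_env("STATE_DIRECTORY") || "/var/lib/appname"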

Add this if your application does not use a JIT compiler:

  MemoryDenyWriteExecute=yes
And this to prevent it from listening on the wrong ports in the event of misconfiguration.

  SocketBindDeny=any
  SocketBindAllow=tcp:5000
These firewalling flags can be useful if your service does not do much networking to external APIs:

  IPAddressDeny=any
  IPAddressAllow=localhost
  IPAddressAllow=10.3.42.0/24


With that many options it seems easy to miss something. A whitelist approach would be preferable. The application failing because you missed something would be more obvious than some subtle security hole.


There's already a ticket precisely for that, but it has been closed as a duplicate: https://github.com/systemd/systemd/issues/20247


This leaves me wishing there were some kind of report-only tooling where you could, say, capture the system calls used during a test run to reduce the trial and error, similar to how you can use SELinux with audit2allow to get a decent starting point. Does anything like that already exist?


Oh, I didn't know about `SocketBindDeny=` and `SocketBindAllow=`. This option may be a little troublesome in the case of Distributed Erlang, but in recent versions that can be worked around. Thanks, I will add it as a better option than adding capabilities.
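
For anyone wondering about the workaround: you can pin the distribution port range via the kernel application parameters and then allow exactly that range (the ports here are made up, and port ranges in SocketBindAllow need a recent systemd):

  # vm.args
  -kernel inet_dist_listen_min 9100
  -kernel inet_dist_listen_max 9105

  # unit file
  SocketBindDeny=any
  SocketBindAllow=tcp:9100-9105
If epmd runs inside the same unit, you would also need to allow its port (4369 by default).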


> Different tools take different approach to solve that issue there. systemd take approach that is derived from launchd - do not do stuff, that is not needed. It achieved that by merging D-Bus into the systemd itself, and then making all service to be D-Bus daemons (which are started on request), and additionally it provides a bunch of triggers for that daemons. We can trigger on action of other services (obviously), but also on stuff like socket activity, path creation/modification, mounts, connection or disconnection of device, time events, etc.

Does this require a good understanding of D-Bus to understand? Because I got completely lost on this...


I'm not sure the author has an understanding of what D-Bus is. The article doesn't really discuss D-Bus at all.

D-Bus is not merged into systemd. When systemd notices that a service it started is named "dbus", it registers itself as a D-Bus service; programs (like `systemctl` or `poweroff`) then use D-Bus to send it commands. For comparison, sysvinit accepts commands via a named pipe (`/run/initctl`, or `/dev/initctl` for older versions of sysvinit).

Most services are not made to be D-Bus daemons. A service is only a D-Bus daemon if its unit file says `Type=dbus`; none of the examples in the article do. You can use D-Bus to ask systemd to start or stop a given service, just as you could use `telinit` to tell sysvinit via /run/initctl to change the runlevel; that does not make a service started this way a D-Bus service.
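
To make the distinction concrete, a service only becomes a D-Bus daemon through something like this in its unit file (illustrative names):

  [Service]
  Type=dbus
  BusName=org.example.Daemon
  ExecStart=/usr/bin/example-daemon
With Type=dbus, systemd considers the service started once the configured bus name has been acquired.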

The NOTIFY_SOCKET and watchdog functionality that the article discusses has nothing to do with D-Bus.


That section was removed as it was leftover from the early drafts.


I see! Thank you for taking the time to write that out, much appreciated.


This article? Not at all. Do you need to understand D-Bus to use systemd? Also not at all. It is just an implementation detail. I have an idea to leverage it in the future to slap distributed service management on top of systemd, but in normal operation you will probably never notice that everything is D-Bus backed.


I have removed that part; it was an accidental leftover from the early drafts.


I know it's uncouth to say so, but I really like the design of this blog. Top marks! :)



Actually I am using Zola, not Hugo, but the theme I am using is a modified port of Terminal.


Ah. I used to use Zola but I migrated to Hugo myself.


I thought the same. The typography, colors, layout, and restraint are gorgeous.


I was about to post a comment to the same effect, but it's better to piggyback on yours and avoid polluting the thread.

I find the design incredibly charming!


JFYI, with podman [0] you can get all the security benefits mentioned in the article with containers.

This is pretty neat and I love how cleanly the application code reads. I’m curious: is the Erlang VM super fast? I would have expected VM startup time to dominate the overall time to start.

[0]: https://podman.io/


The Erlang VM is plenty fast to start for things that are long-running, but I wouldn't use it for things that you need to launch hundreds of times per second. That's also not how it is intended to be used, though. As for the other kind of fast: number-crunching performance is not what Erlang is for. It's for long-running, complex applications with a lot of moving parts, redundancy, and extreme reliability.


Numerical Elixir (NX)[1] is improving the number crunching situation!

[1]: https://github.com/elixir-nx/nx/tree/main/nx#readme


I know about Podman, I just wanted to focus on systemd without additional tooling.

Erlang VM startup is OK, but it is not ultra fast, and it can easily be slowed down by releases with many modules or by slow applications. Additionally, as was said above, Erlang works best with long-running instances, where the VM handles spawning and managing short-lived internal processes.

The first draft of this article also included a section on socket activation and FD passing, but it made the article way too long, so I moved it to Part 2, where I will have more space for it. With socket activation, Erlang VM startup time is a negligible problem.


Yes, it starts fairly quickly, nothing like the JVM. It's not nearly as fast while it's running, though. It only recently got a JIT, and it currently doesn't do any optimization of the generated code.


It's pretty quick on startup in my experience.


Is this a bug/typo?

  :systemd.ready(),
  :systemd.set_status(down: [status: "drained"]),
  {Plug.Cowboy.Drainer, refs: :all, shutdown: 10_000},
  :systemd.set_status(down: [status: "draining"])


Shouldn’t the status be set to “draining” first and then, after drain is complete, to “drained”?


No, because these statuses are set on shutdown, and processes are killed in reverse order. So the `draining` status will be set first, and then `drained`.
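
To sketch it out (the surrounding supervisor scaffolding is assumed; the child order is as in the article's snippet):

  # Children start top to bottom and are terminated bottom to top,
  # so on shutdown "draining" is reported before "drained".
  children = [
    :systemd.ready(),
    :systemd.set_status(down: [status: "drained"]),
    {Plug.Cowboy.Drainer, refs: :all, shutdown: 10_000},
    :systemd.set_status(down: [status: "draining"])
  ]

  Supervisor.start_link(children, strategy: :one_for_one)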


This won’t tell you a host died; a distributed service needs remote monitoring.

Is there a standard protocol for either health probes or watchdog timers? It seems like people aren’t defaulting to SNMP like they used to, and I haven’t heard much about RFC 6241 NETCONF which might be intended to replace it. At work we just probe health with trivial RPCs, but it’d be better for the industry to converge on something.


If you’re talking about multiple machines it’s probably time to use nomad or k8s.


I think Elixir/OTP already has mechanisms for multi-host monitoring and automatic restart of Elixir actors using a supervision tree.



