Who Watches Watchmen? – Integrating Elixir Applications with Systemd (hauleth.dev)
188 points by todsacerdoti on Jan 17, 2022 | 30 comments


Nice post and +1 for having a small "hardening" section.

I wish that every systemd example/sample/template came with _extensive_ hardening, since I find it quite confusing. I've used systemd-analyze security <SERVICE> to try to figure out what was needed. For Elixir, I've come up with:

    UMask=077
    NoNewPrivileges=yes
    PrivateTmp=yes
    ProtectSystem=full
    ProtectHome=yes
    ProtectControlGroups=yes
    ProtectKernelModules=yes
    ProtectKernelTunables=yes
    RestrictNamespaces=yes
    RestrictSUIDSGID=yes
    LockPersonality=yes
    PrivateDevices=yes
    PrivateUsers=yes
    ProtectClock=yes
    ProtectKernelLogs=yes
    RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
Plus the use of TemporaryFileSystem and BindPaths to limit the file system.
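
As a rough sketch, limiting the file system that way can look like this (the paths here are placeholders for whatever your release actually needs, not something specific):

    TemporaryFileSystem=/var:ro
    BindPaths=/var/lib/myapp
This mounts an empty read-only tmpfs over /var and bind-mounts back only the directories the service genuinely needs.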


At the beginning I wanted to add all of that information and those options, but I thought it could be overwhelming for this article. I wanted to focus on Erlang <-> systemd communication and the basic options.

However, it may make a nice follow-up article where I describe the full hardening process.


Oh yes, thank you for spreading the good news.

'systemd-analyze security' is a fine thing, just make sure to use the latest version of systemd because it was really buggy in the past. For example, the version that ships in RHEL 8 is so buggy it's practically useless.
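
For anyone who hasn't tried it, you point it at a single unit:

  systemd-analyze security myapp.service
It prints a table of individual checks plus an overall exposure score, from 0.0 (tight) up to 10.0 (wide open).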

I came up with this for most of my services that do require a JIT compiler (so Java, dotnet, etc):

  DynamicUser=yes
  CapabilityBoundingSet=
  DevicePolicy=closed
  InaccessiblePaths=-/usr/bin /usr/sbin /mnt /media /var/www
  LockPersonality=yes
  NoNewPrivileges=yes
  PrivateDevices=yes
  PrivateMounts=yes
  PrivateTmp=yes
  PrivateUsers=yes
  ProtectClock=yes
  ProtectControlGroups=yes
  ProtectHome=yes
  ProtectHostname=yes
  ProtectKernelLogs=yes
  ProtectKernelModules=yes
  ProtectKernelTunables=yes
  ProtectProc=invisible
  ProtectSystem=strict
  RemoveIPC=yes
  RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6
  RestrictNamespaces=yes
  RestrictRealtime=yes
  RestrictSUIDSGID=yes
  SystemCallArchitectures=native
  SystemCallFilter=~@clock @cpu-emulation @privileged @module @raw-io @reboot @mount @obsolete @swap @debug
You might want to throw away InaccessiblePaths if your application calls external binaries. The stuff I typically write shouldn't do it.

Some of these flags are not strictly necessary because they should be enabled by other switches, but I prefer to keep them to make configuration more obvious and to mitigate possible bugs (there were some in the past).
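
Related: if you're unsure what the @-groups in SystemCallFilter actually cover, systemd can expand them for you:

  systemd-analyze syscall-filter @privileged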

If your application needs to store anything locally, add some combination of these:

  RuntimeDirectory=appname        # adds /var/run/appname
  StateDirectory=appname          # adds /var/lib/appname
  CacheDirectory=appname          # adds /var/cache/appname
  LogsDirectory=appname           # adds /var/log/appname
  ConfigurationDirectory=appname  # adds /etc/appname
and you can read the resulting paths from the environment variables RUNTIME_DIRECTORY / STATE_DIRECTORY / CACHE_DIRECTORY / LOGS_DIRECTORY / CONFIGURATION_DIRECTORY if your systemd is new enough.

systemd will make sure that your limited user can read and write these paths, including their content.
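
In an Elixir release you would then read those variables instead of hardcoding paths; a minimal sketch (the fallback path is an assumption for starts outside systemd):

  # Set by systemd when StateDirectory=appname is configured.
  state_dir = System.get_env("STATE_DIRECTORY") || "/var/lib/appname"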

Add this if your application does not use a JIT compiler:

  MemoryDenyWriteExecute=yes
And this to prevent it from listening on the wrong ports in the event of misconfiguration.

  SocketBindDeny=any
  SocketBindAllow=tcp:5000
These firewalling flags can be useful if your service does not do much networking to external APIs:

  IPAddressDeny=any
  IPAddressAllow=localhost
  IPAddressAllow=10.3.42.0/24


With that many options it seems easy to miss something. A whitelist approach would be preferable. The application failing because you missed something would be more obvious than some subtle security hole.


There's already a ticket precisely for that, but it has been closed as a duplicate: https://github.com/systemd/systemd/issues/20247


This leaves me wishing there were some kind of report-only tooling where you could, say, capture the system calls used during a test run to reduce the trial and error, similar to how you can use SELinux with audit2allow to get a decent starting point. Does anything like that already exist?


Oh, I didn't know about `SocketBindDeny=` and `SocketBindAllow=`. This option may be a little troublesome in the case of Distributed Erlang, but in recent versions that can be worked around. Thanks, I will add it as a better option than adding capabilities.
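
For anyone wondering about the workaround: you can pin the distribution port range via the kernel application parameters and then allow exactly that range (the ports here are made up, and port ranges in SocketBindAllow need a recent systemd):

  # vm.args
  -kernel inet_dist_listen_min 9100
  -kernel inet_dist_listen_max 9105

  # unit file
  SocketBindDeny=any
  SocketBindAllow=tcp:9100-9105
If epmd runs inside the same unit, you would also need to allow its port (4369 by default).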


> Different tools take different approach to solve that issue there. systemd take approach that is derived from launchd - do not do stuff, that is not needed. It achieved that by merging D-Bus into the systemd itself, and then making all service to be D-Bus daemons (which are started on request), and additionally it provides a bunch of triggers for that daemons. We can trigger on action of other services (obviously), but also on stuff like socket activity, path creation/modification, mounts, connection or disconnection of device, time events, etc.

Does this require a good understanding of D-Bus to understand? Because I got completely lost on this...


I'm not sure the author has an understanding of what D-Bus is. The article doesn't really discuss D-Bus at all.

D-Bus is not merged into systemd. When systemd notices that a service it started is named "dbus", it registers itself as a D-Bus service; programs (like `systemctl` or `poweroff`) then use D-Bus to send it commands. For comparison, sysvinit accepts commands via a named pipe (`/run/initctl`, or `/dev/initctl` for older versions of sysvinit).

Most services are not made to be D-Bus daemons. A service is only a D-Bus daemon if its unit file says `Type=dbus`; none of the examples in the article do. You can use D-Bus to ask systemd to start or stop a given service, just as you could use `telinit` to tell sysvinit via /run/initctl to change the runlevel; that does not make a service started this way a D-Bus service.
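
To make the distinction concrete, a service only becomes a D-Bus daemon through something like this in its unit file (illustrative names):

  [Service]
  Type=dbus
  BusName=org.example.Daemon
  ExecStart=/usr/bin/example-daemon
With Type=dbus, systemd considers the service started once the configured bus name has been acquired.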

The NOTIFY_SOCKET and watchdog functionality that the article discusses has nothing to do with D-Bus.


That section was removed as it was leftover from the early drafts.


I see! Thank you for taking the time to write that out, much appreciated.


This article? Not at all. Do you need to understand D-Bus to use systemd? Also not at all. It is just an implementation detail. I have an idea to leverage it in the future to slap distributed service management on top of systemd, but in normal operation you will probably never notice that everything is D-Bus backed.


I have removed that part; it was an accidental leftover from the early drafts.


I know it's uncouth to say so, but I really like the design of this blog. Top marks! :)



Actually I am using Zola, not Hugo, but the theme I am using is a modified port of Terminal.


Ah. I used to use Zola but I migrated to Hugo myself.


I thought the same. The typography, colors, layout, and restraint are gorgeous.


I was about to post a comment to the same effect, but it's better to piggyback on yours and avoid polluting the thread.

I find the design incredibly charming!


JFYI, with podman [0] you can get all the security benefits mentioned in the article with containers.

This is pretty neat and I love how cleanly the application code reads. I’m curious: is the Erlang VM super fast? I would have expected VM startup time to dominate the overall time to start.

[0]: https://podman.io/


The Erlang VM is plenty fast to start for things that are long-running, but I wouldn't use it for things that you need to launch hundreds of times per second. That's also not how it is intended to be used, though. As for the other kind of fast: number-crunching performance is not what Erlang is for. It's for long-running, complex applications with a lot of moving parts, redundancy, and extreme reliability.


Numerical Elixir (NX)[1] is improving the number crunching situation!

[1]: https://github.com/elixir-nx/nx/tree/main/nx#readme


I know about Podman, I just wanted to focus on systemd without additional tooling.

Erlang VM startup is OK, but it is not ultra fast, and it can easily be slowed down by releases with many modules or by slow applications. Additionally, as was said above, Erlang works best with long-running instances, where the VM handles spawning and managing short-lived internal processes.

The first draft of this article also included a section on socket activation and FD passing, but it made the article way too long, so I moved it to Part 2, where I will have more space for it. With socket activation, Erlang VM startup time is a negligible problem.


Yes, it starts fairly quickly, nothing like the JVM. It's not nearly as fast while it's running, though. It only recently got a JIT, and it currently doesn't do any optimization of the generated code.


It's pretty quick on startup in my experience.


Is this a bug/typo?

  :systemd.ready(),
  :systemd.set_status(down: [status: "drained"]),
  {Plug.Cowboy.Drainer, refs: :all, shutdown: 10_000},
  :systemd.set_status(down: [status: "draining"])


Shouldn’t the status be set to “draining” first and then, after drain is complete, to “drained”?


No, because these statuses are set on shutdown, and processes are killed in reverse order. So the `draining` status will be set first, and then `drained`.
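
To sketch it out (the surrounding supervisor scaffolding is assumed; the child order is as in the article's snippet):

  # Children start top to bottom and are terminated bottom to top,
  # so on shutdown "draining" is reported before "drained".
  children = [
    :systemd.ready(),
    :systemd.set_status(down: [status: "drained"]),
    {Plug.Cowboy.Drainer, refs: :all, shutdown: 10_000},
    :systemd.set_status(down: [status: "draining"])
  ]

  Supervisor.start_link(children, strategy: :one_for_one)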


This won’t tell you a host died; a distributed service needs remote monitoring.

Is there a standard protocol for either health probes or watchdog timers? It seems like people aren’t defaulting to SNMP like they used to, and I haven’t heard much about RFC 6241 NETCONF which might be intended to replace it. At work we just probe health with trivial RPCs, but it’d be better for the industry to converge on something.


If you’re talking about multiple machines it’s probably time to use nomad or k8s.


I think Elixir/OTP already has mechanisms for multi-host monitoring and automatic restart of Elixir actors using a supervision tree.



