Closes #72
So, #72 is about a segfault in the LDAP outpost, but this is the actual
culprit[0]:
* Both server & worker share the same configuration in this setup.
* Since 2025.8 this means that both try to start a server for metrics at
port 9300 and an HTTP server (in the worker case for healthchecks) at
port 9000.
* On upgrades, migrations are performed. Only the server waited for the
migrations to finish, so the worker started up earlier. As a result,
the worker was quicker to bind port 9000 in ONLY this case (and thus,
this was never reproducible on a second attempt!). Port 9000 was
therefore NOT served by the authentik server, but by something that
returned an empty response for everything that's not the healthcheck.
* As a result, the LDAP outpost got a response from what it believed was
authentik, but actually `nil, nil` because of the empty response.
Trying to dereference values from that response[1] caused the
segfault.
The fix is pretty easy: override the listen ports via the environment.
Unfortunately, the docs[2] are apparently not entirely correct[3];
judging from the Python code, the setting must be LISTEN__LISTEN_HTTP[4].
I added a test case to ensure that the config is properly applied.
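A minimal sketch of such an override for the worker, under stated assumptions: the unit name `authentik-worker`, the `AUTHENTIK_` prefix on the variable name and the replacement address `[::1]:9001` are illustrative and not taken from this change.

```nix
{
  # Assumption: the worker runs as the systemd unit "authentik-worker".
  systemd.services.authentik-worker.environment = {
    # Assumption: the key from config.py, prefixed with AUTHENTIK_.
    # Any address not used by the server works; [::1]:9001 is arbitrary.
    AUTHENTIK_LISTEN__LISTEN_HTTP = "[::1]:9001";
  };
}
```

The metrics listener on port 9300 would need the same treatment; its exact key is not named here, so it is left out of the sketch.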
[0] Reported as https://github.com/goauthentik/authentik/issues/16850
[1] 57e12cef06/internal/outpost/ak/api.go (L95)
[2] https://docs.goauthentik.io/install-config/configuration/#listen-settings
[3] Reported as https://github.com/goauthentik/authentik/issues/16851
[4] 57e12cef06/authentik/lib/config.py (L238)
This changes the "ak" script to contain all properties from the
authentik.service unit except the Exec* and Restart* properties. This allows the
script to work when the user has added additional properties to the unit (e.g.
the `SupplementaryGroups` property to connect to Redis over a Unix socket).
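As a rough illustration of the idea (not the actual implementation), the unit's properties could be filtered and turned into `systemd-run` arguments as follows. The `serviceConfig` attribute set is a hypothetical stand-in for the real unit configuration.

```nix
{ lib }:
let
  # Hypothetical stand-in for config.systemd.services.authentik.serviceConfig.
  serviceConfig = {
    DynamicUser = true;
    StateDirectory = "authentik";
    SupplementaryGroups = [ "redis" ];
    ExecStart = "/path/to/server";
    Restart = "on-failure";
    RestartSec = "1s";
  };

  # Exec* and Restart* only make sense for the long-running service.
  isExcluded = name: lib.hasPrefix "Exec" name || lib.hasPrefix "Restart" name;
  inherited = lib.filterAttrs (name: _: !(isExcluded name)) serviceConfig;

  # toString renders `true` as "1" and joins lists with spaces.
  toProperty = name: value:
    lib.escapeShellArg "--property=${name}=${toString value}";
in
# A single string of shell-escaped --property=... arguments.
lib.concatStringsSep " " (lib.mapAttrsToList toProperty inherited)
```

Evaluating this with a `lib` from nixpkgs yields only the arguments for `DynamicUser`, `StateDirectory` and `SupplementaryGroups`, so anything a user adds to the unit is picked up automatically.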
The store is world-readable, so secrets shouldn't end up there in the
first place. On top of that, `types.path` has the following behavior:
* `toString foo` returns the absolute path
* `${foo}` copies the path silently into the store and returns the
store-path.
This happens without any real feedback, so it can be caused by an
innocent-looking change.
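The difference can be seen in a small expression. The file name `./secret.txt` is a hypothetical example and must exist for the interpolation to evaluate.

```nix
let
  # Hypothetical path literal, resolved relative to this file.
  secret = ./secret.txt;
in
{
  # Yields the absolute path as a string; nothing is copied.
  viaToString = toString secret;

  # Copies the file into the world-readable store and yields the
  # resulting store path.
  viaInterpolation = "${secret}";
}
```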
To address this problem, `pathWith` was introduced into <nixpkgs/lib>.
It allows absolute paths represented as strings, but rejects things
pointing to the store and path literals which may be copied later on.
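A minimal sketch of an option using this type, assuming the nixpkgs type in question is `lib.types.pathWith` with its `inStore` and `absolute` arguments. The option name `example.secretFile` is hypothetical.

```nix
{ lib, ... }:
{
  # Hypothetical option, for illustration only.
  options.example.secretFile = lib.mkOption {
    # Accepts "/run/secrets/foo" (an absolute path given as a string),
    # rejects store paths and path literals such as ./foo.
    type = lib.types.pathWith {
      inStore = false;
      absolute = true;
    };
    description = "Absolute path to a secret outside of the Nix store.";
  };
}
```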
This was changed in upstream commit abc0c2d2a2a0bfb0214798ed6bca9d59359b39f8.
The sole reason this worked was that `settings.storage.media.file.path`
pointed to `./media`, relative to `/var/lib/authentik`.
Update our config accordingly.
This gives the default value from this module a slightly higher
priority than the upstream module's default, while still allowing users
to simply set `services.postgresql.package` using the default priority.
The change in 8bc790171f introduced
`mkDefault` for the postgresql package.
Unfortunately the upstream package option default is also specified
using `mkDefault` instead of the more appropriate `mkOptionDefault`.
This meant that users with a `system.stateVersion` other than `22.05`,
`22.11` or `23.05` got an evaluation error because there are two
conflicting definitions for the package option.
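For context: `mkOptionDefault` corresponds to priority 1500, `mkDefault` to 1000, and a plain definition to 100, where a lower number wins. A sketch of a definition that sits just above `mkDefault` follows; the concrete priority `999` and the package `pkgs.postgresql` are assumptions for illustration, not necessarily what this module uses.

```nix
{ lib, pkgs, ... }:
{
  # Wins over the upstream `mkDefault` (1000), but still loses to a
  # plain user definition (100), so users can override it without mkForce.
  services.postgresql.package = lib.mkOverride 999 pkgs.postgresql;
}
```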
This was made possible by d85dacb6c2,
which allows using `manage.py` directly. That script is
effectively used whenever the `ak` command is referenced in the docs,
e.g. to set a new password for the superuser or to send a test email.
This needs to run as the same (dynamic) user and with the same env file,
otherwise `manage.py` exits early. To achieve that, I
decided to use `systemd-run(1)` because now the invocation can be
configured the same way as services are.
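A rough sketch of such a wrapper, not the module's actual code. The paths in `managePy` and `envFile`, the user name `authentik` and the chosen properties are hypothetical placeholders.

```nix
{ pkgs, ... }:
let
  # Hypothetical placeholders; a real module would derive these from
  # its own options and packages.
  managePy = "/path/to/manage.py";
  envFile = "/run/secrets/authentik-env";
in
{
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "ak" ''
      # Run manage.py in a transient unit that mirrors the service's
      # user, state directory and environment file.
      exec ${pkgs.systemd}/bin/systemd-run \
        --pty --collect \
        --property=DynamicUser=yes \
        --property=User=authentik \
        --property=StateDirectory=authentik \
        --property=EnvironmentFile=${envFile} \
        --working-directory=/var/lib/authentik \
        ${managePy} "$@"
    '')
  ];
}
```

With this in place, a call such as `ak <subcommand>` forwards its arguments to `manage.py` inside the transient unit.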
The new migration in tenant_files.py references a MEDIA_ROOT directory
based on its own path, which in our case is in the read-only /nix/store.
We need it to refer to the actual authentik state directory instead,
which defaults to /var/lib/authentik/media in module.nix.
Fixes #15
Before this change it was non-trivial to deploy the ldap outpost without
also activating the main authentik service on the same host. Adding
functionality to provide a separate configuration file for the outpost
service remains an open task.
The media upload feature is built around being deployed in a container
and only enables uploads when `/media` is a mountpoint. This isn't the
case on NixOS and as such media uploads are disabled.
In order to enable this, we need to patch authentik so that the
`can_save_media` capability is enabled.
I'm occasionally seeing the following error:
    Jan 01 22:02:10 auth ldap[151813]: fatal error: concurrent map writes
    Jan 01 22:02:10 auth ldap[151813]: fatal error: concurrent map writes
    Jan 01 22:02:10 auth ldap[151813]: goroutine 4841 [running]:
    Jan 01 22:02:10 auth ldap[151813]: goauthentik.io/api/v3.(*Configuration).AddDefaultHeader(...)
    Jan 01 22:02:10 auth ldap[151813]: goauthentik.io/api/v3@v3.2023101.1/configuration.go:120
    Jan 01 22:02:10 auth ldap[151813]: goauthentik.io/internal/outpost/ldap/search/direct.(*DirectSearcher).Search(0xc0002ba4f8, 0xc000510dd0)
    Jan 01 22:02:10 auth ldap[151813]: goauthentik.io/internal/outpost/ldap/search/direct/direct.go:112 +0x65a
    [...]
    Jan 01 22:02:10 auth systemd[1]: authentik-ldap.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
    Jan 01 22:02:10 auth systemd[1]: authentik-ldap.service: Failed with result 'exit-code'.
Obviously, I need to find out what's up there. However, services
shouldn't just die on a crash, but restart in that case. If that happens
too often, StartLimitBurst/StartLimitIntervalSec ensure that the
(re)start attempt is aborted eventually.
This is especially problematic because Nextcloud tries to contact the
LDAP server on every single request for a sync which means that the
entire service is down when such a crash happens.
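A sketch of what such a restart policy could look like for the `authentik-ldap` unit. The concrete values (restart delay, burst count, interval) are assumptions chosen for illustration.

```nix
{
  systemd.services.authentik-ldap = {
    serviceConfig = {
      # Restart after crashes such as the fatal error above.
      Restart = "on-failure";
      RestartSec = "1s";
    };
    # Give up if the unit is started more than 5 times within 30 seconds.
    startLimitBurst = 5;
    startLimitIntervalSec = 30;
  };
}
```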
* switched from flake-utils to flake-parts
* dropped the overlay and instead populate configurable options for all
required authentik components in the module
* `nixosModules.default` is now a top-level output following the flake spec,
instead of the previously incorrect system-specific definition
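A minimal flake-parts layout illustrating the last point, assuming the module lives in `./module.nix`. The listed systems and the nixpkgs branch are placeholders.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    flake-parts.url = "github:hercules-ci/flake-parts";
  };

  outputs = inputs@{ flake-parts, ... }:
    flake-parts.lib.mkFlake { inherit inputs; } {
      systems = [ "x86_64-linux" "aarch64-linux" ];

      # System-independent outputs are declared under `flake`, so the
      # module ends up at the top level as `nixosModules.default`.
      flake.nixosModules.default = ./module.nix;

      # Per-system outputs (packages, checks, ...) would go here.
      perSystem = { pkgs, ... }: { };
    };
}
```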