(Let me start by saying that this isn’t quite a troubleshooting request, because I’ve already troubleshot it. I also can’t give quite as detailed a write-up as I might like because, frankly, I’m not breaking my system again just to produce the relevant logs; that would be quite a bit of work. Nonetheless, while I think I can see how it got this way, the issue was so weirdly counterintuitive that I feel I ought to write it up for the next person who runs into it. Feel free to move this if it isn’t the right place.)
Scenario:
Mattermost is running as a container under Kubernetes (configured by the Mattermost operator), backed by a Postgres database running in the same cluster. To configure the connection to the database, we have a k8s secret defined thus:
---
apiVersion: v1
kind: Secret
metadata:
  name: mattermost-db-credentials
  namespace: mattermost
stringData:
  DB_CONNECTION_STRING: "postgres://mattermost:PASSWORD_GOES_HERE@postgres-mattermost-postgresql.mattermost.svc.cluster.local:5432/mattermost?sslmode=disable"
which is referenced by the Mattermost manifest thus:
database:
  external:
    secret: mattermost-db-credentials
and so comes through in the final definition of the Mattermost pod thus:
env:
  - name: MM_CONFIG
    valueFrom:
      secretKeyRef:
        name: mattermost-db-credentials
        key: DB_CONNECTION_STRING
So far, so good. It worked fine like this for a long time.
Until we needed to move that Postgres database. Same data and same configuration, running off literally the same back-end storage, so the only thing that would change was the connection string. All we should need to do, then, is update it in the mattermost-db-credentials secret and restart the Mattermost pod, and all should be okay, yes?
No.
After updating the connection string (and making sure the new pod was picking it up), Mattermost wouldn’t start. Worse, the pod logs kept reporting that Mattermost couldn’t connect to the database using the old connection string, which appeared nowhere in the visible configuration.
After a little poking at this, it turned out that we only saw that error when Mattermost was correctly configured with the new connection string, and not when it was given a blatantly wrong one.
So, after taking a moment to say “Wut?”, it occurred to me to fire up my Postgres client, look at the configurations table in the Mattermost database, and pull the latest entry by the createat column. And what did I find?
The old connection string, stored in the database! (And yes, changing the connection string in that entry in the new database eliminated the error and let Mattermost start up correctly. Apparently Mattermost was connecting to the database at its new location just long enough to read the latest configuration entry, pull the old connection string out of it, and then fail to connect to the old location.)
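For anyone needing to do the same repair, here is a minimal Python sketch of the fix described above. Beyond the configurations table and its createat column, it rests on assumptions I haven't verified against the Mattermost code: that the table has an id column, that the value column holds the whole configuration as JSON text, and that the connection string lives under SqlSettings.DataSource in that JSON. The function names are made up for illustration, and rewrite_latest_config expects any DB-API connection using the %s parameter style (psycopg2, for example).

```python
import json
from typing import Optional, Tuple

# Assumed schema: configurations(id, value, createat, ...), where value is the
# full Mattermost config stored as JSON text. Verify against your own database.
LATEST_CONFIG_QUERY = "SELECT id, value FROM configurations ORDER BY createat DESC LIMIT 1"
UPDATE_CONFIG_QUERY = "UPDATE configurations SET value = %s WHERE id = %s"


def replace_datasource(config_json: str, new_dsn: str) -> Tuple[str, Optional[str]]:
    """Swap SqlSettings.DataSource (assumed key) for new_dsn.

    Returns the updated JSON text and the DataSource value it replaced
    (None if there was none).
    """
    config = json.loads(config_json)
    sql_settings = config.setdefault("SqlSettings", {})
    old_dsn = sql_settings.get("DataSource")
    sql_settings["DataSource"] = new_dsn
    return json.dumps(config), old_dsn


def rewrite_latest_config(conn, new_dsn: str) -> Optional[str]:
    """Hypothetical helper: patch the newest configurations row in place.

    Returns the old DataSource, or None if the table has no rows.
    """
    cur = conn.cursor()
    try:
        cur.execute(LATEST_CONFIG_QUERY)
        row = cur.fetchone()
        if row is None:
            return None  # no stored configuration, nothing to patch
        row_id, value = row
        updated, old_dsn = replace_datasource(value, new_dsn)
        cur.execute(UPDATE_CONFIG_QUERY, (updated, row_id))
        conn.commit()
        return old_dsn
    finally:
        cur.close()
```

The idea is to open a connection to the database at its new location, call rewrite_latest_config with the same string that is now in the mattermost-db-credentials secret, and then restart the pod. The returned old value is a handy check that you really were bitten by the same stale entry.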
My working hypothesis (not having had time to dig through the code to check whether I’m right; I welcome corrections from those who know better) is that Mattermost copied the connection settings specified on the pod into the database back when it was first started, and since then those stored settings have been overriding the environment variables the operator sets on the pod. The result is that any changes you make to the configuration on the k8s side are simply ignored.
Which, well, I can see how it got there, but man, it’s confusing as hell when it actually happens to you.
…maybe this should also be a feature request: let environment variables always override the configuration stored in the database, or at least do so in k8s deployments?