Connection refused

Sometimes, out of nowhere, the Mattermost server simply loses its connection to the database, and the only error message we get is "connection refused".

Is it possible that performing a database backup while the server is running can cause this issue?

Hi,

It depends. It could very well be that the database backup is creating database locks, so that Mattermost can no longer connect to the database, or that the connections on your database server fill up and there’s no free connection left for Mattermost.

Which Mattermost server version is this, and what’s the database backend (MySQL or PostgreSQL)?
Do you know when your database backups run, and does the connection refused message appear at around the same time of day?

Our server version is 7.5.1. The next planned update will happen in two weeks; we usually keep it very close to the latest version. Our database is a PostgreSQL instance.

Our database backups run every day at 7 PM and no, I cannot say for sure that it’s during or after a backup that this error starts happening.

I’ve been keeping an eye on the logs lately, and it’s been working since February 15. What we noticed is that Mattermost works fine during the day, and when we came back in the next morning it had lost its connection. The only job that runs during that period is the database backup.

I’ll provide more information the next time it happens.

Do you have any sort of monitoring that could help us diagnose the issue here? What does your Mattermost deployment look like? Binary? Docker?
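If nothing is in place yet, a small TCP probe run from the Mattermost host could at least record the exact moment the database port stops accepting connections. At the TCP level, "connection refused" typically means nothing accepted the connection on that port, which is different from PostgreSQL rejecting a client after the connection is established. The following is only a minimal sketch using the Python standard library; the function names `probe` and `watch`, the interval, and the host and port in the usage comment (5432 is PostgreSQL's default port, assumed here) are illustrative assumptions, not your actual setup.

```python
import errno
import socket
import time
from datetime import datetime


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something is listening and accepted us
    except ConnectionRefusedError:
        return "refused"           # same condition as in the Mattermost log
    except socket.timeout:
        return "timeout"           # no answer at all within `timeout` seconds
    except OSError as exc:
        # Any other network error, reported by its symbolic errno name.
        return f"error:{errno.errorcode.get(exc.errno, 'unknown')}"


def watch(host: str, port: int, interval: float = 60.0, rounds=None) -> None:
    """Print one timestamped probe result per interval (forever if rounds is None)."""
    count = 0
    while rounds is None or count < rounds:
        stamp = datetime.now().astimezone().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {probe(host, port)}", flush=True)
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)


# Hypothetical usage (address and port are assumptions):
# watch("127.0.0.1", 5432, interval=60.0)
```

Redirecting the output to a file and comparing the first "refused" line with the backup schedule would show whether the port goes away during or right after the backup.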

Our server is deployed directly from the binary, with no Docker involved whatsoever.

This weekend it happened again, precisely 16 minutes after the backup started, most likely when it ended. As I mentioned earlier, our backup process starts at 7 pm.

This was the first error message.

{"timestamp":"2023-02-25 19:16:23.356 -03:00","level":"error","msg":"Error occurred getting all pending statuses.","caller":"jobs/jobs_watcher.go:72","error":"failed to find Jobs with status=pending: dial tcp local IP:PORT: connect: connection refused"}
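To check the timing on future occurrences without reading the logs by hand, something like the following sketch could list every "connection refused" entry together with its offset from the backup start. It assumes the log is one JSON object per line with the `timestamp` and `error` fields seen in the entry above, and that the backup starts at 19:00 local time; the function name `refused_offsets` and the sample data are hypothetical.

```python
import json
from datetime import datetime

# Timestamp layout as it appears in the log entry, e.g. "2023-02-25 19:16:23.356 -03:00"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"


def refused_offsets(lines, backup_hour=19):
    """Yield (timestamp, time since that day's backup start) for each refused-connection error."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        if "connection refused" not in entry.get("error", ""):
            continue
        ts = datetime.strptime(entry["timestamp"], TS_FORMAT)
        # Assumption: the backup starts exactly on the hour, on the same calendar day.
        backup_start = ts.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
        yield ts, ts - backup_start


# Sample input: the entry from the log, plus an unrelated hypothetical line.
sample = [
    '{"timestamp":"2023-02-25 19:16:23.356 -03:00","level":"error",'
    '"msg":"Error occurred getting all pending statuses.",'
    '"caller":"jobs/jobs_watcher.go:72",'
    '"error":"failed to find Jobs with status=pending: '
    'dial tcp local IP:PORT: connect: connection refused"}',
    '{"timestamp":"2023-02-25 18:00:00.000 -03:00","level":"info","msg":"hypothetical entry"}',
]

results = list(refused_offsets(sample))
for ts, offset in results:
    print(ts.isoformat(), "offset from backup start:", offset)
```

Entries logged before the backup hour on a given day come out with a negative offset, so errors that only show up the next morning would need to be compared against the previous day's backup instead.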

I guess Mattermost is not the issue here, since restarting the PostgreSQL service was all it took to get it working again.

This is our postgres instance version: psql (PostgreSQL) 12.12 (Ubuntu 12.12-0ubuntu0.20.04.1)

Do you think it might be because we’re still on major version 12? I believe it is still compatible, since Mattermost only dropped support for version 10 in v5.30.0.

But I could try upgrading it to version 15…

PostgreSQL 12 should not be an issue and as far as I know, it’s still supported, so let’s just assume it’s not part of the problem here.

Can you see anything in your PostgreSQL server logs regarding the refused connections?