New Install unable to ping database

Summary
new installation cannot ping database

Steps to reproduce
My environment:
docker-ce running on Rocky linux 8.7
postgres db running on localhost as a container
mattermost-team edition running as container

Expected behavior
mattermost should be able to connect to the db

Observed behavior
Docker logs show that the database gets created and say “database system is ready to accept connections”
Docker logs for Mattermost say “Failed to ping DB” and show nothing about access issues (permissions) or a wrong account name.

What I have tried so far

  • verify listening port for db is 5432 (docker ps -a shows this)
  • compared docker-compose.yml, .env and config.json files with another version that is presently working. (BTW, all these files were copied from the version that is working)

Please note: it is not easy for me to readily supply logs & configs. This setup is air gapped (Gov’t).
I can do it but it does take some time/approvals.

I am most baffled by the fact that I copied configs from a known working/good system. The other system works, this one does not. The only difference might be if I have a different image between the two systems. (Even though the configs are the same it is possible I uploaded a different image to my private registries - which is also air gapped. After all, the image name showing in docker ps -a is whatever you set it to be when you tag an image.)

Thanks for your assistance!

Hey Jeremy and welcome to the Mattermost forums!

Can you verify that your docker nameserver is working and that the mattermost container can resolve the configured database hostname? Are you using the docker-compose based setup with the .env files and when copying them over to the new system, did you confirm that the permissions of these files are correct?
In the .env file, you should see the variable MM_SQLSETTINGS_DATASOURCE which defaults to a host called postgres. Your postgresql container needs to support being called by this alias.

To verify that this is working, please login to your running mattermost application container:

root@host# docker ps
CONTAINER ID   IMAGE                                          COMMAND                  CREATED         STATUS                   PORTS                               NAMES
3047989d8523   mattermost/mattermost-enterprise-edition:7.4   "/entrypoint.sh matt…"   2 minutes ago   Up 2 minutes (healthy)   8065/tcp, 8067/tcp, 8074-8075/tcp   mm-740-mattermost-1
5028ca3e028d   postgres:13-alpine                             "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes             5432/tcp                            mm-740-postgres-1

root@host# docker exec -ti 3047989d8523 /bin/sh
$

Once you’re logged in, try to run this command:

curl postgres:5432

If the result is curl: (52) Empty reply from server, then you verified that the mattermost application container is able to reach the database server by the name postgres on the port 5432.
If the result is curl: (7) Failed to connect to postgres port 5432: Connection refused, then this means that name resolution works, but there is an issue with the database port.
If the result is curl: (6) Could not resolve host: postgres, your name resolution is not working and a restart of the docker service might fix this problem. If a restart is not possible, you could also try to use the IP address of the postgresql container temporarily for connecting to it in your .env file.
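
The three outcomes above correspond to curl's exit codes 52, 7 and 6, so the check can also be scripted. The following is only a minimal sketch: the function name diagnose_curl_exit is made up for illustration, and the hostname postgres and port 5432 are the defaults mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: translate a curl exit code into the diagnosis above.
diagnose_curl_exit() {
  case "$1" in
    52) echo "ok: name resolves and the port accepts TCP connections" ;;
    7)  echo "port: name resolves, but the connection to the port failed" ;;
    6)  echo "dns: the hostname could not be resolved" ;;
    *)  echo "other: unexpected curl exit code $1" ;;
  esac
}

# Usage inside the Mattermost container (not run here):
#   curl -s -o /dev/null postgres:5432; diagnose_curl_exit "$?"
```

Any other exit code is reported as-is and would need to be looked up in the curl documentation.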

To answer your questions:
The mattermost container could not resolve the db hostname. So that is part of the problem.

I am using the docker-compose & .env file setup. I have not deviated from the config in github except for hostnames (as applicable) and image tags. The permissions on the files were set as indicated in the directions. I see the MM_SQLSETTINGS_DATASOURCE you referenced.

Perhaps my understanding of DNS / name resolution is a little muddy with Docker or this application specifically. I had presumed that because the docker image was local (127.0.0.1) there wouldn’t need to be any DNS entries. Mattermost and the Postgres DB are on the same server.

Once I added a CNAME entry on my DNS server the curl command output changed from (6) to (52). So a little progress on that front. :slight_smile:

I also restarted the docker service after adding the DNS entry. I am still getting “Failed to ping DB” after all of this.

Thoughts?

Docker creates its own network with a separate IP space (usually something in the RFC 1918 range: 172.16, 192.168, etc.), and inside this network there’s also a built-in Docker DNS resolver which resolves container names to their Docker IP addresses. Creating a CNAME record in your DNS server will not help here. The problem seems to be that either the containers are in different networks (you can create multiple independent networks on the same system; they’re identified by names in the docker compose files) or the Docker name service is not working, which prevents correct name resolution.
You did run the curl command from within the mattermost application container, not from the outside, right? Because when the message changes to (52), it basically means that the connection to the database is successful (at least on a TCP level).
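
To check the "different networks" case without jq, the network names of both containers can be compared. This is a rough sketch, not an official procedure: networks_of and shared_networks are hypothetical helper names, and the container names are the ones from the docker ps example earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list the Docker networks a container is attached to.
# Needs a running Docker daemon; it is only defined here, not called.
networks_of() {
  docker inspect --format '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}' "$1"
}

# Print every network name that appears in both space-separated lists.
shared_networks() {
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then echo "$a"; fi
    done
  done
}

# Usage (container names are examples; take yours from `docker ps`):
#   shared_networks "$(networks_of mm-740-mattermost-1)" "$(networks_of mm-740-postgres-1)"
```

Empty output would suggest that the two containers share no network, in which case Docker's internal name resolution between them cannot work.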

You can use the following command to get a list of your docker container names (+ aliases) as well as their IP addresses. Here’s an example of my demo setup:

#  docker ps --format "{{.ID}}" | xargs docker inspect | jq '.[].NetworkSettings.Networks'
{
  "mattermost": {
    "IPAMConfig": null,
    "Links": null,
    "Aliases": [
      "mm-740-mattermost-1",
      "mattermost",
      "3047989d8523"
    ],
    "NetworkID": "9a10035544e09289bcf7b29d8690d73ae357c1517ba5749a58cd11ce2bb6e4cc",
    "EndpointID": "16ef5d4d8a6fd7865ffc673dbeb6b1033adc0af87140f886d9a033429bf6ddec",
    "Gateway": "192.168.208.1",
    "IPAddress": "192.168.208.3",
    "IPPrefixLen": 20,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:c0:a8:d0:03",
    "DriverOpts": null
  }
}
{
  "mattermost": {
    "IPAMConfig": null,
    "Links": null,
    "Aliases": [
      "mm-740-postgres-1",
      "postgres",
      "5028ca3e028d"
    ],
    "NetworkID": "9a10035544e09289bcf7b29d8690d73ae357c1517ba5749a58cd11ce2bb6e4cc",
    "EndpointID": "799d7eca11b7b848e6e330785db1e52f585c9f654fa7193ef5a1aa3717b147c3",
    "Gateway": "192.168.208.1",
    "IPAddress": "192.168.208.2",
    "IPPrefixLen": 20,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:c0:a8:d0:02",
    "DriverOpts": null
  }
}

My mattermost application container in this scenario has the alias “mattermost” (as well as “mm-740-mattermost-1”) and the database container has the alias “postgres”. You can also see the IP addresses in this output, so for testing purposes you could change the hostname postgres in the MM_SQLSETTINGS_DATASOURCE to the IP address of your postgres container (make sure to only restart the mattermost application container then, not the postgres container) and see if that changes anything.
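
Swapping the hostname for an IP address in the data source string can be done by hand, or with sed as sketched below. The helper name set_dsn_host and the sample credentials are made up for illustration, and the pattern assumes the host is followed by ":port" and that the password contains no "@" or ":" after an "@".

```shell
#!/bin/sh
# Hypothetical helper: replace the host part of a postgres:// data source string.
# $1 = original DSN, $2 = new host or IP address.
set_dsn_host() {
  printf '%s\n' "$1" | sed "s|@[^:/@]*:|@$2:|"
}

# Example with made-up credentials:
#   set_dsn_host 'postgres://mmuser:secret@postgres:5432/mattermost?sslmode=disable' 192.168.208.2
```

Only the mattermost application container needs a restart afterwards, as noted above.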

OK, your reply is very helpful and clarified a few things for me on how docker networking works. I see where you are leading me.

I had a little trouble with the command you provided for me. I get-

-bash jq .[].NetworkSettings.Networks: command not found
xargs: docker: terminated by signal 13

So I tried docker network ls, located the network name, and ran docker network inspect mattermost_mattermost. I was at least able to see the hostname and IP of the DB. That command does not show the extra info that yours does, but I think it gets the job done for testing purposes (unless there is something specific we need, such as aliases). I can at least get the Name and IP address.

I did substitute the IP address of the postgres container for the hostname in MM_SQLSETTINGS_DATASOURCE.

Presently my settings are:
MM_SQLSETTINGS_DATASOURCE=postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@172.21.0.2:5432/${POSTGRES_DB}?sslmode=disable&connect_timeout=10

No change even after restarting MM container. :slightly_frowning_face:

I am guessing we need to find a way to make your command work so we can discover Aliases or something else?

That’s interesting. At first glance it looks like the jq binary is not found on your system, but then the error message would be different (I tested it by renaming jq to jq2 on my system):

# docker ps --format "{{.ID}}" | xargs docker inspect | jq2 .[].NetworkSettings.Networks
-bash: jq2: command not found
xargs: docker: terminated by signal 13

Are you sure that you did not introduce any hyphens or the like when you tried to type this command into your air-gapped system?
Anyways, you can also run the steps manually. The first part of the command lists all IDs of docker containers (example output on my system):

# docker ps --format "{{.ID}}"
3047989d8523
5028ca3e028d

You take these IDs and run the next command on both of them:

# docker inspect 3047989d8523
[
    {
        "Id": "3047989d852338735236216afd2018387feb613a399581406e66dd395fc9154f",
        "Created": "2022-12-09T06:10:45.03439528Z",
        "Path": "/entrypoint.sh",
        "Args": [
            "mattermost"
        ],
[...]

The output is very long; we’re only interested in the NetworkSettings.Networks subsection. This is what the jq command was trying to extract, but we could also do it with grep, if you have that:

# docker inspect 3047989d8523 | grep -A20 \"Networks\"
            "Networks": {
                "mattermost": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "mm-740-mattermost-1",
                        "mattermost",
                        "3047989d8523"
                    ],
                    "NetworkID": "9a10035544e09289bcf7b29d8690d73ae357c1517ba5749a58cd11ce2bb6e4cc",
                    "EndpointID": "16ef5d4d8a6fd7865ffc673dbeb6b1033adc0af87140f886d9a033429bf6ddec",
                    "Gateway": "192.168.208.1",
                    "IPAddress": "192.168.208.3",
                    "IPPrefixLen": 20,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:c0:a8:d0:03",
                    "DriverOpts": null
                }
            }

You will have to do that for all container IDs to get the output we’re looking for.
But since you changed the name to the IP and it still doesn’t work, I’m a bit lost now.
Did you maybe change the POSTGRES_USER and POSTGRES_PASSWORD in your .env file at some point? Let’s see if we can connect to the postgres database from within the postgres container:

Grab the ID:

# docker ps | grep postgres
5028ca3e028d   postgres:13-alpine                             "docker-entrypoint.s…"   25 hours ago   Up 25 hours             5432/tcp                            mm-740-postgres-1

Enter the container:

# docker exec -ti 5028ca3e028d /bin/sh
/ #

Now try to connect to the database using psql and replace the words in <> with the corresponding values from your .env file:

/ # psql -U <POSTGRES_USER> -W <POSTGRES_DB>
Password: <POSTGRES_PASSWORD>
psql (13.7)
Type "help" for help.

mattermost=# 

If that worked and you’re logged in, please verify that you can run a query on the database:

mattermost=# \dt users
        List of relations
 Schema | Name  | Type  | Owner
--------+-------+-------+--------
 public | users | table | mmuser
(1 row)

Type exit two times to exit psql and the container once you’re done.
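
The same check can be run non-interactively, which may be easier on an air-gapped system. The sketch below rests on several assumptions: schema_state is a hypothetical helper name, the container name is the example one from this thread, POSTGRES_USER and POSTGRES_DB are set inside the postgres container (as in the docker-compose setup), and psql prints "Did not find any relations." when the database has no tables.

```shell
#!/bin/sh
# Hypothetical helper: classify the text that psql prints for \dt.
schema_state() {
  case "$1" in
    *"Did not find any relations"*) echo "empty" ;;
    *"List of relations"*)          echo "populated" ;;
    *)                              echo "unknown" ;;
  esac
}

# Usage on the Docker host (not run here; container name is an example):
#   out=$(docker exec mm-740-postgres-1 sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "\dt" 2>&1')
#   schema_state "$out"
```

A result of "empty" would mean the login works but Mattermost never created its tables; "unknown" usually means psql reported an error that is worth reading in full.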

Thanks for the reply. I’ll be able to test your directions on Dec 12th when I am back at work. I don’t have access to the system today.

Did you maybe change the POSTGRES_USER and POSTGRES_PASSWORD in your .env file at some point?

I did vary from the github example. However, both systems have the same username and password so that shouldn’t be an issue. Also, I can login to both DBs without issues.

When I compared my known good / working system against my new install, I found that the \dt command displays the list of relations on the good system.
The new / bad system says it did not find any relations.
So obviously there are some issues with the DB.

To circle back on the question of permissions. I checked the file path where the db gets created. On both systems the path ./volumes/db/var/lib/postgresql/data folder is owned by postgres:postgres and is chmod 700.
One difference between the systems is that one system has a password set for the postgres user and the user can login. The non-working system also has a postgres user, but there /etc/passwd defines /sbin/nologin as that user’s shell.
As a test, I deleted the postgres user and recreated it so that I could login. This made no difference.

Right now, the difference that I am seeing is that something is not quite right with the new DB. I am looking through the code to see if I can spot what creates the DB, tables, relations, etc. and figure out why no relations exist. Any thoughts?

Ok, I found my issue and I’m wearing a little egg on my face. I had forgotten to add my postgres user to the docker group. Once I did that, everything worked! :smiley:

Now I am going to figure out if I can run this in rootless mode…

Sorry for my late reply, but great to hear that you figured it out. The default documentation assigns all files to the same user account (ID 2000), even the postgres files, so that’s the big difference here with your setup. Using system accounts (IDs < 1000) is protected and that’s why you needed to add the user to your docker group. Anyways, great to hear that it’s working now. Unfortunately I’ve yet to see someone successfully run it rootless; there have been several topics in here, but no one made it work AFAIK. If you manage to get this up and running in rootless mode, I’d be grateful if you could share your config then :slight_smile:
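
Since file ownership turned out to matter here, a quick ownership check on the volume directories may save time on a new install. This is just a sketch: owned_by_uid is a hypothetical helper name, the path is the one quoted earlier in this thread, and the expected UID depends on how the setup was installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is owned by the given numeric UID.
# $1 = expected UID, $2 = path to check.
owned_by_uid() {
  owner=$(ls -ldn "$2" | awk '{print $3}')
  [ "$owner" = "$1" ]
}

# Usage (path from this thread, UID 2000 from the reply above; not run here):
#   owned_by_uid 2000 ./volumes/db/var/lib/postgresql/data || echo "unexpected owner"
```

Running it on both the working and the non-working system makes any difference in ownership visible right away.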

I think this line should be

docker ps --format "{{.ID}}" | xargs docker inspect | jq -C '.[].NetworkSettings.Networks'

in zsh.

and the result should be like this:

...
{
  "mattermost_default": {
    "IPAMConfig": null,
    "Links": null,
    "Aliases": [
      "postgres",
      "59825d206240"
    ],
    "NetworkID": "2584858613fe393a67245be3c78d4f1266e165d9c474fbb76bbd7666e7351271",
    "EndpointID": "017b18e7b7e3bdbadf2fd73478ca826f776ae6edb0c549b831ae52cfee011873",
    "Gateway": "172.21.0.1",
    "IPAddress": "172.21.0.2",
    "IPPrefixLen": 16,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:ac:15:00:02",
    "DriverOpts": null
  }
}
...

Hi @charlie and welcome to the Mattermost forums!

I’m not sure I can follow you - I cannot really see a difference between our two commands. Can you elaborate on that?

Hi, @agriesser. Thanks again for helping me in the peer-to-peer-help channel.

The difference is that you need to add ' ' around the .[].NetworkSettings.Networks, otherwise jq can’t take that string as an arg.

Not here on my system - it works exactly like this. Maybe your shell behaves differently with this argument? Bash does not have issues with that, but yes, adding the quotation marks does no harm, so I’ve edited my post.

Aha, yes. I’m using zsh. When I enter bash, the command can be executed without ' '.
TIL! Thanks!

TIL too, thanks - will be more careful with the quotes in the instructions in the future :slight_smile: