Possible memory leak

Summary
Mattermost-team-edition v7.8, running on Docker, is killed daily by the OS due to an out-of-memory (OOM) condition. It looks like a memory leak, but I don't have the experience to investigate or prove this.

Steps to reproduce

  • Unsure how to repro this.
  • I am running Mattermost-team-edition v7.8.0 with Postgres on Docker on a Linux box. The machine is acting as a server, sitting in my garage, hosting a small number of web applications through Docker.
  • I can provide the docker-compose files if needed.

Expected behavior
No memory leaks?

Observed behavior

  • Installation is running fine from a user perspective.
  • Every now and then, the entire server becomes completely unresponsive: web requests time out, and even running SSH sessions are disconnected.
  • After 15-30 minutes, the system becomes responsive again.
  • dmesg -T | grep ill reveals that a Mattermost process was killed by the OS because the system ran out of memory.
  • top shows that the load average is normally below 0.5 but peaks at over 170(!!) while the memory issue is present.
  • These OOM kills happen at least daily, even when no users are active on the Mattermost installation.

Example output of dmesg -T | grep ill:

[Sa Mär  4 19:35:27 2023] [   3858]  1000  3858   113617      244   106496        6             0 gsd-rfkill
[Sa Mär  4 19:35:27 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/66ecc2904f86807abf8e5a3e5fb9310028a03c19d09f6b8973eb2c6801c943f7,task=mattermost,pid=1664039,uid=2000
[Sa Mär  4 19:35:27 2023] Out of memory: Killed process 1664039 (mattermost) total-vm:119171600kB, anon-rss:4219024kB, file-rss:0kB, shmem-rss:0kB, UID:2000 pgtables:9300kB oom_score_adj:0
[So Mär  5 10:05:14 2023] nxnode.bin invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 10:05:14 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 10:05:14 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 10:05:14 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/6ceafe411e36e8c1ceb396ae0fe9e0772399b4d981f0c3b9f108c4b637300d6f,task=bundle,pid=2706941,uid=998
[So Mär  5 10:05:14 2023] Out of memory: Killed process 2706941 (bundle) total-vm:1425464kB, anon-rss:756816kB, file-rss:0kB, shmem-rss:204kB, UID:998 pgtables:2444kB oom_score_adj:0
[So Mär  5 14:42:06 2023] apport invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 14:42:06 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 14:42:06 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 14:42:06 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/6ceafe411e36e8c1ceb396ae0fe9e0772399b4d981f0c3b9f108c4b637300d6f,task=bundle,pid=3491975,uid=998
[So Mär  5 14:42:06 2023] Out of memory: Killed process 3491975 (bundle) total-vm:1716760kB, anon-rss:931788kB, file-rss:0kB, shmem-rss:1976kB, UID:998 pgtables:3164kB oom_score_adj:0
[So Mär  5 14:42:36 2023] gmain invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 14:42:36 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 14:42:36 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 14:42:36 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/66ecc2904f86807abf8e5a3e5fb9310028a03c19d09f6b8973eb2c6801c943f7,task=mattermost,pid=2051211,uid=2000
[So Mär  5 14:42:36 2023] Out of memory: Killed process 2051211 (mattermost) total-vm:142924812kB, anon-rss:4533848kB, file-rss:0kB, shmem-rss:0kB, UID:2000 pgtables:10168kB oom_score_adj:0
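
A minimal sketch of how one might log per-container memory over time with docker stats, to see which container grows in the run-up to a kill. This is not from the thread itself; the log path and one-minute interval are arbitrary choices:

  # Append a per-container memory snapshot to a log file once a minute.
  # Log path is an assumption; run as a user allowed to write there.
  while true; do
    date >> /var/log/docker-mem.log
    docker stats --no-stream --format '{{.Name}}\t{{.MemUsage}}' >> /var/log/docker-mem.log
    sleep 60
  done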

Update:

  • Out of memory still happens, and Mattermost keeps getting killed by the OS. (It automatically starts again.)
  • I have disabled the entire plugin system, and the out-of-memory kills still happen, so they are not caused by a plugin.

How can I troubleshoot the cause of this?
How can I fix it?
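
One possible mitigation, not discussed in the thread, is to cap the Mattermost container's memory so that a runaway container is killed on its own instead of dragging the whole host into an OOM stall. This does not fix a leak, it only limits the blast radius. A hedged sketch using the Docker CLI; the container name mattermost and the 4g value are assumptions:

  # Cap the running container at 4 GiB (memory and memory-swap set together
  # so the update is accepted on a container started without limits):
  docker update --memory 4g --memory-swap 4g mattermost
  # Verify the limit took effect (value is reported in bytes):
  docker inspect --format '{{.HostConfig.Memory}}' mattermost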

Hi @torbengb,

Could you check the possible solutions in this GitHub issue and let me know if one of them worked for you?

Thank you, Agriesser!

  1. According to the page you provided, a temporary solution is suggested (here): enable Bleve indexing in the system.
    —I have now done this and will monitor for any improvement. I will report back in 1-2 days.
  2. Another suggestion (here) is to increase pids_limit in the docker-compose.yml file for both the mattermost and postgresql services.
    —I will not do this yet; first I want to see whether step 1 has any effect.
  3. A third suggestion (here) is to set GRUB_CMDLINE_LINUX_DEFAULT="quiet consoleblank=0 cgroup_disable=memory".
    —I will not do this yet; first I want to see whether step 1 or step 2 has any effect.

Update:

  • Enabling Bleve indexing did not resolve the issue. I am leaving it enabled.
  • I am now applying the next option: raising the pids_limit of the mattermost service from 200 to 400 (a quick way to verify the new limit is sketched below).
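
A quick way to confirm the new limit actually applies after recreating the stack. This is a sketch; the container name mattermost is an assumption:

  # Show the PID limit Docker has applied to the running container:
  docker inspect --format '{{.HostConfig.PidsLimit}}' mattermost
  # The same limit can also be changed at runtime, without editing docker-compose.yml:
  docker update --pids-limit 400 mattermost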

I will report back again in 1-2 days.

Thanks for keeping us updated!
Do you maybe also have relevant log lines from systemctl status mattermost or the mattermost.log file when this happens? Can you see any plugins dying in your logs?

systemctl status mattermost = Unit mattermost.service could not be found. = I guess that’s because my MM installation is a docker container, not a local installation?

mattermost.log = I need some time to go through that log and compare against the times when MM was killed by the OS. Will report back.

any plugins dying = this OOM-kill happens even when the entire plugin system is disabled. It does not appear to be caused by plugins.

Yes, you’re right - sorry, I confused your setup with one from another user. docker logs would be the command for you.
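
For example, a hedged sketch of pulling the container's logs around one of the OOM timestamps from the dmesg output earlier in the thread. The container name and the exact time window are assumptions, and --until requires a reasonably recent Docker:

  # Window brackets the 14:42 kill on March 5; adjust to your timestamps.
  docker logs --since '2023-03-05T14:30:00' --until '2023-03-05T14:50:00' mattermost 2>&1 | grep -iE 'error|fatal|panic'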

And great to know that you already ruled out the plugins, thanks for that!

Update: I did not have time to investigate further.

Just now, I checked dmesg -T | grep ill and it looks like the issue is no longer present?! Well, that’s good of course, but I don’t know what changed, and that worries me a little.

I will (try to) keep an eye on the matter and update this topic again!

The last change you made was to raise the pids_limit, right?

Yes, the pids_limit was raised from 200 to 400, nothing has been changed since then.

As a test, I have now lowered it to 200 again to see whether the problem reappears; then we'll know for sure.

The memory leak has disappeared, even after lowering the pids_limit back to 200 and restarting the stack.

:man_shrugging:

That’s unfortunate for pinning down a potential bug, but good for you :slight_smile: If it ever happens again, feel free to get back here and we’ll see if we can pin it down then; if not, this issue will remain unresolved (and a mystery…).

Hello, I have the same problem, but I don’t use a docker-compose file, so how can I fix it, please?
I’m on version 6.3.8.
Thank you!

Hi @TiYa-maker and welcome to the Mattermost forums!

6.3.8 is very old - any chance you could upgrade to the latest release (7.10.2)? This can be done in one step and is usually very easy.

Hi, I’m self-hosting Mattermost using docker compose, image mattermost-enterprise-edition version 9.8.0.

I notice memory usage creeping up on my VM, which is dedicated to this; there's nothing else running on it.

As you can see, this graph is for a 2 week period. The sudden drop is after a server restart. The number of users is constant. Is this a memory leak issue?

Some updates here; the behavior (increase in memory consumption) is consistent:

Additionally, I noticed that nginx is taking up way too much memory:
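
A quick way to see which container or process is actually holding the memory at a given moment (a sketch; container and process names will differ per setup):

  # Per-container memory snapshot:
  docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
  # Largest resident processes on the host, sorted by RSS:
  ps -eo pid,rss,comm --sort=-rss | head -n 15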