Possible memory leak

torbengb · March 5, 2023, 3:54pm

Summary
Mattermost-team-edition v7.8 running on Docker is killed daily by the OS due to out-of-memory issue. It looks like a memory leak, but I have no experience to investigate or prove this.

Steps to reproduce

Unsure how to repro this.
I am running Mattermost-team-edition v7.8.0 with Postgres on Docker on a Linux box. The machine is acting as a server, sitting in my garage, hosting a small number of web applications through Docker.
I can provide the docker-compose files?

Expected behavior
No memory leaks?

Observed behavior

Installation is running fine from a user perspective.
Every now and then, the entire server becomes completely unresponsive: web requests time out, and even running SSH sessions are disconnected.
After 15-30 minutes, the system becomes responsive again.
dmesg -T | grep ill reveals that a Mattermost process was killed by the OS due to out of memory.
top reveals that CPU load is normally less than 0.5 but peaks at over 170(!!) while the memory issue is present.
This kill situation happens at least daily, even when no users are active on the Mattermost installation.

Example output of dmesg -T | grep ill:

[Sa Mär  4 19:35:27 2023] [   3858]  1000  3858   113617      244   106496        6             0 gsd-rfkill
[Sa Mär  4 19:35:27 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/66ecc2904f86807abf8e5a3e5fb9310028a03c19d09f6b8973eb2c6801c943f7,task=mattermost,pid=1664039,uid=2000
[Sa Mär  4 19:35:27 2023] Out of memory: Killed process 1664039 (mattermost) total-vm:119171600kB, anon-rss:4219024kB, file-rss:0kB, shmem-rss:0kB, UID:2000 pgtables:9300kB oom_score_adj:0
[So Mär  5 10:05:14 2023] nxnode.bin invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 10:05:14 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 10:05:14 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 10:05:14 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/6ceafe411e36e8c1ceb396ae0fe9e0772399b4d981f0c3b9f108c4b637300d6f,task=bundle,pid=2706941,uid=998
[So Mär  5 10:05:14 2023] Out of memory: Killed process 2706941 (bundle) total-vm:1425464kB, anon-rss:756816kB, file-rss:0kB, shmem-rss:204kB, UID:998 pgtables:2444kB oom_score_adj:0
[So Mär  5 14:42:06 2023] apport invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 14:42:06 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 14:42:06 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 14:42:06 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/6ceafe411e36e8c1ceb396ae0fe9e0772399b4d981f0c3b9f108c4b637300d6f,task=bundle,pid=3491975,uid=998
[So Mär  5 14:42:06 2023] Out of memory: Killed process 3491975 (bundle) total-vm:1716760kB, anon-rss:931788kB, file-rss:0kB, shmem-rss:1976kB, UID:998 pgtables:3164kB oom_score_adj:0
[So Mär  5 14:42:36 2023] gmain invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[So Mär  5 14:42:36 2023]  oom_kill_process.cold+0xb/0x10
[So Mär  5 14:42:36 2023] [   3858]  1000  3858   113617      185   106496       65             0 gsd-rfkill
[So Mär  5 14:42:36 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/66ecc2904f86807abf8e5a3e5fb9310028a03c19d09f6b8973eb2c6801c943f7,task=mattermost,pid=2051211,uid=2000
[So Mär  5 14:42:36 2023] Out of memory: Killed process 2051211 (mattermost) total-vm:142924812kB, anon-rss:4533848kB, file-rss:0kB, shmem-rss:0kB, UID:2000 pgtables:10168kB oom_score_adj:0

torbengb · March 7, 2023, 5:27pm

Update:

Out of memory still happens, and Mattermost keeps getting killed by the OS. (It automatically starts again.)
I have disabled the entire plugin system, and Out of memory still happens. So the Out of memory is not caused by a plugin.

How can I troubleshoot the cause of this?
How can I fix it?

agriesser · March 8, 2023, 2:12pm

Hi @torbengb ,

could you check the possible solutions in this GitHub issue and let me know if one of them worked for you?

github.com/mattermost/mattermost-server

Mattermost suddenly goes out of memory (OOM) and reboots

opened 08:38AM - 11 Jul 22 UTC

DummyThatMatters

Bug Report/Open

#### Summary Mattermost goes for an unexpected reboot periodically (every 1-2 w…orking days appox.) due to sudden increase in memory consumption. #### Steps to reproduce Mattermost 7.0.1 Team Edition, deployed on a pod in openshift (tryed to allocate from 2.6GB to 5.5GB RAM with the same result). Postgres 14.2 as a DB. Around ~4000 users, ~1200 of them are active ~ 26000 messages per day load. #### Expected behavior Mattermost workes stably without reboots. #### Observed behavior (that appears unintentional) Mattermost goes for a reboot every 1-2 working days. The cause of reboot is OOM. Here is the example of log of the memory consumption: ![image](https://user-images.githubusercontent.com/51426114/178220724-6768c54c-d0a8-401b-af2e-a0027a5fa94c.png) ![image](https://user-images.githubusercontent.com/51426114/178221161-eb008954-3d9c-45f1-af5d-7fbb98a00a9d.png) The same increase of load can be observed on CPU part as well: ![image](https://user-images.githubusercontent.com/51426114/178221465-bef40bc3-c22a-4612-8fc7-0f9cc4e61d32.png) As you can see there is sudden growth of resource utilisation out of nowhere. The logs are relatevely clean and logs raito didnt show any increase of operations number or increased user activity. We have done our small investigation and we think that it can be caused by unproper functioning of getPostsForChannel method. That assumption being done by inspecting mattermost go profile. Here is example of heap tree made via pprof tool: ![profile003](https://user-images.githubusercontent.com/51426114/178222825-11427639-3bfe-4366-a4c1-ba1f23641406.svg) Please help with investigating, we can provide additional info if it needed (if it can be collected via our tools and does not contain corporate data)

torbengb · March 8, 2023, 2:21pm

Thank you, Agriesser!

According to the page you provided, a temporary solution is suggested (here): enable Bleve Indexing in the system.
—I have now done this, and will monitor for any improvement. I will report back in 1-2 days.
Another suggestion (here) is to * Increase pids_limit in docker-compose.yml file for both mattermost and postrgesql*.
—I will not do this yet; first I want to see if (step 1) has any effect.
Third suggestion (here) is to set GRUB_CMDLINE_LINUX_DEFAULT="quiet consoleblank=0 cgroup_disable=memory".
—I will not do this yet; first I want to see if (step 1) or (step 2) have any effect.

torbengb · March 9, 2023, 6:27pm

Update:

enable Bleve Indexing did not resolve the issue. I am leaving it enabled.
I am now applying the next option: raise pids_limit of the mattermost service from 200 to 400.

I will report back again in 1-2 days.

agriesser · March 10, 2023, 4:07am

Thanks for keeping us updated!
Do you maybe also have relevant log lines from systemctl status mattermost or the mattermost.log file when this happens? Can you see any plugins dying in your logs?

torbengb · March 10, 2023, 7:12am

systemctl status mattermost = Unit mattermost.service could not be found. = I guess that’s because my MM installation is a docker container, not a local installation?

mattermost.log = I need some time to go through that log and compare against the times when MM was killed by the OS. Will report back.

any plugins dying = this OOM-kill happens even when the entire plugin system is disabled. It does not appear to be caused by plugins.

agriesser · March 10, 2023, 8:21am

Yes, you’re right - sorry, I confused your setup with one from another user. docker logs would be the command for you.

And great to know that you already ruled out the plugins, thanks for that!

torbengb · March 20, 2023, 8:54pm

Update: I did not have time to investigate further.

Just now, I checked dmesg -T | grep ill and it looks like the issue is no longer present?! Well, that’s good of course, but I don’t know what changed, and that worries me a little.

I will (try to) keep an eye on the matter and update this topic again!

agriesser · March 21, 2023, 4:08am

The last change you made was to raise the pids_limit, right?

torbengb · March 21, 2023, 10:48am

Yes, the pids_limit was raised from 200 to 400, nothing has been changed since then.

As a test, I have now lowered it to 200 again to see that the problem reappears, and then we’ll know for sure.

torbengb · March 24, 2023, 2:16pm

The memory leak has disappeared, even after lowering the pids_limit back to 200 and restarting the stack.

agriesser · March 25, 2023, 5:22am

That’s unfortunate for fixing a potential bug, but good for you If it ever happens again, feel free to get back here and let’s see if we can pin it down in the future then, if not, this issue will remain unresolved (and a mystery…).

TiYa-maker · May 26, 2023, 10:16am

Hello, i have the same problem but i don’t use docker-compose file, so how can i fix it please ?
i use 6.3.8 version
Thank you !

agriesser · June 9, 2023, 4:37am

Hi @TiYa-maker and welcome to the Mattermost forums!

6.3.8 is very old - any chances you could upgrade to the latest release (7.10.2)? This can be done in one step and is usually very easy.

mksa · June 20, 2024, 12:20am

Hi, I’m self-hosting Mattermost using docker compose, image mattermost-enterprise-edition version 9.8.0.

I notice the memory usage creeping up on my VM that is dedicated for this, there’s nothing else running on it.

As you can see, this graph is for a 2 week period. The sudden drop is after a server restart. The number of users is constant. Is this a memory leak issue?

mksa · July 5, 2024, 8:57pm

Some updates here; the behavior (increase in memory consumption) is consistent:

Additionally, I noticed that nginx is taking up way too much memory:

Topic		Replies	Views
Mattermost consumes more memory day by day Troubleshooting	4	2376	September 16, 2016
Problems with Mattermost/Docker installation Troubleshooting	2	3433	November 23, 2021
[SOLVED] Mattermost suddenly crashed and after reboot Web Container doesn't response Troubleshooting	14	5455	March 8, 2023
Mattermost EBS Docker restart = wipe? Troubleshooting	3	1257	November 20, 2017
Mattermost Ubuntu server stops suddenly Troubleshooting	11	605	June 24, 2023

Possible memory leak

Related topics