Summary
After upgrading Mattermost to version 7.1, the database performance has degraded to the point that searches are no longer functional.
Steps to reproduce
Run Mattermost 7.1 via GitLab, backed by a t3.xlarge RDS instance in AWS running PostgreSQL 13.4.
Expected behavior
Mattermost remains about as performant as it was in previous versions, and search functions correctly.
Observed behavior
After the update to 7.1, database performance became a major problem, and a large number of errors began to show up in the logs. The main effects for users are that channels and messages load slowly and search is basically non-functional: it times out and cannot return results for known-good search terms. Here's an example of the logs we're seeing:
{"timestamp":"2022-09-07 05:26:05.956 Z","level":"error","msg":"Unable to get post counts.","caller":"web/context.go:105","path":"/api/v4/analytics/old","request_id":"yesryqp6jf87jpmj4kte7bnhbw","ip_addr":"127.0.0.1","user_id":"prxcr796d78g7mem1h4f9scx7e","method":"GET","err_where":"GetAnalytics","http_code":500,"err_details":"failed to count Posts: pq: canceling statement due to user request"}
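For anyone triaging similar logs, entries like the one above can be tallied to see which calls fail most often. This is only a minimal sketch, assuming the log holds one JSON object per line with the fields shown in the example; `count_errors` and `SAMPLE_LINE` are hypothetical names for illustration, not part of Mattermost.

```python
import json
from collections import Counter

# The example entry from this post, used as sample input.
SAMPLE_LINE = (
    '{"timestamp":"2022-09-07 05:26:05.956 Z","level":"error",'
    '"msg":"Unable to get post counts.","caller":"web/context.go:105",'
    '"path":"/api/v4/analytics/old","request_id":"yesryqp6jf87jpmj4kte7bnhbw",'
    '"ip_addr":"127.0.0.1","user_id":"prxcr796d78g7mem1h4f9scx7e",'
    '"method":"GET","err_where":"GetAnalytics","http_code":500,'
    '"err_details":"failed to count Posts: pq: canceling statement due to user request"}'
)


def count_errors(lines):
    """Count error-level entries, grouped by (err_where, msg).

    Assumes one JSON object per line; lines that are blank,
    not valid JSON, or not JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error":
            continue
        counts[(entry.get("err_where"), entry.get("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent error sources first.
    for (where, msg), n in count_errors([SAMPLE_LINE]).most_common():
        print(n, where, msg)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating a file yields one line at a time.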
After the problems started, we bumped the RDS instance from a t3.large to a t3.xlarge, but the problems didn't get any better. We decided to give that a shot based on another post here that had similar issues, where it boiled down to database performance (here: Unable to get the file count for the channel). Our instance is bigger than theirs was (we're around 24,000,000 posts), so it's possible it would benefit from another power bump, but the CPU utilization for our RDS instance seems to point to a massive performance drop with the 7.1 upgrade.
You can pretty clearly see the point where the upgrade to 7.1 occurred on August 31. We went from a fairly consistent utilization to spiking to 100% during business hours. You can also see when we upgraded the instance to the xlarge: the new CPU "baseline" usage dropped to around 10% during evenings and weekends, but the heavy spikes remained. There is another service that uses this RDS instance, but it did not receive any upgrades at the same time as Mattermost.
Any ideas or guidance here would be extremely appreciated.