My Mattermost installation (version 5.9, E10 licence) has about 500 users and runs on a single 8-core server. CPU and memory usage on the server have always been fairly low. However, in recent weeks users have noticed that some operations have become slower, with delays of roughly 2-10 seconds on the following:
- loading a saved file / image
- loading messages on channel switching
- uploading an image (can now take 30s where previously it would be just a few seconds)
- loading the list of users in the direct-channel switcher
To be specific about this last point, the steps are:
- Shift-Cmd-K for direct message switcher
- Start typing a username
- It takes 3-10 seconds before the list of possible completions appears; late last year I remember this being instantaneous
I know that some of these issues began when I switched from local disk storage to S3 storage. This might actually explain all the slowdowns: for example, I guess that populating the list of users in the direct-channel switcher requires loading the users’ icons from storage. I am investigating the S3 configuration and network routing to see if we can improve that. I recently configured the S3 region and endpoint explicitly rather than relying on the default (“Mattermost attempts to get the appropriate region from AWS”), which may help.
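To tell storage latency apart from everything else, I am thinking of timing each kind of request in isolation from the Mattermost host. The following is only a minimal sketch of that idea: `time_operation` and `summarise` are names I made up, and the `time.sleep` call is a stand-in for a real S3 fetch or SQL query, which I have not wired in here.

```python
import statistics
import time


def time_operation(operation, repeats=5):
    """Call `operation` `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Return the median and the worst-case duration, in seconds."""
    return {"median": statistics.median(durations), "max": max(durations)}


if __name__ == "__main__":
    # Hypothetical stand-in workload: replace the lambda with a real
    # S3 object fetch or a database query to measure that path.
    result = summarise(time_operation(lambda: time.sleep(0.01)))
    print(f"median={result['median']:.3f}s max={result['max']:.3f}s")
```

Running the same measurement against S3 and against the database should at least show which of the two accounts for the multi-second delays.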
I also suspect that some of the issues relate to the speed of database queries. The database is Amazon RDS. The config items “Maximum Idle Connections” and “Maximum Open Connections” have both been set to 10 ever since I set up the server. Is there any guidance on a suitable choice for these? As an experiment I increased both to 50, and the number of active connections immediately jumped to 23.
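As a rough way to reason about the connection limit, Little's law says the average number of connections in use equals the query rate multiplied by the mean query duration. The sketch below applies that with a headroom factor for bursts. The function name `estimate_open_connections`, the headroom factor, and the example figures are all my own assumptions for illustration, not measurements from my server and not official Mattermost guidance.

```python
import math


def estimate_open_connections(queries_per_second, mean_query_seconds, headroom=2.0):
    """Estimate a connection limit using Little's law plus burst headroom.

    Average busy connections = arrival rate * mean time each query holds
    a connection. `headroom` is an assumed multiplier, not a known constant.
    """
    busy_on_average = queries_per_second * mean_query_seconds
    return max(1, math.ceil(busy_on_average * headroom))


if __name__ == "__main__":
    # Hypothetical figures: 80 queries/s, each holding a connection for
    # 0.125 s, gives 10 busy connections on average, doubled for headroom.
    print(estimate_open_connections(80, 0.125))
```

If the real query rate and duration put the average well above 10, that would be consistent with the jump to 23 active connections once the old limit was lifted.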
Any guidance on tuning this would be welcome!