I spent time building and validating a single data center design that includes:
- Runbooks covering setup, operations, failover, and troubleshooting for a 3-node HA PostgreSQL cluster
- repmgr for automatic failover, HAProxy for read/write connection routing, Keepalived for VIP management
- Terraform to spin up the full environment on AWS for testing and validation
- Prometheus + Grafana + Loki monitoring stack (Docker Compose, usable anywhere)
- Failure simulation playbook for testing failover scenarios under load
- Load-tested against realistic workloads using mattermost-load-test-ng
Beyond my own load testing and validation, the design has been deployed in at least one customer environment.
Check it out! This will hopefully get into the official Mattermost docs soon.
Let me know if you have questions.
There’s also a branch that extends the design with cross-DC disaster recovery.