Websocket and TLS errors

Summary
I am seeing two different errors in my logs. At the moment I am not sure whether they are related, but I wasn't seeing either of them before this issue. I upgraded to v7.1.3 and enabled HSTS a week ago, and everything seemed fine until yesterday, when I noticed the Mattermost desktop app being unavailable.

Steps to reproduce

  • Mattermost v7.1.3
  • TLS and TLSStrictTransport enabled, TLSMinVer: "1.3"
  • Forward80to443: True (using 443 instead of the vanilla 8065)
  • MySQL
  • PluginSettings: { Enable: "False", EnableUploads: "False", EnableHealthChecks: "True", … } (see the config sketch below)
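
For reference, here is roughly how those settings look in my config.json (a trimmed sketch; I typed the values from memory, and the actual key names in the file are Forward80To443 and EnableHealthCheck, so double-check against your own copy):

{
  "ServiceSettings": {
    "ListenAddress": ":443",
    "ConnectionSecurity": "TLS",
    "TLSCertFile": "/path/to/cert.pem",
    "TLSKeyFile": "/path/to/key.pem",
    "TLSMinVer": "1.3",
    "TLSStrictTransport": true,
    "Forward80To443": true
  },
  "PluginSettings": {
    "Enable": false,
    "EnableUploads": false,
    "EnableHealthCheck": true
  }
}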

Observed behavior
The Mattermost desktop app becomes unavailable on both Windows and macOS (expected, given the TLS handshake errors).

Here are relevant error messages:

Websocket error

{"timestamp":"2022-10-26 14:53:47.405 -04:00","level":"error","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:830","user_id":"","error":"read tcp :443->:56087: i/o timeout"}

TLS handshake errors

{"timestamp":"2022-10-26 15:29:48.029 -04:00","level":"error","msg":"http: TLS handshake error from :49385: EOF","caller":"log/log.go:195","source":"httpserver"}

{"timestamp":"2022-10-26 15:34:02.959 -04:00","level":"error","msg":"http: TLS handshake error from :65109: read tcp :443->:65109: read: connection reset by peer","caller":"log/log.go:195","source":"httpserver"}

Any input is appreciated! Thanks

Hi Alejandro and welcome to the Mattermost forums!

The websocket.NextReader message can be ignored; it's just a debug-like message, which has already been muted in 7.4.
Are you sure that your clients support TLS 1.3 since you forced the minimum version to be TLS 1.3? Also, is there a reverse proxy in front of your Mattermost installation?
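
If you want to verify that from a client machine, a quick check is something like the following (the hostname is a placeholder for your server):

# Requires an OpenSSL build with TLS 1.3 support (1.1.1 or newer); a
# handshake failure here would point at a TLS 1.3 negotiation problem.
openssl s_client -connect your-mattermost.domain:443 -tls1_3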

Hi agriesser and thanks for your reply!

I will ignore the websocket error messages in the logs.

We ask our users (about 35) to use modern browsers that support TLS 1.3, which is why I cautiously made that choice, and we haven't had any issues in the last two years. OTOH, I don't have a reverse proxy in front of my Mattermost installation and only recently enabled TLSStrictTransport. Side note: I never fully understood how Mattermost was able to serve content to the browser without a web server such as NGINX or Apache.

Would the installation of an NGINX reverse proxy help address this TLS handshake error, or would it make no difference? (For context, the setup I am asking about is sketched below.)
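
The kind of setup I have in mind is roughly the following, trimmed down from the standard Mattermost reverse-proxy documentation, with placeholder hostnames and certificate paths:

upstream backend {
    server 127.0.0.1:8065;
}

server {
    listen 443 ssl http2;
    server_name your-mattermost.domain;
    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # WebSocket connections need the Upgrade/Connection headers forwarded.
    location ~ /api/v[0-9]+/(users/)?websocket$ {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://backend;
    }

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://backend;
    }
}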

Mattermost comes with its own webserver integrated, so when you start the Mattermost application, it automatically spins up a webserver on the configured ports.
Let's postpone the installation of a reverse proxy for now; first it would be interesting to see what's going on with your installation.
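
To illustrate the idea, this is roughly the pattern any Go program can use to serve HTTPS directly with the standard library (a generic sketch with placeholder certificate paths, not Mattermost's actual code):

package main

import (
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello from the built-in webserver"))
    })
    // The binary terminates TLS itself on the configured port, which is
    // why no separate NGINX or Apache is needed in front of it.
    log.Fatal(http.ListenAndServeTLS(":443", "/path/to/cert.pem", "/path/to/key.pem", nil))
}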

Is the Mattermost web app working fine, and is it just the desktop app that is affected? In other words, if your clients open https://your-mattermost.domain in a browser, can they connect to the server just fine, or does that also not work?

I see

This incident has only occurred twice so far since I upgraded to v7.1.3, and when it happens, the Mattermost service is unavailable for a few minutes and then comes back up. I can't tell for sure whether it affected the web app, since most if not 99% of our users access Mattermost via the desktop app; users on both Windows and macOS noticed the downtime.

Let me know how I should proceed. Thanks!

Ah, OK - I assumed that this was a permanent issue we were working on.
If this has only happened twice so far, do you happen to have additional logs from when it happened, or just the ones you already posted?

I spent the last 3 hours looking through the logs and noticed that I had somehow managed to miss what could presumably be the root cause of this issue. I found the same chain of events repeated during the two times we experienced this issue; in fact, I found a third instance that went unnoticed because it happened outside of normal hours. Here is a summary of what I found in the logs:

These appear first, and the message repeats for every user (only showing three messages). We've agreed to ignore them, so I am including them just to give as much info as possible:

{"timestamp":"2022-10-25 14:50:00.043 -04:00","level":"error","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:830","user_id":" ","error":"read tcp 10.29.19.87:443->66.220.203.181:55854: i/o timeout"}
{"timestamp":"2022-10-25 14:50:00.044 -04:00","level":"error","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:830","user_id":" ","error":"read tcp 10.29.19.87:443->192.168.1.14:65094: i/o timeout"}
{"timestamp":"2022-10-25 14:50:00.044 -04:00","level":"error","msg":"websocket.NextReader: closing websocket","caller":"app/web_conn.go:830","user_id":" ","error":"read tcp 10.29.19.87:443->10.29.19.146:55068: i/o timeout"}

Here is the interesting part: these start to show up (multiple times), which matches the time when the service goes down:

{"timestamp":"2022-10-25 14:50:00.152 -04:00","level":"error","msg":"http2: received GOAWAY [FrameHeader GOAWAY len=20], starting graceful shutdown","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.153 -04:00","level":"error","msg":"http2: received GOAWAY [FrameHeader GOAWAY len=20], starting graceful shutdown","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.155 -04:00","level":"error","msg":"http2: received GOAWAY [FrameHeader GOAWAY len=20], starting graceful shutdown","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.161 -04:00","level":"error","msg":"http2: received GOAWAY [FrameHeader GOAWAY len=20], starting graceful shutdown","caller":"log/log.go:195","source":"httpserver"}

Followed by these (also multiple times, from different client IPs):

{"timestamp":"2022-10-25 14:50:00.442 -04:00","level":"error","msg":"http: TLS handshake error from clientIP:49336: EOF","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.443 -04:00","level":"error","msg":"http: TLS handshake error from clientIP:58159: write tcp Mattermost_IP:443->IP:58159: write: broken pipe","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.443 -04:00","level":"error","msg":"http: TLS handshake error from clientIP:58160: write tcp Mattermost_IP:443->clientIP:58160: write: broken pipe","caller":"log/log.go:195","source":"httpserver"}
{"timestamp":"2022-10-25 14:50:00.444 -04:00","level":"error","msg":"http: TLS handshake error from clientIP:56652: write tcp Mattermost_IP->clientIP:56652: write: broken pipe","caller":"log/log.go:195","source":"httpserver"}

After a few minutes these messages stop appearing in the logs and the service is available again, but no other log messages show that the Mattermost server actually stops or re-initializes… in other words, there are no signs in the logs of the service stopping or initializing around the time the issue "stops".

What could possibly trigger these HTTP/2 GOAWAY frames? Are they coming from the outside or from Mattermost itself?

I have many questions, but the ones above are probably the most relevant. I tried to make sense of RFC 7540 in hopes of better understanding HTTP/2, but I must confess that the language used there is a bit too advanced for my understanding.
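
From what I did manage to gather, a GOAWAY frame is how an HTTP/2 endpoint tells its peer to stop opening new streams on a connection, usually ahead of a graceful shutdown (RFC 7540, section 6.8). One thing I plan to try is watching the frames from the client side with nghttp from the nghttp2 tools (placeholder hostname):

# -n discards the response body, -v prints every HTTP/2 frame sent and
# received, including any GOAWAY coming from the server.
nghttp -nv https://your-mattermost.domain/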

It's been a week without any feedback, and I am still wondering what could have caused this issue.

Any kind of help is appreciated, thanks

Hi Alejandro,

sorry for my late reply here, I was busy over the last few weeks. In the meantime, 7.5.1 has been released, which contains a few fixes around websocket handling and around the logging of unnecessary events. Could you try to upgrade your server to this version so we can see whether your specific problem might already be fixed by it?
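
In case it helps, the tarball upgrade flow looks roughly like this. The download URL follows the standard release naming and /opt/mattermost is an assumed install path, so adjust both to your environment, follow the official upgrade guide, and take a backup of config, data, and the database first:

# Download and unpack the new release to a temporary location.
wget https://releases.mattermost.com/7.5.1/mattermost-7.5.1-linux-amd64.tar.gz
tar -xzf mattermost-7.5.1-linux-amd64.tar.gz -C /tmp

# Stop the service and keep a full copy of the current installation.
sudo systemctl stop mattermost
sudo cp -a /opt/mattermost /opt/mattermost-backup-$(date +%F)

# Copy the new files over the old ones, leaving your config, data, logs,
# and plugins directories untouched.
sudo rsync -a --exclude=config --exclude=data --exclude=logs --exclude=plugins \
    /tmp/mattermost/ /opt/mattermost/

sudo systemctl start mattermost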