URLs in non-UTF-8 formats

This was brought up in a GitHub issue:


Starting a thread here to see if anyone else has hit this issue, and if anyone has ideas on solutions.

There are libraries that can decode EUC-JP, but we don’t have a way to detect when to use that type of decoding.
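
To make the detection problem concrete, here is a rough sketch of one possible check: turn the percent-escapes into raw bytes and test whether those bytes are valid UTF-8. The helper names (percentDecodeToBytes, isValidUtf8) and the sample strings are made up for illustration and are not part of Mattermost. It only tells us that a URL is not UTF-8, not which charset it actually uses.

```javascript
// Turn a percent-encoded string into raw bytes without assuming any charset.
function percentDecodeToBytes(input) {
  const bytes = [];
  for (let i = 0; i < input.length; i++) {
    const hex = input.slice(i + 1, i + 3);
    if (input[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      // Unescaped URL characters are expected to be ASCII; this sketch
      // simply truncates anything else to a single byte.
      bytes.push(input.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// True when the bytes form valid UTF-8. With `fatal: true`, TextDecoder
// throws on malformed input instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Sample inputs (assumptions for illustration): the same two characters,
// percent-encoded from EUC-JP bytes and from UTF-8 bytes.
const eucJpPath = '%C6%FC%CB%DC';
const utf8Path = '%E6%97%A5%E6%9C%AC';

console.log(isValidUtf8(percentDecodeToBytes(eucJpPath))); // false
console.log(isValidUtf8(percentDecodeToBytes(utf8Path)));  // true

// decodeURIComponent() assumes UTF-8, so the EUC-JP form throws a URIError.
try {
  decodeURIComponent(eucJpPath);
} catch (err) {
  console.log(err instanceof URIError); // true
}
```

When the check returns false, a fallback decoder could be tried, but picking the right charset would still need either a guess or a detection library.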

Hi, @lfbrock.

I think this library would be a good fit:
jschardet https://www.npmjs.com/package/jschardet

It supports many charsets, not only Japanese ones.

But I don’t know whether there is a need to support non-UTF-8 format…
I hope you find that information helpful.

Thanks @terukizm!

I’m not sure if non-UTF-8 format is needed either, but wanted to bring it up in case other users are hitting the issue.

We’ll take a look at that library if we decide to support it in future.

Hi @lfbrock, thank you for considering my request.

I understand that you are using decodeURIComponent() to detect URLs that carry a risk of XSS.

Supporting the decoding of a variety of character sets is not essential for that purpose.

The fact that decodeURIComponent() succeeds doesn’t guarantee that there is no risk of JavaScript execution.

For example, decodeURIComponent() doesn’t throw an exception for harmful URLs such as the following:

It is not appropriate to use decodeURIComponent() to avoid XSS.

In fact, XSS is not possible even if this decodeURIComponent() call is removed from the current Mattermost implementation. marked.js never passes a harmful string to MattermostMarkdownRenderer#link, because it does not recognize strings containing things such as javascript:// or <script> as link items. That is why I submitted the PR.

If MattermostMarkdownRenderer should invalidate harmful URLs on its own, as an independent module, that can be done with code such as:

// Reject the link if it contains any character outside the allowed set.
if (/[^-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]/.test(href)) {
	return '';
}

To repeat: MattermostMarkdownRenderer#link does not need to handle various character sets. All that is needed there is sanitization that detects harmful URLs.
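
For anyone who wants to try the character check proposed above, here is a small self-contained sketch of it. The function name sanitizeHref and the sample URLs are assumptions for illustration; this is not the actual MattermostMarkdownRenderer#link code.

```javascript
// Any character outside this allowed set causes the URL to be rejected.
const DISALLOWED_CHARS = /[^-A-Za-z0-9+&@#\/%?=~_|!:,.;()]/;

// Hypothetical helper: returns '' for rejected URLs, otherwise the URL as-is.
function sanitizeHref(href) {
  if (DISALLOWED_CHARS.test(href)) {
    return '';
  }
  return href;
}

// Percent-encoded EUC-JP bytes pass, because '%' and hex digits are allowed.
console.log(sanitizeHref('http://example.com/%C6%FC%CB%DC'));

// Rejected (returns ''), because '<' and '>' are not in the allowed set.
console.log(sanitizeHref('http://example.com/<script>'));

// Passes unchanged: the check looks at characters only, not at the scheme.
console.log(sanitizeHref('javascript:alert(1)'));
```

Note the last case: a character check alone does not look at the URL scheme, so it would have to be combined with whatever scheme handling happens elsewhere.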

Hi takasibagura,

decodeURIComponent is only part of the process that we’re using to prevent XSS attacks, so we do not expect it to throw an exception in every error case. The examples that you’ve included aren’t a problem because they can’t be used to run arbitrary JavaScript in a user’s browser.

We’ve discussed this matter internally and while we would like to offer support for non-UTF-8 character sets in the future, we don’t feel that we can support them at this time without potentially reopening the vulnerability that this code fixed.