What should hashtags support?

We’re looking at revising the definition of hashtags in Mattermost, and trying to find a definition that is simple and covers the vast majority of needs.

The current thinking is that anything starting with # and made up of “letters and numbers”, with the characters _, ., and - allowed in between “letters and numbers”, up to 25 characters, would resolve as a hashtag that could be applied to messages.

So test cases could be:

  • “I’m using #hásh-tag1!” autolinks #hásh-tag1
  • “I’m #hash_tag2. You?” autolinks #hash_tag2
  • “What’s a #hash.tag?” autolinks #hash.tag
  • “#1, I think you’re right” autolinks #1
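To make the proposal concrete, here is a rough sketch of it as a regular expression in Python. It is only one possible reading, not Mattermost's actual implementation. The names `HASHTAG_RE`, `MAX_LEN`, and `find_hashtags` are made up for illustration, and two points are assumptions on my part: that the 25-character limit excludes the leading #, and that longer candidates are rejected rather than truncated.

```python
import re

# Assumption: "letters and numbers" means any Unicode letter or digit.
# For str patterns in Python 3, \w covers Unicode alphanumerics plus "_",
# so [^\W_] reads as "alphanumeric, but not the underscore".
ALNUM = r"[^\W_]"

# "#", then a run of letters/numbers, then optional groups made of one
# separator (_ . -) followed by at least one more letter/number.
HASHTAG_RE = re.compile(rf"#{ALNUM}+(?:[_.\-]{ALNUM}+)*")

MAX_LEN = 25  # assumption: counted without the leading "#"


def find_hashtags(text):
    """Return the hashtags in text that fit the proposed definition."""
    return [m for m in HASHTAG_RE.findall(text) if len(m) - 1 <= MAX_LEN]


# The four test cases from the list above
# (\u2019 is the curly apostrophe, \u00e1 is a precomposed "á").
print(find_hashtags("I\u2019m using #h\u00e1sh-tag1!"))
print(find_hashtags("I\u2019m #hash_tag2. You?"))
print(find_hashtags("What\u2019s a #hash.tag?"))
print(find_hashtags("#1, I think you\u2019re right"))
```

Because a separator only counts when it is followed by another letter or number, trailing punctuation such as the "." in “#hash_tag2.” is left out of the tag.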

Questions:

  1. Any feedback on our initial approach to defining hashtags?

  2. Particularly from the i18n community: what are the options for defining “letters and numbers”? Currently hashtags have hardcoded support for umlauts, but not for acute accents, etc. We want to replace this with something widely accepted relative to global standards.

  3. Any test cases that you feel would cause non-obvious behavior?
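On questions 2 and 3, one option (an assumption here, not something decided in this thread) is to define “letters and numbers” through Unicode character properties. A test case that may then behave non-obviously is an accented letter typed in decomposed form, i.e. a base letter followed by a combining mark. The sketch below uses Python's standard `unicodedata` module; `is_alnum_run` and `normalize_tag` are hypothetical helper names.

```python
import re
import unicodedata

# Same "letters and numbers" class as a Unicode-aware regex would use:
# alphanumeric characters, without the underscore.
ALNUM_RUN = re.compile(r"[^\W_]+")


def is_alnum_run(s):
    """True if s consists only of Unicode letters and digits."""
    return ALNUM_RUN.fullmatch(s) is not None


def normalize_tag(s):
    """Compose base letter + combining mark into one code point (NFC)."""
    return unicodedata.normalize("NFC", s)


composed = "h\u00e1sh"      # "á" as a single code point
decomposed = "ha\u0301sh"   # "a" followed by COMBINING ACUTE ACCENT

print(unicodedata.category("\u00e1"))           # Ll (lowercase letter)
print(unicodedata.category("\u0301"))           # Mn (nonspacing mark)
print(is_alnum_run(composed))                   # True
print(is_alnum_run(decomposed))                 # False
print(is_alnum_run(normalize_tag(decomposed)))  # True
```

Both strings look identical on screen, so normalizing to NFC before matching (or allowing combining marks in the character class) would keep the two spellings from producing different hashtags.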

Just my thought: why not accept everything as a valid token, except the characters that are known (or expected) to cause problems?
I assume that characters like , or \ and ; can cause unhappy side effects. Having a limit on the hashtag length sounds useful.
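For comparison, a minimal sketch of this deny-list idea in Python. The excluded set (whitespace plus , ; and \) and the names `TOKEN_RE` and `find_tokens` are only illustrative assumptions, and the 25-character limit is borrowed from the original proposal.

```python
import re

# Accept anything after "#" except whitespace and the characters assumed
# to be problematic here: comma, semicolon, and backslash.
TOKEN_RE = re.compile(r"#[^\s,;\\]+")


def find_tokens(text, max_len=25):
    """Return deny-list hashtags; max_len excludes the leading "#"."""
    return [m for m in TOKEN_RE.findall(text) if len(m) - 1 <= max_len]


# \u2019 is the curly apostrophe, \u00e1 is a precomposed "á".
print(find_tokens("I\u2019m using #h\u00e1sh-tag1!"))    # ['#hásh-tag1!']
print(find_tokens("I\u2019m #hash_tag2. You?"))          # ['#hash_tag2.']
print(find_tokens("#1, I think you\u2019re right"))      # ['#1']
```

The trade-off shows up in the first two cases: trailing punctuation such as ! and . becomes part of the tag unless it is added to the excluded set as well.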