Title: Unable to Search Thai Language by Individual Words

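English search works normally, because English words are separated by spaces.

Thai text does not work the same way. For example:
Message in channel: “นานมาแล้ว มีกระต่าย” (“Long ago, there was a rabbit”)
Searching for “กระต่าย” (“rabbit”) yields no results, but searching for “มีกระต่าย” (“there was a rabbit”) does.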

This issue may be caused by Thai not using spaces between words the way English does. That makes it impossible to search for individual words; only the full space-delimited phrase matches. However, direct searches in Adminer or phpMyAdmin can find individual Thai words, which suggests that Mattermost transforms the search text, perhaps as a performance optimization.
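The tokenization explanation can be modeled roughly in Python. This is a simplification, not Mattermost's actual search code: it assumes the search index splits a message on whitespace only, and compares that with the substring matching a LIKE '%...%' query performs in Adminer or phpMyAdmin. All function names are hypothetical.

```python
def whitespace_tokens(message: str) -> set[str]:
    """Simplified tokenizer (assumption): split only on whitespace."""
    return set(message.split())


def token_search(message: str, term: str) -> bool:
    """Matches only when the term equals a whole whitespace-delimited token."""
    return term in whitespace_tokens(message)


def substring_search(message: str, term: str) -> bool:
    """Model of SQL `message LIKE '%term%'`: any substring matches."""
    return term in message


message = "นานมาแล้ว มีกระต่าย"  # "Long ago, there was a rabbit"
for term in ("กระต่าย", "มีกระต่าย"):
    print(term, token_search(message, term), substring_search(message, term))
# กระต่าย False True
# มีกระต่าย True True
```

Because “นานมาแล้ว มีกระต่าย” contains only one space, this model produces just two tokens, so “กระต่าย” never matches a token on its own even though it is a substring of the message.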

I propose adding an option to disable this behavior for languages that do not use spaces between words, so that we can search by individual words. We’re considering Mattermost as our team’s primary chat platform, but this issue is a significant blocker.

If anyone has suggestions or can offer assistance, I would greatly appreciate it.

Below is my docker-compose.yml file. The lines commented out with # are my various unsuccessful attempts.

services:
  mattermost-db:
    container_name: mattermost-db
    image: postgres:11-alpine
    restart: always
    security_opt:
      - no-new-privileges:true
    pids_limit: 100
    read_only: true
    tmpfs:
      - /tmp
      - /var/run/postgresql
    volumes:
      - ${PATH_SSD}/postgresql_data:/var/lib/postgresql/data
    environment:
      - TZ=Asia/Bangkok
      - POSTGRES_DB
      - POSTGRES_USER
      - POSTGRES_PASSWORD_FILE
      #- POSTGRES_INITDB_ARGS=--lc-collate=th_TH.utf8 --lc-ctype=th_TH.utf8
    secrets:
      - postgres_password

#  mattermost-db:
#    container_name: mattermost-db
#    image: mysql:8.0.12
#    restart: always
#    environment:
#      - TZ=Asia/Bangkok
#      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
#      - MYSQL_DATABASE=${POSTGRES_DB}
#      - MYSQL_USER=${POSTGRES_USER}
#      - MYSQL_PASSWORD=${MYSQL_ROOT_PASSWORD}
#    ports:
#      - "3306:3306"
#    volumes:
#      - ${PATH_SSD}/mysql_data:/var/lib/mysql
#    command: ['--character-set-server=utf8mb4', '--collation-server=utf8mb4_0900_ai_ci']

  mattermost-APP:
    container_name: mattermost-APP
    image: mattermost/${MATTERMOST_IMAGE}:${MATTERMOST_IMAGE_TAG}
    restart: always
    depends_on:
      - mattermost-db
    security_opt:
      - no-new-privileges:true
    pids_limit: 200
    read_only: ${MATTERMOST_CONTAINER_READONLY}
    tmpfs:
      - /tmp
    environment:
      - TZ
      - DOMAIN
      - MM_SERVICESETTINGS_ENABLEUPLOADS
      - MM_SERVICESETTINGS_SITEURL
      - MM_SQLSETTINGS_DRIVERNAME
      - MM_SQLSETTINGS_DATASOURCE
      - MM_BLEVESETTINGS_INDEXDIR

      - MM_EMAILSETTINGS_SMTPSERVER
      - MM_EMAILSETTINGS_SMTPPORT
      - MM_EMAILSETTINGS_CONNECTIONSECURITY
      - MM_EMAILSETTINGS_ENABLESMTPAUTH
      - MM_EMAILSETTINGS_SMTPUSERNAME
      - MM_EMAILSETTINGS_SMTPPASSWORD

You have to set default_text_search_config to thai. However, Postgres doesn’t seem to ship with a Thai dictionary by default. Searching the internet turns up the GitHub project zdk/pg-search-thai: “(CURRENTLY UNMAINTAINED) Experimental - PostgreSQL Full Text Search Thai language extension - provides the capability to identify Thai language documents.” It looks unmaintained, but maybe you can give it a shot and see if it works.
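A Thai tokenizer helps because it splits text into words using a dictionary rather than spaces. The Python sketch below is a toy greedy longest-matching segmenter with a tiny, hypothetical word list. It only illustrates the idea; it is not taken from pg-search-thai, Elasticsearch, or any other library, and `segment` and `THAI_WORDS` are made-up names.

```python
# Hypothetical toy word list, for illustration only; a real Thai tokenizer
# relies on a far larger dictionary.
THAI_WORDS = frozenset({"นาน", "มา", "แล้ว", "มี", "กระต่าย", "ปากกา", "สี", "แดง"})


def segment(text: str, words: frozenset[str] = THAI_WORDS) -> list[str]:
    """Greedy longest-matching word segmentation.

    At each position, take the longest dictionary word that starts there.
    If no word matches, emit a single character and move on.
    Whitespace is treated as a separator and dropped.
    """
    max_len = max(map(len, words), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in words:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = segment("นานมาแล้ว มีกระต่าย")
print(tokens)              # ['นาน', 'มา', 'แล้ว', 'มี', 'กระต่าย']
print("กระต่าย" in tokens)  # True
```

Once “กระต่าย” is its own token, a token-based search can find it. Greedy longest matching can pick a wrong split when a long word hides a better segmentation, which is one reason production tokenizers use more sophisticated methods.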

A better solution might be to use Elasticsearch instead, if you are on the Enterprise edition. Elasticsearch has a Thai tokenizer that you can use to power your searches.

I am not sure how you are running those direct searches. Are you using to_tsvector and to_tsquery in the SQL query, or a plain pattern match using LIKE '%word%'? Could you share the exact SQL query you are using?

Thank you very much for your response.

I’m attaching an image of a test I conducted using Adminer to search for the word ‘ปากกา’ (pen) within the full message ‘ปากกาสีแดง’ (red pen) as an example. This test utilized a straightforward SQL command:

SELECT * FROM posts WHERE message LIKE '%ปากกา%'

This matches Thai terms as substrings, as expected. However, when searching in Mattermost, with or without Bleve, the term “ปากกา” cannot be found; only searching for the full term “ปากกาสีแดง” yields results.

This led me to wonder if it’s possible for the development team to add a feature that bypasses certain advanced search functionalities that require plugins, and instead directly performs a simple, direct database search. I understand this might result in the loss of certain search functionalities, but some users might prefer it if there are language limitations, as it’s better than not being able to search at all.

Alternatively, there might be other solutions I’m unaware of.

(I’ve tried using POSTGRES_INITDB_ARGS=--lc-collate=th_TH.utf8 --lc-ctype=th_TH.utf8, but the search still does not yield results.)

Right, so using LIKE '%word%' is not a full-text search query; it just matches a pattern. A full-text search query tokenizes the text and searches for the resulting words instead.

“This led me to wonder if it’s possible for the development team to add a feature that bypasses certain advanced search functionalities that require plugins, and instead directly performs a simple, direct database search.”

This wouldn’t be the right solution, because search behavior would then vary from language to language, making it inconsistent both functionally and in terms of performance. Search relies on full-text indexes, which is what makes it fast. Without them, the DB would carry a considerably higher load for no good reason, since a LIKE '%word%' pattern cannot use a full-text index and typically has to scan every message.
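The performance argument can be illustrated with a toy inverted index in Python. This is a conceptual sketch under simplifying assumptions (whitespace tokens, everything in memory), not how Postgres full-text indexes or Mattermost actually work; all names are hypothetical. The indexed lookup is a single dictionary access, while the LIKE-style search examines every message.

```python
from collections import defaultdict


def build_index(posts: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index: token -> IDs of posts containing that token.

    Tokens come from a plain whitespace split, purely for illustration.
    """
    index = defaultdict(set)
    for post_id, message in posts.items():
        for token in message.split():
            index[token].add(post_id)
    return dict(index)


def indexed_search(index: dict[str, set[int]], term: str) -> set[int]:
    """A single dictionary lookup, regardless of how many posts exist."""
    return set(index.get(term, ()))


def scan_search(posts: dict[int, str], term: str) -> tuple[set[int], int]:
    """Model of LIKE '%term%': every message is examined.

    Returns the matching IDs and how many messages were examined.
    """
    hits = set()
    examined = 0
    for post_id, message in posts.items():
        examined += 1
        if term in message:
            hits.add(post_id)
    return hits, examined


posts = {i: f"status update {i}" for i in range(10_000)}
posts[10_000] = "red pen"

index = build_index(posts)
print(indexed_search(index, "pen"))  # {10000}
print(scan_search(posts, "pen"))     # ({10000}, 10001)
```

In this toy model, the scan examines all 10,001 messages to find one match, and that cost grows with the number of messages, while the indexed lookup does not.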

Unfortunately, if you are using the DB for search, installing a Thai tokenizer is your only option. Our documentation covers a similar setup for other languages that don’t use spaces between words: “Chinese, Japanese and Korean search” in the Mattermost documentation. Alternatively, you can use our Enterprise edition with Elasticsearch, which has a built-in tokenizer for Thai.

Thank you for the advice on setting up Chinese, Japanese, and Korean search configurations. I’ve tried implementing your suggestions, but unfortunately, they do not seem to work with the Thai language.

Your initial recommendation regarding pg-search-thai has been quite helpful: it lets us search effectively in Thai. However, it has inadvertently affected our ability to search in English. Since I’m not a PostgreSQL expert, resolving this may take some time while we explore possible solutions. I really appreciate you taking the time to respond to my query.

Most of our communication currently happens on Line, and I’m trying to convince my staff to adopt a separate chat app specifically for work-related discussions and task tracking. Mattermost is my top choice for this purpose. Still, it is somewhat challenging to convince my team, which consists mainly of part-time students who each talk about work only 1-2 days per week, to use two different chat apps.

Getting pg-search-thai to work alongside English search might be the way forward; alternatively, we could trial the Enterprise E0 edition to see whether Mattermost suits us better.

I am truly grateful for your time and assistance.

I don’t think it’s possible to have tokenizers for two languages working simultaneously: default_text_search_config can only be set to one language at a time. You may have to decide which language is your primary one and set up the system accordingly.

Yes, if you are willing to set up and configure Elasticsearch, it could potentially give better results, given that Elasticsearch is built specifically for search.

Happy to help!
