File indexing error while upgrading to 5.35.0

Hello!
I am trying to upgrade Mattermost from 5.34.2 to 5.35.0, but I’m getting this error:

{"level":"error","ts":1621062182.626727,"caller":"commands/extract_content.go:84","msg":"Failed to extract file content","error":"failed to save the extracted file content: failed to update FileInfo content with id=3a5on3zgujnnmkkd1h5sk1ktzr: Error 1366: Incorrect string value: '\\xCC\\xE5\\xE4\\xE0\\xEB\\xFC...' for column 'Content' at row 1","errorVerbose":"Error 1366: Incorrect string value: '\\xCC\\xE5\\xE4\\xE0\\xEB\\xFC...' for column 'Content' at row 1\nfailed to update FileInfo content with id=3a5on3zgujnnmkkd1h5sk1ktzr\ngithub.com/mattermost/mattermost-server/v5/store/sqlstore.SqlFileInfoStore.SetContent\n\tgithub.com/mattermost/mattermost-server/v5/store/sqlstore/file_info_store.go:368\ngithub.com/mattermost/mattermost-server/v5/store/retrylayer.(*RetryLayerFileInfoStore).SetContent\n\tgithub.com/mattermost/mattermost-server/v5/store/retrylayer/retrylayer.go:3523\ngithub.com/mattermost/mattermost-server/v5/store/searchlayer.SearchFileInfoStore.SetContent\n\tgithub.com/mattermost/mattermost-server/v5/store/searchlayer/file_info_layer.go:106\ngithub.com/mattermost/mattermost-server/v5/store/timerlayer.(*TimerLayerFileInfoStore).SetContent\n\tgithub.com/mattermost/mattermost-server/v5/store/timerlayer/timerlayer.go:3015\ngithub.com/mattermost/mattermost-server/v5/app.(*App).ExtractContentFromFileInfo\n\tgithub.com/mattermost/mattermost-server/v5/app/file.go:1404\ngithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.extractContentCmdF\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/extract_content.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.Run\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/root.go:14\nmain.main\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/main.go:31\nruntime.main\n\truntime/proc.go:204\nruntime.goexit\n\truntime/asm_amd64.s:1374\nfailed to save the extracted file content\ngithub.com/mattermost/mattermost-server/v5/app.(*App).ExtractContentFromFileInfo\n\tgithub.com/mattermost/mattermost-server/v5/app/file.go:1405\ngithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.extractContentCmdF\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/extract_content.go:82\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:852\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\ngithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.Run\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/root.go:14\nmain.main\n\tgithub.com/mattermost/mattermost-server/v5/cmd/mattermost/main.go:31\nruntime.main\n\truntime/proc.go:204\nruntime.goexit\n\truntime/asm_amd64.s:1374","fileInfoId":"3a5on3zgujnnmkkd1h5sk1ktzr"}

After this error, Mattermost stops and does not index any more files. There are many files, so indexing is expected to take a long time.

@jespino / @eric FYI ^

Is this also related to the documentation updates that are needed in [MM-35766] Document File Content Search Details?

I have installed the dependencies like this:

apt-get install poppler-utils wv unrtf tidy
go get github.com/JalfResi/justext

And I’m still getting the same error.

Hi, @olodar

Can you please run the following SQL command and share the output of it?

SELECT * FROM FileInfo WHERE Id = "3a5on3zgujnnmkkd1h5sk1ktzr";

The error complains about an incorrect string value for the Content column of that particular file, so I’d like to know what is currently stored there.

Does the error occur only for that file, or does it happen for other files as well?

@olodar Our devs will look into it, but for now they suggested that it would be good to know more about the file that is being indexed. They think the file may not be indexable but is wrongly treated as plain text during extraction (a false positive), so the database text field does not support the content that we tried to store. Another possibility is that the encoding of the field in the database may need to be something like UTF-8 to work properly.


Hello, @ahmaddanial, thanks for your answer!

Here is the output of the SELECT query:

Database changed
mysql> SELECT * FROM FileInfo WHERE Id = "3a5on3zgujnnmkkd1h5sk1ktzr";
+----------------------------+----------------------------+--------+---------------+---------------+----------+----------------------------------------------------------------------------------------------------------------------------------------+---------------+-------------+------------------+-----------+------+---------------------------+-------+--------+-----------------+-------------+---------+----------+
| Id                         | CreatorId                  | PostId | CreateAt      | UpdateAt      | DeleteAt | Path                                                                                                                                   | ThumbnailPath | PreviewPath | Name             | Extension | Size | MimeType                  | Width | Height | HasPreviewImage | MiniPreview | Content | RemoteId |
+----------------------------+----------------------------+--------+---------------+---------------+----------+----------------------------------------------------------------------------------------------------------------------------------------+---------------+-------------+------------------+-----------+------+---------------------------+-------+--------+-----------------+-------------+---------+----------+
| 3a5on3zgujnnmkkd1h5sk1ktzr | n685wqobxt87bn3zjjpp4ch6te |        | 1591632572427 | 1591632572427 |        0 | 20200608/teams/noteam/channels/87444oay7irfjkpshctyhohnhc/users/n685wqobxt87bn3zjjpp4ch6te/3a5on3zgujnnmkkd1h5sk1ktzr/Медаль.txt       |               |             | Медаль.txt       | txt       | 2219 | text/plain; charset=utf-8 |     0 |      0 |               0 | NULL        | NULL    | NULL     |
+----------------------------+----------------------------+--------+---------------+---------------+----------+----------------------------------------------------------------------------------------------------------------------------------------+---------------+-------------+------------------+-----------+------+---------------------------+-------+--------+-----------------+-------------+---------+----------+
1 row in set (0.00 sec)

Maybe this is because of the Russian characters in the file name or in its content? But the file is in UTF-8 encoding, so there shouldn’t be any problem.
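
One way to check the encoding is to look at the bytes quoted in the error message itself. The following is a minimal Python sketch, assuming that the escaped bytes in the log ('\xCC\xE5\xE4\xE0\xEB\xFC...') are the first bytes of the extracted file content. The helper name is_valid_utf8 is made up for this illustration. The result suggests that the content may be Windows-1251 text rather than UTF-8, even though the MimeType stored in FileInfo says charset=utf-8.

```python
# First bytes of the rejected value, copied from the MySQL error 1366 message.
raw = b"\xCC\xE5\xE4\xE0\xEB\xFC"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The bytes are not valid UTF-8, which would explain why MySQL rejects them
# for a utf8mb4 column.
print(is_valid_utf8(raw))  # False

# Interpreted as Windows-1251 (a common legacy Cyrillic encoding), the same
# bytes spell the word used in the file name.
print(raw.decode("cp1251"))  # Медаль

# For comparison, this is what the same word looks like in real UTF-8.
print("Медаль".encode("utf-8"))
```

If this assumption holds, converting the file to UTF-8 (or transcoding the content before it is stored) would be the thing to try, rather than changing the database charset.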

@amy.blais The database is in UTF-8.

Maybe that is the problem: MySQL’s “utf8” is not full UTF-8, so you may need to use utf8mb4 instead of utf8. Can you verify that?
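
For context, MySQL's utf8 (also called utf8mb3) stores at most three bytes per character, while utf8mb4 allows up to four. The rough Python sketch below shows which strings would need utf8mb4. The helper name fits_utf8mb3 is hypothetical and only illustrates the byte-length rule, not how MySQL itself validates values.

```python
def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8.

    Hypothetical helper: this mirrors the 3-byte limit of MySQL's legacy
    utf8/utf8mb3 character set, nothing more.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Cyrillic letters take two bytes each in UTF-8, so they fit in both charsets.
print(fits_utf8mb3("Медаль"))  # True

# Characters outside the Basic Multilingual Plane, such as most emoji,
# take four bytes and therefore need utf8mb4.
print(fits_utf8mb3("😀"))  # False
```

Since Cyrillic text fits in either character set, the utf8 versus utf8mb4 difference on its own would not be expected to break a Russian text file, but it is still worth ruling out.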


The other thing is that it shouldn’t stop there; it should keep going and keep extracting the other files.
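
The expected behaviour, logging the failure for one file and continuing with the rest, could look roughly like the sketch below. It is a Python illustration only, not Mattermost's actual Go code; extract_all, extract_one and fake_extract are hypothetical names.

```python
import logging


def extract_all(file_ids, extract_one):
    """Call extract_one for every id; log failures and keep going.

    Returns the list of ids that failed so they can be inspected later.
    """
    failed = []
    for file_id in file_ids:
        try:
            extract_one(file_id)
        except Exception as err:
            # One bad file should not abort the whole run.
            logging.error(
                "Failed to extract file content: %s (fileInfoId=%s)", err, file_id
            )
            failed.append(file_id)
    return failed


# Usage with a stand-in extractor that fails for one id.
processed = []


def fake_extract(file_id):
    if file_id == "bad":
        raise ValueError("Incorrect string value")
    processed.append(file_id)


failed_ids = extract_all(["a", "bad", "b"], fake_extract)
print(failed_ids)  # ['bad']
print(processed)   # ['a', 'b']
```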

Hi, @olodar

We can run the following SQL query on your database to verify the current charset and collation of your Mattermost database:

SELECT @@character_set_database, @@collation_database;

If needed, we can then make the change suggested by @jespino to see if that helps. Make sure you back up the database before performing any updates.

Sorry for the late answer.
Already using utf8mb4

it should keep going and keep extracting other files

Should it scan images, archives, etc.?

Command:

SELECT @@character_set_database, @@collation_database;

Output:
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_0900_ai_ci   |
+--------------------------+----------------------+
1 row in set (0.00 sec)