Migration from MySQL 8 to Postgres Database

Hi community,
We are trying to migrate our MySQL 8 database to PostgreSQL 15,
Following this doc: Migration guidelines from MySQL to PostgreSQL — Mattermost documentation.

But the migration no data gets migrated and pgloader gives this error.

More details and Steps to Reproduce

We’ve been using mattermost old version - 5.37 and in the processes of upgrading to a newer version.
We are using MySQL 8 version database, as there is mentioned that PostgreSQL is the database of choice for Mattermost,
We are trying to migrate our MySQL 8 database to PostgreSQL 15,
Following this doc: Migration guidelines from MySQL to PostgreSQL — Mattermost documentation

What we’ve done so far to test and try out the process,

  • Created two DB docker containers one for MySQL 8 and for Psql15.
  • Loaded a copy of existing target data to mysql8 container based on mattermost version 6.7 (as already migration done to version 6.7).
  • Created empty database and user on Psql15 following the documentation.
  • Installed go and pgloader on the server, installed required go packages. commands follows,
sudo apt-get install pgloader
mkdir ~/tmp/go
cd ~/tmp/go
wget https://go.dev/dl/go1.21.3.linux-amd64.tar.gz

sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.21.3.linux-amd64.tar.gz

export PATH=$PATH:/usr/local/go/bin

# check
go version
# install packages
go install github.com/mattermost/morph/cmd/morph@v1
go install github.com/mattermost/dbcmp/cmd/dbcmp@latest

# for morph command
export PATH=$PATH:/home/$USER/go/bin

# Clone mattermost
git clone -b release-6.7 https://github.com/mattermost/mattermost.git --depth=1

# morph command
morph apply up --driver postgres --dsn "postgres://<user>:<pswd>@localhost:<docker exposed port>/mattermost?sslmode=disable" --path ./db/migrations/postgres --number -1

[note: the morph command give me some user permission errors first the migrations did not ran, when I used root postgres user instead of mmuser in the DSN it worked]

...
2023/10/26 01:20:41 ==  upgrade_oauth_mattermost_app_id: migrated (0.0060s)  # ===========================================
81 migrations applied.
  • Created the migration.load file as per the doc (I don’t have any expertise in postgres).

[note: the command is pgloader for me not pgLoader correct if it is a typo in the document.
this Line gives me error EXCLUDING TABLE NAMES MATCHING '~<IR_>', '~<focalboard>', 'schema_migrations' because the missing quotes in doc, I’ve added that
Also, ALTER SCHEMA public RENAME TO mattermost; this line gives me permission errors, that I fixed by changing psql user from mmuser to postgres
]

  • Finally the pgloader migration.load > migration.log executed but it finished quickly and the migration.log shows this,

The data not getting transferred, How can I correct this

The mentioned error was because of apt version of pgloader not supporting the MySQL 8 default charset/collation utf8mb4_0900_ai_ci, It was found from this stackoverflow answer. By removing the apt version and installing pgloader by building from source fixed the issue.

sudo apt remove --purge pgloader
git clone https://github.com/dimitri/pgloader.git
cd pgloader/

sudo apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
make save
./build/bin/pgloader --help


export PATH=$PATH:/home/$USER/tmp/pgloader/build/bin

pgloader migration.load > migration.log

But now the migration do start, but pgloader giving server errors Heap exhausted (no more space for allocation).

After referencing the pgloader documentation adjusted the WITH conditions given in the default migration.load file in this Documentation

WITH data only,
     workers = 4, concurrency = 1,
     multiple readers per thread, rows per range = 5000,
     create no tables, create no indexes,
     batch size = 50MB, prefetch rows = 500,
     preserve index names

And the pgloader command to.

pgloader --dynamic-space-size 1564 migration.load > migration.log

(For my 2 Core 4 GB server)

Now the errors gone, and the pgloader starts running, I have the original MySQL database sizes about 6 GB (MySQL data-directory size)…

But after running some time the pgloader CPU usage reaches very high and it stops with error…

Database error: Connection to database server lost.

It is happening after 2GB of data transferred to postgress data directory, I’ve tried tuning the values in the WITH, reducing --dynamic-space-size also even tried increasing the server capacity,
But getting the connection lost error when reaching the 1.8 GB - 2 GB mark.

Any Suggestions for fixing this issue.

1 Like

I’m curious here as I have a similar issue, and in my testing I’ve found that the “connection to server lost” error only shows up when I use morph (as the instructions insist) to put down a schema first.

If I let pgloader (or nmig) auto select a schema based on the tables/indexes it detects during the migration the error never comes. It’s not a fix, mattermost doesn’t seem to be able to use the database in this case, but it seems to confirm that the issue isn’t in either database server. More likely the issue is with the new schema and some data being transferred.

To test this can you try skipping the morph command, and adjusting your pgloader file to remove the options “create no tables, create no indexes”. I suspect like me you’ll see the database migration complete without issue.

1 Like

Crosspost from my similar Github issue you reached out on.

@Akshaychdev](Akshaychdev (Akshay Chandran) · GitHub) Let me know if you’ve had any luck. I found a way around this (no thanks to my weeks long case with MM support) for my instance and I’m happy to share to see if it will work for more folks.

I found that only the posts table was causing crashes, so I did a 2 part move and some data transforms after the fact.

  1. Create a .load file for the posts table migration and run the migration against your instance. This should work without crashing
LOAD DATABASE
FROM mysql://user:PASSWORD@localhost:3306/mattermost
INTO pgsql://user:PASSWORd@localhost:5432/mattermost
				WITH
				  workers = 2,
				  concurrency = 1,
				  preserve index names
				
				SET PostgreSQL PARAMETERS
				  maintenance_work_mem to '512MB',
				  work_mem to '48MB'
				
				CAST type int when (= precision 11) to integer drop typemod,
				     type bigint when (= precision 20) to bigint drop typemod,
				     type text to varchar drop typemod,
				     type json to jsonb drop typemod
				
				SET MySQL PARAMETERS
				  net_read_timeout  = '1500',
				  net_write_timeout = '1500'
				
				INCLUDING ONLY TABLE NAMES MATCHING 'posts'
				
				BEFORE LOAD DO
				  $ ALTER SCHEMA public RENAME TO mattermost; $,
				  $ ALTER TABLE public.posts ALTER COLUMN message   TYPE character varying(65535) COLLATE pg_catalog."default"; $,
				  $ ALTER TABLE public.posts ALTER COLUMN hashtags  TYPE character varying(1000)  COLLATE pg_catalog."default"; $,
				  $ ALTER TABLE public.posts ALTER COLUMN filenames TYPE character varying(4000)  COLLATE pg_catalog."default"; $,
				  $ ALTER TABLE public.posts ALTER COLUMN fileids   TYPE character varying(3000)  COLLATE pg_catalog."default"; $
				
				AFTER LOAD DO
				        $ ALTER SCHEMA mattermost RENAME TO public; $
;
  1. Then update the postgres tables constraint to match the what the morph command sets. I also took some time to compare the schema set by the above to the schema the morph command sets to make sure they match. You can do this and make adjustments, but it will probably work either way
    ALTER TABLE public.posts RENAME CONSTRAINT random name TO posts_pkey;
  2. Use your previous pgloader command to move all other tables besides posts (either with an include only tables or exclude tables command’
  3. Update your matttermost config driver/dsn to the new database
  4. profit?|
1 Like

Hey @gdelia-pm Can you please share the logs from postgres itself? The error from pgloader is not transparent to us to understand the issue.

Ah, I wish i could, but after literally hundreds of attempts, and a month long support case that has gone nowhere I figured I wouldn’t get any help on this. Once I found my suggestion to Akshay to consistently work I focused on reproducing it and not on finding the root cause any more. Thus the dev instances I tested this on have been destroyed and recreated a few times over along with those logs.

That said the logs didn’t give much insight on the issue, both the mysql and postgres logs in debug mode didn’t show any query errors. Instead on both I’d see all of the queries that were run, and eventually well after the pgloader crash on both sides I would see an idle connection get destroyed. I’m pretty confident at this point that pgloader is what was crashing as it tried to move migrated some data in the posts table to the new schema. I may try to reproduce the error again if I have bandwidth in the next 2 days, if I do I’ll share here.

Thanks for the update @gdelia-pm I suspect some of the casting rules may causing issues, not necessarily a trivial error but maybe an edge case on the pgloader. But so far your feedback is very valuable us to raise an eyebrow for the migration process. Thanks for helping us! Feel free to share whatever you find, meanwhile I’ll be deep diving into the Posts table migration in the upcoming days.

For what it’s worth I think the casting rules we directly set may not be the problem. Below is the behavior as I ultimately saw it.

  • Schema applied to posts table. Explicit casting rules set in pgloader config - pgloader crash
  • Schema applied to posts table. No explicit casting rules in pgloader config - pgloader crash
  • No schema applied to posts table. No explicit casting rules set in pgloader config - successful data migration
  • No schema applied to posts table. Explicit casting rules set in pgloader config - successful data migration

For the last two examples we’d have to make schema adjustments after the migration for the mattermost server to successfully read the database on start.

Hi @gdelia-pm, glad that you worked it out. I actually lost track of it, but I restarted the migration from the beginning. Currently, the source MySQL8 DB has the mattermost schema corresponding to version 8.1.
I’m following the steps you mentioned,

  1. I’ve created a new database and user with the postgres docker init, as per the prepare database documentation.

    CREATE DATABASE mattermost;
    CREATE USER mmuser WITH PASSWORD '<MY_PASS>';
    GRANT ALL PRIVILEGES ON DATABASE mattermost to mmuser;
    ALTER DATABASE mattermost OWNER TO mmuser;
    GRANT USAGE, CREATE ON SCHEMA PUBLIC TO mmuser;
    

    Schema right now,

    image

  2. Created new load file mm_posts.load and ran the pgloader command with pgloader mm_posts.load > migration.log, but getting error,

  3. Assuming It is because of the posts tables missing issue, I ve ran the morph migrations again after cloning the mattermost release-8.1.

    After that got the old database connection failed error when running the original migration.load file, @isacikgoz

    And this was error in migration.log.

  4. I’ve now altered the ALTER TABLE public.posts ** => ALTER TABLE mattermost.posts ** in mm_posts.load, rebuild the database. Then run the morph command to build up the post schema. Retried the posts table only migration (mm_posts.load),

    The process exits without error but no data getting migrated, the log out…

Can you suggest what I’m missing? I know it is some simple thing I’ve overlooked. Thanks in advance.

Ah, that was a mistake there, the alter tables should have referred to mattermost.posts not public.posts, sorry about that and good catch.

I think potentially the issue you may be seeing is if your db tables in mysql are case sensitive AND the posts table is capitalized at all. Mine aren’t but very possible yours are (I’ve seen evidence others were as well). Try checking the db for that and updating the load file to point to the correct case of the table name, likely the below.

INCLUDING ONLY TABLE NAMES MATCHING ‘Posts’

Can you both share the charset/collation of your Posts table along with the connection charset/collation that you are using? And I would appreciate the DDL of the Posts table if possible.

@isacikgoz

DDL for Posts table MySQL,

CREATE TABLE `Posts` (
  `Id` varchar(26) NOT NULL,
  `CreateAt` bigint DEFAULT NULL,
  `UpdateAt` bigint DEFAULT NULL,
  `EditAt` bigint DEFAULT NULL,
  `DeleteAt` bigint DEFAULT NULL,
  `IsPinned` tinyint(1) DEFAULT NULL,
  `UserId` varchar(26) DEFAULT NULL,
  `ChannelId` varchar(26) DEFAULT NULL,
  `RootId` varchar(26) DEFAULT NULL,
  `OriginalId` varchar(26) DEFAULT NULL,
  `Message` text,
  `Type` varchar(26) DEFAULT NULL,
  `Props` json DEFAULT NULL,
  `Hashtags` text,
  `Filenames` text,
  `FileIds` text,
  `HasReactions` tinyint(1) DEFAULT NULL,
  `RemoteId` varchar(26) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  KEY `idx_posts_update_at` (`UpdateAt`),
  KEY `idx_posts_create_at` (`CreateAt`),
  KEY `idx_posts_delete_at` (`DeleteAt`),
  KEY `idx_posts_user_id` (`UserId`),
  KEY `idx_posts_is_pinned` (`IsPinned`),
  KEY `idx_posts_channel_id_update_at` (`ChannelId`,`UpdateAt`),
  KEY `idx_posts_channel_id_delete_at_create_at` (`ChannelId`,`DeleteAt`,`CreateAt`),
  KEY `idx_posts_root_id_delete_at` (`RootId`,`DeleteAt`),
  KEY `idx_posts_create_at_id` (`CreateAt`,`Id`),
  KEY `idx_posts_original_id` (`OriginalId`),
  FULLTEXT KEY `idx_posts_message_txt` (`Message`),
  FULLTEXT KEY `idx_posts_hashtags_txt` (`Hashtags`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Connection Charset and Collation -

Postgress DDL , charset and Collation.


The postgres data was taken in the current migration failed state.

Yes, thank you, that was the issue.

Now the pgloader has started, copied some data, and then process completed without any stderr. But Posts not fully migrated, The log shows this error:

ERROR Database error 22021: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY posts, line 1095412

@isacikgoz

I think it’s actually a common postgres migration error, there are data sequences it cannot handle properly, and rely on you to fix before the migration.

Try running something like this on your mysql database. Replace messages with other column names if that isn’t the one with the offending data.
SELECT * FROM Posts WHERE Message LIKE '%\u0000%';
If it comes back with results you could either edit them using a string replace or if you’re lazy like me just delete the posts.
DELETE FROM Posts WHERE Message '%\u0000%';
You can use DESCRIBE Posts; if you need to find the other column names

@Akshaychdev did fixing byte sequences work out?

Once I found my suggestion to Akshay to consistently work I focused on reproducing it and not on finding the root cause any more. Thus the dev instances I tested this on have been destroyed and recreated a few times over along with those logs.

It’s known that mysql-8 and pgloader having issues right now. Did you tried by compiling the tool itself?

I think it’s actually a common postgres migration error, there are data sequences it cannot handle properly, and rely on you to fix before the migration...

Hi Guys,

I have the same error:
> 2024-01-29T20:50:41.219999Z ERROR mysql: 76 fell through ECASE expression.

Can you please share your solution with me?
If possible with a step by step tutorial…I’m a beginner :slight_smile:

Hi@all,

so I have some news. I used newest pgloader soureces and then I was able to copy whole database to postgress via the follwoing pgloader config:

LOAD DATABASE
FROM mysql://mmuser:password@localhost:3306/mattermost
INTO pgsql://mmuser:password@localhost:5432/mattermost

 WITH include drop, create tables, create indexes, reset sequences

 SET maintenance_work_mem to '128MB', work_mem to '12MB', search_path to 'mattermost'

 CAST type datetime to timestamptz drop default drop not null using zero-dates-to-null,
      type date drop not null drop default using zero-dates-to-null

 BEFORE LOAD DO

$$ SET search_path TO mattermost; $$,
$$ DROP SCHEMA IF EXISTS mattermost CASCADE; $$,
$$ CREATE SCHEMA mattermost; $$;

Afterwards I removed by hand schema public and renamed Schema mattermost to public with the following commands:

DROP SCHEMA IF EXISTS public CASCADE;
ALTER SCHEMA mattermost RENAME TO public;

Then I can see old posts and users.

What do you think will it work with that approach? Or do I have to do some more manual work?