Thanks, I upped the PostgreSQL connection limit to see if it would allow me to continue the import. This seems to be successful so far. I did see a reference to the experimental Bleve indexing feature I had enabled; I have disabled it and the import appears to be processing now.
So disabling Bleve indexing and email notifications, and breaking the jobs down into smaller parts, seems to be the answer. The server also seemed to require a minimum of 16-32GB of RAM, and turning off the OOM killer at the OS level might have helped as well, though I'm not sure about that one. Still have to try the last and largest import; will let it run overnight.
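For anyone following along, both toggles can also be flipped from the CLI. A minimal sketch, assuming mmctl is already authenticated against the server (these are the standard config.json keys):

```sh
# turn off the experimental Bleve indexing feature
mmctl config set BleveSettings.EnableIndexing false
# silence email notifications for the duration of the import
mmctl config set EmailSettings.SendEmailNotifications false
```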
Not sure where channel definitions are documented at this level of detail, but could this be an unsupported type? It would be useful to know all supported channel types; maybe I can just assign it a supported one.
Going to just change the channel type to (P) since all of the others are of a similar type, and re-zip the JSONL. Hopefully that's the last error. I didn't find any additional matching patterns in the file.
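In case it helps someone, this is roughly the edit-and-repack loop. A sketch assuming the standard bulk-import layout (an import.jsonl plus a data/ directory of attachments; filenames are examples) and that the group channels follow the usual {"type":"channel","channel":{...}} shape:

```sh
unzip import.zip
# rewrite group channels (type "G") as private channels (type "P")
jq -c 'if .type == "channel" and .channel.type == "G" then .channel.type = "P" else . end' \
  import.jsonl > import.fixed.jsonl && mv import.fixed.jsonl import.jsonl
zip -r import.zip import.jsonl data/
```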
Oddly enough, the "channel_members" field does have a value, but it's null:

```json
{"type":"direct_post","direct_post":{"channel_members":null,"user":"first.last","type":null,"message":"This is a dummy message.","props":null,"create_at":1672794825652,"edit_at":null,"flagged_by":null,"reactions":null,"replies":[],"attachments":[]}}
```
The previous two "direct_post" items also have the identical value for "channel_members", so I'm confused. It also seems odd that there would be a direct post to a channel with no members. Would it make sense to exclude these entries, since it doesn't seem anyone should have access to them?
How would Mattermost handle this type of import data?
Edit: I just decided to drop those posts, since the count was low (153). Now on to dropping a message in which someone pinged Google. Re-zipping the file each time, only for the import process to unzip it again, feels redundant.
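For anyone sizing the same problem before dropping anything, a quick count along these lines works (the filename is an example):

```sh
# count direct_post entries with a null channel_members value
grep -c '"channel_members":null' import.jsonl
```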
This could be posts to a channel which no longer exists and has been deleted in the source, or a channel which could not be converted properly and is therefore missing from the import. Can you find the message in the source files, and if so, in what context?
I suspected the same; it seemed to be an impromptu channel created amongst a group of people who subsequently dropped out, which justified excluding the associated data. I wasn't able to identify the original data in Slack, since accessing it would have required advanced knowledge of the platform that I don't currently possess.
I dropped the single message that triggered the message-length error (the one containing the ping/response) and re-ran the import job successfully! Since the message was only 9,432 characters long, I'm guessing the characters used within it (multi-byte ones, perhaps) somehow consumed more than the predefined limit mentioned earlier, as I did not encounter any additional errors after that point.
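If multi-byte characters are the culprit, the mismatch is easy to demonstrate: byte counts and character counts diverge for non-ASCII text, so a byte-based limit can trip on a message whose character count looks fine (assumes a UTF-8 locale):

```sh
printf 'héllo✓' | wc -m   # 6 characters
printf 'héllo✓' | wc -c   # 9 bytes, which is what a byte-based limit would measure
```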
Thank you for your invaluable support and communication in getting this finished! I'm not really sure which post to mark as the solution, since it was a team effort.
I was thinking more of a zgrep -ir <pattern> <slackfile.zip> approach. But I think it's really just something along the lines of an export inconsistency, which is perfectly fine to handle by removing these messages.
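One caveat worth noting: zgrep only understands gzip streams, so against a multi-file Slack .zip an equivalent search would look more like this (archive name and pattern are examples):

```sh
zipgrep -i 'dummy message' slack-export.zip          # greps every member of the zip
unzip -p slack-export.zip | grep -i 'dummy message'  # or stream everything through grep
```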
Before picking a solution here, let's make sure your system is really running after the import - no need to rush on that.
Solution to the problem (and burden) of a large file import:
You can manually copy the large import zip file directly to "/<install_path_of_mattermost>/data/import" without having to muck about with importing it via the command line or web interface. It will then show up when running "mmctl import list available".
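A minimal sketch of that flow, assuming a default /opt/mattermost install path (adjust to your environment; the zip name is an example):

```sh
cp huge_import.zip /opt/mattermost/data/import/
mmctl import list available             # the copied file should now be listed
mmctl import process huge_import.zip    # start the import job
```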
Ensure your server has at least 16GB to 32GB of RAM (this is the range mine operated in for this size of import, even though I had allocated far more), as the import process isn't as efficient as it could be yet.
Break the import down into smaller chunks that are easier to manipulate. I started with a 1.1TB file and ended up with files ranging from 10GB to 600GB.
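A sketch of one way to chunk the JSONL itself, assuming the first line is the version header and everything after it is post entries in a safe order (attachments and entity ordering across chunks still need care; filenames and sizes are examples):

```sh
head -n 1 full_import.jsonl > header.jsonl            # the {"type":"version",...} line
tail -n +2 full_import.jsonl | split -C 10G - chunk_  # whole lines, up to ~10GB per file
for f in chunk_*; do
  cat header.jsonl "$f" > "part_$f.jsonl"             # every chunk needs the header
done
```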
Disable Bleve indexing if it's enabled (I had it enabled).
Identify any "direct_post":{} entries that contain "channel_members":null and either drop them or add channel members so the data can be imported, as needed (I chose to drop them).
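For the drop option, a hedged one-liner, assuming the standard bulk-import JSONL shape (filenames are examples):

```sh
# keep everything except direct_post entries whose channel_members is null
jq -c 'select(.type != "direct_post" or .direct_post.channel_members != null)' \
  import.jsonl > import.filtered.jsonl
```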
Identify any channels of "type":"G" and change them to "type":"P", or drop them if not needed (a jq one-liner for this is sketched earlier in the thread).
I also disabled the Linux OOM killer and increased the connection count in postgresql.conf above 100, but I can't confirm or deny that this actually helped solve the initial problem, since it seems Bleve indexing was the cause of the excessive connections to Postgres.
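For completeness, roughly what those two changes look like; the config path, version, and values below are examples, and the oom_score_adj write is one common per-process way to keep the OOM killer off the server rather than a global switch:

```sh
# raise Postgres' connection ceiling (default is 100), then restart it
sudo sed -i 's/^#\?max_connections.*/max_connections = 300/' /etc/postgresql/15/main/postgresql.conf
sudo systemctl restart postgresql
# exempt the running mattermost process from the OOM killer
echo -1000 | sudo tee "/proc/$(pidof mattermost)/oom_score_adj"
```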
@agriesser and @agnivade are due credit for this solution; it's only marked on this post because it was a progressive effort rather than a single fix. Hopefully this TL;DR will help someone else in the future.