Importing 280 GB of Slack data fails with an error

Summary
I’m trying to import a full export (with attachments) from Slack, which is around 280 GB.

Steps to reproduce
Use mmctl --local import upload ./mattermost-bulk-import.zip

Expected behavior
Upload file for importing

Observed behavior
A few minutes after running the import command, this error shows up in the terminal:
Error: failed to upload data: : Unable to write the file.,

If I try to repeat the command to import the data, there is another (different) error:
Error: failed to upload data: : Failed to upload data. First part must be at least 5242880 Bytes.,

Any help would be appreciated.

Hi Sl4uGh7eR and welcome to the Mattermost Forums!

Can you please check the Mattermost server logfiles and post the last few lines here? They should contain relevant information about what’s happening.
Which deployment model (Omnibus, Binary, Docker, Cloud, etc.) are you using for Mattermost, and in which version and edition?
If there’s an nginx reverse proxy in front of your Mattermost application server, please also check the nginx error.log files (I suspect the uploaded file part is too big).
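
If you want to check that directly, something like this should show whether nginx enforces a body-size limit (client_max_body_size defaults to 1m when unset, which would reject large upload chunks; paths assume a standard nginx install):

root@host:~# grep -r client_max_body_size /etc/nginx/
root@host:~# tail -n 20 /var/log/nginx/error.log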

Best,
Alex

Thanks for the reply, I think I’ve probably found the source of the problem.
My Mattermost application is installed on a separate drive where I have 700 GB of free space.

Here is the last part of the log:

{"timestamp":"2022-07-30 14:29:27.281 Z","level":"error","msg":"Unable to write the file.","caller":"web/context.go:105","path":"/api/v4/uploads/5xa8duxmotypfdsojc9kis49ty","request_id":"q1fhybc5mpbafx43s45qwaijbr","ip_addr":"","user_id":"","method":"POST","err_where":"WriteFile","http_code":500,"err_details":"unable write the data in the file /var/opt/mattermost/data/import/5xa8duxmotypfdsojc9kis49ty_mattermost-bulk-import.zip.tmp: write /var/opt/mattermost/data/import/5xa8duxmotypfdsojc9kis49ty_mattermost-bulk-import.zip.tmp: no space left on device"}

/var/opt/mattermost is not on the separate drive but on the root drive, where I have only 15 GB of space.
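For the record, a quick df call is what shows which filesystem the data directory actually lives on:

root@host:~# df -h / /var/opt/mattermost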
My friend (admin) will take a look at this and I will return with more information soon.

Hi Alex, my last response got blocked by Akismet; I hope this one will not be.

First, I was able to get past the error (there was a problem with the drive configuration, which my admin fixed) and uploaded the file, but the import then failed with another error:

{"timestamp":"2022-07-30 16:49:19.777 Z","level":"error","msg":"SimpleWorker: Failed to get active user count","caller":"jobs/base_workers.go:83","worker":"ImportProcess","job_id":"n5miibw8tib5zynn145qdy7rpy","error":"BulkImport: Error importing channel. Team with name \"MotionVFX\" could not be found., resource: Team id: name=MotionVFX"}

I’m sure that’s the correct team name; it’s the one I used in mmetl and the same one I created in Mattermost.

As for the installation, it’s the latest one (7.1.2) on the Google Cloud Platform.
So it’s the self-hosted Team Edition, a.k.a. Starter.

Hi again,

Yes, in the initial post one could see that the disk ran full; good to hear that this is fixed now.
Maybe there’s different casing used for the team names - does the output of mmctl team list also show the team name “MotionVFX” written exactly like this?
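
You can check that directly with mmctl; as far as I know, a team has both a URL name (lowercase) and a display name, and the bulk import refers to the former:

root@host:~# /opt/mattermost/bin/mmctl team list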


Hi again Alex,

New day, new error - after your reply I realized that the team name mmetl requires is in reality the team ID, which is lowercase on the server side. I’ve modified the JSONL file to match the team ID (a sketch of that fix-up is at the end of this post), but now there’s another problem:

"error": "We could not count the users. — BulkImport: Unable to import team membership because no more members are allowed in that team, limit exceeded: what: TeamMember count: 101 metadata: team members limit exceeded",

In config.json I’ve set the limit to 500 - do you have any more useful tips?
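
And the promised fix-up sketch - assuming the standard bulk-import JSONL structure, where channel lines carry the team name in .channel.team and post lines in .post.team (the output filename is just an example):

root@host:~# jq -c 'if .type == "channel" then .channel.team |= ascii_downcase elif .type == "post" and .post.team != null then .post.team |= ascii_downcase else . end' mattermost-bulk-import.jsonl > mattermost-bulk-import-fixed.jsonl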

After a few tries, the value of 500 appeared in the MM system console and I was able to start importing.
Of course, there is another error, about a missing file in the map:
"error": "We could not count the users. — BulkImport: Error while processing bulk import attachments., attachment \"data/bulk-export-attachments/HERE_PATH_TO_THE_PDF_FILE.pdf\" not found in map",

Is there a chance that slack-advanced-exporter skipped some files? How can I fix this or, better, SKIP all missing files?

Yes, I just wanted to mention that modifying config.json does not take effect immediately. Some values will only be picked up after a restart of the server, and some can be forced by reloading the config (mmctl config reload).

root@host:~# grep MaxUsersPerTeam /opt/mattermost/config/config.json
        "MaxUsersPerTeam": 1001,
root@host:~# /opt/mattermost/bin/mmctl  config get TeamSettings.MaxUsersPerTeam
1000
root@host:~# /opt/mattermost/bin/mmctl  config reload
root@host:~# /opt/mattermost/bin/mmctl  config get TeamSettings.MaxUsersPerTeam
1001

I’m not sure if there’s an option to skip missing files, but can you identify the file with the broken reference? I’m wondering if it’s really not there, or if there’s maybe some path mismatch or the like.
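
One way to check is to list the archive contents and compare them against the path from the error message, e.g.:

root@host:~# unzip -l mattermost-bulk-import.zip | grep bulk-export-attachments | head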

Update: Just found this issue - sounds a bit like yours.

Thanks for clarifying (I’ve rebooted the machine to see the changes).

As for the file, it’s the first attachment in the JSONL file, so I assume there was a problem with slack-advanced-exporter.

I’m now redoing everything from scratch, and before zipping the file for import (after the mmetl conversion) I will check whether the first attachment referenced in the file is actually present in the bulk-export-attachments/ directory.
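
Something like this should do that check automatically - assuming the standard bulk-import JSONL layout where attachment paths live under .post.attachments[].path, and run from the directory that contains the data/ folder:

root@host:~# jq -r 'select(.type == "post") | .post.attachments[]?.path' mattermost-bulk-import.jsonl | while read -r f; do [ -f "$f" ] || echo "missing: $f"; done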

I will let you know what the next error will be (or whether the file is NOT in the right place).

I just read about the issue mentioned in your update - I’m going to give it a try, but first I need to download all attachments again (I’ve started from a fresh Slack export).

Another (future) question: can I import, for example, the last 24 hours of chat history again via mmctl? Or will there be a lot of conflicts/errors (users already created, channels already existing, etc.)? The current Slack export will need about 10 hours to download all attachments, plus a few hours to convert to the MM format, zip the file, and import it (with the bugfix from the update).
This means I will have about 10-12 hours of Slack chat data that is not in the current export (we wanted to finish it over the weekend, but I failed).

As far as I can see in the Slack documentation, you can export date ranges, and that is what I did back when I migrated from HipChat, because the single file was too large and I wanted to test the delta imports anyway. What I did was export the year 2017, import it, and see if it worked; then I exported 2018 and imported it on top, and continued like that until I reached a fairly current date. Then I started to make daily exports, with the final export happening at 00:01 for the day before (which only took a few minutes at that point: export, download, convert, import), because there wasn’t much data left to import and the method had already been tested a few times with the imports from the previous years.

With the HipChat export back then, you could choose whether to export users, so I only did that for the first export. I’m not sure how this looks on Slack, but just give it a try and let me know (I’m not using Slack here, so I don’t have any experience with that, sorry). In the worst case, you will need to skip the users in all further imports and just import the channels.

So far so good - 220k messages and attachments already imported, and it’s been going without any problems for a few hours now. There are still about 1.4M messages to go :wink:

Thanks for the help Alex! Really appreciate that!

If I face any new problems I will surely ask again.

Awesome - fingers crossed! :slight_smile:
Did the workaround with the folder location mentioned previously help you with this problem, or did you have to do something else?

Also, are you now running date-range exports? I’m not sure what the most recent message date was when you triggered the export; if you cannot choose the exact timestamp for subsequent exports, you might end up with duplicate messages, but I’m not sure if the importer detects that…

The MM server crashed along with the VM after 260k messages, but after rebooting the machine I was able to process the same import job again without any problems (in the logs I now see only “skipping” entries for the attachments). The messages will probably not be duplicated, because for one hour no new post appeared in the statistics.
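
For reference, re-running the already-uploaded import is just this (local mode, as in the first post; the placeholder comes from the list output):

root@host:~# /opt/mattermost/bin/mmctl --local import list available
root@host:~# /opt/mattermost/bin/mmctl --local import process <filename-from-the-list>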

The workaround with the “data” folder is what made everything work.
I’m still trying the entire-range export (from 01.01.2016 to 29.07.2022), but in case I face any problems again, I have already prepared year-separated export files, already converted to the MM format (with the workaround).

Fingers crossed! Cheers!

We’ve figured out that the VM was crashing because it was low on memory (4 GB wasn’t enough). After moving to 16 GB, the import went really fast, but after 700k posts another error came up:

{"timestamp":"2022-08-02 16:11:12.967 +02:00","level":"error","msg":"SimpleWorker: Failed to get active user count","caller":"jobs/base_workers.go:83","worker":"ImportProcess","job_id":"zekp37qbrirkpykhqbcg54wm1r","error":"importMultiplePostLines: Unable to save the Post., failed to save Post: pq: could not extend file \"base/34979/35669\": No space left on device"}

Any clues what caused that one?

base/34979/35669 points to a PostgreSQL database file; is your PostgreSQL instance running on the same system, or is it on a different one? Please check the available disk space on the database server.
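
Checking could look like this (the paths assume a default Debian/Ubuntu PostgreSQL layout, and the database name mattermost is just a guess):

root@host:~# df -h /var/lib/postgresql
root@host:~# sudo -u postgres psql -c "SELECT pg_size_pretty(pg_database_size('mattermost'));"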

Hi Alex :slight_smile: It’s on the same system, on the same drive as the Mattermost server (700 GB in size, around 600 GB free now). I’ve launched another file from the import list (I’ve separated them by year and quarter), and this one is going without any problems so far. I will try to import the one that caused the error again and will let you know if it still fails.

The following file was imported without any problems. I’ve restarted the previous one; a few files were skipped (up to the moment where the DB crashed) and it’s still going - I love this migration :smiley:
Please keep your fingers crossed, as only the last three quarters remain :slight_smile:

I think the importer might use transactions and therefore temporarily lock a lot of additional disk space until the transaction is finished and everything is freed again; your PostgreSQL server might also be configured to write WAL files for all the changes, and the import causes a lot of changes… So keep an eye on your disk space or import smaller chunks (as you’re doing now anyway); hopefully this will work out with the last few delta imports :slight_smile:
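
If you want to see how much the WAL may grow, these are the relevant knobs (the pg_wal path assumes PostgreSQL 10+ on a default Debian layout; <version> is a placeholder):

root@host:~# sudo -u postgres psql -c "SHOW max_wal_size;"
root@host:~# du -sh /var/lib/postgresql/<version>/main/pg_wal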

I would like to thank you, Alex, for all the wisdom you’ve passed on to me here :slight_smile: Our MM server is up and running with everything imported.

BUT… A new day, a new problem :slight_smile: This time, regardless of the MaxFileSize setting (2048 MB now), uploading files larger than 50 MB makes them hang at 0% with the “Uploading…” status.
Nothing in the logs :frowning: Any idea what is happening?
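
For completeness, the effective value can be checked like this (FileSettings.MaxFileSize is in bytes, so 2048 MB would be 2147483648), and if a reverse proxy sits in front, the client_max_body_size check from earlier in this thread applies here as well:

root@host:~# /opt/mattermost/bin/mmctl config get FileSettings.MaxFileSize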