[SOLVED] Remove Duplicate posts from import feature

Is there a particular way to remove duplicate posts? I recently deployed Mattermost and imported data from slack, and with some bad luck. I accidentally imported twice, so now I have ‘double posts’ in all the channels that were imported.

I’ve looked into the database and can confirm I have two entries under the ‘posts’ table. I haven’t touched the DB yet. I’m using MariaDB. Thoughts?

@ionbasa have you used the team you did the Slack Import into for anything else yet? If not, the easiest fix is probably to just create a fresh team and do the Slack Import again.

If you can’t just start again with a fresh team, because there are other messages you need to keep in that team, then this is quite a lot more involved to fix - you’ll need to modify the database removing one of every pair of posts where the UserId/ChannelId/CreateAt/Message fields are identical - probably something that would require scripting unless it is a very small number of messages.

Would it be possible to delete channels instead and rerun the import feature? I’ve already had ~90 users reset their passwords on their accounts, I’d rather not do it twice.

If you are happy to remove all posts from the currently imported channels, then I think doing a DELETE from Posts where ChannelId='{id}'; would be the easiest way of cleaning it up in the database.

I think you could also just delete the channels in the Mattermost UI and let the importer recreate them, but that will leave all the cruft in the database still, only now it would be marked as “deleted”.

1 Like

I ended up deleting the posts from all the channels and deleting all the imported channels the the database. Re-running the import tool fixed it, no double posts, and users kept their user accounts and passwords. Thanks!

Thanks @ionbasa,

Glad the issue is resolved for you. I’ll close this off for now.

Following steps worked for me.
ssh to your instance
change your local user to postgress (eg. su postgress)
login to postgress shell with psql
run following commands to remove duplicates from posts

DELETE FROM posts AS t1
WHERE EXISTS (
SELECT 1 FROM posts AS t2 
WHERE t2.userid = t1.userid
AND t2.channelid = t1.channelid 
AND t2.createat = t1.createat
AND t2.message = t1.message
AND t2.id > t1.id );

Thanks for taking the time to reply (the thread has already been solved). I actually solved this already by running a similar SQL command, except with syntax for MySQL/MariaDB. Nonetheless, your post is useful for others who may stumble upon this thread.

I have the same issue but I’m running mysql. Could you share the exact commands you used?

Hey, this is what I used with mysql:

DELETE from Posts
where ID in
(SELECT * FROM (
select max(p.Id) as Id from Posts p group by concat(p.ChannelId,p.CreateAt,p.Message) having count(p.Id) > 1
) as x);