Character corruption of Japanese PDF files

I uploaded several PDF files written in Japanese.
Only one of them looks fine in the preview.
The others are displayed with corrupted characters.

Does anyone have a similar problem?
Is there any method to solve it?

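When I upload Japanese PDF files, many of them are displayed with garbled characters.
However, some of them are not garbled.
Is this a well-known phenomenon?
Also, if anyone knows how to solve it, I would be grateful for your advice.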

Hi @mikken and welcome to the Mattermost Forums!

Do you mean that the integrated Mattermost PDF preview shows the corruption, but when you download the PDF files, you can open them just fine and there’s no corruption anymore? When you open the PDF files with your default browser, can you also see the corruption there?

Hi @agriesser, thank you for your response.

“when you download the PDF files, you can open them just fine and there’s no corruption anymore?”

Yes, when I open the PDF files with my default browser, there is no corruption.

(I should have said it in my previous message. Sorry.)

Can you maybe share such a PDF with us, or create one, so we are able to reproduce the problem? The text you posted in your first message includes the substring “pdf”, which is probably the corruption you’re talking about, but in order to reproduce this issue, an example PDF would be awesome (or sample text which might cause it).

There are several PDF files on the following website.

Among the first four files:

  • With corruption
    vol.30, pp.(1)-(22), 2016-03-05
    vol.32, pp.(45)-(57), 2018-03-05

  • Without corruption
    vol.33, pp.(23)-(53), 2019-03-05
    vol.29, pp.11-30, 2015-03-05

I’d be glad if you could give me any suggestions.

Thanks, I can confirm that on my system here.
I picked the first PDF (vol.33, pp.(23)-(53)) and when opening it in the Mattermost preview, this is what it looks like:

Outside of Mattermost, I can also see the headlines:

Thanks for your confirmation.
(Actually, I hadn’t noticed the corruption you mentioned in vol.33, pp.(23)-(53), 2019-03-05, because I could read the main part of the document.
When I uploaded vol.30 and vol.32 to Mattermost, I could not read even the main body of the documents.)

It seems to be a font issue.
I think the headers are in a font that would need a cmap file, which is not found by pdf.js. From what I read here, it’s a matter of configuration.
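
To make the configuration point a bit more concrete, here is a rough sketch of the parameters pdf.js accepts for locating CMap files. The `cMapUrl` and `cMapPacked` options are part of the pdf.js `getDocument` parameters; everything else (the helper name `buildPdfLoadOptions`, the `/static/cmaps` path, the sample PDF URL) is made up for illustration, and I have not checked how the Mattermost web app actually wires this up.

```typescript
// Hypothetical helper: builds the parameter object that pdf.js's
// getDocument() accepts. Only url, cMapUrl and cMapPacked are real
// pdf.js option names; the helper and the paths are assumptions.
interface PdfLoadOptions {
  url: string;         // location of the PDF to preview
  cMapUrl: string;     // directory holding the CMap files, with trailing slash
  cMapPacked: boolean; // true when the files are the binary .bcmap variants
}

function buildPdfLoadOptions(pdfUrl: string, cmapBaseUrl: string): PdfLoadOptions {
  // pdf.js expects the CMap directory URL to end with a slash.
  const cMapUrl = cmapBaseUrl.slice(-1) === "/" ? cmapBaseUrl : cmapBaseUrl + "/";
  return { url: pdfUrl, cMapUrl: cMapUrl, cMapPacked: true };
}

// Example (paths are placeholders, not Mattermost's real layout):
const options = buildPdfLoadOptions("/files/sample.pdf", "/static/cmaps");

// With pdf.js loaded, the object would be passed along roughly like this:
//   pdfjsLib.getDocument(options).promise.then((pdf) => { /* render pages */ });
```

If the CMap directory is not reachable at the configured URL, fonts that rely on predefined CMaps (common in Japanese PDFs) may not be decoded by pdf.js, which would be consistent with the garbled headers seen in the preview.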