PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

PostPosted: Fri Sep 08, 2017 3:07 am 

Joined: Fri Sep 08, 2017 2:50 am
Posts: 2
The way CLexer handles hexadecimal strings (in CLexer.ScanHexadecimalString()) leads to problems with CMaps.

In one PDF file, which unfortunately I cannot upload due to copyright restrictions, I have the token <0030> Tj after a Tf operator selects a font that uses a CMap. In this font's CMap, 0030 is mapped to 004D, the capital letter 'M' (which is what Adobe displays in this place). However, CLexer scans the token as two separate one-byte hexadecimal values, '00' and '30', so the resulting operand is incorrectly filled with '\00' (a null character followed by the character zero).
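
To make the two interpretations concrete, here is a minimal sketch in Python (chosen only for brevity; this is not PDFsharp code). The function name scan_hex_string and the one-entry to_unicode table are assumptions for illustration; the table holds only the single mapping quoted above.

```python
def scan_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string such as '<0030>' into raw bytes."""
    # Drop the angle brackets and any whitespace between digits.
    digits = "".join(ch for ch in token.strip("<>") if not ch.isspace())
    # PDF rule: if the final digit is missing, it is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


raw = scan_hex_string("<0030>")  # two bytes: 0x00 and 0x30

# Byte-wise view: one character per byte, i.e. a null followed by '0'.
per_byte = raw.decode("latin-1")

# Two-byte view: the same bytes read as one big-endian character code.
to_unicode = {0x0030: "M"}  # hypothetical excerpt of the font's CMap
code = int.from_bytes(raw, "big")
glyph_text = to_unicode[code]
```

The raw bytes are identical in both cases. Only the grouping into character codes differs, and that grouping is a property of the font, not of the hex string.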

I am not sure of a proper workaround at the moment. I am considering importing the PDFsharp source into my project so that I can rewrite the function to assign the literal <0030> to the operand, and then handle hexadecimal strings in my own code. I just wanted to point out the possible failure in this use case.

I verified that this function is unchanged between the build I'm using and the latest 1.50 beta 4.

I'm not fully certain whether to classify this as a bug. I think it's pretty complex: whether a character code in a hexadecimal string is 2 or 4 hex digits long depends on the font being used. I guess if I can find a way to determine that the font uses 4-digit codes, then I can combine the \00 back into 0030 and look it up. So this might not be a bug so much as a complicated detail that tripped me up at this stage...
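
The recombination idea can be sketched as follows, again in Python rather than against the PDFsharp API. It assumes the lexer's operand holds one character per source byte (as the '\00' result suggests) and that the code length in bytes is already known from the font; regroup_codes, show_text and the one-entry mapping are hypothetical names and data.

```python
from typing import Dict, List


def regroup_codes(operand: str, code_length: int) -> List[int]:
    """Regroup a one-char-per-byte operand into fixed-width character codes."""
    raw = operand.encode("latin-1")  # recover the original byte values
    if len(raw) % code_length:
        raise ValueError("operand length is not a multiple of the code length")
    return [
        int.from_bytes(raw[i:i + code_length], "big")
        for i in range(0, len(raw), code_length)
    ]


def show_text(operand: str, code_length: int, to_unicode: Dict[int, str]) -> str:
    """Map each regrouped code to text; unknown codes become U+FFFD."""
    codes = regroup_codes(operand, code_length)
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# The operand as described above: a null followed by the character '0'.
operand = "\x00" + "0"
text = show_text(operand, 2, {0x0030: "M"})  # hypothetical CMap excerpt
```

With a code length of 2 the operand yields the single code 0x0030 again, so no lexer change is needed as long as the byte values survive intact in the operand.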


PostPosted: Fri Sep 08, 2017 6:46 am 
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 909
Location: CCAA
Rayanth wrote:
I'm not fully certain whether to classify this as a bug.
The first thing to check is the PDF reference.
Does the PDF file (which we cannot check) comply with the PDF reference? Maybe there should be a prefix like FEFF to switch to double-byte mode or something else ...

If this is a common problem, then it should be easy to find a public PDF that reproduces the issue.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


PostPosted: Fri Sep 08, 2017 9:26 pm 

Joined: Fri Sep 08, 2017 2:50 am
Posts: 2
It does comply with PDF 1.6 as far as I can tell, and after further research I've decided this is not a bug so much as a thing to watch for when dealing with multi-byte fonts. Since PDFsharp doesn't appear to have support for reading Type0 CIDFonts from a PDF file, only for creating them, I will just have to keep this in mind when I write my own font handler and my interpreter.
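
As a rough illustration of what such a font handler might do, the sketch below reads the code length and the single-character mappings from a ToUnicode CMap. SAMPLE_CMAP is a made-up fragment containing only the mapping from this thread, and code_length and bfchar_map are hypothetical helpers. The sketch is deliberately simplified: it looks at the first codespacerange entry only and ignores bfrange sections, and for a Type0 font the authoritative code length comes from the font's Encoding CMap, which is assumed here to match.

```python
import re
from typing import Dict

# Made-up ToUnicode fragment using the mapping described in this thread.
SAMPLE_CMAP = """
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
1 beginbfchar
<0030> <004D>
endbfchar
"""

HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def code_length(cmap_text: str) -> int:
    """Return the code length in bytes from the first codespacerange entry."""
    match = re.search(r"begincodespacerange\s*<([0-9A-Fa-f]+)>", cmap_text)
    if match is None:
        raise ValueError("no codespacerange found")
    return len(match.group(1)) // 2  # two hex digits per byte


def bfchar_map(cmap_text: str) -> Dict[int, str]:
    """Collect source code -> Unicode text from all bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in HEX_PAIR.findall(section):
            # ToUnicode destinations are UTF-16BE encoded.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping
```

For the sample fragment, code_length reports 2 bytes and bfchar_map yields a table in which code 0x0030 maps to 'M', which is the information needed to regroup and translate the operand of the Tj operator.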

I do realize PDFSharp is not currently intended to read files in depth, but so far it's my only viable option and I've come a long way with using it for my purpose. I just got hung up on this particular type of font.

