PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/

CMaps for Text Extraction http://forum.pdfsharp.com/viewtopic.php?f=2&t=2906

Author:	cscangarella [ Wed Aug 20, 2014 6:53 pm ]
Post subject:	CMaps for Text Extraction
I've been working on a text extractor for PDFs using the PDFsharp library. First and foremost, I'd like to thank everyone who has worked on this library. It's been a ton of help, and I would have given up on this project a long time ago without it. Things are coming along quite well, and for the most part I've finished this task. However, any content that uses fonts that require a CMap doesn't extract correctly (understandably, as the bytes in the content stream are character codes that have to be mapped to Unicode values). Are there any PDFsharp classes that can help out with this? I can always go into the ToUnicode stream and parse it myself, but I don't believe in reinventing the wheel, so I figured I'd ask. I've noticed PdfSharp.Fonts.CMapInfo but am unsure of its usage.
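For reference, here is a rough sketch of what parsing the ToUnicode stream by hand might look like. It is written in Python only to keep it short and is not PDFsharp code. The function names parse_to_unicode and decode_codes are made up for illustration. It assumes the stream has already been decompressed to text, that hex strings contain no embedded whitespace, that destination values are well-formed UTF-16BE, and that character codes are two bytes wide (as with Identity-H fonts). A real extractor would read the code width from the begincodespacerange section instead.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"

def hex_to_str(h):
    # Destination values in a ToUnicode CMap are UTF-16BE code units in hex.
    return bytes.fromhex(h).decode("utf-16-be")

def parse_to_unicode(cmap_text):
    """Hypothetical helper: build {character code: Unicode string}
    from the decoded text of a /ToUnicode CMap stream."""
    mapping = {}

    # bfchar sections hold single mappings: <src> <dst>
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = hex_to_str(dst)

    # bfrange sections hold ranges: <lo> <hi> <dst>  or  <lo> <hi> [<d1> <d2> ...]
    entry = HEX + r"\s*" + HEX + r"\s*(?:" + HEX + r"|\[(.*?)\])"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst, arr in re.findall(entry, block, re.S):
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if dst:
                # Single start value: each successive code maps to start + offset.
                base = int(dst, 16)
                for offset, code in enumerate(codes):
                    value = format(base + offset, "0%dX" % len(dst))
                    mapping[code] = hex_to_str(value)
            else:
                # Array form: one explicit destination per code in the range.
                for code, item in zip(codes, re.findall(HEX, arr)):
                    mapping[code] = hex_to_str(item)
    return mapping

def decode_codes(data, mapping, code_size=2):
    """Translate raw string bytes from a content stream into text.
    code_size=2 is an assumption; unmapped codes become U+FFFD."""
    out = []
    for i in range(0, len(data), code_size):
        code = int.from_bytes(data[i:i + code_size], "big")
        out.append(mapping.get(code, "\ufffd"))
    return "".join(out)

sample = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
2 beginbfrange
<0025> <0027> <0042>
<0030> <0031> [<00660069> <0058>]
endbfrange
"""
table = parse_to_unicode(sample)
text = decode_codes(b"\x00\x24\x00\x26\x00\x30", table)
```

The table is keyed by integer character code, so the same lookup works whichever operator (Tj, TJ, and so on) supplied the string bytes. A destination can be more than one character, which is how ligatures such as "fi" come back as two letters.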

Author:	Caivs [ Sun Jan 27, 2019 7:46 pm ]
Post subject:	Re: CMaps for Text Extraction
Maybe something has changed since the original post. Does PDFsharp have any features to parse the /ToUnicode stream and get a character map from it?

All times are UTC