|PDFsharp & MigraDoc Foundation
|Extract Text with position on page.
|Author:||Beast96GT [ Thu May 30, 2019 9:12 pm ]|
|Post subject:||Extract Text with position on page.|
First let me say thank you for PDFsharp!
I'm using PdfReader to do a low-level extraction of the characters/text, etc., from a PDF. It seems that what ContentReader.ReadContent() reads in only contains character information. It's very important that I extract (at the least) the Y position of the text.
So I've been looking at the code for PdfReader.Open() and stepping through it to see how the Document/Pages/Elements are put together and where I might do this. I thought I would ask whether there's an easier way or whether my approach is wrong.
|Author:||TH-Soft [ Fri May 31, 2019 5:54 am ]|
|Post subject:||Re: Extract Text with position on page.|
PDF can be very complicated because it uses indirection, composition, transformation, ...
Look at this sample:
http://pdfsharp.net/wiki/GetFile.aspx?F ... XForms.png
There is no simple general solution to tell which text is drawn where.
It'll be simpler if all your documents come from a single application.
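To illustrate why the position is not stored next to the characters, here is a minimal Python sketch (not PDFsharp code) that tracks where text lands in a heavily simplified content stream. It assumes an already decompressed stream, plain literal strings, and only the operators q, Q, cm, BT, Tm, Td and Tj. It ignores TJ arrays, T*, TD, font metrics (so the advance after each Tj is not applied) and form XObjects drawn with Do. The names `mul`, `text_positions` and `SAMPLE` are made up for this example. The same bookkeeping could presumably be applied to the operator sequence that ContentReader.ReadContent() returns.

```python
import re

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
# A literal string "(...)" without nested parentheses, or any other token.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|[^\s()]+")


def mul(m, n):
    """Return m x n for PDF matrices stored as (a, b, c, d, e, f)."""
    return (
        m[0] * n[0] + m[1] * n[2], m[0] * n[1] + m[1] * n[3],
        m[2] * n[0] + m[3] * n[2], m[2] * n[1] + m[3] * n[3],
        m[4] * n[0] + m[5] * n[2] + n[4], m[4] * n[1] + m[5] * n[3] + n[5],
    )


def text_positions(stream):
    """Return (text, x, y) for each Tj operator in a simplified stream."""
    ctm, saved = IDENTITY, []      # current transformation matrix and q/Q stack
    tm = tlm = IDENTITY            # text matrix and text line matrix
    operands, found = [], []
    for tok in TOKEN.findall(stream):
        if tok.startswith("("):    # literal string operand
            operands.append(tok[1:-1])
            continue
        if tok.startswith("/"):    # name operand such as /F1
            operands.append(tok)
            continue
        try:                       # numeric operand
            operands.append(float(tok))
            continue
        except ValueError:
            pass                   # anything else is treated as an operator
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q" and saved:
            ctm = saved.pop()
        elif tok == "cm":          # new matrix is premultiplied onto the CTM
            ctm = mul(tuple(operands[-6:]), ctm)
        elif tok == "BT":
            tm = tlm = IDENTITY
        elif tok == "Tm":
            tm = tlm = tuple(operands[-6:])
        elif tok == "Td":          # move to the next line, relative to the last one
            tlm = mul((1.0, 0.0, 0.0, 1.0, operands[-2], operands[-1]), tlm)
            tm = tlm
        elif tok == "Tj":          # text origin = text matrix x CTM
            trm = mul(tm, ctm)
            found.append((operands[-1], trm[4], trm[5]))
        operands = []              # operands belong to one operator only
    return found


SAMPLE = """
q
1 0 0 1 50 700 cm
BT /F1 12 Tf 10 20 Td (Hello) Tj 0 -14 Td (World) Tj ET
Q
BT 1 0 0 1 72 100 Tm (Footer) Tj ET
"""

for text, x, y in text_positions(SAMPLE):
    print(text, x, y)
```

In the sample, "Hello" is positioned by a Td inside a translated coordinate system, so its Y value is the sum of both offsets, while "Footer" is placed by Tm after the Q has restored the original matrix. Real documents add nested forms, rotations and scaling on top of this, which is why a general solution gets complicated.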
|All times are UTC|