PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

Extract Text with position on page.
http://forum.pdfsharp.com/viewtopic.php?f=2&t=3977

Author:  Beast96GT [ Thu May 30, 2019 9:12 pm ]
Post subject:  Extract Text with position on page.

Hello,

First let me say thank you for PDFsharp!

I'm using PdfReader to extract the characters/text, etc., from a PDF at a low level. It seems that what ContentReader.ReadContent() reads in only has character information. It's very important that I extract (at the least) the Y position of the text.

So I'm stepping through the code behind PdfReader.Open() to see how the Document/Pages/Elements are put together and where I might do this, but I thought I would ask whether there's an easier way or whether my approach is wrong.

Thank you!

Chris

Author:  TH-Soft [ Fri May 31, 2019 5:54 am ]
Post subject:  Re: Extract Text with position on page.

Hi!

PDF can be very complicated: page content can use indirection, composition, transformation, ...
Look at this sample:
http://pdfsharp.net/wiki/GetFile.aspx?F ... XForms.png

There is no simple general solution to tell which text is drawn where.
It'll be simpler if all your documents come from a single application.
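
For the simple case (text drawn directly in the page content stream, no form XObjects), the position has to be rebuilt by tracking the text-positioning operators that surround each text-showing operator. Below is a rough sketch of that bookkeeping in Python, not PDFsharp code. It assumes the content stream has already been parsed into (operands, operator) pairs; that tuple layout and the name text_positions are made up for illustration. It handles only q, Q, cm, BT, Tm, Td, TD, TL, T*, Tj and TJ, and it reports only the origin of each string: it does not advance x after showing text, since that needs font widths.

```python
# Matrices are PDF-style 6-tuples (a, b, c, d, e, f).
IDENTITY = (1, 0, 0, 1, 0, 0)

def mul(m, n):
    """Return m x n using the PDF row-vector convention."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def text_positions(ops):
    """Yield (text, x, y) for each Tj/TJ in a list of (operands, operator).

    Hypothetical input format; a real parser's objects would need adapting.
    """
    ctm, leading = IDENTITY, 0
    tm = tlm = IDENTITY          # text matrix and text line matrix
    stack, out = [], []
    for operands, op in ops:
        if op == "q":            # save the part of the state we track
            stack.append((ctm, leading))
        elif op == "Q":
            if stack:
                ctm, leading = stack.pop()
        elif op == "cm":         # new CTM = operand matrix x old CTM
            ctm = mul(tuple(operands), ctm)
        elif op == "BT":
            tm = tlm = IDENTITY
        elif op == "Tm":
            tm = tlm = tuple(operands)
        elif op == "TL":
            leading = operands[0]
        elif op in ("Td", "TD"):
            tx, ty = operands
            if op == "TD":       # TD also sets the leading to -ty
                leading = -ty
            tm = tlm = mul((1, 0, 0, 1, tx, ty), tlm)
        elif op == "T*":         # move to the start of the next line
            tm = tlm = mul((1, 0, 0, 1, 0, -leading), tlm)
        elif op in ("Tj", "TJ"):
            if op == "Tj":
                text = operands[0]
            else:                # TJ array mixes strings and kerning numbers
                text = "".join(s for s in operands[0] if isinstance(s, str))
            x, y = mul(tm, ctm)[4:]   # text origin in page coordinates
            out.append((text, x, y))
    return out

sample = [
    ([], "q"), ([1, 0, 0, 1, 10, 20], "cm"),
    ([], "BT"), ([1, 0, 0, 1, 72, 700], "Tm"), (["Hello"], "Tj"),
    ([0, -14], "TD"), (["World"], "Tj"),
    ([], "T*"), ([["Ki", -20, "ng"]], "TJ"),
    ([], "ET"), ([], "Q"),
]
positions = text_positions(sample)
```

With the sample above, "Hello" lands at y = 720, "World" at y = 706 and "King" at y = 692. As soon as the text lives inside a form XObject (the Do operator), the same tracking has to recurse into that object's own stream and matrix, which is where the general case gets hard.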

All times are UTC