PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/

Extract Text with position on page. http://forum.pdfsharp.com/viewtopic.php?f=2&t=3977
Author: Beast96GT [ Thu May 30, 2019 9:12 pm ]
Post subject: Extract Text with position on page.
Hello,

First, let me say thank you for PDFsharp! I'm using PdfReader to do low-level extraction of the characters/text, etc., from a PDF. It seems that what ContentReader.ReadContent() reads in only has character information. It's very important that I extract (at the least) the Y-value position for the text.

So I'm looking at PdfReader.Open() and stepping through the code to see how the Document/Pages/Elements are put together and where I might do this, but I thought I would ask whether there's an easier way to do this or whether my approach is wrong.

Thank you!
Chris
Author: TH-Soft [ Fri May 31, 2019 5:54 am ]
Post subject: Re: Extract Text with position on page.
Hi!

PDF can be very complicated: it uses indirection, composition, transformation, ... Look at this sample: http://pdfsharp.net/wiki/GetFile.aspx?F ... XForms.png

There is no simple general solution to tell which text is drawn where. It'll be simpler if all your documents come from a single application.
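
To illustrate why transformations make this hard, here is a minimal sketch in Python (not PDFsharp code) of how a text position could be tracked once the content stream operators have been read. It assumes the operators are already available as (name, operands) pairs, roughly the kind of information a content stream reader yields; the function name `text_positions` and the sample data are hypothetical. It handles only `q`, `Q`, `cm`, `BT`, `Tm`, `Td`, `TD`, `TL`, `T*`, `Tj` and `TJ`, and it reports where each string starts. It does not advance the position by glyph widths (that needs font metrics), and it ignores form XObjects (`Do`), which is exactly the kind of composition the sample above shows.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def mul(m, n):
    """Return the PDF matrix product m x n for 6-element matrices [a b c d e f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def text_positions(ops):
    """Yield-like helper: return (text, x, y) for each Tj/TJ in a parsed stream.

    `ops` is a list of (operator_name, operands) pairs (an assumed input shape).
    """
    ctm = IDENTITY          # current transformation matrix
    saved = []              # q/Q stack (only the CTM is saved in this sketch)
    tm = tlm = IDENTITY     # text matrix and text line matrix
    leading = 0.0
    found = []
    for name, operands in ops:
        if name == "q":
            saved.append(ctm)
        elif name == "Q" and saved:
            ctm = saved.pop()
        elif name == "cm":
            ctm = mul(tuple(operands), ctm)
        elif name == "BT":
            tm = tlm = IDENTITY
        elif name == "Tm":
            tm = tlm = tuple(operands)
        elif name in ("Td", "TD"):
            tx, ty = operands
            if name == "TD":
                leading = -ty   # TD also sets the leading
            tm = tlm = mul((1, 0, 0, 1, tx, ty), tlm)
        elif name == "TL":
            leading = operands[0]
        elif name == "T*":
            tm = tlm = mul((1, 0, 0, 1, 0, -leading), tlm)
        elif name in ("Tj", "TJ"):
            if name == "Tj":
                text = operands[0]
            else:  # TJ: array of strings mixed with kerning numbers
                text = "".join(p for p in operands[0] if isinstance(p, str))
            x, y = mul(tm, ctm)[4:]  # text-space origin mapped to user space
            found.append((text, x, y))
    return found


# Hypothetical, hand-written operator list for demonstration only.
sample_ops = [
    ("q", []), ("cm", [1, 0, 0, 1, 50, 100]),
    ("BT", []), ("Td", [10, 700]), ("Tj", ["Hello"]),
    ("TL", [14]), ("T*", []), ("Tj", ["World"]), ("ET", []),
    ("Q", []),
    ("BT", []), ("Tm", [2, 0, 0, 2, 20, 30]), ("TJ", [["A", -120, "B"]]), ("ET", []),
]

for item in text_positions(sample_ops):
    print(item)
```

With the sample data, "Hello" is reported at (60, 800) rather than (10, 700), because the enclosing `cm` shifts everything by (50, 100). Text drawn inside a form XObject would need the same bookkeeping applied recursively with the form's own matrix, which is why a general solution is not simple.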
All times are UTC