PDFsharp & MigraDoc Foundation :: View topic - Extract Text with position on page.

PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/

Extract Text with position on page. http://forum.pdfsharp.com/viewtopic.php?f=2&t=3977

Author:	Beast96GT [ Thu May 30, 2019 9:12 pm ]
Post subject:	Extract Text with position on page.
Hello, First let me say thank you for PDFsharp! I'm using PdfReader to extract the characters/text, etc., from a PDF at a low level. It seems that what ContentReader.ReadContent() reads in contains only character information. It's very important that I extract (at least) the Y position of the text. So I've been stepping through the code behind PdfReader.Open() to see how the Document/Pages/Elements are put together and where I might do this, but I thought I would ask whether there's an easier way or whether my approach is wrong. Thank you! Chris

Author:	TH-Soft [ Fri May 31, 2019 5:54 am ]
Post subject:	Re: Extract Text with position on page.
Hi! PDF content can be very complicated because it uses indirection, composition, transformation, ... Look at this sample: http://pdfsharp.net/wiki/GetFile.aspx?F ... XForms.png There is no simple general solution that tells you which text is drawn where. It will be simpler if all your documents come from a single application.
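
To make the point about transformations concrete, here is a minimal sketch of how a text position falls out of the operators in a content stream. It is written in Python and does not use PDFsharp; the `(name, operands)` tuples and the function `text_origins` are hypothetical stand-ins for whatever a content-stream parser yields. It assumes only the standard PDF operators `q`, `Q`, `cm`, `BT`, `Tm`, `Td`, `TD`, `TL`, `T*` and `Tj`, and it deliberately ignores `TJ`, `'`, `"`, glyph advances, and form XObjects drawn with `Do`, which a real extractor would have to follow recursively.

```python
from typing import List, Sequence, Tuple

# A PDF matrix [a b c d e f], applied to row vectors: [x y 1] x M.
Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m: Matrix, n: Matrix) -> Matrix:
    """Return the product m x n of two PDF-style matrices."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def text_origins(ops: Sequence[Tuple[str, list]]) -> List[Tuple[str, float, float]]:
    """Return (text, x, y) for the start point of every Tj operator."""
    ctm = IDENTITY            # current transformation matrix
    stack = []                # graphics state saved by q, restored by Q
    tm = tlm = IDENTITY       # text matrix and text line matrix
    leading = 0.0             # text leading, used by T*
    found = []
    for name, operands in ops:
        if name == "q":
            stack.append((ctm, leading))
        elif name == "Q":
            if stack:
                ctm, leading = stack.pop()
        elif name == "cm":
            # The new matrix is applied before the existing CTM.
            ctm = multiply(tuple(operands), ctm)
        elif name == "BT":
            tm = tlm = IDENTITY
        elif name == "Tm":
            tm = tlm = tuple(operands)
        elif name in ("Td", "TD"):
            tx, ty = operands
            if name == "TD":
                leading = -ty
            tm = tlm = multiply((1, 0, 0, 1, tx, ty), tlm)
        elif name == "TL":
            leading = operands[0]
        elif name == "T*":
            tm = tlm = multiply((1, 0, 0, 1, 0, -leading), tlm)
        elif name == "Tj":
            # Text space origin mapped through Tm, then through the CTM.
            trm = multiply(tm, ctm)
            found.append((operands[0], trm[4], trm[5]))
    return found


# Hand-written operator list for illustration only.
ops = [
    ("q", []),
    ("cm", [1, 0, 0, 1, 50, 700]),
    ("BT", []),
    ("Td", [10, 20]),
    ("Tj", ["Hello"]),
    ("TL", [14]),
    ("T*", []),
    ("Tj", ["World"]),
    ("ET", []),
    ("Q", []),
    ("BT", []),
    ("Tm", [1, 0, 0, 1, 72, 100]),
    ("Tj", ["Footer"]),
    ("ET", []),
]

for text, x, y in text_origins(ops):
    print(text, x, y)
```

In this example "Hello" starts at y = 720 even though the only number near it in the stream is 20, because the enclosing `cm` shifts everything up by 700. "World" lands at y = 706 after `T*`, and "Footer" at y = 100 once `Q` has restored the original matrix. The operands alone never give the position; the accumulated state does.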

All times are UTC