PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


 Post subject: Read Text of a PDF-File
PostPosted: Tue Oct 07, 2008 2:36 pm 
Joined: Tue Oct 07, 2008 2:28 pm
Posts: 1
Hi,

I have only been using PDFsharp for a short time. I'm trying to read all the text of a PDF file, for example the headlines and the body text, but I haven't found a way to do this.
I tried using PdfDictionary to navigate some of the objects (e.g. "/MediaBox", "/XObject"), but without success.

Can somebody give me some advice, for example which classes and methods I should use?

Thanks.


PostPosted: Fri Nov 07, 2008 10:36 am 
Joined: Tue Oct 14, 2008 1:08 pm
Posts: 8
Location: England
You should probably look at MigraDoc to accomplish this. I would imagine the steps are:

    Load your PDF into a MigraDoc Document object.
    You can then iterate through each section within it. (I have not tried importing a pre-created PDF file into MigraDoc, so I am not sure how this works.)
    Assuming you get this far, you can then iterate through each element within the section, which would look something like:


Code:
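// Needs: using System.Collections.Generic;
//        using MigraDoc.DocumentObjectModel;
// Assumes 'section' is a Section taken from an existing MigraDoc Document.
List<string> allText = new List<string>();
foreach (DocumentObject element in section.Elements)
{
    // Text objects live inside paragraphs, not directly in the section.
    Paragraph paragraph = element as Paragraph;
    if (paragraph == null)
        continue;

    // Only top-level Text children are collected here; text nested in
    // FormattedText or in tables is skipped.
    foreach (DocumentObject child in paragraph.Elements)
    {
        Text textObj = child as Text;
        if (textObj != null)
            allText.Add(textObj.Content);
    }
}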


PostPosted: Tue Mar 17, 2009 4:26 pm 
Joined: Tue Mar 17, 2009 4:12 pm
Posts: 2
Unfortunately I wasn't able to apply this solution in my case.
So I've implemented another solution that parses the PDF content at a low level.

My solution has been posted here:
http://forum.pdfsharp.net/viewtopic.php?p=1603&mforum=pdfsharp#1603

I hope this will help you.
Enjoy it! 8)
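
For readers who cannot follow the link, here is a rough idea of what low-level parsing of page content can look like. This is a minimal sketch and not the linked solution: it assumes a PDFsharp version that ships the PdfSharp.Pdf.Content namespace (ContentReader and the CObject classes), and the class name PdfTextSketch and the default file name input.pdf are made up for illustration. It only collects the string operands of the Tj and TJ text-showing operators, so the strings come back in content-stream order, without layout or spacing, and they may need decoding when a font uses a custom encoding.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

// Hypothetical helper class, not part of PDFsharp.
static class PdfTextSketch
{
    // Parses the content stream of one page and returns the raw strings
    // passed to the Tj and TJ operators.
    public static IEnumerable<string> ExtractText(PdfPage page)
    {
        CSequence content = ContentReader.ReadContent(page);
        return ExtractText(content);
    }

    // Recursively walks the parsed content objects.
    static IEnumerable<string> ExtractText(CObject obj)
    {
        COperator op = obj as COperator;
        CSequence seq = obj as CSequence;   // CArray derives from CSequence
        CString str = obj as CString;

        if (op != null)
        {
            // Only the two basic text-showing operators are handled;
            // the ' and " operators are ignored in this sketch.
            if (op.OpCode.Name == OpCodeName.Tj.ToString() ||
                op.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (CObject operand in op.Operands)
                    foreach (string s in ExtractText(operand))
                        yield return s;
            }
        }
        else if (seq != null)
        {
            foreach (CObject item in seq)
                foreach (string s in ExtractText(item))
                    yield return s;
        }
        else if (str != null)
        {
            yield return str.Value;
        }
        // Numbers (for example TJ kerning offsets) and names are skipped.
    }

    static void Main(string[] args)
    {
        // "input.pdf" is a placeholder path.
        string path = args.Length > 0 ? args[0] : "input.pdf";
        using (PdfDocument document = PdfReader.Open(path, PdfDocumentOpenMode.ReadOnly))
        {
            foreach (PdfPage page in document.Pages)
                foreach (string s in ExtractText(page))
                    Console.WriteLine(s);
        }
    }
}
```

Because a PDF stores positioned glyph runs rather than paragraphs, turning this output into readable headlines and body text would still require tracking the text matrix and font encodings, which this sketch does not attempt.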

