PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Wed Aug 14, 2019 3:34 pm

Joined: Wed Aug 14, 2019 3:11 pm
Posts: 5
Hi,

I've been using the provided example http://www.pdfsharp.net/wiki/ExportImages-sample.ashx to extract images from each page in a PDF. I've extended it to also support /FlateDecode (for the case where the colour space is RGB), and this is working fine, although I'd love to know how to handle CMYK, since loads of our PDFs use it.
But I have a few PDF documents where some or all of the images are not detected at all: no /XObject /Image items are found when processing the PDF page by page, yet the images are clearly there if you open the file in Adobe Reader. If I open the PDF in Notepad++ I can see the /XObject /Image items, so I know they are present in the PDF.

So I approached the problem differently. I used the document's "Internals" property to call "GetAllObjects()" and read through every object, regardless of which page it is on. Code snippet below:

Code:
        // Get a list of all objects in the document, regardless of page
        PdfObject[] arrPDFObjects = objPDFDocument.Internals.GetAllObjects();
        if (arrPDFObjects != null) {
          Console.WriteLine("Number of objects: " + arrPDFObjects.Length);
          foreach (PdfObject objThisPDFObject in arrPDFObjects) {
            // Skip objects that have no reference
            PdfReference objThisPdfObjectReference = objThisPDFObject.Reference;
            if (objThisPdfObjectReference == null) continue;

            // Only dictionaries (including streams) can be image XObjects
            PdfDictionary xObject = objThisPdfObjectReference.Value as PdfDictionary;
            if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") {
              Console.WriteLine("Image found. Id = " + objThisPdfObjectReference.ObjectID);
              // Export the image (ExportImage and valImageCount are defined elsewhere)
              ExportImage(xObject, ref valImageCount);
            }
          }
        } else {
          Console.WriteLine("No objects");
        }



So while I can extract the images, I don't understand why they aren't found when I process the PDF page by page (using the sample on your Wiki). These images are clearly on the page since you can see them in Adobe Reader (i.e. they aren't orphaned objects).

So I guess my question is:
Is there some other method by which an image can be on a page which isn't detected by the provided example and how should I be detecting these images?


Posted: Thu Aug 15, 2019 8:56 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 2980
Location: Cologne, Germany
bonds007 wrote:
"Is there some other method by which an image can be on a page which isn't detected by the provided example and how should I be detecting these images?"

Each page has a list of resources. The sample exports images listed as resources of that page.

IIRC you can also draw form XObjects on pages, and a form XObject can have its own resources, which may in turn contain images.
With nested objects like these you have to search for images recursively.
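
A minimal sketch of such a recursive search, assuming PDFsharp 1.5 and the same dictionary accessors the wiki sample uses. The class name NestedImageFinder and its methods are hypothetical, not part of PDFsharp. It only follows /XObject entries: images listed under a page's /Resources are collected directly, and entries with /Subtype /Form are searched through their own /Resources. Other places an image might live are not covered here.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

// Hypothetical helper, not part of PDFsharp.
static class NestedImageFinder
{
    // Returns the image XObjects reachable from the page's resources,
    // including those nested inside form XObjects.
    public static List<PdfDictionary> FindImages(PdfPage page)
    {
        var images = new List<PdfDictionary>();
        var visited = new HashSet<string>();
        Collect(page.Elements.GetDictionary("/Resources"), images, visited);
        return images;
    }

    private static void Collect(PdfDictionary resources,
                                List<PdfDictionary> images,
                                HashSet<string> visited)
    {
        if (resources == null) return;

        PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
        if (xObjects == null) return;

        foreach (PdfItem item in xObjects.Elements.Values)
        {
            var reference = item as PdfReference;
            if (reference == null) continue;

            // Guard against shared or cyclic references:
            // each object is inspected once per page.
            if (!visited.Add(reference.ObjectID.ToString())) continue;

            var xObject = reference.Value as PdfDictionary;
            if (xObject == null) continue;

            string subtype = xObject.Elements.GetString("/Subtype");
            if (subtype == "/Image")
            {
                images.Add(xObject);
            }
            else if (subtype == "/Form")
            {
                // A form XObject carries its own /Resources; search them too.
                Collect(xObject.Elements.GetDictionary("/Resources"), images, visited);
            }
        }
    }

    // Example use: report how many images each page reaches.
    public static void Report(string path)
    {
        PdfDocument document = PdfReader.Open(path, PdfDocumentOpenMode.Import);
        int pageNumber = 0;
        foreach (PdfPage page in document.Pages)
        {
            pageNumber++;
            Console.WriteLine("Page " + pageNumber + ": " + FindImages(page).Count + " image(s)");
        }
    }
}
```

Each dictionary returned by FindImages could then be passed to an export routine such as the ExportImage method from the first post. The visited set is kept per page, so an image shared by several pages is reported once for each page that uses it.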

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Thu Aug 15, 2019 6:31 pm

Joined: Wed Aug 14, 2019 3:11 pm
Posts: 5
Thanks for the information. I will try to discover how a PDF viewer "knows" that these images are on page 1 even though the image isn't listed as a resource of page 1.

Simon

