PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

Iterating through a PDF and retrieving image properties
http://forum.pdfsharp.com/viewtopic.php?f=2&t=185

Author:  Vermis [ Mon Jul 02, 2007 5:00 pm ]
Post subject:  Iterating through a PDF and retrieving image properties

The project I'm working on needs to go through a directory of PDFs and spit out information about all of the images in them. The PDFs are all created from scanned images, so the scope is fairly narrow.

I have some proof-of-concept code that works in the limited testing I've done so far (it is far from production code). However, I've only just started working with PDFsharp, and this approach doesn't seem very elegant or robust. Is there a better way that I'm overlooking?

Code:
Dim pDoc As PdfSharp.Pdf.PdfDocument
Dim pDict As PdfSharp.Pdf.PdfDictionary
Dim pRef As PdfSharp.Pdf.Advanced.PdfReference
Dim iPage As Integer = 0
Dim iWidth As String = ""
Dim iHeight As String = ""
Dim iColor As String = ""
Dim iBits As String = ""
Dim iFilter As String = ""
Dim tmp As String = ""

' Open the PDF in read-only mode
pDoc = PdfSharp.Pdf.IO.PdfReader.Open("C:\Test Files\Color PDFs\8902-01-0003.pdf", PdfSharp.Pdf.IO.PdfDocumentOpenMode.ReadOnly)

' Loop through each page, find the image, and report on it
For iPage = 0 To pDoc.Pages.Count - 1
   iWidth = ""
   iHeight = ""
   iColor = ""
   iBits = ""
   iFilter = ""

   ' Does this page have a Resources Element?
   If pDoc.Pages(iPage).Elements.Contains("/Resources") Then
      pDict = pDoc.Pages(iPage).Elements("/Resources")

      ' Does the Resources Element contain an XObject?
      If pDict.Elements.Contains("/XObject") Then
         pDict = pDict.Elements("/XObject")

         ' Does the XObject contain an Im1 image element?
         If pDict.Elements.Contains("/Im1") Then
            pRef = pDict.Elements("/Im1")

            ' Get the dictionary by the reference under Im1
            pDict = pDoc.Internals.GetObject(pRef.ObjectID)

            ' Get image details
            If pDict.Elements.Contains("/Width") Then iWidth = pDict.Elements("/Width").ToString
            If pDict.Elements.Contains("/Height") Then iHeight = pDict.Elements("/Height").ToString
            If pDict.Elements.Contains("/ColorSpace") Then
               iColor = pDict.Elements("/ColorSpace").ToString
               If iColor.Substring(0, 1) = "/" Then iColor = iColor.Substring(1)
            End If
            If pDict.Elements.Contains("/BitsPerComponent") Then
               iBits = pDict.Elements("/BitsPerComponent").ToString
            End If
            If pDict.Elements.Contains("/Filter") Then
               iFilter = pDict.Elements("/Filter").ToString
               If iFilter.Substring(0, 1) = "/" Then iFilter = iFilter.Substring(1)
            End If
         End If

         ' {0} Delim
         ' {1} Filename
         ' {2} Page Number
         ' {3} Page Width (inch)
         ' {4} Page Height (inch)
         ' {5} Page Orientation
         ' {6} Image Dimensions (pixels)
         ' {7} Bits Per Component
         ' {8} Colorspace
         ' {9} Decode Filter
         tmp = String.Format("{1}{0}{2}{0}{3}{0}{4}{0}{5}{0}{6}{0}{7}{0}{8}{0}{9}", _
                                       "|", _
                                       "filename.ext goes here", _
                                       iPage + 1, _
                                       String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Width).Inch), _
                                       String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Height).Inch), _
                                       pDoc.Pages(iPage).Orientation.ToString, _
                                       iWidth & "x" & iHeight, _
                                       iBits, _
                                       iColor, _
                                       iFilter)

         Console.WriteLine(tmp)
      End If
   End If

Next

pDoc.Close()
pDoc = Nothing
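One possibly more robust variant is sketched below. It loosely follows the idea of the Export Images sample that ships with PDFsharp and assumes the PDFsharp 1.x API, where Elements.GetDictionary returns Nothing for a missing key and follows indirect references (so a /Resources entry stored as "5 0 R" no longer breaks the cast). Elements.Keys lets it visit every XObject on the page instead of only /Im1, and the /Subtype check skips form XObjects. The module and the names ImageReport, ReportImages, EntryText and StripSlash are hypothetical, not part of PDFsharp, and CultureInfo.InvariantCulture is an added assumption so the page sizes always print with a '.' decimal separator. It still only looks at XObjects listed directly in each page's resources, so images nested inside form XObjects or drawn as inline images in the content stream are not reported.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Advanced
Imports PdfSharp.Pdf.IO

Module ImageReport
    ' Drops the leading "/" of a PDF name (/DeviceGray -> DeviceGray).
    ' Arrays such as a multi-filter /Filter are left exactly as printed.
    Function StripSlash(text As String) As String
        If text.StartsWith("/") Then Return text.Substring(1)
        Return text
    End Function

    ' Prints a dictionary entry, following an indirect reference first.
    Function EntryText(dict As PdfDictionary, key As String) As String
        Dim item As PdfItem = dict.Elements(key)
        If item Is Nothing Then Return ""
        Dim ref As PdfReference = TryCast(item, PdfReference)
        If ref IsNot Nothing Then item = ref.Value
        Return StripSlash(item.ToString())
    End Function

    Sub ReportImages(pdfPath As String)
        Using doc As PdfDocument = PdfReader.Open(pdfPath, PdfDocumentOpenMode.ReadOnly)
            For pageIndex As Integer = 0 To doc.Pages.Count - 1
                Dim page As PdfPage = doc.Pages(pageIndex)
                ' GetDictionary returns Nothing for a missing key and follows
                ' indirect references, replacing the Contains/GetObject steps.
                Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
                If resources Is Nothing Then Continue For
                Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
                If xObjects Is Nothing Then Continue For

                ' Visit every XObject instead of assuming it is called /Im1.
                For Each key As String In xObjects.Elements.Keys
                    Dim xObject As PdfDictionary = xObjects.Elements.GetDictionary(key)
                    ' Form XObjects live in the same dictionary; keep only images.
                    If xObject Is Nothing OrElse xObject.Elements.GetString("/Subtype") <> "/Image" Then Continue For

                    ' Same field order as the original report line.
                    Console.WriteLine(String.Join("|", New String() {
                        Path.GetFileName(pdfPath),
                        (pageIndex + 1).ToString(),
                        page.Width.Inch.ToString("0.##", CultureInfo.InvariantCulture),
                        page.Height.Inch.ToString("0.##", CultureInfo.InvariantCulture),
                        page.Orientation.ToString(),
                        String.Format("{0}x{1}", xObject.Elements.GetInteger("/Width"), xObject.Elements.GetInteger("/Height")),
                        xObject.Elements.GetInteger("/BitsPerComponent").ToString(),
                        EntryText(xObject, "/ColorSpace"),
                        EntryText(xObject, "/Filter")}))
                Next
            Next
        End Using
    End Sub
End Module
```

For the directory case, looping over Directory.GetFiles(folder, "*.pdf") and calling ImageReport.ReportImages on each file would drive it. Because /ColorSpace and /Filter can be arrays (for example [/ICCBased 5 0 R] or a chain of filters), EntryText prints them verbatim rather than assuming a single name.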


The output looks like this:
Code:
filename.ext goes here|1|11.04|8.49|Portrait|3311x2544|1|DeviceGray|CCITTFaxDecode
filename.ext goes here|2|8.49|11.06|Portrait|2544x3315|1|DeviceGray|CCITTFaxDecode
...
filename.ext goes here|39|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
filename.ext goes here|40|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
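Since each report line carries both the page size in inches (XUnit.FromPoint(...).Inch is simply points divided by 72) and the image size in pixels, the effective scan resolution is pixels divided by inches. A minimal sketch, assuming each scanned image is drawn to fill the whole page and the report uses '.' as its decimal separator; the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Globalization

Module ScanResolution
    ' Effective resolution = image pixels / page inches.
    ' ASSUMPTION: the scanned image is drawn to fill the whole page.
    Function EffectiveDpi(pixels As Integer, inches As Double) As Double
        If inches <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(inches))
        Return pixels / inches
    End Function

    ' Parses one pipe-delimited report line (field order as in the
    ' String.Format call) and returns (horizontal DPI, vertical DPI).
    Function DpiFromReportLine(line As String) As Tuple(Of Double, Double)
        Dim fields() As String = line.Split("|"c)
        Dim pageWidthIn As Double = Double.Parse(fields(2), CultureInfo.InvariantCulture)
        Dim pageHeightIn As Double = Double.Parse(fields(3), CultureInfo.InvariantCulture)
        Dim dims() As String = fields(5).Split("x"c)
        Dim pxWide As Integer = Integer.Parse(dims(0), CultureInfo.InvariantCulture)
        Dim pxHigh As Integer = Integer.Parse(dims(1), CultureInfo.InvariantCulture)
        Return Tuple.Create(EffectiveDpi(pxWide, pageWidthIn), EffectiveDpi(pxHigh, pageHeightIn))
    End Function

    Sub Main()
        Dim dpi = DpiFromReportLine("filename.ext goes here|39|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode")
        Console.WriteLine("{0:0} x {1:0} dpi", dpi.Item1, dpi.Item2) ' prints "200 x 200 dpi"
    End Sub
End Module
```

On the sample output, page 39 works out to about 200 dpi (1696 / 8.49 and 2198 / 11), while page 1 works out to about 300 dpi (3311 / 11.04 and 2544 / 8.49). The page sizes are rounded to two decimals, so the results are approximate.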

All times are UTC