The project I'm working on needs to go through a directory of PDFs and spit out information about all of the images in them. The PDFs will all be created from scanned images, so the scope is fairly narrow.
I have some proof-of-concept code that works in the limited testing I've done so far (it is far from production code). However, I've only just started working with PDFsharp, and this doesn't seem very elegant or robust. Is there a better way that I'm overlooking?
Code:
Dim pDoc As PdfSharp.Pdf.PdfDocument
Dim pDict As PdfSharp.Pdf.PdfDictionary
Dim pRef As PdfSharp.Pdf.Advanced.PdfReference
Dim iPage As Integer = 0
Dim iWidth As String = ""
Dim iHeight As String = ""
Dim iColor As String = ""
Dim iBits As String = ""
Dim iFilter As String = ""
Dim tmp As String = ""

' Open the PDF in read-only mode
pDoc = PdfSharp.Pdf.IO.PdfReader.Open("C:\Test Files\Color PDFs\8902-01-0003.pdf", PdfSharp.Pdf.IO.PdfDocumentOpenMode.ReadOnly)

' Loop through each page, find the image, and report on it
For iPage = 0 To pDoc.Pages.Count - 1
    ' Reset the per-page values so nothing carries over from the previous page
    iWidth = ""
    iHeight = ""
    iColor = ""
    iBits = ""
    iFilter = ""

    ' Does this page have a Resources Element?
    If pDoc.Pages(iPage).Elements.Contains("/Resources") Then
        pDict = pDoc.Pages(iPage).Elements("/Resources")

        ' Does the Resources Element contain an XObject?
        If pDict.Elements.Contains("/XObject") Then
            pDict = pDict.Elements("/XObject")

            ' Does the XObject contain an Im1 image element?
            If pDict.Elements.Contains("/Im1") Then
                pRef = pDict.Elements("/Im1")

                ' Get the dictionary by the reference under Im1
                pDict = pDoc.Internals.GetObject(pRef.ObjectID)

                ' Get image details
                If pDict.Elements.Contains("/Width") Then iWidth = pDict.Elements("/Width").ToString
                If pDict.Elements.Contains("/Height") Then iHeight = pDict.Elements("/Height").ToString

                If pDict.Elements.Contains("/ColorSpace") Then
                    iColor = pDict.Elements("/ColorSpace").ToString
                    ' Strip the leading "/" from the PDF name
                    If iColor.StartsWith("/") Then iColor = iColor.Substring(1)
                End If

                If pDict.Elements.Contains("/BitsPerComponent") Then
                    iBits = pDict.Elements("/BitsPerComponent").ToString
                End If

                If pDict.Elements.Contains("/Filter") Then
                    iFilter = pDict.Elements("/Filter").ToString
                    ' Strip the leading "/" from the PDF name
                    If iFilter.StartsWith("/") Then iFilter = iFilter.Substring(1)
                End If
            End If

            ' {0} Delim
            ' {1} Filename
            ' {2} Page Number
            ' {3} Page Width (inch)
            ' {4} Page Height (inch)
            ' {5} Page Orientation
            ' {6} Image Dimensions (pixels)
            ' {7} Bits Per Component
            ' {8} Colorspace
            ' {9} Decode Filter
            tmp = String.Format("{1}{0}{2}{0}{3}{0}{4}{0}{5}{0}{6}{0}{7}{0}{8}{0}{9}", _
                "|", _
                "filename.ext goes here", _
                iPage + 1, _
                String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Width).Inch), _
                String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Height).Inch), _
                pDoc.Pages(iPage).Orientation.ToString, _
                iWidth & "x" & iHeight, _
                iBits, _
                iColor, _
                iFilter)

            Console.WriteLine(tmp)
        End If
    End If
Next

pDoc.Close()
pDoc = Nothing
The output looks like this:
Code:
filename.ext goes here|1|11.04|8.49|Portrait|3311x2544|1|DeviceGray|CCITTFaxDecode
filename.ext goes here|2|8.49|11.06|Portrait|2544x3315|1|DeviceGray|CCITTFaxDecode
...
filename.ext goes here|39|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
filename.ext goes here|40|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
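
One weak spot in the code above is that it assumes every page's image is named /Im1. Below is an untested sketch of a variation that walks every entry under /XObject and keeps those whose /Subtype is /Image. The module name ImageReportSketch and the helpers NameOrEmpty and ReportImages are made up for illustration. The calls to GetDictionary, GetString, GetInteger and PdfReference.Value follow the pattern used in PDFsharp's own "Export Images" sample, but I have not checked them against every PDFsharp version, so treat this as a starting point rather than a definitive answer.

```vbnet
Imports System
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Advanced
Imports PdfSharp.Pdf.IO

Module ImageReportSketch

    ' Hypothetical helper (not part of PDFsharp): returns the entry's
    ' ToString() text with any leading "/" removed, or "" if the key is missing.
    Function NameOrEmpty(ByVal dict As PdfDictionary, ByVal key As String) As String
        Dim item As PdfItem = dict.Elements(key)
        If item Is Nothing Then Return ""
        Return item.ToString().TrimStart("/"c)
    End Function

    ' Hypothetical entry point: prints one pipe-delimited line per image XObject.
    Sub ReportImages(ByVal path As String)
        Dim doc As PdfDocument = PdfReader.Open(path, PdfDocumentOpenMode.ReadOnly)

        For pageIndex As Integer = 0 To doc.Pages.Count - 1
            ' GetDictionary returns Nothing when the key is absent
            Dim resources As PdfDictionary = doc.Pages(pageIndex).Elements.GetDictionary("/Resources")
            If resources Is Nothing Then Continue For

            Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
            If xObjects Is Nothing Then Continue For

            ' Visit every XObject instead of assuming a fixed name such as /Im1
            For Each item As PdfItem In xObjects.Elements.Values
                Dim reference As PdfReference = TryCast(item, PdfReference)
                If reference Is Nothing Then Continue For

                Dim image As PdfDictionary = TryCast(reference.Value, PdfDictionary)
                If image Is Nothing Then Continue For

                ' Skip form XObjects and anything else that is not an image
                If image.Elements.GetString("/Subtype") <> "/Image" Then Continue For

                ' Page number | dimensions | bits per component | colorspace | filter
                Console.WriteLine(String.Format("{0}|{1}x{2}|{3}|{4}|{5}", _
                    pageIndex + 1, _
                    image.Elements.GetInteger("/Width"), _
                    image.Elements.GetInteger("/Height"), _
                    image.Elements.GetInteger("/BitsPerComponent"), _
                    NameOrEmpty(image, "/ColorSpace"), _
                    NameOrEmpty(image, "/Filter")))
            Next
        Next

        doc.Close()
    End Sub

End Module
```

Calling ReportImages with the same test file path as above should give one line per image rather than one line per page, so a page holding several scanned strips would show up as several rows. Note that /ColorSpace and /Filter can also be arrays or indirect references, in which case NameOrEmpty just reports whatever ToString() returns for them.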