PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/

Iterating through a PDF and retrieving image properties http://forum.pdfsharp.com/viewtopic.php?f=2&t=185
Author: Vermis [Mon Jul 02, 2007 5:00 pm]
Post subject: Iterating through a PDF and retrieving image properties
The project I'm working on needs to go through a directory of PDFs and spit out information about every image in them. The PDFs will all be created from scanned images, so the scope is fairly narrow. I have some proof-of-concept code (far from production quality) that works in the limited testing I've done so far. However, I've only just started working with PDFsharp, and this doesn't seem very elegant or robust. Is there a better way that I'm overlooking?

```vb
Imports PdfSharp.Drawing
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Advanced
Imports PdfSharp.Pdf.IO

Module ImageReport
    Sub Main()
        ' Open the PDF in read-only mode
        Dim pDoc As PdfDocument = PdfReader.Open("C:\Test Files\Color PDFs\8902-01-0003.pdf", PdfDocumentOpenMode.ReadOnly)

        ' Loop through each page, find the image, and report on it
        For iPage As Integer = 0 To pDoc.Pages.Count - 1
            Dim iWidth As String = ""
            Dim iHeight As String = ""
            Dim iColor As String = ""
            Dim iBits As String = ""
            Dim iFilter As String = ""

            ' Does this page have a Resources element?
            ' GetDictionary returns Nothing for a missing key and follows indirect
            ' references, so a /Resources entry stored as "n 0 R" no longer breaks the cast.
            Dim pDict As PdfDictionary = pDoc.Pages(iPage).Elements.GetDictionary("/Resources")
            If pDict Is Nothing Then Continue For

            ' Does the Resources element contain an XObject?
            pDict = pDict.Elements.GetDictionary("/XObject")
            If pDict Is Nothing Then Continue For

            ' Does the XObject contain an Im1 image element?
            Dim pRef As PdfReference = TryCast(pDict.Elements("/Im1"), PdfReference)
            If pRef Is Nothing Then Continue For

            ' Get the dictionary by the reference under Im1
            pDict = TryCast(pDoc.Internals.GetObject(pRef.ObjectID), PdfDictionary)
            If pDict Is Nothing Then Continue For

            ' Get image details; PDF names such as /DeviceGray keep their leading slash
            If pDict.Elements.Contains("/Width") Then iWidth = pDict.Elements("/Width").ToString()
            If pDict.Elements.Contains("/Height") Then iHeight = pDict.Elements("/Height").ToString()
            If pDict.Elements.Contains("/ColorSpace") Then iColor = pDict.Elements("/ColorSpace").ToString().TrimStart("/"c)
            If pDict.Elements.Contains("/BitsPerComponent") Then iBits = pDict.Elements("/BitsPerComponent").ToString()
            If pDict.Elements.Contains("/Filter") Then iFilter = pDict.Elements("/Filter").ToString().TrimStart("/"c)

            ' {0} Delim
            ' {1} Filename
            ' {2} Page Number
            ' {3} Page Width (inch)
            ' {4} Page Height (inch)
            ' {5} Page Orientation
            ' {6} Image Dimensions (pixels)
            ' {7} Bits Per Component
            ' {8} Colorspace
            ' {9} Decode Filter
            Dim tmp As String = String.Format("{1}{0}{2}{0}{3}{0}{4}{0}{5}{0}{6}{0}{7}{0}{8}{0}{9}", _
                "|", _
                "filename.ext goes here", _
                iPage + 1, _
                String.Format("{0:0.##}", XUnit.FromPoint(pDoc.Pages(iPage).Width).Inch), _
                String.Format("{0:0.##}", XUnit.FromPoint(pDoc.Pages(iPage).Height).Inch), _
                pDoc.Pages(iPage).Orientation.ToString(), _
                iWidth & "x" & iHeight, _
                iBits, _
                iColor, _
                iFilter)
            Console.WriteLine(tmp)
        Next

        pDoc.Close()
    End Sub
End Module
```

The output looks like this:

```
filename.ext goes here|1|11.04|8.49|Portrait|3311x2544|1|DeviceGray|CCITTFaxDecode
filename.ext goes here|2|8.49|11.06|Portrait|2544x3315|1|DeviceGray|CCITTFaxDecode
...
filename.ext goes here|39|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
filename.ext goes here|40|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
```
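A possible generalization, sketched in the same VB.NET style as the loop above: instead of assuming each page's image is named /Im1, it visits every entry in the page's /XObject dictionary, keeps only those whose /Subtype is /Image, and walks every *.pdf file in a folder so the real file name can fill the first column. It assumes a PdfSharp version that provides `DictionaryElements.GetDictionary`, `GetString`, `GetInteger`, the `Elements.Values` collection and `PdfReference.Value` (PdfSharp's own image-export sample relies on these); the folder path, the module name `ImageDirectoryReport` and the `StripName` helper are hypothetical.

```vb
Imports System
Imports System.IO
Imports PdfSharp.Drawing
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Advanced
Imports PdfSharp.Pdf.IO

Module ImageDirectoryReport
    ' Hypothetical helper: drops the leading slash that PDF name objects keep
    ' when converted to text (e.g. "/DCTDecode" becomes "DCTDecode").
    Function StripName(value As String) As String
        If value Is Nothing Then Return ""
        Return value.TrimStart("/"c)
    End Function

    Sub Main()
        ' Hypothetical folder; point this at the real directory of scanned PDFs.
        Dim folder As String = "C:\Test Files\Color PDFs"

        For Each pdfPath As String In Directory.GetFiles(folder, "*.pdf")
            Dim pDoc As PdfDocument = PdfReader.Open(pdfPath, PdfDocumentOpenMode.ReadOnly)

            For iPage As Integer = 0 To pDoc.Pages.Count - 1
                Dim page As PdfPage = pDoc.Pages(iPage)

                ' GetDictionary returns Nothing for a missing key and follows indirect references.
                Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
                If resources Is Nothing Then Continue For
                Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
                If xObjects Is Nothing Then Continue For

                ' Visit every XObject on the page instead of assuming the name /Im1.
                For Each item As PdfItem In xObjects.Elements.Values
                    Dim imgRef As PdfReference = TryCast(item, PdfReference)
                    If imgRef Is Nothing Then Continue For
                    Dim image As PdfDictionary = TryCast(imgRef.Value, PdfDictionary)

                    ' Form XObjects share this dictionary; keep only /Subtype /Image.
                    If image Is Nothing OrElse image.Elements.GetString("/Subtype") <> "/Image" Then Continue For

                    ' /ColorSpace and /Filter may be arrays; ToString then prints the whole array.
                    Dim colorSpace As PdfItem = image.Elements("/ColorSpace")
                    Dim filter As PdfItem = image.Elements("/Filter")

                    Dim record As String = String.Format("{1}{0}{2}{0}{3}{0}{4}{0}{5}{0}{6}{0}{7}{0}{8}{0}{9}", _
                        "|", _
                        Path.GetFileName(pdfPath), _
                        iPage + 1, _
                        String.Format("{0:0.##}", XUnit.FromPoint(page.Width).Inch), _
                        String.Format("{0:0.##}", XUnit.FromPoint(page.Height).Inch), _
                        page.Orientation.ToString(), _
                        image.Elements.GetInteger("/Width").ToString() & "x" & image.Elements.GetInteger("/Height").ToString(), _
                        image.Elements.GetInteger("/BitsPerComponent"), _
                        StripName(If(colorSpace Is Nothing, "", colorSpace.ToString())), _
                        StripName(If(filter Is Nothing, "", filter.ToString())))
                    Console.WriteLine(record)
                Next
            Next

            pDoc.Close()
        Next
    End Sub
End Module
```

Two limits of this approach: an image drawn inside a Form XObject may sit in that form's own /Resources dictionary rather than the page's, and inline images (BI ... ID ... EI inside the content stream) are not XObjects at all, so neither kind is reported. For PDFs built from scanned pages, where each page typically carries a single image XObject, this is usually not a problem.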
All times are UTC