PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/ |
Elements of PDFs Using Powershell http://forum.pdfsharp.com/viewtopic.php?f=2&t=4317 |
Author: | PDFsFun546 [ Sat Jan 22, 2022 4:19 am ] |
Post subject: | Elements of PDFs Using Powershell |
I have a lot of PDFs and I'm looking for a way to identify what items are in them, such as the number of text blocks, forms, images, etc. We only need a list of the elements so we can determine which PDFs to look at further. Is there a method to determine the actual elements in PDFs using PDFsharp? I'm using PowerShell to scan the documents. These are the methods we were trying but they do not appear to give much detail.
Code:
$input.Internals.Catalog.Elements
$input.Contents.Elements
$input.Info.Elements |
Author: | TH-Soft [ Mon Jan 24, 2022 8:38 am ] |
Post subject: | Re: Elements of PDFs Using Powershell |
PDFsFun546 wrote: These are the methods we were trying but they do not appear to give much detail.
Not many details in your question. On SO you mention Sitecore, but do not provide other details there either. https://stackoverflow.com/q/70790714/162529
Where does "$input" come from? |
Author: | PDFsFun546 [ Mon Feb 21, 2022 10:26 pm ] |
Post subject: | Re: Elements of PDFs Using Powershell |
We tried writing out the elements listed in the post above and did not see a way to get the actual elements. $input comes from the following code.
$input = [PdfSharp.Pdf.IO.PdfReader]::Open($stream, [PdfSharp.Pdf.IO.PdfDocumentOpenMode]::ReadOnly) |
Author: | TH-Soft [ Tue Feb 22, 2022 8:57 am ] |
Post subject: | Re: Elements of PDFs Using Powershell |
There are no text block objects. Text is part of the page contents. Here is a C# sample that searches for images and exports JPEG images: http://pdfsharp.net/wiki/ExportImages-sample.ashx There are samples for extracting text on this forum and elsewhere on the Internet. I hope this helps to get you started. |
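For readers who want a starting point in PowerShell, the following is a minimal, untested sketch that follows the resource walk used by the linked C# ExportImages sample and only counts what it finds. Several things here are assumptions rather than part of the thread: the DLL path is hypothetical, `Get-PdfElementSummary` is a made-up helper name, the file is opened by path instead of the `$stream` used above, and the variable is called `$doc` because `$input` is an automatic variable in PowerShell. The image count covers only image XObjects referenced directly from each page's resources (an image shared by several pages is counted once per page), and the form count covers only top-level AcroForm fields. Text is not counted, since it is part of the page contents rather than a separate object.

```powershell
# Assumption: PdfSharp.dll (PDFsharp 1.5x) is stored at this hypothetical path.
Add-Type -Path 'C:\libs\PdfSharp.dll'

# Hypothetical helper: returns page, image, and top-level form field counts for one PDF.
function Get-PdfElementSummary {
    param([Parameter(Mandatory = $true)][string]$Path)

    # .NET does not follow the PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $doc = [PdfSharp.Pdf.IO.PdfReader]::Open(
        $fullPath, [PdfSharp.Pdf.IO.PdfDocumentOpenMode]::ReadOnly)
    try {
        $imageCount = 0
        for ($i = 0; $i -lt $doc.PageCount; $i++) {
            $page = $doc.Pages[$i]

            # Images are listed under /Resources -> /XObject of each page.
            $resources = $page.Elements.GetDictionary('/Resources')
            if ($null -eq $resources) { continue }
            $xObjects = $resources.Elements.GetDictionary('/XObject')
            if ($null -eq $xObjects) { continue }

            foreach ($item in $xObjects.Elements.Values) {
                # XObject entries are normally indirect references.
                $reference = $item -as [PdfSharp.Pdf.Advanced.PdfReference]
                if ($null -eq $reference) { continue }
                $xObject = $reference.Value -as [PdfSharp.Pdf.PdfDictionary]
                if ($null -ne $xObject -and
                    $xObject.Elements.GetString('/Subtype') -eq '/Image') {
                    $imageCount++
                }
            }
        }

        # AcroForm is null when the document has no interactive form.
        $fieldCount = 0
        $acroForm = $doc.AcroForm
        if ($null -ne $acroForm) { $fieldCount = $acroForm.Fields.Count }

        [pscustomobject]@{
            Path       = $fullPath
            Pages      = $doc.PageCount
            Images     = $imageCount
            FormFields = $fieldCount
        }
    }
    finally {
        $doc.Dispose()
    }
}
```

A possible way to scan a folder with it, again only as a sketch: `Get-ChildItem -Filter *.pdf | ForEach-Object { Get-PdfElementSummary -Path $_.FullName }`. Returning one object per file keeps the result easy to sort or export, for example with `Export-Csv`.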
All times are UTC |