PDFsharp & MigraDoc Foundation • View topic - Elements of PDFs Using Powershell

All times are UTC

Elements of PDFs Using Powershell

Moderator: Stefan Lange

PDFsFun546

Post subject: Elements of PDFs Using Powershell

Posted: Sat Jan 22, 2022 4:19 am

Joined: Fri Jan 21, 2022 4:41 am
Posts: 2

I have a lot of PDFs and I'm looking for a way to identify what items they contain, such as the number of text blocks, forms, images, etc. We only need a list of the elements so we can decide which PDFs to look at further. Is there a way to determine the actual elements in a PDF using PDFsharp? I'm using PowerShell to scan the documents. These are the methods we were trying but they do not appear to give much detail.

Code:

$input.Internals.Catalog.Elements
$input.Contents.Elements
$input.Info.Elements

TH-Soft

Post subject: Re: Elements of PDFs Using Powershell

Posted: Mon Jan 24, 2022 8:38 am

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 916
Location: CCAA

PDFsFun546 wrote:

These are the methods we were trying but they do not appear to give much detail.

Not many details in your question.
On SO you mention Sitecore, but do not provide other details there either.
https://stackoverflow.com/q/70790714/162529

Where does "$input" come from?

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)

PDFsFun546

Post subject: Re: Elements of PDFs Using Powershell

Posted: Mon Feb 21, 2022 10:26 pm

Joined: Fri Jan 21, 2022 4:41 am
Posts: 2

We tried writing out the elements shown in the post above and did not see a way to get at the actual elements.

$input comes from the following code.

$input = [PdfSharp.Pdf.IO.PdfReader]::Open($stream, [PdfSharp.Pdf.IO.PdfDocumentOpenMode]::ReadOnly)

TH-Soft

Post subject: Re: Elements of PDFs Using Powershell

Posted: Tue Feb 22, 2022 8:57 am

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 916
Location: CCAA

There are no text block objects. Text is part of the page contents.

Here is a C# sample that searches for images and exports JPEG images:
http://pdfsharp.net/wiki/ExportImages-sample.ashx
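For a PowerShell starting point, the page walk used by that C# sample can be adapted to count items instead of exporting them. The following is only a sketch under assumptions: the DLL path and the function name `Get-PdfXObjectSummary` are hypothetical, it looks only at XObjects referenced directly from each page's resources (images nested inside form XObjects are not counted), and it reports only the top-level `/Fields` entries of the AcroForm dictionary. Note that `$input` is an automatic variable in PowerShell, so the sketch uses `$doc` instead.

```powershell
# Hypothetical path: adjust to wherever PdfSharp.dll lives on your machine.
Add-Type -Path 'C:\libs\PdfSharp.dll'

function Get-PdfXObjectSummary {
    param([Parameter(Mandatory)][string]$Path)

    $doc = [PdfSharp.Pdf.IO.PdfReader]::Open(
        $Path, [PdfSharp.Pdf.IO.PdfDocumentOpenMode]::ReadOnly)
    try {
        $images = 0
        $formXObjects = 0

        foreach ($page in $doc.Pages) {
            # Same lookup chain as the ExportImages sample: /Resources, then /XObject.
            $resources = $page.Elements.GetDictionary('/Resources')
            if ($null -eq $resources) { continue }
            $xObjects = $resources.Elements.GetDictionary('/XObject')
            if ($null -eq $xObjects) { continue }

            foreach ($item in $xObjects.Elements.Values) {
                if ($item -isnot [PdfSharp.Pdf.Advanced.PdfReference]) { continue }
                $xObject = $item.Value -as [PdfSharp.Pdf.PdfDictionary]
                if ($null -eq $xObject) { continue }

                switch -CaseSensitive ($xObject.Elements.GetString('/Subtype')) {
                    '/Image' { $images++ }
                    '/Form'  { $formXObjects++ }
                }
            }
        }

        # Interactive form fields live under the catalog's /AcroForm dictionary.
        # Only the top-level /Fields array is counted here; child fields are ignored.
        $fieldCount = 0
        $acroForm = $doc.Internals.Catalog.Elements.GetDictionary('/AcroForm')
        if ($null -ne $acroForm) {
            $fields = $acroForm.Elements.GetArray('/Fields')
            if ($null -ne $fields) { $fieldCount = $fields.Elements.Count }
        }

        [pscustomobject]@{
            Path              = $Path
            Pages             = $doc.PageCount
            Images            = $images
            FormXObjects      = $formXObjects
            TopLevelFormFields = $fieldCount
        }
    }
    finally {
        $doc.Dispose()
    }
}
```

Something like `Get-ChildItem *.pdf | ForEach-Object { Get-PdfXObjectSummary -Path $_.FullName }` would then give one summary row per file. An image that is shared by several pages is counted once per page in this sketch.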

There are samples for extracting text on this forum and elsewhere on the Internet.
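Because text lives inside the page content stream, the closest thing to a "text block count" is a count of text-showing operators. The sketch below follows the approach commonly used in those text extraction samples: parse each page with `ContentReader.ReadContent` and look for the `Tj` and `TJ` operators. The DLL path and the function name `Get-PdfTextOperatorCount` are hypothetical, and the sketch assumes the parsed content is a flat sequence of operators. It has not been tested against your files.

```powershell
# Hypothetical path: adjust to wherever PdfSharp.dll lives on your machine.
Add-Type -Path 'C:\libs\PdfSharp.dll'

function Get-PdfTextOperatorCount {
    param([Parameter(Mandatory)][string]$Path)

    $doc = [PdfSharp.Pdf.IO.PdfReader]::Open(
        $Path, [PdfSharp.Pdf.IO.PdfDocumentOpenMode]::ReadOnly)
    try {
        $pageNumber = 0
        foreach ($page in $doc.Pages) {
            $pageNumber++

            # Parse the page's content stream into a sequence of content objects.
            $content = [PdfSharp.Pdf.Content.ContentReader]::ReadContent($page)

            $textOps = 0
            foreach ($obj in $content) {
                # Tj and TJ are the PDF operators that show text strings.
                if ($obj -is [PdfSharp.Pdf.Content.Objects.COperator] -and
                    $obj.OpCode.Name -cin 'Tj', 'TJ') {
                    $textOps++
                }
            }

            [pscustomobject]@{
                Path          = $Path
                Page          = $pageNumber
                TextOperators = $textOps
            }
        }
    }
    finally {
        $doc.Dispose()
    }
}
```

The count is only a rough indicator: one visible paragraph may be drawn with many operators, and text inside form XObjects is not part of the page's own content stream. A page with a count of zero and at least one image is a reasonable candidate for a scanned page.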

I hope this helps to get you started.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)
