PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Author: DotnetMe
Posted: Wed May 19, 2021 1:39 pm

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
My app produces a really large all-text PDF and it seems like I should be able to get it much smaller.

If I take my example PDF of 176MB, 7zip will compress the file down to 4MB.

Most of the PDF's size is in the content streams. I have tested optimization using Adobe and many online services, but they reduce the size only very slightly. I also tested printing to a new PDF, and that only increased the size.

The caveat is that the PDF is produced by another application and I have little control over how the PDF is produced.

Any suggestions on how I might greatly reduce the size?



Author: TH-Soft (PDFsharp Expert)
Posted: Wed May 19, 2021 2:07 pm

Joined: Sat Mar 14, 2015 10:15 am
Posts: 905
Location: CCAA
Hi!
DotnetMe wrote:
Any suggestions on how I might greatly reduce the size?
I don't think opening and saving with PDFsharp will create a smaller file than Adobe Reader.

Not much can be said for sure without seeing the PDF.

Creating the PDF right from the start with PDFsharp would give you better control over the size.

Content streams can be "zipped" inside the PDF, but each stream is compressed on its own. Maybe the streams contain redundant information, allowing 7zip to gain a lot by compressing across streams.
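
To illustrate the point about redundancy across streams, here is a minimal sketch that uses only the .NET base class library, not PDFsharp. It builds made-up "pages" that share half of their text, then compares Deflate-compressing each page separately (a Flate filter in a PDF works on one stream at a time) with compressing all pages in one pass (closer to what an archiver does). The page contents, sizes, and all class and method names are invented for this illustration, and the pages are kept small so that repeats fall inside Deflate's 32 KB window.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamCompressionDemo
{
    // Deflate-compress a buffer and return the compressed size in bytes.
    public static long DeflatedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Build hypothetical pages: one shared block plus page-specific random text.
    public static byte[][] BuildPages(int pageCount)
    {
        var random = new Random(12345);
        string shared = RandomText(random, 2000);
        var pages = new byte[pageCount][];
        for (int i = 0; i < pageCount; i++)
            pages[i] = Encoding.ASCII.GetBytes(shared + RandomText(random, 2000));
        return pages;
    }

    private static string RandomText(Random random, int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append((char)('a' + random.Next(26)));
        return sb.ToString();
    }

    // Total size when every page is compressed on its own.
    public static long SeparateTotal(byte[][] pages)
    {
        long total = 0;
        foreach (byte[] page in pages)
            total += DeflatedSize(page);
        return total;
    }

    // Size when all pages are concatenated and compressed in one pass.
    public static long CombinedTotal(byte[][] pages)
    {
        using (var all = new MemoryStream())
        {
            foreach (byte[] page in pages)
                all.Write(page, 0, page.Length);
            return DeflatedSize(all.ToArray());
        }
    }

    public static void Main()
    {
        byte[][] pages = BuildPages(50);
        Console.WriteLine("Compressed separately: " + SeparateTotal(pages) + " bytes");
        Console.WriteLine("Compressed together:   " + CombinedTotal(pages) + " bytes");
    }
}
```

With pages built this way, the combined figure comes out smaller because the shared block is stored once and then referenced, while the separate figure pays for it on every page.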

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Author: DotnetMe
Posted: Wed May 19, 2021 2:23 pm

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
TH-Soft wrote:
Hi!
DotnetMe wrote:
Any suggestions on how I might greatly reduce the size?
Content streams can be "zipped" inside the PDF. Maybe the streams contain redundant information, allowing 7zip to gain a lot by compressing across streams.

So currently the Content Stream might not be compressed at all. Can PDFsharp replace a content Stream of an existing PDF?

Is there an example of compressing a Content Stream? Maybe I can then piece together replacing the existing stream with a compressed one.
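
One possible approach, as a rough sketch only: open the existing file for modification, walk the content streams of each page, and Flate-encode any stream that has no /Filter entry yet. This assumes PDFsharp 1.5x, where `PdfDictionary.PdfStream.Zip()` is expected to be available, and it has not been run against the file discussed here. The class name, method name, and file paths are placeholders.

```csharp
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class ContentStreamZipper
{
    // Flate-encodes page content streams that have no filter yet.
    // Returns the number of streams that were newly compressed.
    public static int ZipContentStreams(string inputPath, string outputPath)
    {
        int zipped = 0;
        using (PdfDocument document = PdfReader.Open(inputPath, PdfDocumentOpenMode.Modify))
        {
            document.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;

            for (int p = 0; p < document.PageCount; p++)
            {
                PdfPage page = document.Pages[p];
                for (int i = 0; i < page.Contents.Elements.Count; i++)
                {
                    PdfDictionary content = page.Contents.Elements.GetDictionary(i);
                    if (content == null || content.Stream == null)
                        continue;

                    // Streams that already carry a filter are left untouched.
                    if (content.Elements.ContainsKey("/Filter"))
                        continue;

                    content.Stream.Zip();
                    zipped++;
                }
            }
            document.Save(outputPath);
        }
        return zipped;
    }
}
```

A return value of zero would mean every content stream already had a filter, in which case this step cannot make the file smaller. PDFsharp may also compress unfiltered content streams on save by itself when `CompressContentStreams` is enabled; the explicit call just makes the step visible.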

The PDFs are created using Crystal Reports... and are highly inefficient garbage.

I really appreciate the help.


Author: DotnetMe
Posted: Wed May 19, 2021 4:13 pm

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
I am not yet sure how I can compress the Content Stream.

In my test I am creating a new PDF with compression options set, and then adding each page of the source PDF to the new PDF.

The compression settings seem to have no effect. The PDF files are the same size.

Code:
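// New document that will receive the pages of the source PDF
PdfDocument outputDocument = new PdfDocument();
// Make sure compression is not switched off globally
outputDocument.Options.NoCompression = false;
// Flate-encode page content streams, favoring size over speed
outputDocument.Options.CompressContentStreams = true;
outputDocument.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;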
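
A fuller version of this test might look like the sketch below. It follows the usual PDFsharp pattern of opening the source in import mode and adding its pages to a new document. The class name, method name, `compress` parameter, and file paths are placeholders introduced for this example, not part of the original test.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PageCopyTest
{
    // Copies every page of inputPath into a new document saved at outputPath
    // and returns the size of the saved file in bytes.
    public static long CopyPages(string inputPath, string outputPath, bool compress)
    {
        using (PdfDocument input = PdfReader.Open(inputPath, PdfDocumentOpenMode.Import))
        using (PdfDocument output = new PdfDocument())
        {
            output.Options.NoCompression = !compress;
            output.Options.CompressContentStreams = compress;
            output.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;

            // Import the pages one by one into the new document.
            for (int i = 0; i < input.PageCount; i++)
                output.AddPage(input.Pages[i]);

            output.Save(outputPath);
        }
        return new FileInfo(outputPath).Length;
    }
}
```

Calling it once with `compress` set to false and once with true, then comparing the two returned sizes, is a quick way to check whether the options have any effect on a given file.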


UPDATE:

My testing method definitely works. I performed the same steps but with compression turned off for the new PDF, and the new PDF was double the size. I then took the uncompressed PDF and ran it through again with compression on. The resulting PDF was the same size as my original compressed PDF.

So I guess it comes down to the compression method not being that great for PDFs?

Maybe Flate is the best it can currently do. I compared the PDFs in a binary editor and I can see the portions being compressed. Still, there seems to be a lot of room for Adobe to improve how PDFs are compressed.

Oh well... I guess further file size reduction is beyond what is available in the PDF spec.


Author: rsoeung
Posted: Sun Jun 06, 2021 4:45 am

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18
Could you provide an example of the PDF, if that's OK?


Author: DotnetMe
Posted: Sun Jun 06, 2021 5:33 am

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
rsoeung wrote:
Could you provide an example of the PDF, if that's OK?


I'm not permitted to, but it is entirely text without any images. However, 50% of the text on each page is repeated. Think legal document.


Author: rsoeung
Posted: Sun Jun 06, 2021 6:10 pm

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18
That's fine. Almost everything in the PDF can be compressed; however, PDFsharp currently doesn't offer that capability. I'm in the middle of writing an extension to PDFsharp to compress more than content streams. I'll post it in this thread once it is complete.


Author: DotnetMe
Posted: Mon Jun 07, 2021 3:56 pm

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
rsoeung wrote:
That's fine. Almost everything in the PDF can be compressed; however, PDFsharp currently doesn't offer that capability. I'm in the middle of writing an extension to PDFsharp to compress more than content streams. I'll post it in this thread once it is complete.


Excellent... thanks!


Author: DotnetMe
Posted: Fri Oct 21, 2022 12:13 pm

Joined: Wed May 19, 2021 1:18 pm
Posts: 6
Just checking whether there has been any progress on adding the compression.

