PDFsharp & MigraDoc Foundation • PDF size increases after merging


Board index » PDFsharp & MigraDoc » Support

All times are UTC


PDF size increases after merging

Moderator: Stefan Lange


[ 4 posts ]


Cwswpl

Post subject: PDF size increases after merging

Posted: Tue Mar 19, 2019 2:47 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3

Greetings!

I am using PDFsharp to create four single-page reports, which it does very nicely. These reports include no images, only text, lines and rectangles, and each is approximately 40 to 50 kB.

Now I am trying to get PDFsharp to merge these together into a single document of four pages. I have used your samples as a guide, and it is working well.

However, instead of the expected file size of approximately 200 kB, the result is much bigger, at around 360 kB. For comparison, if I do the same thing manually with a third-party tool such as PDFFill PDF Tools, I get the lower file size I expected.

I don't like waste in long-term storage files, so I am very puzzled by this and would like to resolve it. One of your articles suggests using:

docOutput.Options.CompressContentStreams = true;

but this makes no difference; perhaps it is only intended for images.

Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?


TH-Soft

Post subject: Re: PDF size increases after merging

Posted: Tue Mar 19, 2019 8:22 am

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 920
Location: CCAA

Hi!

Cwswpl wrote:

Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?

We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.

The best you can do: Write a program that creates all documents in a single run - the four-pager and four single-pagers.
The second-best approach: Create the four-pager and then split it in a second run.

I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Cwswpl

Post subject: Re: PDF size increases after merging

Posted: Tue Mar 19, 2019 9:21 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3

Hi Thomas, thanks for the comments.

TH-Soft wrote:

We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.

Thanks, an educated guess is all I'm hoping for.

I can't post the documents because they are confidential. I also don't think I can generate all pages in the same process, because they originate in different systems, although all of them use PDFsharp.

My code is probably not that helpful either, since I've wrapped it inside helper objects, but the process is basically what you might expect:

1. Source pages are created by drawing into XGraphics connected to a PdfPage

2. Each source document is saved to binary via a MemoryStream using PdfDocument.Save()

3. The documents are reloaded in the merging process with PdfReader.Open(strm, PdfDocumentOpenMode.Import);

4. For each document, the source pages are transferred to the merged document with docMerge.AddPage()

5. The merged document is saved to binary via a MemoryStream using PdfDocument.Save()
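The five steps above might look roughly like this in code. This is a sketch, not the poster's actual helper code: it assumes a project that references the PDFsharp package (1.50 or later) and C# 9 or later for top-level statements, and `Merge` is a hypothetical name. Each source is reloaded from its own `MemoryStream`, and the merged document is saved into a new, empty one, so no stream is reused between steps.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper: merges already-saved PDFs (as byte arrays) into one document.
byte[] Merge(IEnumerable<byte[]> sourcePdfs)
{
    var docMerge = new PdfDocument();
    docMerge.Options.CompressContentStreams = true; // optional: Flate-compress page content streams

    foreach (byte[] source in sourcePdfs)
    {
        // Step 3: reload each source document for import from a fresh stream.
        var input = new MemoryStream(source);
        PdfDocument docSource = PdfReader.Open(input, PdfDocumentOpenMode.Import);

        // Step 4: copy every page into the merged document.
        for (int i = 0; i < docSource.PageCount; i++)
            docMerge.AddPage(docSource.Pages[i]);
    }

    // Step 5: save into a new, empty stream, so nothing from earlier steps is left behind.
    using var output = new MemoryStream();
    docMerge.Save(output);
    return output.ToArray();
}

// Usage: four blank single-page documents stand in for the real reports (steps 1 and 2).
var singles = new List<byte[]>();
for (int n = 0; n < 4; n++)
{
    var single = new PdfDocument();
    single.AddPage();              // empty page, so no fonts are needed
    using var ms = new MemoryStream();
    single.Save(ms);
    singles.Add(ms.ToArray());
}

byte[] merged = Merge(singles);
Console.WriteLine($"merged size: {merged.Length} bytes");
```

Creating a fresh stream for the final save costs almost nothing and rules out leftover bytes from earlier reads or writes.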

Quote:

I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.

I was wondering about that, but surely the organisation is already inefficient with single documents, as they must all carry their own copy of the fonts. It's hard to understand why bringing them together should make it any worse. If anything, it might help: if some of the fonts are the same, the duplicates could be dropped.

Is there some analyser tool for PDF documents which identifies the content elements embedded in them, and how big they are? This might point me in the right direction.

I've tried just viewing the content in a text editor to see if I can spot any difference in the patterns, but the PDF format is too messy for the human eye to make much progress this way.


Cwswpl

Post subject: Re: PDF size increases after merging

Posted: Wed Mar 20, 2019 3:39 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3

Actually I solved this, and it wasn't really connected to PDFsharp.

I was reusing the same MemoryStream for the final output of the merged document that I had already used for loading the existing single pages.

It seems the previous input buffers were preserved in the stream and had accumulated as surplus baggage at the end.

So now I am clearing the MemoryStream before the final write, using ms.SetLength(0). This produces an output file of 210 kB as expected, and it all makes sense now.
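The effect can be reproduced without PDFsharp. In this minimal sketch (assuming C# 9 or later for top-level statements), plain byte arrays stand in for the PDF files and `SaveInto` is a hypothetical helper. It shows one way the leftover data can arise: rewinding a reused `MemoryStream` with `Position = 0` overwrites only the start of the old content and leaves the rest after the new data. `SetLength(0)` truncates the stream first. Note that `ToArray()` copies the whole stream up to its `Length`, not just up to the current `Position`.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes `output` into a stream that already holds earlier data.
byte[] SaveInto(MemoryStream ms, byte[] output, bool truncate)
{
    if (truncate)
        ms.SetLength(0);   // drop all old bytes; Position is clamped to 0
    else
        ms.Position = 0;   // rewind only: old bytes beyond the new data remain

    ms.Write(output, 0, output.Length);
    return ms.ToArray();   // copies 0..Length, regardless of Position
}

// Stand-ins for the earlier input documents and the merged output.
byte[] previousInputs = new byte[10];
byte[] mergedOutput = { 1, 2, 3, 4 };

var reused = new MemoryStream();
reused.Write(previousInputs, 0, previousInputs.Length);
byte[] withJunk = SaveInto(reused, mergedOutput, truncate: false);

var cleared = new MemoryStream();
cleared.Write(previousInputs, 0, previousInputs.Length);
byte[] clean = SaveInto(cleared, mergedOutput, truncate: true);

Console.WriteLine($"without SetLength(0): {withJunk.Length} bytes");
Console.WriteLine($"with SetLength(0):    {clean.Length} bytes");
```

This prints 10 bytes for the reused stream and 4 bytes for the cleared one. Alternatively, using a new `MemoryStream` for the final save avoids the question entirely.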

Interestingly, the previous corrupt PDF opened OK, even though it had the extra junk tacked on the end.

Thanks for the help.

