PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

PDF size increases after merging
http://forum.pdfsharp.com/viewtopic.php?f=2&t=3936

Author:  Cwswpl [ Tue Mar 19, 2019 2:47 am ]
Post subject:  PDF size increases after merging

Greetings!

I am using PDFsharp to create four single-page reports, which it does very nicely. These reports contain no images, only text, lines and rectangles, and are approximately 40 to 50 kB each.

Now I am trying to get PDFsharp to merge these together into a single document of four pages. I have used your samples as a guide, and it is working well.

However, instead of the expected file size of approximately 200 kB, the result is much bigger, at around 360 kB. For comparison, if I do the same thing manually with a third-party tool such as PDFFill PDF Tools, I get the lower file size I expected.

I don't like waste in long-term storage files, so I am very puzzled by this and would like to resolve it. One of your articles suggests using:

docOutput.Options.CompressContentStreams = true;

but this makes no difference; perhaps it is only intended for images.

Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?

Author:  TH-Soft [ Tue Mar 19, 2019 8:22 am ]
Post subject:  Re: PDF size increases after merging

Hi!
Cwswpl wrote:
Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?
We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.

The best you can do: Write a program that creates all documents in a single run - the four-pager and four single-pagers.
The second-best approach: Create the four-pager and then split it in a second run.

I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.

Author:  Cwswpl [ Tue Mar 19, 2019 9:21 am ]
Post subject:  Re: PDF size increases after merging

Hi Thomas, thanks for the comments.

TH-Soft wrote:
We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.


Thanks, an educated guess is all I'm hoping for.

I can't post the documents because they are confidential. I also don't think I can generate all pages in the same process, because they originate in different systems, although all of those systems use PDFsharp.

My code is probably not that helpful either, since I've wrapped it inside helper objects, but the process is basically what you might expect:

1. Source pages are created by drawing into XGraphics connected to a PdfPage

2. Each source document is saved to binary via a MemoryStream using PdfDocument.Save()

3. The documents are reloaded in the merging process with PdfReader.Open(strm, PdfDocumentOpenMode.Import);

4. For each document, its pages are transferred to the merged document with docMerge.AddPage()

5. The merged document is saved to binary via a MemoryStream using PdfDocument.Save()
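Steps 3 to 5 above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual helper code (which was not posted): it assumes the PDFsharp calls used in the official "concatenate documents" sample (`PdfReader.Open` with `PdfDocumentOpenMode.Import`, `PdfDocument.AddPage`, `PdfDocument.Save`), assumes the source PDFs from step 2 are available as byte arrays, and the names `PdfMergeSketch` and `Merge` are hypothetical. Each input gets its own `MemoryStream`, and the merged document is written to a fresh one, so no stream is shared between reading and writing.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper; assumes the PDFsharp NuGet package is referenced.
public static class PdfMergeSketch
{
    // Merges already-saved PDFs (one byte array per source document) into one document.
    public static byte[] Merge(IEnumerable<byte[]> sourcePdfs)
    {
        var merged = new PdfDocument();

        foreach (byte[] bytes in sourcePdfs)
        {
            // A separate stream per input, never reused for the output.
            using (var input = new MemoryStream(bytes))
            {
                // Import mode is required to add these pages to another document.
                PdfDocument source = PdfReader.Open(input, PdfDocumentOpenMode.Import);
                for (int i = 0; i < source.PageCount; i++)
                {
                    merged.AddPage(source.Pages[i]);
                }
            }
        }

        // A fresh, empty stream for the output, so it contains only the merged document.
        using (var output = new MemoryStream())
        {
            merged.Save(output, false);
            return output.ToArray();
        }
    }
}
```

Calling `Merge` with the four saved single-page reports should return the bytes of one four-page document.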


Quote:
I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.


I was wondering about that, but surely the organisation is already inefficient with single documents, as each must carry its own copy of the fonts. It's hard to understand why bringing them together should make it any worse. If anything, it might help: if some of the fonts are the same, the duplicates could be dropped.

Is there an analyser tool for PDF documents that identifies the content elements embedded in them and how big they are? That might point me in the right direction.

I've tried viewing the content in a text editor to see whether I can spot any difference in the patterns, but the PDF format is too messy for the human eye to make much progress this way.

Author:  Cwswpl [ Wed Mar 20, 2019 3:39 am ]
Post subject:  Re: PDF size increases after merging

Actually I solved this, and it wasn't really connected to PDFsharp.

I was reusing the same MemoryStream for the final output of the merged document that I had already used for loading the existing single pages.

It seems the previous input buffers were preserved in the stream and were accumulating as surplus baggage at the end.

So now I clear the MemoryStream before the final write, using ms.SetLength(0). This produces an output file size of 210 kB, as expected, and it all makes sense now.
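The effect of reusing a `MemoryStream` can be reproduced without PDFsharp. In the sketch below (the names `StreamReuseDemo`, `WriteWithoutReset` and `WriteWithReset` are hypothetical), a stream that already holds older, longer data is rewound and a shorter payload is written. Rewinding alone leaves the old tail in place, and `ToArray()` returns it along with the new bytes; calling `SetLength(0)` first truncates the stream, which is the fix described above.

```csharp
using System.IO;

public static class StreamReuseDemo
{
    // Rewinds and overwrites WITHOUT truncating: any old bytes beyond the
    // new payload's length stay in the stream.
    public static byte[] WriteWithoutReset(MemoryStream ms, byte[] payload)
    {
        ms.Position = 0;                       // rewinding does not shrink ms.Length
        ms.Write(payload, 0, payload.Length);
        return ms.ToArray();                   // copies all ms.Length bytes, stale tail included
    }

    // Truncates first, so the stream ends up holding only the new payload.
    public static byte[] WriteWithReset(MemoryStream ms, byte[] payload)
    {
        ms.SetLength(0);                       // discard the previous contents
        ms.Position = 0;                       // make sure writing starts at the beginning
        ms.Write(payload, 0, payload.Length);
        return ms.ToArray();
    }
}
```

For example, if the stream already holds the 25 bytes `OLD-INPUT-DATA-1234567890` and the 6-byte payload `MERGED` is written, `WriteWithoutReset` returns all 25 bytes (`MERGEDPUT-DATA-1234567890`), while `WriteWithReset` returns only the 6 bytes `MERGED`.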

It's interesting that the previous, corrupt PDF opened OK, even though it had the extra junk tacked on the end.

Thanks for the help.

All times are UTC