PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


Posted: Tue Mar 19, 2019 2:47 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3
Greetings!

I am using PDFsharp to create four single-page reports, which it does very nicely. These reports include no images, only text, lines and rectangles, and are each approximately 40 to 50 kB.

Now I am trying to get PDFsharp to merge these together into a single document of four pages. I have used your samples as a guide, and it is working well.

However, instead of the expected file size of approx 200 kB, it is much bigger at around 360 kB. As a comparison, if I manually do the same thing with a third-party tool like PDFFill PDF Tools I get the lower file size expected.

I don't like waste in long-term storage files, so I am very puzzled by this and would like to resolve it. One of your articles suggests using:

docOutput.Options.CompressContentStreams = true;

but this makes no difference, and perhaps is only intended for images.

Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?


Posted: Tue Mar 19, 2019 8:22 am
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 909
Location: CCAA
Hi!
Cwswpl wrote:
Can anyone suggest what is the cause of this, and how I might store the merged document in the same efficient format as the original single-page files?
We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.

The best you can do: Write a program that creates all documents in a single run - the four-pager and four single-pagers.
The second-best approach: Create the four-pager and then split it in a second run.

I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Posted: Tue Mar 19, 2019 9:21 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3
Hi Thomas, thanks for the comments.

TH-Soft wrote:
We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.


Thanks, an educated guess is all I'm hoping for.

I can't post the documents because they are confidential. I also don't think I can generate all pages with the same process, because they originate in different systems, although all of those systems use PDFsharp.

My code is probably not that helpful either, since I've wrapped it inside helper objects, but the process is basically what you might expect:

1. Source pages are created by drawing into XGraphics connected to a PdfPage

2. Each source document is saved to binary via a MemoryStream using PdfDocument.Save()

3. The documents are reloaded in the merging process with PdfReader.Open(strm, PdfDocumentOpenMode.Import);

4. With every document, source pages are transferred to the merged document with docMerge.AddPage()

5. The merged document is saved to binary via a MemoryStream using PdfDocument.Save()
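
A minimal sketch of the five steps above, for readers who want something concrete to compare against. It is not the poster's actual code (that was wrapped in helper objects and not shown). The class and method names (MergeSketch, CreateSinglePageReport, Merge) are hypothetical, the drawn shapes are placeholders, and it assumes the PDFsharp 1.5-style API with a separate MemoryStream per document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class MergeSketch
{
    // Steps 1 and 2: draw one page through XGraphics, then save the
    // document to bytes through its own MemoryStream.
    static byte[] CreateSinglePageReport(double boxWidth)
    {
        var doc = new PdfDocument();
        PdfPage page = doc.AddPage();
        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            // Placeholder content: only lines and rectangles, no text.
            gfx.DrawRectangle(XPens.Black, 40, 40, boxWidth, 80);
            gfx.DrawLine(XPens.Black, 40, 140, 40 + boxWidth, 140);
        }
        using (var ms = new MemoryStream())
        {
            doc.Save(ms, false); // false: leave the stream open for ToArray()
            return ms.ToArray();
        }
    }

    // Steps 3 to 5: reopen each source in Import mode, copy its pages
    // into the merged document, and save to a fresh MemoryStream.
    static byte[] Merge(IEnumerable<byte[]> sources)
    {
        var merged = new PdfDocument();
        foreach (byte[] source in sources)
        {
            PdfDocument imported =
                PdfReader.Open(new MemoryStream(source), PdfDocumentOpenMode.Import);
            for (int i = 0; i < imported.PageCount; i++)
                merged.AddPage(imported.Pages[i]);
        }
        using (var output = new MemoryStream()) // new stream, never reused
        {
            merged.Save(output, false);
            return output.ToArray();
        }
    }

    static void Main()
    {
        var sources = new List<byte[]>();
        for (int i = 0; i < 4; i++)
            sources.Add(CreateSinglePageReport(100 + 50 * i));

        byte[] result = Merge(sources);
        Console.WriteLine("Merged size in bytes: " + result.Length);
    }
}
```

Returning ms.ToArray() from a stream created inside each method means no buffer is shared between the loading and saving stages, which makes the byte count of each stage easy to check.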


Quote:
I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.


I was wondering about that, but surely the organisation is already inefficient with single documents, as they must all carry their own copy of the fonts. It's hard to understand why bringing them together should make it any worse. If anything, it might help: if some of the fonts are the same, the duplicates could be dropped.

Is there some analyser tool for PDF documents which identifies the content elements embedded in them, and how big they are? This might point me in the right direction.

I've tried just viewing the content in a text editor, to see if I can spot any difference in the patterns, but the PDF format is too messy to the human eye to make much progress this way.


Posted: Wed Mar 20, 2019 3:39 am

Joined: Tue Mar 19, 2019 2:08 am
Posts: 3
Actually I solved this, and it wasn't really connected to PDFsharp.

I was reusing the same MemoryStream for the final output of the merged document that I had already used for loading the existing single pages.

It seems the previous input buffers were preserved in the stream, and were accumulating as surplus baggage at the end.

So now I am clearing the MemoryStream before the final write, using ms.SetLength(0). This produces an output file size of 210 kB as expected, and all makes sense now.
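
The effect can be reproduced without PDFsharp at all. The sketch below uses only System.IO; the class and method names (MemoryStreamReuseDemo, ReuseStream) are hypothetical, and the byte counts are made-up stand-ins for the real documents. It assumes the stream was rewound to position 0 before the second write, which is one way stale bytes end up after the new content; the original code was not posted, so the exact sequence there is unknown.

```csharp
using System;
using System.IO;

static class MemoryStreamReuseDemo
{
    // Writes oldSize bytes (the earlier content), then reuses the same
    // stream for newSize bytes and returns the resulting stream length.
    public static long ReuseStream(int oldSize, int newSize, bool clearFirst)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[oldSize], 0, oldSize);

            if (clearFirst)
                ms.SetLength(0);  // truncates the stream; position moves back to 0
            else
                ms.Position = 0;  // rewinds only; the old bytes stay in the stream

            ms.Write(new byte[newSize], 0, newSize);
            return ms.Length;
        }
    }

    static void Main()
    {
        // Rewind only: the 300 new bytes overwrite the start, 700 old bytes remain.
        Console.WriteLine(ReuseStream(1000, 300, clearFirst: false)); // prints 1000

        // SetLength(0) first: only the new bytes are in the stream.
        Console.WriteLine(ReuseStream(1000, 300, clearFirst: true));  // prints 300
    }
}
```

Creating a new MemoryStream for the final output avoids the problem in the same way and removes the need to remember the SetLength(0) call.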

Interesting that the previous corrupt PDF opened OK, even though it had the extra junk tacked on the end.

Thanks for the help.

