Hi Thomas, thanks for the comments.
TH-Soft wrote:
We don't have any PDF files and we don't see any of your code, so don't expect more than an educated guess.
Thanks, an educated guess is all I'm hoping for.
I can't post the documents because they are confidential. I also don't think I can reproduce all pages with the same process, because they originate in different systems, although all of them use PDFsharp.
My code is probably not that helpful either, since I've wrapped it inside helper objects, but the process is basically what you'd expect:
1. Source pages are created by drawing into XGraphics connected to a PdfPage
2. Each source document is saved to binary via a MemoryStream using PdfDocument.Save()
3. The documents are reloaded in the merging process with PdfReader.Open(strm, PdfDocumentOpenMode.Import);
4. For each source document, the pages are transferred to the merged document with docMerge.AddPage()
5. The merged document is saved to binary via a MemoryStream using PdfDocument.Save()
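Stripped of the helper objects, the round trip looks roughly like this. This is a minimal sketch of the steps above, not my actual code (the drawing call and font are just placeholders):

```csharp
using System.IO;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// 1. Source pages are created by drawing into XGraphics connected to a PdfPage.
var source = new PdfDocument();
var page = source.AddPage();
using (var gfx = XGraphics.FromPdfPage(page))
{
    var font = new XFont("Arial", 12);  // placeholder font
    gfx.DrawString("Hello", font, XBrushes.Black, 40, 40);
}

// 2. The source document is saved to binary via a MemoryStream.
byte[] sourceBytes;
using (var ms = new MemoryStream())
{
    source.Save(ms, false);
    sourceBytes = ms.ToArray();
}

// 3. + 4. Reload for import and transfer the pages to the merged document.
var docMerge = new PdfDocument();
using (var strm = new MemoryStream(sourceBytes))
{
    var imported = PdfReader.Open(strm, PdfDocumentOpenMode.Import);
    foreach (PdfPage p in imported.Pages)
        docMerge.AddPage(p);
}

// 5. The merged document is saved to binary via a MemoryStream.
byte[] mergedBytes;
using (var ms = new MemoryStream())
{
    docMerge.Save(ms, false);
    mergedBytes = ms.ToArray();
}
```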
Quote:
I don't understand why four documents combined to one should be much larger than the sum of the four single documents. But this approach surely wastes disk space as fonts are not optimised by PDFsharp.
I was wondering about that too, but surely the organisation is already inefficient with single documents, since each one must carry its own copy of the fonts. It's hard to see why bringing them together should make it any worse. If anything, it might help: if some of the fonts are the same, the duplicates could be dropped.
Is there some analyser tool for PDF documents that identifies the content elements embedded in a document and reports how big each one is? That might point me in the right direction.
I've tried just viewing the content in a text editor, hoping to spot a difference in the patterns, but the PDF format is too messy for the human eye to make much progress that way.
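In the meantime I've thought about writing a crude analyser myself with PDFsharp. If I remember the API correctly, PdfDocument.Internals.GetAllObjects() exposes the raw object table, and stream-carrying dictionaries report their size via the Stream property, so something like the following might at least show where the bytes are going (a rough sketch; I haven't verified that these members behave exactly as assumed):

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Crude size report: list every object that carries a stream
// (fonts, images, content streams) together with its stream length.
var doc = PdfReader.Open("merged.pdf", PdfDocumentOpenMode.Import);
foreach (PdfObject obj in doc.Internals.GetAllObjects())
{
    if (obj is PdfDictionary dict && dict.Stream != null)
    {
        // "/Type" and "/Subtype" hint at what the object is,
        // e.g. /Type /Font or /Subtype /Image.
        string type = dict.Elements.GetName("/Type");
        string subtype = dict.Elements.GetName("/Subtype");
        Console.WriteLine($"{obj.Reference}: {type} {subtype} " +
                          $"stream = {dict.Stream.Length} bytes");
    }
}
```

Sorting that output by size should make it obvious whether the extra space in the merged file comes from duplicated font streams or from something else.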