PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

Merging PDFDocument data objects (not Pdfs written to fs)
http://forum.pdfsharp.com/viewtopic.php?f=2&t=3952

Author:  PAquaticus [ Wed May 08, 2019 11:24 am ]
Post subject:  Merging PDFDocument data objects (not Pdfs written to fs)

Hello everyone,

I work with an array of PdfDocument objects:
Code:
PdfDocument[] results;

Is there a way to merge this array of documents into one big PdfDocument without writing it to disk?

I have already tried to create an array of pages and add them to a document, but I can't, because pages must belong to a document.
I have already searched the web and went through these solutions, which did not help because I do not want to write the PDFs to disk before merging them.

http://www.pdfsharp.net/wiki/Concatenat ... ample.ashx
http://pdfsharp.com/PDFsharp/index.php% ... temid%3D60
http://pdfsharp.net/wiki/CombineDocuments-sample.ashx

I work with version 1.51.5185-beta

Cheers and thanks for the help

Author:  Thomas Hoevel [ Wed May 08, 2019 12:41 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

You can write the PDFs to MemoryStream objects if you do not want to write them to disk.
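
A minimal sketch of that suggestion, assuming PDFsharp 1.5x and the Import open mode used in the Concatenate Documents sample. The class and method names (InMemoryMerge, Merge) are hypothetical and not part of PDFsharp. Each source document must contain at least one page, since saving an empty document fails.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class InMemoryMerge
{
    // Hypothetical helper: merges documents without touching the file system.
    public static PdfDocument Merge(IEnumerable<PdfDocument> documents)
    {
        var output = new PdfDocument();
        foreach (PdfDocument document in documents)
        {
            // Serialize to memory; 'false' keeps the stream open after saving.
            var buffer = new MemoryStream();
            document.Save(buffer, false);

            // Rewind so the reader starts at the beginning of the PDF data.
            buffer.Position = 0;

            // Open in Import mode so the pages can be added to another document.
            PdfDocument imported = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);
            for (int i = 0; i < imported.PageCount; i++)
            {
                output.AddPage(imported.Pages[i]);
            }
        }
        return output;
    }
}
```

With the array from the first post this would be called as InMemoryMerge.Merge(results). The serialize-and-parse round trip still costs CPU time per document; it only avoids the disk.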

Author:  PAquaticus [ Wed May 08, 2019 3:34 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

But then I would have to write all PDFs to memory streams just to read them again afterwards. Is there no way to use PdfDocuments directly? I have to work with massive amounts of PDFs, and being time-efficient is very important for my project.

Author:  Thomas Hoevel [ Wed May 08, 2019 4:42 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

PAquaticus wrote:
Is there no way to use PDFDocuments directly?
Why do you create multiple files in the first place? Simply create one big file right from the start.

Author:  PAquaticus [ Mon May 13, 2019 8:39 am ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

Thomas Hoevel wrote:
PAquaticus wrote:
Is there no way to use PDFDocuments directly?
Why do you create multiple files in the first place? Simply create one big file right from the start.


Because I want to create up to a million PDFs in as little time as possible for a project I am working on. I have time restrictions, so I have to make use of everything I can. Therefore I create the PDFs in parallel, but writing every single one to disk is a bottleneck. My NVMe drive can handle many IOPS, but I read in other posts in this forum that PDFsharp is not thread-safe, which still appears to be true in my case. So I would like to concatenate them and write them as one big file.

Author:  Thomas Hoevel [ Mon May 13, 2019 4:06 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

PAquaticus wrote:
My NVMe drive can handle many IOPS, but I read in other posts in this forum that PDFsharp is not thread-safe, which still appears to be true in my case.
Sounds as if you are not sure whether PDFsharp is threadsafe.
AFAIK the only issues are with font handling - and there are known workarounds. Posts about version 1.32 and older can be outdated.

Author:  pdfuser1 [ Fri Nov 15, 2019 5:39 am ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

Thomas Hoevel wrote:
Why do you create multiple files in the first place? Simply create one big file right from the start.

Hi,

I have a similar use case. Say, I have 100 transactions to generate receipts for. I create multiple one-page files and place them into individual folders (say, C:\receipts\1\receipt.pdf, C:\receipts\2\receipt.pdf, ... C:\receipts\99\receipt.pdf, C:\receipts\100\receipt.pdf). But I also need to create one large batch file that contains all of the 100 individual one-page receipts.

What I am doing right now (and it works), is to loop through the 100 records and, in each of the 100 iterations:
  1. Generate a PdfPage.
  2. Add the one page to a PdfDocument
  3. Save the document to the appropriate folder
  4. Re-open the just generated file
  5. Create a new PdfPage from the 1st (and only) page in it.
  6. Append the page to the batch PdfDocument

After the loop exits, save the 100-page batch PDF file.

This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient. What I would like to do is avoid Steps 4 and 5. Instead, I'd like to add the PdfPage from Step 1 to two different PdfDocuments.

Is it possible? Thank you for any suggestions.

Author:  TH-Soft [ Fri Nov 15, 2019 7:20 am ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

pdfuser1 wrote:
This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient.
I'd modify step 1 to create two identical PDF pages at the same time - add one to a single-page PDF, add the other to the big PDF.

Author:  pdfuser1 [ Fri Nov 15, 2019 2:17 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

TH-Soft wrote:
pdfuser1 wrote:
This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient.
I'd modify step 1 to create two identical PDF pages at the same time - add one to a single-page PDF, add the other to the big PDF.

This is a great suggestion, but I actually generate the page by opening a one-page PDF template file (and then adding small chunks of text to it). So I would have to open the template twice, which is just as inefficient (the same number of file system I/O operations).

Ideally, I would like to make a copy of my PdfPage object, but I couldn't get .Clone() to work. :( Is there a way to copy/clone a page or a document object? Perhaps I am not using the .Clone() method correctly? Is there an example I could follow?

Thank you again.

Author:  Thomas Hoevel [ Fri Nov 15, 2019 2:53 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

pdfuser1 wrote:
Is there a way to copy/clone a page or a document object?
I don't know if "Clone()" works properly for pages.

With respect to file size of the big file, it is most likely better to create the big file first and split that into many single-page files later.
To minimize disk IO, save the file to a MemoryStream and read it from there.
Things that are somewhat inefficient can still be very fast on modern computers.
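
A rough sketch of the "big file first, split later" idea, done in memory. It assumes PDFsharp 1.5x; InMemorySplit and SplitPages are hypothetical names, not PDFsharp API. The big document is saved to a MemoryStream once, reopened in Import mode, and each page is added to its own new document.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class InMemorySplit
{
    // Hypothetical helper: returns one single-page document per page of 'big'.
    public static List<PdfDocument> SplitPages(PdfDocument big)
    {
        // Save the big document to memory instead of to disk.
        var buffer = new MemoryStream();
        big.Save(buffer, false);
        buffer.Position = 0;

        // Reopen in Import mode so its pages can be copied elsewhere.
        PdfDocument imported = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);

        var singles = new List<PdfDocument>();
        for (int i = 0; i < imported.PageCount; i++)
        {
            var single = new PdfDocument();
            single.AddPage(imported.Pages[i]);
            singles.Add(single);
        }
        return singles;
    }
}
```

Each returned document can then be saved to its own folder with Save(path).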

Author:  pdfuser1 [ Fri Nov 15, 2019 4:39 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

Thomas Hoevel wrote:
pdfuser1 wrote:
Is there a way to copy/clone a page or a document object?
I don't know if "Clone()" works properly for pages.

With respect to file size of the big file, it is most likely better to create the big file first and split that into many single-page files later.
To minimize disk IO, save the file to a MemoryStream and read it from there.
Things that are somewhat inefficient can still be very fast on modern computers.

Yes, I agree. Even with the current "inefficient" process, I can generate 5000 individual files, save them to individual networked folders, and then generate a 5000-page, 800 MB batch file and save it to the network, all in under 10 minutes. And that only happens once a year in my case. Most other times, it's a few dozen (or, at worst, a few hundred) files.

So all this is more of a learning experience for me, in case I ever need to worry about efficiency in another project. That said, I have no experience with Streams or MemoryStreams in general. I did try to go that route, but got nowhere. I probably wasn't closing the stream correctly, because Adobe Acrobat was prompting me to save the file before closing. The file size on disk was indicative of a 3-page document, but I couldn't see anything beyond the first page. Is there a good example I can follow?

I truly appreciate your suggestions so far and am very thankful for the PDFsharp library. It's a great tool that we, developers, get to use for free. :shock:

Author:  Thomas Hoevel [ Fri Nov 15, 2019 5:05 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

The only "official" PDFsharp sample that uses MemoryStream is the web server sample:
http://www.pdfsharp.net/wiki/Clock-sample.ashx

There are many examples for MemoryStream on the web. After saving the document to the MemoryStream and before reading it again, you have to set the position of the MemoryStream to 0.
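
As an illustration of that point applied to the receipt workflow described earlier in the thread, here is a sketch assuming PDFsharp 1.5x. ReceiptBatch and SaveAndAppend are hypothetical names. The receipt is serialized once; the same bytes are written to disk and, after rewinding the stream, imported into the batch document, so the file is never read back from the file system.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class ReceiptBatch
{
    // Hypothetical helper: writes 'receipt' to 'path' and appends its pages to 'batch'.
    public static void SaveAndAppend(PdfDocument receipt, string path, PdfDocument batch)
    {
        // Not disposed on purpose: the imported document may still refer to the
        // stream, and a MemoryStream holds no unmanaged resources.
        var buffer = new MemoryStream();

        // 'false' keeps the stream open after saving.
        receipt.Save(buffer, false);

        // Write the individual file from the bytes already in memory.
        File.WriteAllBytes(path, buffer.ToArray());

        // Set the position to 0 before reading the document again.
        buffer.Position = 0;
        PdfDocument imported = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);
        for (int i = 0; i < imported.PageCount; i++)
        {
            batch.AddPage(imported.Pages[i]);
        }
    }
}
```

Calling SaveAndAppend once per record and saving the batch document after the loop would replace steps 3 to 6 of the list above.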

Author:  pdfuser1 [ Mon Nov 25, 2019 5:49 pm ]
Post subject:  Re: Merging PDFDocument data objects (not Pdfs written to fs

I got saving to MemoryStream working. It was very easy. I was definitely overthinking it.

Saving to a MemoryStream instead of to the file system and thus avoiding an extra read from the file system saves me about 25% of the time. 5,000 individual one-page PDFs and then a 5,000-page batch document are produced 2 minutes faster (in 6:40 mins vs. 8:40).

All times are UTC