PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
Posted: Mon Jun 29, 2009 10:09 am

Joined: Mon Jun 29, 2009 9:30 am
Posts: 2
I've been sitting here all morning trying to traverse the structure of a PDF (to replace some text in it).

My objective is to have PDFs that are constructed in advance (call 'em templates) with %identifier% placeholders in their text. I've been thinking along the lines of reading all the text from the PDF and replacing every %identifier% I find with its corresponding value.

The problem is, I can't seem to find them. I start parsing from PdfDocument.Pages.Element.Values. There I find a PdfArray with a PdfReference that points to the PdfPage. So far so good. One would expect the elements containing the text to reside in the Content. However, I cannot find them there. I'm starting to wonder if there are different ways to save a PDF and if I have done it incorrectly (Save As from Word 2007).

Anyone got any pointers or suggestions?

(I don't iterate through the PdfPage objects directly because my recursive method takes a PdfItem[] as the parameter for the elements to inspect.)


Posted: Mon Jun 29, 2009 10:58 am

Joined: Tue Feb 20, 2007 1:36 pm
Posts: 8
Hi,

Text in PDF is essentially a one-way proposition, especially when the font encoding is non-standard. So, the best I have achieved up till now (though not with the current version of PDFsharp) is to extract the sequence of words on a page, with the relative position and size of the respective BoundingBoxes. Then it is possible to cover a BB with a rectangle and write the new string over that.

As you can see, this approach is not well suited for flowing text, as the length of the rendered replacement must be identical to or a little less than the length of the original string.
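The length constraint can be checked before overwriting anything. Below is a minimal sketch in plain Python (not PDFsharp code), assuming the text is set in Courier, whose glyphs all have an advance width of 600 units per 1000 em. The function names are made up for illustration.

```python
# Sketch only: assumes a fixed-pitch font (Courier, 600/1000 em per glyph).
COURIER_ADVANCE_UNITS = 600  # glyph advance in 1/1000 em

def text_width(text, font_size):
    """Rendered width in points of `text` set in Courier at `font_size`."""
    return len(text) * COURIER_ADVANCE_UNITS * font_size / 1000

def fits(replacement, box_width, font_size):
    """True if the replacement is no wider than the original bounding box."""
    return text_width(replacement, font_size) <= box_width

# Width of the box that originally held "%name%" at 10 pt.
box = text_width("%name%", 10)
print(fits("Sally", box, 10))      # shorter than the placeholder
print(fits("Alexander", box, 10))  # longer than the placeholder
```

With a proportional font the width would have to come from the font's real glyph metrics instead of a constant.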

But you were talking about templates, so I would choose another approach entirely:
Create your PDFs with the help of MigraDoc. As a block of text is a plain ASCII string before actual creation of the PDF, it should be trivial to do replacements in your program (e.g. "Hello %name%!" becomes "Hello Sally!") and then create the PDFs on-the-fly.
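To illustrate the replacement step itself, here is a small sketch in plain Python, independent of MigraDoc. The placeholder syntax (letters, digits and underscores between two percent signs) and the name `fill_template` are assumptions made for the example.

```python
import re

# Assumed placeholder syntax: %identifier% with a C-like identifier.
PLACEHOLDER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")

def fill_template(text, values):
    """Replace each %identifier% in `text` with values[identifier].

    Unknown identifiers are left in place so they are easy to spot.
    """
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return PLACEHOLDER.sub(substitute, text)

print(fill_template("Hello %name%!", {"name": "Sally"}))  # Hello Sally!
```

The filled-in string would then be handed to whatever builds the document.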

Peter


Posted: Mon Jun 29, 2009 1:00 pm

Joined: Mon Jun 29, 2009 9:30 am
Posts: 2
Thank you for your quick reply. Well, yes, that is also an option. The thing is that I'd like to have a PDF as a template containing tags, so that the end user can simply create a document with %xxx% in it. However, I see that this isn't a good idea =\.

I'm going to try another library (which I've used before, but it's very bloated) and see if it has some support for this. I'll also check the streams and see if I can LZWDecode them.
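For what it's worth, a rough sketch of what checking a stream could look like, in plain Python rather than PDFsharp. It assumes the content stream is compressed with FlateDecode (zlib), which is very common, and it only looks at simple `(...) Tj` operators; `TJ` arrays, hex strings and font encodings are ignored. The sample stream and the name `show_text_operands` are invented for the example.

```python
import re
import zlib

# Literal string followed by the Tj (show text) operator.
TJ_STRING = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

def show_text_operands(compressed):
    """Inflate a FlateDecode stream and return the raw operands of each Tj."""
    content = zlib.decompress(compressed)
    return [m.group(1) for m in TJ_STRING.finditer(content)]

# Invented content stream: the placeholder is split across several Tj calls.
sample = zlib.compress(
    b"BT /F1 12 Tf 72 720 Td (Hel) Tj (lo %na) Tj (me%) Tj ET"
)
print(show_text_operands(sample))  # [b'Hel', b'lo %na', b'me%']
```

Even when the stream decodes cleanly, the bytes are character codes in the font's encoding, so they are not guaranteed to be readable text.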


Posted: Mon Jun 29, 2009 2:05 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3096
Location: Cologne, Germany
kelthar wrote:
The problem is, I can't seem to find them.

Your chance of finding "%identifier%" will be greater if you use a fixed-pitch font for the text (e.g. Courier).

With proportional fonts, words will often be drawn in small parts (one to three letters) for best results with letter spacing, kerning pairs, and the like.
To find "%identifier%" you'll have to concatenate the texts that are side by side on one line. Most tools will emit letters from left to right, but you cannot even be sure of this.

Telling letter spacing from word spacing is a task at which many PDF2RTF (or PDF2DOC) converters fail.
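A minimal sketch of that concatenation, in plain Python and not tied to PDFsharp. It assumes the fragments have already been extracted as hypothetical `(x, y, width, text)` tuples in page coordinates; the tolerance and gap thresholds are arbitrary values chosen for the example.

```python
from collections import defaultdict

def lines_from_fragments(fragments, y_tolerance=1.0, space_gap=2.0):
    """Join text fragments that sit side by side on the same baseline.

    fragments: iterable of (x, y, width, text) tuples (assumed layout).
    A horizontal gap wider than `space_gap` is treated as a word space.
    """
    rows = defaultdict(list)
    for x, y, width, text in fragments:
        # Crude bucketing by baseline; values near a bucket edge may split.
        rows[round(y / y_tolerance)].append((x, width, text))

    lines = []
    for key in sorted(rows, reverse=True):  # PDF y grows upward: top line first
        parts, prev_end = [], None
        for x, width, text in sorted(rows[key], key=lambda f: f[0]):
            if prev_end is not None and x - prev_end > space_gap:
                parts.append(" ")
            parts.append(text)
            prev_end = x + width
        lines.append("".join(parts))
    return lines

# Fragments deliberately out of order, as a PDF producer might emit them.
fragments = [
    (154, 700, 30, "ntifi"), (100, 700, 12, "He"), (196, 700, 6, "%"),
    (112, 700, 14, "llo"), (130, 700, 6, "%"), (136, 700, 18, "ide"),
    (184, 700, 12, "er"), (100, 686, 30, "Next"),
]
print(lines_from_fragments(fragments))  # ['Hello %identifier%', 'Next']
```

Sorting by x position sidesteps the emission order, and the `space_gap` threshold is exactly the letter-spacing versus word-spacing guess that is easy to get wrong.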

_________________
Regards
Thomas Hoevel
PDFsharp Team

