PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

How to compare PDF documents in unit tests
http://forum.pdfsharp.com/viewtopic.php?f=2&t=4112

Author:  Marvino [ Mon Mar 16, 2020 4:38 pm ]
Post subject:  How to compare PDF documents in unit tests

I have some very simple code that:

1. Adds a watermark to a PDF document A (as explained here, first kind); the font I use is "Arial"
2. Then adds "cover pages" from another PDF document B
3. Adds bookmarks (outline) for the added "cover pages"
4. Saves the result as PDF document C

I tried to write unit tests for this code.
My idea was to add A, B and C as binary files, run the code in my unit test and then compare the results with my pre-existing version of "C".
That works just fine on my computer.

But when I run the same unit test in TeamCity (as part of our CI), the test fails and the created PDF document is indeed different.
When I compare the file to my local version (text compare), the binary contents turn out to be very different, but I don't know why. My best guess is that the TeamCity server has different fonts installed and that this somehow affects the output (which seems a bit strange, because I thought "Arial" would be the same on all Windows machines).

Anyway, what would be a good way to test this?
I don't have to do a binary comparison - if there was a way to extract the watermark from the output PDF somehow, then I could also check if the watermark text is what I expect it to be, etc.
Any suggestions?

Author:  Thomas Hoevel [ Mon Mar 16, 2020 5:29 pm ]
Post subject:  Re: How to compare PDF documents in unit tests

A binary comparison is the best possible regression test.

It should work if you take the Arial TTF files from the TeamCity server and install them on your local machine to generate the exemplar for the binary comparison.

Checking only for the watermark string could succeed for files that are completely useless.

Author:  Marvino [ Mon Mar 16, 2020 6:20 pm ]
Post subject:  Re: How to compare PDF documents in unit tests

But then my co-workers also would have to install the Arial TTF files from the server on their machines etc. etc. - not an ideal solution, unless I can somehow package the file with my tests. Is that possible?

How could I check for the watermark string?

Author:  Marvino [ Fri Mar 20, 2020 7:05 pm ]
Post subject:  Re: How to compare PDF documents in unit tests

So there is no way to either

a) Package the TTF file with the unit test, so that the output would be the same on every machine, or
b) Extract the watermark string?

Would be nice if someone could clarify that. Thanks!

Author:  Thomas Hoevel [ Mon Mar 23, 2020 8:42 am ]
Post subject:  Re: How to compare PDF documents in unit tests

You can include the font in your project and use IFontResolver (assuming WPF build of PDFsharp) to have all machines use the same font.

You can extract the watermark text for comparison, but that would be a weaker check.
Simple implementation: Locate the watermark in the PDF and copy that block to your unit test.

Author:  Marvino [ Mon Mar 23, 2020 9:01 am ]
Post subject:  Re: How to compare PDF documents in unit tests

Thank you! You mean this kind of code, right? http://developer.th-soft.com/developer/ ... -migradoc/
Is that what made it into the final version 1.50.5147? We use the NuGet package - or would we have to use a special WPF build?

As an alternative, how would I extract the watermark? Would be great if you could point me in the right direction. Thanks!

Author:  Thomas Hoevel [ Tue Mar 24, 2020 8:44 am ]
Post subject:  Re: How to compare PDF documents in unit tests

Marvino wrote:
    As an alternative, how would I extract the watermark? Would be great if you could point me in the right direction. Thanks!

I would make a Diff of the PDFs with and without the Watermark to identify the watermark code. Thus you know what to search for in the unit test.

Author:  Marvino [ Tue Mar 24, 2020 6:27 pm ]
Post subject:  Re: How to compare PDF documents in unit tests

Thomas Hoevel wrote:
    Marvino wrote:
        As an alternative, how would I extract the watermark? Would be great if you could point me in the right direction. Thanks!

    I would make a Diff of the PDFs with and without the Watermark to identify the watermark code. Thus you know what to search for in the unit test.

I'm not sure I understand. I have looked at the files in a diff tool. The difference is just binary data. I have no idea what to look for.

All times are UTC