PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

Bad reading from encrypted PDF
http://forum.pdfsharp.com/viewtopic.php?f=3&t=4280

Author:  MoKoNaFactory [ Thu Aug 05, 2021 8:06 am ]
Post subject:  Bad reading from encrypted PDF

Dear everyone,
Thanks for reading this report.
I use version 1.50.5185 beta.
I found a bug in the data reading process.

My client gave me a lot of PDF files. They need a password to open.
The password is the same for all of them.
My job is to sort and rebuild the pages by their bookmark titles, which are written in UTF-16 big-endian, and to remove the password.
Removing the password is no trouble: I copy the page data that was read into a new PdfDocument object and save it.
But a few files give a bad result.
I checked a file with a bad result in a debug run of my project.
I found that the title data that was read is broken.
I checked the target data in Notepad and found the same cause in every broken case:
the broken data is encrypted data that starts with '<FEFF'.
I know '<FEFF' is the UTF-16 BOM in a Unicode string. But it has no such meaning in encrypted data.
This is a critical bug, because:
1. The data is broken, although Acrobat Reader succeeds in decoding and showing it.
2. No exception is thrown.
3. PdfItem.Value is broken. I can't fix it in my project.
4. It occurs rarely (1/65536), but we can't detect it.
5. I think the decoding process has a bug that also affects other objects.
Please fix it.
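
The 1/65536 figure in point 4 can be checked with a little arithmetic. The following is only a rough sketch. It assumes that the first two bytes of an encrypted string are uniformly random, and the string counts are made-up examples, not measurements from the files in this report. The class name BomCollisionOdds is hypothetical.

```csharp
using System;

public static class BomCollisionOdds
{
    // Chance that one encrypted string starts with the two bytes FE FF,
    // ASSUMING the cipher output is uniformly random: 1 / 256^2 = 1 / 65536.
    public const double PerString = 1.0 / 65536.0;

    // Chance that at least one of n independent strings is affected.
    public static double AtLeastOne(int n)
    {
        return 1.0 - Math.Pow(1.0 - PerString, n);
    }

    public static void Main()
    {
        // Example counts only; they are not taken from real documents.
        foreach (int n in new[] { 1, 1000, 100000 })
        {
            Console.WriteLine("{0,7} strings: {1:P3}", n, AtLeastOne(n));
        }
    }
}
```

Under that assumption a single string is almost never hit, but a large batch of files with many bookmark titles will probably contain at least one affected string, which fits the "rare but undetectable" description.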

Additional info on Aug 6
I can't share the PDF file because it contains confidential information.
But I can show the information related to the issue.

Encryption dictionary object
The password for reading is AAAaaa0.
-----------------------------------------------------------------------------
39 0 obj
<<
/Filter /Standard
/V 2
/R 3
/Length 128
/P -4
/O <8C3CA796FE76B252B98CA3B49C8FA992441CE5352426AD83E55DAED83A13BAC4>
/U <2AF8E7811DF4A627EC6A0E6AE2CCD89B28BF4E5E4E758A4164004E56FFFA0108>
>>
endobj
-----------------------------------------------------------------------------
Good case
Both PDFsharp and Acrobat return the Japanese string 診療録
-----------------------------------------------------------------------------
9 0 obj
<<
/Title <E797BC2B04625B50>
/Parent 3 0 R
/Dest /__WKANCHOR_2
/Count 0
/Next 10 0 R
>>
endobj
-----------------------------------------------------------------------------
Bad case
Acrobat returns the Japanese string 診療録 更新履歴
PDFsharp returns the broken string ᝥ㤌〸
-----------------------------------------------------------------------------
10 0 obj
<<
/Title <FEFF3CEE245BAAA10CB1906B1415AA09DFAB>
/Parent 3 0 R
/Dest /__WKANCHOR_4
/Count 0
/Prev 9 0 R
>>
endobj
-----------------------------------------------------------------------------
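
To make the bad case concrete, here is a small stand-alone sketch (not PDFsharp code) that parses the two /Title hex strings above and applies a BOM check to the raw, still encrypted bytes. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;

public static class BomCheckDemo
{
    // Convert a PDF hex string body such as "FEFF3CEE" into raw bytes.
    public static byte[] ParseHex(string hex)
    {
        if (hex.Length % 2 != 0)
            hex += "0"; // PDF rule: a missing final digit is treated as 0
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        return bytes;
    }

    // True if the bytes start with FE FF, the UTF-16 big-endian BOM.
    public static bool LooksLikeUtf16BE(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF;
    }

    public static void Main()
    {
        byte[] good = ParseHex("E797BC2B04625B50");
        byte[] bad = ParseHex("FEFF3CEE245BAAA10CB1906B1415AA09DFAB");

        Console.WriteLine(LooksLikeUtf16BE(good)); // False
        Console.WriteLine(LooksLikeUtf16BE(bad));  // True
        Console.WriteLine(bad.Length);             // 18
    }
}
```

The second value is ciphertext that only happens to start with FE FF. A check on the raw bytes cannot tell the difference, so the BOM test is only meaningful after decryption.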

Author:  MoKoNaFactory [ Tue Aug 17, 2021 9:23 am ]
Post subject:  Re: Bad reading from encrypted PDF

I found where to fix this issue.

The classes to change are Lexer, Parser, and PdfReader in PdfSharp.Pdf.IO.
My idea is to skip the block that removes the UTF-16 header (BOM) when the document is encrypted.

1. The Lexer class needs a bool mode flag "IsEncrypted". It is a private property.
-----------------------------------------------------------------------------
private bool IsEncrypted { get; set; }
-----------------------------------------------------------------------------
Set it to false in the constructor.

2. A method that enables "IsEncrypted". It is a public accessor.
-----------------------------------------------------------------------------
public void SetEncrypted()
{
    this.IsEncrypted = true;
}
-----------------------------------------------------------------------------
3. The Parser object needs a method that sets the mode on the Lexer object it contains. It calls the method from step 2.
-----------------------------------------------------------------------------
public void SetLexerEncrypt()
{
    _lexer.SetEncrypted();
}
-----------------------------------------------------------------------------
4. In the Open method of the PdfReader class, add a call to the method from step 3 inside the branch that handles an existing Encrypt object.
-----------------------------------------------------------------------------
// Is document encrypted?
PdfReference xrefEncrypt = document._trailer.Elements[PdfTrailer.Keys.Encrypt] as PdfReference;
if (xrefEncrypt != null)
{
    parser.SetLexerEncrypt(); // Add this line
-----------------------------------------------------------------------------
5. Add a branch that skips the Unicode header removal block in the Lexer object. Three places need this change.
-----------------------------------------------------------------------------
if (_token.Length >= 2 && _token[0] == '\xFE' && _token[1] == '\xFF')
{
    if (!IsEncrypted) // Add this branch
    { // Add this branch
-----------------------------------------------------------------------------
if (_token.Length >= 2 && _token[0] == '\xFF' && _token[1] == '\xFE')
{
    if (!IsEncrypted) // Add this branch
    { // Add this branch
-----------------------------------------------------------------------------
string chars = _token.ToString();
int count = chars.Length;
if (count > 2 && chars[0] == (char)0xFE && chars[1] == (char)0xFF)
{
    if (!IsEncrypted) // Add this branch
    { // Add this branch
-----------------------------------------------------------------------------
It works well.
I hope that it will be useful to those who read encrypted data.
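
The idea behind steps 1 to 5 can be tried out in isolation with a toy model. This is a minimal sketch, not the real PDFsharp Lexer: the ToyLexer class and its ReadString method are hypothetical, and only the IsEncrypted flag and the SetEncrypted method mirror the names used above.

```csharp
using System;
using System.Text;

// Toy model of the change, NOT the real PDFsharp Lexer.
public class ToyLexer
{
    // Defaults to false, matching "set it to false in the constructor".
    private bool IsEncrypted { get; set; }

    public void SetEncrypted()
    {
        IsEncrypted = true;
    }

    // Turns the raw bytes of a string token into a .NET string.
    // Each byte is kept as one char unless a BOM is accepted.
    public string ReadString(byte[] raw)
    {
        bool hasBom = raw.Length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF;
        if (hasBom && !IsEncrypted)
        {
            // Plain document: drop the BOM and decode as UTF-16 big-endian.
            return Encoding.BigEndianUnicode.GetString(raw, 2, raw.Length - 2);
        }

        // Encrypted document (or no BOM): keep every byte untouched so that
        // a later decryption step still sees the original ciphertext.
        var sb = new StringBuilder(raw.Length);
        foreach (byte b in raw)
            sb.Append((char)b);
        return sb.ToString();
    }

    public static void Main()
    {
        byte[] raw = { 0xFE, 0xFF, 0x3C, 0xEE, 0x24, 0x5B };

        var plain = new ToyLexer();
        Console.WriteLine(plain.ReadString(raw).Length);     // 2

        var encrypted = new ToyLexer();
        encrypted.SetEncrypted();
        Console.WriteLine(encrypted.ReadString(raw).Length); // 6
    }
}
```

With the flag set, all six bytes survive, so a decryption step would receive the complete ciphertext. Without the flag, the first two bytes are consumed as a BOM and the rest is merged into 16-bit characters.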

Author:  MoKoNaFactory [ Mon Sep 13, 2021 7:49 am ]
Post subject:  Re: Bad reading from encrypted PDF

I found a file in which the Encrypt object itself is affected.
That file has this password entry:

/U <FEFF******>

The password is the same one, but this file failed to open.
The failure reason is 'The specified password is invalid.'

But Acrobat Reader and my fixed DLL open it successfully.
I strongly recommend this fix.

All times are UTC