PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC

Posted: Fri Apr 26, 2019 11:20 am

Joined: Fri Apr 26, 2019 11:15 am
Posts: 1
Hi

I'm wondering whether I can now add accessibility tags to a PDF with PDFsharp. Someone asked the same question years ago, and the answer then was that the feature was on the wish list. Is it available now? If so, could you give me some tips, please?

Thanks a lot


Posted: Sat Jan 23, 2021 2:01 am

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18
Here's a small example. This should get you started.

Code:
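        // Namespaces this sample needs: System, System.Collections.Generic,
        // System.Diagnostics, System.Linq, PdfSharp.Drawing, PdfSharp.Pdf,
        // PdfSharp.Pdf.Advanced, PdfSharp.Pdf.Content,
        // PdfSharp.Pdf.Content.Objects, PdfSharp.Pdf.IO
        static void Main(string[] args)
        {
            string doc1 = @"C:\doc1.pdf";

            CreatePdf(doc1);

            // reopen the saved file so that its page content can be read and rewritten
            using (var doc = PdfReader.Open(doc1, PdfDocumentOpenMode.Modify))
            {
                MakeAccessible(doc);

                doc.Save(doc1);
                doc.Close();
            }

            // open the result in the default PDF viewer
            Process.Start(doc1);
        }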

        public static void CreatePdf(string filename)
        {
            // Create a new PDF document
            PdfDocument document = new PdfDocument();
            document.Info.Title = "Created with PDFsharp";

            // Create an empty page
            PdfPage page = document.AddPage();

            AddTextToPage(page);

            // Save the document; Main reopens it for tagging and starts the viewer.
            document.Save(filename);
        }

        public static void AddTextToPage(PdfPage page)
        {
            // Get an XGraphics object for drawing
            XGraphics gfx = XGraphics.FromPdfPage(page);

            // Create a font
            XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic, new XPdfFontOptions(PdfFontEncoding.WinAnsi));

            // Draw the text
            gfx.DrawString("Hello, World!", font, XBrushes.Black,
              new XRect(0, 0, page.Width, page.Height),
              XStringFormats.Center);
        }

        // See the PDF specification for everything a content stream can contain:
        // https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
        public static void MakeAccessible(PdfDocument pdf)
        {
            // if you don't have Adobe Acrobat to run accessibility check then
            // first download the Free Accessibility Checker (https://www.access-for-all.ch/en/pdf-accessibility-checker.html)
            // or open up an online accessibility checker in a browser

            // There are certain requirements the document has to meet
            // to make it near 100% compliant

            // Set your PDF version to 1.7
            pdf.Version = 17;

            // Set the author of the document
            pdf.Info.Author = "Me, Myself, and I";

            // Date document was created
            pdf.Info.CreationDate = DateTime.Now;

            // The app/system that created the document
            pdf.Info.Creator = "My PDF Generator";

            // subject this document is about
            pdf.Info.Subject = "PDF Accessibility Test";

            #region /***** BEGIN ACCESSIBILITY REQUIREMENTS *****/

            pdf.Language = "en-US";
            pdf.Info.Title = "PDF Accessibility";

            pdf.ViewerPreferences.DisplayDocTitle = true;
            pdf.ViewerPreferences.Direction = PdfReadingDirection.LeftToRight;

            // flag indicating the document conforms to Tagged PDF conventions
            PdfCatalog catalog = pdf.Internals.Catalog;
            PdfDictionary markInfo = new PdfDictionary(pdf);
            catalog.Elements.Add("/MarkInfo", markInfo);
            markInfo.Elements.Add("/Marked", PdfBoolean.True);

            // add outlines (bookmarks)
            // add root bookmark and set to first page as goto action
            pdf.Outlines.Add(pdf.Info.Title, pdf.Pages[0], true, PdfOutlineStyle.Bold);

            PdfArray parentRootTree = new PdfArray(pdf);
            pdf.Internals.AddObject(parentRootTree);

            List<PdfDictionary> tagList = new List<PdfDictionary>();

            // create tags for content elements within each page
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                PdfPage page = pdf.Pages[i];

                // tag page as a structured parent object and assign a unique integer identifier
                page.Elements.Add("/StructParents", new PdfInteger(i));

                // specify a tab order.
                // The possible values are R(row order), C(column order),
                // and S (structure order)
                page.Elements.Add("/Tabs", new PdfName("/S"));

                // iterate thru each content item and assign a
                // marked content id and give it a structure type (Span, P, Table, TH, TR, etc.)
                PdfContents contents = page.Contents;
                if (contents != null && contents.Elements.Count > 0)
                {
                    PdfArray elementRootNode = new PdfArray(pdf);
                    pdf.Internals.AddObject(elementRootNode);

                    CSequence pageContents = ContentReader.ReadContent(page);
                    int numOfContentElements = pageContents.Count;

                    int mcid = 0; // next marked content identifier; unique within the page
                    COperator currentElement = null;
                    COperator previousElement = null;
                    COperator nextElement = null;
                    CString newStringContent = null;
                    CSequence newContent = new CSequence();

                    // wrap each q ... Q block in a marked content block
                    // (BDC ... EMC) and assign it an MCID
                    for (int j = 0; j < numOfContentElements; j++)
                    {
                        currentElement = (COperator)pageContents[j];

                        // look up both neighbours on every pass; null at either end
                        previousElement = j > 0 ? (COperator)pageContents[j - 1] : null;
                        nextElement = j + 1 < numOfContentElements ? (COperator)pageContents[j + 1] : null;

                        // mark content if not already marked (BMC or BDC directly before q)
                        if (currentElement.OpCode.OpCodeName == OpCodeName.q &&
                            (previousElement == null ||
                             (previousElement.OpCode.OpCodeName != OpCodeName.BMC &&
                              previousElement.OpCode.OpCodeName != OpCodeName.BDC)))
                        {

                            /*** BEGIN CONTENT STREAM CHANGES ***/
                            String elementStructureType = "/Span";

                            COperator beginDataContent = OpCodes.OperatorFromName("BDC");
                            newStringContent = new CString();
                            newStringContent.Value = $"{elementStructureType} ";
                            newStringContent.CStringType = CStringType.Dictionary;
                            beginDataContent.Operands.Add(newStringContent);

                            newStringContent = new CString();
                            newStringContent.Value = $"<</MCID {mcid}>> ";
                            newStringContent.CStringType = CStringType.Dictionary;
                            beginDataContent.Operands.Add(newStringContent);

                            newContent.Add(beginDataContent);
                            newContent.Add(currentElement);
                            /*** END CONTENT STREAM CHANGES ***/

                            // add structure tags to page associating it to the marked content identifier
                            PdfDictionary structElement = new PdfDictionary(pdf);
                            structElement.Elements.Add("/K", new PdfInteger(mcid));
                            structElement.Elements.Add("/Pg", page.Reference.Value);
                            structElement.Elements.Add("/S", new PdfName("/P"));
                            elementRootNode.Elements.Add(structElement);

                            tagList.Add(structElement);

                            // the position in elementRootNode must equal the MCID,
                            // so advance it only when a block was actually tagged
                            mcid++;
                        }
                        else if (currentElement.OpCode.OpCodeName == OpCodeName.Q &&
                            (nextElement == null || nextElement.OpCode.OpCodeName != OpCodeName.EMC))
                        {
                            // keep the original Q (restore graphics state) and then
                            // close the marked content block; EMC takes no operands
                            newContent.Add(currentElement);
                            newContent.Add(OpCodes.OperatorFromName("EMC"));
                        }
                        else
                        {
                            newContent.Add(currentElement);
                        }
                    }

                    page.Contents.ReplaceContent(newContent);

                    parentRootTree.Elements.Add(new PdfInteger(i));
                    parentRootTree.Elements.Add(elementRootNode);
                }


            }

            #region /*** BEGIN STRUCTURE TAGS ***/
            // the PdfStructureTreeRoot class is there in  https://github.com/empira/PDFsharp
            // however it's not available from PdfSharp in Nuget
            PdfDictionary structureRoot = new PdfDictionary(pdf);
            pdf.Internals.AddObject(structureRoot);
            pdf.Internals.Catalog.Elements.Add("/StructTreeRoot", structureRoot);

            // top-level /Document structure element, child of the structure tree root
            PdfDictionary docStructElement = new PdfDictionary(pdf);
            docStructElement.Elements.Add("/P", structureRoot.Reference);
            docStructElement.Elements.Add("/S", new PdfName("/Document"));
            pdf.Internals.AddObject(docStructElement);

            // make every tagged element an indirect object and a child of /Document
            PdfArray tagRefs = new PdfArray(pdf);
            foreach (var item in tagList)
            {
                item.Elements.Add("/P", docStructElement.Reference.Value);
                pdf.Internals.AddObject(item);
                tagRefs.Elements.Add(item);
            }
            pdf.Internals.AddObject(tagRefs);
            docStructElement.Elements.Add("/K", tagRefs.Reference.Value);

            PdfArray docStructTreeArray = new PdfArray(pdf);
            docStructTreeArray.Elements.Add(docStructElement);

            PdfDictionary numDict = new PdfDictionary(pdf);
            numDict.Elements.Add("/Nums", parentRootTree.Reference.Value);
            pdf.Internals.AddObject(numDict);

            structureRoot.Elements.Add("/K", docStructTreeArray);
            structureRoot.Elements.Add("/ParentTree", numDict.Reference);
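            // Count() is the LINQ extension method (needs using System.Linq);
            // parentRootTree.Elements.Count returns the same number without LINQ
            structureRoot.Elements.Add("/ParentTreeNextKey", new PdfInteger(parentRootTree.Count() + 1));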
            structureRoot.Elements.Add("/Type", new PdfName("/StructTreeRoot"));

            #endregion /*** END STRUCTURE TAGS ***/

            #endregion /***** END ACCESSIBILITY REQUIREMENTS *****/


        }



Best,
Reas


Posted: Wed Apr 21, 2021 9:19 am

Joined: Wed Apr 21, 2021 8:07 am
Posts: 1
Dear rsoeung

Thanks for the MakeAccessible method. It got me started on PDF/UA accessibility.

But it's pretty hard to determine which tags I still need to add. I use PAC 3 (https://www.access-for-all.ch/ch/pdf-werkstatt.html) to validate my PDF.

It reports 62 content errors that I am trying to fix, but the MakeAccessible method has no effect on those elements. Could you please guide me on how to fix the "Text object not tagged" and "Path object not tagged" errors?

PAC 3 status (Pac3.png): https://drive.google.com/file/d/1rjNMJwYdgVmEF0hXBK21NoapQHutwy_P/view?usp=sharing
PdfFromHtml.pdf: https://drive.google.com/file/d/1_OxJlgRiKeP7iIxmQFjQKcYdrZmziUol/view?usp=sharing


Posted: Sun Apr 25, 2021 12:20 pm

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18
Hi bandersenDK,

I use a simple tool called Pdf File Analyzer (https://www.codeproject.com/Articles/450254/PDF-File-Analyzer-With-Csharp-Parsing-Classes-Vers) to analyze the streams and contents of a PDF. The "Text object not tagged" and "Path object not tagged" errors refer to the content stream inside a form XObject. The sample code only rewrites the page's own content stream, so you'd have to modify it to iterate through the form's content as well and add tags appropriately.


-rsoeung
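
As a starting point for that modification, here is a minimal sketch of how the form XObjects of a page could be located and parsed with PDFsharp. It is not part of the original sample. The class and method names (FormXObjectWalker, ReadFormContents) are made up for illustration, it assumes the forms are listed directly under the page's /Resources -> /XObject dictionary, and it does not follow forms nested inside other forms. It only reads the content; writing tagged content back and registering it in the structure tree is left open.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;

public static class FormXObjectWalker
{
    // Returns the parsed content stream of every form XObject that the page
    // references directly through /Resources -> /XObject.
    // Assumption: nested forms (a form that draws another form) are not followed.
    public static List<CSequence> ReadFormContents(PdfPage page)
    {
        var result = new List<CSequence>();

        PdfDictionary resources = page.Elements.GetDictionary("/Resources");
        PdfDictionary xObjects = resources?.Elements.GetDictionary("/XObject");
        if (xObjects == null)
            return result;

        foreach (KeyValuePair<string, PdfItem> entry in xObjects.Elements)
        {
            // entries are normally indirect references; resolve them first
            PdfItem item = entry.Value;
            if (item is PdfReference reference)
                item = reference.Value;

            var xObject = item as PdfDictionary;
            if (xObject == null || xObject.Stream == null)
                continue;

            // image XObjects have /Subtype /Image; only forms carry operators
            if (xObject.Elements.GetName("/Subtype") != "/Form")
                continue;

            // UnfilteredValue decodes the stream (e.g. Flate) before parsing
            byte[] bytes = xObject.Stream.UnfilteredValue;
            result.Add(ContentReader.ReadContent(bytes));
        }
        return result;
    }
}
```

Each returned CSequence can be walked with the same loop that MakeAccessible uses for page content. Whether tagging inside the form is enough to clear the PAC 3 errors has not been confirmed in this thread.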


Posted: Sat Jan 22, 2022 3:27 am
Supporter

Joined: Sat Oct 18, 2008 4:15 pm
Posts: 50
Hi rsoeung

Thanks for your sample. I was able to make it compile after replacing parentRootTree.Count with parentRootTree.Elements.Count.
It does not work on a PDF that was just created (not yet saved): it fails on ContentReader.ReadContent(page).
And when I save the PDF first and then make it accessible, the resulting PDF is no good. The first page is completely gone, and some graphs are gone or moved.

Is there a way to add tags while I create the document?
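
One possible cause of content vanishing or moving after a content stream has been rewritten is that the save/restore operators (q/Q) or the marked content operators (BDC or BMC/EMC) no longer pair up. That is a guess, not a diagnosis of the file described above. A quick way to test the guess is to dump the rewritten content stream as text and count the operators. The sketch below uses only the .NET base library; ContentBalance and Check are hypothetical names, and the tokenizer is deliberately naive (it assumes whitespace-separated operators and no string operand that happens to contain these tokens).

```csharp
using System;

public static class ContentBalance
{
    // Counts how far q/Q and BMC|BDC/EMC are out of balance in a content
    // stream given as text. Assumption: operators are whitespace-separated
    // and no string operand "( ... )" contains one of these tokens; use a
    // real content stream parser for anything beyond a quick check.
    public static (int GraphicsState, int MarkedContent) Check(string content)
    {
        int graphicsState = 0;   // +1 for q, -1 for Q
        int markedContent = 0;   // +1 for BMC/BDC, -1 for EMC

        string[] tokens = content.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            switch (token)
            {
                case "q": graphicsState++; break;
                case "Q": graphicsState--; break;
                case "BMC":
                case "BDC": markedContent++; break;
                case "EMC": markedContent--; break;
            }
        }
        return (graphicsState, markedContent);
    }
}
```

A balanced stream such as "/Span <</MCID 0>> BDC q BT ET Q EMC" gives (0, 0). If a Q is accidentally written as q, the first number becomes positive, which means graphics state (clipping, transformation matrix) leaks into everything drawn afterwards.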


Posted: Tue Aug 23, 2022 6:24 pm

Joined: Tue Aug 23, 2022 6:11 pm
Posts: 1
Hi,

We are trying to create an accessible PDF using PDFSharp. But contrary to the example above (which seems to add accessibility tags in a second pass) I would like to write the accessibility tags along with the actual content.

Also, I would like to have the PDF validated with the PAC3 accessibility validation tool: https://www.access-for-all.ch/en/pdf-ac ... ecker.html

So far, it seems that I have managed to write correct(?) BDC / EMC markers to the content. Also I have managed to write a correct tag structure that is displayed in Acrobat. However, tag & content elements seem to be "unconnected". Also PAC3 complains about non-tagged text objects.

Could someone advise on how to do the tag generation correctly using PDFSharp?

Thank you,

Andi
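
For reference, the PDF specification connects a marked content sequence to the structure tree in both directions. The structure element points at the content through /K (the MCID) and /Pg (the page). In the other direction, the page's /StructParents integer is a key into the /ParentTree number tree of the structure tree root, and the value stored under that key is an array whose entry at index MCID refers back to the structure element. If either direction is missing, a viewer can show tags and content that look "unconnected". The following sketch checks those links with PDFsharp. TagLinkCheck and IsConnected are hypothetical names, and the code assumes a flat /Nums array (no /Kids) and a single integer /K, as in the sample earlier in this thread. It does not verify /Pg.

```csharp
using PdfSharp.Pdf;

public static class TagLinkCheck
{
    // Returns true when the parent tree leads from (page, mcid) back to
    // structElement and structElement names the same MCID in /K.
    // Assumptions: /ParentTree is a flat /Nums array (no /Kids) and /K is a
    // single integer; /Pg is not checked here.
    public static bool IsConnected(PdfDocument pdf, PdfPage page, int mcid,
                                   PdfDictionary structElement)
    {
        // 1. structure element -> content: /K must hold the MCID
        var k = structElement.Elements["/K"] as PdfInteger;
        if (k == null || k.Value != mcid)
            return false;

        // 2. page -> key into the parent tree
        var structParents = page.Elements["/StructParents"] as PdfInteger;
        if (structParents == null)
            return false;

        // 3. parent tree -> array of structure elements indexed by MCID
        PdfDictionary root =
            pdf.Internals.Catalog.Elements.GetDictionary("/StructTreeRoot");
        PdfArray nums =
            root?.Elements.GetDictionary("/ParentTree")?.Elements.GetArray("/Nums");
        if (nums == null)
            return false;

        // /Nums is a list of (key, value) pairs
        for (int i = 0; i + 1 < nums.Elements.Count; i += 2)
        {
            if (nums.Elements.GetInteger(i) != structParents.Value)
                continue;

            PdfArray parents = nums.Elements.GetArray(i + 1);
            return parents != null
                && mcid < parents.Elements.Count
                && ReferenceEquals(parents.Elements.GetObject(mcid), structElement);
        }
        return false;
    }
}
```

Running such a check for every MCID written to a page is one way to narrow down whether the missing link is on the content side or on the structure tree side. It is a diagnostic aid under the stated assumptions, not a PDF/UA validator.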


Posted: Tue Aug 23, 2022 6:34 pm
Supporter

Joined: Sat Oct 18, 2008 4:15 pm
Posts: 50
Good luck, my friend. I ended up going the HTML route to get accessibility :(
We are dropping PDFSharp and leaving it to the user to print to PDF if they need it for some reason.
But please do post here if you figure something out.

