PDFsharp & MigraDoc Foundation • View topic - May I add accessibility tags to PDF with PDFSharp?

All times are UTC

May I add accessibility tags to PDF with PDFSharp?

Moderator: Stefan Lange

hoop22

Post subject: May I add accessibility tags to PDF with PDFSharp?

Posted: Fri Apr 26, 2019 11:20 am

Joined: Fri Apr 26, 2019 11:15 am
Posts: 1

Hi

I'm wondering whether I can add accessibility tags to a PDF with PDFsharp now. Someone asked the same question years ago, and the answer then was that this feature was on the wish list. Is it available now? If so, could you give me some tips, please?

Thanks a lot

rsoeung

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Sat Jan 23, 2021 2:01 am

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18

Here's a small example. This should get you started.

Code:

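        // Namespaces used by this sample (PDFsharp 1.50-style API):
        // System, System.Collections.Generic, System.Diagnostics,
        // System.Linq (for parentRootTree.Count()),
        // PdfSharp.Drawing, PdfSharp.Pdf, PdfSharp.Pdf.Advanced,
        // PdfSharp.Pdf.Content, PdfSharp.Pdf.Content.Objects, PdfSharp.Pdf.IO

        static void Main(string[] args)
        {
            string doc1 = @"C:\doc1.pdf";

            CreatePdf(doc1);

            // Reopen the saved file in Modify mode so its content streams can be parsed
            using (var doc = PdfReader.Open(doc1, PdfDocumentOpenMode.Modify))
            {
                MakeAccessible(doc);

                doc.Save(doc1);
                doc.Close();
            }

            // Open the result in the default PDF viewer
            Process.Start(doc1);
        }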

        public static void CreatePdf(string filename)
        {
            // Create a new PDF document
            PdfDocument document = new PdfDocument();
            document.Info.Title = "Created with PDFsharp";

            // Create an empty page
            PdfPage page = document.AddPage();

            AddTextToPage(page);

            // Save the document (the viewer is started later, from Main)
            document.Save(filename);
        }

        public static void AddTextToPage(PdfPage page)
        {
            // Get an XGraphics object for drawing
            XGraphics gfx = XGraphics.FromPdfPage(page);

            // Create a font
            XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic, new XPdfFontOptions(PdfFontEncoding.WinAnsi));

            // Draw the text
            gfx.DrawString("Hello, World!", font, XBrushes.Black,
              new XRect(0, 0, page.Width, page.Height),
              XStringFormats.Center);
        }

        // See the PDF 1.7 specification for everything a content stream can contain:
        // https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
        public static void MakeAccessible(PdfDocument pdf)
        {
            // if you don't have Adobe Acrobat to run accessibility check then
            // first download the Free Accessibility Checker (https://www.access-for-all.ch/en/pdf-accessibility-checker.html)
            // or open up an online accessibility checker in a browser

            // There are certain requirements the document has to meet
            // to make it near 100% compliant

            // Set your PDF version to 1.7
            pdf.Version = 17;

            // Set the author of the document
            pdf.Info.Author = "Me, Myself, and I";

            // Date document was created
            pdf.Info.CreationDate = DateTime.Now;

            // The app/system that created the document
            pdf.Info.Creator = "My PDF Generator";

            // subject this document is about
            pdf.Info.Subject = "PDF Accessibility Test";

            #region /***** BEGIN ACCESSIBILITY REQUIREMENTS *****/

            pdf.Language = "en-US";
            pdf.Info.Title = "PDF Accessibility";

            pdf.ViewerPreferences.DisplayDocTitle = true;
            pdf.ViewerPreferences.Direction = PdfReadingDirection.LeftToRight;

            // flag indicating the document conforms to Tagged PDF conventions
            PdfCatalog catalog = pdf.Internals.Catalog;
            PdfDictionary markInfo = new PdfDictionary(pdf);
            catalog.Elements.Add("/MarkInfo", markInfo);
            markInfo.Elements.Add("/Marked", PdfBoolean.True);

            // add outlines (bookmarks)
            // add root bookmark and set to first page as goto action
            pdf.Outlines.Add(pdf.Info.Title, pdf.Pages[0], true, PdfOutlineStyle.Bold);

            PdfArray parentRootTree = new PdfArray(pdf);
            pdf.Internals.AddObject(parentRootTree);

            List<PdfDictionary> tagList = new List<PdfDictionary>();

            // create tags for content elements within each page
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                PdfPage page = pdf.Pages[i];

                // tag page as a structured parent object and assign a unique integer identifier
                page.Elements.Add("/StructParents", new PdfInteger(i));

                // specify a tab order. 
                // The possible values are R(row order), C(column order),
                // and S (structure order)
                page.Elements.Add("/Tabs", new PdfName("/S"));

                // iterate thru each content item and assign a 
                // marked content id and give it a structure type (Span, P, Table, TH, TR, etc.)
                PdfContents contents = page.Contents;
                if (contents != null && contents.Elements.Count > 0)
                {
                    PdfArray elementRootNode = new PdfArray(pdf);
                    pdf.Internals.AddObject(elementRootNode);

                    CSequence pageContents = ContentReader.ReadContent(page);
                    int numOfContentElements = pageContents.Count;

                    int mcid = 0; //starting marked content identifier
                    COperator currentElement = null;
                    COperator previousElement = null;
                    COperator nextElement = null;
                    CString newStringContent = null;
                    CSequence newContent = new CSequence();

                    // assign each content elements a MCID and put it into a 
                    // marked content block
                    for (int j = 0; j < numOfContentElements; j++)
                    {
                        currentElement = (COperator)pageContents[j];

                        // neighbouring operators; null at either end of the stream
                        previousElement = j > 0 ? (COperator)pageContents[j - 1] : null;
                        nextElement = j + 1 < numOfContentElements ? (COperator)pageContents[j + 1] : null;

                        // open a marked-content block before each q that is not already marked
                        // (nested q/Q pairs give nested MCID blocks, which validators may reject)
                        if (currentElement.OpCode.OpCodeName == OpCodeName.q &&
                            (previousElement == null ||
                             (previousElement.OpCode.OpCodeName != OpCodeName.BMC &&
                              previousElement.OpCode.OpCodeName != OpCodeName.BDC)))
                        {
                            /*** BEGIN CONTENT STREAM CHANGES ***/
                            String elementStructureType = "/Span";

                            COperator beginDataContent = OpCodes.OperatorFromName("BDC");
                            newStringContent = new CString();
                            newStringContent.Value = $"{elementStructureType} ";
                            newStringContent.CStringType = CStringType.Dictionary;
                            beginDataContent.Operands.Add(newStringContent);

                            newStringContent = new CString();
                            newStringContent.Value = $"<</MCID {mcid}>> ";
                            newStringContent.CStringType = CStringType.Dictionary;
                            beginDataContent.Operands.Add(newStringContent);

                            newContent.Add(beginDataContent);
                            newContent.Add(currentElement);
                            /*** END CONTENT STREAM CHANGES ***/

                            // add a structure element that points back at the marked content via its MCID
                            PdfDictionary structElement = new PdfDictionary(pdf);
                            structElement.Elements.Add("/K", new PdfInteger(mcid));
                            structElement.Elements.Add("/Pg", page.Reference.Value);
                            structElement.Elements.Add("/S", new PdfName("/P"));
                            elementRootNode.Elements.Add(structElement);

                            tagList.Add(structElement);

                            // one MCID per marked block (0, 1, 2, ...): the MCID doubles as the
                            // index of structElement in elementRootNode, the page's ParentTree array
                            mcid++;
                        }
                        else if (currentElement.OpCode.OpCodeName == OpCodeName.Q &&
                            (nextElement == null || nextElement.OpCode.OpCodeName != OpCodeName.EMC))
                        {
                            // keep the Q (restore graphics state), then close the marked-content
                            // block; EMC takes no operands
                            newContent.Add(currentElement);
                            newContent.Add(OpCodes.OperatorFromName("EMC"));
                        }
                        else
                        {
                            newContent.Add(currentElement);
                        }
                    }

                    page.Contents.ReplaceContent(newContent);

                    parentRootTree.Elements.Add(new PdfInteger(i));
                    parentRootTree.Elements.Add(elementRootNode);
                }


            }

            #region /*** BEGIN STRUCTURE TAGS ***/
            // the PdfStructureTreeRoot class is there in  https://github.com/empira/PDFsharp
            // however it's not available from PdfSharp in Nuget
            PdfDictionary structureRoot = new PdfDictionary(pdf);
            pdf.Internals.AddObject(structureRoot);
            pdf.Internals.Catalog.Elements.Add("/StructTreeRoot", structureRoot);

            // top-level /Document structure element, child of the structure tree root
            PdfDictionary docStructElement = new PdfDictionary(pdf);
            docStructElement.Elements.Add("/P", structureRoot.Reference);
            docStructElement.Elements.Add("/S", new PdfName("/Document"));
            pdf.Internals.AddObject(docStructElement);

            // make every tagged element an indirect object and a child (/K) of the /Document element
            PdfArray tagRefs = new PdfArray(pdf);
            foreach (var item in tagList)
            {
                item.Elements.Add("/P", docStructElement.Reference.Value);
                pdf.Internals.AddObject(item);
                tagRefs.Elements.Add(item);
            }
            pdf.Internals.AddObject(tagRefs);
            docStructElement.Elements.Add("/K", tagRefs.Reference.Value);

            PdfArray docStructTreeArray = new PdfArray(pdf);
            docStructTreeArray.Elements.Add(docStructElement);

            PdfDictionary numDict = new PdfDictionary(pdf);
            numDict.Elements.Add("/Nums", parentRootTree.Reference.Value);
            pdf.Internals.AddObject(numDict);

            structureRoot.Elements.Add("/K", docStructTreeArray);
            structureRoot.Elements.Add("/ParentTree", numDict.Reference);
            structureRoot.Elements.Add("/ParentTreeNextKey", new PdfInteger(parentRootTree.Count() + 1));
            structureRoot.Elements.Add("/Type", new PdfName("/StructTreeRoot"));

            #endregion /*** END STRUCTURE TAGS ***/

            #endregion /***** END ACCESSIBILITY REQUIREMENTS *****/


        }

Best,
Reas

bandersenDK

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Wed Apr 21, 2021 9:19 am

Joined: Wed Apr 21, 2021 8:07 am
Posts: 1

Dear rsoeung

Thanks for the MakeAccessible method. It got me started on PDF/UA accessibility.

But it's pretty hard to work out which tags I still need to add. I use PAC 3 (https://www.access-for-all.ch/ch/pdf-werkstatt.html) to validate my PDF.

It reports 62 content errors that I'm trying to fix, but the MakeAccessible method has no effect on those elements. Could you please guide me on how to fix the "Text object not tagged" and "Path object not tagged" errors?

PAC 3 status (Pac3.png): https://drive.google.com/file/d/1rjNMJwYdgVmEF0hXBK21NoapQHutwy_P/view?usp=sharing
PdfFromHtml.pdf: https://drive.google.com/file/d/1_OxJlgRiKeP7iIxmQFjQKcYdrZmziUol/view?usp=sharing

rsoeung

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Sun Apr 25, 2021 12:20 pm

Joined: Sat Jan 09, 2021 7:09 am
Posts: 18

Hi bandersenDK,

I use a simple tool called PDF File Analyzer (https://www.codeproject.com/Articles/450254/PDF-File-Analyzer-With-Csharp-Parsing-Classes-Vers) to analyze the streams and contents of a PDF. The "Text object not tagged" and "Path object not tagged" errors refer to the content stream inside a form XObject. You'd have to modify the sample code to iterate through that content as well and add tags appropriately.

-rsoeung
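
For anyone following along, here is a rough sketch of the iteration described above. It assumes the PDFsharp 1.50-style API used in the sample; the calls are written from memory and should be checked against the version you use. `XObjectScan` and `FindFormXObjects` are made-up names, not part of PDFsharp. The sketch only locates form XObjects on a page and parses their content streams. Writing the tagged content back is not shown.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;

public static class XObjectScan
{
    // Hypothetical helper: yields each form XObject listed in the page's
    // /Resources /XObject dictionary together with its parsed content stream.
    public static IEnumerable<KeyValuePair<PdfDictionary, CSequence>> FindFormXObjects(PdfPage page)
    {
        PdfDictionary resources = page.Elements.GetDictionary("/Resources");
        PdfDictionary xObjects = resources?.Elements.GetDictionary("/XObject");
        if (xObjects == null)
            yield break;

        foreach (PdfItem item in xObjects.Elements.Values)
        {
            // Entries are normally indirect references; resolve them first.
            PdfDictionary xObject =
                (item as PdfReference)?.Value as PdfDictionary ?? item as PdfDictionary;
            if (xObject?.Stream == null)
                continue;

            // Only form XObjects carry a content stream; skip images and others.
            if (xObject.Elements.GetName("/Subtype") != "/Form")
                continue;

            // Decode the stream and parse it the same way as page content.
            CSequence content = ContentReader.ReadContent(xObject.Stream.UnfilteredValue);
            yield return new KeyValuePair<PdfDictionary, CSequence>(xObject, content);
        }
    }
}
```

Each returned CSequence could then go through the same BDC/EMC loop as the page content in MakeAccessible. Form XObjects can contain further XObjects in their own /Resources, so a complete version would probably need to recurse.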

gastan

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Sat Jan 22, 2022 3:27 am

Supporter

Joined: Sat Oct 18, 2008 4:15 pm
Posts: 50

Hi rsoeung,

Thanks for your sample. I was able to make it compile by replacing parentRootTree.Count with parentRootTree.Elements.Count.
It does not work on a PDF that was just created (not yet saved): it fails on ContentReader.ReadContent(page).
And when I save the PDF first and then make it accessible, the resulting PDF is no good: the first page is completely gone, and some graphs are gone or moved.

Is there maybe a way to add tags as I create the document?

superandi

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Tue Aug 23, 2022 6:24 pm

Joined: Tue Aug 23, 2022 6:11 pm
Posts: 1

Hi,

We are trying to create an accessible PDF using PDFSharp. But contrary to the example above (which seems to add accessibility tags in a second pass) I would like to write the accessibility tags along with the actual content.

Also, I would like to have the PDF validated with the PAC3 accessibility validation tool: https://www.access-for-all.ch/en/pdf-ac ... ecker.html

So far, it seems that I have managed to write correct(?) BDC / EMC markers to the content. Also I have managed to write a correct tag structure that is displayed in Acrobat. However, tag & content elements seem to be "unconnected". Also PAC3 complains about non-tagged text objects.

Could someone advise on how to do the tag generation correctly using PDFsharp?

Thank you,

Andi
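
For reference, this is how the pieces are supposed to link up according to the PDF 1.7 specification (ISO 32000-1, section 14.7). A marked-content sequence in the content stream carries an /MCID. The structure element points back at it through /K and /Pg. The page's /StructParents number selects an array in the /ParentTree, and entry n of that array is the structure element that owns MCID n. If any of these links is missing or numbered differently, tags and content can look "unconnected". The sketch below is plain C# with no PDFsharp calls; `TagPlan` and `Wrap` are made-up names and the text runs are invented sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TagPlan
{
    // Wraps one run of content-stream operators in a marked-content sequence.
    public static string Wrap(string structureType, int mcid, string operators)
    {
        return $"/{structureType} <</MCID {mcid}>> BDC\n{operators}\nEMC";
    }

    public static void Main()
    {
        // Two invented text runs standing in for real page content.
        string[] runs =
        {
            "BT /F1 20 Tf 72 720 Td (Hello) Tj ET",
            "BT /F1 12 Tf 72 690 Td (World) Tj ET"
        };

        var stream = new StringBuilder();
        var parentTreeEntry = new List<string>();

        for (int mcid = 0; mcid < runs.Length; mcid++)
        {
            stream.AppendLine(Wrap("P", mcid, runs[mcid]));
            // Structure element for this run: /K must equal the MCID used above.
            parentTreeEntry.Add($"<< /Type /StructElem /S /P /K {mcid} /Pg <page ref> >>");
        }

        Console.Write(stream);
        // Index n of this list is the structure element that owns MCID n.
        for (int n = 0; n < parentTreeEntry.Count; n++)
            Console.WriteLine($"ParentTree[{n}] -> {parentTreeEntry[n]}");
    }
}
```

The point of the design is that MCIDs on a page run 0, 1, 2, ... with no gaps, so the same number works as the /K value and as the index into the page's ParentTree array.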

gastan

Post subject: Re: May I add accessibility tags to PDF with PDFSharp?

Posted: Tue Aug 23, 2022 6:34 pm

Supporter

Joined: Sat Oct 18, 2008 4:15 pm
Posts: 50

Good luck, my friend. I ended up going the HTML route to get accessibility.

We are dropping PDFsharp and will just leave it to users to print to PDF if they need it for some reason.
But please do post here if you figure something out.
