PDFsharp & MigraDoc Foundation http://forum.pdfsharp.com/ |
May I add accessibility tags to PDF with PDFSharp? http://forum.pdfsharp.com/viewtopic.php?f=2&t=3948 |
Author: | hoop22 [ Fri Apr 26, 2019 11:20 am ] |
Post subject: | May I add accessibility tags to PDF with PDFSharp? |
Hi, I'm just wondering whether I can add accessibility tags to a PDF with PDFsharp now. Someone asked the same question years ago, and the answer was that this feature was on the wish list at the time. Is it available now? If so, could you give me some tips, please? Thanks a lot |
Author: | rsoeung [ Sat Jan 23, 2021 2:01 am ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Here's a small example. This should get you started.
Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

class Program
{
    static void Main(string[] args)
    {
        string doc1 = @"C:\doc1.pdf";
        CreatePdf(doc1);

        // Reopen the saved file and tag it in a second pass
        using (var doc = PdfReader.Open(doc1, PdfDocumentOpenMode.Modify))
        {
            MakeAccessible(doc);
            doc.Save(doc1);
        }

        // Open the result in the default viewer
        Process.Start(doc1);
    }

    public static void CreatePdf(string filename)
    {
        // Create a new PDF document
        PdfDocument document = new PdfDocument();
        document.Info.Title = "Created with PDFsharp";

        // Create an empty page
        PdfPage page = document.AddPage();
        AddTextToPage(page);

        // Save the document
        document.Save(filename);
    }

    public static void AddTextToPage(PdfPage page)
    {
        // Get an XGraphics object for drawing
        XGraphics gfx = XGraphics.FromPdfPage(page);

        // Create a font
        XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic,
            new XPdfFontOptions(PdfFontEncoding.WinAnsi));

        // Draw the text
        gfx.DrawString("Hello, World!", font, XBrushes.Black,
            new XRect(0, 0, page.Width, page.Height), XStringFormats.Center);
    }

    // Please refer to the PDF spec for everything a content stream can contain:
    // https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
    public static void MakeAccessible(PdfDocument pdf)
    {
        // If you don't have Adobe Acrobat to run an accessibility check, then
        // first download the free PDF Accessibility Checker
        // (https://www.access-for-all.ch/en/pdf-accessibility-checker.html)
        // or open an online accessibility checker in a browser.

        // There are certain requirements the document has to meet
        // to make it near 100% compliant.

        // Set your PDF version to 1.7
        pdf.Version = 17;
        // Set the author of the document
        pdf.Info.Author = "Me, Myself, and I";
        // Date the document was created
        pdf.Info.CreationDate = DateTime.Now;
        // The app/system that created the document
        pdf.Info.Creator = "My PDF Generator";
        // Subject this document is about
        pdf.Info.Subject = "PDF Accessibility Test";

        #region /***** BEGIN ACCESSIBILITY REQUIREMENTS *****/
        pdf.Language = "en-US";
        pdf.Info.Title = "PDF Accessibility";
        pdf.ViewerPreferences.DisplayDocTitle = true;
        pdf.ViewerPreferences.Direction = PdfReadingDirection.LeftToRight;

        // Flag indicating the document conforms to Tagged PDF conventions
        PdfCatalog catalog = pdf.Internals.Catalog;
        PdfDictionary markInfo = new PdfDictionary(pdf);
        catalog.Elements.Add("/MarkInfo", markInfo);
        markInfo.Elements.Add("/Marked", PdfBoolean.True);

        // Add outlines (bookmarks):
        // add a root bookmark that goes to the first page
        pdf.Outlines.Add(pdf.Info.Title, pdf.Pages[0], true, PdfOutlineStyle.Bold);

        PdfArray parentRootTree = new PdfArray(pdf);
        pdf.Internals.AddObject(parentRootTree);
        List<PdfDictionary> tagList = new List<PdfDictionary>();

        // Create tags for the content elements within each page
        for (int i = 0; i < pdf.Pages.Count; i++)
        {
            PdfPage page = pdf.Pages[i];

            // Tag the page as a structural parent and assign a unique integer identifier
            page.Elements.Add("/StructParents", new PdfInteger(i));

            // Specify a tab order. The possible values are R (row order),
            // C (column order), and S (structure order)
            page.Elements.Add("/Tabs", new PdfName("/S"));

            // Iterate through each content item, assign a marked content id
            // and give it a structure type (Span, P, Table, TH, TR, etc.)
            PdfContents contents = page.Contents;
            if (contents != null && contents.Elements.Count > 0)
            {
                PdfArray elementRootNode = new PdfArray(pdf);
                pdf.Internals.AddObject(elementRootNode);

                CSequence pageContents = ContentReader.ReadContent(page);
                int numOfContentElements = pageContents.Count;
                int mcid = 0; // next marked content identifier
                CString newStringContent = null;
                CSequence newContent = new CSequence();

                // Assign each q ... Q block an MCID and put it into a
                // marked content block
                for (int j = 0; j < numOfContentElements; j++)
                {
                    COperator currentElement = (COperator)pageContents[j];
                    COperator previousElement = j > 0 ? (COperator)pageContents[j - 1] : null;
                    COperator nextElement = j + 1 < numOfContentElements ? (COperator)pageContents[j + 1] : null;

                    // Mark the content if it is not already marked
                    if (currentElement.OpCode.OpCodeName == OpCodeName.q
                        && (previousElement == null
                            || (previousElement.OpCode.OpCodeName != OpCodeName.BMC
                                && previousElement.OpCode.OpCodeName != OpCodeName.BDC)))
                    {
                        /*** BEGIN CONTENT STREAM CHANGES ***/
                        String elementStructureType = "/Span";
                        COperator beginDataContent = OpCodes.OperatorFromName("BDC");

                        newStringContent = new CString();
                        newStringContent.Value = $"{elementStructureType} ";
                        newStringContent.CStringType = CStringType.Dictionary;
                        beginDataContent.Operands.Add(newStringContent);

                        newStringContent = new CString();
                        newStringContent.Value = $"<</MCID {mcid}>> ";
                        newStringContent.CStringType = CStringType.Dictionary;
                        beginDataContent.Operands.Add(newStringContent);

                        newContent.Add(beginDataContent);
                        newContent.Add(currentElement);
                        /*** END CONTENT STREAM CHANGES ***/

                        // Add a structure element for the page and associate it
                        // with the marked content identifier
                        PdfDictionary structElement = new PdfDictionary(pdf);
                        structElement.Elements.Add("/K", new PdfInteger(mcid));
                        structElement.Elements.Add("/Pg", page.Reference.Value);
                        structElement.Elements.Add("/S", new PdfName("/P"));
                        elementRootNode.Elements.Add(structElement);
                        tagList.Add(structElement);

                        // Advance only when a block was tagged, so that index n of
                        // elementRootNode is the parent of MCID n (the parent tree
                        // array of a page is indexed by MCID)
                        mcid++;
                    }
                    else if (currentElement.OpCode.OpCodeName == OpCodeName.Q
                        && (nextElement == null || nextElement.OpCode.OpCodeName != OpCodeName.EMC))
                    {
                        // Keep the original Q so every q stays balanced, then
                        // close the marked content block
                        newContent.Add(currentElement);
                        COperator endDataContent = OpCodes.OperatorFromName("EMC");
                        newContent.Add(endDataContent);
                    }
                    else
                    {
                        newContent.Add(currentElement);
                    }
                }

                page.Contents.ReplaceContent(newContent);
                parentRootTree.Elements.Add(new PdfInteger(i));
                parentRootTree.Elements.Add(elementRootNode);
            }
        }

        #region /*** BEGIN STRUCTURE TAGS ***/
        // The PdfStructureTreeRoot class is there in https://github.com/empira/PDFsharp
        // however it's not available from PdfSharp in NuGet
        PdfDictionary structureRoot = new PdfDictionary(pdf);
        pdf.Internals.AddObject(structureRoot);
        pdf.Internals.Catalog.Elements.Add("/StructTreeRoot", structureRoot);

        // Top-level /Document structure element
        PdfDictionary docStructElement = new PdfDictionary(pdf);
        docStructElement.Elements.Add("/P", structureRoot.Reference);
        docStructElement.Elements.Add("/S", new PdfName("/Document"));
        pdf.Internals.AddObject(docStructElement);

        // Make every tag a child of the /Document element
        PdfArray tagRefs = new PdfArray(pdf);
        foreach (var item in tagList)
        {
            item.Elements.Add("/P", docStructElement.Reference.Value);
            pdf.Internals.AddObject(item);
            tagRefs.Elements.Add(item);
        }
        pdf.Internals.AddObject(tagRefs);
        docStructElement.Elements.Add("/K", tagRefs.Reference.Value);

        PdfArray docStructTreeArray = new PdfArray(pdf);
        docStructTreeArray.Elements.Add(docStructElement);

        PdfDictionary numDict = new PdfDictionary(pdf);
        numDict.Elements.Add("/Nums", parentRootTree.Reference.Value);
        pdf.Internals.AddObject(numDict);

        structureRoot.Elements.Add("/K", docStructTreeArray);
        structureRoot.Elements.Add("/ParentTree", numDict.Reference);
        // PdfArray has no Count member of its own; count its Elements instead
        structureRoot.Elements.Add("/ParentTreeNextKey", new PdfInteger(parentRootTree.Elements.Count + 1));
        structureRoot.Elements.Add("/Type", new PdfName("/StructTreeRoot"));
        #endregion /*** END STRUCTURE TAGS ***/
        #endregion /***** END ACCESSIBILITY REQUIREMENTS *****/
    }
}
Best, Reas |
Author: | bandersenDK [ Wed Apr 21, 2021 9:19 am ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Dear rsoeung, Thanks for the MakeAccessible method. It got me started on PDF/UA accessibility, but it's pretty hard to determine which tags I still need to add. I use PAC 3 (https://www.access-for-all.ch/ch/pdf-werkstatt.html) to validate my PDF. It reports 62 content errors that I am trying to fix, but the MakeAccessible method has no effect on those elements. Could you please guide me on how to fix the "Text object not tagged" and "Path object not tagged" errors? PAC 3 status (Pac3.png): https://drive.google.com/file/d/1rjNMJwYdgVmEF0hXBK21NoapQHutwy_P/view?usp=sharing PdfFromHtml.pdf: https://drive.google.com/file/d/1_OxJlgRiKeP7iIxmQFjQKcYdrZmziUol/view?usp=sharing |
Author: | rsoeung [ Sun Apr 25, 2021 12:20 pm ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Hi bandersenDK, I use a simple tool called PDF File Analyzer (https://www.codeproject.com/Articles/450254/PDF-File-Analyzer-With-Csharp-Parsing-Classes-Vers) to analyze the streams and contents of a PDF. The "Text object not tagged" and "Path object not tagged" errors refer to the content stream within a form XObject. You'd have to modify the sample code to iterate through that content as well and add tags appropriately. -rsoeung |
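To see which text a checker would flag in a given stream, it can help to scan the decoded content for text-showing operators (Tj, TJ, ', ") that sit outside any BDC/BMC ... EMC block. The following is only a rough sketch and not PDFsharp API: UntaggedTextScanner and CountUntaggedTextShows are hypothetical names, and the input is assumed to be an already decompressed content stream (of a page or of a form XObject) as plain text. The tokenizer is deliberately naive: it skips literal strings in parentheses, but it assumes operators are separated by whitespace and ignores hex strings, comments and inline images.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of PDFsharp.
public static class UntaggedTextScanner
{
    // Splits a decoded content stream into whitespace-separated tokens and
    // drops literal strings "( ... )", so operator names that appear inside
    // text are not mistaken for operators.
    public static List<string> Tokenize(string content)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        int depth = 0; // nesting depth of literal-string parentheses

        for (int i = 0; i < content.Length; i++)
        {
            char c = content[i];
            if (depth > 0)
            {
                if (c == '\\') i++;          // skip the escaped character
                else if (c == '(') depth++;
                else if (c == ')') depth--;
                continue;
            }
            if (c == '(') { Flush(tokens, current); depth = 1; continue; }
            if (char.IsWhiteSpace(c)) { Flush(tokens, current); continue; }
            current.Append(c);
        }
        Flush(tokens, current);
        return tokens;
    }

    private static void Flush(List<string> tokens, StringBuilder current)
    {
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }

    // Counts text-showing operators that occur while no marked-content
    // block (BDC/BMC ... EMC) is open.
    public static int CountUntaggedTextShows(string content)
    {
        int markedDepth = 0;
        int untagged = 0;
        foreach (string token in Tokenize(content))
        {
            if (token == "BDC" || token == "BMC") markedDepth++;
            else if (token == "EMC" && markedDepth > 0) markedDepth--;
            else if (markedDepth == 0 &&
                     (token == "Tj" || token == "TJ" || token == "'" || token == "\""))
                untagged++;
        }
        return untagged;
    }
}
```

Running the count on each form XObject stream as well as on the page stream shows where the untagged text actually lives, which is the content the sample's loop never visits.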
Author: | gastan [ Sat Jan 22, 2022 3:27 am ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Hi rsoeung, thanks for your sample. I was able to make it compile after replacing parentRootTree.Count with parentRootTree.Elements.Count. It does not work on a PDF that was just created (not saved): it fails on ContentReader.ReadContent(page). And when I save the PDF and then make it accessible, the resulting PDF is no good: the first page is completely gone, and some graphs are gone or moved. Is there maybe a way to add tags as I create the document? |
Author: | superandi [ Tue Aug 23, 2022 6:24 pm ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Hi, We are trying to create an accessible PDF using PDFSharp. But contrary to the example above (which seems to add accessibility tags in a second pass) I would like to write the accessibility tags along with the actual content. Also, I would like to have the PDF validated with the PAC3 accessibility validation tool: https://www.access-for-all.ch/en/pdf-ac ... ecker.html So far, it seems that I have managed to write correct(?) BDC / EMC markers to the content. Also I have managed to write a correct tag structure that is displayed in Acrobat. However, tag & content elements seem to be "unconnected". Also PAC3 complains about non-tagged text objects. Could someone consult on how to do the correct tag generation using PDFSharp? Thank you, Andi |
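For reference, a tag and a piece of content only count as connected when three things agree (PDF 32000-1, section 14.7): the MCID in the BDC operand of the content stream, the /K (together with /Pg) of the structure element, and the page's /StructParents key into the /ParentTree, whose array for that page is indexed by MCID. The sketch below models that bookkeeping with plain C# types so the rule can be checked in isolation. StructElem, TagModel and all of their members are hypothetical names for illustration and not PDFsharp API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a structure element dictionary.
public sealed class StructElem
{
    public string Type { get; set; }    // value of /S, for example "P"
    public int PageIndex { get; set; }  // stands in for the /Pg page reference
    public int Mcid { get; set; }       // value of /K when it is a single MCID
}

public static class TagModel
{
    // Wraps content operators in a marked-content block carrying the MCID.
    public static string WrapInMarkedContent(string tag, int mcid, string content)
    {
        return "/" + tag + " <</MCID " + mcid + ">> BDC\n" + content + "\nEMC\n";
    }

    // Builds one array per page key; slot n holds the parent of MCID n.
    public static Dictionary<int, StructElem[]> BuildParentTree(IEnumerable<StructElem> elems)
    {
        var tree = new Dictionary<int, StructElem[]>();
        foreach (var group in elems.GroupBy(e => e.PageIndex))
        {
            var slots = new StructElem[group.Max(e => e.Mcid) + 1];
            foreach (var elem in group)
            {
                if (slots[elem.Mcid] != null)
                    throw new InvalidOperationException(
                        "Duplicate MCID " + elem.Mcid + " on page " + group.Key);
                slots[elem.Mcid] = elem;
            }
            tree[group.Key] = slots;
        }
        return tree;
    }

    // Returns the MCIDs used in a page's content that have no parent
    // in the parent tree entry selected by the page's /StructParents value.
    public static List<int> FindUnconnected(
        Dictionary<int, StructElem[]> tree, int structParents, IEnumerable<int> mcidsInContent)
    {
        var missing = new List<int>();
        StructElem[] slots;
        tree.TryGetValue(structParents, out slots);
        foreach (int mcid in mcidsInContent)
        {
            if (slots == null || mcid < 0 || mcid >= slots.Length || slots[mcid] == null)
                missing.Add(mcid);
        }
        return missing;
    }
}
```

If Acrobat shows the tag tree but the tags look empty, the usual suspects under this model are a missing /Pg on the structure element, a missing /StructParents on the page, or a parent tree array whose positions do not line up with the MCIDs written to the content stream.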
Author: | gastan [ Tue Aug 23, 2022 6:34 pm ] |
Post subject: | Re: May I add accessibility tags to PDF with PDFSharp? |
Good luck, my friend. I ended up going the HTML route to get accessibility. We are dropping PDFsharp and just leaving it to the user to print to PDF if he needs it for some reason. But please do post here if you figure something out. |
All times are UTC |