PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.com/

Copy Elements from PdfDictionary to a New PDF
http://forum.pdfsharp.com/viewtopic.php?f=2&t=4084
Page 1 of 1

Author:  VanguardAnon [ Sat Jan 18, 2020 1:48 pm ]
Post subject:  Copy Elements from PdfDictionary to a New PDF

Hello,

I am having issues with some PDF file manipulation.
I have PDF files that are generated by clients. Each contains vector data that I need to extract to a separate PDF in order to send it to a printing machine. The original PDF also contains text and images that are irrelevant to the printer; they only give the specs for the artwork.

So far in my research I have found the PdfDictionary elements where the vector data is stored, but if I try to clone them to a new PDF, the resulting PDF is blank.

In the dictionary I found the elements under /Resources /XObject, where they are saved as /Form<key> elements.
So I was hoping to just extract these elements and put them on the new PDF like this:

Code:
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

class Program
{
    static void Main(string[] args)
    {
        string filePath = @"C:\test.pdf";
        string ox = @"C:\output.pdf";

        PdfDocument OriginalDocument = PdfReader.Open(filePath);

        PdfDocument PrintingDocument = new PdfDocument();
        PdfPage PrintingPage = PrintingDocument.AddPage();
        PdfDictionary PrintingDictionary = new PdfDictionary(PrintingDocument);

        // Register the dictionary once, like done in the tutorial ( http://pdfsharp.com/PDFsharp/index.php%3Foption%3Dcom_content%26task%3Dview%26id%3D20%26Itemid%3D50 )
        PrintingDocument.Internals.AddObject(PrintingDictionary);

        foreach (PdfPage page in OriginalDocument.Pages)
        {
            PdfDictionary resource = page.Elements.GetDictionary("/Resources");
            if (resource == null)
                continue;

            // Clone the resource dictionary tree
            PdfItem resourceClone = resource.Clone();

            // Try to add the cloned tree to the new PDF. The indexer is used instead of
            // Add() so that a second source page does not throw on a duplicate key.
            PrintingDictionary.Elements["/Resources"] = resourceClone;
            PrintingPage.Elements["/Resources"] = resourceClone;

            PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
            if (objects == null)
                continue;

            foreach (string itemKey in objects.Elements.Keys)
            {
                Console.WriteLine(itemKey);
                if (itemKey.StartsWith("/Form"))
                {
                    // This is where the vector data I need to copy to the new PDF is
                }
            }
        }
        PrintingDocument.Save(ox);
    }
}

Author:  VanguardAnon [ Mon Jan 20, 2020 1:26 am ]
Post subject:  Re: Copy Elements from PdfDictionary to a New PDF

Hi, just an update: I have been able to get the object stream for the vector data, but now I am not sure how to convert it into a new XObject that I can place on a new PDF.


Code:
if (resource != null)
{
    PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
    if (objects != null)
    {
        foreach (string itemKey in objects.Elements.Keys)
        {
            PdfItem item = objects.Elements[itemKey];
            var reference = item as PdfReference;
            if (reference == null)
                continue;

            PdfDictionary xObject = reference.Value as PdfDictionary;
            // /Subtype is a name object, so it is read with GetName
            if (xObject != null && xObject.Elements.GetName("/Subtype") == "/Form")
            {
                string filter = xObject.Elements.GetName("/Filter");
                Debug.WriteLine("Filter " + filter);

                // Vector data stream (raw bytes, still encoded with the filter above)
                byte[] bytes = xObject.Stream.Value;
                Stream stream = new MemoryStream(bytes);
                //XPdfForm pdfForm = XPdfForm.FromStream(stream);
            }
        }
    }
}
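
A note on the commented-out XPdfForm line, as a rough sketch rather than a solution: as far as I understand, XPdfForm.FromFile and XPdfForm.FromStream expect a complete PDF file, not the raw bytes of a single form XObject, so passing xObject.Stream.Value to them cannot work. What XPdfForm can do is place a whole source page onto a new page, which would still include the unwanted lines. The paths are the placeholder paths from the first post, and the class name is made up for illustration.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

class PlaceWholePage // hypothetical name, for illustration only
{
    static void Main()
    {
        // Placeholder paths, as in the first post.
        XPdfForm form = XPdfForm.FromFile(@"C:\test.pdf");
        form.PageNumber = 1; // 1-based page of the source file to draw

        PdfDocument output = new PdfDocument();
        PdfPage page = output.AddPage();

        // Draw the imported page so that it fills the new page.
        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            gfx.DrawImage(form, new XRect(0, 0, page.Width.Point, page.Height.Point));
        }

        output.Save(@"C:\output.pdf");
    }
}
```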

Author:  (void) [ Mon Jan 20, 2020 5:57 pm ]
Post subject:  Re: Copy Elements from PdfDictionary to a New PDF

I doubt you will see anything on the imported page if you just copy the "Resources" dictionary.
I suggest having a look at the "Contents" entry of the PDF page (and maybe "Annots" as well).
Taken directly from the PDF spec (available for download from Adobe), where the "Contents" entry is described:
Quote:
A content stream (see 7.8.2, "Content Streams") that shall describe the contents of this page. If this entry is absent, the page shall be empty.

Having the PDF spec at hand is generally not a bad idea...

Maybe someone will be able to help you if you can attach the PDF you mentioned.
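
To make the "Contents" hint a bit more concrete, here is a minimal sketch of how the content stream of each page could be inspected with PDFsharp's ContentReader. It only lists which named XObjects a page actually paints with the "Do" operator; it does not copy anything. The input path is the placeholder from the first post, and the class and method names are made up for illustration.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

class ContentDump // hypothetical name, for illustration only
{
    static void Main()
    {
        // Placeholder path, as in the first post.
        PdfDocument document = PdfReader.Open(@"C:\test.pdf", PdfDocumentOpenMode.ReadOnly);

        foreach (PdfPage page in document.Pages)
        {
            // Parse the page's /Contents into a sequence of operators.
            CSequence content = ContentReader.ReadContent(page);
            ListPaintedXObjects(content);
        }
    }

    // Walks the parsed content and prints every "/Name Do" it finds.
    static void ListPaintedXObjects(CObject obj)
    {
        if (obj is CSequence sequence)
        {
            foreach (CObject element in sequence)
                ListPaintedXObjects(element);
        }
        else if (obj is COperator op && op.OpCode.Name == "Do")
        {
            foreach (CObject operand in op.Operands)
            {
                if (operand is CName name)
                    Console.WriteLine(name.Name + " Do");
            }
        }
    }
}
```

If a /Form key from /Resources /XObject never shows up here, the page does not paint it directly. Content inside form XObjects themselves is not traversed by this sketch.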

Author:  VanguardAnon [ Mon Jan 20, 2020 11:09 pm ]
Post subject:  Re: Copy Elements from PdfDictionary to a New PDF

Thanks for the reply.
Here is a sample PDF https://www.dropbox.com/s/hp8bxr7ehxqn8 ... e.pdf?dl=0

I essentially need to either copy the 6 vector icons, which are located in OCG layers, to a new PDF, or delete all the stroke/vector lines from the original file and save that.

When the artwork gets sent to the printer with the other lines still there, I am not able to fill the icons as vectors, since the outside box encloses them.

EDIT: I have discovered that if I use Adobe Acrobat DC to manually move all the OCG items into one OCG container, I get the desired results from my software. Is there a way with PDFsharp to merge these 6 OCG groups into one?

EDIT2:

I believe I have now found out how to change the way the OCG layers are referenced...

When I set the Name key to PrintingLayer for each OCG element and import the file into the software, it recognises them as a single layer, which is great!
Code:
var properties = resource?.Elements.GetDictionary("/Properties");
if (!(properties is null))
{
    foreach (var property in properties)
    {
        if (property.Value is PdfReference propertyValue &&
            propertyValue.Value is PdfDictionary propertyDict)
        {
            // Read the original name of the layer, then give every OCG the same name
            var layerName = propertyDict.Elements.GetString("/Name");
            propertyDict.Elements.SetName("/Name", "PrintingLayer");
            Debug.WriteLine($"Key: {property.Key} Value: {layerName}");

            // vectors has to be a List<string> here (the first post declared it as List<PdfItem>)
            vectors.Add(property.Key);
        }
    }
}


So now the only issue is that these 6 icons in the sample PDF can sometimes be replaced by text objects, and the text objects are not read onto an OCG layer.
Is there a way for me to parse all text objects, find their position and value, and then use the XGraphics functions in PDFsharp to re-draw the text onto a layer?

Thanks!
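
One possible starting point for the text question, as a rough sketch under several assumptions: PDFsharp's ContentReader can parse a page's content stream, and the text-showing operators (Tj, TJ) and text-positioning operators (Td, TD, Tm) can be picked out of it. The sketch only prints the operands of the most recent positioning operator next to each string. It does not track the full text matrix or the current transformation matrix, so the numbers are not final page coordinates, and the strings may be font-encoded rather than readable text. The path is the placeholder from the first post; the class, method and field names are made up for illustration.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

class TextDump // hypothetical name, for illustration only
{
    // Operands of the last Td/TD/Tm operator seen, as text.
    static string lastPosition = "(none)";

    static void Main()
    {
        // Placeholder path, as in the first post.
        PdfDocument document = PdfReader.Open(@"C:\test.pdf", PdfDocumentOpenMode.ReadOnly);

        foreach (PdfPage page in document.Pages)
        {
            lastPosition = "(none)";
            Walk(ContentReader.ReadContent(page));
        }
    }

    static void Walk(CObject obj)
    {
        if (obj is CSequence sequence)
        {
            foreach (CObject element in sequence)
                Walk(element);
        }
        else if (obj is COperator op)
        {
            string name = op.OpCode.Name;
            if (name == "Td" || name == "TD" || name == "Tm")
            {
                // Remember the raw operands of the positioning operator.
                lastPosition = name + " " + string.Join(" ", op.Operands);
            }
            else if (name == "Tj" || name == "TJ")
            {
                foreach (CObject operand in op.Operands)
                    PrintStrings(operand);
            }
        }
    }

    // Prints every string operand; TJ wraps its strings in an array.
    static void PrintStrings(CObject obj)
    {
        if (obj is CString text)
            Console.WriteLine("[" + lastPosition + "] " + text.Value);
        else if (obj is CSequence items)
            foreach (CObject item in items)
                PrintStrings(item);
    }
}
```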

All times are UTC