PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

Posted: Sat Jan 18, 2020 1:48 pm
Hello,

I am having trouble with some PDF file manipulation.
I have a PDF file that is generated by clients. It contains vector data that I need to extract into a separate PDF to send to a printing machine. The original PDF also contains text and images that are irrelevant to the printer and only give the specs for the artwork.

So far I have found the PdfDictionary elements where the vector data is stored, but if I clone them into a new PDF, the resulting PDF is blank.

In the page dictionary I found the elements under /Resources /XObject, where they are stored as /Form<key> entries.
I was hoping to extract just these elements and put them into the new PDF like this:

Code:
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

class Program
{
    static void Main(string[] args)
    {
        string filePath = @"C:\test.pdf";
        string outputPath = @"C:\output.pdf";

        PdfDocument originalDocument = PdfReader.Open(filePath);

        PdfDocument printingDocument = new PdfDocument();
        PdfPage printingPage = printingDocument.AddPage();

        // Register a free-standing dictionary with the new document, as done in the tutorial
        // ( http://pdfsharp.com/PDFsharp/index.php%3Foption%3Dcom_content%26task%3Dview%26id%3D20%26Itemid%3D50 )
        PdfDictionary printingDictionary = new PdfDictionary(printingDocument);
        printingDocument.Internals.AddObject(printingDictionary);

        foreach (PdfPage page in originalDocument.Pages)
        {
            PdfDictionary resource = page.Elements.GetDictionary("/Resources");
            if (resource == null)
                continue;

            // Clone the resource dictionary tree and try to add it to the new PDF.
            // The indexer is used instead of Add so that a second source page
            // overwrites the entry rather than failing on a duplicate key.
            PdfDictionary resourceClone = resource.Clone();
            printingDictionary.Elements["/Resources"] = resourceClone;
            printingPage.Elements["/Resources"] = resourceClone;

            PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
            if (objects == null)
                continue;

            foreach (string itemKey in objects.Elements.Keys)
            {
                Console.WriteLine(itemKey);
                if (itemKey.StartsWith("/Form"))
                {
                    // This is where the vector data I need to copy to the new PDF is.
                }
            }
        }

        printingDocument.Save(outputPath);
    }
}


Posted: Mon Jan 20, 2020 1:26 am
Hi, just an update: I have been able to get the stream that holds the vector data, but now I am not sure how to convert it into a new XObject that I can place in a new PDF.


Code:
if (resource != null)
{
    PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
    if (objects != null)
    {
        foreach (string itemKey in objects.Elements.Keys)
        {
            // XObject entries are normally stored as indirect references.
            var reference = objects.Elements[itemKey] as PdfReference;
            var xObject = reference?.Value as PdfDictionary;

            // /Subtype is a PDF name, so it is read with GetName.
            if (xObject != null && xObject.Elements.GetName("/Subtype") == "/Form")
            {
                string filter = xObject.Elements.GetName("/Filter");
                Debug.WriteLine("Filter " + filter);

                // Vector data stream: raw bytes, still encoded if /Filter is set.
                byte[] bytes = xObject.Stream.Value;

                //XPdfForm pdfForm = XPdfForm.FromStream(new MemoryStream(bytes));
            }
        }
    }
}
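
For comparison, here is a minimal sketch of how XPdfForm is normally used. XPdfForm.FromFile and XPdfForm.FromStream expect a complete PDF document, so the raw bytes of a form XObject stream are not valid input for them. The sketch imports the whole first page of the source file and draws it onto a new page, which means it does not isolate the vector data. The class name ImportPageAsForm is hypothetical, and the paths are the ones from the first post.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

class ImportPageAsForm
{
    static void Main()
    {
        // Load the complete source PDF as an external form.
        XPdfForm form = XPdfForm.FromFile(@"C:\test.pdf");
        form.PageNumber = 1; // page numbers are 1-based

        PdfDocument output = new PdfDocument();
        PdfPage page = output.AddPage();

        // Give the new page the same size as the imported page.
        page.Width = XUnit.FromPoint(form.PointWidth);
        page.Height = XUnit.FromPoint(form.PointHeight);

        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            // Draw the imported page over the full area of the new page.
            gfx.DrawImage(form, new XRect(0, 0, form.PointWidth, form.PointHeight));
        }

        output.Save(@"C:\output.pdf");
    }
}
```

Because the whole page comes across, this is only useful as a starting point when the unwanted text and images can be removed or hidden afterwards.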


Posted: Mon Jan 20, 2020 5:57 pm
I doubt you will see anything on the imported page if you only copy the "Resources" dictionary.
I suggest having a look at the "Contents" entry of the PDF page (and maybe "Annots" as well).
Taken directly from the PDF specification (available for download from Adobe), where the "Contents" entry is described:
Quote:
A content stream (see 7.8.2, "Content Streams") that shall describe the contents of this page. If this entry is absent, the page shall be empty.

Having the PDF specification at hand is generally not a bad idea...

Maybe someone can help you if you attach the PDF in question.
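
To see how "Contents" and "Resources" relate, the content stream of each page can be parsed and searched for the Do operator, which paints an XObject named in /Resources /XObject. The following is a minimal sketch using PDFsharp's ContentReader. The class name DumpXObjectCalls is hypothetical, the path is the one from the first post, and only the page's own content stream is inspected (not content nested inside form XObjects).

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

class DumpXObjectCalls
{
    static void Main()
    {
        PdfDocument document = PdfReader.Open(@"C:\test.pdf", PdfDocumentOpenMode.ReadOnly);

        int pageNumber = 0;
        foreach (PdfPage page in document.Pages)
        {
            pageNumber++;

            // Parse the page's /Contents into a sequence of operators with their operands.
            CSequence content = ContentReader.ReadContent(page);

            foreach (CObject item in content)
            {
                // "Do" takes a single name operand: the key of the XObject to paint.
                if (item is COperator op &&
                    op.OpCode.OpCodeName == OpCodeName.Do &&
                    op.Operands.Count > 0 &&
                    op.Operands[0] is CName name)
                {
                    Console.WriteLine($"Page {pageNumber}: Do {name.Name}");
                }
            }
        }
    }
}
```

A page that carries the cloned resources but has no content stream invoking them with Do paints nothing, which would be consistent with the blank result described in the first post.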


Posted: Mon Jan 20, 2020 11:09 pm
Thanks for the reply.
Here is a sample PDF: https://www.dropbox.com/s/hp8bxr7ehxqn8 ... e.pdf?dl=0

I essentially need to either copy the 6 vector icons, which are located in OCG layers, to a new PDF, or delete all the stroke/vector lines from the original file and save that.

When the artwork gets sent to a printer with the other lines still present, I am not able to fill the icons as vectors, since the outside box encloses them.

EDIT: I have discovered that if I use Adobe Acrobat DC to manually move all the OCG items into a single OCG container, I get the desired results from my software. Is there a way with PDFsharp to merge these 6 OCG groups into one?

EDIT2:

I believe I have now found out how to change the way the OCG layers are referenced...

When I set the Name key to PrintingLayer for each OCG element and import the file into the software, it recognises them as a single layer, which is great!
Code:
var properties = resource?.Elements.GetDictionary("/Properties");
if (properties != null)
{
    foreach (var property in properties)
    {
        if (property.Value is PdfReference propertyValue &&
            propertyValue.Value is PdfDictionary propertyDict)
        {
            // Read the current layer name before overwriting it.
            var layerName = propertyDict.Elements.GetString("/Name");

            // Note: the PDF spec defines an OCG's /Name as a text string,
            // so SetString may be the more conformant call here.
            propertyDict.Elements.SetName("/Name", "PrintingLayer");
            Debug.WriteLine($"Key: {property.Key} Value: {layerName}");

            // Remember the resource key of the layer
            // (vectors must be a List<string> for this to compile).
            vectors.Add(property.Key);
        }
    }
}
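
As an alternative to renaming, the question from the first EDIT (merging the OCG groups into one) could be approached by pointing every /Properties entry at the same OCG object. This is an untested sketch under several assumptions: that each entry in /Properties is an indirect reference, that the optional content groups carry /Type /OCG, and that the target software only looks at the group referenced from the page resources. The class name MergeLayers is hypothetical, the paths are those from the first post, and the catalog's /OCProperties is left unchanged.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

class MergeLayers
{
    static void Main()
    {
        PdfDocument document = PdfReader.Open(@"C:\test.pdf", PdfDocumentOpenMode.Modify);

        // The first OCG encountered becomes the layer shared by all entries.
        PdfReference sharedLayer = null;

        foreach (PdfPage page in document.Pages)
        {
            PdfDictionary properties = page.Elements
                .GetDictionary("/Resources")?.Elements
                .GetDictionary("/Properties");
            if (properties == null)
                continue;

            // Copy the keys first, because the dictionary is modified inside the loop.
            foreach (string key in new List<string>(properties.Elements.Keys))
            {
                var reference = properties.Elements[key] as PdfReference;
                var group = reference?.Value as PdfDictionary;

                // Skip property lists that are not optional content groups.
                if (group == null || group.Elements.GetName("/Type") != "/OCG")
                    continue;

                if (sharedLayer == null)
                    sharedLayer = reference;
                else
                    properties.Elements[key] = sharedLayer;
            }
        }

        document.Save(@"C:\output.pdf");
    }
}
```

The content stream keeps its existing marked-content tags, so nothing is redrawn. Only the group that each tag resolves to changes.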


So now the only issue is that the 6 icons in the sample PDF can sometimes be replaced by text objects, and those text objects are not read onto an OCG layer.
Is there a way for me to parse all text objects, find their position and value, and then use the XGraphics functions in PDFsharp to re-draw the text onto a layer?

Thanks!
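
One possible starting point for the text question is to walk the parsed content stream and collect the string operands of the text-showing operators Tj and TJ. This is a minimal sketch under stated assumptions: the class name TextOperators is hypothetical, the path is the one from the first post, and the strings are returned in the font's own encoding, so they may not be readable for fonts with custom or CID encodings. Positions are not computed; that would additionally require tracking the Tm, Td and cm operators.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

static class TextOperators
{
    // Yields the string operands of Tj and TJ found in a parsed content tree.
    static IEnumerable<string> ExtractText(CObject obj)
    {
        if (obj is COperator op)
        {
            if (op.OpCode.OpCodeName == OpCodeName.Tj || op.OpCode.OpCodeName == OpCodeName.TJ)
            {
                foreach (CObject operand in op.Operands)
                    foreach (string text in ExtractText(operand))
                        yield return text;
            }
        }
        else if (obj is CSequence sequence)
        {
            // Covers the top-level sequence and the array operand of TJ.
            foreach (CObject element in sequence)
                foreach (string text in ExtractText(element))
                    yield return text;
        }
        else if (obj is CString str)
        {
            yield return str.Value;
        }
    }

    static void Main()
    {
        PdfDocument document = PdfReader.Open(@"C:\test.pdf", PdfDocumentOpenMode.ReadOnly);

        foreach (PdfPage page in document.Pages)
        {
            foreach (string text in ExtractText(ContentReader.ReadContent(page)))
                Console.WriteLine(text);
        }
    }
}
```

Text that lives inside a form XObject is not part of the page's own content stream, so it would not show up in this output.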

