PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


Posted: Mon Dec 08, 2008 9:43 am

Joined: Wed Dec 03, 2008 11:59 am
Posts: 6
Hello all,

I have German text from a textarea containing ö, ä, ß, ü, etc., but these characters appear in the PDF as squares.

I create a MigraDoc document and then render it with PdfDocumentRenderer as follows:


Code:
//First I parse the HTML text

htmlText = htmlText.Replace("&#196 ;", "Ä");
htmlText = htmlText.Replace("&#203 ;", "Ë");
htmlText = htmlText.Replace("&#207 ;", "Ï");
htmlText = htmlText.Replace("&#214 ;", "Ö");
htmlText = htmlText.Replace("&#220 ;", "Ü");

htmlText = htmlText.Replace("&#228 ;", "ä");
htmlText = htmlText.Replace("&#235 ;", "ë");
htmlText = htmlText.Replace("&#239 ;", "ï");
htmlText = htmlParagraphs.Replace("&#246 ;", "o");
htmlText = htmlParagraphs.Replace("&#252 ;", "ü");
htmlText = htmlParagraphs.Replace("&#223 ;", "ß");
       


Document document = new Document();

//Then I create the sections and paragraphs with the text
[...]

//Finally I create the PdfDocumentRenderer object like this:

PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
renderer.Document = document;
renderer.RenderDocument();

//And send it to the browser
Response.Clear();
Response.ClearContent();
Response.ClearHeaders();
Response.Buffer = true;
Response.ContentType = "application/pdf";
Response.AddHeader("content-length", stream.Length.ToString());
Response.BinaryWrite(stream.ToArray());
Response.Flush();
stream.Close();
Response.End();



[NOTE: In the original code there is no space before the semicolon (e.g. &#196 ;); I added one here so that the browser does not render the entity.]

But as I said before, the characters with diaeresis and other special characters are displayed as empty squares.

Thank you!

_________________
MPasc


Posted: Mon Dec 08, 2008 1:38 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3097
Location: Cologne, Germany
Hi!

Could this be the error:
Code:
htmlText = htmlText.Replace("&#239 ;", "ï");
htmlText = htmlParagraphs.Replace("&#246 ;", "o");


All previous replacements on htmlText are overwritten by the new assignment from htmlParagraphs.

All ANSI characters should work (be sure to activate Unicode if you want to include non-ANSI characters).

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Mon Dec 08, 2008 1:44 pm

Joined: Wed Dec 03, 2008 11:59 am
Posts: 6
Thomas Hoevel wrote:
Hi!

Could this be the error:
Code:
htmlText = htmlText.Replace("&#239 ;", "ï");
htmlText = htmlParagraphs.Replace("&#246 ;", "o");


All previous replacements on htmlText are overwritten by the new assignment from htmlParagraphs.

All ANSI characters should work (be sure to activate Unicode if you want to include non-ANSI characters).


Sorry, htmlParagraphs was the original name of the variable; I changed it here to make my explanation clearer. So the original code is:

Code:
        htmlParagraphs = htmlParagraphs.Replace("&#196;", "Ä");
        htmlParagraphs = htmlParagraphs.Replace("&#203;", "Ë");
        htmlParagraphs = htmlParagraphs.Replace("&#207;", "Ï");
        htmlParagraphs = htmlParagraphs.Replace("&#214;", "Ö");
        htmlParagraphs = htmlParagraphs.Replace("&#220;", "Ü");

        htmlParagraphs = htmlParagraphs.Replace("&#228;", "ä");
        htmlParagraphs = htmlParagraphs.Replace("&#235;", "ë");
        htmlParagraphs = htmlParagraphs.Replace("&#239;", "ï");
        htmlParagraphs = htmlParagraphs.Replace("&#246;", "o");
        htmlParagraphs = htmlParagraphs.Replace("&#252;", "ü");
        htmlParagraphs = htmlParagraphs.Replace("&#223;", "ß");

htmlParagraphs is just a String containing the HTML-encoded text.

However, you mentioned that I should make sure to activate Unicode. When I create the PdfDocumentRenderer I set it like this:

Code:
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);


Is there anything else I need to do to activate Unicode?

Thank you!!

_________________
MPasc


Posted: Mon Dec 08, 2008 3:24 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3097
Location: Cologne, Germany
Mpasc wrote:
But as I said before, the characters with diaeresis and other special characters are displayed as empty squares.

I guess I was on the wrong track.

Which font do you use?
The empty square is normally the default character for anything that's not implemented in a font.
The default font for MigraDoc is "Verdana".

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Dec 09, 2008 8:58 am

Joined: Wed Dec 03, 2008 11:59 am
Posts: 6
Hello,

I use Arial.

Following the HelloMigraDoc sample, I define the styles in a method like this:

Code:
public static void DefineStyles(Document document)
{
            MigraDoc.DocumentObjectModel.Style style;

            // Get the predefined style Normal.
            style = document.Styles["Normal"];

            // Modify the style
            style.Font.Name = "Arial";
            style.Font.Size = 10;
            style.Font.Bold = false;
            style.ParagraphFormat.Alignment = ParagraphAlignment.Justify;
            style.ParagraphFormat.SpaceBefore = 12;
            style.ParagraphFormat.SpaceAfter = 12;

            //Style for Heading1

            style = document.Styles["Heading1"];
            style.Font.Name = "Arial";
            style.Font.Size = 14;
            style.Font.Bold = true;
            style.Font.Color = Colors.DarkBlue;
            style.ParagraphFormat.PageBreakBefore = true;
            style.ParagraphFormat.SpaceAfter = 6;

            // Create a new style called TextBox based on style Normal
            style = document.Styles.AddStyle("TextBox", "Normal");
            style.Font.Bold = true;
            style.Font.Size = 40;
           
            style.ParagraphFormat.Borders.Width = 2.5;
            style.ParagraphFormat.Borders.Distance = 3;
}


And then, in another method, I create the paragraphs and set the style:

Code:
        public static Paragraph CreateParagraph(Document document, String text, String style)
        {
            //the style parameter is Normal or TextBox
           
            Paragraph paragraph = document.LastSection.AddParagraph();
            paragraph.Style = style;
            paragraph.AddFormattedText(HTMLParser.getUntaggedText(text), style);

            return paragraph;
           
        }


The results are:
    - The TextBox style is not applied (all text is rendered in the Normal style)
    - The vowels with diaeresis are still replaced by blank squares

Any clue?

Thank you very much.

_________________
MPasc


Posted: Tue Dec 09, 2008 10:00 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3097
Location: Cologne, Germany
Mpasc wrote:
And, besides, I still see the blank squares instead of the diaeresis.

Umlaute do work with PDFsharp.
Are Umlaute handled correctly in your source code? Visual Studio I presume? Did you set file encoding to UTF-8?
Do you see correct strings in the Debugger?

Did you try to save a PDF file on the server? Check the Umlaute there.
Maybe they get lost while transferring the file from the server to the client.

Have you tried using HtmlDecode instead of replacing the characters?
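
For illustration, here is a minimal sketch of the HtmlDecode approach. It assumes .NET 4.0 or later, where System.Net.WebUtility.HtmlDecode is available; in an older ASP.NET project System.Web.HttpUtility.HtmlDecode plays the same role. The class and method names are hypothetical and not taken from the code in this thread.

```csharp
using System;
using System.Net;

public static class EntityDecodeSketch
{
    // Hypothetical helper: decodes all numeric and named HTML entities
    // in one pass instead of one Replace call per character.
    public static string DecodeEntities(string htmlText)
    {
        return WebUtility.HtmlDecode(htmlText);
    }

    public static void Main()
    {
        // Input as it might arrive from the textarea (assumed sample text).
        string encoded = "Gr&#246;&#223;e, &#220;bung, M&auml;dchen";
        string decoded = DecodeEntities(encoded);

        // What the console shows depends on its encoding;
        // the string itself holds the decoded characters.
        Console.WriteLine(decoded);
    }
}
```

Because every entity is decoded in a single call, a forgotten or mistyped replacement (such as "o" for ö) cannot slip in.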

BTW: "Normal" is the default style that is used if the Style of a paragraph is null.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Dec 10, 2008 2:29 pm

Joined: Wed Dec 03, 2008 11:59 am
Posts: 6
Hello,

I already read the text from the textarea with HtmlDecode. However, the characters with diaeresis still arrive as numeric codes. For example, I get "&#239 ;" (with no space between the characters) for ï, so I tried replacing them as shown above.

I have debugged the application and can see that the string already contains the blank squares on the server, so they are not being lost during the transfer to the client.

Regarding your other suggestions:
    - What are Umlaute?
    - I work with Microsoft Visual Studio 2005.
    - How can I set the encoding to UTF-8? Should I set it somehow in the MigraDoc document?


Thank you very much for all your help!

_________________
MPasc


Posted: Thu Dec 11, 2008 8:56 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3097
Location: Cologne, Germany
Hello!
Mpasc wrote:
I have debugged the application and can see that the string already contains the blank squares on the server

That leaves me rather clueless.

So I'd say there are three possible explanations:
  • The special characters are already lost while replacing
  • The special characters do not exist in the fonts on the server
  • With remote debugging: maybe characters get lost between server and debugger client


AFAIK a "string" in C# is always Unicode. Special characters are no problem for C#.
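
To check what a string really contains, independent of the debugger's display font, the UTF-16 code units can be listed. The following is only a sketch with a hypothetical helper name. If the output shows U+00F6 for ö, the string is intact and the square comes from a later stage; if it shows U+FFFD (the Unicode replacement character) or U+003F (a question mark), the character was already lost earlier.

```csharp
using System;
using System.Text;

public static class CodePointDump
{
    // Hypothetical helper: lists each UTF-16 code unit of the text as U+XXXX.
    public static string Dump(string text)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Escapes are used so the result does not depend on the source file encoding.
        Console.WriteLine(Dump("\u00F6\u00DF")); // prints "U+00F6 U+00DF" (ö and ß)
        Console.WriteLine(Dump("\uFFFD"));       // prints "U+FFFD" (replacement character)
    }
}
```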

There's no diaeresis in German. We have ÄÖÜäöü and call them "Umlaute". The ligature ß is a different story, but PDFsharp handles all these characters correctly (Unicode mode or not).

As long as you see blank squares in the debugger try to cure the problem in the C# code on the server. It can't be a problem of the HTML response settings or the MigraDoc settings.

_________________
Regards
Thomas Hoevel
PDFsharp Team

