RTF export: UTF-8 to ANSI conversion?

I'm trying to write RTF files from text in UTF-8 encoding. Converting the
text with utf8_decode() already fails on characters such as a typographic
apostrophe (’) or an en dash (–), which come out as question marks, and of
course non-Latin-1 characters are lost entirely. Reading the RTF spec, I
found that ISO-8859-1 is not available in RTF, only the Windows-1252
codepage, which differs from Latin-1 in the range 0x80 to 0x9F.
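A quick way to see the difference is to try both encodings on the characters mentioned above. The sketch below uses Python's built-in `latin-1` and `cp1252` codecs purely for illustration (in PHP, `utf8_decode()` targets ISO-8859-1 and turns such characters into `?`); the helper name `encode_or_none` is made up for this example.

```python
from typing import Optional

# Characters from the question (typographic apostrophe U+2019, en dash
# U+2013) plus Gamma (U+0393) from the RTF spec example.
SAMPLES = {"apostrophe": "\u2019", "en dash": "\u2013", "Gamma": "\u0393"}


def encode_or_none(ch: str, codec: str) -> Optional[bytes]:
    """Return the encoded byte(s), or None if the codec lacks the character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


for name, ch in SAMPLES.items():
    latin1 = encode_or_none(ch, "latin-1")
    cp1252 = encode_or_none(ch, "cp1252")
    print(f"{name:10} U+{ord(ch):04X}  latin-1: {latin1!r}  cp1252: {cp1252!r}")
```

The apostrophe and en dash exist in CP1252 (as bytes 0x92 and 0x96) but not in Latin-1, while Γ exists in neither, which is the case the spec covers with Unicode escapes.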

So I set the codepage to 1252 and learned that characters not contained
in this codepage should be written as Unicode escapes:

For example, the text LabΓValue (Unicode characters 0x004c,
0x0061, 0x0062, 0x0393, 0x0056, 0x0061, 0x006c, 0x0075, 0x0065) should
be represented as follows (assuming a previous \uc1):

Lab\u915GValue

Now I don't understand it anymore. What does the G after the decimal
value mean? How should this \uc1 be applied?
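As the RTF specification describes it, `\uN` gives a character as a signed 16-bit decimal value, and `\ucN` declares how many fallback characters follow each `\uN`. A reader that understands `\uN` skips those fallback characters; an older reader ignores the unknown control word and displays the fallback instead. So in the spec's example `Lab\u915GValue` with `\uc1`, the `G` is just the stand-in shown for Γ (915 = 0x0393) by readers without Unicode support. The minimal Python sketch below (the name `read_unicode_run` is invented here) decodes only `\uN` in a plain text run to show both behaviours; it ignores every other RTF construct, including escaped backslashes and the optional space delimiter after a control word.

```python
import re

# An RTF \uN control word; N is a signed 16-bit decimal number.
U_WORD = re.compile(r"\\u(-?\d+)")


def read_unicode_run(rtf: str, uc: int = 1, unicode_aware: bool = True) -> str:
    """Decode a plain RTF text run containing only \\uN escapes.

    A Unicode-aware reader emits the character and skips the next `uc`
    fallback characters; a legacy reader drops \\uN and keeps the fallback.
    """
    out = []
    pos = 0
    while pos < len(rtf):
        match = U_WORD.match(rtf, pos)
        if match is None:
            out.append(rtf[pos])  # ordinary text character
            pos += 1
            continue
        pos = match.end()
        if unicode_aware:
            # Negative values wrap around: \u-3 means U+FFFD.
            out.append(chr(int(match.group(1)) % 0x10000))
            pos += uc  # skip the fallback character(s) announced by \ucN
    return "".join(out)


sample = r"Lab\u915GValue"
print(read_unicode_run(sample))                       # Unicode-aware reader
print(read_unicode_run(sample, unicode_aware=False))  # legacy reader
```

The first call yields `LabΓValue`, the second `LabGValue`; that is the whole purpose of the fallback character.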

So these are my actual questions:
- Is there a good way to convert a UTF-8 string to CP1252 without
losing the information about non-CP1252 characters? (mbstring is not available on that
- Can somebody point me to an easy-to-understand RTF tutorial?
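For the first question, one lossless approach (a sketch, not something proposed in the thread) is to decide per character: plain ASCII passes through with `\`, `{` and `}` escaped, characters that exist in CP1252 become `\'hh` hex escapes, and everything else becomes `\uN` followed by one `?` fallback, matching a `\uc1` in the header. Characters outside the Basic Multilingual Plane are written as two `\uN` values, one per UTF-16 surrogate. The code is Python for illustration; the function names and the minimal header (Times New Roman as the only font) are assumptions, and line breaks and other control characters are not handled.

```python
def to_rtf_text(text: str, fallback: str = "?") -> str:
    """Escape a Unicode string for the body of a \\ansicpg1252 RTF file."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF special characters
        elif ord(ch) < 0x80:
            out.append(ch)                 # plain 7-bit ASCII
        else:
            try:
                # In CP1252: write the single byte as a \'hh hex escape.
                out.append("\\'%02x" % ch.encode("cp1252")[0])
            except UnicodeEncodeError:
                # Not in CP1252: one \uN per UTF-16 code unit, as a signed
                # 16-bit number, each followed by one fallback character.
                units = ch.encode("utf-16-le")
                for i in range(0, len(units), 2):
                    n = int.from_bytes(units[i:i + 2], "little", signed=True)
                    out.append(f"\\u{n}{fallback}")
    return "".join(out)


def rtf_document(body_text: str) -> str:
    """Wrap escaped text in a minimal CP1252 RTF document with \\uc1."""
    header = r"{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0 Times New Roman;}}\uc1" + "\n"
    doc = header + to_rtf_text(body_text) + "\\par\n}\n"
    doc.encode("ascii")  # raises if anything slipped through unescaped
    return doc


print(rtf_document("It\u2019s Lab\u0393Value \u2013 {draft}"))
```

Because every non-ASCII character is escaped, the resulting file is pure 7-bit ASCII, and no character information is lost for Unicode-aware readers.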

Thanks for any hints!

Re: RTF export: UTF-8 to ANSI conversion?

Markus wrote:

Converting UTF-8 to a single-byte encoding, which can represent at most
256 characters, will always lose the characters it has no room for.

I haven't done anything related to RTF, but to convert UTF-8 to CP1252
you can use the iconv library, optionally with the //IGNORE or //TRANSLIT
modifier appended to the target encoding.
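Python's codec error handlers behave much like these iconv modifiers and make the trade-off easy to see; Python is used here only for illustration. In the sketch below, `'strict'` stops at the first character CP1252 cannot represent, `'ignore'` drops it silently (comparable to `//IGNORE`), and `'replace'` substitutes `?`. Python's standard handlers have no direct counterpart to `//TRANSLIT`, so that mode is not shown. The sample string is made up.

```python
text = "Lab\u0393Value \u2013 caf\u00e9"

# 'strict' (the default) raises at the first unmappable character.
try:
    text.encode("cp1252")
    strict_error = None
except UnicodeEncodeError as err:
    strict_error = (err.start, err.object[err.start])
    print("strict: cannot encode", repr(err.object[err.start]), "at index", err.start)

# 'ignore' silently drops the character, 'replace' writes '?' instead.
ignored = text.encode("cp1252", errors="ignore")
replaced = text.encode("cp1252", errors="replace")
print("ignore: ", ignored)
print("replace:", replaced)
```

Either way, the encoded bytes no longer record which characters were lost.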

Wiktor Walc
Freelance PHP Developer

Re: RTF export: UTF-8 to ANSI conversion?

iktorn wrote:

Thank you! From my first look at iconv, //IGNORE keeps the conversion
going, but it silently drops the characters it cannot convert, so there
would be no way to identify them (in order to convert them to RTF Unicode
syntax).
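For comparison, Python lets a custom codec error handler see exactly which characters failed, so they can be collected (or turned into RTF `\uN` escapes) instead of silently dropped. This is a minimal Python sketch, not an iconv feature; the handler name `"report-lost"` and the function `encode_with_report` are invented for this example, and because `codecs.register_error` registers handlers globally by name, this version is not meant for concurrent use.

```python
import codecs
from typing import List, Tuple


def encode_with_report(text: str, codec: str = "cp1252") -> Tuple[bytes, List[Tuple[int, str]]]:
    """Encode text, replacing unmappable characters with '?', and return
    the (index, character) pairs that could not be converted."""
    lost: List[Tuple[int, str]] = []

    def handler(err: UnicodeEncodeError) -> Tuple[str, int]:
        # err.object[err.start:err.end] is the run the codec could not encode.
        lost.extend((i, err.object[i]) for i in range(err.start, err.end))
        return "?" * (err.end - err.start), err.end

    codecs.register_error("report-lost", handler)
    return text.encode(codec, errors="report-lost"), lost


data, lost = encode_with_report("Lab\u0393Value \u2013 \u03a9")
print(data)  # bytes with '?' in place of the Greek letters
print(lost)  # positions and characters that CP1252 cannot hold
```

Returning an RTF escape such as `\u915?` from the handler instead of `?` would turn this into a one-pass converter, at the cost of leaving CP1252 characters as raw 8-bit bytes rather than `\'hh` escapes.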

In the meantime I found a solution using PEAR's I18N_UnicodeString.

This gives me an array of the decimal Unicode code points of all
characters, so I can convert each character individually.
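Without mbstring, the step that produces such an array can also be written by hand, since UTF-8 stores each code point in one to four bytes with a fixed bit layout. The Python sketch below is an assumption about what that step looks like, not the I18N_UnicodeString implementation; it only inspects byte values, so it maps directly onto PHP's `ord()`, and it omits the overlong-form and surrogate checks a strict decoder would add.

```python
from typing import List


def utf8_code_points(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into a list of decimal code points by hand."""
    points: List[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells how many continuation bytes follow.
        if lead < 0x80:
            cp, extra = lead, 0
        elif 0xC0 <= lead < 0xE0:
            cp, extra = lead & 0x1F, 1
        elif 0xE0 <= lead < 0xF0:
            cp, extra = lead & 0x0F, 2
        elif 0xF0 <= lead < 0xF8:
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        points.append(cp)
        i += extra + 1
    return points


print(utf8_code_points("Lab\u0393Value".encode("utf-8")))
# [76, 97, 98, 915, 86, 97, 108, 117, 101]
```

The result matches the code points listed in the spec's example (0x004c through 0x0065), with 915 being Γ; each value can then be written as plain text, `\'hh`, or `\uN` depending on whether it fits the codepage.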
