I ran some experiments to understand how the text gets damaged, i.e., how "PROCURAÇÃO" turns into "PROCURAÇÃo".
The original encoding of the RTF is CP1252.
I found that the damage is produced by the following sequence of conversions:
1. CP1252 -> UTF-8
Most probably, this conversion is made by the database.
After this step, we have a damaged RTF containing UTF-8 byte sequences where ANSI characters should be.
2. TRichView loads this RTF assuming that it contains CP1252 characters, although it actually contains UTF-8.
Good news: this damage is repairable. I'll post code showing how to convert such RTF documents to DocX later today.
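The two steps above can be reproduced without TRichView. The following is only a minimal sketch, assuming a Unicode Delphi version (2009 or newer) where TEncoding from System.SysUtils is available; the program name and variable names are made up for illustration. It encodes the sample word as UTF-8, decodes those bytes as CP1252 (the misreading from step 2), and then reverses the process to show that the original text can be recovered.

```pascal
program MojibakeDemo;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils;

var
  Cp1252: TEncoding;
  Original, Damaged, Repaired: string;
  Utf8Bytes: TBytes;
begin
  // #$00C7 is "Ç" and #$00C3 is "Ã"; character codes are used so that
  // the result does not depend on the encoding of this source file
  Original := 'PROCURA'#$00C7#$00C3'O';

  // GetEncoding returns a new instance that the caller must free
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    // step 1: the text is stored as UTF-8 (2 bytes for each of "Ç" and "Ã")
    Utf8Bytes := TEncoding.UTF8.GetBytes(Original);
    Assert(Length(Utf8Bytes) = 12);

    // step 2: the UTF-8 bytes are misread as CP1252, one character per byte
    Damaged := Cp1252.GetString(Utf8Bytes);
    Assert(Length(Damaged) = 12);
    Assert(Damaged <> Original);

    // repair: turn the characters back into CP1252 bytes, decode as UTF-8
    Repaired := TEncoding.UTF8.GetString(Cp1252.GetBytes(Damaged));
    Assert(Repaired = Original);

    Writeln('Round trip succeeded');
  finally
    Cp1252.Free;
  end;
end.
```

The repair works here because both bytes that follow $C3 in this word ($87 and $83) are defined in CP1252. The exact damaged string seen in a real document may differ, depending on what other conversions were applied along the way.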
RTF to Docx
- Site Admin
- Posts: 17805
- Joined: Sat Aug 27, 2005 10:28 am
Re: RTF to Docx
The simplest way to load this stream is to assign
RichView1.RTFReadProperties.DefCharsetCodePage := CP_UTF8;
(instead of 1252)
Please note that this setting should be used only for loading documents from this database. A valid RTF document must not contain UTF-8 characters.
Re: RTF to Docx
Sorry, the solution in my previous reply is not good enough. It fixes the visible text, but it does not fix non-English characters in the names of styles, bookmarks, etc.
The complete solution is to convert the RTF code from UTF-8 to code page 1252 before loading.
Suppose you have a document from the DB in _Stream.
The code for loading is:
uses RVUni;
var
  Stream2: TStream;
  s: AnsiString;
begin
  _Stream.Position := 0;
  // read the raw RTF bytes (which are actually UTF-8) from _Stream
  s := RVU_AnsiStreamToAnsiString(_Stream);
  // convert s from UTF-8 to CP 1252 and write the result to Stream2
  Stream2 := RVU_AnsiStringToAnsiStream(
    RVU_UnicodeToAnsi(1252, UTF8ToUnicodeString(s)));
  try
    // load the repaired RTF from Stream2
    Stream2.Position := 0;
    RichView1.Clear;
    RichView1.RTFReadProperties.DefCharsetCodePage := 1252;
    RichView1.LoadFromStream(Stream2, rvynaAuto);
    RichView1.Format;
  finally
    // free the temporary stream even if loading raises an exception
    Stream2.Free;
  end;
end;