RTF to Docx

Sergey Tkachenko
Site Admin
Posts: 17805
Joined: Sat Aug 27, 2005 10:28 am
Contact:

Re: RTF to Docx

Post by Sergey Tkachenko »

I ran some experiments to understand how the text gets damaged, i.e., how "PROCURAÇÃO" turns into "PROCURAÇÃo".
The original encoding of the RTF is CP1252.
I found that this result is produced by the following sequence of conversions:
1. CP1252 -> UTF-8
Most probably, this conversion is made by the database.
After this step, we have a damaged RTF that contains UTF-8 byte sequences where ANSI characters should be.
2. TRichView loads this RTF assuming that it contains CP1252 characters, although it actually contains UTF-8.
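
The two steps above can be reproduced outside TRichView. The following is only a minimal sketch, assuming a Unicode Delphi version where TEncoding is available in SysUtils; the program name and variable names are made up for illustration. It encodes the sample word as UTF-8 and then decodes those bytes as if they were CP1252. The exact damaged characters that appear in a real document depend on how the reader treats the stray bytes, so the sketch prints only the bytes and the lengths.

```pascal
program DamageDemo; // hypothetical demo, not part of TRichView
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  Cp1252: TEncoding;
  Original, Misread: string;
  Utf8Bytes: TBytes;
  B: Byte;
begin
  // 'PROCURAÇÃO', written with character codes so that
  // the encoding of this source file does not matter
  Original := 'PROCURA'#$00C7#$00C3'O';
  // GetEncoding returns a new instance that the caller must free
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    // Step 1: the text is stored as UTF-8 (Ç and Ã take two bytes each)
    Utf8Bytes := TEncoding.UTF8.GetBytes(Original);
    // Step 2: the UTF-8 bytes are interpreted as CP1252,
    // so every byte becomes a separate character
    Misread := Cp1252.GetString(Utf8Bytes);
  finally
    Cp1252.Free;
  end;
  // show the stored bytes in hex
  for B in Utf8Bytes do
    Write(IntToHex(B, 2), ' ');
  Writeln;
  // the misread string is longer than the original
  Writeln(Length(Original), ' -> ', Length(Misread));
end.
```

Because each two-byte UTF-8 sequence is read as two separate CP1252 characters, the misread text contains more characters than the original, which is the typical sign of this kind of damage.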

Good news: this damage is repairable. I'll post code showing how to convert such RTF documents to DocX later today.

Re: RTF to Docx

Post by Sergey Tkachenko »

The simplest way to load this stream is to assign
RichView1.RTFReadProperties.DefCharsetCodePage := CP_UTF8;
(instead of 1252).

Please note that this setting should be used only when loading documents from this database. A valid RTF document must not contain UTF-8 characters.

Re: RTF to Docx

Post by Sergey Tkachenko »

Sorry, the solution in my previous reply is not good enough. It fixes the problem with visible text, but it does not fix the problems with non-English characters in the names of styles, bookmarks, etc.

The complete solution is to convert the RTF code from UTF-8 to code page 1252 before loading.
Suppose the document from the DB is in _Stream.
The code for loading is:

uses RVUni;

var
  Stream2: TStream;
  s: AnsiString;
begin
  _Stream.Position := 0;
  // read the raw bytes (RTF containing UTF-8 sequences) from _Stream
  s := RVU_AnsiStreamToAnsiString(_Stream);
  // convert s from UTF-8 to CP1252 and write the result to Stream2
  Stream2 := RVU_AnsiStringToAnsiStream(
    RVU_UnicodeToAnsi(1252, UTF8ToUnicodeString(s)));
  try
    // load the repaired RTF from Stream2
    Stream2.Position := 0;
    RichView1.Clear;
    RichView1.RTFReadProperties.DefCharsetCodePage := 1252;
    RichView1.LoadFromStream(Stream2, rvynaAuto);
    RichView1.Format;
  finally
    // free Stream2 even if loading raises an exception
    Stream2.Free;
  end;
end;
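
For checking the conversion step on its own, without TRichView, here is a small sketch that uses only the Delphi RTL. It assumes a Unicode Delphi version with TEncoding in SysUtils; the helper Utf8BytesToCp1252 is a hypothetical name introduced for illustration and is not part of TRichView or RVUni. It re-encodes UTF-8 bytes as CP1252 bytes, which is the same repair that the code above applies to the whole stream.

```pascal
program RepairDemo; // hypothetical demo, not part of TRichView
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: re-encodes UTF-8 bytes as CP1252 bytes
function Utf8BytesToCp1252(const Utf8Bytes: TBytes): TBytes;
var
  Cp1252: TEncoding;
begin
  // GetEncoding returns a new instance that the caller must free
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    Result := TEncoding.Convert(TEncoding.UTF8, Cp1252, Utf8Bytes);
  finally
    Cp1252.Free;
  end;
end;

var
  Damaged, Repaired: TBytes;
  B: Byte;
begin
  // 'ÇÃ' as UTF-8: each character occupies two bytes
  Damaged := TBytes.Create($C3, $87, $C3, $83);
  Repaired := Utf8BytesToCp1252(Damaged);
  // in CP1252, Ç is the single byte C7 and Ã is the single byte C3
  for B in Repaired do
    Write(IntToHex(B, 2), ' ');
  Writeln;
end.
```

Working on raw bytes rather than on visible text is what lets the repair also cover style names and bookmarks, since they are stored in the same RTF byte stream.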