Re: RTF to Docx
Posted: Tue Mar 04, 2025 7:42 am
I ran some experiments to understand how the text gets damaged, i.e., how "PROCURAÇÃO" turns into "PROCURAÇÃo".
The original encoding in RTF is CP1252.
I found that this result is produced by the following sequence of conversions:
1. CP1252 -> UTF8
Most probably, this conversion is performed by a database.
After this step, we have a damaged RTF containing UTF-8 byte sequences where single-byte ANSI (CP1252) characters should be.
2. TRichView loads this RTF assuming that it contains CP1252 characters, while it actually contains UTF-8.
Good news: this damage is repairable. I'll post code showing how to convert such RTF documents to DocX later today.
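
To illustrate why the damage is reversible, here is a minimal Python sketch of the two steps on a plain string. It is not TRichView code and it does not parse RTF; the function names damage and repair are made up for this example, and the damaged string it prints is only what this simplified model produces, which may differ from what a real RTF pipeline shows.

```python
def damage(text: str) -> str:
    """Simulate steps 1 and 2: store the text as UTF-8 bytes,
    then misread those bytes as if they were CP1252 characters."""
    return text.encode("utf-8").decode("cp1252")


def repair(damaged: str) -> str:
    """Reverse the damage: map the characters back to their CP1252
    bytes, then decode those bytes as the UTF-8 they really are."""
    return damaged.encode("cp1252").decode("utf-8")


original = "PROCURAÇÃO"
damaged = damage(original)
print(damaged)          # each accented letter becomes two characters
print(repair(damaged))  # the original text again
```

One caveat for this sketch: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so damage() raises an error for characters such as "Á" (UTF-8 bytes C3 81). A real converter has to decide how to handle those bytes.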