Unicode in TRichView

Introduction

Unicode is a worldwide character-encoding standard. It simplifies software localization and improves multilingual text processing. By supporting Unicode, a developer can ship a single application binary that exchanges and processes text in any language, which makes the application suitable for a global market.

In Delphi 2009 and newer, Unicode is the default encoding for strings.

Unicode and ANSI Text in TRichView

All strings in TRichView are Unicode strings.

Import and Export

Text Files

LoadText and LoadTextFromStream load ANSI text files. The code page used for conversion to Unicode is specified in an optional parameter.

LoadTextW and LoadTextFromStreamW load Unicode text files.

Note: you can test whether a file is Unicode with the function

function RV_TestFileUnicode(const FileName: TRVUnicodeString): TRVUnicodeTestResult;

defined in RVUni.pas.

Return values

rvutNo: the file is not Unicode (odd size);

rvutYes: the file is most likely Unicode (UTF-16): even size, and either Unicode byte-order characters at the start or #0 in the text (the first 500 bytes are checked);

rvutProbably: the file can contain Unicode (even size);

rvutEmpty: the file is empty;

rvutError: error opening the file.
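The rules behind these return values can be sketched outside Delphi. The following Python sketch is an illustration only, not the code of RV_TestFileUnicode: the function name classify_file, the probe_size parameter and the exact order of the checks are assumptions made for this example. It reuses the result names as plain strings.

```python
import os

# UTF-16 little-endian and big-endian byte-order marks
BOMS = (b"\xff\xfe", b"\xfe\xff")


def classify_file(path, probe_size=500):
    """Classify a file with the size / byte-order mark / #0 rules listed above.

    Hypothetical helper for illustration; it is not part of TRichView.
    """
    try:
        with open(path, "rb") as f:
            head = f.read(probe_size)   # only the start of the file is inspected
        size = os.path.getsize(path)
    except OSError:
        return "rvutError"
    if size == 0:
        return "rvutEmpty"
    if size % 2 == 1:
        return "rvutNo"                 # UTF-16 code units take 2 bytes each
    if head.startswith(BOMS) or b"\x00" in head:
        return "rvutYes"                # BOM, or zero bytes typical of UTF-16 Latin text
    return "rvutProbably"
```

Note that an even-sized ANSI file without zero bytes is reported as "probably Unicode", so a result of this kind is a hint rather than proof.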

You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.

SaveText saves an ANSI text file. Unicode strings are converted based on the Style.DefCodePage property.

SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.
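Conversion from Unicode to a code page can lose characters that the code page does not contain, which is why the choice of code page matters when saving ANSI text. The Python sketch below illustrates the general effect; it is not TRichView code. The helper names to_ansi and from_ansi are hypothetical, and cp1251 and cp1252 are example code pages standing in for a value of Style.DefCodePage.

```python
def to_ansi(text, code_page):
    """Convert Unicode text to code-page bytes; unmappable characters become '?'."""
    return text.encode(code_page, errors="replace")


def from_ansi(data, code_page):
    """Convert code-page bytes back to Unicode text."""
    return data.decode(code_page)


text = "Привет"                          # Cyrillic sample text
cyrillic = to_ansi(text, "cp1251")       # Windows Cyrillic: one byte per letter
western = to_ansi(text, "cp1252")        # Windows Western: has no Cyrillic letters

print(from_ansi(cyrillic, "cp1251") == text)   # the round trip preserves the text
print(western)                                 # every letter was replaced by '?'
```

Saving as Unicode avoids this loss, because no code page is involved.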

RTF (Rich Text Format) and DocX files

RTF and DocX files can contain Unicode text.

HTML

SaveHTML*** can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as numeric codes (&#NNNN;), so all Unicode characters are preserved, but the file size increases. Saving HTML in UTF-8 encoding is therefore highly recommended.
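The size difference between the two HTML encodings can be shown with a short Python sketch. It is an illustration of numeric character references in general, not of the output of the SaveHTML methods; the sample text and the cp1252 code page are assumptions chosen for the example.

```python
import html

text = "Café Привет"

# "ANSI" HTML: characters missing from the code page become &#NNNN; references.
ansi_html = text.encode("cp1252", errors="xmlcharrefreplace")

# UTF-8 HTML: every character is written directly.
utf8_html = text.encode("utf-8")

print(ansi_html)
print(len(ansi_html), len(utf8_html))

# The references decode back to the original characters, so nothing is lost.
restored = html.unescape(ansi_html.decode("cp1252"))
print(restored == text)
```

For this sample, the ANSI form takes 47 bytes and the UTF-8 form 18 bytes, because each Cyrillic letter costs 7 bytes as a reference and 2 bytes in UTF-8.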

Selection, Search and The Clipboard

GetSelTextA returns the selection as an ANSI string. Unicode text is converted based on the Style.DefCodePage property.

GetSelTextW returns the selection as a Unicode string.

Text searching methods have versions for ANSI and for Unicode strings: TRichView.SearchTextA and SearchTextW. However, SearchTextA simply converts the string to Unicode (using Style.DefCodePage) and calls SearchTextW.
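The "convert, then delegate" relationship between the two search methods can be sketched as follows. This is a Python illustration of the pattern, not the TRichView implementation: the functions search_text_w and search_text_a, their parameters and their return values are hypothetical, and the code pages are examples standing in for Style.DefCodePage.

```python
def search_text_w(document, needle):
    """Unicode search: return the index of needle in document, or -1 if absent."""
    return document.find(needle)


def search_text_a(document, needle_bytes, def_code_page):
    """ANSI search: decode the needle with the default code page, then delegate."""
    return search_text_w(document, needle_bytes.decode(def_code_page))


document = "café, йогурт"
needle = b"\xe9"                                    # a single ANSI byte

print(search_text_a(document, needle, "cp1252"))    # 0xE9 is 'é' in Windows-1252
print(search_text_a(document, needle, "cp1251"))    # 0xE9 is 'й' in Windows-1251
```

The same ANSI bytes match different characters depending on the code page, so the result of an ANSI search depends on the default code page, while a Unicode search does not.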

CopyTextA copies the selection as ANSI text. Unicode strings are converted based on the Style.DefCodePage property.

CopyTextW copies the selection as Unicode text.

Copy and CopyDef copy Unicode text (option rvoAutoCopyUnicodeText).

Editing Operations

When the Paste method is used and text is available in the Clipboard, the method pastes Unicode text.

PasteTextA pastes ANSI text; PasteTextW pastes Unicode text.

InsertTextFromFile: the file must be ANSI (converted, if needed)

InsertOEMTextFromFile: the file must be OEM (converted, if needed)

InsertTextFromFileW: the file must be Unicode (converted, if needed)

InsertText and InsertStringTag add a Unicode string in Delphi/C++Builder 2009+ and an ANSI string in older versions of Delphi/C++Builder.

InsertTextA and InsertStringATag add an ANSI string (converted, if needed)

InsertTextW and InsertStringWTag add a Unicode string (converted, if needed)

RVF (RichView Format)

Applications compiled with older versions of TRichView (earlier than version 1.2) cannot load RVF files containing Unicode.

RVF files are loaded correctly even if the Unicode flags in text styles are mismatched (the file was saved with a different RVStyle than the one used for loading). Conversions are performed if required; for example, a conversion occurs when old RVF files are loaded in applications compiled in Delphi/C++Builder 2009+. Two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, indicate whether any conversion took place.

TRichView v11 introduces a change in RVF files that allows String properties to be stored as Unicode. RVF files saved in Delphi/C++Builder 2009+ are saved as RVF version 1.3.1; RVF files saved in older versions of Delphi/C++Builder are saved as RVF version 1.3.

See also...

Example of how to load UTF-8 files.