TRichView Reference | Overview

Unicode in RichView

Introduction

Unicode is a worldwide character-encoding standard. It simplifies software localization and improves multilingual text processing. By implementing it, a developer can ship a single binary that exchanges data in any language, without separate builds for different character sets. In the UTF-16 encoding, each code unit is 16 bits wide, which gives 65,536 distinct values; characters outside this range are represented by a pair of code units (a surrogate pair). Unicode-enabled functions are often referred to as "wide-character" functions.
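
To make the 16-bit point concrete, the following sketch counts the UTF-16 code units needed for a string. Python is used purely for illustration (it is not part of TRichView), and the helper name utf16_units is hypothetical.

```python
def utf16_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    # "utf-16-le" writes no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # 'A', the euro sign, and an emoji outside the 16-bit range
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X} needs {utf16_units(ch)} code unit(s)")
```

Characters in the 16-bit range need one code unit, while a character such as U+1F600 needs two, so the length of a "wide" string is not always the number of characters a user sees.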

In Delphi 2009 or newer, Unicode is the default encoding for strings.

Unicode strings are referred to here as 'Unicode'.

Single-byte strings are referred to here as 'ANSI' (for simplicity).

Unicode and ANSI Text in TRichView

Not all strings in TRichView are Unicode strings.

Text in text items can be Unicode or ANSI, depending on the Unicode property of the text style.

The text (item name) of non-text items is always ANSI.

The following text depends on the version of Delphi/C++Builder (Unicode for Delphi/C++Builder 2009 or newer, ANSI for older versions):

tag strings;

extra item properties;

names of checkpoints;

visible text in labels, numbered sequences, endnotes, footnotes;

live spelling interface;

text in list markers;

hypertext targets (OnReadHyperlink, OnWriteHyperlink);

and others.

Main Limitations of the Current Implementation

You must prevent conversion of Unicode text to double-byte character set (DBCS) strings, which are used to represent characters of Asian languages, because DBCS is not supported by RichView. The only exception, where conversion is acceptable, is exporting and saving, because in these cases the DBCS text is not used in TRichView itself.
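
For context, the sketch below shows what makes DBCS text different from both single-byte ANSI and UTF-16 text: some characters take one byte and others take two. Python is used only for illustration; code page 932 (Shift-JIS) is an assumed example of a DBCS code page, and byte_widths is a hypothetical helper.

```python
def byte_widths(text: str, codec: str) -> list:
    """Return the number of bytes each character of text occupies in codec."""
    return [len(ch.encode(codec)) for ch in text]


if __name__ == "__main__":
    sample = "A\u3042"  # Latin 'A' followed by Hiragana 'a'
    print("cp932    :", byte_widths(sample, "cp932"))
    print("utf-16-le:", byte_widths(sample, "utf-16-le"))
```

In the DBCS code page the two characters have different widths, so code that assumes one byte per character cannot index such strings safely.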

How to Enable Unicode. Using Both ANSI and Unicode

Set the Unicode property of the text style to True. Important: the document must be empty when this property is changed. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property. The default value of this property is True for Delphi/C++Builder 2009 or newer.

A document can contain both Unicode and ANSI text (in different styles), so you can mix them.

Of course, you can also use only ANSI or only Unicode styles; this is even recommended.

How to Make Unicode Editor (Without ANSI Text)

1. Set the Unicode property to True for all TextStyles in TRVStyle. Important: the document must be empty when this property is changed. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property. The default value of this property is True for Delphi/C++Builder 2009 or newer.

2. Set RichViewEdit1.RTFReadProperties.UnicodeMode = rvruOnlyUnicode (this is the default value of this property for Delphi/C++Builder 2009 or newer).

3. Many methods working with text have 3 versions:

with TRVUnicodeString parameters (names ending in -W, for example SearchTextW);

with TRVAnsiString parameters (names ending in -A, for example SearchTextA);

with String parameters (for example, SearchText).

For Delphi/C++Builder versions prior to 2009, use the TRVUnicodeString methods. For Delphi/C++Builder 2009 or newer, you can use either the TRVUnicodeString methods or the String methods. Avoid the TRVAnsiString methods to prevent conversion between Unicode and ANSI text.

These methods include the following methods of TRichView (method names are listed without -A and -W):

AddNLTag and its versions;

AddTextNL;

GetItemText;

GetSelText;

GetWordAt;

SearchText;

SetItemText.

These methods include the following methods of TRichViewEdit (method names are listed without -A and -W):

GetCurrentItemText;

InsertStringTag;

InsertText;

PasteText;

SearchText;

SetItemTextEd;

SetCurrentItemText.

4. Existing non-Unicode RVF documents must be converted to Unicode by calling ConvertToUnicode after loading them (see below).

This step is not necessary for Delphi/C++Builder 2009 or newer: all text styles in RVF documents saved by applications compiled with older versions of Delphi/C++Builder are converted to Unicode automatically.

It's safe to call this procedure for Unicode documents – it will do nothing.

Example: converting TRichView document to Unicode

Unicode in Delphi/C++Builder 2009 or newer

In the new versions of Delphi/C++Builder, the String type is Unicode by default.

Many properties and parameters in TRichView become Unicode; see "Unicode and ANSI Text in TRichView" above.

Default (initial) values of some properties are changed:

Unicode property of text style (from False to True);

TRichView.RTFReadProperties.UnicodeMode (from rvruNoUnicode to rvruOnlyUnicode);

TRichView.Options (rvoAutoCopyUnicodeText is included, rvoAutoCopyText is excluded).

When text styles are saved (in RVF files or Delphi forms) in older versions of Delphi/C++Builder, the Unicode property of a text style is saved only if it has a non-default value (True). In Delphi/C++Builder 2009+, the value of the Unicode property is always saved, default or not. The main consequence is the following: when forms or RVF files with styles saved by older versions of Delphi/C++Builder are loaded in Delphi/C++Builder 2009+, the Unicode property of all text styles becomes True. For RVF files, all text in text items is converted to Unicode automatically.

ANSI text may appear in a document when reading RTF files, if TRichView.RTFReadProperties.UnicodeMode<>rvruOnlyUnicode. If you use projects converted from an older version of Delphi/C++Builder, check the value of this property.

Import and Export

Text Files

LoadText and LoadTextFromStream load ANSI text files. When loading into a Unicode style, they convert the text from ANSI to Unicode.

LoadTextW and LoadTextFromStreamW load Unicode text files. When loading into a non-Unicode style, they convert the text from Unicode to ANSI.

The code page used for conversion is based on the Charset property of the corresponding style (Charsets of Unicode styles are used only for conversion to/from ANSI).
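
The following sketch illustrates why the code page matters for ANSI-to-Unicode conversion: the same bytes produce different text under different code pages. Python is used only for illustration; the code page names and the helper ansi_to_unicode are assumptions, and the way TRichView maps a Charset to a code page is not shown here.

```python
# Six bytes that spell a Russian word when read as Windows-1251.
RAW = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])


def ansi_to_unicode(data: bytes, code_page: str) -> str:
    """Decode single-byte (ANSI) data using the given code page."""
    return data.decode(code_page)


if __name__ == "__main__":
    print(ansi_to_unicode(RAW, "cp1251"))  # Cyrillic code page
    print(ansi_to_unicode(RAW, "cp1252"))  # Western European code page
```

With the Cyrillic code page the bytes decode to Cyrillic letters; with the Western European one they decode to accented Latin letters. This is why each style's Charset must match the text it holds.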

Note: you can test a file with the function

function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult

defined in RVUni.pas.

Return values

rvutNo: the file is not Unicode (odd size);

rvutYes: the file is most likely Unicode (even size, Unicode byte-order characters at the start or #0 in text (first 500 bytes checked));

rvutProbably: the file can contain Unicode (even size);

rvutEmpty: the file is empty;

rvutError: error opening the file.
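
The checks listed above can be sketched as follows. This is a Python re-implementation written from the description only, not the code of RV_TestFileUnicode. The function name classify_text_file is hypothetical, and reading the rvutYes condition as "even size and (byte-order mark or #0 in the first 500 bytes)" is an assumption.

```python
import os


def classify_text_file(path: str, probe: int = 500) -> str:
    """Guess whether a file holds UTF-16 text; returns the name of a result code."""
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            head = f.read(probe)  # only the first bytes are inspected
    except OSError:
        return "rvutError"
    if size == 0:
        return "rvutEmpty"
    if size % 2 == 1:
        return "rvutNo"  # UTF-16 text always has an even byte count
    has_bom = head[:2] in (b"\xff\xfe", b"\xfe\xff")
    if has_bom or b"\x00" in head:
        return "rvutYes"
    return "rvutProbably"
```

The zero-byte test works because Latin letters stored as UTF-16 have a zero high byte, which rarely occurs in ANSI text. A file of even size without such bytes remains ambiguous, hence the separate "probably" result.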

You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.

SaveText saves an ANSI text file. Unicode strings are converted based on the Style.DefCodePage property.

SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.
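
Converting Unicode to ANSI can lose information when the target code page lacks some characters, as the following sketch shows. Python is used only for illustration and unicode_to_ansi is a hypothetical helper. The "?" substitution is Python's behavior; how TRichView treats unrepresentable characters is not stated here.

```python
def unicode_to_ansi(text: str, code_page: str) -> bytes:
    """Encode text in a single-byte code page, replacing unsupported characters."""
    # errors="replace" substitutes "?" for every character the code page lacks.
    return text.encode(code_page, errors="replace")


if __name__ == "__main__":
    word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a Russian word
    print(unicode_to_ansi(word, "cp1251"))  # Cyrillic code page: preserved
    print(unicode_to_ansi(word, "cp1252"))  # Western code page: lost
```

Text saved with a code page that does not match its language cannot be recovered afterwards, which is one reason to prefer the Unicode (-W) saving methods for multilingual documents.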

RTF (Rich Text Format)

Methods for RTF saving are able to store Unicode.

Methods for RTF loading and inserting work depending on TRichView.RTFReadProperties.UnicodeMode.
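
As background, the RTF format itself represents Unicode characters with the \uN keyword, where N is a signed 16-bit decimal value followed by a fallback character for readers that do not understand Unicode. The sketch below shows that general mechanism in Python. It is an illustration only, rtf_escape is a hypothetical helper, and the exact output of TRichView's RTF export is not claimed to match it.

```python
import struct


def rtf_escape(text: str) -> str:
    """Escape text for an RTF body, writing non-ASCII characters as \\uN? keywords."""
    out = []
    # Walk the UTF-16 code units, since \uN carries one 16-bit value at a time.
    for (unit,) in struct.iter_unpack("<H", text.encode("utf-16-le")):
        if unit < 128:
            ch = chr(unit)
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch if ch in "\\{}" else ch)
        else:
            # Values above 32767 are written as negative numbers.
            signed = unit - 65536 if unit > 32767 else unit
            out.append(f"\\u{signed}?")  # "?" is the ANSI fallback character
    return "".join(out)
```

For example, rtf_escape("caf\u00e9") yields the text caf\u233? in the RTF stream. Control characters such as line breaks are not handled in this sketch.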

HTML

SaveHTML*** can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as numeric codes (&#NNNN;), so all Unicode characters are preserved, but the file size increases. It is therefore highly recommended to save HTML in UTF-8 encoding.
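
The size difference can be estimated with the following sketch, which writes the same text once with numeric codes and once as UTF-8. Python is used only for illustration and both helper names are hypothetical. For simplicity, plain ASCII stands in for the ANSI code page, so every non-ASCII character becomes a numeric code.

```python
def html_ansi_bytes(text: str) -> bytes:
    """Encode text for a non-Unicode HTML file, using &#NNNN; for other characters."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def html_utf8_bytes(text: str) -> bytes:
    """Encode text for a UTF-8 HTML file."""
    return text.encode("utf-8")


if __name__ == "__main__":
    word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # six Cyrillic letters
    print(len(html_ansi_bytes(word)), "bytes with numeric codes")
    print(len(html_utf8_bytes(word)), "bytes in UTF-8")
```

For six Cyrillic letters, the numeric-code form takes 42 bytes and the UTF-8 form takes 12, which illustrates the size increase mentioned above.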

Selection, Search and The Clipboard

GetSelTextA returns the selection as an ANSI string. Unicode text is converted based on the Style.DefCodePage property.

GetSelTextW returns the selection as a Unicode string. ANSI strings are converted based on the corresponding Charsets.

Text searching methods have versions for searching for ANSI and for Unicode strings: TRichView.SearchTextA/SearchTextW, TRichViewEdit.SearchTextA/SearchTextW. All of them can search in both ANSI and Unicode text items. When comparing ANSI text with Unicode text, the SearchTextA methods use the Style.DefCodePage property, and the SearchTextW methods use the text Charsets.

CopyTextA copies the selection as ANSI text. Unicode strings are converted based on the Style.DefCodePage property.

CopyTextW copies the selection as Unicode text. ANSI strings are converted based on the corresponding Charsets.

Note: on NT-based systems (such as Windows XP), the Clipboard is able to convert Unicode text to ANSI text and vice versa. So, if you copy in one of these formats, both formats are available for pasting.

Copy and CopyDef can copy Unicode text (when rvoAutoCopyUnicodeText is included in Options).

Editing Operations

When text is pasted using the Paste method and both ANSI and Unicode text are available in the Clipboard, the choice depends on the current text style (Unicode or not).

PasteTextA pastes ANSI text, PasteTextW pastes Unicode text.

InsertTextFromFile: the file must be ANSI (converted, if needed)

InsertOEMTextFromFile: the file must be OEM (converted, if needed)

InsertTextFromFileW: the file must be Unicode (converted, if needed)

InsertText, InsertStringTag add a Unicode string in Delphi/C++Builder 2009+ and an ANSI string in older versions of Delphi/C++Builder.

InsertTextA, InsertStringATag add an ANSI string (converted, if needed).

InsertTextW, InsertStringWTag add a Unicode string (converted, if needed).

RVF (RichView Format)

Applications compiled with older versions of RichView (earlier than 1.2) cannot load RVF files containing Unicode.

RVF files are loaded correctly even if the Unicode flags in text styles are mismatched (the file was saved with a different RVStyle than the one used for loading); conversions are performed if required (for example, when loading old RVF files in applications compiled in Delphi/C++Builder 2009+). Two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, indicate whether any conversion took place.

TRichView v11 introduces a change in RVF files that allows String properties to be stored as Unicode. RVF files saved in Delphi/C++Builder 2009+ are saved as RVF version 1.3.1; RVF files saved in older versions of Delphi/C++Builder are saved as RVF version 1.3.

See also...

TRVStyle.DefUnicodeStyle;

Example of how to load UTF-8 files.


TRichView © trichview.com