Unicode in RichView
Unicode is a worldwide character-encoding standard. It simplifies software localization and improves multilingual text processing. By implementing it, a developer can give an application universal data exchange capabilities for global markets, using a single binary file for every possible character code. In UTF-16 encoding, each code unit is 16 bits wide, so a single code unit can take up to 65,536 distinct values; characters outside that range are represented by a pair of code units (a surrogate pair). Unicode-enabled functions are often referred to as "wide-character" functions.
For Delphi 2009 or newer, Unicode is the default encoding for strings.
Unicode strings are referred to here as 'Unicode'.
Single-byte strings are referred to here as 'ANSI' (for simplicity).
Unicode and ANSI Text in TRichView
Not all strings in TRichView are Unicode strings.
Text (item name) of non-text items is always ANSI.
The encoding of the following text depends on the version of Delphi/C++Builder (Unicode for Delphi/C++Builder 2009 or newer, ANSI for older versions):
▪names of checkpoints;
▪live spelling interface;
▪text in list markers;
Main Limitations of the Current Implementation
You must prevent conversion of Unicode to double-byte character set (DBCS) strings, used for representation of characters in Asian languages, because DBCS is not supported by RichView. The only exception (where conversion is ok) is exporting and saving (because in these cases DBCS text will not be used in TRichView).
How to Enable Unicode. Using Both ANSI and Unicode
Set the Unicode property of the text style to True. Important: the document must be empty when this property is changed. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing the property. The default value of this property is True for Delphi/C++Builder 2009 or newer.
A document can contain both Unicode and ANSI text (in different styles), so you can mix the two.
You can also use only ANSI or only Unicode styles; this is the recommended approach.
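Switching every text style to Unicode might look like the sketch below. The names TForm1, RichViewEdit1 and RVStyle1 are hypothetical (a form with an editor linked to a TRVStyle component), and the fragment needs the TRichView units in the form's uses clause, so treat it as an illustration rather than a complete program.

```delphi
// Sketch: make all text styles Unicode while the document is empty.
// TForm1, RichViewEdit1 and RVStyle1 are hypothetical names.
procedure TForm1.MakeAllStylesUnicode;
var
  i: Integer;
begin
  // The editor starts with one empty string, so clear it first.
  RichViewEdit1.Clear;
  for i := 0 to RVStyle1.TextStyles.Count - 1 do
    RVStyle1.TextStyles[i].Unicode := True;
  // Rebuild the (now empty) document layout.
  RichViewEdit1.Format;
end;
```

Calling such a procedure once, before any content is added, keeps the "document must be empty" requirement satisfied.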
How to Make Unicode Editor (Without ANSI Text)
1.Set the Unicode property to True for all TextStyles in TRVStyle. Important: the document must be empty when this property is changed. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing the property. The default value of this property is True for Delphi/C++Builder 2009 or newer.
3.Many methods working with text have 3 versions:
▪with TRVUnicodeString parameters (finished with -W, for example SearchTextW);
▪with TRVAnsiString parameters (finished with -A, for example SearchTextA);
▪with String parameters (for example, SearchText).
For Delphi/C++Builder versions prior to 2009, use the TRVUnicodeString methods. For Delphi/C++Builder 2009 or newer, you can use either the TRVUnicodeString methods or the String methods. Avoid the TRVAnsiString methods to prevent conversion between Unicode and ANSI text.
These methods include the following methods of TRichView (method names are listed without -A and -W):
▪AddNLTag and its versions;
These methods include the following methods of TRichViewEdit (method names are listed without -A and -W):
4.Existing non-Unicode RVF documents must be converted to Unicode by calling ConvertToUnicode after loading them (see below).
This step is not necessary for Delphi/C++Builder 2009 or newer: all text styles in RVF documents saved by applications compiled with older versions of Delphi/C++Builder are converted to Unicode automatically.
It's safe to call this procedure for Unicode documents – it will do nothing.
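As an illustration of preferring the -W methods from step 3, a search helper could be sketched as follows. TForm1 and RichViewEdit1 are hypothetical names, and the search option rvseoDown (search forward) is an assumption about the options set accepted by the editor's SearchTextW; check the method's own reference topic before relying on it.

```delphi
// Sketch: search with the Unicode (-W) version so that the search
// string is never converted to ANSI.
// TForm1 and RichViewEdit1 are hypothetical; rvseoDown is assumed.
function TForm1.FindNext(const What: TRVUnicodeString): Boolean;
begin
  Result := RichViewEdit1.SearchTextW(What, [rvseoDown]);
end;
```

In Delphi/C++Builder 2009 or newer, the String version (SearchText) would serve equally well, since String is already Unicode there.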
Unicode in Delphi/C++Builder 2009 or newer
In the new versions of Delphi/C++Builder, the String type is Unicode by default.
Many properties and parameters in TRichView become Unicode, see "Unicode and ANSI Text in TRichView" above.
Default (initial) values of some properties are changed:
▪Unicode property of text style (from False to True);
▪TRichView.Options (rvoAutoCopyUnicodeText is included, rvoAutoCopyText is excluded).
When text styles are saved (in RVF files or Delphi forms) in older versions of Delphi/C++Builder, the Unicode property of a text style is saved only when it has the non-default value (True). In Delphi/C++Builder 2009+, the value of the Unicode property is always saved, default or not. The main consequence is the following: when forms or RVF files with styles saved by older versions of Delphi/C++Builder are loaded in Delphi/C++Builder 2009+, the Unicode property of all text styles becomes True. For RVF files, all text in text items is converted to Unicode automatically.
ANSI text may appear in a document when reading RTF files if TRichView.RTFReadProperties.UnicodeMode<>rvruOnlyUnicode. If you use projects converted from an older version of Delphi/C++Builder, check the value of this property.
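For a converted project, the property can also be set in code instead of the Object Inspector. This is a minimal sketch; TForm1 and RichViewEdit1 are hypothetical names, and FormCreate is just one convenient place for the assignment.

```delphi
// Sketch: make RTF import produce only Unicode text.
// TForm1 and RichViewEdit1 are hypothetical names.
procedure TForm1.FormCreate(Sender: TObject);
begin
  RichViewEdit1.RTFReadProperties.UnicodeMode := rvruOnlyUnicode;
end;
```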
Import and Export
Code page used for conversion is based on Charset property of the corresponding style (Charsets of Unicode styles are used only for conversion to/from ANSI).
Note: you can test a file with the function
function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult
defined in RVUni.pas. It returns one of the following values:
▪rvutNo – the file is not Unicode (odd size);
▪rvutYes – the file is most likely Unicode (even size, Unicode byte-order characters at the start or #0 in text (first 500 bytes checked));
▪rvutProbably – the file can contain Unicode (even size);
▪rvutEmpty – the file is empty;
▪rvutError – error opening the file.
You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.
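A caller might branch on the result of RV_TestFileUnicode as sketched below. DescribeTextFile is a hypothetical helper, not part of TRichView; the fragment assumes RVUni is listed in the uses clause of the unit that contains it.

```delphi
// Sketch: turn the test result into a human-readable description.
// DescribeTextFile is a hypothetical helper; requires RVUni in "uses".
function DescribeTextFile(const FileName: String): String;
begin
  case RV_TestFileUnicode(FileName) of
    rvutNo:       Result := 'not Unicode';
    rvutYes:      Result := 'most likely Unicode';
    rvutProbably: Result := 'may contain Unicode';
    rvutEmpty:    Result := 'empty file';
    rvutError:    Result := 'error opening the file';
  else
    Result := 'unknown result';
  end;
end;
```

Because rvutProbably only means that the file size is even, an application may want to ask the user or run a further check in that case.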
SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.
RTF (Rich Text Format)
Methods for RTF saving are able to store Unicode.
Methods for RTF loading and inserting work depending on TRichView.RTFReadProperties.UnicodeMode.
SaveHTML*** can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as numeric codes (&#NNNN;), so all Unicode characters are preserved, but the file size increases. Saving HTML in UTF-8 encoding is therefore highly recommended.
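To see why numeric codes inflate the file, the following self-contained console sketch writes non-ASCII characters as &#NNNN; references. ToNumericRefs is a hypothetical function written for this illustration, not TRichView's export code, and it assumes Delphi 2009 or newer (UnicodeString). Escaping of HTML special characters such as & and < is deliberately left out.

```delphi
program NumericRefsDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: writes every non-ASCII character as a decimal
// numeric character reference. A UTF-16 surrogate pair is combined
// into a single code point first.
function ToNumericRefs(const S: UnicodeString): string;
var
  i, Code: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    Code := Ord(S[i]);
    // High surrogate followed by a low surrogate: one code point.
    if (Code >= $D800) and (Code <= $DBFF) and (i < Length(S)) and
       (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
    begin
      Code := $10000 + ((Code - $D800) shl 10) + (Ord(S[i + 1]) - $DC00);
      Inc(i);
    end;
    if Code < 128 then
      Result := Result + Char(Code)
    else
      Result := Result + '&#' + IntToStr(Code) + ';';
    Inc(i);
  end;
end;

begin
  // U+041F (Cyrillic capital letter Pe) becomes a 7-character reference.
  Writeln(ToNumericRefs('A' + #$041F));  // prints A&#1055;
end.
```

The same character takes two bytes in UTF-8, which is the size difference the recommendation above refers to.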
Selection, Search and The Clipboard
GetSelTextW returns the selection as a Unicode string. ANSI strings are converted based on the corresponding Charsets.
Text searching methods have versions for searching for ANSI and for Unicode strings: TRichView.SearchTextA/SearchTextW, TRichViewEdit.SearchTextA/SearchTextW. All methods can search in both ANSI and Unicode text items. When comparing ANSI text with Unicode text, SearchText methods use Style.DefCodePage property, SearchText methods use text Charsets.
CopyTextW copies the selection as Unicode. ANSI strings are converted based on the corresponding Charsets.
Note: on NT-based systems (such as Windows XP), the Clipboard can convert Unicode text to ANSI text and vice versa, so if you copy in one of these formats, both formats are available for pasting.
When text is pasted with the Paste method and both ANSI and Unicode text are available in the Clipboard, the choice depends on the current text style (Unicode or not).
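The Unicode selection methods described above could be combined as in this sketch. TForm1 and RichViewEdit1 are hypothetical names, and the unit declaring TRVUnicodeString is assumed to be in the uses clause.

```delphi
// Sketch: copy the selection to the Clipboard as Unicode text,
// but only when something is actually selected.
// TForm1 and RichViewEdit1 are hypothetical names.
procedure TForm1.CopySelectionAsUnicode;
var
  Sel: TRVUnicodeString;
begin
  Sel := RichViewEdit1.GetSelTextW;  // ANSI items are converted by Charset
  if Sel <> '' then
    RichViewEdit1.CopyTextW;
end;
```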
InsertTextFromFile: the file must be ANSI (converted, if needed)
InsertOEMTextFromFile: the file must be OEM (converted, if needed)
InsertTextFromFileW: the file must be Unicode (converted, if needed)
RVF (RichView Format)
Applications compiled with older versions of RichView (version less than 1.2) will not be able to load RVF files with Unicode.
RVF files are loaded correctly even if the Unicode flags in text styles are mismatched (the file was saved with a different RVStyle than the one used for loading); conversions are performed if required (for example, when old RVF files are loaded in applications compiled in Delphi/C++Builder 2009+). There are two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, which indicate whether any conversion took place.
TRichView v11 introduces a change in the RVF format that allows String properties to be stored as Unicode. RVF files saved in Delphi/C++Builder 2009+ are saved as RVF version 1.3.1; RVF files saved in older versions of Delphi/C++Builder are saved as RVF version 1.3.
TRichView © trichview.com