Unicode in RichView

Introduction

Unicode is a worldwide character-encoding standard. It simplifies software localization and improves multilingual text processing. By implementing it, a developer can give an application universal data-exchange capabilities and ship a single binary file for all markets, whatever characters it has to handle. In the 16-bit wide-character encoding used on Windows, each code unit is 16 bits wide, which allows 65,536 distinct values. Unicode-enabled functions are often referred to as "wide-character" functions.

Unicode strings are referred to here as 'Unicode'.

Single-byte strings are referred to here as 'ANSI' (for simplicity).

Main Limitations of The Current Implementation

Unicode strings are supported only in text items. Item names must be ANSI. Tags (in TagsArePChars mode) must also be ANSI.
You must prevent conversion of Unicode to double-byte character set (DBCS) strings, which are used to represent characters in Asian languages, because DBCS is not supported by RichView. The only exception (where conversion is acceptable) is exporting and saving, because in these cases the DBCS text will not be used in RichView.

How to Enable Unicode, Using Both ANSI and Unicode

Set the Unicode property of the text style to True. Important: the document must be empty when this property is changed. A TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property.

A document can contain both Unicode and ANSI text (in different styles), so you can mix them.

You can also use only ANSI or only Unicode styles; this is even recommended.
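The steps for enabling Unicode on a single text style can be sketched as follows. This is a minimal illustration, not the library's prescribed procedure: the helper name EnableUnicodeForStyle is hypothetical, and the unit names in the uses clause are assumptions that may differ between RichView versions.

```pascal
uses RichView, RVEdit, RVStyle; // unit names are assumptions

// Hypothetical helper: switches one text style of an editor to Unicode.
// StyleNo is an index in Editor.Style.TextStyles.
procedure EnableUnicodeForStyle(Editor: TRichViewEdit; StyleNo: Integer);
begin
  // The editor initially contains one empty text item,
  // so the document must be cleared before the flag is changed.
  Editor.Clear;
  Editor.Style.TextStyles[StyleNo].Unicode := True;
  // Reformat so that the (now empty) document is displayed correctly.
  Editor.Format;
end;
```

Text added afterwards with this style is stored as Unicode, while text added with other styles stays ANSI.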

How to Make Unicode Editor (Without ANSI Text)

1. Set the Unicode property to True for all TextStyles in TRVStyle. Important: the document must be empty when this property is changed. A TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property.
2. Set RichViewEdit1.RTFReadProperties.UnicodeMode to rvruOnlyUnicode.
3. Some methods cannot be used if the document is in Unicode:
   AddNL, Add, AddNLTag, AddFmt (use AddNLATag or AddNLWTag instead);
   AddTextNL, AddTextBlockNL (use AddTextNLA or AddTextNLW instead);
   SetItemText for text items (use SetItemTextA or SetItemTextW instead);
   SetItemTextEd for text items (use SetItemTextEdA or SetItemTextEdW instead);
   SetCurrentItemText for text items (use SetCurrentItemTextA or SetCurrentItemTextW instead);
   GetItemText for text items (use GetItemTextA or GetItemTextW instead);
   GetCurrentItemText for text items (use GetCurrentItemTextA or GetCurrentItemTextW instead).
4. Existing non-Unicode RVF documents must be converted to Unicode by calling ConvertToUnicode (listed below) after loading them. It is safe to call this procedure for documents that are already Unicode; it does nothing in that case.

uses CRVData, RVItem, RVUni, RVTable; // RVTable declares TRVTableItemInfo

// This code uses some undocumented methods.

// Converts all ANSI text items of RVData to Unicode,
// recursing into the cells of any tables.
procedure ConvertRVToUnicode(RVData: TCustomRVData);
var i, r, c, StyleNo: Integer;
    table: TRVTableItemInfo;
begin
  for i := 0 to RVData.ItemCount-1 do begin
    StyleNo := RVData.GetItemStyle(i);
    if StyleNo>=0 then begin
      // Text item: convert it only if its style is not Unicode yet.
      if not RVData.GetRVStyle.TextStyles[StyleNo].Unicode then begin
        RVData.SetItemText(i, RVU_GetRawUnicode(RVData.GetItemTextW(i)));
        Include(RVData.GetItem(i).ItemOptions, rvioUnicode);
      end;
    end
    else if StyleNo=rvsTable then begin
      // Table: convert each existing cell (merged cells are nil).
      table := TRVTableItemInfo(RVData.GetItem(i));
      for r := 0 to table.Rows.Count-1 do
        for c := 0 to table.Rows[r].Count-1 do
          if table.Cells[r,c]<>nil then
            ConvertRVToUnicode(table.Cells[r,c].GetRVData);
    end;
  end;
end;

// Converts the whole document, then marks every text style as Unicode.
procedure ConvertToUnicode(rv: TCustomRichView);
var i: Integer;
begin
  ConvertRVToUnicode(rv.RVData);
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Unicode := True;
end;
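As a rough sketch of how steps 1, 2 and 4 might fit together, the fragment below prepares a Unicode-only editor and then loads an existing RVF file through ConvertToUnicode from the listing above. The helper names SetupUnicodeEditor and LoadRVFAsUnicode are hypothetical, and the unit that declares rvruOnlyUnicode (assumed here to be RVRTFProps) may differ in your version.

```pascal
uses RichView, RVEdit, RVStyle, RVRTFProps; // unit names are assumptions

// Hypothetical helper: makes every text style Unicode (steps 1 and 2).
procedure SetupUnicodeEditor(Editor: TRichViewEdit);
var i: Integer;
begin
  Editor.Clear; // the document must be empty before the flags change
  for i := 0 to Editor.Style.TextStyles.Count-1 do
    Editor.Style.TextStyles[i].Unicode := True;
  Editor.RTFReadProperties.UnicodeMode := rvruOnlyUnicode;
  Editor.Format;
end;

// Hypothetical helper: loads an RVF file and converts it (step 4).
// Relies on ConvertToUnicode as declared in the listing above.
function LoadRVFAsUnicode(Editor: TRichViewEdit;
  const FileName: String): Boolean;
begin
  Editor.Clear;
  Result := Editor.LoadRVF(FileName);
  if Result then begin
    ConvertToUnicode(Editor); // does nothing for Unicode documents
    Editor.Format;
  end;
end;
```

Calling ConvertToUnicode unconditionally keeps the loading code simple, since it has no effect on documents that are already Unicode.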

Deprecated Methods (If You Use Unicode)

In the old methods that add a single text item (AddNL, Add, etc.), the string can contain either "raw" Unicode or ANSI characters.

RichView interprets the string parameter as Unicode or ANSI based on the Unicode property of the text style (if the Style property is not assigned, RichView interprets it as ANSI).

These methods do not perform any conversion from ANSI to Unicode and must be used very carefully (or not used at all).

The old methods for adding multiple text items (AddTextNL, AddTextBlockNL and obsolete methods AddText, AddTextFromNewLine) must be called for ANSI strings only, and for ANSI styles only.

Recommended Methods

Methods adding a single text item:

AddNLWTag adds one text string. It converts the wide string parameter to ANSI if it is called for a non-Unicode style (the language for the conversion is based on the charset of the text style).
AddNLATag adds one text string. It converts the string parameter to Unicode if it is called for a Unicode style (the language for the conversion is based on the charset of the text style).

Methods adding several text items:

AddTextNLA converts the string parameter to Unicode if it is called for a Unicode style (the language for the conversion is based on the charset of the text style).
AddTextNLW takes a wide (Unicode) string as a parameter. It converts this string to ANSI if it is called for a non-Unicode style (the language for the conversion is based on the charset of the text style).
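A short sketch of the single-item methods is given below. The parameter order (text, style index, paragraph index, tag) and the Integer tag are assumptions about the signatures of AddNLWTag and AddNLATag, so check them against the method reference for your version; the helper name AddSampleLines is hypothetical.

```pascal
uses RichView; // unit name is an assumption

// Hypothetical helper: appends one Unicode and one ANSI line.
// Both use text style 0 and paragraph style 0; the tag is 0.
procedure AddSampleLines(rv: TCustomRichView);
var ws: WideString;
begin
  // Build a wide string containing a non-ANSI character (Greek omega).
  ws := WideChar($03A9);
  ws := ws + ' is the Greek capital letter omega';
  // Stored as is for a Unicode style, converted to ANSI otherwise.
  rv.AddNLWTag(ws, 0, 0, 0);
  // Stored as is for an ANSI style, converted to Unicode otherwise.
  rv.AddNLATag('Plain ANSI text', 0, 0, 0);
  rv.Format;
end;
```

Because each method converts to whatever the target style expects, the same code works whether style 0 is Unicode or ANSI; only characters that the style's charset cannot represent are at risk in the ANSI case.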

The following methods have 3 variants (-W: works with Unicode and converts to/from ANSI automatically; -A: works with ANSI and converts to/from Unicode automatically; and a low-level method without a suffix):

TRichView.GetItemText, SetItemText;
TRichViewEdit.GetCurrentItemText, SetCurrentItemText, SetItemTextEd;

Only the -W and -A methods should be used in Unicode applications.
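The -W variants can be used to read and modify text items without caring whether a given item is stored as ANSI or Unicode. The fragment below is a sketch under the assumption that GetItemTextW takes an item index and returns a WideString, and that SetItemTextW takes an item index and a WideString; the helper name UppercaseTextItems is hypothetical.

```pascal
uses SysUtils, RichView; // RichView unit name is an assumption

// Hypothetical helper: converts all top-level text items to upper case.
// Text inside table cells is not processed by this sketch.
procedure UppercaseTextItems(rv: TCustomRichView);
var i: Integer;
begin
  for i := 0 to rv.ItemCount-1 do
    // A non-negative style number denotes a text item.
    if rv.GetItemStyle(i) >= 0 then
      rv.SetItemTextW(i, WideUpperCase(rv.GetItemTextW(i)));
  // Item widths may have changed, so the document is reformatted.
  rv.Format;
end;
```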

Import and Export

Text Files

LoadText and LoadTextFromStream load ANSI text files. When loading into a Unicode style, they convert the text from ANSI to Unicode.

LoadTextW and LoadTextFromStreamW load Unicode text files. When loading into a non-Unicode style, they convert the text from Unicode to ANSI.

The code page used for conversion is based on the Charset property of the corresponding style (Charsets of Unicode styles are used only for conversion to/from ANSI).

Note: you can test a file with the function

function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult;

defined in RVUni.pas.

Return values

rvutNo        the file is not Unicode (odd size);
rvutYes       the file is most likely Unicode (even size, and Unicode byte-order characters at the start or #0 in the text; the first 500 bytes are checked);
rvutProbably  the file may contain Unicode (even size);
rvutEmpty     the file is empty;
rvutError     an error occurred when opening the file.

You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.
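One possible way to combine the test function with the loading methods is sketched below. The four-argument form of LoadText and LoadTextW (file name, style index, paragraph index, single-paragraph flag) is an assumption about their signatures, and treating rvutProbably as Unicode is a design choice of this sketch rather than a rule of the library. The helper name LoadAnyTextFile is hypothetical.

```pascal
uses RichView, RVUni; // RichView unit name is an assumption

// Hypothetical helper: loads a text file as Unicode or ANSI,
// depending on the result of RV_TestFileUnicode.
function LoadAnyTextFile(rv: TCustomRichView;
  const FileName: String): Boolean;
begin
  rv.Clear;
  case RV_TestFileUnicode(FileName) of
    rvutYes, rvutProbably:
      Result := rv.LoadTextW(FileName, 0, 0, False);
    rvutNo:
      Result := rv.LoadText(FileName, 0, 0, False);
    rvutEmpty:
      Result := True;  // nothing to load
  else
    Result := False;   // rvutError: the file could not be opened
  end;
  if Result then
    rv.Format;
end;
```

If files with an even size and no byte-order mark are usually ANSI in your application, it may be safer to route rvutProbably to LoadText instead, or to apply IsTextUnicode as a second opinion.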

SaveText saves an ANSI text file. Unicode strings are converted based on the Style.DefCodePage property.

SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.

RTF (Rich Text Format)

Methods for RTF saving are able to store Unicode.

Methods for RTF loading and inserting work depending on TRichView.RTFReadProperties.UnicodeMode.

HTML

SaveHTML*** can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as codes (&#NNNN;), so all Unicode characters are preserved, but file size is increased.

Selection, Search and The Clipboard

GetSelText returns the selection as an ANSI string. Unicode text is converted based on the Style.DefCodePage property.

GetSelTextW returns the selection as a Unicode string (WideString). ANSI strings are converted based on the corresponding Charsets.

Text searching methods have versions for searching for an ANSI string and for a Unicode string: TRichView.SearchText/SearchTextW, TRichViewEdit.SearchText/SearchTextW. All of these methods can search in both ANSI and Unicode text items. When comparing ANSI text with Unicode text, the SearchText methods use the Style.DefCodePage property, and the SearchTextW methods use the text Charsets.

CopyText copies the selection as ANSI text. Unicode strings are converted based on the Style.DefCodePage property.

CopyTextW copies the selection as Unicode. ANSI strings are converted based on the corresponding Charsets.

Note: on NT-based systems (such as Windows XP), the Clipboard can convert Unicode text to ANSI text and vice versa, so if you copy in one of these formats, both formats are available for pasting.

Copy and CopyDef are able to copy Unicode (see rvoAutoCopyUnicodeText in Options).
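The Unicode selection and Clipboard methods might be used as in the sketch below. It assumes that GetSelTextW and CopyTextW take no parameters and that SelectionExists is available as a parameterless check; the helper names are hypothetical.

```pascal
uses RichView; // unit name is an assumption

// Hypothetical helper: returns the selection as a WideString,
// or an empty string if nothing is selected.
function GetSelectionW(rv: TCustomRichView): WideString;
begin
  if rv.SelectionExists then
    Result := rv.GetSelTextW
  else
    Result := '';
end;

// Hypothetical helper: copies the selection as Unicode text only.
procedure CopySelectionAsUnicode(rv: TCustomRichView);
begin
  if rv.SelectionExists then
    rv.CopyTextW;
end;
```

Using the -W variants here avoids a lossy round trip through Style.DefCodePage when the document contains characters outside the default code page.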

Editing Operations

When text is pasted with the Paste method and both ANSI and Unicode text are available in the Clipboard, the choice depends on the current text style (Unicode or not).

PasteText pastes ANSI text, PasteTextW pastes Unicode text.

InsertTextFromFile: the file must be ANSI (converted, if needed)

InsertOEMTextFromFile: the file must be OEM (converted, if needed)

InsertTextFromFileW: the file must be Unicode (converted, if needed)

InsertText, InsertStringTag add an ANSI string (converted, if needed)

InsertTextW, InsertStringWTag add a Unicode string (converted, if needed)

RVF (RichView Format)

Applications compiled with older versions of RichView (earlier than version 1.2) will not be able to load RVF files containing Unicode.

RVF files will be loaded correctly even if the Unicode flags in text styles are mismatched (the file was saved with a different RVStyle than the one it is loaded with); conversions will be performed if required. There are two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, which indicate whether any conversion took place.
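The two warnings could be checked after loading, as in the sketch below. It assumes that the warnings are exposed as a set-valued RVFWarnings property of the control and that LoadRVF returns a Boolean; the helper name DescribeRVFLoad and the unit names are hypothetical or assumed.

```pascal
uses RichView, RVStyle; // unit names are assumptions

// Hypothetical helper: loads an RVF file and describes
// which Unicode conversions (if any) took place.
function DescribeRVFLoad(rv: TCustomRichView;
  const FileName: String): String;
begin
  rv.Clear;
  if not rv.LoadRVF(FileName) then begin
    Result := 'Loading failed.';
    Exit;
  end;
  rv.Format;
  Result := '';
  if rvfwConvToUnicode in rv.RVFWarnings then
    Result := Result + 'Some ANSI text was converted to Unicode. ';
  if rvfwConvFromUnicode in rv.RVFWarnings then
    Result := Result + 'Some Unicode text was converted to ANSI. ';
  if Result = '' then
    Result := 'No conversion was needed.';
end;
```

A conversion from Unicode to ANSI can lose characters, so an application may want to warn the user when rvfwConvFromUnicode is reported.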

Unicode Conversion Functions

There are functions converting WideString to "raw Unicode" string and back, "raw Unicode" string to ANSI and back.
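To illustrate the idea of a "raw Unicode" string without relying on the library, the small console program below packs the 16-bit characters of a WideString byte for byte into a single-byte string and unpacks them again. This is only a sketch under the assumption that "raw Unicode" means exactly this layout; the functions WideToRaw and RawToWide are hypothetical stand-ins and are not the functions provided in RVUni.pas.

```pascal
program RawUnicodeSketch;
{$APPTYPE CONSOLE}

// Hypothetical stand-in: copies the 16-bit characters of a
// WideString into a single-byte string, two bytes per character.
function WideToRaw(const ws: WideString): AnsiString;
begin
  SetLength(Result, Length(ws) * SizeOf(WideChar));
  if Length(ws) > 0 then
    Move(ws[1], Result[1], Length(Result));
end;

// Hypothetical stand-in: the reverse operation.
function RawToWide(const raw: AnsiString): WideString;
begin
  SetLength(Result, Length(raw) div SizeOf(WideChar));
  if Length(Result) > 0 then
    Move(raw[1], Result[1], Length(Result) * SizeOf(WideChar));
end;

var
  ws: WideString;
  raw: AnsiString;
begin
  ws := 'Unicode';
  raw := WideToRaw(ws);
  Writeln(Length(ws));           // 7 characters
  Writeln(Length(raw));          // 14 bytes
  Writeln(RawToWide(raw) = ws);  // TRUE: the round trip is lossless
end.
```

A string in this form must never be displayed or processed as ANSI text, which is why the low-level methods without a suffix have to be used with care.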

See also...

TRVStyle.DefUnicodeStyle;
Example how to load UTF-8 files.


RichView © Sergey Tkachenko