Unicode in RichView
Introduction

Unicode is a worldwide character-encoding standard. It simplifies software localization and improves multilingual text processing. An application that supports Unicode can exchange text in any language, and a single binary file can handle every possible character code. Unicode text is stored as 16-bit code units (UTF-16), so one code unit can hold any of 65,536 distinct values; characters outside this range are encoded as a pair of code units (a surrogate pair). Functions that work with Unicode are often called "wide-character" functions.

In this topic, Unicode strings are called "Unicode", and single-byte strings are called "ANSI" (for simplicity).

Main Limitations of The Current Implementation
How to Enable Unicode, Using Both ANSI and Unicode

Set the Unicode property of a text style to True.

Important: the document must be empty when you change this property. TRichViewEdit initially contains one empty text item, so it is not completely empty; call Clear before changing this property.

A document can contain both Unicode and ANSI text (in different text styles), so you can mix ANSI and Unicode text. However, using only ANSI styles or only Unicode styles is recommended.

How to Make a Unicode Editor (Without ANSI Text)
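The ConvertToUnicode procedure below converts all ANSI text items of a document, including text in table cells, to Unicode, and then makes all text styles Unicode. It is safe to call it for a Unicode document: in that case it does nothing.

uses CRVData, RVItem, RVUni, RVStyle, RVTable, RichView; // this code uses some undocumented methods

// Recursively converts all ANSI text items in RVData (including nested table cells) to Unicode
procedure ConvertRVToUnicode(RVData: TCustomRVData);
var
  i, r, c, StyleNo: Integer;
  table: TRVTableItemInfo;
begin
  for i := 0 to RVData.ItemCount-1 do begin
    StyleNo := RVData.GetItemStyle(i);
    if StyleNo >= 0 then begin
      // text item: convert it only if its style is ANSI
      if not RVData.GetRVStyle.TextStyles[StyleNo].Unicode then begin
        // GetItemTextW returns the text as a WideString (converted using the style's Charset);
        // store it back as a "raw Unicode" string and mark the item as Unicode
        RVData.SetItemText(i, RVU_GetRawUnicode(RVData.GetItemTextW(i)));
        Include(RVData.GetItem(i).ItemOptions, rvioUnicode);
      end;
    end
    else if StyleNo = rvsTable then begin
      // table: process every existing cell (cells hidden by merging are nil)
      table := TRVTableItemInfo(RVData.GetItem(i));
      for r := 0 to table.Rows.Count-1 do
        for c := 0 to table.Rows[r].Count-1 do
          if table.Cells[r,c] <> nil then
            ConvertRVToUnicode(table.Cells[r,c].GetRVData);
    end;
  end;
end;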
// Converts the whole document to Unicode, then makes all text styles Unicode
procedure ConvertToUnicode(rv: TCustomRichView);
var
  i: Integer;
begin
  ConvertRVToUnicode(rv.RVData);
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Unicode := True;
  // the document may need to be reformatted afterwards (rv.Format)
end;

Deprecated Methods (If You Use Unicode)

In the old methods that add a single text item (AddNL, Add, etc.), the string can contain either "raw" Unicode or ANSI characters. RichView interprets the string parameter as Unicode or ANSI depending on the Unicode property of the text style (if the Style property is not assigned, RichView treats it as ANSI). These methods do not perform any conversion from ANSI to Unicode, so they must be used very carefully (or not at all).

The old methods that add multiple text items (AddTextNL, AddTextBlockNL, and the obsolete AddText and AddTextFromNewLine) must be called only with ANSI strings and only for ANSI styles.

Recommended Methods

Methods adding a single text item:
Methods adding several text items:
The following methods have three variants: with the W suffix (working with Unicode and converting to/from ANSI automatically), with the A suffix (working with ANSI and converting to/from Unicode automatically), and a low-level method without a suffix:
Only the W and A variants should be used in Unicode applications.

Import and Export

Text Files

LoadText and LoadTextFromStream load ANSI text files; when loading into a Unicode style, they convert the text from ANSI to Unicode. LoadTextW and LoadTextFromStreamW load Unicode text files; when loading into a non-Unicode style, they convert the text from Unicode to ANSI. The code page used for the conversion is determined by the Charset property of the corresponding text style (the Charset of a Unicode style is used only for conversion to/from ANSI).

Note: you can test a file with the function

function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult;

defined in RVUni.pas.

Return values:
You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.

SaveText saves an ANSI text file; Unicode strings are converted using the Style.DefCodePage property. SaveTextW saves a Unicode text file; ANSI strings are converted using the corresponding Charsets.

RTF (Rich Text Format)

The RTF saving methods can store Unicode. The behavior of the RTF loading and inserting methods depends on TRichView.RTFReadProperties.UnicodeMode.

HTML

SaveHTML*** methods can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as numeric character references (&#NNNN;), so all Unicode characters are preserved, but the file size increases.

Selection, Search and The Clipboard

GetSelText returns the selection as an ANSI string; Unicode text is converted using the Style.DefCodePage property. GetSelTextW returns the selection as a Unicode string (WideString); ANSI strings are converted using the corresponding Charsets.

Text searching methods have versions for searching for an ANSI string and for a Unicode string: TRichView.SearchText/SearchTextW and TRichViewEdit.SearchText/SearchTextW. All of them can search in both ANSI and Unicode text items. When comparing ANSI text with Unicode text, the SearchText methods use the Style.DefCodePage property, and the SearchTextW methods use the text Charsets.

CopyText copies the selection as ANSI text; Unicode strings are converted using the Style.DefCodePage property. CopyTextW copies the selection as Unicode; ANSI strings are converted using the corresponding Charsets.

Note: on NT-based systems (such as Windows XP), the Clipboard can convert Unicode text to ANSI text and vice versa, so if you copy text in one of these formats, both formats are available for pasting.

Copy and CopyDef can copy Unicode text if rvoAutoCopyUnicodeText is included in Options.

Editing Operations

If text is pasted with the Paste method and both ANSI and Unicode text are available in the Clipboard, the choice depends on the current text style (Unicode or not).
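Several of the conversions described above (loading ANSI text into a Unicode style, SaveTextW, GetSelTextW, CopyTextW, pasting) convert between ANSI and Unicode using the code page implied by a text style's Charset. RichView performs this mapping internally; the function below is only a hypothetical sketch of the idea, not part of RichView's API. It assumes the numeric values of the standard Windows *_CHARSET constants and covers only a few common charsets.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper (not a RichView function): maps a Windows font charset
// value, as stored in a text style's Charset property, to the ANSI code page
// that would be used to convert text in that charset to/from Unicode.
function CharsetToCodePage(Charset: Integer): Integer;
begin
  case Charset of
    0:   Result := 1252; // ANSI_CHARSET (Western European)
    128: Result := 932;  // SHIFTJIS_CHARSET (Japanese)
    129: Result := 949;  // HANGEUL_CHARSET (Korean)
    134: Result := 936;  // GB2312_CHARSET (Simplified Chinese)
    136: Result := 950;  // CHINESEBIG5_CHARSET (Traditional Chinese)
    161: Result := 1253; // GREEK_CHARSET
    162: Result := 1254; // TURKISH_CHARSET
    163: Result := 1258; // VIETNAMESE_CHARSET
    177: Result := 1255; // HEBREW_CHARSET
    178: Result := 1256; // ARABIC_CHARSET
    186: Result := 1257; // BALTIC_CHARSET
    204: Result := 1251; // RUSSIAN_CHARSET (Cyrillic)
    222: Result := 874;  // THAI_CHARSET
    238: Result := 1250; // EASTEUROPE_CHARSET (Central European)
  else
    // DEFAULT_CHARSET (1) and any charset not listed above:
    // 0 = CP_ACP, i.e. the system's default ANSI code page
    Result := 0;
  end;
end;
```

On Windows, a code page obtained this way can be passed to MultiByteToWideChar or WideCharToMultiByte to perform the actual conversion.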
PasteText pastes ANSI text; PasteTextW pastes Unicode text.
InsertTextFromFile: the file must be ANSI (converted, if needed).
InsertOEMTextFromFile: the file must be OEM (converted, if needed).
InsertTextFromFileW: the file must be Unicode (converted, if needed).
InsertText, InsertStringTag: add an ANSI string (converted, if needed).
InsertTextW, InsertStringWTag: add a Unicode string (converted, if needed).

RVF (RichView Format)

Applications compiled with older versions of RichView (earlier than 1.2) cannot load RVF files containing Unicode. RVF files are loaded correctly even if the Unicode flags of text styles do not match (for example, a file saved with one RVStyle and loaded with another); conversions are performed when required. Two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, indicate whether any conversion took place.

Unicode Conversion Functions

There are functions that convert a WideString to a "raw Unicode" string and back, and a "raw Unicode" string to ANSI and back.
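A "raw Unicode" string is a byte string that holds the UTF-16 code units of the text directly, two bytes per code unit. The sketch below illustrates what such a conversion does, assuming that layout; WideToRawUnicode and RawUnicodeToWide are hypothetical names, not the functions declared in RVUni.pas (in real code, use RVUni functions such as RVU_GetRawUnicode).

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch: copies the UTF-16 code units of S into a byte string,
// two bytes per code unit, in the machine's native byte order.
function WideToRawUnicode(const S: WideString): AnsiString;
begin
  SetLength(Result, Length(S) * SizeOf(WideChar));
  if Length(S) > 0 then
    Move(S[1], Result[1], Length(Result));
end;

// Hypothetical sketch of the reverse conversion: reinterprets pairs of bytes
// as UTF-16 code units. An odd trailing byte cannot form a code unit and is ignored.
function RawUnicodeToWide(const S: AnsiString): WideString;
begin
  SetLength(Result, Length(S) div SizeOf(WideChar));
  if Length(Result) > 0 then
    Move(S[1], Result[1], Length(Result) * SizeOf(WideChar));
end;
```

Because the bytes are copied as they lie in memory, the raw string uses the native byte order, which is little-endian on Windows; the string "A" followed by U+0416 becomes the four bytes $41 $00 $16 $04.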