Unicode Conversion

Various Unicode-related functions.

Unit RVUni.


  TRVCodePage = type cardinal; // defined in RVStyle unit

function RVU_AnsiToUnicode(CodePage: TRVCodePage; const s: TRVAnsiString): TRVUnicodeString;

function RVU_UnicodeToAnsi(CodePage: TRVCodePage; const s: TRVUnicodeString): TRVAnsiString;

function RVU_AnsiToUTF8(CodePage: TRVCodePage; const s: TRVAnsiString): TRVRawByteString;

RVU_AnsiToUnicode converts an ANSI string (encoded in the specified CodePage) to a Unicode string.

RVU_UnicodeToAnsi converts a Unicode string to an ANSI string (encoded in the specified CodePage).

RVU_AnsiToUTF8 converts an ANSI string (encoded in the specified CodePage) to a UTF-8 string.
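A minimal usage sketch of the three functions follows. The signatures are the ones declared above. The code page 1252 (Windows Western European), the procedure name ConvertExample, and the assumption that the TRV...String types come from the RVTypes unit are illustrative, not taken from this topic.

```pascal
uses
  RVUni,
  RVTypes; // assumption: this unit declares TRVAnsiString, TRVUnicodeString, TRVRawByteString

// Hypothetical example procedure, not part of RVUni.
procedure ConvertExample;
var
  Ansi: TRVAnsiString;
  Wide: TRVUnicodeString;
  Utf8: TRVRawByteString;
begin
  Ansi := 'Hello';

  // ANSI (code page 1252) -> Unicode
  Wide := RVU_AnsiToUnicode(1252, Ansi);

  // Unicode -> ANSI (code page 1252)
  Ansi := RVU_UnicodeToAnsi(1252, Wide);

  // ANSI (code page 1252) -> UTF-8
  Utf8 := RVU_AnsiToUTF8(1252, Ansi);
end;
```

Note that converting Unicode to ANSI can lose characters that have no representation in the target code page, so the code page should match the text being converted.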


  TRVUnicodeTestResult = (rvutNo, rvutYes, rvutProbably, rvutEmpty, rvutError);

function RV_TestStreamUnicode(Stream: TStream): TRVUnicodeTestResult;

function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult;

function RV_TestStringUnicode(const s: TRVRawByteString): TRVUnicodeTestResult;

These functions perform a very basic test of the content of a stream, file, or string to determine whether it contains Unicode or ANSI text.

Possible results:

rvutNo – it is ANSI text;

rvutYes – it is (almost certainly) Unicode text;

rvutProbably – it may be either ANSI or Unicode;

rvutEmpty – empty content;

rvutError – error (reading file or stream).

First, these functions check the number of bytes in the content. If it is odd, the content is not Unicode text (rvutNo).

Next, they check the first 2 bytes of the text. If they are a Unicode byte order mark, the content is very probably Unicode (rvutYes).

Finally, they check whether there are zero bytes in the first 500 bytes. If there are, the content is not ANSI text (rvutYes); otherwise, it may be either (rvutProbably).
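The steps above can be sketched as a standalone function. This is an illustration of the described checks, not the actual RVUni implementation: the names TGuess and GuessUnicode are hypothetical, and both the position of the empty-content check and the acceptance of either byte order for the byte order mark are assumptions.

```pascal
type
  // Hypothetical type; it mirrors TRVUnicodeTestResult without rvutError.
  TGuess = (gNo, gYes, gProbably, gEmpty);

// Hypothetical function applying the documented checks to a byte buffer.
function GuessUnicode(const Data: array of Byte): TGuess;
const
  MaxScan = 500; // number of leading bytes searched for zero bytes
var
  i, Limit: Integer;
begin
  Result := gProbably;
  if Length(Data) = 0 then
    Result := gEmpty
  else if Odd(Length(Data)) then
    // An odd byte count cannot be a sequence of 2-byte characters.
    Result := gNo
  else if ((Data[0] = $FF) and (Data[1] = $FE)) or
          ((Data[0] = $FE) and (Data[1] = $FF)) then
    // Byte order mark in either byte order (assumption).
    Result := gYes
  else
  begin
    // A zero byte near the start rules out ANSI text.
    Limit := Length(Data);
    if Limit > MaxScan then
      Limit := MaxScan;
    for i := 0 to Limit - 1 do
      if Data[i] = 0 then
      begin
        Result := gYes;
        Break;
      end;
  end;
end;
```

With this sketch, the bytes 41 00 42 00 ("AB" with zero high bytes) give gYes, while the bytes 41 42 give gProbably, because two ANSI characters and one 2-byte character cannot be told apart by these checks.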

You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.
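A minimal sketch of calling IsTextUnicode from Delphi is shown below. The helper name BufferLooksUnicode and the choice of IS_TEXT_UNICODE_UNICODE_MASK as the set of tests are assumptions made for illustration. The code is Windows-only.

```pascal
uses
  Windows,  // Winapi.Windows in newer Delphi versions
  SysUtils; // TBytes

// Hypothetical helper: asks Windows whether Buf looks like Unicode text.
function BufferLooksUnicode(const Buf: TBytes): Boolean;
var
  Flags: Integer;
begin
  Result := False;
  if Length(Buf) = 0 then
    Exit;
  // On input, Flags selects the tests to run;
  // on output, it holds the tests that the buffer passed.
  Flags := IS_TEXT_UNICODE_UNICODE_MASK;
  Result := IsTextUnicode(Pointer(Buf), Length(Buf), @Flags);
end;
```

Like the RV_Test... functions, IsTextUnicode relies on statistical and structural hints, so its answer is a guess rather than a guarantee, especially for short buffers.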

See also:

Unicode in TRichView.