HMG-UNICODE
Unicode Documentation
Since version HMG.3.1.0 (2012/11/25), HMG supports either both the ANSI and Unicode character sets, or only the ANSI character set (for compatibility with previous versions), depending on the option chosen when the library is built. By default, HMG supports both the ANSI and Unicode character sets (see INCLUDE\SET_COMPILE_HMG_UNICODE.CH).
Unicode is the current standard for character sets. As Microsoft says in its documentation:
“Unicode is a worldwide character encoding standard that provides a unique number to represent each character used in modern computing, including technical symbols and special characters used in publishing. Unicode is required by modern standards, such as XML and ECMAScript (JavaScript), and is the official mechanism for implementing ISO/IEC 10646 (UCS: Universal Character Set). It is supported by many operating systems, all modern browsers, and many other products. New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and to aid in simplifying localization.”
HMG-Unicode is therefore the future of xBase programming for Windows. Since version HMG.3.2.0 (2013/12/08), HMG-Unicode has been considered stable.
HMG-Unicode requires that every source code file containing strings with Unicode characters be saved with 'Encoding in UTF-8' in your text editor. See Main.UNI.Demo in the SAMPLES folder for an example using Unicode characters in the Tamil language.
# General Functions/Commands
- HMG_SupportUnicode() --> Returns .T. or .F.
  Returns .T. only if HMG is compiled to support both the ANSI and Unicode character sets.
- HMG_CharSetName() --> Returns "UNICODE" if HMG is compiled to support both the ANSI and Unicode character sets.
  Returns "ANSI" if HMG is compiled to support ONLY the ANSI character set.
- SET CODEPAGE TO UNICODE - Sets the character code page to UTF-8 (Unicode).
  If HMG is compiled to support the ANSI/Unicode character set, UTF-8 is the default code page.
- HMG_IsCurrentCodePageUnicode() - Returns .T. if the current code page is UTF-8.
- IF HMG SUPPORT UNICODE [ RUN | STOP ] --> A safety command that avoids errors during program compilation/execution for programmers who use the HMG library both with and without support for the Unicode character set.
  RUN --> Runs the program only if the HMG library supports the Unicode character set.
  STOP --> Stops the execution of the program if the HMG library supports the Unicode character set (useful when the COMPILE_HMG_UNICODE directive has been deactivated to build the HMG library).
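The general functions and commands above are typically used together at program start. The sketch below shows one way to combine them; it assumes a standard HMG program that includes "hmg.ch", and it uses the Harbour functions hb_ValToStr() and hb_eol() only to format the message.

```harbour
#include "hmg.ch"

FUNCTION Main()

   // Safety command: continue only if this HMG build supports Unicode
   IF HMG SUPPORT UNICODE RUN

   // UTF-8 is already the default code page in an ANSI/Unicode build;
   // setting it explicitly just documents the intent
   SET CODEPAGE TO UNICODE

   // Report how the library was built and which code page is active
   MsgInfo( "Unicode support: "    + hb_ValToStr( HMG_SupportUnicode() ) + hb_eol() + ;
            "Character set: "      + HMG_CharSetName() + hb_eol() + ;
            "Code page is UTF-8: " + hb_ValToStr( HMG_IsCurrentCodePageUnicode() ), ;
            "HMG-Unicode check" )

RETURN NIL
```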
Note: Programs written entirely in ANSI can easily be compiled with HMG-Unicode by adding the appropriate ANSI code page at the beginning of function MAIN(), e.g. SET CODEPAGE TO SPANISH. There is no need to disable the COMPILE_HMG_UNICODE directive (in file INCLUDE\SET_COMPILE_HMG_UNICODE.CH) and rebuild the HMG library.
Hybrid programs must switch between the appropriate ANSI code page and the UTF-8 code page as needed.
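A hybrid program might switch code pages around the parts that handle ANSI and Unicode text, roughly as sketched below. The procedures ProcessAnsiPart() and ProcessUnicodePart() are hypothetical placeholders for your own code; SPANISH is just the example code page used in the note above.

```harbour
#include "hmg.ch"

FUNCTION Main()

   // ANSI part of the program: select the appropriate ANSI code page
   SET CODEPAGE TO SPANISH
   ProcessAnsiPart()

   // Unicode part of the program: switch back to UTF-8
   SET CODEPAGE TO UNICODE
   ProcessUnicodePart()

RETURN NIL

// Hypothetical placeholder for code that works with ANSI text
STATIC PROCEDURE ProcessAnsiPart()
   MsgInfo( "UTF-8 active: " + hb_ValToStr( HMG_IsCurrentCodePageUnicode() ), "ANSI part" )
RETURN

// Hypothetical placeholder for code that works with UTF-8 text
STATIC PROCEDURE ProcessUnicodePart()
   MsgInfo( "UTF-8 active: " + hb_ValToStr( HMG_IsCurrentCodePageUnicode() ), "Unicode part" )
RETURN
```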
- Remember: to develop applications that support the ANSI/Unicode character set, you should replace ALL functions in your programs that support ONLY the ANSI character set with their ANSI/Unicode equivalents.
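As an illustration of that replacement, here is a small hypothetical helper, Capitalize(), rewritten with the ANSI/Unicode equivalents from the table of alternative string functions. The ANSI-only version works on bytes, so with UTF-8 text LEFT( cText, 1 ) can cut a multi-byte character such as "é" in half. This is a sketch, not part of HMG.

```harbour
#include "hmg.ch"

FUNCTION Main()

   SET CODEPAGE TO UNICODE

   // The source file is saved as UTF-8, so "é" occupies two bytes
   MsgInfo( Capitalize( "élan" ) )

RETURN NIL

// Hypothetical helper.
// ANSI-only version (byte based, unsafe for UTF-8 text):
//    RETURN UPPER( LEFT( cText, 1 ) ) + LOWER( SUBSTR( cText, 2 ) )
// ANSI/Unicode version using the equivalent functions:
FUNCTION Capitalize( cText )
RETURN HMG_UPPER( HB_ULEFT( cText, 1 ) ) + HMG_LOWER( HB_USUBSTR( cText, 2 ) )
```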
# Alternative string functions that support the ANSI/Unicode character set

| ANSI/Unicode | ANSI only |
|---|---|
| HMG_LEN() | LEN() |
| HMG_LOWER() | LOWER() |
| HMG_UPPER() | UPPER() |
| HMG_PADC() | PADC() |
| HMG_PADL() | PADL() |
| HMG_PADR() | PADR() |
| HMG_ISALPHA() | ISALPHA() |
| HMG_ISDIGIT() | ISDIGIT() |
| HMG_ISLOWER() | ISLOWER() |
| HMG_ISUPPER() | ISUPPER() |
| HMG_ISALPHANUMERIC() | RETURN (ISALPHA(c) .OR. ISDIGIT(c)) |
| (*) HB_USUBSTR() | SUBSTR() |
| (*) HB_ULEFT() | LEFT() |
| (*) HB_URIGHT() | RIGHT() |
| (*) HB_UAT() | AT() |
| (*) HB_UTF8RAT() | RAT() |
| (*) HB_UTF8STUFF() | STUFF() |

(*) Harbour native functions
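The difference between the two columns shows up as soon as a string contains multi-byte UTF-8 characters. The sketch below compares a few of them on "ñandú", which has 5 characters but takes 7 bytes in UTF-8 ("ñ" and "ú" are two bytes each). It assumes the source file is saved as UTF-8; the exact values returned by LEN() depend on the byte-based behaviour described above, so no results are hard-coded.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cText := "ñandú"   // 5 characters, 7 bytes in UTF-8

   SET CODEPAGE TO UNICODE

   MsgInfo( "HMG_LEN(): "      + hb_NToS( HMG_LEN( cText ) ) + hb_eol() + ;   // counts characters
            "LEN(): "          + hb_NToS( LEN( cText ) ) + hb_eol() + ;       // ANSI only: works on bytes
            "HB_UAT( 'd' ): "  + hb_NToS( HB_UAT( "d", cText ) ) + hb_eol() + ; // character position
            "HMG_PADR( 8 ): [" + HMG_PADR( cText, 8 ) + "]", ;                 // pads to 8 characters
            "ANSI/Unicode string functions" )

RETURN NIL
```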
# Getting Unicode values of text
- HB_UCODE ( cUnicodeCharacter ) --> Return nCode
- HB_UCHAR ( nCode ) --> Return cUnicodeCharacter
- HMG_GetUnicodeValue ( cUnicodeText ) --> Return array { nCode1, nCode2, ..., nCodeN }
- HMG_GetUnicodeCharacter ( { nCode1, nCode2, ..., nCodeN } ) --> Return cUnicodeText
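These four functions convert between characters and their numeric Unicode values. A minimal round trip is sketched below, assuming the UTF-8 code page is active and the source file is saved as UTF-8. The comments state the Unicode code points of the characters used (U+0041 = 65, U+00F1 = 241).

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL aCodes, cText, nCode
   LOCAL cList := ""

   SET CODEPAGE TO UNICODE

   // Single character <-> code ("ñ" is U+00F1 = 241)
   MsgInfo( "Code of 'ñ': "  + hb_NToS( HB_UCODE( "ñ" ) ) + hb_eol() + ;
            "Character 241: " + HB_UCHAR( 241 ), "HB_UCODE / HB_UCHAR" )

   // Whole string -> array of codes ("A" is 65, "ñ" is 241)
   aCodes := HMG_GetUnicodeValue( "Añ" )
   FOR EACH nCode IN aCodes
      cList += hb_NToS( nCode ) + " "
   NEXT

   // Array of codes -> string again
   cText := HMG_GetUnicodeCharacter( aCodes )

   MsgInfo( "Codes: " + cList + hb_eol() + ;
            "Round trip OK: " + hb_ValToStr( cText == "Añ" ), "HMG_GetUnicodeValue" )

RETURN NIL
```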
# UTF-8 functions
- HMG_IsUTF8 ( cString ) --> lBoolean
- HMG_IsUTF8WithBOM ( cString ) --> lBoolean
- HMG_UTF8RemoveBOM ( cString ) --> cString
- HMG_UTF8InsertBOM ( cString ) --> cString
- HMG_UNICODE_TO_ANSI ( cTextUNICODE ) --> cTextANSI
- HMG_ANSI_TO_UNICODE ( cTextANSI ) --> cTextUNICODE
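A common use of the UTF-8 functions is normalizing text read from a file whose encoding is not known in advance. The sketch below strips a byte order mark if present, converts the text from ANSI when it is not valid UTF-8, and writes it back with a BOM. The file names "input.txt" and "output.txt" are hypothetical, and treating non-UTF-8 input as ANSI is an assumption of this example, not a rule of HMG.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cText := MemoRead( "input.txt" )   // hypothetical input file

   SET CODEPAGE TO UNICODE

   // Drop the byte order mark so it is not treated as text
   IF HMG_IsUTF8WithBOM( cText )
      cText := HMG_UTF8RemoveBOM( cText )
   ENDIF

   // Assumption: anything that is not valid UTF-8 is legacy ANSI text
   IF !HMG_IsUTF8( cText )
      cText := HMG_ANSI_TO_UNICODE( cText )
   ENDIF

   // Save as UTF-8 with a BOM (hypothetical output file)
   hb_MemoWrit( "output.txt", HMG_UTF8InsertBOM( cText ) )

   MsgInfo( cText, "Normalized UTF-8 text" )

RETURN NIL
```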
# Unicode functions
- HMG_StrCmp ( cText1 , cText2 , [ lCaseSensitive ] ) --> CmpValue
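HMG_StrCmp() can serve as the comparison in a sort block for Unicode strings. The sketch below assumes that CmpValue follows the usual Windows string-compare convention (negative when cText1 sorts before cText2, zero when equal, positive otherwise) and that passing .F. for lCaseSensitive gives a case-insensitive comparison; both are assumptions, since the signature above does not spell out the return values.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL aNames := { "Ödön", "anna", "Zoltán", "Émile" }
   LOCAL cList  := "", cName

   SET CODEPAGE TO UNICODE

   // Assumption: HMG_StrCmp() returns < 0 when x sorts before y;
   // .F. requests a case-insensitive comparison
   ASort( aNames,,, {| x, y | HMG_StrCmp( x, y, .F. ) < 0 } )

   FOR EACH cName IN aNames
      cList += cName + hb_eol()
   NEXT

   MsgInfo( cList, "Sorted with HMG_StrCmp()" )

RETURN NIL
```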