English version Russian version

Firebird and Interbase codepages

Built-in support of code pages

In IBProvider v.3.0.0.6327 new codepages processor was implemented. Support of 49 code pages of Firebird and Interbase is available. Using conversion tables and algorithms the provider converts text columns, BLOB fields and arrays into UCS2 format (two-byte Unicode) with which Firebird and Interbase work. In this case the code page of connection to the database is taken into account.

Along with text code pages IBProvider supports binary data code page - OCTETS.

The Figure shows the scheme for working with charsets:

Firebird charsets and collations scheme

1. Database charset and text column encoding

Set DEFAULT text data charset at database creation. The charset you have chosen will be used for all text data stored in the database. If necessary, you may set the encoding different from the database charset for individual columns, arrays, or BLOB fields.

If text column encoding has been set it will be used, and database charset will be ignored.

2. Charset NONE

If database charset has not been set and text column charset has not been defined, NONE encoding will be used.

IBProvider has ctype_none property for working with this encoding. It allows to set the charset for text data conversion in NONE encoding. If ctype_none property has not been set, ASCII convertor will be used for NONE:
  • Codes of characters up to and including 127 will be converted into appropriate characters.
  • Codes of characters over 127 will be considered erroneous and exclusion will be generated for them.
Number 2 on the Figure shows the conversion of Firebird and Interbase text data types into appropriate OLE DB and ADO types.

3. OCTETS charset and binary data

OCTETS encoding serves for binary data storage in text columns. When IBProvider detects this charset it will not use encoding processor, OLE DB and ADO data types will be turned into appropriate binary equivalents CHAR, VARCHAR, and BLOB (see number 3 on the Figure).

4-6. Text charsets

Encoding processor is used for charsets different from NONE and OCTETS. See its simplified scheme on the Figure under numbers 4, 5, and 6.

Storage encoding - character set of text column or database charset.

Connection encoding - is defined by ctype initialization property. If it has been set, the data is read and recorded in this encoding and its storage encoding is ignored

Client encoding - text data charset the client works with. Client encoding is set by ctype_user property. For example, data can be stored in the database in WIN1251 and come to client in UTF-8. When text data is recorded into database it is reversely converted into storage encoding or connection encoding.

If ctype=NONE server ignores connection charset and uses storage encoding.

If ctype_user=NONE provider ignores client charset and represents data to the user in storage encoding or in connection encoding if ctype!=NONE.

ctype_user and ctype properties are ignored for NONE and OCTETS charsets.

Dynamic client encodings (client charsets).

ctype_user permitted values are any charset name supported by the provider and special names of dynamically defined charsets - ACP and OCP.

ACP - provider calls GetACP() and converts ACP into WINDOWS-xxx, where xxx is ANSI system charset identifier.

OCP - provider calls GetOEMCP() and converts OEM into DOS-xxx, where xxx is OEM system charset identifier. If the provider fails to convert the charset name or it is not supported by the server the provider generates an error.

Если не удалось конвертировать имя кодовой страницы или она не поддерживается сервером - провайдер генерирует ошибку.

7. Unicode mode

The provider has unicode_mode property that defines text data publishing format. If unicode_mode = true then it uses Unicode DBTYPE_WSTR data types: WChar, VarWChar, and LongVarWChar. If unicode_mode = false then it uses simple DBTYPE_STR types: Char, VarChar, and LongVarChar.

Text columns size in Firebird 2.x

IBProvider started to control text columns size when working with Firebird 2 servers in Unicode mode. If the length of loaded data exceeds the text column size the exclusion will be generated. To avoid column size checking connect in ordinary mode and set Unicode_mode=false.

Aliases of Firebird and Interbase charsets.

IBProvider allows setting the charset by its name, server alias, or by using provider in-built names-aliases. For example, you may define WIN1251 charset by its name WIN1251, by server alias WIN_1251, and through IBProvider aliases WIN-1251 and WINDOWS-1251.

Procession of empty characret set names

Empty values of ctype, ctype_user, and ctype_none properies are changed into NONE.

IBProvider Professional v3 supported character sets

ID Charset  Max bytes
per Char
 
Collation Language Aliases
2 ASCII 1 ASCII English ASCII7, USASCII
56 BIG_5 2 BIG_5 Chinese, Vietnamese, Korean BIG5, DOS_950, WIN_950
50 CYRL 1 CYRL Russian --
50 -- -- DB_RUS Dbase Russian --
50 -- -- PDOX_CYRL Paradox Russian --
10 DOS437 1 DOS437 English—USA DOS_437
10 -- -- DB_DEU437 DBase German --
10 -- -- DB_ESP437 DBase Spanish --
10 -- -- DB_FRA437 DBase French --
10 -- -- DB_FIN437 DBase Finnish --
10 -- -- DB_ITA437 DBase Italian --
10 -- -- DB_NLD437 DBase Dutch --
10 -- -- DB_SVE437 DBase Swedish --
10 -- -- DB_UK437 DBase English—UK --
10 -- -- DB_US437 DBase English—USA --
10 -- -- PDOX_ASCII Paradox ASCII code page --
10 -- -- PDOX_SWEDFIN Paradox Swedish/Finnish code pages --
10 -- -- PDOX_INTL Paradox International English code page --
9 DOS737 1 DOS737 Greek DOS_737
15 DOS775 1 DOS775 Baltic DOS_775
11 DOS850 1 DOS850 Latin I (no Euro symbol) DOS_850
11 -- -- DB_DEU850 German --
11 -- -- DB_ESP850 Spanish --
11 -- -- DB_FRA850 French --
11 -- -- DB_FRC850 French—Canada --
11 -- -- DB_ITA850 Italian --
11 -- -- DB_NLD850 Dutch --
11 -- -- DB_PTB850 Portuguese—Brazil --
11 -- -- DB_SVE850 Swedish --
11 -- -- DB_UK850 English—UK --
11 -- -- DB_US850 English—USA --
45 DOS852 1 DOS852 Latin II DOS_852
45 -- -- DB_CSY DBase Czech --
45 -- -- DB_PLK DBase Polish --
45 -- -- DB_SLO DBase Slovakian --
45 -- -- PDOX_PLK Paradox Polish --
45 -- -- PDOX_HUN Paradox Hungarian --
45 -- -- PDOX_SLO Paradox Slovakian --
45 -- -- PDOX_CSY Paradox Czech --
46 DOS857 1 DOS857 Turkish DOS_857
46 -- -- DB_TRK DBase Turkish --
16 DOS858 1 DOS858 Latin I + Euro symbol DOS_858
13 DOS860 1 DOS860 Portuguese DOS_860
13 -- -- DB_PTG860 DBase Portuguese --
47 DOS861 1 DOS861 Icelandic DOS_861
47 -- -- PDOX_ISL Paradox Icelandic --
17 DOS862 1 DOS862 Hebrew DOS_862
14 DOS863 1 DOS863 French—Canada DOS_863
14 -- -- DB_FRC863 DBase French—Canada --
18 DOS864 1 DOS864 Arabic DOS_864
12 DOS865 1 DOS865 Nordic DOS_865
12 -- -- DB_DAN865 DBase Danish --
12 -- -- DB_NOR865 DBase Norwegian --
12 -- -- PDOX_NORDAN4 Paradox Norwegian & Danish --
48 DOS866 1 DOS866 Russian DOS_866
49 DOS869 1 DOS869 Modern Greek DOS_869
6 EUCJ_0208 2 EUCJ_0208 EUC Japanese EUCJ
57 GB_2312 2 GB_2312 Simplified Chinese (Hong Kong, PRC) DOS_936, GB2312, WIN_936
21 ISO8859_1 1 ISO8859_1 Latin 1 ANSI, ISO88591, LATIN1
21 -- -- FR_CA French—Canada --
21 -- -- DA_DA Danish --
21 -- -- DE_DE German --
21 -- -- ES_ES Spanish --
21 -- -- FI_FI Finnish --
21 -- -- FR_FR French --
21 -- -- IS_IS Icelandic --
21 -- -- IT_IT Italian --
21 -- -- NO_NO Norwegian --
21 -- -- DU_NL Dutch --
21 -- -- PT_PT Portuguese --
21 -- -- SV_SV Swedish --
21 -- -- EN_UK English—UK --
21 -- -- EN_US English—USA --
22 ISO8859_2 1 ISO8859_2 Latin 2—Central European (Croatian, Czech, Hungarian,
Polish, Romanian,Serbian, Slovakian, Slovenian)
ISO-8859-2, ISO88592, LATIN2
22 -- -- CS_CZ Czech --
22 -- -- ISO_HUN Hungarian --
23 ISO8859_3 1 ISO8859_3 Latin3—Southern European (Maltese, Esperanto) ISO-8859-3, ISO88593, LATIN3
34 ISO8859_4 1 ISO8859_4 Latin 4—Northern European
(Estonian, Latvian, Lithuanian, Greenlandic, Lappish)
ISO-8859-4, ISO88594, LATIN4
35 ISO8859_5 1 ISO8859_5 Cyrillic (Russian) ISO-8859-5, ISO88595
36 ISO8859_6 1 ISO8859_6 Arabic ISO-8859-6, ISO88596
37 ISO8859_7 1 ISO8859_7 Greek ISO-8859-7, ISO88597
38 ISO8859_8 1 ISO8859_8 Hebrew ISO-8859-8, ISO88598
39 ISO8859_9 1 ISO8859_9 Latin 5 ISO-8859-9, ISO88599, LATIN5
40 ISO8859_13 1 ISO8859_13 Latin 7—Baltic Rim ISO-8859-13, ISO885913, LATIN7
44 KSC_5601 2 KSC_5601 Korean (Unified Hangeul) DOS_949, KSC5601, WIN_949
44 -- -- KSC_DICTIONARY Korean—dictionary order collation --
19 NEXT 1 NEXT NeXTSTEP encoding --
19 -- -- NXT_US English—USA --
19 -- -- NXT_FRA French --
19 -- -- NXT_ITA Italian --
19 -- -- NXT_ESP Spanish --
19 -- -- NXT_DEU German --
0 NONE 1 NONE Codepage-neutral. Uppercasing limited to ASCII codes 97—122 --
1 OCTETS 1 OCTETS Binary character BINARY
5 SJIS_0208 2 SJIS_0208 Japanese SJIS
3 UNICODE_FSS 3 UNICODE_FSS UNICODE SQL_TEXT, UTF-8, UTF8, UTF_FSS
51 WIN1250 1 WIN1250 ANSI—Central European WIN_1250
51 -- -- PXW_PLK Polish --
51 -- -- PXW_HUN Hungarian --
51 -- -- PXW_CSY Czech --
51 -- -- PXW_HUNDC Hungarian—dictionary sort --
51 -- -- PXW_SLOV Slovakian --
52 WIN1251 1 WIN1251 ANSI--Cyrillic WIN_1251
52 -- -- WIN1251_UA Ukrainian --
52 -- -- PXW_CYRL Paradox Cyrillic (Russian) --
53 WIN1252 1 WIN1252 ANSI—Latin I WIN_1252
53 -- -- PXW_SWEDFIN Swedish & Finnish --
53 -- -- PXW_NORDAN4 Norwegian & Danish --
53 -- -- PXW_INTL English—International --
53 -- -- PXW_INTL850 Paradox Multi-lingual Latin I --
53 -- -- PXW_SPAN Paradox Spanish --
54 WIN1253 1 WIN1253 ANSI Greek WIN_1253
54 -- -- PXW_GREEK Paradox Greek --
55 WIN1254 1 WIN1254 ANSI Turkish WIN_1254
55 -- -- PXW_TURK Paradox Turkish --
58 WIN1255 1 WIN1255 ANSI Hebrew WIN_1255
59 WIN1256 1 WIN1256 ANSI Arabic WIN_1256
60 WIN1257 1 WIN1257 ANSI Baltic WIN_1257
61 WIN1258 1 WIN1258 ANSI Vietnamese WIN_1258

Support of data conversion external libraries

In addition to built-in data conversion tables and algorithms, IBProvider v3 can use external library with text data convertors - ICU. This requires to enter icu_library=icuuc30.dll. parameter in the connect string. The client shall have the library with icuuc30.dll converter algorithms and the resource library icudt30.dll. One can take the libraries from Firebird SQL Server set.

Be careful to use 32-bit ICU-libraries with IBProvider 32 bit and 64-bit ICU libraries with IBProvider 64 bit. The usage of ICU libraries from Firebird 2.1 set adds the support of GBK and CP943C code pages.

Useful links


Tags: Firebird, Interbase, Firebird codepages, collations, charsets, Firebird encoding, ODBC Firebird driver, UCS2 format, ODBC Interbase driver, character sets, Firebird oledb provider

Supported charsets: ASCII, BIG_5, CYRL, DOS437, DOS737, DOS775, DOS850, DOS852, DOS857, DOS858, DOS860, DOS861, DOS862, DOS863, DOS866, DOS869, EUCJ_0208, GB_2312, ISO8859_1, ISO8859_2, ISO8859_3, ISO8859_4, ISO8859_5, ISO8859_6, ISO8859_7, ISO8859_8, ISO8859_9, ISO8859_13, KOI8R, KOI8U, KSC_5601, NEXT,NONE, SJIS_0208, TIS620, UNICODE_FSS, UTF8, WIN1250, WIN1251, WIN1252, WIN1253, WIN1254, WIN1255, WIN1256, WIN1257, WIN1258, OCTETS, GBK, CP943C

Digg It Reddit Technorati Del.icio.us Google Bookmarks Slashdot StumbleUpon Yahoo MyWeb All

Copyright: IBProvider. This material may be reproduced on other web sites, without written permission but link to http://www.ibprovider.com required. Printable version: Firebird and Interbase codepages



Prev Next