Publish date: 2008-11-06
Updated: 2016-03-03

Firebird and InterBase codepages

Built-in support of code pages

IBProvider v3.0.0.6327 introduced a new code page processor with support for 49 Firebird and InterBase code pages. Using conversion tables and algorithms, the provider converts text columns, BLOB fields, and arrays into UCS2 format (two-byte Unicode), with which Firebird and InterBase work. The conversion takes the code page of the database connection into account.

Along with text code pages, IBProvider supports the binary data code page OCTETS.

The Figure shows the scheme for working with charsets:

Firebird charsets and collations scheme

1. Database charset and text column encoding

Set the default charset for text data when you create the database. The charset you choose is used for all text data stored in the database. If necessary, you may set an encoding that differs from the database charset for individual columns, arrays, or BLOB fields.

If a text column has its own encoding, that encoding is used and the database charset is ignored.

2. Charset NONE

If neither the database charset nor the text column charset has been set, the NONE encoding is used.

IBProvider has the ctype_none property for working with this encoding. It sets the charset used to convert text data stored in the NONE encoding. If ctype_none has not been set, the ASCII converter is used for NONE:

  • Character codes up to and including 127 are converted into the corresponding characters.
  • Character codes above 127 are considered erroneous, and an exception is raised for them.
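The default behaviour for NONE can be pictured with a minimal Python sketch. This is not IBProvider code: the function name convert_none_ascii is hypothetical, and Python's built-in strict ascii codec merely stands in for the provider's ASCII converter, because it also accepts bytes up to 127 and raises an error for anything above.

```python
def convert_none_ascii(raw: bytes) -> str:
    """Decode NONE-charset bytes the way a strict ASCII converter would.

    Hypothetical helper for illustration only, not part of IBProvider.
    """
    # Bytes 0..127 map to the matching characters; any byte above 127
    # makes the strict 'ascii' codec raise UnicodeDecodeError.
    return raw.decode("ascii")


if __name__ == "__main__":
    print(convert_none_ascii(b"Firebird"))
    try:
        convert_none_ascii(b"\xcf\xf0\xe8")  # every byte here is above 127
    except UnicodeDecodeError as exc:
        print("conversion failed:", exc.reason)
```

Setting ctype_none to a real charset corresponds, in this picture, to swapping the ascii codec for the codec of that charset.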

Number 2 in the Figure shows the conversion of Firebird and InterBase text data types into the corresponding OLE DB and ADO types.

3. OCTETS charset and binary data

The OCTETS encoding is used to store binary data in text columns. When IBProvider detects this charset, it bypasses the encoding processor and exposes CHAR, VARCHAR, and BLOB data through the corresponding binary OLE DB and ADO types (see number 3 in the Figure).

4-6. Text charsets

The encoding processor is used for charsets other than NONE and OCTETS. Its simplified scheme is shown in the Figure under numbers 4, 5, and 6.

Storage encoding: the character set of the text column or, if none is set, the database charset.

Connection encoding: defined by the ctype initialization property. If it has been set, data is read and written in this encoding and the storage encoding is ignored.

Client encoding: the charset of the text data the client works with. It is set by the ctype_user property. For example, data can be stored in the database in WIN1251 and delivered to the client in UTF-8. When text data is written to the database, it is converted back into the storage encoding or the connection encoding.

If ctype=NONE, the server ignores the connection charset and uses the storage encoding.

If ctype_user=NONE, the provider ignores the client charset and presents data to the user in the storage encoding, or in the connection encoding if ctype!=NONE.

The ctype_user and ctype properties are ignored for the NONE and OCTETS charsets.
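The way these three encodings combine can be summarised in a short Python sketch. It is an illustration of the rules stated above, not the provider's implementation: the function names effective_encodings and to_client are hypothetical, and the Python codec cp1251 is used as a stand-in for the WIN1251 charset.

```python
def effective_encodings(storage: str, ctype: str = "NONE",
                        ctype_user: str = "NONE") -> tuple:
    """Return (exchange_encoding, client_encoding) for a text column.

    Hypothetical helper that restates the rules from the text.
    """
    # ctype and ctype_user are ignored for NONE and OCTETS columns.
    if storage in ("NONE", "OCTETS"):
        return storage, storage
    # ctype=NONE: the storage encoding is used on the connection.
    exchange = storage if ctype == "NONE" else ctype
    # ctype_user=NONE: the client sees the data in the exchange encoding.
    client = exchange if ctype_user == "NONE" else ctype_user
    return exchange, client


def to_client(raw: bytes, exchange_codec: str, client_codec: str) -> bytes:
    """Re-encode bytes read from the server for the client (Python codecs)."""
    return raw.decode(exchange_codec).encode(client_codec)


if __name__ == "__main__":
    print(effective_encodings("WIN1251", ctype_user="UTF8"))
    stored = "Привет".encode("cp1251")          # text as stored in WIN1251
    print(to_client(stored, "cp1251", "utf-8"))  # the same text as UTF-8 bytes
```

Writing data back to the database follows the same path in reverse: decode from the client encoding, encode into the exchange encoding.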

Dynamic client encodings (client charsets)

The permitted values of ctype_user are any charset name supported by the provider and two special names of dynamically defined charsets: ACP and OCP.

ACP: the provider calls GetACP() and converts the result into WINDOWS-xxx, where xxx is the identifier of the system ANSI code page.

OCP: the provider calls GetOEMCP() and converts the result into DOS-xxx, where xxx is the identifier of the system OEM code page.

If the provider fails to convert the charset name, or the charset is not supported by the server, the provider generates an error.
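The name construction for ACP and OCP can be sketched in Python as follows. Both function names are hypothetical and the code only mirrors the description above. GetACP and GetOEMCP are Windows API functions from kernel32, so system_code_page works only on Windows; dynamic_charset_name is a pure function and runs anywhere.

```python
import ctypes
import sys


def dynamic_charset_name(kind: str, code_page: int) -> str:
    """Build the charset name for a dynamic ctype_user value (ACP or OCP)."""
    if kind == "ACP":
        return f"WINDOWS-{code_page}"
    if kind == "OCP":
        return f"DOS-{code_page}"
    raise ValueError(f"not a dynamic charset name: {kind!r}")


def system_code_page(kind: str) -> int:
    """Ask Windows for the ANSI (ACP) or OEM (OCP) code page identifier."""
    if sys.platform != "win32":
        raise OSError("GetACP/GetOEMCP are only available on Windows")
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP() if kind == "ACP" else kernel32.GetOEMCP()


if __name__ == "__main__":
    # Fixed identifiers are used here so the example runs on any platform.
    print(dynamic_charset_name("ACP", 1251))
    print(dynamic_charset_name("OCP", 866))
```

On a Windows machine, dynamic_charset_name("ACP", system_code_page("ACP")) would give the name derived from the current system settings.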

7. Unicode mode

The provider has the unicode_mode property, which defines the format in which text data is exposed.

If unicode_mode = true, the provider uses the Unicode DBTYPE_WSTR data types: WChar, VarWChar, and LongVarWChar.

If unicode_mode = false, the provider uses the plain DBTYPE_STR types: Char, VarChar, and LongVarChar.

Text column size in Firebird 2.x

IBProvider checks the size of text columns when working with Firebird 2 servers in Unicode mode. If the length of the loaded data exceeds the text column size, an exception is raised. To avoid the column size check, connect in ordinary mode by setting unicode_mode=false.

Aliases of Firebird and InterBase charsets

IBProvider lets you specify a charset by its name, by its server alias, or by one of the provider's built-in aliases. For example, the WIN1251 charset can be specified by its name WIN1251, by the server alias WIN_1251, or by the IBProvider aliases WIN-1251 and WINDOWS-1251.
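Alias handling of this kind amounts to a lookup table. The Python sketch below uses only the WIN1251 aliases named in the text; the table name, the function name, and the choice to return unknown names unchanged are assumptions made for illustration, not a description of how IBProvider resolves names internally.

```python
# Alias -> canonical charset name. Only the WIN1251 entries from the text
# are listed; a complete table would cover every supported charset.
CHARSET_ALIASES = {
    "WIN_1251": "WIN1251",      # server alias
    "WIN-1251": "WIN1251",      # IBProvider alias
    "WINDOWS-1251": "WIN1251",  # IBProvider alias
}


def canonical_charset(name: str) -> str:
    """Map an alias to its charset name; other names pass through as-is."""
    return CHARSET_ALIASES.get(name, name)


if __name__ == "__main__":
    for alias in ("WIN1251", "WIN_1251", "WIN-1251", "WINDOWS-1251"):
        print(alias, "->", canonical_charset(alias))
```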

Processing of empty character set names

Empty values of the ctype, ctype_user, and ctype_none properties are replaced with NONE.

IBProvider Professional v3 supported character sets

| ID | Charset | Max bytes per char | Collation | Language | Aliases |
|---|---|---|---|---|---|
| 56 | BIG_5 | 2 | BIG_5 | Chinese, Vietnamese, Korean | BIG5, DOS_950, WIN_950 |
| 50 | CYRL | 1 | CYRL | Russian | |
| 50 | | | DB_RUS | DBase Russian | |
| 50 | | | PDOX_CYRL | Paradox Russian | |
| 10 | DOS437 | 1 | DOS437 | English (USA) | DOS_437 |
| 10 | | | DB_DEU437 | DBase German | |
| 10 | | | DB_ESP437 | DBase Spanish | |
| 10 | | | DB_FRA437 | DBase French | |
| 10 | | | DB_FIN437 | DBase Finnish | |
| 10 | | | DB_ITA437 | DBase Italian | |
| 10 | | | DB_NLD437 | DBase Dutch | |
| 10 | | | DB_SVE437 | DBase Swedish | |
| 10 | | | DB_UK437 | DBase English (UK) | |
| 10 | | | DB_US437 | DBase English (USA) | |
| 10 | | | PDOX_ASCII | Paradox ASCII code page | |
| 10 | | | PDOX_SWEDFIN | Paradox Swedish/Finnish code pages | |
| 10 | | | PDOX_INTL | Paradox International English code page | |
| 9 | DOS737 | 1 | DOS737 | Greek | DOS_737 |
| 15 | DOS775 | 1 | DOS775 | Baltic | DOS_775 |
| 11 | DOS850 | 1 | DOS850 | Latin I (no Euro symbol) | DOS_850 |
| 11 | | | DB_DEU850 | German | |
| 11 | | | DB_ESP850 | Spanish | |
| 11 | | | DB_FRA850 | French | |
| 11 | | | DB_FRC850 | French (Canada) | |
| 11 | | | DB_ITA850 | Italian | |
| 11 | | | DB_NLD850 | Dutch | |
| 11 | | | DB_PTB850 | Portuguese (Brazil) | |
| 11 | | | DB_SVE850 | Swedish | |
| 11 | | | DB_UK850 | English (UK) | |
| 11 | | | DB_US850 | English (USA) | |
| 45 | DOS852 | 1 | DOS852 | Latin II | DOS_852 |
| 45 | | | DB_CSY | DBase Czech | |
| 45 | | | DB_PLK | DBase Polish | |
| 45 | | | DB_SLO | DBase Slovakian | |
| 45 | | | PDOX_PLK | Paradox Polish | |
| 45 | | | PDOX_HUN | Paradox Hungarian | |
| 45 | | | PDOX_SLO | Paradox Slovakian | |
| 45 | | | PDOX_CSY | Paradox Czech | |
| 46 | DOS857 | 1 | DOS857 | Turkish | DOS_857 |
| 46 | | | DB_TRK | DBase Turkish | |
| 16 | DOS858 | 1 | DOS858 | Latin I + Euro symbol | DOS_858 |
| 13 | DOS860 | 1 | DOS860 | Portuguese | DOS_860 |
| 13 | | | DB_PTG860 | DBase Portuguese | |
| 47 | DOS861 | 1 | DOS861 | Icelandic | DOS_861 |
| 47 | | | PDOX_ISL | Paradox Icelandic | |
| 17 | DOS862 | 1 | DOS862 | Hebrew | DOS_862 |
| 14 | DOS863 | 1 | DOS863 | French (Canada) | DOS_863 |
| 14 | | | DB_FRC863 | DBase French (Canada) | |
| 18 | DOS864 | 1 | DOS864 | Arabic | DOS_864 |
| 12 | DOS865 | 1 | DOS865 | Nordic | DOS_865 |
| 12 | | | DB_DAN865 | DBase Danish | |
| 12 | | | DB_NOR865 | DBase Norwegian | |
| 12 | | | PDOX_NORDAN4 | Paradox Norwegian & Danish | |
| 48 | DOS866 | 1 | DOS866 | Russian | DOS_866 |
| 49 | DOS869 | 1 | DOS869 | Modern Greek | DOS_869 |
| 6 | EUCJ_0208 | 2 | EUCJ_0208 | EUC Japanese | EUCJ |
| 57 | GB_2312 | 2 | GB_2312 | Simplified Chinese (Hong Kong, PRC) | DOS_936, GB2312, WIN_936 |
| 21 | ISO8859_1 | 1 | ISO8859_1 | Latin 1 | ANSI, ISO88591, LATIN1 |
| 21 | | | FR_CA | French (Canada) | |
| 21 | | | DA_DA | Danish | |
| 21 | | | DE_DE | German | |
| 21 | | | ES_ES | Spanish | |
| 21 | | | FI_FI | Finnish | |
| 21 | | | FR_FR | French | |
| 21 | | | IS_IS | Icelandic | |
| 21 | | | IT_IT | Italian | |
| 21 | | | NO_NO | Norwegian | |
| 21 | | | DU_NL | Dutch | |
| 21 | | | PT_PT | Portuguese | |
| 21 | | | SV_SV | Swedish | |
| 21 | | | EN_UK | English (UK) | |
| 21 | | | EN_US | English (USA) | |
| 22 | ISO8859_2 | 1 | ISO8859_2 | Latin 2, Central European (Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovakian, Slovenian) | ISO-8859-2, ISO88592, LATIN2 |
| 22 | | | CS_CZ | Czech | |
| 22 | | | ISO_HUN | Hungarian | |
| 23 | ISO8859_3 | 1 | ISO8859_3 | Latin 3, Southern European (Maltese, Esperanto) | ISO-8859-3, ISO88593, LATIN3 |
| 34 | ISO8859_4 | 1 | ISO8859_4 | Latin 4, Northern European (Estonian, Latvian, Lithuanian, Greenlandic, Lappish) | ISO-8859-4, ISO88594, LATIN4 |
| 35 | ISO8859_5 | 1 | ISO8859_5 | Cyrillic (Russian) | ISO-8859-5, ISO88595 |
| 36 | ISO8859_6 | 1 | ISO8859_6 | Arabic | ISO-8859-6, ISO88596 |
| 37 | ISO8859_7 | 1 | ISO8859_7 | Greek | ISO-8859-7, ISO88597 |
| 38 | ISO8859_8 | 1 | ISO8859_8 | Hebrew | ISO-8859-8, ISO88598 |
| 39 | ISO8859_9 | 1 | ISO8859_9 | Latin 5 | ISO-8859-9, ISO88599, LATIN5 |
| 40 | ISO8859_13 | 1 | ISO8859_13 | Latin 7, Baltic Rim | ISO-8859-13, ISO885913, LATIN7 |
| 44 | KSC_5601 | 2 | KSC_5601 | Korean (Unified Hangeul) | DOS_949, KSC5601, WIN_949 |
| 44 | | | KSC_DICTIONARY | Korean, dictionary order collation | |
| 19 | NEXT | 1 | NEXT | NeXTSTEP encoding | |
| 19 | | | NXT_US | English (USA) | |
| 19 | | | NXT_FRA | French | |
| 19 | | | NXT_ITA | Italian | |
| 19 | | | NXT_ESP | Spanish | |
| 19 | | | NXT_DEU | German | |
| 0 | NONE | 1 | NONE | Codepage-neutral. Uppercasing limited to ASCII codes 97-122 | |
| 1 | OCTETS | 1 | OCTETS | Binary character | BINARY |
| 5 | SJIS_0208 | 2 | SJIS_0208 | Japanese | SJIS |
| 51 | WIN1250 | 1 | WIN1250 | ANSI Central European | WIN_1250 |
| 51 | | | PXW_PLK | Polish | |
| 51 | | | PXW_HUN | Hungarian | |
| 51 | | | PXW_CSY | Czech | |
| 51 | | | PXW_HUNDC | Hungarian, dictionary sort | |
| 51 | | | PXW_SLOV | Slovakian | |
| 52 | WIN1251 | 1 | WIN1251 | ANSI Cyrillic | WIN_1251 |
| 52 | | | WIN1251_UA | Ukrainian | |
| 52 | | | PXW_CYRL | Paradox Cyrillic (Russian) | |
| 53 | WIN1252 | 1 | WIN1252 | ANSI Latin I | WIN_1252 |
| 53 | | | PXW_SWEDFIN | Swedish & Finnish | |
| 53 | | | PXW_NORDAN4 | Norwegian & Danish | |
| 53 | | | PXW_INTL | English (International) | |
| 53 | | | PXW_INTL850 | Paradox Multi-lingual Latin I | |
| 53 | | | PXW_SPAN | Paradox Spanish | |
| 54 | WIN1253 | 1 | WIN1253 | ANSI Greek | WIN_1253 |
| 54 | | | PXW_GREEK | Paradox Greek | |
| 55 | WIN1254 | 1 | WIN1254 | ANSI Turkish | WIN_1254 |
| 55 | | | PXW_TURK | Paradox Turkish | |
| 58 | WIN1255 | 1 | WIN1255 | ANSI Hebrew | WIN_1255 |
| 59 | WIN1256 | 1 | WIN1256 | ANSI Arabic | WIN_1256 |
| 60 | WIN1257 | 1 | WIN1257 | ANSI Baltic | WIN_1257 |
| 61 | WIN1258 | 1 | WIN1258 | ANSI Vietnamese | WIN_1258 |

Support for external data conversion libraries

In addition to the built-in conversion tables and algorithms, IBProvider v3 can use ICU, an external library of text data converters. To enable it, add the icu_library=icuuc30.dll parameter to the connection string. The client machine must have both the converter library icuuc30.dll and the resource library icudt30.dll. The libraries can be taken from the Firebird SQL Server distribution.

Be careful to use 32-bit ICU libraries with 32-bit IBProvider and 64-bit ICU libraries with 64-bit IBProvider. Using the ICU libraries from the Firebird 2.1 distribution adds support for the GBK and CP943C code pages.
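As an illustration, the Python sketch below assembles a connection string that carries the charset properties and the icu_library parameter. The property names ctype, ctype_user, and icu_library come from this article. Everything else is an assumption made for the example and should be checked against the IBProvider documentation: the helper name build_connection_string, the Provider value, the Location, User ID, and Password keys, and the database path and credentials, which are placeholders.

```python
def build_connection_string(properties: dict) -> str:
    """Join key=value pairs with semicolons, in insertion order."""
    return ";".join(f"{key}={value}" for key, value in properties.items())


if __name__ == "__main__":
    properties = {
        # Assumed provider name and connection keys (not from the article).
        "Provider": "LCPI.IBProvider.3",
        "Location": r"localhost:C:\data\example.fdb",  # placeholder path
        "User ID": "SYSDBA",                           # placeholder login
        "Password": "masterkey",                       # placeholder password
        # Properties described in this article.
        "ctype": "WIN1251",
        "ctype_user": "UTF8",
        "icu_library": "icuuc30.dll",
    }
    print(build_connection_string(properties))
```

Both icuuc30.dll and icudt30.dll still have to be available to the client process, in the same bitness as the provider.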

Tags: Firebird, InterBase, Firebird codepages, collations, charsets, Firebird encoding, ODBC Firebird driver, UCS2 format, ODBC InterBase driver, character sets, Firebird oledb provider

Supported charsets: ASCII, BIG_5, CYRL, DOS437, DOS737, DOS775, DOS850, DOS852, DOS857, DOS858, DOS860, DOS861, DOS862, DOS863, DOS866, DOS869, EUCJ_0208, GB_2312, ISO8859_1, ISO8859_2, ISO8859_3, ISO8859_4, ISO8859_5, ISO8859_6, ISO8859_7, ISO8859_8, ISO8859_9, ISO8859_13, KOI8R, KOI8U, KSC_5601, NEXT, NONE, SJIS_0208, TIS620, UNICODE_FSS, UTF8, WIN1250, WIN1251, WIN1252, WIN1253, WIN1254, WIN1255, WIN1256, WIN1257, WIN1258, OCTETS, GBK, CP943C
Copyright: IBProvider. This material may be reproduced on other websites without written permission, provided a link is included.