Collation defines the rules for comparing and sorting string data — which characters are equal, how case and accents are handled, and sort order. Character set (charset) defines which characters can be stored. Together they determine string behavior throughout the engine.
Defines how strings are sorted and compared (collation) and what character encoding is used (charset). Collation affects ORDER BY, comparison, and uniqueness. Mixing collations causes errors; always specify explicitly for predictable behavior.
| Database System | Support Status | Since Version | Notes |
|---|---|---|---|
| MySQL | ✓ Supported | 4.1 | Character set and collation are separate but related settings. Character set determines storable characters (utf8mb4 is the recommended charset — note utf8 in MySQL is a misleading alias for a 3-byte subset that cannot store 4-byte Unicode like emoji; use utf8mb4). Collation determines sort/comparison rules. Can be set at server, database, table, and column level. utf8mb4_unicode_ci is case-insensitive and accent-insensitive; utf8mb4_bin is binary (case-sensitive, byte-by-byte). Collation mismatches between joined columns cause the 'Illegal mix of collations' error. |
| PostgreSQL | ✓ Supported | 9.1 | PostgreSQL collation support uses the underlying OS locale (libc provider) or the ICU library (ICU provider, recommended for cross-platform consistency since PostgreSQL 10). Database-level collation is set at CREATE DATABASE and cannot be changed afterward. Column-level COLLATE overrides apply to comparisons, ORDER BY, and indexes. The C collation is byte-order (equivalent to POSIX, fastest, no locale rules). ICU deterministic=false collations (15+) allow accent/case-insensitive indexes. |
| SQL Server | ✓ Supported | 7.0 | Collation names in SQL Server encode multiple attributes: locale (Latin1_General), case sensitivity (CI = case-insensitive, CS = case-sensitive), accent sensitivity (AI = accent-insensitive, AS = accent-sensitive), and optionally width and kana sensitivity. The default collation for a SQL Server installation is set at setup time and affects system databases. SQL_* collations are legacy; Windows collations (Latin1_General_*) and BIN2 (binary) are current. Collation conflicts between tempdb and user databases are a common source of errors. |
| Oracle | ✓ Supported | 12c | Oracle's collation support has evolved significantly. Pre-12c: NLS_SORT and NLS_COMP session parameters control sort order and comparison behavior. 12c+: column-level COLLATE clause added. 21c+: enhanced Unicode collation with BINARY_AI (accent-insensitive), BINARY_CI (case-insensitive). The database character set (AL32UTF8 = Unicode) is set at creation and cannot be changed. NLS_SORT=BINARY_CI + NLS_COMP=LINGUISTIC enables case-insensitive comparisons globally. |
| SQLite | ✓ Supported | 3.0 | SQLite has three built-in collations: BINARY (default, case-sensitive byte comparison), NOCASE (case-insensitive for ASCII A-Z only, not Unicode), and RTRIM (BINARY with trailing spaces ignored). Custom collations can be registered at runtime via the application layer (e.g., sqlite3_create_collation() in C, or driver-level in Python/Go). No ICU or Unicode-aware case folding built in — for full Unicode case-insensitive search, register a custom collation or use LOWER(). |
The single most common collation-related bug: MySQL's utf8 charset is NOT full Unicode (it's a 3-byte subset that silently truncates 4-byte characters like emoji). Always use utf8mb4. In SQL Server, collation mismatches between columns in a JOIN produce 'cannot resolve collation conflict' errors — most commonly hit when joining against #temp tables in tempdb (which uses the server collation). PostgreSQL's ICU-based collations (10+) are recommended over libc collations for cross-platform consistency, particularly for case-insensitive and accent-insensitive indexes (deterministic=false, PG 15+). SQLite's NOCASE is ASCII-only — it does not fold Unicode characters and will not treat 'é' and 'E' as equal.
Character set and collation are separate but related settings. Character set determines storable characters (utf8mb4 is the recommended charset — note utf8 in MySQL is a misleading alias for a 3-byte subset that cannot store 4-byte Unicode like emoji; use utf8mb4). Collation determines sort/comparison rules. Can be set at server, database, table, and column level. utf8mb4_unicode_ci is case-insensitive and accent-insensitive; utf8mb4_bin is binary (case-sensitive, byte-by-byte). Collation mismatches between joined columns cause the 'Illegal mix of collations' error.
PostgreSQL collation support uses the underlying OS locale (libc provider) or the ICU library (ICU provider, recommended for cross-platform consistency since PostgreSQL 10). Database-level collation is set at CREATE DATABASE and cannot be changed afterward. Column-level COLLATE overrides apply to comparisons, ORDER BY, and indexes. The C collation is byte-order (equivalent to POSIX, fastest, no locale rules). ICU deterministic=false collations (15+) allow accent/case-insensitive indexes.
Collation names in SQL Server encode multiple attributes: locale (Latin1_General), case sensitivity (CI = case-insensitive, CS = case-sensitive), accent sensitivity (AI = accent-insensitive, AS = accent-sensitive), and optionally width and kana sensitivity. The default collation for a SQL Server installation is set at setup time and affects system databases. SQL_* collations are legacy; Windows collations (Latin1_General_*) and BIN2 (binary) are current. Collation conflicts between tempdb and user databases are a common source of errors.
Oracle's collation support has evolved significantly. Pre-12c: NLS_SORT and NLS_COMP session parameters control sort order and comparison behavior. 12c+: column-level COLLATE clause added. 21c+: enhanced Unicode collation with BINARY_AI (accent-insensitive), BINARY_CI (case-insensitive). The database character set (AL32UTF8 = Unicode) is set at creation and cannot be changed. NLS_SORT=BINARY_CI + NLS_COMP=LINGUISTIC enables case-insensitive comparisons globally.
SQLite has three built-in collations: BINARY (default, case-sensitive byte comparison), NOCASE (case-insensitive for ASCII A-Z only, not Unicode), and RTRIM (BINARY with trailing spaces ignored). Custom collations can be registered at runtime via the application layer (e.g., sqlite3_create_collation() in C, or driver-level in Python/Go). No ICU or Unicode-aware case folding built in — for full Unicode case-insensitive search, register a custom collation or use LOWER().