COLLATION AND CHARACTER SETS SQL Compatibility | MySQL, PostgreSQL, SQL Server, Oracle, SQLite

Collation defines the rules for comparing and sorting string data — which characters are equal, how case and accents are handled, and sort order. Character set (charset) defines which characters can be stored. Together they determine string behavior throughout the engine.

Defines how strings are sorted and compared (collation) and what character encoding is used (charset). Collation affects ORDER BY, comparison, and uniqueness. Mixing collations causes errors; always specify explicitly for predictable behavior.

Compatibility

Details

Standard Syntax

Version Support

Per-Database Syntax & Notes

Database System	Support Status	Since Version	Notes
MySQL	✓ Supported	4.1	Character set and collation are separate but related settings. Character set determines storable characters (utf8mb4 is the recommended charset — note utf8 in MySQL is a misleading alias for a 3-byte subset that cannot store 4-byte Unicode like emoji; use utf8mb4). Collation determines sort/comparison rules. Can be set at server, database, table, and column level. utf8mb4_unicode_ci is case-insensitive and accent-insensitive; utf8mb4_bin is binary (case-sensitive, byte-by-byte). Collation mismatches between joined columns cause the 'Illegal mix of collations' error.
PostgreSQL	✓ Supported	9.1	PostgreSQL collation support uses the underlying OS locale (libc provider) or the ICU library (ICU provider, recommended for cross-platform consistency since PostgreSQL 10). Database-level collation is set at CREATE DATABASE and cannot be changed afterward. Column-level COLLATE overrides apply to comparisons, ORDER BY, and indexes. The C collation is byte-order (equivalent to POSIX, fastest, no locale rules). ICU deterministic=false collations (15+) allow accent/case-insensitive indexes.
SQL Server	✓ Supported	7.0	Collation names in SQL Server encode multiple attributes: locale (Latin1_General), case sensitivity (CI = case-insensitive, CS = case-sensitive), accent sensitivity (AI = accent-insensitive, AS = accent-sensitive), and optionally width and kana sensitivity. The default collation for a SQL Server installation is set at setup time and affects system databases. SQL_* collations are legacy; Windows collations (Latin1_General_*) and BIN2 (binary) are current. Collation conflicts between tempdb and user databases are a common source of errors.
Oracle	✓ Supported	12c	Oracle's collation support has evolved significantly. Pre-12c: NLS_SORT and NLS_COMP session parameters control sort order and comparison behavior. 12c+: column-level COLLATE clause added. 21c+: enhanced Unicode collation with BINARY_AI (accent-insensitive), BINARY_CI (case-insensitive). The database character set (AL32UTF8 = Unicode) is set at creation and cannot be changed. NLS_SORT=BINARY_CI + NLS_COMP=LINGUISTIC enables case-insensitive comparisons globally.
SQLite	✓ Supported	3.0	SQLite has three built-in collations: BINARY (default, case-sensitive byte comparison), NOCASE (case-insensitive for ASCII A-Z only, not Unicode), and RTRIM (BINARY with trailing spaces ignored). Custom collations can be registered at runtime via the application layer (e.g., sqlite3_create_collation() in C, or driver-level in Python/Go). No ICU or Unicode-aware case folding built in — for full Unicode case-insensitive search, register a custom collation or use LOWER().

MySQL

Character set and collation are separate but related settings. Character set determines storable characters (utf8mb4 is the recommended charset — note utf8 in MySQL is a misleading alias for a 3-byte subset that cannot store 4-byte Unicode like emoji; use utf8mb4). Collation determines sort/comparison rules. Can be set at server, database, table, and column level. utf8mb4_unicode_ci is case-insensitive and accent-insensitive; utf8mb4_bin is binary (case-sensitive, byte-by-byte). Collation mismatches between joined columns cause the 'Illegal mix of collations' error.

-- Server default (my.cnf): -- character-set-server = utf8mb4 -- collation-server = utf8mb4_unicode_ci -- Database level: CREATE DATABASE myapp CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; -- Table level: CREATE TABLE users ( username VARCHAR(100) ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; -- Column level: CREATE TABLE products ( name VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, sku VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ); -- Ad-hoc comparison: SELECT * FROM users WHERE username = 'alice' COLLATE utf8mb4_bin; -- force case-sensitive -- List available collations: SHOW COLLATION WHERE Charset = 'utf8mb4';

PostgreSQL

PostgreSQL collation support uses the underlying OS locale (libc provider) or the ICU library (ICU provider, recommended for cross-platform consistency since PostgreSQL 10). Database-level collation is set at CREATE DATABASE and cannot be changed afterward. Column-level COLLATE overrides apply to comparisons, ORDER BY, and indexes. The C collation is byte-order (equivalent to POSIX, fastest, no locale rules). ICU deterministic=false collations (15+) allow accent/case-insensitive indexes.

-- Database with specific collation: CREATE DATABASE myapp ENCODING 'UTF8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' TEMPLATE template0; -- Column collation: CREATE TABLE users ( username TEXT COLLATE "en-US-x-icu", -- ICU email TEXT COLLATE "C" -- byte order ); -- Ad-hoc comparison with explicit collation: SELECT * FROM users WHERE username = 'alice' COLLATE "case_insensitive"; -- Case-insensitive collation (PG 12+ with ICU): CREATE COLLATION case_insensitive ( provider = icu, locale = 'und-u-ks-level2', deterministic = false ); -- List collations: SELECT * FROM pg_collation;

SQL Server

Collation names in SQL Server encode multiple attributes: locale (Latin1_General), case sensitivity (CI = case-insensitive, CS = case-sensitive), accent sensitivity (AI = accent-insensitive, AS = accent-sensitive), and optionally width and kana sensitivity. The default collation for a SQL Server installation is set at setup time and affects system databases. SQL_* collations are legacy; Windows collations (Latin1_General_*) and BIN2 (binary) are current. Collation conflicts between tempdb and user databases are a common source of errors.

-- Server/database collation: CREATE DATABASE myapp COLLATE Latin1_General_CI_AS; -- Column collation: CREATE TABLE users ( username NVARCHAR(100) COLLATE Latin1_General_CI_AS, sku VARCHAR(50) COLLATE Latin1_General_CS_AS -- case-sensitive ); -- Ad-hoc collation in query: SELECT * FROM users WHERE username = N'alice' COLLATE Latin1_General_CS_AS; -- Accent-insensitive: SELECT * FROM users WHERE username = N'cafe' COLLATE Latin1_General_CI_AI; -- Matches: cafe, café, CAFE, CAFÉ -- List available collations: SELECT * FROM fn_helpcollations(); -- Check database collation: SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation');

Oracle

Oracle's collation support has evolved significantly. Pre-12c: NLS_SORT and NLS_COMP session parameters control sort order and comparison behavior. 12c+: column-level COLLATE clause added. 21c+: enhanced Unicode collation with BINARY_AI (accent-insensitive), BINARY_CI (case-insensitive). The database character set (AL32UTF8 = Unicode) is set at creation and cannot be changed. NLS_SORT=BINARY_CI + NLS_COMP=LINGUISTIC enables case-insensitive comparisons globally.

-- Database character set (set at CREATE DATABASE, not changeable): -- AL32UTF8 is the recommended Unicode character set. -- Session-level collation control (pre-12c approach): ALTER SESSION SET NLS_SORT = 'BINARY_CI'; -- case-insensitive sort ALTER SESSION SET NLS_COMP = 'LINGUISTIC'; -- use NLS_SORT for comparisons -- Column-level COLLATE (12c+): CREATE TABLE users ( username VARCHAR2(100) COLLATE BINARY_CI, -- case-insensitive sku VARCHAR2(50) COLLATE BINARY -- binary (default) ); -- Ad-hoc: SELECT * FROM users WHERE NLSSORT(username,'NLS_SORT=BINARY_CI') = NLSSORT('alice','NLS_SORT=BINARY_CI'); -- Check NLS settings: SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER IN ('NLS_CHARACTERSET','NLS_SORT','NLS_COMP');

SQLite

SQLite has three built-in collations: BINARY (default, case-sensitive byte comparison), NOCASE (case-insensitive for ASCII A-Z only, not Unicode), and RTRIM (BINARY with trailing spaces ignored). Custom collations can be registered at runtime via the application layer (e.g., sqlite3_create_collation() in C, or driver-level in Python/Go). No ICU or Unicode-aware case folding built in — for full Unicode case-insensitive search, register a custom collation or use LOWER().

-- Column with NOCASE collation: CREATE TABLE users ( username TEXT COLLATE NOCASE, -- ASCII case-insensitive email TEXT COLLATE BINARY -- default, case-sensitive ); -- NOCASE in a query: SELECT * FROM users WHERE username = 'alice' COLLATE NOCASE; -- Matches: alice, Alice, ALICE (ASCII only) -- Index respects collation: CREATE INDEX idx_username ON users (username COLLATE NOCASE); -- LOWER() workaround for Unicode case-insensitivity: SELECT * FROM users WHERE LOWER(username) = LOWER('Ünïcödé'); -- List collations: PRAGMA collation_list;