Does PostgreSQL varchar count using Unicode character length or ASCII character length?


Question

I tried importing a database dump from a SQL file, and the insert failed when inserting the string Mér into a field defined as varying(3). I didn't capture the exact error, but it pointed to that specific value and to the varying(3) constraint.

Since this didn't matter for what I was doing at the time, I just changed the value to Mer, it worked, and I moved on.

Does the limit on a varying field count the length of the byte string? What really boggles my mind is that this was dumped from another PostgreSQL database, so it doesn't make sense that the constraint could have allowed the value to be written in the first place.

Recommended answer

The length limit imposed by varchar(N) types, and the value calculated by the length function, is in characters, not bytes. So 'abcdef'::char(3) is truncated to 'abc', but 'a€cdef'::char(3) is truncated to 'a€c', even in a database encoded as UTF-8, where 'a€c' is encoded using 5 bytes.
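The character-versus-byte distinction above can be checked outside the database. The following is only an illustrative sketch in Python, not PostgreSQL code: the helper name char_and_byte_length is made up for this example, and Python's str is used as a stand-in for how a UTF-8 database counts characters.

```python
# Illustration only: Python's str is a sequence of Unicode code points,
# used here as a stand-in for character counting in a UTF-8 database.
def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes). Hypothetical helper."""
    return len(text), len(text.encode("utf-8"))


# Truncate by characters, analogous to 'a€cdef'::char(3).
truncated = "a\u20accdef"[:3]  # \u20ac is the euro sign

print(truncated, char_and_byte_length(truncated))  # a€c (3, 5)
print(char_and_byte_length("M\u00e9r"))            # (3, 4), \u00e9 is é
```

The euro sign takes three bytes in UTF-8 and é takes two, which is why the byte counts exceed the character counts.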

If restoring a dump file complained that 'Mér' would not fit into a varchar(3) column, that suggests you were restoring a UTF-8 encoded dump file into a SQL_ASCII database.

For example, I did this in a UTF-8 database:

create schema so4249745;
create table so4249745.t(key varchar(3) primary key);
insert into so4249745.t values('Mér');

I then dumped this and tried to load it into a SQL_ASCII database:

pg_dump -f dump.sql --schema=so4249745 --table=t
createdb -E SQL_ASCII -T template0 enctest
psql -f dump.sql enctest

And sure enough:

psql:dump.sql:34: ERROR:  value too long for type character varying(3)
CONTEXT:  COPY t, line 1, column key: "Mér"

This problem comes from the combination of dumping a database with a multi-byte character encoding and restoring it into a SQL_ASCII database. Using SQL_ASCII basically disables the transcoding of client data to server data and assumes one byte per character, leaving it to the clients to take responsibility for using the right character map. The dump file contains the stored string as UTF-8, which is four bytes, so a SQL_ASCII database sees it as four characters and therefore regards it as violating the constraint. It then prints out the value, which my terminal reassembles as three characters.
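The mismatch described above can be mimicked in a few lines. This is a rough sketch in Python rather than anything PostgreSQL actually runs: the functions fits_sql_ascii and fits_utf8 are hypothetical names invented for this illustration, and they only model the "one byte per character" assumption against character counting after decoding.

```python
LIMIT = 3  # the N in varchar(N)


def fits_sql_ascii(raw: bytes, limit: int = LIMIT) -> bool:
    """Model a one-byte-per-character server: every byte counts as a character."""
    return len(raw) <= limit


def fits_utf8(raw: bytes, limit: int = LIMIT) -> bool:
    """Model a UTF-8 server: decode the bytes first, then count characters."""
    return len(raw.decode("utf-8")) <= limit


# The value as it sits in a UTF-8 dump file: b'M\xc3\xa9r' (four bytes).
dump_value = "M\u00e9r".encode("utf-8")

print(fits_utf8(dump_value))       # True
print(fits_sql_ascii(dump_value))  # False
print(dump_value.decode("utf-8"))  # Mér
```

The last line plays the role of the terminal: the same four bytes, interpreted as UTF-8, display as three characters again.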
