PostgreSQL constraint to check for non-ASCII characters
Question
I have a PostgreSQL 9.3 database that is encoded as UTF8. However, one column in the database should never contain anything but ASCII. If non-ASCII data gets in there, it causes a problem in another system that I have no control over, so I want to add a constraint to the column. Note: I already have a BEFORE INSERT trigger, so that might be a good place to do the check.
What's the best way to accomplish this in PostgreSQL?
Recommended answer
You can define ASCII as ordinals 1 to 127 for this purpose, so the following query will identify a string with "non-ASCII" values:
SELECT exists(SELECT 1 from regexp_split_to_table('abcdéfg','') x where ascii(x) not between 1 and 127);
It's not likely to be very efficient, though, and the subquery would force you to do the check in a trigger rather than in a CHECK constraint.
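For reference, a minimal sketch of what that trigger-based check might look like. The names my_table, reject_non_ascii and my_table_ascii_only are hypothetical, and the empty-string guard is an addition here (ascii('') is 0, so an empty value could otherwise be flagged); treat this as an illustration under those assumptions rather than a tested implementation.

```sql
-- Hypothetical names: my_table, reject_non_ascii(), my_table_ascii_only.
CREATE OR REPLACE FUNCTION reject_non_ascii() RETURNS trigger AS $$
BEGIN
    -- Same per-character test as the query above, applied to the incoming row.
    -- The <> '' guard is an added assumption so that empty strings are allowed;
    -- NULL values also skip the check.
    IF NEW.my_column <> '' AND EXISTS (
        SELECT 1
        FROM regexp_split_to_table(NEW.my_column, '') AS x
        WHERE ascii(x) NOT BETWEEN 1 AND 127
    ) THEN
        RAISE EXCEPTION 'my_column contains non-ASCII characters: %', NEW.my_column;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire on UPDATE as well, so existing rows cannot be changed to non-ASCII.
CREATE TRIGGER my_table_ascii_only
    BEFORE INSERT OR UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE reject_non_ascii();
```

If a BEFORE INSERT trigger already exists on the table, the IF block could presumably be folded into its function instead of creating a second trigger.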
Instead, I'd use a regular expression. If you want to allow only printable characters, you can use a range in a CHECK constraint, like:
CHECK (my_column ~ '^[ -~]*$')
This matches everything from the space to the tilde, which is the printable ASCII range.
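As a sketch of how the fragment above might be attached to an existing table: the table name my_table and the constraint name my_column_printable_ascii are hypothetical, chosen only for illustration.

```sql
-- Hypothetical names: my_table, my_column_printable_ascii.
ALTER TABLE my_table
    ADD CONSTRAINT my_column_printable_ascii
    CHECK (my_column ~ '^[ -~]*$');

-- An insert like this one should then be rejected,
-- because é lies outside the space-to-tilde range:
-- INSERT INTO my_table (my_column) VALUES ('abcdéfg');
```

Two things worth keeping in mind with this form: a NULL value passes a CHECK constraint, so add NOT NULL separately if that matters, and tabs and newlines sort below the space character, so they are rejected too.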
If you want all ASCII, printable and nonprintable, you can use byte escapes:
CHECK (my_column ~ '^[\x00-\x7F]*$')
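Adding a CHECK constraint validates the rows already in the table, so it can help to look for offending values first. A small sketch, assuming a hypothetical table name my_table and the default standard_conforming_strings = on (so the backslashes reach the regular expression engine unchanged):

```sql
-- Hypothetical table name: my_table.
-- Lists rows whose value contains at least one character outside 0x00-0x7F.
SELECT my_column
FROM my_table
WHERE my_column !~ '^[\x00-\x7F]*$';
```

Rows returned by this query would need to be cleaned up before the constraint can be added successfully.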
The most strictly correct approach would be to call convert_to(my_string, 'ascii') and let an exception be raised if it fails, but PostgreSQL doesn't offer an ascii (i.e. 7-bit) encoding, so that approach isn't possible.