Select strange characters in text, not working with the LIKE operator
Question
I tried to use this solution and this one (for str_eval()), but it seems to be a different encoding or a different UTF-8 normalization form, perhaps combining diacritical marks...
select distinct logradouro, str_eval(logradouro)
from logradouro where logradouro like '%CECi%';
-- logradouro | str_eval
------------------------------+----------------------------
-- AV CECi\u008DLIA MEIRELLES | AV CECi\u008DLIA MEIRELLES
PROBLEM: how can I select all rows of the table where the problem exists, that is, where \u occurs?
- Does not work with like '%CECi\u%', nor with like '%CECi\\u%'.
- Works with like E'%CECi\u008D%', but that is not generic.
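A likely explanation, inferred from the fact that E'%CECi\u008D%' matches: the stored value contains the single control character U+008D, and str_eval() only displays it as the six-character escape \u008D. A plain pattern such as '%CECi\u%' therefore looks for a literal backslash followed by "u", which is not in the data. The sketch below reproduces this outside PostgreSQL in Python; the sample string and the helper `render_like_str_eval` are hypothetical, and `in` is only a rough stand-in for LIKE '%...%'.

```python
# Hypothetical sample modeled on the psql output above. It assumes the stored
# value holds the single character U+008D, not the six characters "\u008D".
value = "AV CECi\u008dLIA MEIRELLES"

def like_contains(haystack: str, needle: str) -> bool:
    """Rough stand-in for `haystack LIKE '%needle%'` (needle has no wildcards)."""
    return needle in haystack

def render_like_str_eval(text: str) -> str:
    """Hypothetical helper: shows non-printable-ASCII characters as \\uXXXX."""
    return "".join(
        c if 0x20 <= ord(c) <= 0x7E else f"\\u{ord(c):04X}" for c in text
    )

# Like LIKE '%CECi\\u%': search for a literal backslash followed by "u".
literal_escape = like_contains(value, "CECi\\u")
# Like LIKE E'%CECi\u008D%': search for the actual U+008D character.
actual_char = like_contains(value, "CECi\u008d")

print(literal_escape, actual_char)   # False True
print(render_like_str_eval(value))   # AV CECi\u008DLIA MEIRELLES
```

The escaped text is only a display form, so matching has to target the character itself (or a range of characters), not its printed escape.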
For Google, edited after the question was solved: this is a typical XY problem. In the original question (above) I used a somewhat wrong hypothesis. All the solutions below are answers to the following (objective) question:
"Printable ASCII" is a subset of UTF-8: it is all ASCII that is not a "control character".
The "non-printable" control characters are Unicode hexadecimal 00 to 1F and 7F (HTML entities &#x00; to &#x1F; plus &#x7F;, or decimal 0 to 31 plus 127).
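The definition above can be checked mechanically. This small Python sketch (the names `PRINTABLE_MIN`, `PRINTABLE_MAX` and `is_printable_ascii` are illustrative, not from the question) treats hex 20 to 7E as printable and confirms that exactly the listed code points remain as control characters.

```python
# Printable ASCII, per the definition above: every ASCII code point that is
# not a control character, i.e. decimal 32 to 126 (hex 20 to 7E).
PRINTABLE_MIN, PRINTABLE_MAX = 0x20, 0x7E

def is_printable_ascii(ch: str) -> bool:
    return PRINTABLE_MIN <= ord(ch) <= PRINTABLE_MAX

# Control characters: hex 00 to 1F plus 7F (decimal 0 to 31 and 127).
controls = [cp for cp in range(128) if not is_printable_ascii(chr(cp))]

print(len(controls))                 # 33
print(controls[:3], controls[-2:])   # [0, 1, 2] [31, 127]
```

Note that a character such as U+008D lies outside ASCII entirely, so it is also rejected by `is_printable_ascii`.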
PS1: the zero (&#x00;) is the "end of text" mark in the internal representation of the PostgreSQL text datatype, so it does not need to be checked, but including it in the range does no harm.
PS2: for the secondary question, "how to convert a word with an encoding bug into a valid word?", see the heuristic in my answer.
Recommended answer
This condition excludes any strings that consist entirely of printable ASCII characters, so it returns only the rows that contain at least one character outside U+0020 to U+007E:
logradouro ~ '[^\u0020-\u007E]'
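To see which values the condition keeps, here is a Python sketch that applies the same character class with the standard `re` module (which, like PostgreSQL's regex engine, accepts \uXXXX escapes in the pattern). The rows are hypothetical stand-ins for the logradouro column.

```python
import re

# Same character class as the PostgreSQL condition: any single character
# outside U+0020..U+007E (printable ASCII).
NOT_PRINTABLE_ASCII = re.compile(r"[^\u0020-\u007E]")

# Hypothetical rows standing in for the logradouro column.
rows = [
    "AV CECi\u008dLIA MEIRELLES",  # contains the control character U+008D
    "AV CECILIA MEIRELLES",        # printable ASCII only
    "RUA SÃO JOÃO",                # accented letters are also outside the range
]

# Equivalent of: WHERE logradouro ~ '[^\u0020-\u007E]'
flagged = [r for r in rows if NOT_PRINTABLE_ASCII.search(r)]
print(len(flagged))  # 2
```

Because the test is "not printable ASCII", legitimate accented letters such as Ã are returned too, alongside the corrupted U+008D value.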