Oracle search text with non-English characters
Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and these need to be searchable either as "Peña Báináõ" or with English-equivalent characters, like "Pena Bainao". What we did is convert the text in the query, something like:
SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;
But the issue here is that not all of the characters have an English equivalent (not even some pretty obvious ones like ñ or õ) so we end up with the text converted to:
Pe?a Baina?
So if the user tries to find that address by typing "Pena Bainao", he can't find it, because "Pena Bainao" is different from "Pe?a Baina?".
We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.
Here is a list of some characters that are not converted to US7ASCII:
Character - Unicode code point - Possible equivalent
æ - u00E6 - ae
å - u00E5 - a
ã - u00E3 - a
ñ - u00F1 - n
õ - u00F5 - o
1) Using nlssort with BINARY_AI (both case- and accent-insensitive):
SQL> select nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select nlssort('Pena Bainao', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select nlssort('pena bainao', 'NLS_SORT = BINARY_AI') C from dual;
C
------------------------
70656E61206261696E616F00
SQL> select 'true' T from dual where nlssort('pena bainao', 'NLS_SORT = BINARY_AI') = nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') ;
T
----
true
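Applied to a table rather than to literals, option 1 might look like the sketch below. The table addresses, its column street_name, and the bind variable :search_text are hypothetical names used for illustration; they are not from the question. Note that this is a whole-string equality match, not a substring search.

```sql
-- Hypothetical table and column names; :search_text is a bind variable
-- holding whatever the user typed (e.g. 'Pena Bainao').
SELECT a.street_name
  FROM addresses a
 WHERE NLSSORT(a.street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT(:search_text,  'NLS_SORT = BINARY_AI');
```

Because both sides go through the same NLSSORT call, the stored value and the search text are compared by their case- and accent-insensitive sort keys, and no session setting is touched.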
2) You could also alter the NLS_SORT session variable to binary_ai and then you would not have to specify NLS_SORT every time:
SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;
no rows selected
SQL> alter session set nls_sort = binary_ai;
Session altered.
SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;
T
----
true
3) To drop the use of the nlssort function and change the semantics of everything, also set the nls_comp session variable:
SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';
no rows selected
SQL> alter session set nls_comp = linguistic;
Session altered.
SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';
T
----
true
Option 1 changes only local behavior: just the query where you want different results. Options 2 and 3 will change the behavior of other queries in the session, which may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide (download.oracle.com/docs/cd/E11882_01/server.112/e10729/toc.htm). Also look at the section "Using Linguistic Indexes" to see how these comparisons can use indexes.
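As a rough sketch of what a linguistic index could look like for option 1: a function-based index is created on the same NLSSORT expression that the query uses. The table addresses, the column street_name, the index name, and the bind variable :search_text are all hypothetical, and whether the optimizer actually picks the index depends on your data and statistics.

```sql
-- Hypothetical names throughout. The index stores the BINARY_AI sort key
-- of each street name, so the predicate below can be answered from it.
CREATE INDEX addresses_street_ai_idx
    ON addresses (NLSSORT(street_name, 'NLS_SORT = BINARY_AI'));

-- The WHERE clause repeats the indexed expression exactly.
SELECT street_name
  FROM addresses
 WHERE NLSSORT(street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT(:search_text, 'NLS_SORT = BINARY_AI');
```

Keeping the expression in the query identical to the indexed one is what lets Oracle match the predicate to the index; with options 2 and 3 the session settings would need to match the index's NLS_SORT value instead.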