Oracle search text with non-English characters

Problem description

Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and these need to be searchable either as "Peña Báináõ" or with English equivalent characters, like "Pena Bainao". What we did is convert the text in the query, something like:

SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;

But the issue here is that not all of the characters have an English equivalent (not even some pretty obvious ones like ñ or õ) so we end up with the text converted to:

Pe?a Baina?

So if the user tries to find that address by typing "Pena Bainao", he can't find it, because "Pena Bainao" is different from "Pe?a Baina?".

We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.

Here is a list of some characters that are not converted to US7ASCII:

Character     Unicode code point     Possible equivalent
æ         -   U+00E6             -   ae
å         -   U+00E5             -   a
ã         -   U+00E3             -   a
ñ         -   U+00F1             -   n
õ         -   U+00F5             -   o

Solution

1) Using nlssort with BINARY_AI (both case- and accent-insensitive):

SQL> select nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('Pena Bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('pena bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select 'true' T from dual where nlssort('pena bainao', 'NLS_SORT = BINARY_AI') = nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') ;

T
----
true
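
As a rough sketch of how option 1 might look against stored addresses rather than literals from dual: the table `addresses` and its columns `address_id` and `street_name` are hypothetical names made up for this illustration, not something from the question.

```sql
-- Hypothetical table and sample row, for illustration only.
CREATE TABLE addresses (
  address_id  NUMBER PRIMARY KEY,
  street_name VARCHAR2(200)
);

INSERT INTO addresses (address_id, street_name)
VALUES (1, 'Peña Báináõ');

-- Both sides of the comparison are reduced to a case- and
-- accent-insensitive sort key, so the unaccented, lower-case
-- search text is compared on equal terms with the stored name.
SELECT address_id, street_name
  FROM addresses
 WHERE NLSSORT(street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT('pena bainao', 'NLS_SORT = BINARY_AI');
```

Because NLS_SORT is passed explicitly to each nlssort call, no session setting is touched and other queries keep their usual comparison behavior.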

2) You could also alter the NLS_SORT session variable to binary_ai and then you would not have to specify NLS_SORT every time:

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

no rows selected

SQL> alter session set nls_sort = binary_ai;

Session altered.

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

T
----
true

3) To drop the use of the nlssort function and change the semantics of everything, also set the nls_comp session variable:

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

no rows selected

SQL> alter session set nls_comp = linguistic;

Session altered.

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

T
----
true
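
Since options 2 and 3 are session-wide, one possible pattern is to check the current settings, switch them only around the search, and switch them back afterwards. This is a sketch; restoring to BINARY assumes the session was using binary defaults beforehand, which may not hold in your environment.

```sql
-- Inspect the current session settings before changing anything.
SELECT parameter, value
  FROM nls_session_parameters
 WHERE parameter IN ('NLS_SORT', 'NLS_COMP');

-- Enable case- and accent-insensitive comparisons for this session.
ALTER SESSION SET NLS_SORT = BINARY_AI;
ALTER SESSION SET NLS_COMP = LINGUISTIC;

-- A plain equality comparison now follows the linguistic setting.
SELECT 'true' AS t
  FROM dual
 WHERE 'pena bainao' = 'Peña Báináõ';

-- Restore the previous behavior.
-- Assumption: the values seen in the first query were BINARY.
ALTER SESSION SET NLS_COMP = BINARY;
ALTER SESSION SET NLS_SORT = BINARY;
```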

Option 1 changes behavior only locally, in the query where you want different results. Options 2 and 3 change the behavior of other queries in the session, which may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide. Also see the section "Using Linguistic Indexes" to learn how to make these comparisons use an index.
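
A linguistic index of the kind that section describes is a function-based index on the nlssort expression. The following is a minimal sketch; the table `streets`, the column `street_name`, and the index name are hypothetical, and whether the optimizer actually uses the index depends on your data and statistics.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE streets (
  street_id   NUMBER PRIMARY KEY,
  street_name VARCHAR2(200)
);

-- Function-based (linguistic) index on the BINARY_AI sort key.
CREATE INDEX streets_name_ai_idx
  ON streets (NLSSORT(street_name, 'NLS_SORT = BINARY_AI'));

-- The predicate repeats the indexed expression exactly, which is
-- what makes the index a candidate for this query.
SELECT street_id, street_name
  FROM streets
 WHERE NLSSORT(street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT('pena bainao', 'NLS_SORT = BINARY_AI');
```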
