具有连字符的Oracle正则表达式在Windows上的结果与在Unix上的结果不同 [英] Oracle regular expression having a hyphen doesn't give same result on Windows as on Unix

查看:106
本文介绍了具有连字符的Oracle正则表达式在Windows上的结果与在Unix上的结果不同的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下带有正则表达式的查询:

I have the following query with a regular expression:

select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG 
from dual;

Windows 计算机上通过SQL * Plus执行时,它将返回以下内容:

When executed via SQL*Plus on a Windows machine it returns the following:

SQL>  select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST 3304 V2

SunOS 机器上,我得到了不同的结果:

On a SunOS machine I get a different result:

SQL>  select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST      V

这些查询是针对同一Oracle服务器运行的.造成输出差异的任何原因吗?

These queries were run against the same Oracle server. Is there any reason for that difference in the output?

SQL * Plus版本:

SQL*Plus: Release 11.2.0.1.0 Production on Mar. Oct. 14 15:36:35 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Unix上的SQL * Plus版本:

SQL*Plus: Release 11.2.0.2.0 Production on Tue Oct 14 16:01:26 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

推荐答案

正如Avinash Raj在评论中说的那样,正则表达式模式中的连字符被解释为一个范围.该行为似乎取决于两个客户端基于NLS_LANG环境变量使用的排序算法,该算法会影响NLS_SORT值.

As Avinash Raj said in comments, the hyphen in your regular expression pattern is being interpreted as a range. The behaviour seems to be dependent on the sorting algorithm being used by the two clients, based on the NLS_LANG environment variable, which influences the NLS_SORT value.

使用NLS_LANG=ENGLISH_UNITED KINGDOM.WE8ISO8859P1:

SQL> select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST      V

SQL> select value from nls_session_parameters where parameter = 'NLS_SORT';

VALUE
----------
BINARY

四处走动,就像你的个人资料说你在摩洛哥一样,NLS_LANG="ARABIC_MOROCCO.AR8MSWIN1256":

Going out on a limb as your profile says you're in Morocco, with NLS_LANG="ARABIC_MOROCCO.AR8MSWIN1256":

SQL> select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST 3304 V2

SQL> select value from nls_session_parameters where parameter = 'NLS_SORT';

VALUE
----------
ARABIC

原因是图案段+-=被视为覆盖从+=的所有字符的范围.在ISO8859-1和 Windows 1252字符集中,即字符43至61,以及所有数字落在该范围内-例如零为48-在该范围内,因此用正则表达式替换它们.在 Windows 1256字符集中也是如此. (以及任何基于ASCII的内容.)

The reason is that the pattern segment +-= is treated as a range covering all characters from + to =. In the ISO8859-1 and Windows 1252 character set that is characters 43 to 61, and all the numeric digits fall within that range - zero is 48 for example - are within that range, so the regex replaces them. That is also true in the Windows 1256 character set. (And anything based on ASCII).

但是您的NLS_LANG也在隐式更改排序顺序,并且它从BINARY切换为ARABIC排序,从而改变了行为.您可以在一个会话中看到它;与NLS_LANG=ENGLISH_UNITED KINGDOM.WE8ISO8859P1:

But your NLS_LANG is also implicitly changing the sort order, and it's switch from BINARY to ARABIC sorting that changes the behaviour. You can see that within a single session; with NLS_LANG=ENGLISH_UNITED KINGDOM.WE8ISO8859P1:

SQL> select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST      V

SQL> alter session set NLS_SORT=ARABIC;

Session altered.

SQL> select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-={}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST 3304 V2

您还可以通过稍微修改范围来确定这是一个范围问题;将+-=更改为+-3,以便不包括更高的数字,但其他所有内容均保持不变:

You can also tell that it is a range issue by slightly modifying the range; changing +-= to +-3 so higher digits are not included, but leaving everything else the same:

SQL> alter session set NLS_SORT=BINARY;

Session altered.

SQL> select REGEXP_REPLACE ('TEST 3304 V2', '[`~!@#$%^&*()_+-3{}|;.:<>?,./]', ' ') as REG from dual;

REG
------------
TEST    4 V

详细了解语言排序.

但是,始终依靠NLS设置存在风险,因此最好通过更改模式以在开头或结尾使用连字符来完全避免范围问题,从而根本不将其视为范围.再次如阿维纳什·拉杰(Avinash Raj)的建议.

Relying on NLS settings is always risky though, so it's better to avoid the range issue entirely by changing the pattern to have the hyphen at the beginning or end, which stops it being seen as a range at all; again as Avinash Raj suggested.

这篇关于具有连字符的Oracle正则表达式在Windows上的结果与在Unix上的结果不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆