Strange behaviour of Regexp_replace in a Hive SQL query
Question
I have some input data in which I'm trying to remove the trailing .0 from any ID string that ends with .0.
select student_id, regexp_replace(student_id, '.0','') from school_result.credit_records where student_id like '%.0';
Input:
01-0230984.03
12345098.0
34567.0
Expected output:
01-0230984.03
12345098
34567
But the result I'm getting is shown below. It removes every character that is followed by a 0, together with that 0, instead of removing only a .0 at the end of the string:
0129843
123498
34567
What am I doing wrong? Can someone please help?
Recommended answer
A dot in a regular expression has a special meaning: it matches any character. To match a literal dot (.), escape it with a double backslash (in Hive). Also add the end-of-line anchor ($) so that only a trailing .0 is removed:
with mydata as (
select stack(3,
'01-0230984.03',
'12345098.0',
'34567.0'
) as str
)
select regexp_replace(str,'\\.0$','') from mydata;
Result:
01-0230984.03
12345098
34567
The regexp '\\.0$' means a literal dot followed by a zero (.0) at the end of the line ($).
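To make the difference visible, here is a minimal sketch that runs the original pattern and the corrected one side by side on the same sample rows. The column aliases (unescaped, escaped_anchored, char_class) are illustrative names, not from the original post. The third column uses a character class, [.], which should be an equivalent way to match a literal dot without backslash escaping.

```sql
-- Sketch: compare the unescaped pattern with the escaped, anchored one.
with mydata as (
  select stack(3,
    '01-0230984.03',
    '12345098.0',
    '34567.0'
  ) as str
)
select str,
       -- '.' matches any character, so every "<char>0" pair is removed
       regexp_replace(str, '.0', '')    as unescaped,
       -- literal dot, then 0, only at the end of the string
       regexp_replace(str, '\\.0$', '') as escaped_anchored,
       -- same idea, with the dot inside a character class
       regexp_replace(str, '[.]0$', '') as char_class
from mydata;
```

The unescaped column should reproduce the unexpected values from the question, while the other two columns leave 01-0230984.03 untouched and strip only the trailing .0 from the remaining rows.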