Strange behaviour of regexp_replace in a Hive SQL query


Problem description

I have some input data, and I'm trying to remove the trailing .0 from ID strings that end with .0.

select student_id, regexp_replace(student_id, '.0','') from school_result.credit_records where student_id like '%.0';

Input:

01-0230984.03
12345098.0
34567.0

Expected output:

01-0230984.03 
12345098
34567

But the result I get is shown below. It removes any character that is followed by a 0, instead of removing only a .0 at the end of the string:

0129843
123498
34567

What am I doing wrong? Can someone please help?

Recommended answer

A dot in a regular expression has a special meaning: it matches any character. To match a literal dot (.), escape it with a double backslash (in Hive). Also add the end-of-line anchor ($):

with mydata as (
select stack(3,
'01-0230984.03',
'12345098.0',
'34567.0'
) as str
)

select regexp_replace(str,'\\.0$','') from mydata;

Result:

01-0230984.03
12345098
34567

The regular expression \.0$ (written as '\\.0$' in a Hive string literal) matches a literal dot followed by a zero (.0) at the end of the line ($).
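To see how each part of the pattern changes the result, the three variants can be compared side by side. This is a minimal sketch, assuming Hive's default string-literal escaping; the column aliases (unescaped, escaped_no_anchor, escaped_anchored) are illustrative names that do not appear in the original question.

```sql
-- Sample rows from the question, built inline so no table is needed.
with mydata as (
select stack(3,
'01-0230984.03',
'12345098.0',
'34567.0'
) as str
)

select
  str,
  -- Unescaped dot: removes ANY character that is followed by 0.
  regexp_replace(str, '.0', '')    as unescaped,
  -- Escaped dot, no anchor: removes a literal ".0" anywhere in the string.
  regexp_replace(str, '\\.0', '')  as escaped_no_anchor,
  -- Escaped dot plus $: removes a literal ".0" only at the end of the string.
  regexp_replace(str, '\\.0$', '') as escaped_anchored
from mydata;

-- The same fix applied to the query from the question
-- (table and column names are taken from the question).
select student_id, regexp_replace(student_id, '\\.0$', '')
from school_result.credit_records;
```

For the first row, the unescaped pattern gives 0129843 (the behaviour reported in the question), the escaped pattern without an anchor still strips the .0 inside .03 and gives 01-02309843, and only the anchored pattern leaves 01-0230984.03 unchanged. Because the anchor already limits the replacement to values ending in .0, the like '%.0' filter is only needed if the other rows should be excluded from the output.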
