How to exclude special characters in a string using regular expressions in Hive

Problem description

I want to remove periods (.) and parentheses, i.e. the characters ( and ), from a string.
However, decimal numbers should be left intact.

So basically if the input is

Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.

The output should be

Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in a FROM clause must have a name Columns in the subquery select list must have unique names

Solution

with t as (select 'Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.' as mycol)

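-- Group 1 puts a matched decimal number back unchanged; a lone '.', '(' or ')' matches the other branch, leaves group 1 empty, and is therefore dropped
select  regexp_replace(mycol,'(\\d+\\.\\d+)|[.()]','$1')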
from    t

Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in a FROM clause must have a name Columns in the subquery select list must have unique names
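Hive's regexp_replace follows Java regular-expression syntax, so you can try the pattern outside Hive before running the query. The sketch below is only an illustration. It uses Python's standard re module, whose syntax matches Java's for the constructs used here. The function names strip_with_backref and strip_with_lookaround are hypothetical.

The first function mirrors the answer. In Python the back-reference is written \1 instead of $1, and the doubled backslashes of the Hive string literal become single backslashes in a raw string. The second function is an alternative that does not come from the answer: it uses lookarounds to delete only those periods that do not sit between two digits, so no capture group is needed.

```python
import re

# Same pattern as the Hive query: either capture a decimal number (group 1)
# or match a single '.', '(' or ')'. Hive spells it '(\\d+\\.\\d+)|[.()]'
# because the SQL string literal consumes one level of backslashes.
KEEP_DECIMALS = re.compile(r"(\d+\.\d+)|[.()]")

# Alternative (an assumption, not from the original answer): drop every
# parenthesis, plus every period that lacks a digit on at least one side.
LOOKAROUND = re.compile(r"[()]|(?<!\d)\.|\.(?!\d)")


def strip_with_backref(text: str) -> str:
    # Replacing with group 1 puts a matched decimal back unchanged. When the
    # [.()] branch matched, group 1 is unset and Python (3.5+) substitutes "".
    return KEEP_DECIMALS.sub(r"\1", text)


def strip_with_lookaround(text: str) -> str:
    # No capture group needed: a period between two digits is never matched.
    return LOOKAROUND.sub("", text)


if __name__ == "__main__":
    sample = ("Hive supports subqueries only in the FROM clause (through Hive 0.12). "
              "The subquery has to be given a name because every table in a FROM "
              "clause must have a name. Columns in the subquery select list must "
              "have unique names.")
    print(strip_with_backref(sample))
    print(strip_with_lookaround(sample))
```

Both functions return the expected output shown above for the sample sentence. They differ on dotted sequences such as version numbers: strip_with_backref("1.2.3") gives "1.23", while strip_with_lookaround("1.2.3") keeps "1.2.3". If you prefer the lookaround form, it should carry over to Hive as regexp_replace(mycol, '[()]|(?<!\\d)\\.|\\.(?!\\d)', ''). This has not been checked against a particular Hive version.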
