Hive: convert a string to an array of characters
Problem description
How can I convert a string to an array of characters? For example:
"abcd" -> ["a","b","c","d"]
I know about the split function:
SELECT split("abcd","");
#["a","b","c","d",""]
Is the trailing empty string a bug? Or are there any other ideas?
Recommended answer
This is not actually a bug. Hive's split function simply calls the underlying Java String#split(String regexp, int limit) method with the limit parameter set to -1, which causes trailing empty strings to be returned.
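I'm not going to dig into the implementation details of why this happens, since there is already a brilliant answer that describes the issue. Note that str.split("", -1) returns different results depending on the version of Java you use: Java 8 changed how a zero-width match at the start of the string is handled, so older versions also return a leading empty string.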
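The effect of the limit argument can be reproduced outside Hive. The following is a minimal Java sketch, not Hive's own code: the class name SplitLimitDemo and the helper splitChars are made up for illustration, and the behaviour noted in the comments assumes Java 8 or later.

```java
import java.util.Arrays;

// Illustrative only: these names are hypothetical and not part of Hive.
class SplitLimitDemo {
    // Splits on the empty regex with an explicit limit, via String#split(String, int).
    static String[] splitChars(String s, int limit) {
        return s.split("", limit);
    }

    public static void main(String[] args) {
        // limit = -1 keeps trailing empty strings, so a fifth, empty element appears.
        System.out.println(Arrays.toString(splitChars("abcd", -1)));
        // limit = 0 discards trailing empty strings, leaving only the four characters.
        System.out.println(Arrays.toString(splitChars("abcd", 0)));
    }
}
```

On Java 8 or later the first call should yield five elements (the last one empty) and the second call four, which matches the output shown in the question.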
Some alternatives:
- Use "(?!\A|\z)" as the separator regexp, e.g. split("abcd", "(?!\\A|\\z)"). This makes the regexp matcher skip zero-width matches at the start and end positions of the string.
- Create a custom UDF that either uses String#toCharArray() or accepts limit as an argument of the UDF, so you can use it as: SPLIT("", 0)
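Both alternatives can be tried in plain Java first. The sketch below is only an illustration under assumptions: the class CharSplitDemo and its two methods are hypothetical names, and the wrapping needed to register the second method as an actual Hive UDF is deliberately left out.

```java
import java.util.Arrays;

// Illustrative only: these names are hypothetical and not part of Hive.
class CharSplitDemo {
    // Alternative 1: the lookahead rejects split points at the start (\A) and end (\z)
    // of the input, so limit = -1 no longer produces an empty element at either edge.
    static String[] splitWithLookahead(String s) {
        return s.split("(?!\\A|\\z)", -1);
    }

    // Alternative 2: the core of what a toCharArray-based UDF might do.
    // Note: this works on UTF-16 chars, so a surrogate pair becomes two elements.
    static String[] toCharStrings(String s) {
        char[] chars = s.toCharArray();
        String[] out = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = String.valueOf(chars[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithLookahead("abcd")));
        System.out.println(Arrays.toString(toCharStrings("abcd")));
    }
}
```

For "abcd" both methods should return the four single-character strings. They differ on an empty input: the regexp version returns one empty string, while the toCharArray version returns an empty array.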