Hive: convert a string to an array of characters


Question


How can I convert a string to an array of characters? For example:

"abcd" -> ["a","b","c","d"]

I know about the split method:

SELECT split("abcd","");

#["a","b","c","d",""]


Is the trailing empty element a bug? Or does anyone have other ideas?

Recommended answer


This is not actually a bug. Hive's split function simply calls the underlying Java String#split(String regex, int limit) method with the limit parameter set to -1, which causes trailing empty strings to be kept in the result.


I'm not going to dig into the implementation details of why this happens, since there is already a brilliant answer that describes the issue. Note that str.split("", -1) returns different results depending on the Java version you use: since Java 8, a zero-width match at the beginning of the string no longer produces a leading empty string.
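To see the limit behavior outside Hive, here is a minimal plain-Java sketch (the class and method names are hypothetical, and Hive itself is not involved). It calls String#split with limit -1, which the answer says is what Hive passes, and with Java's default limit 0 for comparison. The commented results assume Java 8 or later.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: limit = -1, as the answer describes Hive's split doing;
    // trailing empty strings are kept
    static String[] splitLikeHive(String s, String regex) {
        return s.split(regex, -1);
    }

    // Hypothetical helper: limit = 0 (Java's default); trailing empty strings are dropped
    static String[] splitDefault(String s, String regex) {
        return s.split(regex, 0);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLikeHive("abcd", ""))); // Java 8+: [a, b, c, d, ]
        System.out.println(Arrays.toString(splitDefault("abcd", "")));  // [a, b, c, d]
    }
}
```

The empty regex matches at every position, including the very end of the string, so the piece after that last match is an empty string; only a non-zero limit of -1 keeps it.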

Some alternatives:

  1. Use "(?!\A|\z)" as a separator regexp, e.g. split("abcd", "(?!\\A|\\z)"). This will make the regexp matcher skip zero-width matches at the start and at the end positions of the string.
  2. Create a custom UDF that uses either String#toCharArray(), or accepts limit as an argument of the UDF so you can use it as: SPLIT("", 0)
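Both alternatives can be sketched in plain Java, assuming Java 8 or later. The class and method names are hypothetical; `toCharList` stands in for the core logic a custom UDF might use and deliberately omits the Hive UDF wrapper class and its registration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharArrayAlternatives {
    // Alternative 1: zero-width separator that never matches at the start (\A)
    // or at the end (\z) of the input, so no empty leading/trailing pieces appear
    static String[] splitWithLookahead(String s) {
        return s.split("(?!\\A|\\z)", -1);
    }

    // Alternative 2 (hypothetical UDF core): one single-character String per char
    static List<String> toCharList(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithLookahead("abcd"))); // [a, b, c, d]
        System.out.println(toCharList("abcd"));                          // [a, b, c, d]
    }
}
```

Note that String#toCharArray() works on UTF-16 code units, so a character outside the Basic Multilingual Plane (such as many emoji) would come out as two separate surrogate halves.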

