Pig REGEX_EXTRACT_ALL function -> no results


Problem description

I have been struggling with an issue for several hours. I have a .csv file with JSON strings inside: every column in that .csv contains a string with several JSON objects. I loaded several columns with PigStorage, which has worked so far. Then I tried to extract the JSON objects, which have the following form:

[{"tmestmp":"2014-05-14T07:01:00","Value":0,"Quality":1},{"tmestmp":"2014-05-14T07:01:00.02","Value":10,"Quality":4},{"tmestmp":"2014-05-14T07:01:00.04","Value":17,"Quality":9},{"tmestmp":"2014-05-14T07:01:00.06","Value":75,"Quality":6},{"tmestmp":"2014-05-14T07:01:00.08","Value":63,"Quality":9}];

This is one column.

The REGEX_EXTRACT_ALL function does not work with the following lines of code. Does anyone have an idea why? I always receive empty results. Here is my code:

 A = LOAD '/user/hue/test.csv' USING PigStorage(';') AS (timestamp, mv1, mv2, mv3, mv4, mv5); -- using five columns
 B = FOREACH A GENERATE mv1, mv2, mv3, mv4, mv5; -- removing the timestamp in the first column, not needed anymore
 C = FOREACH B GENERATE REGEX_EXTRACT_ALL($0, '(\\{[^{]*\\})') AS (T:tuple(r1,r2,r3,r4,r5));

If I use a single column instead of $0, it does not work either.

Any help or explanation is very welcome.

Cheers, Joe

Recommended answer

There is a JsonLoader() for reading JSON-formatted input. You can use JsonLoader() instead of the regex, and it is very easy to use. Refer to http://joshualande.com/read-write-json-apache-pig/ for more information.
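
A likely reason for the empty results, stated here as an assumption and not verified against a specific Pig version: REGEX_EXTRACT_ALL appears to return groups only when the pattern matches the entire input string, while the pattern in the question describes a single {...} object. The sketch below is Python, not Pig, and uses a shortened copy of the sample column. It only illustrates the difference between a whole-string match and a repeated search, and shows that a JSON parser gets the records without any regex.

```python
import json
import re

# One column value, shortened from the sample in the question.
column = (
    '[{"tmestmp":"2014-05-14T07:01:00","Value":0,"Quality":1},'
    '{"tmestmp":"2014-05-14T07:01:00.02","Value":10,"Quality":4}]'
)

# The pattern from the question, with single backslashes for a Python raw string.
pattern = re.compile(r'(\{[^{]*\})')

# Whole-string match: fails, because the pattern covers one object, not the array.
whole = pattern.fullmatch(column)

# Repeated search: finds each {...} object separately.
objects = pattern.findall(column)

# A JSON parser returns typed records directly.
records = json.loads(column)

print(whole)                 # no whole-string match
print(len(objects))          # number of objects found by repeated search
print(records[1]["Value"])   # field access on the parsed record
```

If the whole-string assumption holds for Pig, a regex would have to describe the complete column value, which is one more reason to prefer a JSON-aware loader or parser.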
