How to match ',' in PIG?

Published: 2020/9/3 20:21:01 · apache-pig


Question

The Pig script below counts how often each character occurs in a file. It works for every character except ','.

My code:

A = load 'a.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = filter B by word matches '(.+)';
D = foreach C generate flatten(TOKENIZE(REPLACE(word,'','|'), '|')) as letter;
E = group D by letter;
F = foreach E generate COUNT(D), group;
store F into 'pigfiles/wordcount';

This counts every character except ',' and produces the following output.

Input (cat a.txt):

HI, I.

Output (contents of the generated file):

1 H
2 I
1 .

It doesn't give the count of ',' in the file, and I don't understand why.

Recommended answer

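The first TOKENIZE call splits on its default delimiters and discards them: space, double quote ("), comma (,), the parentheses ( and ), and asterisk (*). The comma in "HI," is therefore gone before the words are ever split into characters. Instead, skip the word-level TOKENIZE: apply REPLACE to the whole line to mark off every character, tokenize on that marker, and then count. See below: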
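To see exactly where the comma disappears, here is a minimal Python model of the question's script (a simulation, not Pig itself). It assumes, per the Pig documentation, that one-argument TOKENIZE splits on space, double quote, comma, parentheses and asterisk, and that it drops empty tokens the way java.util.StringTokenizer does. The names tokenize, DEFAULT_DELIMS and question_pipeline are hypothetical.

```python
import re
from collections import Counter

# Assumption (from the Pig docs): one-argument TOKENIZE splits on
# space, double quote, comma, parentheses and asterisk.
DEFAULT_DELIMS = ' ",()*'


def tokenize(text, delims=DEFAULT_DELIMS):
    """StringTokenizer-style split: every char in `delims` ends a token
    and is discarded; empty tokens are dropped."""
    tokens, current = [], []
    for ch in text:
        if ch in delims:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def question_pipeline(line):
    """Model of the question's script (steps B through F)."""
    words = tokenize(line)                                  # B: ',' is discarded here
    words = [w for w in words if re.fullmatch(r"(.+)", w)]  # C: word matches '(.+)'
    letters = []
    for w in words:
        # D: REPLACE(word, '', '|') puts '|' around every char, then TOKENIZE on '|'
        letters.extend(tokenize(re.sub("", "|", w), "|"))
    return Counter(letters)                                 # E/F: group and count


line = "HI, I."
print(tokenize(line))                 # ['HI', 'I.']
print(dict(question_pipeline(line)))  # {'H': 1, 'I': 2, '.': 1}
```

The counts match the question's output exactly: the comma is removed in step B, so no later step can count it.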

Input

HI, I.

Pig script

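-- Split each raw line into single characters: REPLACE with an empty regex puts '|'
-- around every character, and TOKENIZE then splits on '|' only, so ',' survives.
A = LOAD 'test3.txt';
B = FOREACH A GENERATE FLATTEN(TOKENIZE(REPLACE((chararray)$0,'','|'), '|')) AS letter;
-- Spaces are now tokens too; drop them so they are not counted.
C = FILTER B BY letter != ' ';
D = GROUP C BY letter;
E = FOREACH D GENERATE COUNT(C.letter), group;
DUMP E;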
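A minimal Python model of the answer's script (again a simulation, not Pig), assuming Pig's REPLACE behaves like Java's String.replaceAll and TOKENIZE with an explicit delimiter drops empty tokens like java.util.StringTokenizer. The names split_chars and answer_pipeline are hypothetical.

```python
import re
from collections import Counter


def split_chars(text):
    """Model of TOKENIZE(REPLACE(text, '', '|'), '|')."""
    # An empty pattern matches before and after every character,
    # so "HI" becomes "|H|I|", as Java's replaceAll("", "|") does.
    marked = re.sub("", "|", text)
    # Splitting on '|' alone keeps ',' and ' '; empty tokens are dropped,
    # as StringTokenizer does.
    return [tok for tok in marked.split("|") if tok]


def answer_pipeline(line):
    letters = split_chars(line)                 # B
    letters = [c for c in letters if c != " "]  # C: FILTER ... letter != ' '
    return Counter(letters)                     # D/E: GROUP and COUNT


print(dict(answer_pipeline("HI, I.")))  # {'H': 1, 'I': 2, ',': 1, '.': 1}
```

One caveat applies equally to the Pig script: a literal '|' in the input is consumed as a separator and never counted, so pick a marker character that cannot occur in the data.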

Output
