How to match ',' in Pig?
Question
The Pig script below counts how often each character occurs in a file. It works for every character except ','.
My code:
A = load 'a.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = filter B by word matches '(.+)';
D = foreach C generate flatten(TOKENIZE(REPLACE(word,'','|'), '|')) as letter;
E = group D by letter;
F = foreach E generate COUNT(D), group;
store F into 'pigfiles/wordcount';
This matches all characters except ',' and produces output.
Input (cat a.txt):
HI, I.
Output (contents of the generated file):
1 H
2 I
1 .
It doesn't give the count of ',' in the file, and I don't understand why.
Recommended answer
The first TOKENIZE call consumes its default token separators: space, double quote ("), comma (,), parentheses (()), and star (*). The comma is therefore discarded before the characters are ever counted. Instead, use REPLACE to put a separator between every character, tokenize on that separator, and then count, as in the script below.
Input
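To see the delimiter behaviour in isolation, here is a minimal sketch, assuming the same a.txt containing the single line HI, I. and a Pig version whose TOKENIZE accepts an optional delimiter argument (the scripts in this thread already rely on that form). The relation and field names are illustrative only.

```pig
-- Load each line as a single chararray field (the field name is an assumption).
A = LOAD 'a.txt' AS (line:chararray);

-- Default TOKENIZE: space, double quote, comma, parentheses and star all act
-- as separators, so for 'HI, I.' the tokens should be HI and I. (no comma).
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line)) AS word;
DUMP B;

-- Explicit delimiter: only the space separates tokens, so the comma stays
-- attached to its word and the tokens should be HI, and I.
C = FOREACH A GENERATE FLATTEN(TOKENIZE(line, ' ')) AS word;
DUMP C;
```

Comparing the two dumps shows that the comma is lost at the tokenizing step, not at the later filter or grouping steps.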
HI, I.
PigScript
A = LOAD 'test3.txt';
B = FOREACH A GENERATE FLATTEN(TOKENIZE(REPLACE((chararray)$0,'','|'), '|')) AS letter;
C = FILTER B BY letter != ' ';
D = GROUP C BY letter;
E = FOREACH D GENERATE COUNT(C.letter), group;
DUMP E;
Output