Looking up variable keys in pig map

Problem description

I'm trying to use pig to break text into lowercased words, and then look up each word in a map. Here's my example map, which I have in map.txt (it is only 1 line long):

[this#1.9,is#2.5,my#3.3,vocabulary#4.1]

I load it like this:

M = LOAD 'map.txt' USING PigStorage() AS (mp: map[float]);

which works just fine. Then I do the following to load the text and break it into lowercased words:

LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);

Now, I'd like to do something like this:

RESULTS = FOREACH TOKENS GENERATE M.mp#word;

so that if I have a line like "this my my vocabulary", I'd get the following output: 1 3 3 4, but I keep getting various errors. How can I look up variable values in a map?

I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema, but these only help if I'm looking up a fixed value in a map, for example M.mp#'this', which is not what I want to do here.

Recommended answer

You can also FLATTEN M and then JOIN M and LINES based on the token/word. You can do a 'replicated' join on M, so that it is copied to each mapper.
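
A rough sketch of what that could look like, under some assumptions: it relies on a Pig version whose FLATTEN operator accepts a map and emits one (key, value) row per entry (older releases may only flatten tuples and bags, in which case the map would have to be turned into rows some other way). The aliases KV, JOINED and the field name score are made up for illustration, and the join is done against TOKENS, the tokenized form of LINES.

```pig
-- Load the single-line map and the text, as in the question.
M = LOAD 'map.txt' USING PigStorage() AS (mp: map[float]);
LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- Assumption: FLATTEN on a map yields one (key, value) row per entry.
KV = FOREACH M GENERATE FLATTEN(mp) AS (word:chararray, score:float);

-- Replicated join: the small relation goes last so it is held in memory on each mapper.
JOINED = JOIN TOKENS BY word, KV BY word USING 'replicated';

-- Keep only the looked-up value for each token.
RESULTS = FOREACH JOINED GENERATE KV::score;
DUMP RESULTS;
```

Because this is an inner join, words that are missing from the map are dropped, and the output rows are not guaranteed to come back in the original word order. If unmatched words should be kept with a null value, a LEFT OUTER replicated join on TOKENS is one option.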
