Looking up variable keys in a Pig map


Question

I'm trying to use Pig to break text into lowercased words and then look up each word in a map. Here's my example map, which I have in map.txt (it is only one line long):

[this#1.9,is#2.5,my#3.3,vocabulary#4.1]

I load it like this:

M = LOAD 'map.txt' USING PigStorage() AS (mp: map[float]);

This works fine. Then I do the following to load the text and break it into lowercased words:

LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);

Now I'd like to do something like this:

RESULTS = FOREACH TOKENS GENERATE M.mp#word;

so that for a line like "this my my vocabulary" I'd get the following output: 1.9 3.3 3.3 4.1. Instead, I keep getting various errors. How can I look up variable keys in a map?

I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema , but these only help when looking up a fixed key in a map, for example M.mp#'this', which is not what I want to do here.

Recommended answer

You can also FLATTEN M into one (key, value) row per map entry and then JOIN it with the words tokenized from LINES, using the word as the join key. You can make this a 'replicated' join on M, so that M is copied to each mapper.
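
A minimal sketch of this join-based lookup in Pig Latin follows. It assumes the map has been reshaped into a hypothetical two-column file, map_pairs.txt, holding one tab-separated word and score per line; this sidesteps the question of whether FLATTEN accepts a map directly, which depends on the Pig version. The relation names PAIRS and JOINED and the field name score are illustrative and do not come from the question.

```pig
-- Assumption: map_pairs.txt (hypothetical) holds the map as "word<TAB>score" lines,
-- e.g. "this<TAB>1.9", "is<TAB>2.5", and so on.
PAIRS = LOAD 'map_pairs.txt' USING PigStorage('\t') AS (word:chararray, score:float);

-- Same tokenization as in the question.
LINES  = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- In a replicated join the small relation goes last; it is loaded into
-- memory and shipped to every map task.
JOINED = JOIN TOKENS BY word, PAIRS BY word USING 'replicated';

-- Keep the word and the score that was looked up for it.
RESULTS = FOREACH JOINED GENERATE TOKENS::word AS word, PAIRS::score AS score;

DUMP RESULTS;
```

This is an inner join, so words with no entry in the map are dropped. To keep them with a null score instead, the join can be written as JOIN TOKENS BY word LEFT OUTER, PAIRS BY word USING 'replicated'. The order of the output rows is not guaranteed to follow the order of the words in the input line.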
