Looking up variable keys in a Pig map
Question
I'm trying to use pig to break text into lowercased words, and then look up each word in a map. Here's my example map, which I have in map.txt (it is only 1 line long):
[this#1.9,is#2.5,my#3.3,vocabulary#4.1]
I load it like this:
M = LOAD 'map.txt' USING PigStorage AS (mp: map[float]);
which works just fine. Then I do the following to load the text and break it into lowercased words:
LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);
Now, I'd like to do something like this:
RESULTS = FOREACH TOKENS GENERATE M.mp#word;
so that if I have a line like "this my my vocabulary", I'd get the following output: 1.9 3.3 3.3 4.1, but I keep getting various errors. How can I look up variable values in a map?
I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema , but these only help if I'm looking up a fixed key in a map, for example M.mp#'this', which is not what I want to do here.
Recommended answer
You can also FLATTEN M and then JOIN it with the tokenized words (TOKENS) on the token/word. You can make this a 'replicated' join on M, so that M is copied to each mapper.
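The FLATTEN-then-JOIN idea could be sketched roughly as follows. This is only a sketch under some assumptions: it uses the file names from the question's prose (map.txt and test.txt), it assumes a Pig release in which FLATTEN accepts a map and yields one (key, value) row per entry (older releases would need a UDF to turn the map into a bag first), and the relation names PAIRS and JOINED and the field name score are made up for illustration.

```pig
-- Load the one-line map and turn it into (word, score) rows.
-- Assumption: this Pig version supports FLATTEN on a map field.
M = LOAD 'map.txt' USING PigStorage() AS (mp: map[float]);
PAIRS = FOREACH M GENERATE FLATTEN(mp) AS (word:chararray, score:float);

-- Tokenize the text into lowercased words, as in the question.
LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- In a replicated join the small relation is listed last;
-- it is held in memory on each mapper.
JOINED = JOIN TOKENS BY word, PAIRS BY word USING 'replicated';
RESULTS = FOREACH JOINED GENERATE PAIRS::score;
DUMP RESULTS;
```

Two things to keep in mind with this approach: an inner join drops words that have no entry in the map, and the join does not guarantee that scores come back in the original word order.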