Selecting fields after grouping in Pig

Question

There's probably something very trivial that I'm missing, but I just can't get this to work. I have a "movies" object, with title, actor, year and role. Now what I want, is to have results with the title, along with a nested bag containing actor/role pairs.

If I just do group movies by title, I end up with results like (title, {movie objects}) which would be perfect, except that the title and year also appear in the movie objects there. I want just the actor and role.

I also tried foreach movie_groups generate group, movies.actor, movies.role but then I end up with (title, {all actors}, {all roles}) which is obviously wrong.

In SQL this would be so trivial that I can't help but feel incredibly stupid for not being able to figure this out. Would anyone have a suggestion?

Recommended answer

It would be helpful to see the format of movies, but I'm assuming it is something like this:

MovieTitle1 Year1 Actor1 Role1
MovieTitle1 Year2 Actor2 Role2
etc.

In that case, I would do it like this:

-- One row per title, with a nested bag of (actor, role) tuples
result = FOREACH (GROUP movies BY title)
         GENERATE FLATTEN(group), movies.(actor, role) AS actors;

Also, you mention that the movies contain the year as well. If you do not need that field it might be worthwhile to project only the fields that you need (title, actor, role) first.
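
A minimal sketch of that "project first" variant, under some assumptions: the input path 'movies.tsv', the tab delimiter, the field types, and the alias names slim and grouped are all hypothetical and not taken from the question. The year field is dropped before the GROUP, so the nested bag holds only actor and role.

```pig
-- Assumed input: a tab-separated file with title, year, actor, role per line.
-- The path, delimiter, and types below are guesses for illustration only.
movies = LOAD 'movies.tsv' USING PigStorage('\t')
         AS (title:chararray, year:chararray, actor:chararray, role:chararray);

-- Keep only the fields needed downstream, so year never reaches the GROUP.
slim = FOREACH movies GENERATE title, actor, role;

-- Each output row is (group, {bag of slim tuples sharing that title}).
grouped = GROUP slim BY title;

-- Project actor and role together so the pairs stay aligned in one bag,
-- rather than producing one bag of actors and a separate bag of roles.
result = FOREACH grouped
         GENERATE group AS title, slim.(actor, role) AS actors;

DUMP result;
```

With the sample layout above, each row should have the shape (title, {(actor, role), ...}). The order of tuples inside the bag is not guaranteed.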
