How to load Twitter data from HDFS using Pig?


Question

I have just streamed some Twitter data with Flume and collected it into HDFS, and now I am trying to load it into Pig for analysis. The default JsonLoader function cannot load the data, so I searched Google for a library that can load this kind of data. I found this link and followed its instructions.

Here is the result:

REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';

2016-02-22 20:54:46,539 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

The other two REGISTER commands produced the same output.

Now, when I try to load my data using this command:

load_tweets = LOAD '/TwitterData/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

it shows me this error:

2016-02-22 20:58:01,639 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.twitter.elephantbird.pig.load.JsonLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/hduser/pig-0.15.0/pig_1456153061619.log

So how can I solve this and load the data properly?

Note: my data is Twitter data about the recently released movie Deadpool.

Recommended answer

You need to register the jar below in Pig. It contains the class you are trying to access (com.twitter.elephantbird.pig.load.JsonLoader), which is why Pig reports ERROR 1070 when the jar is missing.

elephant-bird-pig-4.1.jar


Edited: the correct steps.

REGISTER '/home/hdfs/json-simple-1.1.jar';

REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';

REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

load_tweets = LOAD '/user/hdfs/twittes.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;

dump load_tweets;

I used the steps above on my local cluster and they work fine, so you need to register these jars before running your LOAD statement.
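Once the load succeeds, each tweet arrives as a single map, so the next step is usually to pull individual fields out of it. The following is only a minimal sketch of that step, not part of the original answer. It reuses the jar paths and input path from the steps above, declares the field explicitly as map[] (a choice made here for clarity), and assumes the tweets carry the usual top-level 'id' and 'text' keys. The aliases tweet_fields and sample_tweets are hypothetical names.

```pig
-- Register the same jars as in the steps above (paths are taken from the answer).
REGISTER '/home/hdfs/json-simple-1.1.jar';
REGISTER '/home/hdfs/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/hdfs/elephant-bird-pig-4.1.jar';

-- Load each JSON record as one map; the map[] type is declared explicitly here.
load_tweets = LOAD '/user/hdfs/twittes.txt'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (myMap:map[]);

-- Project two fields out of the map with the # dereference operator.
-- 'id' and 'text' are assumed key names; adjust them to match your data.
tweet_fields = FOREACH load_tweets GENERATE
    myMap#'id'   AS id,
    myMap#'text' AS text;

-- Inspect only a few rows rather than dumping the whole data set.
sample_tweets = LIMIT tweet_fields 5;
DUMP sample_tweets;
```

If a key is absent from a record, Pig returns null for that field rather than failing, so an all-null column typically means the key name does not match the data.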
