Not able to access wasb FileSystem while running word count map reduce job


Problem description

java -jar WordCount201.jar wasb://hexhadoopcluster-2019-05-15t07-01-07-193z@hexanikahdinsight.blob.core.windows.net/hexa/custdata.csv wasb://hexhadoopcluster-2019-05-15t07-01-07-193z@hexanikahdinsight.blob.core.windows.net/hexa

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.io.IOException: No FileSystem for scheme: wasb
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:446)
at HexanikaWordCount.HexanikaWordCount.WordCount.main(WordCount.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.ja

Recommended answer

Hi JPurva,

Welcome to Azure!

I assume you are new to Azure HDInsight.

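You need to use the "yarn jar" command to run MapReduce jobs. Launching the JAR with plain "java -jar" skips the Hadoop launcher, so the cluster configuration and the libraries that provide the wasb file system are most likely missing from the classpath, which would explain the "No FileSystem for scheme: wasb" error.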
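To make the error message easier to read, here is a deliberately simplified model of how a scheme such as wasb gets mapped to a file system implementation. It is not Hadoop's actual code: the class SchemeLookupSketch, the resolve method, and the two example registries are hypothetical, and the registry contents are assumptions about what each launch method might make visible. Only the class names org.apache.hadoop.fs.LocalFileSystem and org.apache.hadoop.fs.azure.NativeAzureFileSystem refer to real Hadoop classes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of scheme lookup. Not Hadoop's real implementation. */
public class SchemeLookupSketch {

    /** Returns the class name registered for the scheme, or fails with the message seen in the log. */
    public static String resolve(Map<String, String> registered, String scheme) throws IOException {
        String impl = registered.get(scheme);
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        // Assumption: a bare "java -jar" launch only sees what the JAR itself bundles.
        Map<String, String> bareJvm = new HashMap<>();
        bareJvm.put("file", "org.apache.hadoop.fs.LocalFileSystem");

        // Assumption: the cluster launcher adds the Azure storage support to the classpath.
        Map<String, String> viaYarn = new HashMap<>(bareJvm);
        viaYarn.put("wasb", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");

        try {
            System.out.println("yarn jar  -> " + resolve(viaYarn, "wasb"));
            System.out.println("java -jar -> " + resolve(bareJvm, "wasb"));
        } catch (IOException e) {
            // The second lookup fails because nothing is registered for "wasb".
            System.out.println("java -jar -> " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the same program can succeed or fail depending on what the launcher puts on the classpath, which is why switching to "yarn jar" is the first thing to try.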

Syntax to submit a MapReduce job:

yarn jar <fully qualified path to the jar> <class name> <input path> <output path>

Example: yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /example/data/gutenberg/davinci.txt /example/data/WordCountOutput

Note: The input file and any output files are stored in the default storage for the cluster.

As noted in the help for the wordcount sample, you could also specify multiple input files.

Example: yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /example/data/gutenberg/davinci.txt /example/data/gutenberg/ulysses.txt /example/data/twowordcount

This would count words in both davinci.txt and ulysses.txt.
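In the command above, every path except the last one is an input and the last path is the output directory. A minimal sketch of that convention, in case you want your own driver to accept several inputs the same way, could look like this. The class WordCountArgsSketch and its methods are hypothetical names for illustration; in a real driver you would pass the results to FileInputFormat and FileOutputFormat instead of printing them.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits driver arguments into input paths and one output path. */
public class WordCountArgsSketch {

    /** All arguments except the last are treated as input paths. */
    public static List<String> inputPaths(String[] args) {
        requireAtLeastTwo(args);
        return Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
    }

    /** The last argument is treated as the output path. */
    public static String outputPath(String[] args) {
        requireAtLeastTwo(args);
        return args[args.length - 1];
    }

    private static void requireAtLeastTwo(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: <in> [<in>...] <out>");
        }
    }

    public static void main(String[] unused) {
        String[] args = {
            "/example/data/gutenberg/davinci.txt",
            "/example/data/gutenberg/ulysses.txt",
            "/example/data/twowordcount"
        };
        System.out.println("inputs: " + inputPaths(args));
        System.out.println("output: " + outputPath(args));
    }
}
```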

By default, Hadoop uses a text input reader that feeds the mapper line by line from the input file. The key in the mapper is the byte offset of the line within the file (not a line number), and the value is the line itself. Be careful with CSV files though, as a single column/field can contain a line break. You might want to look for a CSV input reader like this one:

https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java
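To see why a line-oriented reader is risky for CSV, the following sketch imitates line-by-line reading in plain Java, with no Hadoop involved. It keys each line by its starting byte offset, which is my understanding of what the default text input format does, and feeds it a small made-up CSV whose second row has a quoted field containing a line break. The class LineOffsetSketch, the method linesByOffset, and the sample data are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a line-oriented reader; it does not use Hadoop. */
public class LineOffsetSketch {

    /** Splits text on '\n' and maps each line's starting byte offset to the line's content. */
    public static Map<Long, String> linesByOffset(String text) {
        Map<Long, String> result = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            result.put(offset, line);
            // Advance by the line's byte length plus one byte for the '\n' separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample: a header plus two rows; the first row has a quoted field with a line break.
        String csv = "id,comment\n1,\"first line\nsecond line\"\n2,ok\n";

        // A line-oriented reader produces four records here, although the CSV has three rows.
        for (Map.Entry<Long, String> record : linesByOffset(csv).entrySet()) {
            System.out.println("key=" + record.getKey() + " value=" + record.getValue());
        }
    }
}
```

A mapper that receives the two halves of the quoted field as separate records cannot easily stitch them back together, which is the case a CSV-aware input format is meant to handle.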

For more details, refer to "How to run MapReduce jobs on HDInsight clusters".

Hope this helps.

