Exceptions when reading tutorial CSV file in the Cloudera VM

Views: 345 Posted: 2017/2/26 15:51:14 Tags: python csv hadoop pyspark cloudera-quickstart-vm


Problem description

I'm trying to work through a Spark tutorial that ships with the Cloudera virtual machine. Even though I'm using the correct line-ending setting, I cannot run the scripts because I get a long list of errors. The tutorial is part of the Coursera Introduction to Big Data Analytics course. The assignment can be found here.

Here is what I did. Install the IPython shell (if not already installed):

sudo easy_install ipython==1.2.1

Open/start the shell (with either 1.2.0 or 1.4.0):

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.2.0

Set the record delimiter to Windows-style line endings. The file uses Windows line endings, and the course says to do this. Without this setting, you get other errors.

sc._jsc.hadoopConfiguration().set('textinputformat.record.delimiter','\r\n')
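
Whether the file really uses Windows line endings can be checked before setting the delimiter. The following is a minimal sketch in plain Python (no Spark involved). The helper names `count_line_endings` and `dominant_line_ending` are made up for illustration and are not part of the course material.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR terminators in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one LF and one CR, so subtract the pairs.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def dominant_line_ending(data):
    """Return 'crlf', 'lf' or 'cr', or None if no terminator was found."""
    counts = count_line_endings(data)
    if not any(counts.values()):
        return None
    return max(counts, key=counts.get)
```

For example, reading the first 64 KB of the local CSV file in binary mode (`open(path, 'rb').read(65536)`, where `path` is the local path behind the `file://` URL) and passing the result to `dominant_line_ending` should give `'crlf'` if the file is Windows-encoded as the course states. A sample that happens to end between `\r` and `\n` miscounts one terminator, which does not matter for this rough check.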

Then try to load the CSV file:

yelp_df = sqlCtx.load(source='com.databricks.spark.csv',header = 'true',inferSchema = 'true',path = 'file:///usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv')

This produces a very long list of errors, which starts like this:

Py4JJavaError: An error occurred while calling o23.load.
: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)

The full error message can be seen here. And this is /etc/hive/conf/hive-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
  <!-- that are implied by Hadoop setup variables.                                                -->
  <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
  <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
  <!-- resource).                                                                                 -->

  <!-- Hive Execution Parameters -->

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>cloudera</value>
  </property>

  <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>

  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
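
The trace fails while instantiating the metastore client, and the file above points `hive.metastore.uris` at `thrift://127.0.0.1:9083`. One thing that could be checked is whether anything is listening on that address at all. This is only a possible cause, not a confirmed diagnosis. The sketch below uses just the Python standard library; `parse_thrift_uri` and `is_port_open` are hypothetical helper names.

```python
import socket


def parse_thrift_uri(uri):
    """Split a URI such as 'thrift://host:port' into (host, port)."""
    address = uri.split("://", 1)[-1]
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True
```

Calling `is_port_open(*parse_thrift_uri('thrift://127.0.0.1:9083'))` returns `False` when nothing accepts connections on the configured port. A `True` result only shows that some process is listening there, not that the metastore itself is healthy.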

Any help or ideas on how to solve this? I guess it's a fairly common error, but I couldn't find a solution yet.

One more thing: is there a way to dump such long error messages into a separate log file?
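
As a rough sketch of what is meant by dumping errors into a log file: the call could be wrapped so that the full traceback goes to a file through the standard `logging` module and only a one-line summary appears in the shell. The names `make_file_logger` and `run_and_log` and the log path are assumptions for illustration; how much of the Java-side trace ends up in the file depends on what the exception's message contains.

```python
import logging


def make_file_logger(log_path, name="spark_errors"):
    """Create a logger that writes ERROR records to log_path only."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid adding a second handler on re-run
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # keep the long trace out of the console
    return logger


def run_and_log(func, logger):
    """Call func(); on failure, log the full traceback and return None."""
    try:
        return func()
    except Exception as exc:
        logger.exception("call failed")
        lines = str(exc).splitlines()
        summary = lines[0] if lines else type(exc).__name__
        print("Error: %s (full traceback written to log file)" % summary)
        return None
```

In the shell this might be used as `logger = make_file_logger('/tmp/spark_errors.log')` followed by `yelp_df = run_and_log(lambda: sqlCtx.load(...), logger)`, with the original `load` arguments in place of the dots.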
