DataFrame object is not showing any data

Question

I was trying to create a DataFrame on an HDFS file using the spark-csv library, as shown in this tutorial.

But when I tried to get the count of the DataFrame, it showed 0.

Here is my file, employee.csv:

empid,empname
1000,Tom
2000,Jerry

I loaded the above file with:

val empDf = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("delimiter",",").load("hdfs:///user/.../employee.csv");

When I called empDf.printSchema(), it gave the proper schema, with empid and empname as string fields, and I could see that the delimiter was read properly.

But when I tried to display the DataFrame with empDf.show, it gave only the column headers with no data, and empDf.count returned 0 records.

Please correct me if I missed something that is required here.

Recommended answer

Make sure that the Scala version your spark-csv artifact was built for is the same as the Scala version your Spark distribution was built with.

For example, if your Spark distro is built with Scala 2.10 (the default Scala version for Databricks prebuilt Spark distros), you will need spark-csv_2.10. The spark-csv_2.11 artifact (shown in the mentioned tutorial) will not work and will return an empty DataFrame with only column names; see my answer to this SO question for a similar case.
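
As a rough way to check which artifact suffix applies, the following sketch can be pasted into spark-shell. It reads the Scala version of the running shell and derives the matching spark-csv artifact name. The helper names scalaBinaryVersion and sparkCsvArtifact are hypothetical, introduced here only for illustration; they are not part of Spark or spark-csv.

```scala
// Minimal sketch (assumption: run inside spark-shell, so the reported Scala
// version is the one the Spark distribution was built with).

// Hypothetical helper: reduce a full Scala version such as "2.10.5"
// to its binary version "2.10".
def scalaBinaryVersion(full: String): String =
  full.split("\\.").take(2).mkString(".")

// Hypothetical helper: build the spark-csv artifact name for that version.
def sparkCsvArtifact(full: String): String =
  s"spark-csv_${scalaBinaryVersion(full)}"

// Standard library call that returns the running Scala version number.
val running = scala.util.Properties.versionNumberString
println(s"Scala $running -> use ${sparkCsvArtifact(running)}")
```

The resulting name is the artifact to request when starting the shell, for instance via spark-shell --packages com.databricks:<artifact>:<version>, where <version> stands for whichever spark-csv release you are using.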
