DataFrame object is not showing any data
Problem description
I was trying to create a DataFrame object from an HDFS file using the spark-csv library, as shown in this tutorial.
But when I tried to get the count of the DataFrame object, it showed 0.
This is my file, employee.csv:
empid,empname
1000,Tom
2000,Jerry
I loaded the above file with:
val empDf = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("delimiter",",").load("hdfs:///user/.../employee.csv");
When I called empDf.printSchema(), it gave the proper schema with empid and empname as string fields, so I could see that the delimiter was read properly.
But when I tried to display the DataFrame with empDf.show, it gave only the column headers with no data, and empDf.count returned 0 records.
Please correct me if I have missed a required step here.
Recommended answer
Make sure that the Scala version of your spark-csv artifact matches the Scala version with which your Spark distribution was built.
For example, if your Spark distro is built with Scala 2.10 (the default Scala version for Databricks prebuilt Spark distros), you will need spark-csv_2.10. The spark-csv_2.11 artifact (shown in the mentioned tutorial) will not work and will return an empty DataFrame with only column names. See my answer to this SO question for a similar case.
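To check which artifact suffix you need, you can ask the running shell for its Scala version. The snippet below is a minimal sketch that can be pasted into spark-shell. It assumes that the Scala library on the shell's classpath is the one Spark was built with. The release number 1.5.0 and the names scalaFull, scalaBinary, sparkCsvRelease and coordinate are illustrative assumptions, not taken from the question.

```scala
// Full Scala version of the running shell, e.g. "2.10.5".
val scalaFull = scala.util.Properties.versionNumberString

// Keep only major.minor, which is the suffix used in artifact names, e.g. "2.10".
val scalaBinary = scalaFull.split('.').take(2).mkString(".")

// Assumed spark-csv release; substitute the release you actually use.
val sparkCsvRelease = "1.5.0"

// Package coordinate whose Scala suffix matches this shell.
val coordinate = s"com.databricks:spark-csv_$scalaBinary:$sparkCsvRelease"

println(s"Scala version of this shell: $scalaFull")
println(s"Matching spark-csv package: $coordinate")
```

You can then restart the shell with the printed coordinate, for example spark-shell --packages followed by that coordinate, and reload the file as before.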