Cannot resolve column name among field names
Problem description
I created a DataFrame as follows:
val bankDF = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").option("delimiter",";").load("/user/pvviswanathan_yahoo_com/Bank_Dataset.csv");
bankDF: org.apache.spark.sql.DataFrame = ["age";"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y": string]
After that, when I tried the following, it threw an error: Cannot resolve column name "age" among field names.
bankDF.groupBy("age").count().show;
org.apache.spark.sql.AnalysisException: Cannot resolve column name "age" among ("age";"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y");
Recommended answer
I had the same problem when I tried to work with CSV files.
Dataset<Row> students = spark.read().format("csv")
.option("sep", ";")
.option("inferSchema", "true")
.option("header", "true")
.load("data/students.csv");
Following Raphael Roth's advice, I printed the students data and its schema, and found that Spark was indeed treating all the columns as a single value:
+----------------------+
|studentId, name, lname|
+----------------------+
| 1, Mickey, Mouse|
| 2, Donald, Duck|
+----------------------+
root
|-- studentId, name, lname: string (nullable = true)
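The error I got was:
Cannot resolve column name "studentId" among (studentId, name, lname);
Note that the text in parentheses is not a list of three columns. It is the name of one column that happens to contain commas, as the schema above shows.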
So the problem was indeed the separator character. I changed
.option("sep", ";")
to
.option("sep", ",")
(since the separator used in the CSV file is actually a comma).
Now the schema is correct:
root
|-- studentId: integer (nullable = true)
|-- name: string (nullable = true)
|-- lname: string (nullable = true)
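One way to catch this kind of mismatch before calling load is to look at the header line of the file and see which candidate separator occurs most often. The sketch below is a hypothetical helper in plain Java, not part of Spark. The class name SeparatorCheck, the method guessSeparator, and the list of candidate characters are all assumptions made for illustration.

```java
// Hypothetical helper (not a Spark API): guesses the separator of a CSV
// header line by counting how often each candidate character occurs.
class SeparatorCheck {
    // Candidate separators to try. This list is an assumption; extend as needed.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    // Returns the candidate that occurs most often in the header line.
    // If none occurs at all, the first candidate (',') is returned.
    static char guessSeparator(String headerLine) {
        char best = CANDIDATES[0];
        long bestCount = -1;
        for (char c : CANDIDATES) {
            long count = headerLine.chars().filter(ch -> ch == c).count();
            if (count > bestCount) {
                bestCount = count;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Header line taken from the students example above.
        System.out.println(guessSeparator("studentId, name, lname"));
    }
}
```

The result could then be passed to .option("sep", ...) as a string. This is only a heuristic: a header whose column names themselves contain separator characters can mislead it, so printing the schema after loading remains the reliable check.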