Cannot resolve column name among field names

Posted: 2021/4/8 20:23:01 | Tags: apache-spark, dataframe

Question

I created a DataFrame as follows:

val bankDF = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").option("delimiter",";").load("/user/pvviswanathan_yahoo_com/Bank_Dataset.csv");

bankDF: org.apache.spark.sql.DataFrame = ["age";"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y": string]

After that, when I tried the following, it threw the error Cannot resolve column name "age" among field names:

bankDF.groupBy("age").count().show;

org.apache.spark.sql.AnalysisException: Cannot resolve column name "age" among ("age";"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y");

Recommended answer

I had the same problem when I tried to work with CSV files.

      Dataset<Row> students = spark.read().format("csv")
            .option("sep", ";")
            .option("inferSchema", "true")
            .option("header", "true")
            .load("data/students.csv");

Following Raphael Roth's advice, I printed the students DataFrame and its schema, and discovered that Spark was indeed treating all the columns as a single column:

 +----------------------+
 |studentId, name, lname|
 +----------------------+
 |      1, Mickey, Mouse|
 |       2, Donald, Duck|
 +----------------------+

root
  |-- studentId, name, lname: string (nullable = true)

The error I got was:

Cannot resolve column name "studentId" among (studentId, name, lname);
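The message on its own is ambiguous: (studentId, name, lname) reads like three columns, but the schema above shows it is one column whose name happens to contain the commas. The question's bankDF has the same shape, a single string column named after the whole header line. A minimal sketch of a check that makes this explicit, working on the field names a DataFrame reports (for example the String[] returned by columns()). The class ColumnCheck, its diagnose method and the list of candidate separators are hypothetical, not part of Spark:

```java
import java.util.Arrays;
import java.util.List;

class ColumnCheck {
    // Separators checked when a lookup fails (an assumption; extend as needed).
    static final List<String> CANDIDATES = Arrays.asList(";", ",", "\t", "|");

    // Hypothetical helper: fieldNames would be what DataFrame.columns() returns.
    static String diagnose(String[] fieldNames, String wanted) {
        if (Arrays.asList(fieldNames).contains(wanted)) {
            return "ok";
        }
        // A single field whose name contains a separator means the header was never split.
        if (fieldNames.length == 1) {
            for (String sep : CANDIDATES) {
                if (fieldNames[0].contains(sep)) {
                    return "single column containing '" + sep
                            + "': header was not split, check the delimiter option";
                }
            }
        }
        return "column '" + wanted + "' not found";
    }

    public static void main(String[] args) {
        // Field names as reported in the two cases on this page (question's header shortened).
        String[] question = {"\"age\";\"job\";\"marital\""};
        String[] answer = {"studentId, name, lname"};
        System.out.println(diagnose(question, "age"));
        System.out.println(diagnose(answer, "studentId"));
    }
}
```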

So the problem was indeed the separator character. I changed

 .option("sep", ";")

to

 .option("sep", ",")

(the file's separator really is a comma)
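Since the fix came down to making the separator option match the file, it can help to look at the header line before handing the file to Spark. A minimal sketch in Java, assuming the header is the first line of the file; the class SeparatorGuess and its candidate list are hypothetical, not part of Spark:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SeparatorGuess {
    // Candidate separators (an assumption; adjust for your data).
    static final List<String> CANDIDATES = Arrays.asList(",", ";", "\t", "|");

    // Hypothetical helper: returns the candidate that splits the header line
    // into the most fields, or null if no candidate splits it at all.
    // Naive on purpose: it ignores quoting and looks at the header only.
    static String guess(String headerLine) {
        String best = null;
        int bestCount = 1;
        for (String sep : CANDIDATES) {
            // split() takes a regex, so quote the separator ("|" is special).
            int count = headerLine.split(Pattern.quote(sep), -1).length;
            if (count > bestCount) {
                best = sep;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // In practice, read the first line of the file, e.g. with
        // java.nio.file.Files.lines(path).findFirst().
        System.out.println(guess("studentId, name, lname")); // prints ,
        System.out.println(guess("\"age\";\"job\";\"marital\"")); // prints ;
    }
}
```

Whatever it returns would then go into .option("sep", ...). For files whose quoted fields contain the separator, calling printSchema() after loading remains the more reliable check.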

Now the schema is correct:

root
  |-- studentId: integer (nullable = true)
  |--  name: string (nullable = true)
  |--  lname: string (nullable = true)
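One detail in this schema: name and lname still carry a leading space (note the extra space after |--), because the header line is "studentId, name, lname" and only the comma is consumed as the separator. A call such as groupBy("name") would therefore fail with the same "Cannot resolve column name" error. A minimal sketch of one fix, assuming the Java API from the answer: trim the names and rename the columns with Dataset.toDF(String...). The helper trimNames is hypothetical. Spark's CSV reader also has ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace options that may cover this; check whether they apply to the header in your Spark version.

```java
import java.util.Arrays;

class TrimColumns {
    // Hypothetical helper: strips surrounding whitespace from each column name.
    // With Spark's Java API the result could be applied as
    //   students = students.toDF(TrimColumns.trimNames(students.columns()));
    static String[] trimNames(String[] columns) {
        return Arrays.stream(columns)
                .map(String::trim)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Column names as they appear in the schema above.
        String[] parsed = {"studentId", " name", " lname"};
        System.out.println(Arrays.toString(trimNames(parsed))); // prints [studentId, name, lname]
    }
}
```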
