Can't resolve ... given input columns
Question
I'm working through the O'Reilly book Spark: The Definitive Guide, and I'm running into an error when I try to do a simple DataFrame operation.
The data looks like this:
DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
United States,Romania,15
United States,Croatia,1
...
I then read it (in PySpark) with:
flightData2015 = spark.read.option("inferSchema", "true").option("header","true").csv("./data/flight-data/csv/2015-summary.csv")
Then I try to run the following command:
flightData2015.select(max("count")).take(1)
I get the following error:
pyspark.sql.utils.AnalysisException: "cannot resolve '`u`' given input columns: [DEST_COUNTRY_NAME, ORIGIN_COUNTRY_NAME, count];;
'Project ['u]
+- AnalysisBarrier
+- Relation[DEST_COUNTRY_NAME#10,ORIGIN_COUNTRY_NAME#11,count#12] csv"
I don't know where "u" is even coming from, since it's not in my code and it isn't in the data file header either. I read another suggestion that this could be caused by spaces in the header, but that doesn't apply here. Any idea what to try?
NOTE: The strange thing is that the same query works when I use SQL instead of the DataFrame transformations. This works:
flightData2015.createOrReplaceTempView("flight_data_2015")
spark.sql("SELECT max(count) from flight_data_2015").take(1)
I can also do the following, and it works fine:
flightData2015.show()
Recommended answer
Your issue is that you are calling Python's built-in max function, not pyspark.sql.functions.max.
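A quick way to check which max a bare name refers to is to look at the module it comes from. The following is a minimal sketch in plain Python (no Spark required); the helper name describe is hypothetical and used only for illustration. After something like from pyspark.sql.functions import max, the same check would be expected to report a pyspark module instead, though that is an assumption about your installed version rather than something shown here.

```python
import builtins


def describe(fn):
    """Return 'module.name' for a function (hypothetical helper for illustration)."""
    return f"{fn.__module__}.{fn.__name__}"


# With no PySpark import shadowing it, the bare name `max` is the built-in.
which_max = describe(max)
print(which_max)
print(max is builtins.max)
```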
When Python evaluates max("count") in your code, it returns the letter 'u', which is the largest character among the letters that make up the string. Spark then receives 'u' as the column name to select, which is why the error says it cannot resolve a column named u.
print(max("count"))
# prints: u
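To see why 'u' wins, here is a small sketch in plain Python (no Spark needed). The built-in max iterates over the string one character at a time and compares the characters by Unicode code point; the variable names are just for illustration.

```python
text = "count"

# Map each character of the string to its Unicode code point.
code_points = {ch: ord(ch) for ch in text}
print(code_points)  # {'c': 99, 'o': 111, 'u': 117, 'n': 110, 't': 116}

# The built-in max returns the character with the highest code point.
largest = max(text)
print(largest)  # u
```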
Try this instead:
import pyspark.sql.functions as f
flightData2015.select(f.max("count")).show()