How to get the values in a DataFrame with the correct DataType?
Question
When I try to get a value from a DataFrame, like:
df.select("date").head().get(0) // type: Any
The result type is Any, which is not what I expected.
Since a DataFrame contains the schema of its data, it should know the DataType of each column, so when I get a value using get(0), it should return the value with the correct type. However, it does not.
Instead, I need to specify which DataType I want by using getDate(0), which seems weird and inconvenient, and makes me mad.
Since I specified the schema with the correct DataTypes for each column when I created the DataFrame, I don't want to use a different getXXX() for different columns.
Is there a convenient way to get the values with their own correct types? That is to say, how can I get the values with the correct DataType specified in the schema?
Thank you!
Recommended answer
Scala is a statically typed language, so the get method defined on Row must have a single return type that is fixed at compile time, and that return type is Any. It cannot return an Int for one call and a String for another.
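To make the point concrete without Spark, here is a minimal sketch using a hypothetical SimpleRow class (a stand-in invented for illustration, not Spark's Row). It shows that a single get method over mixed column types can only declare Any as its result, so the caller has to name the type it wants.

```scala
// SimpleRow is a hypothetical stand-in for a row of mixed-type values.
final class SimpleRow(values: Vector[Any]) {
  // One method, one declared return type: the only type that fits
  // both an Int column and a String column is Any.
  def get(i: Int): Any = values(i)

  // A typed getter names the type up front and casts internally.
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

object StaticTypingDemo {
  val row = new SimpleRow(Vector(42, "spark"))

  val untyped: Any = row.get(0)
  // val n: Int = row.get(0)   // would not compile: found Any, required Int
  val n: Int = row.getInt(0)   // compiles because the getter declares Int
}
```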
You should call getInt, getDate, and the other typed get methods provided for each type, or use the getAs method, to which you can pass the type as a parameter (for example row.getAs[Int](0)).
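As a rough illustration of these getters, the sketch below builds a Row by hand so that it runs without a SparkSession. The column layout (a date, an integer, a string) is an assumption made for the example; in practice the Row would come from something like df.head().

```scala
import java.sql.Date
import org.apache.spark.sql.Row

object TypedRowAccess {
  // Hand-built Row standing in for the result of df.head();
  // the three values and their order are assumed for this example.
  val row: Row = Row(Date.valueOf("2020-01-01"), 42, "spark")

  val untyped: Any = row.get(0)            // static type is Any
  val date: Date = row.getDate(0)          // typed getter for java.sql.Date
  val count: Int = row.getInt(1)           // typed getter for Int
  val name: String = row.getAs[String](2)  // type passed as a type parameter
}
```

Note that asking for the wrong type (for example getInt on the date column) compiles but fails at runtime with a ClassCastException, since the check happens only when the value is cast.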
As mentioned in the comments, other options are:
- Use a Dataset instead of a DataFrame.
- Use Spark SQL.
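The Dataset option could look roughly like the sketch below. The Event case class, its field names, and the sample data are all assumptions made for the example; the idea is that converting the DataFrame with as[Event] gives each field a static type, so no per-column getter is needed.

```scala
import java.sql.Date
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type whose fields mirror the assumed columns.
case class Event(date: Date, count: Int)

object DatasetExample {
  def firstEvent(spark: SparkSession): Event = {
    import spark.implicits._

    // Sample DataFrame with assumed column names "date" and "count".
    val df = Seq((Date.valueOf("2020-01-01"), 1)).toDF("date", "count")

    // Convert the untyped DataFrame into a typed Dataset[Event].
    val ds: Dataset[Event] = df.as[Event]

    // head() now returns an Event, so first.date is a java.sql.Date.
    ds.head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-access")
      .getOrCreate()
    val first = firstEvent(spark)
    val d: Date = first.date // no cast and no getDate call
    println(d)
    spark.stop()
  }
}
```

The trade-off is that the case class has to be kept in sync with the schema by hand; a mismatch is reported when as[Event] is analyzed rather than at compile time.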