Spark - How to deal with columns that have blank space in the name

Question

I would like to know how to access an attribute of a Row when the attribute name contains a blank space.

For example, I have this Row object:

Row(ONE CATEGORY=u'category') 

How can I access the ONE CATEGORY value? Normally I would use row.oneCategory to access it, but in this case that is not possible because of the blank space. If possible, I would prefer suggestions in Python.

Thanks.

Recommended answer

In Python, you can use the getattr function:

from pyspark.sql import Row

row = Row("ONE CATEGORY")("category")
row
## Row(ONE CATEGORY='category')
getattr(row, "ONE CATEGORY")
## 'category'

or the Row.asDict method:

row.asDict()["ONE CATEGORY"]
## 'category'

In Scala, Row fields cannot be accessed with dot syntax anyway, so the blank space is not really an issue there. If you want to access fields by name, you can use Row.getAs:

val row = sc.parallelize(Tuple1("category") :: Nil).toDF("ONE CATEGORY").first
row.getAs[String]("ONE CATEGORY")

or Row.getValuesMap:

row.getValuesMap[String](Seq("ONE CATEGORY"))("ONE CATEGORY")

In both Python and Scala, you can access a value by index:

row[0]
## 'category'

row(0)
// Any = category
row.getString(0)
// String = category

Finally, you can use the alias method during select to avoid the issue completely:

df.select(col("ONE CATEGORY").alias("ONE_CATEGORY"))
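
When many columns contain blank spaces, aliasing them one by one gets tedious. The following is a minimal sketch of renaming every column at once. It assumes `df` is a PySpark DataFrame; `sanitize` and `rename_all` are hypothetical helper names introduced here for illustration, and only `df.columns` and `df.toDF` are used from the DataFrame API.

```python
import re


def sanitize(name):
    """Replace each run of whitespace in a column name with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


def rename_all(df):
    """Return a copy of df whose column names contain no whitespace.

    Assumes df is a pyspark.sql.DataFrame.
    """
    return df.toDF(*[sanitize(c) for c in df.columns])


print(sanitize("ONE CATEGORY"))  # ONE_CATEGORY
```

After `clean = rename_all(df)`, rows collected from `clean` should allow ordinary attribute access such as `row.ONE_CATEGORY`. Note that two distinct original names could map to the same sanitized name, so it may be worth checking for collisions first.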
