Extracting values from a Spark column containing nested values

Question

This is part of the schema of my mongodb collection:

|-- variables: struct (nullable = true)
|    |-- actives: struct (nullable = true)
|    |    |-- data: struct (nullable = true)
|    |    |    |-- 0: struct (nullable = true)
|    |    |    |    |-- active: integer (nullable = true)
|    |    |    |    |-- inactive: integer (nullable = true)

I've fetched the collection and stored it in a Spark dataframe and am now trying to extract the innermost values in the variables column.

df_temp = df1.select(df1.variables.actives.data)

This works perfectly fine and I am able to get the inner structure of the data struct.

+----------------------+  
|variables.actives.data|  
+----------------------+  
|  [[1,32,0.516165...|  
|  [[1,30,1.173139...|  
|  [[4,18,0.160088...|

However, as soon as I try to go in further:

df_temp = df1.select(df1.variables.actives.data.0.active)

I get an invalid syntax error:

df_temp = df1.select(df1.variables.actives.data.0.active)
^
SyntaxError: invalid syntax

The problem is that my inner field's key is a number, and I couldn't find an example that handles a numeric field key.

What would be the best way to achieve my goal of retrieving the innermost values (active and inactive) from the dataframe?

Recommended answer

You can try:

df_temp = df1.select(df1.variables.actives.data["0"].active)
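Why the bracket form helps: Python only allows identifiers after a dot, so `.0` is read as a float literal and the parser rejects the expression before Spark ever sees it. The sketch below checks both spellings with the standard `ast` module. It is a syntax check only and does not touch Spark; the helper name `parses` is made up for this illustration.

```python
import ast


def parses(expr: str) -> bool:
    """Return True if `expr` is syntactically valid Python (nothing is evaluated)."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


# Attribute access needs an identifier after the dot, so `.0` is rejected.
dotted = "df1.variables.actives.data.0.active"

# Bracket access accepts any string key, including "0".
bracketed = 'df1.variables.actives.data["0"].active'

print(parses(dotted))     # False
print(parses(bracketed))  # True
```

In PySpark the same field could presumably also be reached with `Column.getField("0")`, or by indexing every level as in `df1["variables"]["actives"]["data"]["0"]["active"]`. Neither variant has been run against the schema above.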
