Extract column values of Dataframe as List in Apache Spark


Problem Description

I want to convert a string column of a data frame to a list. What I can find from the Dataframe API is RDD, so I tried converting it back to RDD first, and then apply toArray function to the RDD. In this case, the length and SQL work just fine. However, the result I got from RDD has square brackets around every element like this [A00001]. I was wondering if there's an appropriate way to convert a column to a list or a way to remove the square brackets.

Any suggestions would be appreciated. Thank you!

Recommended Answer

This should return a collection containing a single list:

dataFrame.select("YOUR_COLUMN_NAME").rdd.map(r => r(0)).collect()
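A runnable sketch of that one-liner, assuming a local Spark session; the column name `id` and the sample values are illustrative, not from the original question:

```scala
import org.apache.spark.sql.SparkSession

// Local session for demonstration purposes only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("extract-column")
  .getOrCreate()
import spark.implicits._

// Illustrative one-column DataFrame of strings.
val df = Seq("A00001", "A00002", "A00003").toDF("id")

// Select the column, drop to the underlying RDD of Rows,
// and unwrap each Row into its single value.
val values = df.select("id").rdd.map(r => r(0)).collect()

// Elements come out bare, without the [A00001]-style brackets
// that printing whole Row objects produces.
println(values.mkString(", "))

spark.stop()
```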

Without the mapping, you just get a Row object, which contains every column from the database.

Keep in mind that this will probably get you a list of Any type. If you want to specify the result type, you can use .asInstanceOf[YOUR_TYPE] in the mapping: r => r(0).asInstanceOf[YOUR_TYPE]
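A sketch of the typed variant under the same assumptions (local session, made-up column `id`). The `getString(0)` alternative is an addition of mine, not something the original answer mentions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-extract")
  .getOrCreate()
import spark.implicits._

val df = Seq("A00001", "A00002").toDF("id")

// Cast each unwrapped value to the expected type, so the result
// is List[String] rather than Array[Any]:
val casted: List[String] =
  df.select("id").rdd.map(r => r(0).asInstanceOf[String]).collect().toList

// Equivalent without the cast, using Row's typed getter:
val viaGetter: List[String] =
  df.select("id").rdd.map(_.getString(0)).collect().toList

spark.stop()
```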

P.S. Due to automatic conversion, you can skip the .rdd part.
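On Spark 2.x and later, where DataFrame is an alias for Dataset[Row], calling .map directly works as long as an Encoder for the result type is in scope via spark.implicits._ (this Encoder detail is my addition; the answer itself does not discuss versions). A minimal sketch under the same illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("no-rdd")
  .getOrCreate()
// Brings the implicit Encoder[String] needed by Dataset.map into scope.
import spark.implicits._

val df = Seq("A00001", "A00002").toDF("id")

// No .rdd: map on the Dataset directly, collecting a typed array.
val direct: Array[String] = df.select("id").map(_.getString(0)).collect()

spark.stop()
```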
