How to Convert a Column of a DataFrame to a List in Apache Spark?
Question
I would like to convert a string column of a DataFrame to a list. The closest thing I could find in the DataFrame API is rdd, so I tried converting the DataFrame back to an RDD first and then applying the toArray function to the RDD. In this case, the length and the SQL work just fine. However, every element in the result I get from the RDD is wrapped in square brackets, like this: [A00001]. I was wondering if there is an appropriate way to convert a column to a list, or a way to remove the square brackets.
Any suggestions would be appreciated. Thank you!
Recommended answer
This should return a collection containing the values of the single selected column:
dataFrame.select(YOUR_COLUMN_NAME).rdd.map(r => r(0)).collect()
Without the mapping, you just get Row objects rather than plain values, which is why each element is printed in square brackets, e.g. [A00001]. After the select, each Row holds only the selected column.
Keep in mind that this will probably give you a list of type Any. If you want to specify the result type, you can use .asInstanceOf[YOUR_TYPE] in the mapping: r => r(0).asInstanceOf[YOUR_TYPE]
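As a rough illustration of the typed variant, here is a minimal sketch in Scala. The object name ColumnToList, the helper columnAsStrings, and the sample column names "id" and "n" are made up for this example and are not from the original answer. It assumes a local SparkSession and a string column, and uses Row.getString instead of asInstanceOf to get a List[String] directly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnToList {
  // Hypothetical helper: collects one string column to the driver as a List[String].
  // Assumes the named column really holds strings; getString fails otherwise.
  def columnAsStrings(df: DataFrame, columnName: String): List[String] =
    df.select(columnName)        // keep only the column of interest
      .rdd                       // RDD[Row], one single-field Row per record
      .map(r => r.getString(0))  // unwrap the Row so no [ ] appear around values
      .collect()                 // bring all values to the driver
      .toList

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its own configuration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-to-list")
      .getOrCreate()
    import spark.implicits._

    // Sample data, invented for this sketch.
    val df = Seq(("A00001", 1), ("A00002", 2)).toDF("id", "n")

    val ids: List[String] = columnAsStrings(df, "id")
    println(ids)

    spark.stop()
  }
}
```

Note that collect() pulls every value into driver memory, so this only makes sense when the column is small enough to fit there.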