Pyspark map from RDD of strings to RDD of list of doubles
Question
I believe that in the context of programming in Spark / Python this is a fairly basic operation. I have a text file that looks like this:
mydata.txt
12 34 2.3 15
23 11 1.5 9
33 18 4.5 99
and I then use the following code to read in the text file:
data = sc.textFile("mydata.txt")
This reads the file in as an RDD of strings. However, I want to separate the values and convert them all into floats, so I change the line above to this:
data = sc.textFile("mydata.txt").map(lambda line: line.split(' '))
This successfully splits the data on spaces. However, I am struggling to come up with the map function that then converts the values to floats, something along the lines of:
.map(lambda line: float(line))
but this didn't work. Any help is appreciated! Thanks!
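The reason this fails: after the `split` step, each RDD element is no longer a string but a list of strings, and `float()` cannot be applied to a list. A minimal plain-Python illustration of what happens to one line:

```python
# After splitting, each RDD element is a list of strings, not a single string.
row = "12 34 2.3 15".split(' ')
print(row)  # ['12', '34', '2.3', '15']

# float() expects a single string or number, so passing the whole list fails:
try:
    float(row)
except TypeError as e:
    print("float(row) raises:", type(e).__name__)  # TypeError
```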
EDIT: please assume I do not know the number of columns in the data, so something along the lines of .map(lambda line: float(line[0]), float(line[1]), float(line[2]), float(line[3])) is not particularly helpful.
Answer
Never mind, figured it out:
.map(lambda line: [float(x) for x in line])
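The per-line transformation the two chained `map` calls perform can be checked in plain Python, with the sample data inlined (in Spark these lines would come from `sc.textFile("mydata.txt")` instead):

```python
# Sample contents of mydata.txt, one string per line, as an RDD of strings would hold them.
lines = ["12 34 2.3 15", "23 11 1.5 9", "33 18 4.5 99"]

# First map: split each line on spaces -> list of strings per line.
split_rows = [line.split(' ') for line in lines]

# Second map: convert every element to float -> list of floats per line.
# This works for any number of columns, since it iterates over each row.
float_rows = [[float(x) for x in row] for row in split_rows]

print(float_rows[0])  # [12.0, 34.0, 2.3, 15.0]
```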