Pyspark map from RDD of strings to RDD of list of doubles

Question

I believe in the context of programming in spark / python that this is a reasonably basic operation. I have a text file that looks as such:

mydata.txt
12  34  2.3  15
23  11  1.5  9
33  18  4.5  99

and then I use the following code to read in the text file:

data = sc.textFile("mydata.txt") 

and this reads in the file as an RDD of strings. However I want to separate the values and convert them all into floats. So I change the line above to this:

data = sc.textFile("matrix1.txt").map(lambda line: line.split(' '))

which successfully splits the data by spaces. However I am struggling to come up with the map function that then converts to floats, something along the lines of:

.map(lambda line: float(line))

but this didn't work. Any help appreciated! Thanks!

EDIT - please assume I do not know the number of columns of the data, so something along the lines of .map(lambda line: float(line[0]), float(line[1]), float(line[2]), float(line[3])) is not particularly helpful.

Answer

Never mind, got it:

.map(lambda line: [float(x) for x in line])
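
For context, a minimal end-to-end sketch of the full pipeline, assuming PySpark is installed and mydata.txt sits in the working directory (the local master setting and app name below are illustrative, not from the question). Using split() with no argument instead of split(' ') is a defensive choice here: it tolerates runs of whitespace, which would otherwise produce empty strings and make float() raise.

from pyspark import SparkContext

# Illustrative SparkContext setup; in a shell or notebook sc may already exist.
sc = SparkContext("local", "rdd-of-floats-example")

# Read each line, split on any whitespace, and convert every field to a float.
data = sc.textFile("mydata.txt").map(lambda line: [float(x) for x in line.split()])

print(data.collect())
# Expected with the file shown in the question:
# [[12.0, 34.0, 2.3, 15.0], [23.0, 11.0, 1.5, 9.0], [33.0, 18.0, 4.5, 99.0]]

sc.stop()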

