Pyspark map from RDD of strings to RDD of list of doubles

Question

I believe in the context of programming in spark / python that this is a reasonably basic operation. I have a text file that looks as such:

mydata.txt
12  34  2.3  15
23  11  1.5  9
33  18  4.5  99

and then I use the following code to read in the text file:

data = sc.textFile("mydata.txt") 

and this reads in the file as an RDD of strings. However I want to separate the values and convert them all into floats. So I change the line above to this:

data = sc.textFile("matrix1.txt").map(lambda line: line.split(' '))

which successfully splits the data by spaces. However I am struggling to come up with the map function that then converts to floats, something along the lines of:

.map(lambda line: float(line))

but this didn't work. Any help appreciated! Thanks!

EDIT - please assume I do not know the number of columns of the data, so something along the lines of .map(lambda line: float(line[0]), float(line[1]), float(line[2]), float(line[3])) is not particularly helpful.

Answer

Never mind, got it:

.map(lambda line: [float(x) for x in line])
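
For context, a minimal end-to-end sketch of the full pipeline, assuming PySpark is installed and mydata.txt sits in the working directory (the local master setting and app name below are illustrative, not from the question). Using split() with no argument instead of split(' ') is a defensive choice here: it tolerates runs of whitespace, which would otherwise produce empty strings and make float() raise.

from pyspark import SparkContext

# Illustrative SparkContext setup; in a shell or notebook sc may already exist.
sc = SparkContext("local", "rdd-of-floats-example")

# Read each line, split on any whitespace, and convert every field to a float.
data = sc.textFile("mydata.txt").map(lambda line: [float(x) for x in line.split()])

print(data.collect())
# Expected with the file shown in the question:
# [[12.0, 34.0, 2.3, 15.0], [23.0, 11.0, 1.5, 9.0], [33.0, 18.0, 4.5, 99.0]]

sc.stop()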

