How to split CSV lines into tuples with Spark Scala


Question

Here is the data I want to process with Scala. It looks like this:

    userId,movieId
    1,1172
    1,1405
    1,2193
    1,2968
    2,52
    2,144
    2,248

First, I want to skip the first line, then split each row into user and movie with split(",") and map it to (userID, movieID).

This is my first time trying Scala, and everything is driving me insane. I wrote this code to skip the first line and do the split:

rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(1)
  else iter
}.flatMap(line => line.split(","))

But the result looks like this:

    1
    1172
    1
    1405
    1
    2193
    1
    2968
    2
    52

I guess it's because of mapPartitionsWithIndex. Is there any way to correctly skip the header without changing the structure?

Answer

Ah, your question is not about the header, but about how to split the lines into (userid, movieid)? Instead of .flatMap(line => line.split(",")) you should try this:

.map(line => line.split(",") match { case Array(userid, movieid) => (userid, movieid) })
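To see the difference between the two operations without needing a Spark cluster, here is a minimal plain-Scala sketch on a small sample of the data (the Seq stands in for the RDD, which supports the same flatMap/map semantics): flatMap flattens the Array returned by split into a single stream of fields, which is exactly the output the question shows, while map with the Array pattern match keeps one (userId, movieId) tuple per line.

```scala
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val lines = Seq("1,1172", "1,1405", "2,52")

    // What the original code does: flatMap flattens each Array("1", "1172")
    // into individual elements, so the pairing is lost.
    val flattened = lines.flatMap(line => line.split(","))
    println(flattened)  // List(1, 1172, 1, 1405, 2, 52)

    // What the answer suggests: map keeps one tuple per input line.
    val tuples = lines.map(line => line.split(",") match {
      case Array(userId, movieId) => (userId, movieId)
    })
    println(tuples)     // List((1,1172), (1,1405), (2,52))
  }
}
```

Note that the pattern match will throw a MatchError on any line that does not have exactly two fields, which is one more reason to drop the "userId,movieId" header line first.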

