Broadcast variable fails to take all data

Question

When I broadcast the result of collectAsMap(), not all of the values end up in the broadcast variable. For example:

    val emp = sc.textFile("...text1.txt").map(line => (line.split("\t")(3),line.split("\t")(1))).distinct()
    val emp_new = sc.textFile("...text2.txt").map(line => (line.split("\t")(3),line.split("\t")(1))).distinct()
    emp_new.foreach(println)

    val emp_newBC = sc.broadcast(emp_new.collectAsMap())
    println(emp_newBC.value)

When I checked the values in emp_newBC, I saw that not all the data from emp_new appears. What am I missing?

Thanks in advance.

Recommended answer

The problem is that emp_new is a collection of tuples, while emp_newBC is a broadcast map. When you collect the pairs as a map, duplicate keys are collapsed so that only one value per key is kept, and therefore you end up with less data. If you want to get back all of the tuples, use

    val emp_newBC = sc.broadcast(emp_new.collect())
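
To illustrate why the map comes out smaller, here is a minimal sketch that uses plain Scala collections instead of Spark, so it runs without a cluster. The sample pairs and the object name BroadcastMapDemo are made up for illustration; they only stand in for the (key, value) tuples in emp_new. Building a Map from pairs that share a key keeps a single value per key, which is the same effect collectAsMap() has.

```scala
object BroadcastMapDemo {
  // Hypothetical sample data standing in for the (key, value) pairs in emp_new.
  val pairs: Seq[(String, String)] = Seq(
    ("dept1", "alice"),
    ("dept1", "bob"),
    ("dept2", "carol")
  )

  // A Map holds one value per key, so one of the "dept1" entries is dropped.
  val asMap: Map[String, String] = pairs.toMap

  // Grouping by key first keeps every value for that key.
  val grouped: Map[String, Seq[String]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    println(s"tuples: ${pairs.size}, map entries: ${asMap.size}")
    println(s"values kept for dept1 after grouping: ${grouped("dept1")}")
  }
}
```

If you do need a map in Spark while keeping every value, one option is to group before collecting, for example emp_new.groupByKey().collectAsMap(), which should give you one entry per key holding all of its values. Whether that or a plain collect() is the better fit depends on how the broadcast value is looked up later.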
