Scala Spark - Discard empty keys

Problem description

I have the following map:

 val pairs = lines.map(l => (
     if (l.split(",")(1).toInt < 60) { "rest" }
     else if (l.split(",")(1).toInt > 110) { "sport" },
     10))
   .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(12))

Basically, when someone's heart rate (HR) is below 60, it's classified as rest; above 110, it's classified as sport. The second element of the tuple records that the person has been doing it for 10 minutes.
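For context, over a sliding 12-second window reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(12)) adds up the values that share a key. A minimal sketch of that per-window sum in plain Scala, assuming a List of pairs stands in for one window of the stream (no Spark involved; sumByKey and the sample window are hypothetical):

```scala
// Hypothetical stand-in for one window of the stream: (activity, minutes) pairs.
val window: List[(String, Int)] =
  List.fill(3)("rest" -> 10) ++ List.fill(12)("sport" -> 10)

// Sum the values per key, as reduceByKeyAndWindow with (a, b) => a + b does per window.
def sumByKey(pairs: List[(String, Int)]): Map[String, Int] =
  pairs
    .groupBy(_._1)                                       // bucket the pairs by key
    .map { case (key, kvs) => key -> kvs.map(_._2).sum } // add up each bucket's values

val totals = sumByKey(window)
```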

Right now, this maps values between 60 and 110 to an empty key. What I want is to discard them completely. How can I achieve that?

So, from

("rest", 30)
("sport", 120)
((),10)

I want to filter out ((),10).
I tried:

 pairs.filter{case (key, value) => key.length < 3} //error: value length is not a member of Any
 pairs.filter(_._1 != "")  //no error, just still keeps the empty keys, too   

Neither seems to work.
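Why both filters fail can be reproduced without Spark. A minimal sketch on a plain List of made-up heart rates: an if with no final else yields the Unit value () when no branch matches, so the key type widens to Any, and () is never equal to the empty string.

```scala
// Made-up readings: one below 60, one in the 60..110 gap, one above 110.
val heartRates = List(50, 80, 120)

// No final else: the 80 reading yields (), so the key's type widens to Any.
val pairs: List[(Any, Int)] =
  heartRates.map(hr => (if (hr < 60) { "rest" } else if (hr > 110) { "sport" }, 10))

// pairs.filter { case (key, _) => key.length < 3 } does not compile: key is Any.
// Comparing against "" compiles, but () is not "", so nothing is removed.
val stillThere = pairs.filter(_._1 != "")
```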

Recommended answer

Your problem is that your if expression returns either a String when one of the branches matches or Unit when none does. You can fix your filter easily:

val pairs = lines.map(
  l => (if (l.split(",")(1).toInt < 60) {"rest"} else if (l.split(",")(1).toInt > 110) {"sport"}, 10))
    .filter(_._1 != ())

() in Scala is the only value of type Unit.

But this is not really the right way. The keys still have the static type Any, so you get tuples of (Any, Int) as the result: the if statement without a final else loses the String type.
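A plain-List sketch of the filter above and of the type problem it leaves behind (values are made up; unitValue is a hypothetical name for the () being compared against):

```scala
// Keys are String or (), so the element type is (Any, Int).
val heartRates = List(50, 80, 120)
val pairs = heartRates.map(hr => (if (hr < 60) "rest" else if (hr > 110) "sport", 10))

// Compare each key with the Unit value () and drop the matches.
val unitValue: Unit = ()
val filtered: List[(Any, Int)] = pairs.filter(_._1 != unitValue)

// The remaining keys are Strings at runtime, but the compiler still sees Any,
// so using them as Strings needs a pattern match (or a cast).
val keyLengths: List[Int] = filtered.map {
  case (key: String, _) => key.length
  case _                => 0
}
```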

The correct way is either to filter your data first and then use an exhaustive if:

val pairs =
  lines.map(_.split(",")(1).toInt)
    .filter(hr => hr < 60 || hr > 110)
    .map(hr => (if (hr < 60) "rest" else "sport", 10))
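The same pipeline on a plain List, as a sketch you can run without a cluster. It assumes comma-separated lines whose second field is the heart rate, as in the question; the name field and sample values are made up. The type annotation shows that the keys are now plain Strings:

```scala
// Hypothetical "name,heartRate" records.
val lines = List("alice,50", "bob,80", "carol,120")

val pairs: List[(String, Int)] =
  lines.map(_.split(",")(1).toInt)                     // parse the heart rate once per line
    .filter(hr => hr < 60 || hr > 110)                 // drop readings from 60 to 110 inclusive
    .map(hr => (if (hr < 60) "rest" else "sport", 10)) // both branches are Strings now
```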

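Or use collect with a partial function, which on a Spark RDD is a shortcut for .filter followed by .map. Since reduceByKeyAndWindow means lines is a DStream, and DStream has no such collect method, apply it to each batch's RDD through transform:

val pairs =
  lines.map(_.split(",")(1).toInt)
    .transform { rdd =>
      rdd.collect {
        case hr if hr < 60 => "rest" -> 10
        case hr if hr > 110 => "sport" -> 10
      }
    }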

Probably this variant is more readable.
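Scala's standard collections have a collect that takes a partial function in the same way, which makes the filter-plus-map equivalence easy to check locally. A sketch on made-up heart rates:

```scala
val heartRates = List(50, 80, 120, 40)

// collect keeps only the elements the partial function is defined at, then maps them.
val viaCollect: List[(String, Int)] = heartRates.collect {
  case hr if hr < 60  => "rest" -> 10
  case hr if hr > 110 => "sport" -> 10
}

// The equivalent explicit filter followed by map.
val viaFilterMap: List[(String, Int)] =
  heartRates.filter(hr => hr < 60 || hr > 110)
    .map(hr => (if (hr < 60) "rest" else "sport") -> 10)
```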

Also, note that I moved the split into a separate step. This avoids calling split a second time for the second if branch.
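To make the saving concrete, here is a sketch that counts parses on a plain List; heartRate and parseCalls are hypothetical names introduced only for this illustration. With the branches written inline, any reading that is not below 60 is parsed twice:

```scala
// Hypothetical counter, only to show how often each line is parsed.
var parseCalls = 0
def heartRate(line: String): Int = { parseCalls += 1; line.split(",")(1).toInt }

val lines = List("alice,50", "bob,80", "carol,120")

// Branches written inline: the else-if parses the line a second time.
parseCalls = 0
val inlineKeys = lines.map(l => if (heartRate(l) < 60) "rest" else if (heartRate(l) > 110) "sport")
val callsInline = parseCalls   // 1 (alice) + 2 (bob) + 2 (carol) = 5

// Parse once in its own map step, then branch on the Int.
parseCalls = 0
val separateKeys = lines.map(heartRate).map(hr => if (hr < 60) "rest" else if (hr > 110) "sport")
val callsSeparate = parseCalls // one parse per line = 3
```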

Update: another approach is to use flatMap, as suggested in the comments:

val pairs =
  lines.flatMap(_.split(",")(1).toInt match{
      case hr if hr < 60 => Some("rest" -> 10)
      case hr if hr > 110 => Some("sport" -> 10)
      case _ => None
    })

It may or may not be more efficient: it avoids the filter step but adds the cost of wrapping elements in an Option and unwrapping them. You can test the performance of the different approaches and tell us the results.
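Before timing anything, it helps to confirm that the variants agree. A sketch of the flatMap/Option variant on a plain List, with a hypothetical classify helper; Function.unlift from the Scala standard library turns the same helper into the partial function that collect expects, as a cross-check:

```scala
// Hypothetical classifier: Some pair for rest/sport, None for the 60..110 gap.
def classify(hr: Int): Option[(String, Int)] =
  if (hr < 60) Some("rest" -> 10)
  else if (hr > 110) Some("sport" -> 10)
  else None

val heartRates = List(50, 80, 120, 110, 60, 111)

// flatMap wraps each result in an Option and unwraps it again, dropping the Nones.
val viaFlatMap = heartRates.flatMap(hr => classify(hr))

// The same helper lifted to a partial function, so collect can use it.
val viaCollect = heartRates.collect(Function.unlift(classify))
```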
