Scala FlatMap provides wrong results


Problem description

Given a list of documents, I want to obtain the pairs of documents that share at least one token. To do this, I wrote the code below, which uses an inverted index.

object TestFlatMap {
 case class Document(id : Int, tokens : List[String])

 def main(args: Array[String]): Unit = {

   val documents = List(
     Document(1, List("A", "B", "C", "D")),
     Document(2, List("A", "B", "E", "F", "G")),
     Document(3, List("E", "G", "H")),
     Document(4, List("A", "L", "M", "N"))
   )

   val expectedTokensIds = List(("A",1), ("A",2), ("A",4), ("B",1), ("B",2), ("C",1), ("D",1), ("E",2), ("E",3), ("F",2), ("G",2), ("G",3), ("H",3), ("L",4), ("M",4), ("N",4)) //Expected tokens - id tuples
   val expectedCouples = Set((1, 2), (1, 4), (2, 3), (2, 4)) //Expected resulting pairs


   /**
     * For each token, emit the id of every document that contains it
     * */
   val tokensIds = documents.flatMap{ document =>
     document.tokens.map{ token =>
       (token, document.id)
     }
   }

   //Check if the tuples are right
   assert(tokensIds.length == expectedTokensIds.length && tokensIds.intersect(expectedTokensIds).length == expectedTokensIds.length, "Error: tokens-ids not matches")

   //Group the documents by the token
   val docIdsByToken = tokensIds.groupBy(_._1).filter(_._2.size > 1)

   /**
     * For each group of documents generate the pairs
     * */
   val couples = docIdsByToken.map{ case (token, docs) =>
     docs.combinations(2).map{ c =>
       val d1 = c.head._2
       val d2 = c.last._2

       if(d1 < d2){
         (d1, d2)
       }
       else{
         (d2, d1)
       }
     }
   }.flatten.toSet


   /**
     * Same operation, but with flatMap
     * For each group of documents generate the pairs
     * */
   val couples1 = docIdsByToken.flatMap{ case (token, docs) =>
     docs.combinations(2).map{ c =>
       val d1 = c.head._2
       val d2 = c.last._2

       if(d1 < d2){
         (d1, d2)
       }
       else{
         (d2, d1)
       }
     }
   }.toSet

   //The results obtained with flatten pass the test
   assert(couples.size == expectedCouples.size && couples.intersect(expectedCouples).size == expectedCouples.size, "Error: couples not matches")
   //The results obtained with flatMap do not pass the test: they are wrong
   assert(couples1.size == expectedCouples.size && couples1.intersect(expectedCouples).size == expectedCouples.size, "Error: couples1 not matches")
 }
}

The problem is that the flatMap that should generate the final results does not work properly: it returns only two couples, (2,3) and (1,2). I do not understand why it does not work; moreover, IntelliJ suggests that I use flatMap instead of map followed by flatten.

Can someone explain where the problem is? I cannot figure it out, and I have run into this problem before.

Thanks

Luca

Solution

That's an excellent example that demonstrates that all the nice monad laws do not necessarily hold if you switch between different types of collections during the map/flatMap/flatten.


You must convert the Map to a List first. Otherwise flatMap constructs another Map as its intermediate result, and a Map overwrites the value stored under an existing key instead of collecting all pairs:

val couples1 = docIdsByToken.toList.flatMap{ case (token, docs) =>
  docs.combinations(2).map{ c =>
    val d1 = c.head._2
    val d2 = c.last._2

    if(d1 < d2){
      (d1, d2)
    }
    else{
      (d2, d1)
    }
  }
}.toSet

Here is a much shorter version that demonstrates the same problem:

val m = Map("A" -> (2, 1), "B" -> (2, 3))
val s = m.flatMap{ case (k, v) => List(v) }.toSet
println(s)

Instead of Set((2, 1), (2, 3)), it will produce Set((2, 3)), because after the flatMap and before the toSet the intermediate result is a new Map, and this map can hold only one value for key = 2.
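To make the intermediate step visible, the short example can be split so that the result of flatMap is bound to a name before toSet is called. This is only a sketch, assuming Scala 2.12 or 2.13; the object name FlatMapDemo and the val names are illustrative and do not come from the original code.

```scala
object FlatMapDemo {
  // Same data as the short example, written with explicit tuples.
  val m: Map[String, (Int, Int)] = Map(("A", (2, 1)), ("B", (2, 3)))

  // The function returns a collection of pairs, so flatMap on a Map
  // builds another Map. Both pairs have the key 2, so the second
  // one replaces the first.
  val intermediate: Map[Int, Int] = m.flatMap { case (_, v) => List(v) }

  // Converting afterwards cannot bring the lost pair back.
  val lossy: Set[(Int, Int)] = intermediate.toSet

  // Going through a List first keeps every pair, because a List of
  // tuples is just a sequence and has no notion of keys.
  val viaList: Set[(Int, Int)] = m.toList.flatMap { case (_, v) => List(v) }.toSet

  def main(args: Array[String]): Unit = {
    println(intermediate) // Map(2 -> 3)
    println(lossy)        // Set((2,3))
    println(viaList)      // contains both (2,1) and (2,3)
  }
}
```

The explicit type annotation on intermediate is optional; it is written out here only to show which collection type the compiler picks.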

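The difference from the map-then-flatten version is that after map you obtain something like an Iterable[Iterator[(Int, Int)]], because each element is a whole collection of pairs rather than a single pair. That result is not a Map, and therefore cannot lose or overwrite any keys.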
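The three variants can be compared side by side on the data from the question. The following is a condensed sketch, not the original program: the object name SharedTokenPairs, the helper pairsOf, and the choice to store only document ids in the index are assumptions made to keep the example short. It assumes Scala 2.12 or 2.13.

```scala
object SharedTokenPairs {
  final case class Document(id: Int, tokens: List[String])

  val documents: List[Document] = List(
    Document(1, List("A", "B", "C", "D")),
    Document(2, List("A", "B", "E", "F", "G")),
    Document(3, List("E", "G", "H")),
    Document(4, List("A", "L", "M", "N"))
  )

  // Inverted index: token -> ids of the documents containing it,
  // restricted to tokens that occur in more than one document.
  val idsByToken: Map[String, List[Int]] =
    documents
      .flatMap(d => d.tokens.map(t => (t, d.id)))
      .groupBy(_._1)
      .map { case (token, pairs) => (token, pairs.map(_._2)) }
      .filter(_._2.size > 1)

  // Hypothetical helper: all id pairs of one group, smaller id first.
  def pairsOf(ids: List[Int]): List[(Int, Int)] =
    ids.combinations(2).map(c => (c.min, c.max)).toList

  // map produces lists (not pairs), so the result is a plain Iterable
  // and flatten keeps every pair.
  val nested: Iterable[List[(Int, Int)]] =
    idsByToken.map { case (_, ids) => pairsOf(ids) }
  val viaFlatten: Set[(Int, Int)] = nested.flatten.toSet

  // flatMap directly on the Map builds a Map[Int, Int], so only one
  // pair per first id survives.
  val viaMapFlatMap: Set[(Int, Int)] =
    idsByToken.flatMap { case (_, ids) => pairsOf(ids) }.toSet

  // The fix from the answer: leave the Map before calling flatMap.
  val viaList: Set[(Int, Int)] =
    idsByToken.toList.flatMap { case (_, ids) => pairsOf(ids) }.toSet
}
```

Here viaFlatten and viaList both contain the four expected pairs, while viaMapFlatMap contains only two. Which two survive depends on the iteration order of the Map, which is why the result reported in the question should not be relied on.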
