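Scala FlatMap provides wrong results
Problem description
Given a list of documents, I want to obtain the pairs of documents that share at least one token. To do this, I wrote the code below, which works through an inverted index.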
object TestFlatMap {
  case class Document(id: Int, tokens: List[String])

  def main(args: Array[String]): Unit = {
    val documents = List(
      Document(1, List("A", "B", "C", "D")),
      Document(2, List("A", "B", "E", "F", "G")),
      Document(3, List("E", "G", "H")),
      Document(4, List("A", "L", "M", "N"))
    )

    // Expected (token, id) tuples
    val expectedTokensIds = List(("A",1), ("A",2), ("A",4), ("B",1), ("B",2), ("C",1), ("D",1), ("E",2), ("E",3), ("F",2), ("G",2), ("G",3), ("H",3), ("L",4), ("M",4), ("N",4))
    // Expected resulting pairs
    val expectedCouples = Set((1, 2), (1, 4), (2, 3), (2, 4))

    /**
     * For each token, emit the id of every document that contains it
     */
    val tokensIds = documents.flatMap { document =>
      document.tokens.map { token =>
        (token, document.id)
      }
    }

    // Check that the tuples are right
    assert(tokensIds.length == expectedTokensIds.length && tokensIds.intersect(expectedTokensIds).length == expectedTokensIds.length, "Error: tokens-ids not matches")

    // Group the tuples by token and keep only tokens shared by more than one document
    val docIdsByToken = tokensIds.groupBy(_._1).filter(_._2.size > 1)

    /**
     * For each group of documents, generate the pairs
     */
    val couples = docIdsByToken.map { case (token, docs) =>
      docs.combinations(2).map { c =>
        val d1 = c.head._2
        val d2 = c.last._2
        if (d1 < d2) {
          (d1, d2)
        }
        else {
          (d2, d1)
        }
      }
    }.flatten.toSet

    /**
     * Same operation, but with flatMap
     * For each group of documents, generate the pairs
     */
    val couples1 = docIdsByToken.flatMap { case (token, docs) =>
      docs.combinations(2).map { c =>
        val d1 = c.head._2
        val d2 = c.last._2
        if (d1 < d2) {
          (d1, d2)
        }
        else {
          (d2, d1)
        }
      }
    }.toSet

    // The results obtained with flatten pass the test
    assert(couples.size == expectedCouples.size && couples.intersect(expectedCouples).size == expectedCouples.size, "Error: couples not matches")
    // The results obtained with flatMap do not pass the test: they are wrong
    assert(couples1.size == expectedCouples.size && couples1.intersect(expectedCouples).size == expectedCouples.size, "Error: couples1 not matches")
  }
}
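The problem is that the flatMap that should generate the final results does not work properly: it returns only two pairs, (2,3) and (1,2). I do not understand why it does not work; moreover, IntelliJ suggests that I use flatMap instead of map followed by flatten.
Can someone explain where the problem is? I cannot figure it out, and I have run into this problem before.
Thanks,
Luca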
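This is an excellent example demonstrating that the nice monad laws do not necessarily hold if you switch between different types of collections during map/flatMap/flatten.
You must convert the Map to a List so that keys are not repeatedly overwritten while another Map is being constructed as an intermediate result, because a Map overwrites keys instead of collecting all pairs: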
val couples1 = docIdsByToken.toList.flatMap { case (token, docs) =>
  docs.combinations(2).map { c =>
    val d1 = c.head._2
    val d2 = c.last._2
    if (d1 < d2) {
      (d1, d2)
    }
    else {
      (d2, d1)
    }
  }
}.toSet
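As a possible alternative to toList, the following self-contained sketch iterates over the values of the grouped Map, so the flatMap runs on a plain Iterable and never rebuilds a Map. The names SharedTokenPairs and pairsSharingToken are made up for this illustration and do not appear in the question; the data is the sample from the question.

```scala
object SharedTokenPairs {
  final case class Document(id: Int, tokens: List[String])

  // Hypothetical helper: id pairs of documents that share at least one token.
  def pairsSharingToken(documents: List[Document]): Set[(Int, Int)] = {
    // Inverted index as (token, id) tuples, grouped by token;
    // tokens that occur in a single document are dropped.
    val docIdsByToken: Map[String, List[(String, Int)]] =
      documents
        .flatMap(d => d.tokens.map(t => (t, d.id)))
        .groupBy(_._1)
        .filter { case (_, docs) => docs.size > 1 }

    // .values is an Iterable, not a Map, so the resulting pairs are
    // collected one by one instead of being stored as key -> value entries.
    docIdsByToken.values.flatMap { docs =>
      docs.combinations(2).map { c =>
        val a = c.head._2
        val b = c.last._2
        (math.min(a, b), math.max(a, b))
      }
    }.toSet
  }

  // Sample data from the question.
  val documents: List[Document] = List(
    Document(1, List("A", "B", "C", "D")),
    Document(2, List("A", "B", "E", "F", "G")),
    Document(3, List("E", "G", "H")),
    Document(4, List("A", "L", "M", "N"))
  )

  val couples: Set[(Int, Int)] = pairsSharingToken(documents)

  def main(args: Array[String]): Unit = println(couples)
}
```

With the sample data, couples contains the four expected pairs (1, 2), (1, 4), (2, 3) and (2, 4). Dropping the token with values also makes it explicit that the keys play no role in the pair generation.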
Here is a much shorter example that reproduces the original problem:
val m = Map("A" -> (2, 1), "B" -> (2, 3))
val s = m.flatMap{ case (k, v) => List(v) }.toSet
println(s)
Instead of Set((2, 1), (2, 3)), it will produce Set((2, 3)), because after the flatMap and before the toSet the intermediate result is a new Map, and this map can hold only one value for key = 2.
The difference from the first version (map followed by flatten) is that after map you obtain something like an Iterable[List[(Int, Int)]], which is not a Map and therefore cannot lose or overwrite any keys.
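The behaviour described above can be checked side by side with a small sketch. It reuses the two-entry Map from the short example; the object name MapFlatMapDemo and the value names viaMap, viaList and viaIterator are assumptions introduced only for this illustration.

```scala
object MapFlatMapDemo {
  val m: Map[String, (Int, Int)] = Map("A" -> (2, 1), "B" -> (2, 3))

  // The function returns pairs, so flatMap on a Map builds another Map:
  // both values are stored under key 2 and only one of them survives.
  val viaMap: Map[Int, Int] = m.flatMap { case (_, v) => List(v) }

  // Converting to a List first keeps every pair.
  val viaList: List[(Int, Int)] = m.toList.flatMap { case (_, v) => List(v) }

  // An Iterator has no notion of keys either, so nothing is overwritten.
  val viaIterator: Set[(Int, Int)] =
    m.iterator.flatMap { case (_, v) => List(v) }.toSet

  def main(args: Array[String]): Unit = {
    println(viaMap.size)         // 1
    println(viaList.toSet.size)  // 2
    println(viaIterator.size)    // 2
  }
}
```

The explicit type annotations are there only to make the intermediate collection types visible; the result types are the same without them.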