Scala:如何遍历流/迭代器将结果收集到几个不同的集合中 [英] Scala: how to traverse stream/iterator collecting results into several different collections

查看:536
本文介绍了Scala:如何遍历流/迭代器将结果收集到几个不同的集合中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在浏览的日志文件太大而无法容纳到内存中并收集2种类型的表达式,下面的迭代代码片段有哪些更好的功能替代?

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?

def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
  val lines : Iterator[String] = io.Source.fromFile(file).getLines()

  val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
  val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty

  for (line <- lines){
    line match {
      case errorPat(date,ip)=> errors.append((ip,date))
      case loginPat(date,user,ip,id) =>logins.put(ip, id)
      case _ => ""
    }
  }

  errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}


推荐答案

这是一个可能的解决方案:

Here is a possible solution:

def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
  val lines = Source.fromFile(file).getLines
  val (err, log) = lines.collect {
        case errorPat(inf, ip) => (Some((ip, inf)), None)
        case loginPat(_, _, ip, id) => (None, Some((ip, id)))
      }.toList.unzip
  val ip2id = log.flatten.toMap
  err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}

这篇关于Scala:如何遍历流/迭代器将结果收集到几个不同的集合中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆