How to generate a big data stream on the fly


Problem Description

I have to generate a big file on the fly: read from the database and send it to the client. I read some documentation and came up with this:

val streamContent: Enumerator[Array[Byte]] = Enumerator.outputStream { os =>
  // new PrintWriter() reads from the database and, for each record,
  // does some logic and writes to the OutputStream
}

Ok.stream(streamContent.andThen(Enumerator.eof)).withHeaders(
  CONTENT_DISPOSITION -> "attachment; filename=someName.csv"
)

I'm rather new to Scala in general (only a week), so don't judge me by my reputation.

My questions are:

1) Is this the best way? I found that with a big file this will load everything into memory, and I also don't know what the chunk size is in this case; sending a chunk for every write() would not be very convenient.

2) I found the method Enumerator.fromStream(data: InputStream, chunkedSize: Int) a little better because it has a chunk size, but I don't have an InputStream because I'm creating the file on the fly.

Recommended Answer

The documentation for Enumerator.outputStream warns:

Not [sic!] that calls to write will not block, so if the iteratee that is being fed to is slow to consume the input, the OutputStream will not push back. This means it should not be used with large streams since there is a risk of running out of memory.

Whether this can happen depends on your situation. If you can and will generate gigabytes in seconds, you should probably try something different. I'm not exactly sure what, but I'd start with Enumerator.generateM() (a rough sketch of that idea follows after the example below). For many cases, though, your method is perfectly fine. Have a look at this example by Gaëtan Renaudeau for serving a Zip file that is generated on the fly in the same way you're using it:

import java.util.zip.{ZipEntry, ZipOutputStream}
import scala.util.Random

val r = new Random()  // source of the random numbers written below

val enumerator = Enumerator.outputStream { os =>
  val zip = new ZipOutputStream(os)
  Range(0, 100).foreach { i =>
    zip.putNextEntry(new ZipEntry("test-zip/README-" + i + ".txt"))
    zip.write("Here are 100000 random numbers:\n".map(_.toByte).toArray)
    // Let's do 100 writes of 1'000 numbers
    Range(0, 100).foreach { j =>
      zip.write(Range(0, 1000).map(_ => r.nextLong).map(_.toString).mkString("\n").map(_.toByte).toArray)
    }
    zip.closeEntry()
  }
  zip.close()
}
Ok.stream(enumerator >>> Enumerator.eof).withHeaders(
  "Content-Type" -> "application/zip",
  "Content-Disposition" -> "attachment; filename=test.zip"
)
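As a rough illustration of the Enumerator.generateM() idea mentioned above (this sketch is not from the original answer): generateM repeatedly evaluates a by-name Future[Option[E]] and stops at the first None, so the next page of data is only produced once the previous chunk has been consumed. The fetchPage helper, the page size, and the fake data are assumptions made up for the sketch.

import play.api.libs.iteratee.Enumerator
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future

val pageSize = 1000
var offset = 0  // generateM evaluates the block sequentially, so a simple cursor is enough

// Stand-in for a real database query: returns one page of CSV bytes,
// or None once everything has been read (here, after 10 pages of fake rows).
def fetchPage(from: Int, size: Int): Future[Option[Array[Byte]]] = Future {
  if (from >= 10 * size) None
  else Some((from until from + size).map(i => s"$i,record-$i\n").mkString.getBytes("UTF-8"))
}

val pagedContent: Enumerator[Array[Byte]] = Enumerator.generateM {
  val page = fetchPage(offset, pageSize)
  offset += pageSize
  page  // returning None here ends the stream
}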

Please keep in mind that Ok.stream has been replaced by Ok.chunked in newer versions of Play, in case you want to upgrade.
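On one of those newer Play versions the change is roughly a one-liner; a minimal sketch, assuming the same enumerator as in the Zip example:

// Ok.chunked replaces Ok.stream; it ends the chunked response when the
// enumerator completes, so the explicit Enumerator.eof can usually be dropped.
Ok.chunked(enumerator).withHeaders(
  "Content-Type" -> "application/zip",
  "Content-Disposition" -> "attachment; filename=test.zip"
)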

As for the chunk size, you can always use Enumeratee.grouped to gather a bunch of values and send them as one chunk.

val grouper = Enumeratee.grouped(
  Traversable.take[Array[Double]](100) &>> Iteratee.consume()
)

And then you'd do something like:

Ok.stream(enumerator &> grouper >>> Enumerator.eof)
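Applied to the CSV stream from the question, the same pattern might look roughly like this (a minimal sketch, not from the original answer). The grouper is typed for Array[Byte] to match the byte stream, and 64 * 1024 is just an illustrative chunk size; Traversable.take counts the bytes inside the arrays rather than the number of arrays.

import play.api.libs.iteratee.{Enumeratee, Enumerator, Iteratee, Traversable}

// Gather roughly 64 KB of the produced bytes and send them as one HTTP chunk.
val csvGrouper = Enumeratee.grouped(
  Traversable.take[Array[Byte]](64 * 1024) &>> Iteratee.consume[Array[Byte]]()
)

Ok.stream(streamContent &> csvGrouper >>> Enumerator.eof).withHeaders(
  CONTENT_DISPOSITION -> "attachment; filename=someName.csv"
)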
