Example of FileBasedSource usage in Google Cloud Dataflow

Question

Can someone post a simple example of subclassing FileBasedSource? I'm new to Google Dataflow and very inexperienced with Java. My goal is to read files while including line numbers as a key, or to skip lines based on the line number.

Answer

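The implementation of XMLSource is a good starting point for understanding how FileBasedSource works. For line-oriented input, your reader's startReading() override will likely look something like this (where readNextLine() reads to the end of the current line and returns the number of bytes it consumed, which is used to compute the next offset):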

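import com.google.cloud.dataflow.sdk.io.FileBasedSource;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;

// Inside your FileBasedSource.FileBasedReader subclass. nextOffset (a long field) and
// readNextLine(...) (returns the number of bytes it consumed) are members of that reader.
@Override
protected void startReading(ReadableByteChannel channel) throws IOException {
  if (getCurrentSource().getMode() == FileBasedSource.Mode.SINGLE_FILE_OR_SUBRANGE) {
    // A split that does not start at offset 0 may begin mid-line; that line belongs to the previous split.
    if (getCurrentSource().getStartOffset() > 0) {
      // The channel is expected to be positioned at the split's start offset here.
      SeekableByteChannel seekChannel = (SeekableByteChannel) channel;
      // Step back one byte: if it is '\n', only that byte is consumed and nextOffset equals the
      // start offset; otherwise the rest of the partial line is skipped.
      seekChannel.position(seekChannel.position() - 1);
      nextOffset = seekChannel.position() + readNextLine(new ByteArrayOutputStream());
    }
  }
}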
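To see the boundary rule in startReading() in isolation, here is a hedged, standalone sketch that needs only the JDK, not the Dataflow SDK. `readNextLine` here is a hypothetical version that takes the channel as a parameter (the answer's version reads from the reader's own state), and `firstLineOffset` is a hypothetical helper that applies the same "step back one byte, then skip to the end of the line" logic to a real `SeekableByteChannel` opened on a temporary file. Reading one byte per `read()` call keeps the sketch short but would be slow on large files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineBoundarySketch {

  /**
   * Hypothetical readNextLine: consumes bytes up to and including the next '\n'
   * (or to end of stream), writes the line without its '\n' into out, and
   * returns how many bytes were consumed so the caller can compute an offset.
   */
  static long readNextLine(ReadableByteChannel channel, ByteArrayOutputStream out)
      throws IOException {
    ByteBuffer one = ByteBuffer.allocate(1);
    long consumed = 0;
    while (true) {
      one.clear();
      int n = channel.read(one);
      if (n < 0) {
        break; // end of stream: the last line had no trailing '\n'
      }
      if (n == 0) {
        continue; // nothing read; blocking file channels normally never return 0 here
      }
      consumed++;
      byte b = one.get(0);
      if (b == '\n') {
        break;
      }
      out.write(b);
    }
    return consumed;
  }

  /** Hypothetical helper: offset of the first line a split starting at startOffset emits. */
  static long firstLineOffset(SeekableByteChannel channel, long startOffset) throws IOException {
    if (startOffset == 0) {
      return 0; // the first split always starts on a line boundary
    }
    // Step back one byte: if it is '\n', startOffset is already a line start and only
    // that byte is consumed; otherwise the partial line (owned by the previous split) is skipped.
    channel.position(startOffset - 1);
    return channel.position() + readNextLine(channel, new ByteArrayOutputStream());
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("lines", ".txt");
    try {
      Files.write(file, "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
      try (SeekableByteChannel ch = Files.newByteChannel(file)) {
        System.out.println(firstLineOffset(ch, 0)); // 0
        System.out.println(firstLineOffset(ch, 6)); // 6: "beta" starts exactly at offset 6
        System.out.println(firstLineOffset(ch, 8)); // 11: rest of "beta" skipped, "gamma" next
      }
    } finally {
      Files.delete(file);
    }
  }
}
```

A split starting at offset 8 lands inside "beta", so its first line is "gamma"; the split before it is responsible for reading "beta" to completion.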

I've created a gist with the complete LineIO example, which may be simpler than XMLSource.
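The gist itself is not reproduced here. As a rough, hypothetical stand-in (class and method names are illustrative, and the file is held in memory rather than read through the SDK), this sketch simulates how a line reader divides one file among splits, assuming a split owns every line whose first byte falls in its `[start, end)` range. A split that starts mid-file cannot know how many lines precede it, so records are keyed by byte offset; in this sketch line numbers are assigned only after all splits are merged and sorted, which is also where a line can be skipped by number.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class SplitLineReaderSketch {

  /**
   * Hypothetical helper: returns (byte offset -> line) for every line whose
   * first byte lies in [start, end), skipping a leading partial line the same
   * way startReading() does when start > 0.
   */
  static Map<Long, String> readSplit(byte[] data, long start, long end) {
    Map<Long, String> records = new TreeMap<>();
    int pos = (int) start;
    if (start > 0) {
      pos = (int) start - 1; // step back one byte
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // first byte after that '\n' (or past the end of the data)
    }
    // Emit every line that starts before the end of the range; the last one
    // may run past `end`, and the next split will skip it.
    while (pos < end && pos < data.length) {
      int lineStart = pos;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      records.put((long) lineStart,
          new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
      pos++; // step past the '\n'
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] data = "header\nalpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
    long[] splitPoints = {0, 9, 15, data.length}; // arbitrary boundaries, some mid-line
    Map<Long, String> merged = new TreeMap<>();
    for (int i = 0; i + 1 < splitPoints.length; i++) {
      Map<Long, String> part = readSplit(data, splitPoints[i], splitPoints[i + 1]);
      System.out.println("split " + i + ": " + part);
      merged.putAll(part);
    }
    // Only with all offsets known and sorted can 1-based line numbers be assigned.
    int lineNumber = 1;
    for (String line : merged.values()) {
      if (lineNumber > 1) { // e.g. skip line 1 (a header)
        System.out.println(lineNumber + ": " + line);
      }
      lineNumber++;
    }
  }
}
```

Keying by byte offset keeps each split independent; any numbering that depends on earlier lines has to happen after the splits are combined.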
