Skipping header rows - is it possible with Cloud DataFlow?

Views: 66 | Published: 2020/11/18 1:26:21 | Tag: google-cloud-dataflow

I've created a Pipeline, which reads from a file in GCS, transforms it, and finally writes to a BQ table. The file contains a header row (fields).

Is there any way to programmatically set the "number of header rows to skip", as you can in BQ when loading data?

This is not currently possible. It sounds like there are two potential requests here:

Also, in the meantime, you could add a simple filter to your ParDo code to skip headers. Something like this:

PCollection<X> rows = ...;
PCollection<X> nonHeaders =
   rows.apply(Filter.by(new MatchIfNonHeader()));
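
The answer does not show what MatchIfNonHeader contains, so here is a minimal sketch of the matching logic in plain Java, with no SDK dependency. The class name HeaderFilter, the HEADER value, and both helper methods are hypothetical, and the sketch assumes the header text is known in advance and is the same in every input file. It matches on content rather than position because elements in a PCollection carry no line numbers, so "the first row" is not something a filter can see.

```java
import java.util.List;
import java.util.stream.Collectors;

final class HeaderFilter {
    // Hypothetical header text; assumed to be known up front
    // and identical in every input file.
    static final String HEADER = "id,name,amount";

    // Returns true for any line whose trimmed text differs from the header.
    static boolean isNonHeader(String line) {
        return !line.trim().equals(HEADER);
    }

    // Local stand-in for the pipeline step: keeps only non-header lines.
    static List<String> dropHeaders(List<String> lines) {
        return lines.stream()
                .filter(HeaderFilter::isNonHeader)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(HEADER, "1,alice,10", "2,bob,20");
        System.out.println(dropHeaders(lines));
    }
}
```

In a pipeline, the body of isNonHeader would presumably live in a serializable predicate class passed to Filter.by, as in the snippet above. Note that a data row whose text is identical to the header would also be dropped by this check.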
