ETL & Parsing CSV files in Cloud Dataflow
Question
I'm new to Cloud Dataflow and Java, so I'm hoping this is the right question to ask.
I have a CSV file with n columns, where the values could be strings, integers, or timestamps. Do I need to create a new PCollection for each column?
Most of the documentation I've found in examples is along the lines of:
PCollection<String> data = p.apply(TextIO.Read.from("gs://abc/def.csv"));
But to me it doesn't make sense to import an entire CSV file as a string. What am I missing here, and how should I set up my PCollections?
Answer
This example will create a collection containing one String per line in the file. For example, if the file is:
Alex,28,111-222-3344
Sam,30,555-666-7788
Drew,19,123-45-6789
then the collection will logically contain "Alex,28,111-222-3344", "Sam,30,555-666-7788", and "Drew,19,123-45-6789". You can apply further parsing code in Java by piping the collection through a ParDo or MapElements transform, e.g.:
class User {
  public String name;
  public int age;
  public String phone;
}

PCollection<String> lines = p.apply(TextIO.Read.from("gs://abc/def.csv"));
PCollection<User> users = lines.apply(
    MapElements.via((String line) -> {
      User user = new User();
      String[] parts = line.split(",");
      user.name = parts[0];
      user.age = Integer.parseInt(parts[1]);
      user.phone = parts[2];
      return user;
    }).withOutputType(new TypeDescriptor<User>() {}));
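One caveat with the lambda above: a single malformed row (missing fields, or a non-numeric age) will throw and fail the bundle. A common pattern is to pull the per-line parsing into a small helper that returns null for bad rows, so a ParDo can skip or log them instead of crashing. The sketch below is a hypothetical, pipeline-independent helper (the class name CsvUserParser is made up for illustration, not part of the Dataflow SDK):

```java
// Hypothetical helper: validates a CSV line before it enters the pipeline,
// returning null for malformed rows instead of throwing.
public class CsvUserParser {

    public static class User {
        public String name;
        public int age;
        public String phone;
    }

    /** Returns a parsed User, or null if the line is malformed. */
    public static User parse(String line) {
        // -1 keeps trailing empty fields so the length check is accurate.
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            return null; // wrong number of columns
        }
        User user = new User();
        user.name = parts[0].trim();
        try {
            user.age = Integer.parseInt(parts[1].trim());
        } catch (NumberFormatException e) {
            return null; // age is not an integer
        }
        user.phone = parts[2].trim();
        return user;
    }
}
```

Inside a DoFn you would then call parse(...) in processElement and only emit non-null results, which keeps bad input rows from failing the whole worker bundle.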