Can you represent CSV data in Google's Protocol Buffer format?


Question

I've recently found out about protocol buffers and was wondering whether they could be applied to my specific problem.

Basically, I have some CSV data that I need to convert to a more compact format for storage, as some of the files are several gigabytes.

Each field in the CSV has a header, and there are only two types: strings and decimals (sometimes there are a lot of significant digits, and I need to handle all numbers the same way). But each file will have different column names for its fields.

As well as capturing the original CSV data, I need to be able to add extra information to the file before saving. I was also hoping to make this future-proof by handling different file versions.

So, is it possible to use protocol buffers to capture an arbitrary number of arbitrarily named columns of data, like a CSV file?

Recommended answer

Well, it's certainly representable. Something like:

message CsvFile {
    repeated CsvHeader header = 1;
    repeated CsvRow row = 2;
}

message CsvHeader {
    required string name = 1;
    required ColumnType type = 2;
}

enum ColumnType {
    DECIMAL = 1;
    STRING = 2;
}

message CsvRow {
    repeated CsvValue value = 1;
}

// Note that the column is implicit based on position within row    
message CsvValue {
    optional string string_value = 1;
    optional Decimal decimal_value = 2;
}

message Decimal {
    // However you want to represent it (there are various options here)
}
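
To make the positional layout concrete, here is a minimal Python sketch that loads CSV text into plain dataclasses shaped like the messages above. It does not use the protobuf library or generated classes. The names `load_csv`, `encode_decimal`, and `is_decimal` are hypothetical, the type inference rule (a column is DECIMAL only if every non-empty cell parses as a finite decimal) is an assumption, and the (unscaled integer, scale) pair is just one of the possible Decimal representations the schema leaves open.

```python
import csv
import io
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Tuple


@dataclass
class CsvValue:
    # At most one of these is set, mirroring the two optional fields of CsvValue.
    string_value: Optional[str] = None
    # Assumed Decimal encoding: (unscaled, scale), value = unscaled / 10**scale.
    decimal_value: Optional[Tuple[int, int]] = None


@dataclass
class CsvFile:
    # Each header entry is (name, "DECIMAL" or "STRING").
    header: List[Tuple[str, str]] = field(default_factory=list)
    row: List[List[CsvValue]] = field(default_factory=list)


def is_decimal(text: str) -> bool:
    """True if the cell parses as a finite decimal number."""
    try:
        return Decimal(text).is_finite()
    except InvalidOperation:
        return False


def encode_decimal(text: str) -> Tuple[int, int]:
    """Split a decimal literal into (unscaled, scale), keeping every digit."""
    sign, digits, exponent = Decimal(text).as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    if exponent > 0:
        # A positive exponent (e.g. 1E+2) is folded into the integer part.
        return unscaled * 10 ** exponent, 0
    return unscaled, -exponent


def load_csv(text: str) -> CsvFile:
    """Build a CsvFile from CSV text; assumes every row has one cell per column."""
    records = list(csv.reader(io.StringIO(text)))
    names, data = records[0], records[1:]
    out = CsvFile()
    for i, name in enumerate(names):
        cells = [r[i] for r in data if r[i] != ""]
        is_num = bool(cells) and all(is_decimal(c) for c in cells)
        out.header.append((name, "DECIMAL" if is_num else "STRING"))
    for r in data:
        values = []
        # The column of each value is implied by its position in the row.
        for (_, kind), cell in zip(out.header, r):
            if kind == "STRING":
                values.append(CsvValue(string_value=cell))
            elif cell == "":
                values.append(CsvValue())  # missing number: neither field set
            else:
                values.append(CsvValue(decimal_value=encode_decimal(cell)))
        out.row.append(values)
    return out
```

For example, `load_csv("name,price\nwidget,1.2300\n")` yields a header of `("name", "STRING")` and `("price", "DECIMAL")`, with the price stored as `(12300, 4)` so the trailing zeros survive. The sketch reads the whole input into memory, so multi-gigabyte files would need a streaming variant.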

I'm not sure how much benefit it will provide, mind you... You can certainly add more information (add it to the CsvFile message), and future-proofing is done the "normal PB way": only add optional fields, etc.
