Decoding/parsing CSV and CSV-like files in Swift


Question

I have to write a highly customised CSV-like parser/decoder. I have looked for open-source ones on GitHub, but found none that fit my needs. I can solve this myself, but my question is whether implementing it as a TopLevelDecoder in Swift would be a total violation of the idea of key/value decoding.

I have keys, but not exactly key/value pairs. In CSV files there is instead one key for each column of data.

There are a number of problems with the files I need to parse:

  1. Commas are used not only to separate fields; some fields also contain commas. Example:

// If I convert to an array
struct Family {
    let name: String?
    let parents: [String?]
    let siblings: [String?]
}

In this example, both parents' names are in the same field and need to be converted into an array, and the same goes for the siblings field.

"Name", "Parents","Siblings"
"Danny", "Margaret, John","Mike, Jim, Jane"

In the case of the parents, I could have split that into two fields in a struct like

struct Family {
    let name: String?
    let mother: String?
    let father: String?
}

but that doesn't work for the Siblings field, since there can be anything from zero to many siblings. Therefore I will have to use an array.
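Both kinds of comma could be handled in two passes: first split the line on delimiters that sit outside double quotes, then split the list-like fields on the commas inside them. The following is only a minimal sketch under assumptions; `splitCSVLine` and `splitList` are hypothetical helper names, and a doubled quote inside a quoted field is assumed to mean a literal quote.

```swift
import Foundation

/// Splits one line into fields, treating delimiters inside double quotes as data.
/// Assumption: a doubled quote ("") inside a quoted field is one literal quote.
func splitCSVLine(_ line: String, delimiter: Character = ",") -> [String] {
    var fields: [String] = []
    var current = ""
    var insideQuotes = false
    let chars = Array(line)
    var i = 0
    while i < chars.count {
        let c = chars[i]
        if c == "\"" {
            if insideQuotes, i + 1 < chars.count, chars[i + 1] == "\"" {
                current.append("\"")   // escaped quote inside a quoted field
                i += 1
            } else {
                insideQuotes.toggle()  // opening or closing quote
            }
        } else if c == delimiter && !insideQuotes {
            fields.append(current.trimmingCharacters(in: .whitespaces))
            current = ""
        } else {
            current.append(c)
        }
        i += 1
    }
    fields.append(current.trimmingCharacters(in: .whitespaces))
    return fields
}

/// Turns a field such as "Mike, Jim, Jane" into an array; an empty field gives [].
func splitList(_ field: String) -> [String] {
    return field.split(separator: ",")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

let row = splitCSVLine(#""Danny", "Margaret, John","Mike, Jim, Jane""#)
let siblings = splitList(row[2])
```

Because `splitList` returns an empty array for an empty field, the zero-siblings case needs no special handling.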

There are cases where I will split into two fields, though.

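  2. Not all of the files I need to parse are strictly CSV. All of them contain tabular data (comma- or tab-separated), but some have a few lines of comments (sometimes containing metadata) that I need to take into account. Those files have the extension .txt rather than .csv.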

## File generated 2020-05-02
"Name", "Parents","Siblings"
"Danny", "Margaret, John","Mike, Jim, Jane"

Therefore I need to peek at the first line(s) to determine whether there are such comments, and once they have been parsed I can treat the rest of the file as CSV.
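The peeking step could look roughly like this. It is a sketch that assumes every comment line starts with `#`, as in the `## File generated` sample, and that comments only occur at the top of the file; `splitLeadingComments` is a hypothetical name.

```swift
import Foundation

/// Separates leading comment lines from the tabular part of a file.
/// Assumption: a comment line starts with "#" and comments appear only at the top.
func splitLeadingComments(_ text: String) -> (comments: [String], table: [String]) {
    let lines = text.components(separatedBy: .newlines).filter { !$0.isEmpty }
    let comments = Array(lines.prefix(while: { $0.hasPrefix("#") }))
    let table = Array(lines.dropFirst(comments.count))
    return (comments, table)
}

let text = """
## File generated 2020-05-02
"Name", "Parents","Siblings"
"Danny", "Margaret, John","Mike, Jim, Jane"
"""
let parts = splitLeadingComments(text)
// parts.comments can be scanned for metadata; parts.table[0] is the CSV header.
```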

From the application's point of view, I plan to make it look like any other Decoder, but internally my decoder can treat the data as key/value pairs, because there is just one set of keys: the first line of the file (after any leading comments). I still want to use CodingKeys, though.

What are your thoughts? Should I implement it as a decoder (actually a TopLevelDecoder in Swift), or would that be an abuse of the idea of key/value decoding? The alternative is to implement it as a parser, but I have to handle several types of files (JSON, GraphQL, CSV and CSV-like files), and I think my application code would be a lot simpler if I could use decoders for all of them.
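One middle path, sketched here purely as an illustration, keeps the parsing separate but still decodes through Codable and CodingKeys: pair every value with its column header, serialise the rows as JSON, and let `JSONDecoder` do the rest. This is a swapped-in technique, not a custom TopLevelDecoder. `decodeRows` and its `listColumns` parameter are hypothetical, and the rows are assumed to be already split into fields.

```swift
import Foundation

struct Family: Decodable {
    let name: String
    let parents: [String]
    let siblings: [String]

    // CodingKeys map the CSV header names to the property names.
    enum CodingKeys: String, CodingKey {
        case name = "Name"
        case parents = "Parents"
        case siblings = "Siblings"
    }
}

/// Pairs each value with its column header and round-trips the rows through JSON.
/// `listColumns` (hypothetical) names the columns holding comma-separated lists.
func decodeRows<T: Decodable>(_ type: T.Type, header: [String], rows: [[String]],
                              listColumns: Set<String> = []) throws -> [T] {
    let objects = rows.map { row -> [String: Any] in
        var object: [String: Any] = [:]
        for (key, value) in zip(header, row) {
            if listColumns.contains(key) {
                object[key] = value.split(separator: ",")
                    .map { $0.trimmingCharacters(in: .whitespaces) }
            } else {
                object[key] = value
            }
        }
        return object
    }
    let data = try JSONSerialization.data(withJSONObject: objects)
    return try JSONDecoder().decode([T].self, from: data)
}

let families = try decodeRows(Family.self,
                              header: ["Name", "Parents", "Siblings"],
                              rows: [["Danny", "Margaret, John", "Mike, Jim, Jane"]],
                              listColumns: ["Parents", "Siblings"])
```

The extra JSON round trip costs time and memory, so for large files a dedicated decoder may still be the better fit.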

For JSON there's no problem, since there is already a JSON decoder in Swift. GraphQL is not a problem either, because I can write a decoder with an unkeyed container. The problem files are the CSV and CSV-like ones.

Some of them have everything in double quotes, both the "keys" in the CSV header and the values. Some have double quotes only around the keys, not the values. Some have comma-separated fields, and some tab-separated. Some have commas within fields, which need special handling. Some have comments at the beginning of the file, which need to be skipped before the rest of the file is parsed as CSV.
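Two of these variations, the delimiter and the optional quoting, could be normalised up front. The sketch below guesses the delimiter from the header line and strips one pair of enclosing quotes where present; both function names are hypothetical, and the guess assumes the header names themselves contain no tabs or commas.

```swift
import Foundation

/// Guesses the field delimiter by counting tabs and commas in the header line.
/// Assumption: header names contain neither tabs nor commas.
func detectDelimiter(inHeader header: String) -> Character {
    let tabs = header.filter { $0 == "\t" }.count
    let commas = header.filter { $0 == "," }.count
    return tabs > commas ? "\t" : ","
}

/// Removes surrounding whitespace and one pair of enclosing double quotes, if present.
func unquote(_ field: String) -> String {
    let trimmed = field.trimmingCharacters(in: .whitespaces)
    guard trimmed.count >= 2, trimmed.hasPrefix("\""), trimmed.hasSuffix("\"") else {
        return trimmed   // unquoted values pass through unchanged
    }
    return String(trimmed.dropFirst().dropLast())
}

let tabHeader = "\"Name\"\t\"Parents\"\t\"Siblings\""
let delimiter = detectDelimiter(inHeader: tabHeader)
let keys = tabHeader.split(separator: delimiter).map { unquote(String($0)) }
```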

Some files have two fields in the first column. I have no influence whatsoever over the format of these files, so I just have to deal with it.

If you wonder what files these are: they are files of raw DNA, files with DNA matches, and files with the DNA segments I share with my matches. There are quite a few slightly different files, from several DNA testing companies. I wish they had all used JSON in a standard format, with keys that were standard across companies. But they all have different CSV headers, and other differences.

I also have to decode Gedcom files, which also have key/value pairs of a sort, but that format doesn't conform to pure key/value coding either.

Also: I have searched for others with similar problems and found some that are close but not exactly the same, so I didn't want to hijack their threads. See the thread "Advice for going from CSV > JSON > Swift objects".

That was more a question of how to convert from CSV to JSON and then to internal data structs in Swift. I know I can write a parser to solve this, but I think it would be more elegant to handle all these files with decoders. I would like your thoughts on that.

I have also thought about creating a new protocol:

protocol ColumnCodingKey: CodingKey {
}

I haven't decided yet what to put in the protocol, if anything. It might work to leave it empty as in the example and let my decoder conform to it; then it might not be a very big violation of key/value decoding.
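To make the idea concrete, here is one way an empty marker protocol might be used. This is a guess at the intent, not a recommendation: the key enum adopts `ColumnCodingKey`, and a hypothetical helper, `columnIndexes`, looks up which column of the header each key refers to.

```swift
// Hypothetical sketch: an empty marker protocol that refines CodingKey.
protocol ColumnCodingKey: CodingKey {}

struct Family: Decodable {
    let name: String
    let parents: [String]
    let siblings: [String]

    // Each case names a column in the CSV header.
    enum CodingKeys: String, ColumnCodingKey, CaseIterable {
        case name = "Name"
        case parents = "Parents"
        case siblings = "Siblings"
    }
}

/// Maps each key's string value to the position of its column in the header.
/// Keys missing from the header are left out of the result.
func columnIndexes<K: ColumnCodingKey & CaseIterable>(for keys: K.Type,
                                                      header: [String]) -> [String: Int] {
    var result: [String: Int] = [:]
    for key in K.allCases {
        if let index = header.firstIndex(of: key.stringValue) {
            result[key.stringValue] = index
        }
    }
    return result
}

let indexes = columnIndexes(for: Family.CodingKeys.self,
                            header: ["Name", "Parents", "Siblings"])
```

A column-oriented decoder could use such a table to fetch the value for a key from the current row by position.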

Thanks in advance!

Recommended answer

CSV files can be parsed with regular expressions; this may save you some time getting started. It's hard to know what you really need, because there seem to be many different scenarios, and the number might grow.

A regular expression for parsing one line of a CSV file might look something like this:
(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)
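In Swift, the pattern could be applied with `NSRegularExpression` from Foundation, as in the sketch below. `regexFields` is a hypothetical helper, and matched fields keep their surrounding quotes. The pattern appears to assume that no whitespace sits between a delimiter and an opening quote, so the sample line is written here without the space after the first comma.

```swift
import Foundation

// Pattern copied from the answer; a raw string literal avoids escaping the quotes.
let pattern = #"(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)"#

/// Returns every match of the pattern in one line; quoted fields keep their quotes.
func regexFields(in line: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    let range = NSRange(line.startIndex..., in: line)
    return regex.matches(in: line, options: [], range: range).compactMap { match in
        Range(match.range, in: line).map { String(line[$0]) }
    }
}

let fields = try regexFields(in: #""Danny","Margaret, John","Mike, Jim, Jane""#)
```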

Here is a detailed description of how it works, with a JavaScript sample: Build a CSV parser
