How should I detect which delimiter is used in a text file?

Problem description

I need to be able to parse both CSV and TSV files. I can't rely on the users to know the difference, so I would like to avoid asking the user to select the type. Is there a simple way to detect which delimiter is in use?

One way would be to read in every line and count both tabs and commas and find out which is most consistently used in every line. Of course, the data could include commas or tabs, so that may be easier said than done.
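
The counting idea above can be sketched roughly as follows. This is only an illustration, not part of the original question: the function name `guess_delimiter`, the candidate list, and the scoring rule (the share of lines that agree on the same non-zero count) are all assumptions.

```python
from collections import Counter

# Hypothetical candidate set; extend it if other delimiters are possible.
CANDIDATES = [",", "\t"]


def guess_delimiter(lines, candidates=CANDIDATES):
    """Return the candidate whose per-line count is most consistent, or None."""
    rows = [ln for ln in lines if ln.strip()]  # ignore blank lines
    if not rows:
        return None
    best, best_score = None, 0.0
    for delim in candidates:
        # How many times does this delimiter occur on each line?
        counts = Counter(ln.count(delim) for ln in rows)
        count, freq = counts.most_common(1)[0]
        if count == 0:
            continue  # most lines do not contain this delimiter at all
        score = freq / len(rows)  # share of lines agreeing on the same count
        if score > best_score:
            best, best_score = delim, score
    return best


tsv_sample = ["a\tb\tc", "1, 2\t3\t4", "x\ty\tz"]
csv_sample = ["name,city", 'Ann,"Paris, FR"', "Bob,Rome"]
```

With these samples, a stray comma inside a tab-separated field or a quoted comma inside a CSV field does not change the result, because the other lines still agree on one count. Python's standard library also has `csv.Sniffer`, which makes a comparable guess from a text sample and may be worth trying before writing a custom heuristic.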

Edit: Another fun aspect of this project is that I will also need to detect the schema of the file when I read it in, because it could be one of many. This means that I won't know how many fields I have until I can parse it.

Recommended answer

You could show them the results in a preview window, similar to the way Excel does it. It's pretty clear when the wrong delimiter is being used in that case. You could then allow them to choose from a range of delimiters and have the preview update in real time.

Then you could just make a simple guess as to the delimiter to start with (e.g. does a comma or a tab come first).
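
A minimal sketch of that starting guess, assuming the only candidates are comma and tab and that comma is the fallback when neither appears. The name `initial_guess` and its parameters are hypothetical; the user would still confirm or change the result in the preview.

```python
def initial_guess(text, candidates=(",", "\t"), default=","):
    """Pick whichever candidate delimiter appears first in the text."""
    # str.find returns -1 when the delimiter is absent.
    positions = {d: text.find(d) for d in candidates}
    found = {d: pos for d, pos in positions.items() if pos != -1}
    if not found:
        return default  # assumption: fall back to comma
    # The delimiter with the smallest index comes first in the file.
    return min(found, key=found.get)
```

Because this is only a first guess, it is deliberately cheap; the preview is what catches the cases where it is wrong.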
