Ruby: How can I detect/intelligently guess the delimiter used in a CSV file?

Question

I need to figure out which delimiter (comma, space or semicolon) is used in a CSV file in my Ruby project. I know there is a Sniffer class in Python's csv module that can be used to guess a given file's delimiter. Is there anything similar in Ruby? Any kind of help or ideas are greatly appreciated.

Answer

It looks like the Python implementation just checks a few dialects: excel or excel_tab. So a simple implementation that only checks for a comma or a tab between quoted fields is:

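# Candidates: a comma or a tab sitting between two quoted fields
COMMON_DELIMITERS = ['","', "\"\t\""]

def sniff(path)
  # Read only the first line; the block form closes the file afterwards
  first_line = File.open(path, &:first)
  return nil unless first_line
  snif = {}
  # Score each candidate by how often it shows up in the first line
  COMMON_DELIMITERS.each { |delim| snif[delim] = first_line.count(delim) }
  # Sort candidates by score, highest first
  snif = snif.sort { |a, b| b[1] <=> a[1] }
  snif.size > 0 ? snif[0][0] : nil
end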

Note: sniff returns the full quoted delimiter it finds, e.g. "," together with its surrounding double quotes, so to get just the bare comma you could change snif[0][0] to snif[0][0][1].
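To cover the delimiters the question actually lists (comma, space, semicolon) and to look past the first line, the same counting idea can be extended. This is a minimal sketch under stated assumptions, not a port of Python's Sniffer: the candidate list, the hypothetical guess_col_sep name and its scoring rule are illustrative choices. It samples up to ten non-empty lines, prefers a delimiter that occurs the same number of times on every sampled line (loosely the consistency idea Sniffer relies on), and returns the bare delimiter, which can be passed to Ruby's standard CSV library through the col_sep option.

```ruby
require "csv"

# Hypothetical candidate list: comma, semicolon and space (from the question)
# plus tab. The order only matters as a final tie-breaker.
CANDIDATES = [",", ";", " ", "\t"].freeze

# Guess the column separator from the first few non-empty lines of `text`.
# Scoring, compared in this order (higher wins):
#   1. the delimiter occurs the same number of times on every sampled line,
#   2. it occurs more often in total,
#   3. it comes earlier in CANDIDATES.
# Quoting is ignored, so delimiters inside quoted fields are counted too.
# Returns nil when none of the candidates occurs at all.
def guess_col_sep(text, sample_lines: 10)
  lines = text.each_line.map(&:chomp).reject(&:empty?).first(sample_lines)
  return nil if lines.empty?

  scored = CANDIDATES.each_with_index.map do |delim, index|
    # scan counts real substring occurrences of the candidate on each line
    counts = lines.map { |line| line.scan(delim).length }
    consistent = counts.uniq.size == 1 ? 1 : 0
    [delim, [consistent, counts.sum, -index]]
  end

  # Drop candidates that never appear, then take the best score
  best = scored.reject { |_delim, score| score[1].zero? }
               .max_by { |_delim, score| score }
  best && best[0]
end

# Example: detect the separator, then hand it to Ruby's CSV via col_sep
text = "id;name;city\n1;Ann;Oslo\n2;Bob;Rome\n"
sep = guess_col_sep(text)
rows = CSV.parse(text, col_sep: sep)
p sep
p rows
```

The consistency check is what keeps space from winning too easily: in free text a space tends to appear a varying number of times per line, while a real separator usually appears the same number of times. Because quoting is ignored, files with delimiters inside quoted fields can still mislead it.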

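Also, I'm using count(delim) because it is a little faster. Keep in mind that String#count treats its argument as a set of characters, not as a substring: first_line.count('","') adds up every double quote and every comma on the line. That still works here because the double quote is shared by both candidates, so the comparison effectively comes down to commas versus tabs. But if you added a delimiter made of two (or more) of the same character, like --, each occurrence would be counted twice (or more) when weighing the candidates, so in that case it is better to use scan(delim).length, which counts actual substring occurrences.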
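A quick check shows how differently the two methods score a line of quoted fields; the sample line is made up for illustration.

```ruby
line = '"a","b","c"'

# String#count treats its argument as a set of characters, so this tallies
# every double quote and every comma in the line (6 quotes + 2 commas = 8).
by_set = line.count('","')

# String#scan with a string pattern matches the literal substring, so this
# counts occurrences of the three-character sequence quote-comma-quote (2).
by_substring = line.scan('","').length

p by_set
p by_substring
```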
