PHP - Best approach to detect CSV delimiter
Question
I have seen multiple threads about the best way to auto-detect the delimiter of an incoming CSV file. Most of the solutions are functions of 20 to 30 lines that loop several times over a predetermined list of delimiters, read the first 5 lines, compare the field counts, and so on.
I have just implemented this procedure, with a few modifications. It works brilliantly.
Then I found the following code:
private function DetectDelimiter($fh)
{
    $data_1 = null;
    $data_2 = null;
    $delimiter = self::$delim_list['comma'];
    foreach(self::$delim_list as $key=>$value)
    {
        $data_1 = fgetcsv($fh, 4096, $value);
        $delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
        $data_2 = $data_1;
    }
    $this->SetDelimiter($delimiter);
    return $delimiter;
}
To me this looks like it achieves the same result, where $delim_list is an array of delimiters as follows:
static protected $delim_list = array(
    'tab'       => "\t",
    'semicolon' => ";",
    'pipe'      => "|",
    'comma'     => ","
);
Can anyone shed light on why I shouldn't do it this simpler way, and why the more convoluted solution seems to be the accepted answer everywhere I look?
Thanks!
Recommended answer
Fixed version.
In your code, if a line contains more than one of the candidate delimiters, you'll get a wrong result (example: val; string, with comma;val2;val3). The same happens if the file has only 1 row (fewer rows than delimiters), because each fgetcsv() call reads the next line and there is no rewind() between candidates.
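To illustrate the one-row problem, here is a minimal sketch that calls fgetcsv() once per candidate without rewinding, as the code in the question does. The in-memory sample row and the $counts array are assumptions made for this illustration; the enclosure and escape arguments are passed explicitly and match PHP's usual defaults.

```php
<?php
// Hypothetical one-row sample held in memory (not from the original post).
$fh = fopen('php://memory', 'w+');
fwrite($fh, "val; string, with comma;val2;val3\n");
rewind($fh);

$counts = [];
foreach (["\t", ";", "|", ","] as $d) {
    // No rewind() here: each call consumes the next line, so only the
    // first candidate ever gets to parse the single row.
    $row = fgetcsv($fh, 4096, $d, '"', '\\');
    // fgetcsv() returns false at end of file, which is counted as 0 fields.
    $counts[$d] = is_array($row) ? count($row) : 0;
}
fclose($fh);
```

In this sketch only the tab candidate parses the row (as a single field); the other three calls hit end of file, so the semicolon is never tested against the data.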
Here is a fixed variant:
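private function detectDelimiter($fh)
{
    $delimiters = ["\t", ";", "|", ","];
    $delimiter = $delimiters[0];
    $maxFields = 0;
    foreach ($delimiters as $d) {
        // Rewind first so every candidate parses the same (first) line.
        rewind($fh);
        $row = fgetcsv($fh, 4096, $d);
        // fgetcsv() returns false on EOF or error, and count()/sizeof()
        // on a non-array throws a TypeError in PHP 8, so guard it.
        $fields = is_array($row) ? count($row) : 0;
        // Keep the delimiter that splits the line into the most fields.
        if ($fields > $maxFields) {
            $delimiter = $d;
            $maxFields = $fields;
        }
    }
    // Leave the handle at the start for the caller.
    rewind($fh);
    return $delimiter;
}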
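The longer solutions mentioned in the question read several lines and compare the counts. A rough sketch of that idea follows, assuming a seekable file handle. The function name detectDelimiterByConsistency, its parameters, and the scoring rule (a candidate only scores if every sampled row has the same field count, and that count is greater than 1) are hypothetical choices for illustration, not taken from any of those threads.

```php
<?php
/**
 * Hypothetical helper: guess the delimiter from the first $maxRows rows.
 * A candidate scores its field count only when all sampled rows agree.
 */
function detectDelimiterByConsistency($fh, array $delimiters = ["\t", ";", "|", ","], int $maxRows = 5): string
{
    $best = $delimiters[0];
    $bestScore = 0;
    foreach ($delimiters as $d) {
        rewind($fh); // every candidate starts from the first line
        $counts = [];
        while (count($counts) < $maxRows
            && ($row = fgetcsv($fh, 4096, $d, '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            $counts[] = count($row);
        }
        // Consistent rows with more than one field score their field count.
        $score = 0;
        if ($counts !== [] && count(array_unique($counts)) === 1 && $counts[0] > 1) {
            $score = $counts[0];
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $d;
        }
    }
    rewind($fh); // leave the handle at the start for the caller
    return $best;
}

// Usage with a made-up sample whose header contains commas inside a field.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "name;note, with, commas\n1;x\n2;y\n");
$delimiter = detectDelimiterByConsistency($fh);
fclose($fh);
```

With this sample, a first-line-only check would favor the comma (3 fields against 2), while the consistency check picks the semicolon because the comma splits the rows into 3, 1 and 1 fields. That is one reason the multi-line versions tend to be longer.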