How can you parse Excel CSV data that contains linebreaks in the data?


Question

I'm attempting to parse a set of CSV data using PHP, but I'm having a major issue. One of the fields is a long description field, which itself contains linebreaks within the enclosures.

My primary issue is writing a piece of code that can split the data line by line, but that also recognizes when a linebreak is part of the data and must not be treated as the end of a record. The linebreaks within this field are not escaped, which makes them hard to distinguish from the legitimate linebreaks that terminate each record.

I've tried to come up with a regular expression that can properly handle it, but have had no luck so far. Any ideas?

CSV format:

"####","text data here", "text data \n with linebreaks \n here"\n
"####","more text data", "more data \n with \n linebreaks \n here"\n
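
To make the problem concrete, here is a minimal sketch of what a naive line split does to data shaped like the sample above. The IDs 1001 and 1002 are placeholders standing in for the "####" values, and the variable names are only illustrative.

```php
<?php
// Two records; the third field of each contains unescaped line breaks.
// The IDs are placeholder values, not taken from the real data.
$csv = "\"1001\",\"text data here\",\"text data \n with linebreaks \n here\"\n"
     . "\"1002\",\"more text data\",\"more data \n with \n linebreaks \n here\"\n";

// Naive approach: treat every "\n" as the end of a record.
$fragments = explode("\n", rtrim($csv, "\n"));

// Each embedded line break cuts a record into an extra fragment,
// so the fragment count no longer matches the record count.
echo count($fragments), " fragments for 2 records\n";
```

Splitting on the linebreak alone cannot work here; the parser has to know whether it is currently inside an enclosure.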


Recommended answer

According to aleske, a commenter in the documentation for PHP's fgetcsv function:


The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files [that contain line breaks] ...

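And he offered up the following function to get around this limitation. It parses a single record from the start of the string; note that the default separator is ';', so pass ',' for comma-separated data like the sample above: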

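function csvstring_to_array(&$string, $CSV_SEPARATOR = ';', $CSV_ENCLOSURE = '"', $CSV_LINEBREAK = "\n") {
  // Start with one empty field so that ".=" never touches an undefined index.
  $o = array('');

  $cnt = strlen($string);
  $esc = false;     // true while inside an enclosure
  $escesc = false;  // true right after an enclosure closed (possible doubled quote)
  $num = 0;         // index of the current field
  $i = 0;
  while ($i < $cnt) {
    $s = $string[$i];

    if ($s == $CSV_LINEBREAK) {
      if ($esc) {
        // A line break inside an enclosure is part of the data.
        $o[$num] .= $s;
      } else {
        // A line break outside an enclosure ends the record.
        $i++;
        break;
      }
    } elseif ($s == $CSV_SEPARATOR) {
      if ($esc) {
        $o[$num] .= $s;
      } else {
        // Open the next field, even if it turns out to be empty.
        $num++;
        $o[$num] = '';
        $esc = false;
        $escesc = false;
      }
    } elseif ($s == $CSV_ENCLOSURE) {
      if ($escesc) {
        // Two enclosures in a row stand for one literal enclosure character.
        $o[$num] .= $CSV_ENCLOSURE;
        $escesc = false;
      }

      if ($esc) {
        $esc = false;
        $escesc = true;
      } else {
        $esc = true;
        $escesc = false;
      }
    } else {
      if ($escesc) {
        $o[$num] .= $CSV_ENCLOSURE;
        $escesc = false;
      }

      $o[$num] .= $s;
    }

    $i++;
  }

  // Uncomment to remove the parsed record from $string, so that the
  // function can be called repeatedly until $string is empty:
  //  $string = substr($string, $i);

  return $o;
}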

It looks like it would do the trick.
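
Since the function above returns only one record per call, a whole file needs some kind of loop around it. As a rough sketch of the same idea applied to a complete string, the helper below (csv_to_rows is a hypothetical name, not part of aleske's comment or of PHP itself) tracks whether it is inside an enclosure and only ends a record on a linebreak outside one. It assumes "\n" line endings and ',' as the separator, and the IDs in the sample data are placeholders.

```php
<?php
// Hypothetical helper, a variation on the state machine above rather than
// aleske's original: returns every record in $csv as an array of fields.
function csv_to_rows(string $csv, string $sep = ',', string $enc = '"', string $eol = "\n"): array
{
    $rows = [];
    $row = [''];
    $field = 0;
    $inQuotes = false;     // currently inside an enclosure
    $pendingQuote = false; // previous character closed an enclosure
    $len = strlen($csv);

    for ($i = 0; $i < $len; $i++) {
        $ch = $csv[$i];
        if ($ch === $enc) {
            if ($pendingQuote) {
                // A doubled enclosure ("") is a literal quote inside the field.
                $row[$field] .= $enc;
            }
            $pendingQuote = $inQuotes;
            $inQuotes = !$inQuotes;
            continue;
        }
        $pendingQuote = false;
        if ($inQuotes) {
            // Separators and line breaks inside an enclosure are plain data.
            $row[$field] .= $ch;
        } elseif ($ch === $sep) {
            $row[++$field] = '';
        } elseif ($ch === $eol) {
            $rows[] = $row;
            $row = [''];
            $field = 0;
        } else {
            $row[$field] .= $ch;
        }
    }
    // Keep a final record that is not terminated by a line break.
    if ($row !== ['']) {
        $rows[] = $row;
    }
    return $rows;
}

// Placeholder data shaped like the sample in the question.
$csv = "\"1001\",\"text data here\",\"text data \n with linebreaks \n here\"\n"
     . "\"1002\",\"more text data\",\"more data \n with \n linebreaks \n here\"\n";

$rows = csv_to_rows($csv);
echo count($rows), " records\n";
echo $rows[0][2], "\n"; // third field of the first record, line breaks intact
```

Walking the string one character at a time is slower than a regular expression, but it keeps the "inside an enclosure" state explicit, which is exactly the information a plain line split is missing.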
