Reading CSV file with unescaped enclosures


Question

I am reading a CSV file, but some of the values are not escaped, so PHP reads them incorrectly. Here is an example of a bad line:

" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated '87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",

You can see that "Harvest Time, Somerset" has quotes around it, causing PHP to think it's a new value.

When I call print_r() on each row, the broken rows end up looking like this:

Array
(
    [0] =>  635
    [1] =>  
    [2] => AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time
    [3] => Somerset" signed and dated '87
    [4] => framed
    [5] => 69cm by 49cm. (2)  NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art."
    [6] => 40
    [7] => 60
    [8] => WAT
    [9] => Paintings, prints and watercolours
    [10] => 
)

This is obviously wrong, as the row now contains more array elements than the correct rows do.
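
To see the behaviour in isolation, here is a minimal sketch using a made-up three-column line (the sample data is an assumption, not taken from the file above). It only counts the fields that str_getcsv() returns:

```php
<?php
// A three-column line whose middle value contains unescaped quotes.
// The sample line is hypothetical and only serves to reproduce the problem.
$line = '"a","say "hi, there" now","b"';

$fields = str_getcsv($line, ',', '"', '\\');

// The comma inside the inner quotes is treated as a delimiter,
// so more than three fields come back.
var_dump(count($fields));
print_r($fields);
```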

Here is the PHP I am using:

$i = 1;
if (($file = fopen($this->request->data['file']['tmp_name'], "r")) !== FALSE) {
    while (($row = fgetcsv($file, 0, ',', '"')) !== FALSE) {
        if ($i == 1) {
            // The first line holds the column names.
            $header = $row;
        } else {
            if (count($header) == count($row)) {
                $lots[] = array_combine($header, $row);
            } else {
                // Field count does not match the header: set the row aside.
                $error_rows[] = $row;
            }
        }
        $i++;
    }
    fclose($file);
}

Rows with the wrong number of values are put into $error_rows, and the rest go into one big $lots array.

How can I fix this? Thanks.

Recommended answer

If you know that entries 0 and 1 will always be present and that the last 5 entries in the array are always correct, so that only the descriptive entry is "corrupted" by unescaped enclosure characters, then you could extract the first 2 and the last 5 entries using array_slice(), implode() the remainder back into a single string (restoring the lost quotes), and rebuild the array correctly.

$testData = '" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated \'87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",';

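// Parse the malformed line; the description comes back as several fragments.
$result = str_getcsv($testData, ',', '"');

$hdr = array_slice($result, 0, 2);      // first two fields are intact
$bdy = array_slice($result, 2, -5);     // fragments of the broken description
$bdy = trim(implode('"', $bdy), '"');   // rejoin the fragments into one string
$ftr = array_slice($result, -5);        // last five fields are intact

$fixedResult = array_merge($hdr, array($bdy), $ftr);
var_dump($fixedResult);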

The result is:

array
  0 => string ' 635' (length=4)
  1 => string ' ' (length=1)
  2 => string 'AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time" Somerset" signed and dated '87" framed" 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.' (length=362)
  3 => string '40' (length=2)
  4 => string '60' (length=2)
  5 => string 'WAT' (length=3)
  6 => string 'Paintings, prints and watercolours' (length=34)
  7 => string '' (length=0)

Not perfect, but possibly good enough.
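
The stray quotes in the repaired description come from gluing the fragments with '"'. Since the parser split the description at its commas, a variant that glues with ',' may land closer to the original text, although the opening quote before the inner title is still lost. The sketch below is one way to wrap that idea in a helper; repairRow() and its parameters $headCount and $tailCount are hypothetical names, and the sample line is shortened, made-up data.

```php
<?php
/**
 * Rebuild a row whose single free-text field was split by unescaped quotes.
 * Assumes the first $headCount and last $tailCount fields parsed correctly.
 * repairRow() and its parameter names are illustrative, not from the answer.
 */
function repairRow(array $row, int $headCount = 2, int $tailCount = 5): array
{
    $middleCount = count($row) - $headCount - $tailCount;

    // Nothing to repair if there is at most one field in the middle.
    if ($middleCount <= 1) {
        return $row;
    }

    $head   = array_slice($row, 0, $headCount);
    $middle = array_slice($row, $headCount, $middleCount);
    $tail   = array_slice($row, $headCount + $middleCount);

    // The parser split on commas, so glue with commas and drop the
    // leftover closing quote at the end of the description.
    $body = rtrim(implode(',', $middle), '"');

    return array_merge($head, [$body], $tail);
}

// Shortened, hypothetical line with the same shape as the bad row.
$line  = '"1","x","titled verso "Harvest Time, Somerset" signed, framed.","40","60","WAT","Paintings",';
$fixed = repairRow(str_getcsv($line, ',', '"', '\\'));

var_dump(count($fixed));
var_dump($fixed[2]);
```

In the reading loop, a helper like this could be applied to rows whose field count exceeds that of the header before they are sent to $error_rows. It still relies on the assumption that only one column is affected.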

The alternative is to get whoever is generating the CSV to properly escape their enclosures.
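
For reference, this is roughly what properly escaped output looks like if the producer happens to use PHP. It is a minimal sketch with made-up sample fields: fputcsv() doubles any embedded enclosure characters, and fgetcsv() then reads the description back as a single value.

```php
<?php
// Fields as the producer might hold them before export (sample data).
$fields = ['635', 'titled verso "Harvest Time, Somerset" signed', '40'];

// Write one row to an in-memory stream; fputcsv() doubles embedded quotes.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, $fields, ',', '"', '\\');

// Show the raw CSV text that was produced.
rewind($fh);
$written = stream_get_contents($fh);
echo $written;

// Read the row back: the description survives as a single field.
rewind($fh);
$row = fgetcsv($fh, 0, ',', '"', '\\');
fclose($fh);

var_dump($row === $fields);
```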
