Reading CSV file with unescaped enclosures
Question
I am reading a CSV file, but some of the values are not escaped, so PHP parses them incorrectly. Here is an example of a bad line:
" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated '87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",
You can see that "Harvest Time, Somerset" has quotes around it, causing PHP to think it's a new value.
When I call print_r() on each row, the broken rows end up looking like this:
Array
(
[0] => 635
[1] =>
[2] => AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time
[3] => Somerset" signed and dated '87
[4] => framed
[5] => 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art."
[6] => 40
[7] => 60
[8] => WAT
[9] => Paintings, prints and watercolours
[10] =>
)
This is obviously wrong, as the row now contains more array elements than the correctly parsed rows.
Here is the PHP I am using:
$i = 1;
$lots = array();
$error_rows = array();
if (($file = fopen($this->request->data['file']['tmp_name'], "r")) !== FALSE) {
    while (($row = fgetcsv($file, 0, ',', '"')) !== FALSE) {
        if ($i == 1) {
            // The first row holds the column names.
            $header = $row;
        } else {
            if (count($header) == count($row)) {
                $lots[] = array_combine($header, $row);
            } else {
                // Field count does not match the header: set the row aside.
                $error_rows[] = $row;
            }
        }
        $i++;
    }
    fclose($file);
}
Rows with the wrong number of values are put into $error_rows, and the rest are put into a big $lots array.
How can I fix this? Thanks.
Recommended answer
If you know that you'll always get entries 0 and 1 intact, and that the last 5 entries in the array are always correct, so that only the descriptive entry is "corrupted" by unescaped enclosure characters, then you could extract the first 2 and last 5 entries using array_slice(), implode() the remainder back into a single string (restoring the lost quotes), and rebuild the array correctly.
$testData = '" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated \'87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",';
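$result = str_getcsv($testData, ',', '"');

// The first 2 and last 5 entries are assumed to be intact.
$hdr = array_slice($result, 0, 2);
$ftr = array_slice($result, -5);

// Everything in between is the over-split description: glue it back together.
$bdy = array_slice($result, 2, -5);
$bdy = trim(implode('"', $bdy), '"');

$fixedResult = array_merge($hdr, array($bdy), $ftr);
var_dump($fixedResult);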
The result is:
array
0 => string ' 635' (length=4)
1 => string ' ' (length=1)
2 => string 'AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time" Somerset" signed and dated '87" framed" 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.' (length=362)
3 => string '40' (length=2)
4 => string '60' (length=2)
5 => string 'WAT' (length=3)
6 => string 'Paintings, prints and watercolours' (length=34)
7 => string '' (length=0)
Not perfect (the commas inside the description come back as quotes), but possibly good enough.
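To apply the same repair inside the import loop rather than to a single test string, the slicing can be wrapped in a small function. The sketch below is an illustration only: repairRow() and its parameters are hypothetical names, not part of the original answer. It assumes, as the answer does, that only one field (the description, at index 2) can be damaged and that every field before and after it is intact. The sample line is a shortened, made-up version of the one in the question.

```php
<?php
// Hypothetical helper (not from the original answer): collapse the surplus
// middle fields of an over-split row back into a single description field.
function repairRow(array $row, int $expectedCount, int $leading = 2): ?array
{
    $surplus = count($row) - $expectedCount;
    if ($surplus === 0) {
        return $row;    // already the right shape, nothing to do
    }
    if ($surplus < 0 || $expectedCount <= $leading) {
        return null;    // too few fields: this approach cannot repair it
    }

    // Assumption: the fields before and after the description are intact.
    $head = array_slice($row, 0, $leading);
    $body = array_slice($row, $leading, $surplus + 1);
    $tail = array_slice($row, $leading + $surplus + 1);

    // Same glue as the answer's code: join the pieces with '"' and trim
    // the stray quote left at either end.
    $body = trim(implode('"', $body), '"');

    return array_merge($head, array($body), $tail);
}

// Shortened, made-up sample with the same defect as the row in the question.
$line = '"635"," ","titled verso "Harvest Time, Somerset" signed, framed.",'
      . '"40","60","WAT","Prints",';

$fixed = repairRow(str_getcsv($line, ',', '"', '\\'), 8);
print_r($fixed);
```

In the question's loop, a row whose count differs from the header could be passed through repairRow($row, count($header)) first, and sent to $error_rows only if the function returns null.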
The alternative is to get whoever is generating the CSV to properly escape their enclosures.
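For whoever produces the file, here is a minimal sketch of what correct escaping looks like, assuming the export can be written from PHP. fputcsv() doubles each enclosure character that appears inside a field, and fgetcsv() then reads the row back unchanged. The sample row is a shortened, made-up version of the one in the question, and the in-memory stream stands in for a real file.

```php
<?php
// Made-up sample row; the third field contains both quotes and commas.
$row = array(
    '635', ' ',
    'titled verso "Harvest Time, Somerset" signed, framed.',
    '40', '60', 'WAT', 'Prints', '',
);

// Writer side: fputcsv() encloses the field and doubles the inner quotes.
$fh = fopen('php://memory', 'w+');
fputcsv($fh, $row, ',', '"', '\\');

// Look at the raw line that was written.
rewind($fh);
$line = stream_get_contents($fh);
echo $line;     // the description contains ""Harvest Time, Somerset""

// Reader side: the same row comes back with the expected number of fields.
rewind($fh);
$parsed = fgetcsv($fh, 0, ',', '"', '\\');
fclose($fh);

var_dump($parsed === $row);
```

With the quotes doubled at the source, the count check in the question's loop passes and no repair step is needed.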