Data gets garbled when writing to csv with fputcsv() / fgetcsv()


Question

There seems to be an encoding issue or bug in PHP with fputcsv() and fgetcsv().

The following PHP code:

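<?php
// A row whose middle cell is a JSON string containing an escaped backslash.
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

// In-memory stream, so no file on disk is needed.
$fh = fopen('php://temp', 'rb+');

// Write the row with the default delimiter, enclosure and escape character.
fputcsv($fh, $row_before);

// Go back to the start and read the same row again.
rewind($fh);

$row_after = fgetcsv($fh);

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);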

Gives me this output:

BEFORE:
array (
  0 => 'A',
  1 => '["a","\\\\","b"]',
  2 => 'B',
)

AFTER:
array (
  0 => 'A',
  1 => '["a","\\\\',
  2 => 'b""]"',
  3 => 'B',
)

So clearly, the data is damaged on the way. Originally there were just 3 cells in the row; after the round trip there are 4. The middle cell is split because the backslash in the data is also treated as the escape character.
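
To see where the split comes from, it can help to look at the raw line that fputcsv() writes before fgetcsv() interprets it. A minimal sketch; the helper name csv_raw_line() is made up for illustration and is not part of PHP, and the escape character is passed explicitly so the behavior does not depend on defaults:

```php
<?php
// Hypothetical helper: return the raw CSV text that fputcsv() produces for one row.
function csv_raw_line(array $row, string $escape = '\\'): string
{
    // In-memory stream, so nothing is written to disk.
    $fh = fopen('php://temp', 'w+');
    fputcsv($fh, $row, ',', '"', $escape);
    rewind($fh);
    $raw = stream_get_contents($fh);
    fclose($fh);
    return $raw;
}

$row = ['A', json_encode(['a', '\\', 'b']), 'B'];

// Print the line exactly as it is stored.
echo csv_raw_line($row);
```

If the backslash is acting as an escape character, the double quote right after it is the one to look at: a quote that is not doubled there would let fgetcsv() end the field early.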

See also https://3v4l.org/nc1oE, or here, with explicit values for delimiter, enclosure, escape_char: https://3v4l.org/Svt7m

Is there any way I can sanitize / escape my data before writing to CSV, to guarantee that the data read from the file will be exactly the same?

Is CSV a fully reversible format?

EDIT: The goal would be a mechanism to properly write and read ANY data as csv, so that after one round trip the data is still the same.

EDIT: I realize that I do not really understand the $escape_char parameter. See also "fgetcsv/fputcsv $escape parameter fundamentally broken". Maybe an answer to this would also bring us closer to a solution.

Solution

The culprit is that fputcsv() uses an escape character, which is a non-standard extension to CSV. (Well, as far as RFC 4180 can be regarded as a standard.) Basically, this escape character would have to be disabled, but passing an empty string as $escape to fputcsv() doesn't work before PHP 7.4; from PHP 7.4 on, an empty string is accepted and disables the escape mechanism. On older versions, passing a NUL character should usually give the desired results; however, see https://3v4l.org/MlluN.
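
As a sketch of what disabling the escape character can look like: this assumes PHP 7.4 or later, where, as far as I know, fputcsv() and fgetcsv() accept an empty string for $escape. The helper name csv_round_trip() is hypothetical, and the same $escape value has to be used for both writing and reading:

```php
<?php
// Hypothetical helper: write one row as CSV and read it back,
// with PHP's proprietary escape mechanism disabled on both sides.
// Assumption: PHP >= 7.4, where '' is accepted for $escape.
function csv_round_trip(array $row): array
{
    $fh = fopen('php://temp', 'w+');
    // '' as the last argument turns the escape character off when writing.
    fputcsv($fh, $row, ',', '"', '');
    rewind($fh);
    // 0 means no line length limit; '' turns the escape character off when reading.
    $result = fgetcsv($fh, 0, ',', '"', '');
    fclose($fh);
    return $result;
}

$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];
$row_after = csv_round_trip($row_before);

// Strict comparison: same cells, same order, same string values.
var_dump($row_after === $row_before);
```

Note that this only covers rows of strings. CSV carries no type information, so integers, floats or null cells come back as strings even when the text itself survives the round trip.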
