fgetcsv is eating the first letter of a String if it's an Umlaut
Question
I am importing contents from an Excel-generated CSV file into an XML document like this:
$csv = fopen($csvfile, 'r'); // the mode must be a quoted string
$words = array();
while (($pair = fgetcsv($csv)) !== FALSE) {
    array_push($words, array('en' => $pair[0], 'de' => $pair[1]));
}
The inserted data are English/German expressions.
I insert these values into an XML structure and output the XML as follows:
$dictionary = new SimpleXMLElement('<dictionary></dictionary>');
// do things
$dom = dom_import_simplexml($dictionary)->ownerDocument;
$dom->formatOutput = true;
// The charset belongs in Content-Type; Content-Encoding is meant for compression such as gzip
header('Content-Type: text/xml; charset=utf-8');
echo $dom->saveXML();
This is working fine, yet I am encountering one really strange problem. When the first letter of a string is an umlaut (as in Österreich or Ägypten), the character is omitted, resulting in sterreich or gypten. If the umlaut is in the middle of the string (Russische Föderation), it gets transferred correctly. The same goes for characters like ß or é.
All files are UTF-8 encoded and served in UTF-8.
This seems rather strange and bug-like to me, yet maybe I am missing something; there are a lot of smart people around here.
Recommended answer
OK, so this seems to be a bug in fgetcsv.
I am now processing the CSV data on my own (a little cumbersome), but it is working and I do not have any encoding issues at all.
This is (a not-yet-optimized version of) what I am doing:
$rawCSV = file_get_contents($csvfile);
// Split on line breaks from any operating system: http://stackoverflow.com/a/7498886/797194
$lines = preg_split('/$\R?^/m', $rawCSV);
$words = array();
foreach ($lines as $line) {
    array_push($words, getCSVValues($line));
}
The getCSVValues function comes from here and is needed to deal with CSV lines like this (commas!):
"I'm a string, what should I do when I need commas?",Howdy there
It looks like this:
function getCSVValues($string, $separator=","){
    $elements = explode($separator, $string);
    for ($i = 0; $i < count($elements); $i++) {
        $nquotes = substr_count($elements[$i], '"');
        if ($nquotes %2 == 1) {
            for ($j = $i+1; $j < count($elements); $j++) {
                if (substr_count($elements[$j], '"') %2 == 1) { // Look for an odd-number of quotes
                    // Put the quoted string's pieces back together again
                    array_splice($elements, $i, $j-$i+1,
                        implode($separator, array_slice($elements, $i, $j-$i+1)));
                    break;
                }
            }
        }
        if ($nquotes > 0) {
            // Remove first and last quotes, then merge pairs of quotes
            $qstr =& $elements[$i];
            $qstr = substr_replace($qstr, '', strpos($qstr, '"'), 1);
            $qstr = substr_replace($qstr, '', strrpos($qstr, '"'), 1);
            $qstr = str_replace('""', '"', $qstr);
        }
    }
    return $elements;
}
Quite a bit of a workaround, but it seems to work fine.
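As a sanity check on what any parser should return for the sample line with the quoted comma, here is a minimal sketch using PHP's built-in str_getcsv on that ASCII-only line. It is not the getCSVValues function; $line and $fields are illustrative names. As far as I know, str_getcsv is documented as taking the locale settings into account as well, so it is probably not a drop-in fix for the umlaut problem itself.

```php
<?php
// Sample line from the answer: the first field contains a comma inside quotes.
$line = '"I\'m a string, what should I do when I need commas?",Howdy there';

// Delimiter, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');

// Two fields are expected: the quoted sentence (quotes stripped) and "Howdy there".
var_dump($fields);
```

Feeding the same line to getCSVValues and comparing the two results is a cheap way to confirm that the hand-written parser handles quoted commas.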
Edit:
There is also a filed bug for this; apparently it depends on the locale settings.
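If the behavior really does depend on the locale, setting a UTF-8 locale before calling fgetcsv may be enough to keep the leading umlaut. The following is only a sketch under that assumption: readPairs is a hypothetical helper, and the locale names en_US.UTF-8 and C.UTF-8 are guesses that must match a locale actually installed on the server (setlocale returns false if none of them is available).

```php
<?php
// Hypothetical helper: reads a UTF-8 CSV file into en/de pairs using fgetcsv.
function readPairs(string $csvfile): array
{
    // Assumption: at least one of these UTF-8 locales is installed.
    // setlocale() tries each name in turn and returns false if none can be set.
    setlocale(LC_CTYPE, array('en_US.UTF-8', 'C.UTF-8'));

    $words = array();
    $csv = fopen($csvfile, 'r');
    if ($csv === false) {
        return $words; // file could not be opened
    }
    // Length 0 means "no line length limit"; delimiter, enclosure and escape are explicit.
    while (($pair = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
        if ($pair === array(null)) {
            continue; // fgetcsv reports a blank line as array(null)
        }
        $words[] = array('en' => $pair[0], 'de' => $pair[1] ?? '');
    }
    fclose($csv);
    return $words;
}
```

Calling readPairs($csvfile) would then replace the manual splitting above. Whether this fixes the problem on a given server depends on the PHP version and the installed locales, so it is worth checking the result against a row such as Austria,Österreich.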