fgetcsv is eating the first letter of a String if it's an Umlaut


Problem description

I am importing the contents of an Excel-generated CSV file into an XML document like this:

$csv = fopen($csvfile, 'r');
$words = array();

while (($pair = fgetcsv($csv)) !== FALSE) {
    array_push($words, array('en' => $pair[0], 'de' => $pair[1]));
}

The inserted data are English/German expressions.

I insert these values into an XML structure and output the XML as follows:

$dictionary = new SimpleXMLElement('<dictionary></dictionary>');
//do things
$dom = dom_import_simplexml($dictionary) -> ownerDocument;
$dom -> formatOutput = true;

header('Content-Type: text/xml; charset=utf-8'); // XML MIME type; the charset belongs here, since Content-Encoding is meant for compression such as gzip

echo $dom -> saveXML();

This works fine, yet I am running into one really strange problem. When the first letter of a string is an umlaut (as in Österreich or Ägypten), that character is omitted, resulting in sterreich or gypten. If the umlaut is in the middle of the string (Russische Föderation), it is transferred correctly. The same goes for characters such as ß or é.

All files are UTF-8 encoded and served as UTF-8.

This seems rather strange and bug-like to me, but maybe I am missing something; there are a lot of smart people around here.

Recommended answer

OK, so this seems to be a bug in fgetcsv.

I am now processing the CSV data on my own (a little cumbersome), but it works and I no longer have any encoding issues.

This is (a not-yet-optimized version of) what I am doing:

$rawCSV = file_get_contents($csvfile);

$lines = preg_split ('/$\R?^/m', $rawCSV); //split on line breaks in all operating systems: http://stackoverflow.com/a/7498886/797194

foreach ($lines as $line) {
    array_push($words, getCSVValues($line));
}
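
One possible caveat with the split above: if PCRE treats only "\n" as a newline, `$` matches just before "\n", so lines from a CRLF file (which Excel on Windows typically writes) may keep a trailing "\r". A minimal sketch of an alternative, assuming a hypothetical helper named splitCsvLines that is not part of the original answer; like the original, it does not handle line breaks inside quoted fields:

```php
<?php
// Hypothetical helper (not from the original answer): split raw CSV text into lines.
function splitCsvLines(string $rawCSV): array
{
    // Match "\r\n" first so a Windows line ending is consumed as one unit,
    // then a lone "\n" or "\r". PREG_SPLIT_NO_EMPTY drops blank lines,
    // including the empty piece after a trailing line break.
    $lines = preg_split('/\r\n|\n|\r/', $rawCSV, -1, PREG_SPLIT_NO_EMPTY);
    if ($lines === false) {
        throw new RuntimeException('preg_split failed');
    }
    return $lines;
}
```

The explicit alternation is used instead of \R because, without the u modifier, \R can also match the single byte 0x85, which occurs inside some UTF-8 characters such as Å.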

getCSVValues comes from here and is needed to handle CSV lines like this one, where a quoted field contains commas:

"I'm a string, what should I do when I need commas?",Howdy there

It looks like this:

function getCSVValues($string, $separator=","){

    $elements = explode($separator, $string);

    for ($i = 0; $i < count($elements); $i++) {
        $nquotes = substr_count($elements[$i], '"');
        if ($nquotes %2 == 1) {
            for ($j = $i+1; $j < count($elements); $j++) {
                if (substr_count($elements[$j], '"') %2 == 1) { // Look for an odd-number of quotes
                    // Put the quoted string's pieces back together again
                    array_splice($elements, $i, $j-$i+1,
                        implode($separator, array_slice($elements, $i, $j-$i+1)));
                    break;
                }
            }
        }
        if ($nquotes > 0) {
            // Remove first and last quotes, then merge pairs of quotes
            $qstr =& $elements[$i];
            $qstr = substr_replace($qstr, '', strpos($qstr, '"'), 1);
            $qstr = substr_replace($qstr, '', strrpos($qstr, '"'), 1);
            $qstr = str_replace('""', '"', $qstr);
        }
    }
    return $elements;

}

It is quite a workaround, but it seems to work fine.

Edit:

There is also a filed bug for this; apparently the behavior depends on the locale settings.
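
Since the PHP manual notes that fgetcsv takes the locale settings into account, one workaround that may avoid the hand-written parser is to switch LC_CTYPE to a UTF-8 locale before reading. The following is only a sketch under assumptions: the helper name readCsvWithUtf8Locale and the candidate locale names are hypothetical, and which locales are installed varies from system to system (setlocale returns false if none of them is available, in which case the locale stays unchanged).

```php
<?php
// Hypothetical helper (not from the original answer): read a CSV file with
// LC_CTYPE temporarily switched to a UTF-8 locale, then restore the old setting.
function readCsvWithUtf8Locale(
    string $path,
    array $candidates = ['de_DE.UTF-8', 'en_US.UTF-8', 'C.UTF-8'] // assumed locale names
): array {
    // Passing "0" returns the current locale without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    // Tries each candidate in order; returns false if none is installed.
    setlocale(LC_CTYPE, $candidates);

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    try {
        // Length 0 means no line-length limit; delimiter, enclosure and
        // escape are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv reports a blank line as an array holding a single null
            }
            $rows[] = $row;
        }
    } finally {
        fclose($handle);
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous); // put the original locale back
        }
    }
    return $rows;
}
```

Calling readCsvWithUtf8Locale($csvfile) would return one array per CSV line, which can then be mapped to the 'en' and 'de' keys as in the question. Whether this resolves the missing first letter depends on the PHP version and on the locales available, so it is worth testing against a file that contains entries like Österreich.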
