Ÿ Œ characters in CSV don't get displayed in PHP


Problem description

I am new to encoding, so please be patient. I am working on a system where a user uploads a CSV file; I need to display its content and then save it in the database (UTF-8 encoded).

I have been asked to fix an issue with some French characters that weren't displayed correctly. I have almost solved the problem; I can now display characters such as

ÀàÂâÆÄäÇçÉéÈèÊêËëÎîÏïÔôœÖöÙùÛûÜüÿ

However, the two mentioned in the title, Ÿ and Œ, are still not displayed correctly on the web page.

Here is my PHP code so far:

// say in the csv we have "ÖüÜߟÀàÂ"
$content = file_get_contents(addslashes($file_name));
var_dump($content); // output: string(54) "���ߟ��� "
if(!mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)){
     $data = iconv('macintosh', 'UTF-8', $content);
} 
// deal with known encoding types
else if(mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'ISO-8859-1'){
    //$data  = mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)); // does not work
    $data = iconv('ISO-8859-1', 'UTF-8', $content); //does not work

}else if(mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'UTF-8'){
    $data = $content;
}
// if I print $data, "Ÿ Œ" are not printed out... they get lost somewhere

       //do more stuff here

The file I am dealing with is detected as ISO-8859-1 (when I print out mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true), it displays ISO-8859-1).

Does anyone have an idea how to deal with this special case?

Solution

The characters Ÿ and Œ are not representable in ISO-8859-1. It seems that the incoming data is actually windows-1252 (Windows Latin 1) encoded, since windows-1252 has graphic characters, including Ÿ and Œ, in some code positions that are reserved for control characters in ISO-8859-1.
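
To make the difference concrete, here is a small sketch (not part of the original answer) that decodes the same two bytes under both encodings. It assumes PHP's iconv extension is available and that the file stores Ÿ as byte 0x9F and Œ as byte 0x8C, which is where windows-1252 places them.

```php
<?php
// Bytes 0x9F and 0x8C: Ÿ and Œ in windows-1252, C1 control codes in ISO-8859-1.
$bytes = "\x9F\x8C";

// Decoded as ISO-8859-1, the bytes become U+009F and U+008C (invisible controls).
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);

// Decoded as windows-1252, the same bytes become U+0178 (Ÿ) and U+0152 (Œ).
$asCp1252 = iconv('windows-1252', 'UTF-8', $bytes);

echo bin2hex($asLatin1), "\n"; // c29fc28c
echo bin2hex($asCp1252), "\n"; // c5b8c592
```

The ISO-8859-1 conversion does not fail; it produces control characters that a browser does not render, which is why the two letters appear to get lost.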

So you should probably add windows-1252 to the list of recognized encodings and treat recognized ISO-8859-1 as windows-1252, i.e. use iconv('windows-1252', 'UTF-8', $content) even when ISO-8859-1 has been recognized. Windows-1252 data mislabeled as ISO-8859-1 is very common.
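
One way this advice might look in code is sketched below. The helper name csv_bytes_to_utf8 is hypothetical, and the sketch assumes both the mbstring and iconv extensions are enabled. It simplifies the question's logic: anything that is not valid UTF-8 is decoded as windows-1252, and the question's 'macintosh' branch is left out.

```php
<?php
/**
 * Hypothetical helper: return the uploaded CSV content as UTF-8.
 * Assumption: input that is not valid UTF-8 is windows-1252.
 */
function csv_bytes_to_utf8(string $content): string
{
    // Valid UTF-8 (including plain ASCII) is passed through unchanged.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    // Decode as windows-1252, which covers Ÿ (0x9F) and Œ (0x8C).
    // A few byte values are undefined in windows-1252, and some iconv
    // implementations reject them; @ silences the resulting notice.
    $converted = @iconv('windows-1252', 'UTF-8', $content);
    if ($converted !== false) {
        return $converted;
    }

    // Fallback: ISO-8859-1 assigns a code point to every byte value.
    return iconv('ISO-8859-1', 'UTF-8', $content);
}

// "ÖüÜߟÀàÂ" written as windows-1252 bytes, as in the question's sample.
$data = csv_bytes_to_utf8("\xD6\xFC\xDC\xDF\x9F\xC0\xE0\xC2");
echo $data, "\n";
```

Checking for valid UTF-8 first matters because a windows-1252 decode of text that is already UTF-8 would garble every accented character.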
