Reading a file with the right encoding


Problem description


I have a txt file. If I open it with a standard text editor such as Notepad or SciTE, I can read strings like these:

Artist1 – Title 1
Artist2 – Title 2

Then I open it with my PHP script and read the lines:

$tracklistFile_name=time().rand(1, 1000).".".pathinfo($_FILES['tracklistFile']['name'], PATHINFO_EXTENSION);
if(((pathinfo($tracklistFile_name, PATHINFO_EXTENSION)=='txt')) && (move_uploaded_file($_FILES['tracklistFile']['tmp_name'], 'import/'.$tracklistFile_name))) {
    $fileArray=file('import/'.$tracklistFile_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $fileArray=array_values(array_filter($fileArray, "trim"));

    for($i=0; $i<sizeof($fileArray); $i++) {
        echo $fileArray[$i]."<br />";
    }
}

and... wow... I get this result:

Artist1 � Title1 
Artist2 � Title2 

What are those symbols? I think the encoding is failing. The symbols are so broken that I can't insert them into the database, even with mysql_real_escape_string(). In fact, I get this error when I try to insert them:

Incorrect string value: '\x96 Titl...' for column 'atl' at row 1

How can I resolve this problem? Suggestions?

EDIT

I tried calling utf8_encode() on these strings before inserting them. Now the insert doesn't fail, but the result is:

Artist1  Title1 
Artist2  Title2

So I've lost information. Why?

Solution

You should read Joel Spolsky's article on UTF-8 and encoding.

Your problem almost certainly stems from an encoding mismatch. Your first job is to figure out where the mismatch occurs, because it could be in several different places:

1) Your PHP code could be reading the input using the wrong encoding (for example, treating the file as ISO-8859 when it is actually encoded some other way).

2) Your PHP code could be writing the output using the wrong encoding.

3) Whatever you use to read the output (your browser) could be set to a different encoding than the bytes you are writing.
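
One way to narrow down which of the three places is at fault is to look at the raw bytes PHP actually read, before any browser or database gets involved. The sketch below is only an illustration: `nonAsciiBytes` is a hypothetical helper name, not something from the question, and the sample strings stand in for a line of the uploaded file.

```php
<?php
// Hypothetical helper: list every byte >= 0x80 in $line as two hex digits.
// The pattern has no /u modifier, so it matches single bytes, not characters.
function nonAsciiBytes(string $line): array
{
    preg_match_all('/[\x80-\xFF]/', $line, $matches);
    return array_map('bin2hex', $matches[0]);
}

// A lone 0x96 byte, as in the MySQL error message from the question.
echo implode(' ', nonAsciiBytes("Artist1 \x96 Title 1")), PHP_EOL;

// The same dash stored as UTF-8 takes three bytes instead.
echo implode(' ', nonAsciiBytes("Artist1 \xE2\x80\x93 Title 1")), PHP_EOL;
```

A single high byte such as 96 points to a one-byte-per-character encoding (Windows-1252 is a common one on Windows, and there 0x96 is the en dash). Seeing e2 80 93 instead would mean the file is already UTF-8 and the mismatch is more likely on the output side.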

Once you figure out which of the three places is causing the problem, you can fix it by finding out what the source encoding actually is and then reading and writing with that encoding instead of another one (most likely whatever your system uses as the default).
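
For places 2 and 3, the usual remedy is to state the encoding explicitly rather than rely on defaults. The following is a minimal sketch, assuming the lines have already been converted to UTF-8; `renderTracklist` is a hypothetical name introduced here for illustration.

```php
<?php
// Hypothetical helper: escape each UTF-8 line for HTML and join with <br />.
function renderTracklist(array $utf8Lines): string
{
    $escaped = array_map(
        function (string $line): string {
            // Pass the encoding explicitly so escaping does not depend on defaults.
            return htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
        },
        $utf8Lines
    );
    return implode("<br />\n", $escaped);
}

// Tell the browser which encoding the bytes are in (place 3 in the list).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderTracklist(["Artist1 \xE2\x80\x93 Title 1", "Artist2 \xE2\x80\x93 Title 2"]), PHP_EOL;
```

The database connection has its own charset setting as well; with mysqli, for instance, that is `set_charset()`. Whether it needs changing depends on how the connection and the column are configured, which the question does not show.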

EDIT: I don't know PHP well, but it looks like you could use mb_detect_encoding and possibly also mb_convert_encoding.
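
As a rough illustration of the `mb_convert_encoding` route, the sketch below converts a line to UTF-8. It rests on an assumption the question does not confirm: that the file was saved as Windows-1252, where byte 0x96 is the en dash. That would also explain the result of `utf8_encode()`, which treats its input as ISO-8859-1; there, 0x96 is an invisible control character, so the dash seems to vanish. `toUtf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the original post): return $line as UTF-8.
// Assumption: any line that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $line, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line; // already valid UTF-8 (plain ASCII also lands here)
    }
    return mb_convert_encoding($line, 'UTF-8', $assumedSource);
}

// "Artist1 \x96 Title 1" is how Windows-1252 stores the line with an en dash.
$converted = toUtf8("Artist1 \x96 Title 1");

// Print the bytes in hex so the result does not depend on the terminal's encoding.
echo bin2hex($converted), PHP_EOL;
```

`mb_detect_encoding` could stand in for the validity check, but its result is a guess, so supplying the source encoding yourself is safer when you know it.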
