Reading a file with the right encoding
Problem description
I have a txt file. If I open it with a standard text editor such as Notepad or SciTE, I can read strings like these:
Artist1 – Title 1
Artist2 – Title 2
Then I open it with my PHP script and read the lines:
$tracklistFile_name = time().rand(1, 1000).".".pathinfo($_FILES['tracklistFile']['name'], PATHINFO_EXTENSION);
if ((pathinfo($tracklistFile_name, PATHINFO_EXTENSION) == 'txt') && move_uploaded_file($_FILES['tracklistFile']['tmp_name'], 'import/'.$tracklistFile_name)) {
    $fileArray = file('import/'.$tracklistFile_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $fileArray = array_values(array_filter($fileArray, "trim"));
    for ($i = 0; $i < sizeof($fileArray); $i++) {
        echo $fileArray[$i]."<br />";
    }
}
and... WOW... I get this result:
Artist1 � Title1
Artist2 � Title2
What are those symbols? I think the encoding is failing.
The symbols are so wrong that I can't insert them into the database, not even with mysql_real_escape_string(). In fact, I get this error when I try to insert them:
Incorrect string value: '\x96 Titl...' for column 'atl' at row 1
How can I resolve this problem? Suggestions?
EDIT
I tried calling utf8_encode() on these strings before inserting them. Now the insert doesn't fail, but the result is:
Artist1 Title1
Artist2 Title2
So I've lost information. Why?
You should read Joel Spolsky's article on UTF-8 and encoding.
Your problem almost definitely stems from an encoding mismatch. Your first job is to figure out where this mismatch is occurring, because it could be in a number of different places:
1) Your PHP code could be reading input using an incorrect encoding (for example, if you are trying to read it as ISO-8859 but the source file is encoded some other way).
2) Your PHP code could be writing output using an incorrect encoding.
3) Whatever you are using to read the output (your browser) could be set to a different encoding than the bytes you are writing.
Once you figure out which of these three places is causing the problem, you can fix it by working out what your source encoding is, and then reading/writing with that encoding instead of another one (which your system has probably set as the default).
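One way to start narrowing this down is to look at the raw bytes PHP actually read, before any browser or database gets involved. The sketch below is only an illustration: describeBytes is a hypothetical helper name, and the sample string stands in for one line returned by file(). It assumes the mbstring extension is available.

<?php
// Hypothetical helper: report the raw bytes of a line and whether
// they form valid UTF-8. Assumes the mbstring extension is loaded.
function describeBytes($line)
{
    return array(
        'hex'        => bin2hex($line),                   // every byte as two hex digits
        'valid_utf8' => mb_check_encoding($line, 'UTF-8') // false if the bytes are not UTF-8
    );
}

// Stand-in for one line read from the uploaded file; \x96 is the byte
// quoted in the MySQL error message.
$info = describeBytes("Artist1 \x96 Title 1");
echo $info['hex'], PHP_EOL;
var_dump($info['valid_utf8']);

If the line is not valid UTF-8 but the page or the database column expects UTF-8, the mismatch is on the input side (place 1 above).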
EDIT: I don't know PHP well, but it looks like you could use mb_detect_encoding and possibly also mb_convert_encoding.
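To make that suggestion concrete: the \x96 in the error message is the byte Windows-1252 uses for an en dash (–), which suggests the file was saved in that encoding. utf8_encode() assumes ISO-8859-1, where 0x96 is an invisible control character, which is most likely why the dash vanished after the edit in the question. A minimal sketch, assuming the file really is Windows-1252 (toUtf8 is a hypothetical helper name, not a PHP built-in):

<?php
// Hypothetical helper: return $line as UTF-8.
// Assumption: anything that is not already valid UTF-8 is in $fallback
// (Windows-1252 here, a guess based on the \x96 byte).
function toUtf8($line, $fallback = 'Windows-1252')
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($line, 'UTF-8', $fallback);
}

// Stand-in for one line of the uploaded file.
echo toUtf8("Artist1 \x96 Title 1"), PHP_EOL;

The converted strings will only display and insert correctly if the page is served as UTF-8 and the database connection and column also use a UTF-8 character set; otherwise the mismatch just moves to one of the other two places listed above.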