PHP scandir() and htmlentities(): issues with charset and/or special characters

Question

I am using jqueryFileTree to show a directory listing on the server with download links to the files in the directory. Recently I've run into an issue with files which contain special characters:


  • test.pdf: works fine
  • tést.pdf: does not work (note the é, e acute, in the filename)

When debugging the PHP connector of jqueryFileTree, I see that it runs scandir() on the directory passed via $_GET and then loops over each file/dir in that directory. Before building the filename into the URL, the script correctly runs htmlentities() over the file name. The problem seems to be that this htmlentities($file) call just returns an empty string, which according to the PHP docs can happen when the input string contains an invalid code unit sequence within the given encoding. However, I tried passing the charset explicitly by calling:

$file = htmlentities($file,ENT_QUOTES,'UTF-8');

But this also returns an empty string.

If I call:

$file = htmlentities($file,ENT_IGNORE,'UTF-8');

the e acute character is just dropped (so tést.pdf becomes tst.pdf).

When debugging my php script with xdebug I can see the source string contains an unknown character (looks like this).

So I'm at my wits' end trying to find a solution for this. Any help would be welcome.

FYI:


  • The charset of my page is UTF-8 (specified in metadata)
  • The file is stored on a windows 2003 fileserver and scandir() is executed with the UNC path (e.g. //fileserver/sharename/sourcedir)
  • The default encoding in my php.ini is set to UTF-8
  • The webserver & PHP 5.4.26 are running on a windows 2008 R2 server

Recommended answer

My best guess is that the filename itself isn't using UTF-8. Or at least scandir() isn't picking it up like that.
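To see why that guess would explain the symptoms, here is a minimal sketch. It assumes the name comes back from scandir() in Windows-1252, where é is the single byte 0xE9; the sample string is made up for illustration and is not taken from the actual server.

```php
<?php
// Assumption: the filename arrives as Windows-1252, so é is the byte 0xE9.
$file = "t\xE9st.pdf";

// 0xE9 followed by "s" is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($file, 'UTF-8'));        // bool(false)

// Without ENT_IGNORE or ENT_SUBSTITUTE, invalid input yields an empty string.
var_dump(htmlentities($file, ENT_QUOTES, 'UTF-8')); // string(0) ""

// ENT_IGNORE silently drops the invalid byte.
var_dump(htmlentities($file, ENT_IGNORE, 'UTF-8')); // string(7) "tst.pdf"
```

Both results match what the question describes, which is consistent with a single-byte encoding, although it does not prove which one.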

Maybe this sheds some light:

var_dump(mb_detect_encoding($filename));
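Detection is only a heuristic, so it may help to cross-check it. The sketch below uses a made-up sample name (é stored as the byte 0xE9) and a candidate list that is purely an assumption. It pairs mb_detect_encoding() in strict mode with mb_check_encoding(), which gives a plain yes/no for UTF-8 validity, and with bin2hex(), which shows the raw bytes.

```php
<?php
// Assumed sample: é stored as the single byte 0xE9.
$filename = "t\xE9st.pdf";

// Direct yes/no answer: is this byte sequence valid UTF-8?
var_dump(mb_check_encoding($filename, 'UTF-8')); // bool(false)

// Strict detection against an explicit candidate list (the list is an assumption).
var_dump(mb_detect_encoding($filename, array('UTF-8', 'Windows-1252'), true));

// The raw bytes are often the most reliable clue.
var_dump(bin2hex($filename)); // string(16) "74e973742e706466"
```

Note that single-byte encodings such as ISO-8859-1 accept every byte value, so detection cannot reliably tell them apart from each other; it can mainly rule UTF-8 in or out.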

If that doesn't help, try to guess the encoding (CP1252 or ISO-8859-1 would be my first guesses), convert it to UTF-8, and see if the output is valid:

var_dump(mb_convert_encoding($filename, 'UTF-8', 'Windows-1252'));
var_dump(mb_convert_encoding($filename, 'UTF-8', 'ISO-8859-1'));
var_dump(mb_convert_encoding($filename, 'UTF-8', 'ISO-8859-15'));

Or using iconv():

var_dump(iconv('WINDOWS-1252', 'UTF-8', $filename));
var_dump(iconv('ISO-8859-1',   'UTF-8', $filename));
var_dump(iconv('ISO-8859-15',  'UTF-8', $filename));

Then when you've figured out which encoding is actually used, your code should look somewhat like this (assuming CP1252):

$filename = htmlentities(mb_convert_encoding($filename, 'UTF-8', 'Windows-1252'), ENT_QUOTES, 'UTF-8');
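If some names in the directory might already be valid UTF-8, converting everything unconditionally would mangle those. One possible way to guard against that is sketched below. The helper name filename_to_html() is hypothetical (it is not part of jqueryFileTree or PHP), and the Windows-1252 default is an assumption that has to match what scandir() really returns on your server.

```php
<?php
/**
 * Hypothetical helper: returns an HTML-safe, UTF-8 version of a filename.
 * $sourceEncoding is an assumption; adjust it to the encoding you identified.
 * Written without scalar type declarations so it also runs on PHP 5.4.
 */
function filename_to_html($filename, $sourceEncoding = 'Windows-1252')
{
    // Convert only names that are not already valid UTF-8,
    // so correctly encoded names are not converted twice.
    if (!mb_check_encoding($filename, 'UTF-8')) {
        $filename = mb_convert_encoding($filename, 'UTF-8', $sourceEncoding);
    }

    return htmlentities($filename, ENT_QUOTES, 'UTF-8');
}

// Usage with a made-up name where é is the Windows-1252 byte 0xE9.
echo filename_to_html("t\xE9st.pdf"), PHP_EOL; // t&eacute;st.pdf
```

The validity check is a heuristic: a short Windows-1252 name could in rare cases also happen to be valid UTF-8, so if every name on the share is known to use one encoding, the unconditional conversion is the simpler choice.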
