User submitted CSV file upload UTF-8 concern


Question

I have a feature that uploads a user-submitted CSV file into my database using fgetcsv and related functions.

My database has a collation of utf8_general_ci and the website charset is set to utf-8.

How can I ensure that the data from the CSV is inserted into my database with the correct encoding, so that it displays properly on the website?

Do I have to test every string using something like mb_detect_encoding (which seems a bit memory intensive), or can I just utf8_encode the whole string? Or should I not be worrying at all?

Recommended answer

Auto-detecting the encoding of a user-submitted file is indeed extremely shaky.

Consider a manual approach:


  • Have the user upload the file.

  • In an iframe, show them a preview of how the data is going to be inserted (like OpenOffice does when importing an unknown file into a spreadsheet). An illustration of that is here.

  • Next to that, show a drop-down offering all relevant encodings.

  • If the user switches to a different encoding, update the preview on the fly using iconv():

$data = iconv($chosen_encoding, "utf-8", $data);


  • Once the user has confirmed that the data is displayed correctly in the selected encoding, do a final iconv() on the data and insert it into your database.

The downside of this is that the user needs to be educated about an issue that they're most likely ignorant of, and rightly not interested in. But it's the only way to make sure the data that enters the system is okay.
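The preview step in the list above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name previewAsUtf8 and the whitelist of encodings are hypothetical, and which encoding names iconv() accepts depends on the iconv implementation installed on the server.

```php
<?php
// Hypothetical helper: re-interpret the raw uploaded bytes in the encoding
// the user picked from the drop-down and return UTF-8 for the preview.
// Returns null if the encoding is not offered or the bytes do not fit it.
function previewAsUtf8(string $rawBytes, string $chosenEncoding): ?string
{
    // Assumed whitelist; offer whatever encodings are relevant to your users.
    $allowed = ['UTF-8', 'ISO-8859-1', 'ISO-8859-15', 'WINDOWS-1252'];
    if (!in_array(strtoupper($chosenEncoding), $allowed, true)) {
        return null;
    }

    // iconv() returns false (and raises a notice) when the input contains
    // bytes that are invalid in the source encoding; @ silences the notice.
    $converted = @iconv($chosenEncoding, 'UTF-8', $rawBytes);

    return $converted === false ? null : $converted;
}

// "Jörg;Köln" saved as ISO-8859-1, written with byte escapes so the
// encoding of this source file does not matter.
$raw = "name;city\nJ\xF6rg;K\xF6ln\n";

$preview = previewAsUtf8($raw, 'ISO-8859-1'); // readable UTF-8 text
$invalid = previewAsUtf8($raw, 'UTF-8');      // null: bytes are not valid UTF-8
```

Keeping the untouched upload around and converting from it on every switch matters here: converting an already converted preview a second time would corrupt it.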

Re your comment:

"I really want to make this transparent to the user. Would doing a utf8_encode on the string at least ensure the proper encoding is set regardless, or would it screw all of the data up?"

utf8_encode is just a synonym for iconv("iso-8859-1", "utf-8", $data). If the incoming data is not ISO-8859-1, it will get screwed up by the process. It's a tricky issue.
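A small sketch of what "screwed up" means in practice. It uses the iconv() form quoted above rather than utf8_encode itself, since utf8_encode is deprecated as of PHP 8.2. The variable names are only illustrative.

```php
<?php
// The letter "Ä" encoded two ways, written as byte escapes so the encoding
// of this source file does not matter.
$latin1 = "\xC4";       // ISO-8859-1: one byte
$utf8   = "\xC3\x84";   // UTF-8: two bytes

// Correct assumption: the Latin-1 byte becomes the proper UTF-8 sequence.
$ok = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Wrong assumption: each byte of the already-UTF-8 text is treated as a
// separate Latin-1 character, so "Ä" turns into "Ã" plus a control character.
$broken = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($ok), "\n";      // c384
echo bin2hex($broken), "\n";  // c383c284
```

The conversion never fails, because every byte is a valid ISO-8859-1 character. That is exactly why blindly applying it cannot "ensure" anything: it silently double-encodes input that was already UTF-8.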

If you need this to be transparent, you'll have to try your luck with mb_detect_encoding, and unfortunately on the full file, because ISO-8859-1 and UTF-8 share the same set of base (ASCII) characters but differ in everything else, such as the umlauts ÄÖÜ.

Note that encoding detection is close to useless if files come in from all over the world (i.e. could have any encoding).
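If a transparent best-effort route is chosen anyway, one possible sketch is below. Note that it swaps in a simpler technique than mb_detect_encoding: it only checks whether the whole file is valid UTF-8 with mb_check_encoding and otherwise assumes a single fallback encoding. The function name toUtf8BestEffort and the ISO-8859-1 default are assumptions made for illustration.

```php
<?php
// Hypothetical helper: best-effort conversion of a whole uploaded file.
// $fallbackEncoding is a guess about what non-UTF-8 uploads usually are;
// it is NOT detected, so files in any other encoding will be garbled.
function toUtf8BestEffort(string $rawBytes, string $fallbackEncoding = 'ISO-8859-1'): string
{
    // Check the complete contents: a file can be pure ASCII for thousands
    // of lines and contain a single non-ASCII character near the end.
    if (mb_check_encoding($rawBytes, 'UTF-8')) {
        return $rawBytes; // already valid UTF-8 (this includes plain ASCII)
    }

    // Not valid UTF-8: assume the fallback encoding and convert.
    return mb_convert_encoding($rawBytes, 'UTF-8', $fallbackEncoding);
}

$fromUtf8   = toUtf8BestEffort("K\xC3\xB6ln"); // "Köln" already in UTF-8
$fromLatin1 = toUtf8BestEffort("K\xF6ln");     // "Köln" in ISO-8859-1
```

This only separates "valid UTF-8" from "everything else". A file in, say, Windows-1251 would pass through the fallback branch and come out as wrong characters without any error, which is the limitation described above.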
