What is the best way to handle uploaded text files of different encodings?

Question

Internally, our PHP application uses UTF-8, and we process .csv files and fixed-width (text) files. We have written some nice libraries (essentially classes) to work with these files.

We recently added the ability for administrators to upload files of these types so they could be processed, and quickly ran into issues across multiple operating systems. We soon realised that the files being read in used different encodings from our application (e.g. Windows-1252 or ISO-8859).

Since it is impossible to control the encoding of the files submitted to us, my question is: what is the best way to handle uploaded text files of different encodings? I can currently think of two solutions:


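  • When a file is received, detect its encoding and convert it to UTF-8 before saving it

  • Change the csv / fixed-width libraries so that they become encoding-aware themselves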
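
A minimal sketch of the first option, assuming PHP's mbstring extension is available. The function name `normaliseToUtf8` and the Windows-1252 fallback are illustrative assumptions, not part of the application described here. The fallback is a guess rather than real detection, so files in other legacy encodings would be converted incorrectly.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalise uploaded bytes to UTF-8 at the point of input.
 * $fallbackEncoding is an assumption about where non-UTF-8 files come from.
 */
function normaliseToUtf8(string $bytes, string $fallbackEncoding = 'Windows-1252'): string
{
    // Drop a UTF-8 byte order mark if one is present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }

    // Well-formed UTF-8 (which includes plain ASCII) is kept unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise assume the fallback encoding; this is a guess, not a detection.
    return mb_convert_encoding($bytes, 'UTF-8', $fallbackEncoding);
}
```

With this approach the csv / fixed-width libraries would only ever see UTF-8, at the cost of silently mis-converting any file whose real encoding differs from the fallback.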

I also thought about the pros and cons of these:



  • Making the libraries encoding-aware internally: this seems to involve more code, but might be faster
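
For illustration, an encoding-aware reader might look roughly like the sketch below. `readCsvAsUtf8` is a hypothetical name, not one of the existing library classes, and the sketch assumes the mbstring extension and an ASCII-compatible source encoding (such as Windows-1252 or ISO-8859-1), so that delimiters and line breaks use the same bytes as in UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical encoding-aware CSV reader: yields each row as an array of
 * UTF-8 strings. Assumes $sourceEncoding is ASCII-compatible.
 */
function readCsvAsUtf8(string $path, string $sourceEncoding): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Parse on raw bytes, then convert each field on the way out.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            yield array_map(
                fn (string $field): string => mb_convert_encoding($field, 'UTF-8', $sourceEncoding),
                $row
            );
        }
    } finally {
        fclose($handle);
    }
}
```

The caller still has to supply the source encoding, so this variant moves the conversion into the library but does not remove the need to know or guess the encoding.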

Thoughts?

Edit: I am really interested to know where, architecturally, character encoding conversion should happen: at the point of input, or during the use of the files?

Recommended answer

This is tricky, and there is no perfect solution.

phpMyAdmin, for example, lets the user specify the encoding of the uploaded file. Since no automatic detection method is 100% reliable, this is the best way to go IMO, if at all possible.

An import dialog that allows the user to select the right encoding while seeing a preview of what their data looks like in that encoding might be optimal.

One way to do this could be:


  • Receive the uploaded file and store it in a temporary file

  • Display a dialog with a drop-down selection of the most important encodings

  • Have an iframe that, when the selected value in the drop-down changes, converts the contents of the uploaded file using iconv() (source = the selected encoding; target = UTF-8) and shows a preview

  • When the user selects an encoding, do a final iconv() and store the file as UTF-8
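
The preview and final conversion steps could be sketched as follows. The function names, the `OFFERED_ENCODINGS` whitelist and the 2048-byte preview size are all assumptions made for illustration; only `iconv()` itself comes from the answer. Restricting the encoding to a whitelist keeps arbitrary user input from being passed to `iconv()`.

```php
<?php
declare(strict_types=1);

// Hypothetical whitelist backing the drop-down; names are passed to iconv().
const OFFERED_ENCODINGS = ['UTF-8', 'WINDOWS-1252', 'ISO-8859-1', 'ISO-8859-15'];

/** Convert $bytes from a whitelisted encoding to UTF-8, or throw. */
function convertToUtf8(string $bytes, string $sourceEncoding): string
{
    if (!in_array($sourceEncoding, OFFERED_ENCODINGS, true)) {
        throw new InvalidArgumentException("Encoding not offered: $sourceEncoding");
    }

    // iconv() returns false (and raises a notice, suppressed here) when the
    // input contains bytes that are invalid in the source encoding.
    $converted = @iconv($sourceEncoding, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Content is not valid $sourceEncoding");
    }

    return $converted;
}

/** Preview step: convert only the first $maxBytes of the temporary upload. */
function previewUpload(string $tmpPath, string $sourceEncoding, int $maxBytes = 2048): string
{
    $head = file_get_contents($tmpPath, false, null, 0, $maxBytes);
    if ($head === false) {
        throw new RuntimeException("Cannot read $tmpPath");
    }

    return convertToUtf8($head, $sourceEncoding);
}

/** Final step: convert the whole upload and store it as UTF-8. */
function storeUploadAsUtf8(string $tmpPath, string $sourceEncoding, string $destPath): void
{
    $bytes = file_get_contents($tmpPath);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $tmpPath");
    }
    if (file_put_contents($destPath, convertToUtf8($bytes, $sourceEncoding)) === false) {
        throw new RuntimeException("Cannot write $destPath");
    }
}
```

A failed conversion in the preview is itself useful feedback, since it suggests the selected encoding is wrong. Note that cutting a multi-byte encoding at a fixed byte count can split a character, so a real preview would probably need to trim to a character or line boundary first.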
