Arabic Character Encoding Issue: UTF-8 versus Windows-1256


Problem description

Quick background: I inherited a large SQL dump file containing a mix of English and Arabic text, and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' before importing the file. The Arabic text didn't appear correctly in phpMyAdmin (which I guess is normal), but when I loaded the text into a web page with the following...

<meta http-equiv='Content-Type' content='text/html; charset=windows-1256'/> 

...everything looked good and the Arabic text displayed perfectly.



Problem: My client is really, really picky and doesn't want to change his...

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

...to the 'Windows-1256' equivalent. I didn't think this would be a problem, but when I changed the charset value to 'UTF-8', all of the Arabic characters appeared as diamonds with question marks. Shouldn't UTF-8 display Arabic text correctly?



Here are a few notes about my database configuration:


  • Database charset is utf8

  • Database connection collation is utf8_general_ci

  • All databases, tables, and applicable fields have been collated as utf8_general_ci

I've been scouring Stack Overflow and other forums for anything that relates to my issue. I've found similar problems, but none of the solutions seem to work for my specific situation. Hope someone can help!

Recommended answer

If the document looks right when declared as windows-1256 encoded, then it most probably is windows-1256 encoded. So it was apparently not exported using latin1, which would have been impossible, since latin1 has no Arabic letters.
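The point about latin1 can be checked with a small sketch. It uses Python's built-in codecs (`latin-1` and `cp1256`, Python's name for windows-1256); the sample word and the helper `can_encode` are illustrative choices, not part of the original question or answer.

```python
# Hypothetical sample: an arbitrary Arabic word, chosen only for illustration.
sample_word = "عربي"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# latin-1 (ISO-8859-1) contains no Arabic letters, so encoding fails.
latin1_ok = can_encode(sample_word, "latin-1")

# windows-1256 was designed for Arabic, so the same text encodes fine.
cp1256_ok = can_encode(sample_word, "cp1256")

print("latin-1:", latin1_ok)
print("cp1256:", cp1256_ok)
```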

If this is just about a single file, then the simplest way is to convert it from windows-1256 encoding to utf-8 encoding, using e.g. Notepad++. (Open the file in it, change the encoding, via the File format menu, to Arabic, windows-1256. Then select Convert to UTF-8 in the File format menu and do File → Save.)
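The same conversion can also be scripted instead of done by hand in an editor. The following is a minimal sketch, assuming the file really is windows-1256 encoded throughout; the function name `convert_cp1256_to_utf8` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def convert_cp1256_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a text file from windows-1256 to UTF-8."""
    # Interpret the raw bytes as windows-1256 ("cp1256" is Python's codec name).
    text = src.read_bytes().decode("cp1256")
    # Write the same characters back out, this time encoded as UTF-8.
    dst.write_bytes(text.encode("utf-8"))
```

A call might look like `convert_cp1256_to_utf8(Path("dump.sql"), Path("dump_utf8.sql"))`. Writing to a separate output file keeps the original intact in case the assumption about the source encoding turns out to be wrong.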

Windows-1256 and UTF-8 are completely different encodings, so data gets all messed up if you declare windows-1256 data as UTF-8 or vice versa. Only ASCII characters, such as English letters, have the same representation in both encodings.
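To make the mismatch concrete, here is a small sketch using Python's built-in codecs. The sample strings are arbitrary. Decoding with `errors="replace"` substitutes U+FFFD for invalid bytes, which is the character browsers typically draw as a diamond with a question mark.

```python
arabic_text = "عربي"    # arbitrary Arabic sample
ascii_text = "hello"    # arbitrary ASCII sample

# ASCII characters are encoded as the same bytes in both encodings.
ascii_same = ascii_text.encode("cp1256") == ascii_text.encode("utf-8")

# Arabic characters are not: windows-1256 uses one byte per letter,
# while UTF-8 uses two bytes per letter for these characters.
cp1256_bytes = arabic_text.encode("cp1256")
utf8_bytes = arabic_text.encode("utf-8")

# windows-1256 bytes declared as UTF-8: invalid sequences become U+FFFD.
garbled_as_utf8 = cp1256_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes declared as windows-1256: decodes, but to the wrong characters.
garbled_as_cp1256 = utf8_bytes.decode("cp1256", errors="replace")

print(ascii_same)
print(len(cp1256_bytes), len(utf8_bytes))
print(garbled_as_utf8)
print(garbled_as_cp1256)
```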
