Convert ISO-8859-2 to UTF-8 (Polish characters)
Question
I'm trying to parse an XML file (http://jstryczek.blox.pl/rss2) that declares its character set as ISO-8859-2. My database is in UTF-8, so I want to convert the content to UTF-8.
To do that, I run the following on the string:
$content = iconv('ISO-8859-2', 'UTF-8//TRANSLIT', $content);
For some reason I get back an odd encoding, so that:
Gdzie są różnice
comes through as:
Gdzie sÄ róşnice
Is there an explanation for why the Polish characters aren't coming through? Does UTF-8 not support them?
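One thing worth ruling out, although nothing in the question confirms it, is that the bytes handed to iconv are already UTF-8 despite the ISO-8859-2 declaration. Converting such bytes a second time yields mojibake of this general kind. The following is a minimal sketch under that assumption: `toUtf8` is a hypothetical helper name, and it relies on the mbstring and iconv extensions being loaded. It is a heuristic, since pure ASCII is valid in both encodings and a few ISO-8859-2 byte sequences can happen to be valid UTF-8.

```php
<?php
/**
 * Hypothetical helper (not from the original post): convert to UTF-8 only
 * when the input is not already valid UTF-8.
 */
function toUtf8(string $bytes, string $declaredCharset = 'ISO-8859-2'): string
{
    // Already valid UTF-8: assume it needs no conversion (heuristic).
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    $converted = iconv($declaredCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $declaredCharset");
    }
    return $converted;
}

$utf8   = 'Gdzie są różnice';                   // this source file is UTF-8
$latin2 = iconv('UTF-8', 'ISO-8859-2', $utf8);  // simulate real ISO-8859-2 input

var_dump(toUtf8($utf8) === $utf8);    // UTF-8 input is passed through unchanged
var_dump(toUtf8($latin2) === $utf8);  // ISO-8859-2 input is converted once
```

If both checks report true for your own data, the conversion step itself is probably fine and the damage is more likely happening when the string is stored or displayed.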
Recommended answer
I fixed this by JSON-encoding the string and then replacing the \uXXXX escape of each Polish special character with the corresponding HTML code. My result is below:
$specialChars = [
'\u0105', # ą
'\u0107', # ć
'\u0119', # ę
'\u0142', # ł
'\u0144', # ń
'\u00f3', # ó
'\u015b', # ś
'\u017a', # ź
'\u017c', # ż
'\u0104', # Ą
'\u0106', # Ć
'\u0118', # Ę
'\u0141', # Ł
'\u0143', # Ń
'\u00d3', # Ó
'\u015a', # Ś
'\u0179', # Ź
'\u017b', # Ż
];
$polishHtmlCodes = [
'&#261;', # ą
'&#263;', # ć
'&#281;', # ę
'&#322;', # ł
'&#324;', # ń
'&#243;', # ó
'&#347;', # ś
'&#378;', # ź
'&#380;', # ż
'&#260;', # Ą
'&#262;', # Ć
'&#280;', # Ę
'&#321;', # Ł
'&#323;', # Ń
'&#211;', # Ó
'&#346;', # Ś
'&#377;', # Ź
'&#379;', # Ż
];
$result = str_replace($specialChars, $polishHtmlCodes, json_encode($string));
var_dump(json_decode($result));
// prints
// e.g. 'Różowe okulary'
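The same kind of result can be sketched without a hand-written lookup table. This is an alternative to the approach above, not the answer author's method. It assumes the input is already valid UTF-8 (which `json_encode` also requires) and that the mbstring extension is loaded. `nonAsciiToHtmlEntities` is a hypothetical name, and unlike the table above it encodes every character from U+0080 to U+FFFF, not only the Polish ones.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): replace each
 * non-ASCII character with its decimal HTML entity.
 */
function nonAsciiToHtmlEntities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0xFFFF, 0, 0xFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

$encoded = nonAsciiToHtmlEntities('Różowe okulary');
var_dump($encoded); // string(24) "R&#243;&#380;owe okulary"

// Decoding the entities restores the original UTF-8 text.
var_dump(html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8') === 'Różowe okulary');
```

Storing entities rather than raw UTF-8 only makes sense if the text is always rendered as HTML. For a UTF-8 database, keeping the decoded string is usually the simpler choice.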