将 cURL 响应编码为 UTF-8 时的奇怪行为 [英] Strange behaviour when encoding cURL response as UTF-8

查看:62
本文介绍了将 cURL 响应编码为 UTF-8 时的奇怪行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在向第三方网站发出 cURL 请求,该网站返回一个文本文件,我需要对其进行一些字符串替换,以将某些字符替换为其 html 实体等效项,例如,我需要替换 í by í.

I'm making a cURL request to a third party website which returns a text file on which I need to do a few string replacements to replace certain characters by their html entity equivalents e.g I need to replace í by í.

直接在响应上使用 string_replace/preg_replace_callback 不会导致匹配(无论是直接搜索 í 还是使用其十六进制代码 \x00\xED),所以我在执行替换之前使用了 utf8_encode().但是 utf8_encode 将所有 í 字符替换为 Ã.

Using string_replace/preg_replace_callback on the response directly didn't result in matches (whether searching for í directly or using its hex code \x00\xED), so I used utf8_encode() before carrying out the replacement. But utf8_encode replaces all the í characters by Ã.

为什么会发生这种情况,使用 php 对任意一段文本执行 UTF-8 替换的正确方法是什么?

Why is this happening, and what's the correct approach to carrying out UTF-8 replacements on an arbitrary piece of text using php?

*edit - 一些进一步的研究揭示

*edit - some further research reveals

utf8_decode("í") == í;
utf8_encode("í") == í;
utf8_encode("\xc3\xad") ==  í;

推荐答案

您可能在 php 源代码中指定要通过字符串文字替换的字符/字符串?如果这样做,那么这些字符串文字的值取决于您保存 php 文件的编码.因此,当您看到字符 í 时,文字值可能是拉丁编码的 í,例如 8859-1 编码,或者可能是它的windows cp1252 í,或者它的 utf8 í,甚至 utf32 í...我不知道其中有多少是不同的,但我知道至少有一些有不同的字节表示,因此在 php 字符串比较中不会匹配.

You're probably specify the characters/strings you want replaced via string literals in the php source code? If you do, then the values of those string literals depends on the encoding you save your php file in. So while you see the character í, maybe the literal value is a latin encoded í, like maybe 8859-1 encoding, or maybe its windows cp1252 í, or maybe its utf8 í, or maybe even utf32 í...i dont know off hand how many of those are different, but i know at least some have different byte representations, and so wont match in a php string comparison.

我的观点是,您需要指定正确的字符,以匹配传入文本所采用的任何编码.

my point is, you need to specify the correct character that will match whatever encoding your incoming text is in.

这是一个不使用文字的例子

heres an example without using literals

$iso8859_1 = chr(236);
$utf8 = utf8_encode(chr(236));

请注意,如果您决定将文件编码更改为 utf8,文本编辑器可能会也可能不会在您更改编码时转换现有字符.我见过编辑器在更改编码时做了非常奇怪的事情.从一个新文件开始.

be warned, text editors may or may not convert the existing characters when you change the encoding, if you decide to change the file encoding to utf8. I've seen editors do really bizarre things when changing the encoding. Start with a fresh file.

也-仅仅因为另一台服务器声称它的 utf8,并不意味着它真的是.

also-just because the other server claims its utf8, doesn't mean it really is.

这篇关于将 cURL 响应编码为 UTF-8 时的奇怪行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆