Replacing white space with preg_replace causes invalid characters with UTF-8


Question

Our PHP web application (PHP 5.6.30 running on Windows Server 2008 R2) uses UTF-8 encoding but needs to import data from files that are encoded using Windows-1252. When the data is imported, it is converted to UTF-8 as follows.

iconv('Windows-1252', 'UTF-8', $value);
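
One way to check the conversion step in isolation is to compare the bytes before and after the call. The following is a minimal sketch, not part of the original import code: the helper name `cp1252ToUtf8` and the sample bytes are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper wrapping the iconv call from the question.
function cp1252ToUtf8($value)
{
    return iconv('Windows-1252', 'UTF-8', $value);
}

// Sample Windows-1252 bytes (illustrative input): 0xE0 is "à", 0x80 is "€".
$samples = ["\xE0", "\x80"];

foreach ($samples as $raw) {
    // Print the input bytes and the converted bytes as hex.
    echo bin2hex($raw), ' -> ', bin2hex(cp1252ToUtf8($raw)), PHP_EOL;
}
```

If the converted bytes for à come out as `c3a0`, the iconv step itself is doing its job and the corruption has to be introduced somewhere later in the pipeline.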

When we import the following sample data, the conversion works correctly for most of the Windows-1252 characters, but the à character in line 8 below is not converted correctly.

1;€
2;é
3;è
4;ë
5;ï
6;ä
7;á
8;à
9;ç
10;ß
11;ø 
12;í
13;ì
14;ñ
15;@
16;û

[Screenshot: the result of displaying this data on the website]

Does anyone know why PHP's iconv is not correctly converting the à character?

Recommended answer

I resolved this issue, and it ended up having nothing to do with iconv as I had initially thought. The required change was tiny, only one character, but it took me ages to hunt it down. It turns out that the offending statement was actually the following:

preg_replace('/\s+/', ' ', $columnvalue)

The purpose of this regular expression is to collapse runs of white space in the value into a single space, but because the data was UTF-8 encoded it had the side effect of corrupting the à character. In UTF-8, à is stored as the two bytes 0xC3 0xA0; without the u modifier the pattern is matched byte by byte, and the second byte was presumably treated as white space (0xA0 is the non-breaking space in single-byte Western encodings) and replaced. I resolved this by adding u (the Unicode modifier) to the end of the regular expression. So the expression became:

preg_replace('/\s+/u', ' ', $columnvalue)

After that, the page was encoded correctly.
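
The effect can be sketched at the byte level. Whether `\s` matches byte 0xA0 without the u modifier depends on the platform and locale, so the sketch below makes that behavior explicit with the character class `[\s\xA0]`; that class, the helper `isValidUtf8`, and the sample string are assumptions for illustration, not code from the original application.

```php
<?php
// Hypothetical helper: an empty pattern with the u modifier only matches
// when the subject is valid UTF-8 (preg_match returns false otherwise).
function isValidUtf8($value)
{
    return preg_match('//u', $value) === 1;
}

// "à" (U+00E0) in UTF-8 is the two bytes C3 A0; sample value for illustration.
$value = "voil\xC3\xA0  tout";

// Stand-in for the platform-dependent behavior: in byte mode, treat 0xA0
// as white space explicitly. This strips the second byte of "à".
$broken = preg_replace('/[\s\xA0]+/', ' ', $value);

// With the u modifier, pattern and subject are handled as UTF-8 characters,
// so the bytes of "à" are never split.
$fixed = preg_replace('/\s+/u', ' ', $value);

echo bin2hex($broken), PHP_EOL;          // 766f696cc320746f7574
var_dump(isValidUtf8($broken));          // bool(false)
var_dump($fixed === "voil\xC3\xA0 tout"); // bool(true)
```

One thing to keep in mind with the u modifier: if the subject is not valid UTF-8, preg_replace returns NULL instead of a string, so the replacement should run after the iconv conversion, not before it.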
