Remove non-ASCII characters from string


Question

I'm getting strange characters when pulling data from a website:

Â

How can I remove anything that isn't a standard (non-extended) ASCII character?

Recommended answer

A regex replace would be the best option. Using $str as an example string, match it against [:print:], which is a POSIX character class:

$str = 'aAÂ';
$str = preg_replace('/[[:^print:]]/', '', $str); // should be aA

[:print:] matches any printable character. Its negation, [:^print:], matches any non-printable character, so replacing every match with an empty string leaves only the printable characters. Any characters that are not part of the current character set will be removed.
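To make the behaviour concrete, here is a minimal sketch that wraps the same pattern in a function. The name strip_non_printable is hypothetical, and the sketch assumes the pattern is used without the u modifier, under the default "C" locale, with the source file saved as UTF-8. Under those assumptions, control characters such as tab and newline are removed along with the bytes of Â, because none of them is printable ASCII.

```php
<?php
// Hypothetical helper (not part of the original answer): applies the
// answer's pattern and returns the cleaned string.
function strip_non_printable(string $s): string
{
    // [[:^print:]] matches every byte outside the printable class.
    // preg_replace() returns null on error, so fall back to ''.
    return preg_replace('/[[:^print:]]/', '', $s) ?? '';
}

// Tab, newline and the two UTF-8 bytes of "Â" are all non-printable here.
echo strip_non_printable("a\tA\nÂ!"), PHP_EOL;
```

If whitespace such as newlines must survive, the pattern would need to exclude it from the negated class explicitly.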

Note: Before using this method, make sure that your current character set is ASCII. POSIX character classes support both ASCII and Unicode, and they match only according to the current character set. As of PHP 5.6, the default charset is UTF-8.
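If relying on the current character set is a concern, one alternative (a different technique from the POSIX class above, shown only as a sketch) is to spell out the printable ASCII byte range directly. The function name keep_printable_ascii is hypothetical; the range \x20-\x7E covers space through tilde.

```php
<?php
// Hypothetical helper: keeps only bytes 0x20 (space) through 0x7E (~).
// An explicit byte range does not depend on locale or charset settings.
function keep_printable_ascii(string $s): string
{
    // No u modifier: the subject is treated as raw bytes, so every byte of
    // a multi-byte UTF-8 character falls outside the range and is removed.
    return preg_replace('/[^\x20-\x7E]/', '', $s) ?? '';
}

$str = 'aAÂ';
$str = keep_printable_ascii($str); // 'aA' when the file is saved as UTF-8
```

The trade-off is that the intent is slightly less readable than [:print:], but the result no longer varies with the environment.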
