Determining and removing invisible characters from a string in PHP (%E2%80%8E)

Question

I have strings in PHP which I read from a database. The strings are URLs and at first glance they look good, but there seems to be some weird character at the end. In the address bar of the browser, the string '%E2%80%8E' gets appended to the URL, which breaks the URL.

I found this post on stripping the left-to-right-mark from a string in PHP and it seems related to my problem, but the solution does not work for me because my characters seem to be something else.

So how can I determine which character I have so I can remove it from the strings?

(I would post one of the URLs here as an example, but the Stack Overflow form strips the character at the end as soon as I paste it in.)

I know that I could allow only certain characters in the string and discard all others. But I would still like to know which character it is, and how it gets into the database.

Edit: The question has been answered, and the code given in the accepted answer works for me:

$str = preg_replace('/\p{C}+/u', "", $str);


Recommended answer

If the input is UTF-8 encoded, you can use a Unicode regex (see http://www.regular-expressions.info/unicode.html) to match and strip invisible control characters such as e2808e (the left-to-right mark, U+200E). Use the u (PCRE_UTF8) modifier together with \p{C} or \p{Other}.

Strip out all invisibles

$str = preg_replace('/\p{C}+/u', "", $str);

Here \p{C} is the short form of \p{Other}, the Unicode "Other" general category.
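As a minimal sketch of how the stripping behaves in practice: the URL and the variable names below are made up for illustration and are not from the original post. Note that \p{C} also covers ordinary control characters such as tabs and newlines, so \p{Cf} (format characters only) may be a safer choice when those must survive.

```php
<?php
// Hypothetical URL ending in a left-to-right mark (U+200E, bytes E2 80 8E).
$url = "http://example.com/page\xE2\x80\x8E";

// Broad: remove everything in the Unicode "Other" category.
$clean = preg_replace('/\p{C}+/u', '', $url);

// With the u modifier, preg_replace() returns null if $url is not valid UTF-8.
if ($clean === null) {
    $clean = $url; // assumption: fall back to the untouched input
}

// Narrower alternative: \p{Cf} removes only format characters and
// leaves control characters such as tabs and newlines in place.
$text   = "a\tb\xE2\x80\x8E";
$broad  = preg_replace('/\p{C}+/u', '', $text);  // tab removed as well
$narrow = preg_replace('/\p{Cf}+/u', '', $text); // tab kept

var_dump($clean === "http://example.com/page");
var_dump($broad === "ab", $narrow === "a\tb");
```

The null check matters when the database column might hold data in another encoding, since a failed match would otherwise silently replace the string with null.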

Detect/identify invisibles

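// Sample string: three dots, each followed by an invisible character
// (U+200E left-to-right mark, U+200B zero-width space, U+200F right-to-left mark).
$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// Collect every invisible character together with its byte offset.
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE))
{
  echo "<pre>\n";
  foreach ($out[0] as $k => $v) {
    // $v[0] is the matched character, $v[1] its byte offset in $str
    echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
  }
  echo "</pre>";
}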

Output

detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9
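The hex bytes above can also be turned into Unicode code points, which are easier to look up. The following is only a sketch: find_invisibles is a hypothetical helper name, and it assumes the mbstring extension (for mb_ord, available since PHP 7.2) is installed.

```php
<?php
// Hypothetical helper: return [code point, byte offset] for each \p{C} character.
function find_invisibles(string $str): array
{
    $found = [];
    if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE)) {
        foreach ($out[0] as [$char, $offset]) {
            // mb_ord() decodes the UTF-8 bytes into the numeric code point.
            $found[] = [sprintf('U+%04X', mb_ord($char, 'UTF-8')), $offset];
        }
    }
    return $found;
}

// Same sample string as in the detection snippet.
$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";
foreach (find_invisibles($str) as [$codePoint, $offset]) {
    echo "$codePoint @ offset $offset\n";
}
```

For the sample string this should report U+200E, U+200B and U+200F at byte offsets 1, 5 and 9, matching the hex output above.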

Test at eval.in

To identify a character, search Google for its hex bytes, for example restricted to fileformat.info:

@google: site:fileformat.info e2808e
