htmlspecialchars causing text to disappear
Question
I encountered a particular string (it's not completely printable, but you can see it below) that causes htmlspecialchars() to return a zero-length string. Is there any way to fix this?
// $Conn is an open mysqli connection (its setup is not shown in the post).
$Stmnt = 'SELECT subject_name FROM bans WHERE id = 2321';
$Fetch = $Conn->query($Stmnt);
if(!$Fetch)
    die('Could not query DB');

while($Row = $Fetch->fetch_array(MYSQLI_ASSOC))
{
    // Dump the raw value byte by byte.
    $RawName = $Row['subject_name'];
    $RawLen = strlen($RawName);
    echo('RAW NAME: ['.$RawName.']'.', LENGTH: ['.$RawLen.']'.'<br />');
    for($i = 0; $i < $RawLen; $i++)
        echo('CHAR '.$i.' = ['.$RawName[$i].'] (ORD: '.ord($RawName[$i]).')<br />');

    // Dump the escaped value the same way for comparison.
    $CleanName = htmlspecialchars($RawName, ENT_QUOTES, 'UTF-8');
    $CleanLen = strlen($CleanName);
    echo('CLEAN NAME: ['.$CleanName.']'.', LENGTH: ['.$CleanLen.']'.'<br />');
    for($i = 0; $i < $CleanLen; $i++)
        echo('CHAR '.$i.' = ['.$CleanName[$i].'] (ORD: '.ord($CleanName[$i]).')<br />');
}
$Fetch->close();
echo('DONE');
Output:
RAW NAME: [━═★ Coммander Fι5н �], LENGTH: [31]
CHAR 0 = [�] (ORD: 226)
CHAR 1 = [�] (ORD: 148)
CHAR 2 = [�] (ORD: 129)
CHAR 3 = [�] (ORD: 226)
CHAR 4 = [�] (ORD: 149)
CHAR 5 = [�] (ORD: 144)
CHAR 6 = [�] (ORD: 226)
CHAR 7 = [�] (ORD: 152)
CHAR 8 = [�] (ORD: 133)
CHAR 9 = [ ] (ORD: 32)
CHAR 10 = [C] (ORD: 67)
CHAR 11 = [o] (ORD: 111)
CHAR 12 = [�] (ORD: 208)
CHAR 13 = [�] (ORD: 188)
CHAR 14 = [�] (ORD: 208)
CHAR 15 = [�] (ORD: 188)
CHAR 16 = [a] (ORD: 97)
CHAR 17 = [n] (ORD: 110)
CHAR 18 = [d] (ORD: 100)
CHAR 19 = [e] (ORD: 101)
CHAR 20 = [r] (ORD: 114)
CHAR 21 = [ ] (ORD: 32)
CHAR 22 = [F] (ORD: 70)
CHAR 23 = [�] (ORD: 206)
CHAR 24 = [�] (ORD: 185)
CHAR 25 = [5] (ORD: 53)
CHAR 26 = [�] (ORD: 208)
CHAR 27 = [�] (ORD: 189)
CHAR 28 = [ ] (ORD: 32)
CHAR 29 = [�] (ORD: 226)
CHAR 30 = [�] (ORD: 148)
CLEAN NAME: [], LENGTH: [0]
DONE
Recommended answer
I understand now why it's returning a zero-length string. Sorry for asking this question; I should have researched more before posting. Anyway, the answer is the following:
From the PHP manual page for htmlspecialchars:
If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.
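The behaviour quoted above can be reproduced without the database. The sketch below rebuilds the string from the ORD values printed in the question's dump; the variable names are illustrative, and mb_check_encoding() assumes the mbstring extension is loaded.

```php
<?php
// Byte values copied from the ORD dump in the question (31 bytes).
$bytes = [226,148,129, 226,149,144, 226,152,133, 32, 67,111, 208,188, 208,188,
          97,110,100,101,114, 32, 70, 206,185, 53, 208,189, 32, 226,148];
$rawName = implode('', array_map('chr', $bytes));

var_dump(strlen($rawName));                                 // int(31)
var_dump(mb_check_encoding($rawName, 'UTF-8'));             // bool(false)
var_dump(htmlspecialchars($rawName, ENT_QUOTES, 'UTF-8'));  // string(0) ""
```

Passing ENT_QUOTES explicitly matters here: it replaces the default flag set, so neither ENT_SUBSTITUTE nor ENT_IGNORE is active.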
Then I asked myself what is "invalid" about this string. The Wikipedia page for UTF-8 gives a good diagram of the encoding. All code points representing plain ASCII text are in the range 0-127 (the most significant bit of the byte is always 0).
If a byte's most significant bit is 1 (decimal 128 to 255), it tells a UTF-8 compliant parser that the byte belongs to a multi-byte sequence. The lead byte starts with the bits 11 and announces how long the sequence is, and every following byte of that sequence must start with a 1 followed by a 0.
In this string there is a case where a byte over 127 is not followed by all the bytes beginning with 1 and 0 that it requires. The dump ends with 226, 148: byte 226 opens a three-byte sequence, but only one continuation byte follows before the string ends, so the value looks truncated. Therefore it is invalid UTF-8.
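The bit-pattern rule can be turned into a small scanner that reports where the encoding first breaks. This is a simplified sketch: firstInvalidUtf8Offset() is a hypothetical helper, not a PHP built-in, and it checks only lead and continuation bit patterns (it does not reject overlong forms or surrogates). The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the byte offset of the first malformed
// UTF-8 sequence, or -1 if the bit patterns are all consistent.
function firstInvalidUtf8Offset(string $s): int
{
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                  // 0xxxxxxx: single-byte ASCII
            $need = 0;
        } elseif (($b & 0xE0) === 0xC0) { // 110xxxxx: lead of a 2-byte sequence
            $need = 1;
        } elseif (($b & 0xF0) === 0xE0) { // 1110xxxx: lead of a 3-byte sequence
            $need = 2;
        } elseif (($b & 0xF8) === 0xF0) { // 11110xxx: lead of a 4-byte sequence
            $need = 3;
        } else {
            return $i;                    // stray continuation byte or invalid lead
        }
        for ($k = 1; $k <= $need; $k++) {
            // Each continuation byte must exist and match 10xxxxxx.
            if ($i + $k >= $len || (ord($s[$i + $k]) & 0xC0) !== 0x80) {
                return $i;
            }
        }
        $i += $need + 1;
    }
    return -1;
}

// Byte values copied from the ORD dump in the question.
$bytes = [226,148,129, 226,149,144, 226,152,133, 32, 67,111, 208,188, 208,188,
          97,110,100,101,114, 32, 70, 206,185, 53, 208,189, 32, 226,148];
$rawName = implode('', array_map('chr', $bytes));

echo firstInvalidUtf8Offset($rawName), "\n"; // 29
```

Offset 29 is the final 226: it needs two continuation bytes, and the string ends after one.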
Thanks to this SO post for the resolution, which in my opinion is to use the ENT_SUBSTITUTE flag (or, I suppose, ENT_IGNORE if you are sure that deleting these non-conforming bytes won't be a security issue).
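A minimal sketch of the two flags side by side, using a short stand-in string (ASCII text followed by a truncated three-byte sequence) rather than the original database value. The variable names are illustrative, and mb_check_encoding() assumes the mbstring extension is loaded.

```php
<?php
// Stand-in input: valid ASCII followed by a truncated 3-byte sequence.
$rawName = "Fish \xE2\x94";

$strict     = htmlspecialchars($rawName, ENT_QUOTES, 'UTF-8');
$substitute = htmlspecialchars($rawName, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$ignore     = htmlspecialchars($rawName, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

var_dump($strict === '');                           // bool(true): whole string is lost
var_dump(mb_check_encoding($substitute, 'UTF-8'));  // bool(true): bad bytes became U+FFFD
var_dump($substitute);                              // readable text plus a replacement character
var_dump($ignore);                                  // readable text with the bad bytes dropped
```

ENT_SUBSTITUTE is usually the safer of the two, since the PHP manual discourages ENT_IGNORE because of possible security implications. If the bytes are being cut off when the value is stored, fixing the column length or connection charset would address the cause rather than the symptom, but the post does not show enough to confirm that.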