Why would var_dump return a bigger value than the string length?


Question

I am fetching song lyrics through an API and converting the lyrics string into an array of words. I am seeing some unusual behavior from the preg_replace function. While debugging with var_dump, I noticed that it reports a length of 10 for the string "you", which tells me something might be wrong. After that, preg_replace acts weirdly.

Here is my code:

$source = get_chart_lyrics_data("madonna","frozen");
$pieces = explode("\n", $source);
$lyrics = array();
for($i=0;$i<count($pieces);$i++){
  if($i>10){
    $words = explode(" ",$pieces[$i]);
    foreach($words as $_word){
      if($_word=="")
        continue;
      var_dump($_word);
      $word = strtolower($_word);
      var_dump($word);
      $word = trim($word);
      var_dump($word);
      $word = preg_replace("/[^A-Za-z ]/", '', $word);
      var_dump($word);
      $lyrics[$word]++;
    }
  }
}

These are the first 4 lines this code outputs:

string(10) "You"
string(10) "you"
string(10) "you"
string(8) "lyricyou"

Why does var_dump report a length of 10 for "you"? And why does preg_replace behave like that?

Thanks.

Answer

The likeliest answer is that the string contains more than the visible "you", for example non-printable characters. The number var_dump reports is the length in bytes, not the number of characters you see on screen. To figure out what exactly the string contains, you'll have to look at the raw bytes. Do this with echo bin2hex($word). This outputs a string like 666f6f..., where every 2 characters represent one byte in hexadecimal notation. You can make that more readable with something like:

echo join(' ', str_split(bin2hex($word), 2));
// 66 6f 6f ...

Now use your favourite ASCII/Unicode table (depending on the encoding of the string) to figure out which individual characters those bytes represent and where you got them from.
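As a worked illustration of the byte inspection described above, here is a minimal sketch. The input string and the helper name hex_dump_bytes are assumptions made for this example: "<Lyric>You" is just one hypothetical input that happens to reproduce the lengths shown in the question (10 before preg_replace, 8 after); what the API actually returns is not known here. Markup like this would also be invisible when var_dump output is viewed in a browser, which is why the sketch prints an escaped copy as well.

```php
<?php
// Hypothetical input: a word with leftover markup, chosen only because it
// matches the lengths in the question. The real API data may differ.
$word = "<Lyric>You";

// Illustrative helper (not from the original answer's code beyond bin2hex/str_split):
// returns the bytes of $s as space-separated hex pairs.
function hex_dump_bytes(string $s): string {
    return implode(' ', str_split(bin2hex($s), 2));
}

echo strlen($word), "\n";            // 10
echo hex_dump_bytes($word), "\n";    // 3c 4c 79 72 69 63 3e 59 6f 75
echo htmlspecialchars($word), "\n";  // &lt;Lyric&gt;You (safe to view in a browser)

// Same cleaning steps as in the question: only "<" and ">" are stripped,
// so the letters of the tag name survive.
$clean = preg_replace('/[^A-Za-z ]/', '', strtolower(trim($word)));
echo strlen($clean), ' ', $clean, "\n"; // 8 lyricyou
```

In the dump, 3c and 3e are "<" and ">" in ASCII. If the real data looks similar, removing the markup first (for instance with strip_tags) before splitting into words may be more appropriate than filtering characters afterwards.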

Perhaps your string is encoded in UTF-16, in which case you should see a telltale 00 byte next to each ASCII character.
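To show what those 00 bytes would look like, here is a small sketch. The sample is an assumption: it is "you" written out by hand as UTF-16LE (little-endian, no byte order mark), not data from the question. The conversion step uses mb_convert_encoding and only runs if the mbstring extension is loaded.

```php
<?php
// Hypothetical sample: "you" as UTF-16LE, built by hand for illustration.
$utf16 = "y\x00o\x00u\x00";

echo strlen($utf16), "\n";                               // 6
echo implode(' ', str_split(bin2hex($utf16), 2)), "\n";  // 79 00 6f 00 75 00

// Convert to UTF-8 so that byte length and visible length agree again.
if (function_exists('mb_convert_encoding')) {
    $utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
    echo strlen($utf8), ' ', $utf8, "\n";                // 3 you
}
```

If the dump of the real string shows no such 00 bytes, UTF-16 is unlikely to be the cause and the extra bytes come from somewhere else.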
