Why is this A0 character appearing in my HTML::Element output?

Views: 56 | Posted: 2021/5/4 19:15:18 | Tags: perl, encoding

Question

I'm parsing an HTML document with a couple of Perl modules: HTML::TreeBuilder and HTML::Element. For some reason, whenever the content of a tag is just &nbsp;, which is to be expected, it gets returned by HTML::Element as a strange character I've never seen before:

(Image: http://www.freeimagehosting.net/uploads/2acca201ab.jpg)

I can't copy the character so can't Google it, couldn't find it in character map, and strangely when I search with a regular expression, \w finds it. When I convert the returned document to ANSI or UTF-8 it disappears altogether. I couldn't find any info on it in the HTML::Element documentation either.

How can I detect and replace this character with something more useful like null and how should I deal with strange characters like this in the future?

Solution

The character is "\xa0" (i.e. 160), which is the standard Unicode translation for &nbsp;. (That is, it's Unicode's non-breaking space.) You should be able to replace them with ordinary spaces using s/\xa0/ /g if you like.
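
As a minimal sketch of that substitution, the snippet below detects, replaces, and deletes the character using only core Perl. The $text variable is a hypothetical stand-in for whatever string HTML::Element hands back (for example from its as_text method); the variable names are illustrative and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for text returned by HTML::Element:
# two words separated by a non-breaking space, plus a trailing one.
my $text = "a\xa0b\xa0";

# Count occurrences of the non-breaking space (U+00A0, decimal 160).
my $count = () = $text =~ /\xa0/g;

# Replace each one with an ordinary space ...
(my $spaced = $text) =~ s/\xa0/ /g;

# ... or delete them outright, if an empty result is more useful.
(my $stripped = $text) =~ s/\xa0//g;

printf "found %d, spaced=[%s], stripped=[%s]\n", $count, $spaced, $stripped;
```

Whether to substitute a space or delete the character depends on what the downstream code expects; a cell that held only &nbsp; becomes an empty string under the second form.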
