In-line detection of HTML escape codes


Problem description

say i have a for loop that would iterate through every character and
put a space between every 80th one, in effect forcing word wrap to
occur. this can be implemented easily using a regular expression.
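A minimal sketch of that baseline, assuming modern PHP (7+) and a hypothetical helper name `insert_breaks`. It shows the plain loop next to a regular expression that should give the same result; neither version tries to respect existing whitespace or word boundaries.

```php
<?php
// Hypothetical helper: insert a space after every $limit characters
// (byte-based, so it assumes single-byte text).
function insert_breaks(string $text, int $limit = 80): string
{
    $out = '';
    $count = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $out .= $text[$i];
        $count++;
        // Only break when more text follows, so no trailing space is added.
        if ($count === $limit && $i < $len - 1) {
            $out .= ' ';
            $count = 0;
        }
    }
    return $out;
}

// Regex version of the same rule: $limit characters followed by at least one more.
function insert_breaks_regex(string $text, int $limit = 80): string
{
    return preg_replace('/(.{' . $limit . '})(?=.)/s', '$1 ', $text);
}

echo insert_breaks('aaaaa', 2), "\n";        // aa aa a
echo insert_breaks_regex('aaaaa', 2), "\n";  // aa aa a
```

The loop version is the one that can later be extended to skip URLs or entities, since it has a counter that can be adjusted per character.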

if i wanted to improve on this, and make it so stuff in URLs didn't
count towards that 80 character limit, a regular expression would not
suffice. however, a simple for loop does.

so now i'm curious how to account for html escape codes such as
&nbsp; and &copy;. since i have a for-loop, in-line detection seems
to be the way to go, although i'm not really sure how to implement it.

i was thinking i could sorta simulate a finite state machine that
returns the current state when the function is called. the current
state would then be passed back into the finite state machine along with
the next character in the string, and the new state could be returned.
if the state returned is an accept state, we only count all the
characters in the string of characters that was passed to the FSM
once, and if no state is returned, we could count all the characters
towards the 80 character limit.
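One way the step function described above could look, as a rough sketch only. All names here (`KNOWN_ENTITIES`, `entity_step`, `visible_length`) are made up for illustration, the entity list is a tiny hand-picked subset, and the "state" is simply the text buffered since the last `&`. It assumes PHP 7.1+.

```php
<?php
// Illustrative subset only; a real list would be much longer.
const KNOWN_ENTITIES = ['&nbsp;', '&copy;', '&amp;', '&lt;', '&gt;'];

// One FSM step: takes the buffered text plus the next character and
// returns [new buffer, verdict], verdict being 'pending', 'accept' or 'reject'.
function entity_step(string $buffer, string $ch): array
{
    $candidate = $buffer . $ch;
    foreach (KNOWN_ENTITIES as $entity) {
        if ($candidate === $entity) {
            return ['', 'accept'];
        }
        if (strncmp($entity, $candidate, strlen($candidate)) === 0) {
            return [$candidate, 'pending'];   // still a prefix of some entity
        }
    }
    return ['', 'reject'];
}

// Counts characters towards the limit, with each complete entity counting once.
function visible_length(string $text): int
{
    $count = 0;
    $buffer = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        [$next, $verdict] = entity_step($buffer, $text[$i]);
        if ($verdict === 'accept') {
            $count += 1;
        } elseif ($verdict === 'reject') {
            if ($buffer !== '') {
                $count += strlen($buffer);    // failed candidate counts in full
                $i--;                         // re-read this character from the start state
            } else {
                $count += 1;                  // ordinary character
            }
        }
        $buffer = $next;
    }
    return $count + strlen($buffer);          // unfinished candidate at end of input
}

echo visible_length('a&nbsp;b'), "\n";  // 3
echo visible_length('&asdf;'), "\n";    // 6
```

Each character is looked at no more than twice, so the pass stays linear in the length of the string. With a full entity list, a trie or a hash lookup would probably be a better fit than the linear scan in `entity_step`.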

however, i'm not really sure how to implement the above function. one
problem is that there seem to be a lot of html escape codes, and...
yeah...

any help would be appreciated - thanks! :)

Solution

On Wed, 02 Jun 2004 21:05:08 -0700, yawnmoth wrote:

> so now i'm curious how to account for html escape codes such as
> &nbsp; and &copy;. since i have a for-loop, in-line detection seems
> to be the way to go, although i'm not really sure how to implement it.



One word: regular expressions.

--
Trust me, I know what I'm doing. (Sledge Hammer)


On Thu, 03 Jun 2004 00:09:14 -0400, Mladen Gogala
<go****@sbcglobal.net> wrote:

> On Wed, 02 Jun 2004 21:05:08 -0700, yawnmoth wrote:
>
>> so now i'm curious how to account for html escape codes such as
>> &nbsp; and &copy;. since i have a for-loop, in-line detection seems
>> to be the way to go, although i'm not really sure how to implement it.
>
> One word: regular expressions.



because half of what i'm trying to do *can't* be done using regular
expressions (you can verify this for yourself by using the pumping
lemma on it), why would i want to do the other half in regular
expressions? i want my code to have a big-O efficiency as close to
O(n) as possible - not O(n**3), or whatever.

also, what exact regular expression would you propose? &[^&;]*; isn't
a good one because not just any string of characters between a & and ;
can make an html escape code - only certain ones can. an example of
one that isn't is &asdf;

i suppose i could do something like &(nbsp|amp|gt|lt| etc ); or
&((n(bsp|tilde))|amp);, but... the former isn't going to be uber fast
(especially since i already have to loop through the string, anyway),
and... the latter is going to be *very* hard to write, having tons of
parentheses, being very long, etc.

additionally, i don't know what every single html escape code is.
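On not knowing every escape code: PHP can list the named entities it knows through `get_html_translation_table()`, so the list does not have to be typed by hand. The sketch below is only an illustration, with `is_known_entity` as a made-up helper name. It builds a lookup set from that table and also generates the long alternation regex from it. It assumes the HTML 4.01 entity set in UTF-8.

```php
<?php
// Entities known to PHP for HTML 4.01, flipped so the entity string is the key.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$known = array_flip($table);            // e.g. '&nbsp;' => "\u{A0}"

// Hypothetical helper: exact-match test against the lookup set.
function is_known_entity(string $candidate, array $known): bool
{
    return isset($known[$candidate]);
}

// The "long alternation" regex, generated rather than written by hand.
$names = [];
foreach (array_keys($known) as $entity) {
    $names[] = preg_quote(substr($entity, 1, -1), '/');   // strip '&' and ';'
}
$pattern = '/&(?:' . implode('|', $names) . ');/';

var_dump(is_known_entity('&copy;', $known));     // bool(true)
var_dump(is_known_entity('&asdf;', $known));     // bool(false)
var_dump(preg_match($pattern, 'x &nbsp; y'));    // int(1)
```

The lookup set fits the for-loop approach: buffer from `&` up to the next `;` and check the buffer once. Numeric references such as `&#160;` are not covered by this table and would need their own check.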

anyway, as i said before, i think the way to go is to use some
implementation of a finite state machine that returns the current
state for each character of input. regular expressions are
unsuitable for this task because they don't return states, etc.


Regarding this well-known quote, often attributed to yawnmoth's famous "2
Jun 2004 21:05:08 -0700" speech:

> say i have a for loop that would iterate through every character and
> put a space between every 80th one, in effect forcing word wrap to
> occur. this can be implemented easily using a regular expression.
>
> if i wanted to improve on this, and make it so stuff in URLs didn't
> count towards that 80 character limit, a regular expression would not
> suffice. however, a simple for loop does.
>
> so now i'm curious how to account for html escape codes such as
> &nbsp; and &copy;. since i have a for-loop, in-line detection seems
> to be the way to go, although i'm not really sure how to implement it.
>
> i was thinking i could sorta simulate a finite state machine that
> returns the current state when the function is called. the current
> state would then be passed back into the finite state machine along with
> the next character in the string, and the new state could be returned.
> if the state returned is an accept state, we only count all the
> characters in the string of characters that was passed to the FSM
> once, and if no state is returned, we could count all the characters
> towards the 80 character limit.
>
> however, i'm not really sure how to implement the above function. one
> problem is that there seem to be a lot of html escape codes, and...
> yeah...
>
> any help would be appreciated - thanks! :)


This isn't tested code (look it over, it's a bit late, locally), but I
think if you html_entity_decode() anything that looks like an HTML entity,
and the result is only one character, then you can safely assume it's a
valid HTML entity.

Ref: http://us3.php.net/manual/en/functio...ity-decode.php

<?php
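The code from this reply is cut off after the opening tag above. As a separate, minimal sketch of the idea the reply describes (not the poster's code), the check might look like the following. The helper name `looks_like_valid_entity` and the shape-check regex are assumptions made for illustration, and UTF-8 with the HTML 4.01 entity set is assumed.

```php
<?php
// Hypothetical helper, not the code from the original reply.
function looks_like_valid_entity(string $candidate): bool
{
    // Cheap shape check first: '&', then a name or numeric reference, then ';'.
    $shape = '/^&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);\z/';
    if (!preg_match($shape, $candidate)) {
        return false;
    }
    $decoded = html_entity_decode($candidate, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // An unknown entity comes back unchanged; a known one collapses to a
    // single character (possibly several bytes in UTF-8, hence the /u match).
    return $decoded !== $candidate && preg_match('/^.\z/us', $decoded) === 1;
}

var_dump(looks_like_valid_entity('&nbsp;'));  // bool(true)
var_dump(looks_like_valid_entity('&copy;'));  // bool(true)
var_dump(looks_like_valid_entity('&asdf;'));  // bool(false)
```

Inside the for loop, this would be called once per candidate: when a `&` is seen, buffer up to the next `;`, test the buffer, and count it as one character if the test passes or as its full length if not.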

