Split string by HTML entities?


Question

My string contains a lot of HTML entities, like this:

&#x22;Hello&nbsp;&lt;everybody&gt;&nbsp;there&#x22;

I want to split it on the HTML entities to get this:

Hello
everybody
there

Can anybody suggest a way to do this? Maybe using a regex?

Solution

It looks like you can just split on the regex &[^;]*;. That is, the delimiters are strings that start with &, end with ;, and contain anything except ; in between.

If you can have multiple delimiters in a row, and you don't want the empty strings between them, just use (&[^;]*;)+ (or, in general, a (delim)+ pattern).

If delimiters can appear at the beginning or end of the string, and you don't want the empty strings they cause, just trim them away before you split.
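One caveat worth illustrating: &[^;]*; treats everything from any & up to the next ; as a delimiter, so a bare ampersand in ordinary text can swallow real content. The sketch below compares it with a stricter pattern. The stricter rule (a name, a decimal code, or a hex code between & and ;) is a simplifying assumption made here, not the full HTML definition of an entity, and the sample text is made up for the comparison.

```javascript
// LOOSE is the pattern from the answer; STRICT is an assumed, simplified entity shape:
// "&name;", "&#123;" or "&#x1F;". Neither checks that the entity name really exists.
const LOOSE = /&[^;]*;/;
const STRICT = /&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i;

// Made-up sample with a bare "&" that is not part of an entity.
const text = "Tom & Jerry; friends&nbsp;forever";

const looseParts = text.split(LOOSE);
const strictParts = text.split(STRICT);

console.log(JSON.stringify(looseParts));
// ["Tom "," friends","forever"]
console.log(JSON.stringify(strictParts));
// ["Tom & Jerry; friends","forever"]
```

If the input is known to contain only well-formed entities, as in the question, the simpler pattern is enough.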


Example

Here's a snippet to demonstrate the above ideas (see also on ideone.com):

var s = "&#x22;Hello&nbsp;&lt;everybody&gt;&nbsp;there&#x22;";

print (s.split(/&[^;]*;/));
// ,Hello,,everybody,,there,

print (s.split(/(?:&[^;]*;)+/));
// ,Hello,everybody,there,

print (
   s.replace(/^(?:&[^;]*;)+/, "")
    .replace(/(?:&[^;]*;)+$/, "")
    .split(/(?:&[^;]*;)+/)
);
// Hello,everybody,there
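As a possible variation on the last snippet, the empty strings at the edges can be filtered out after the split instead of trimming the delimiters beforehand. This is a minimal sketch: the function name splitByEntities is hypothetical, and it uses console.log rather than print so that it runs in Node.js or a browser console.

```javascript
// Hypothetical helper; the name is not from the original answer.
function splitByEntities(input) {
  // A run of one or more "&...;" sequences counts as a single delimiter.
  const delimiter = /(?:&[^;]*;)+/;
  // Delimiters at the start or end produce empty strings; drop them here
  // rather than trimming the input before splitting.
  return input.split(delimiter).filter((part) => part !== "");
}

const s = "&#x22;Hello&nbsp;&lt;everybody&gt;&nbsp;there&#x22;";
const parts = splitByEntities(s);
console.log(parts.join(","));
// Hello,everybody,there
```

Filtering also removes empty strings that arise in the middle of the input, which the (delim)+ pattern already prevents, so the result matches the trim-then-split version.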
