Tokenize or Split String Into Text & Html Tag Items

Question

I am looking for the most efficient way to accept a string and tokenize it into an array, separating out any HTML tag groups.

Example Input (String): 
    "I can format my text so that <strong>This is bold</strong> and this is not."

Desired Output (String[] array): 
    "I can format my text so that",
    "<strong>",
    "This is bold",
    "</strong>",
    "and this is not."

Alternate Output, Just As Good (String[] array): 
    "I",
    "can",
    "format",
    "my",
    "text",
    "so",
    "that",
    "<strong>",
    "This",
    "is",
    "bold",
    "</strong>",
    "and",
    "this",
    "is",
    "not."

I am unsure as to the best way to approach this problem. Any help would be appreciated.

Recommended Answer

You can use Regex.Split() with a pair of zero-length assertions to split at every position that is followed by < or preceded by >:

using System.Text.RegularExpressions;

string input = "I can format my text so that <strong>This is bold</strong> and this is not.";
string[] output = Regex.Split(input, "(?=<)|(?<=>)"); // split before each '<' and after each '>'
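The raw split keeps the surrounding spaces: the first element is "I can format my text so that " (with a trailing space) and the last is " and this is not." (with a leading space). A minimal sketch, assuming you want exactly the trimmed output from the question, trims each piece and drops empty ones (an empty first element appears, for instance, when the input starts with a tag). For the word-level alternate output it swaps in Regex.Matches with the pattern <[^>]*>|[^\s<]+, which is an assumption of this sketch and not part of the original answer. The helper names SplitTags and TokenizeWords are hypothetical, and neither pattern is an HTML parser: comments, attribute values containing '>', or a bare '<' in text will confuse them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split before each '<' and after each '>', then trim
// every piece and drop pieces that end up empty.
static string[] SplitTags(string text) =>
    Regex.Split(text, "(?=<)|(?<=>)")
         .Select(part => part.Trim())
         .Where(part => part.Length > 0)
         .ToArray();

// Hypothetical helper for the word-level output: each match is either a whole
// tag (<...>) or a run of characters that are neither whitespace nor '<'.
static string[] TokenizeWords(string text) =>
    Regex.Matches(text, @"<[^>]*>|[^\s<]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string input = "I can format my text so that <strong>This is bold</strong> and this is not.";

// Prints each tag/text piece on its own line, quoted.
foreach (string piece in SplitTags(input))
    Console.WriteLine($"\"{piece}\"");

// Prints the word-level tokens separated by " | ".
Console.WriteLine(string.Join(" | ", TokenizeWords(input)));
```

Because each piece is trimmed before the emptiness check, whitespace-only runs between adjacent tags (such as the space in "</b> <i>") are dropped as well; remove the Trim call if that spacing matters to you.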

(?=pattern) is known as a look-ahead assertion; it requires that pattern follow the current position.
(?<=pattern) is a look-behind assertion: the same idea, but it checks the characters before the current position.
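To see what each assertion contributes, this small sketch splits one input with the look-ahead alone, the look-behind alone, and both combined (the sample string "a<b>c</b>d" and the variable names are purely illustrative). Because the assertions match zero characters, Regex.Split cuts at a position without consuming the '<' or '>', so the tags survive intact in the output.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "a<b>c</b>d";

// Look-ahead only: zero-width matches before each '<' (indices 1 and 5).
string[] beforeTags = Regex.Split(sample, "(?=<)");   // "a", "<b>c", "</b>d"

// Look-behind only: zero-width matches after each '>' (indices 4 and 9).
string[] afterTags = Regex.Split(sample, "(?<=>)");   // "a<b>", "c</b>", "d"

// Both alternatives together cut on both sides of every tag.
string[] both = Regex.Split(sample, "(?=<)|(?<=>)");  // "a", "<b>", "c", "</b>", "d"

Console.WriteLine(string.Join(" | ", beforeTags));
Console.WriteLine(string.Join(" | ", afterTags));
Console.WriteLine(string.Join(" | ", both));
```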
