Display first n characters of a string by stripping HTML


Problem description

I have a long string and I want to display the first 50 characters of it (without including the HTML content). Can anyone suggest any method?

Some sample HTML code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
			<html>
			   <head>
				  <title>Paula - Microsoft Word - Comparison of the different image compression algorithms.doc</title>
				  <title></title><link href="/DigitalLibrary/extData.aspx?filePath=stylesheet.css&epub=b3aab940-fb48-4f6c-ae63-d599f4893795_aguilera_rpt.epub" type="text/css" rel="stylesheet"/>
			   </head>
			   <body>
				  
      <div class="body">
         <div id="frontmatter">
            <div id="titlepage">
            </div>    
         </div>
      </div>
   

<a id="1"></a><p><pre> 

Comparison of different image

compression formats 

Solution

jQuery is a powerful way to extract the content of an HTML document.

However, if you can't use jQuery, the Regex class can be used to extract the content between <title> and </title>, which is what the question asks for, as shown below:

using System;
using System.Text.RegularExpressions;

string htmlText = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN"" ""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">
            <html>
               <head>
                  <title>Paula - Microsoft Word - Comparison of the different image compression algorithms.doc</title>
                  <title></title><link href=""/DigitalLibrary/extData.aspx?filePath=stylesheet.css&epub=b3aab940-fb48-4f6c-ae63-d599f4893795_aguilera_rpt.epub"" type=""text/css"" rel=""stylesheet""/>
               </head>
               <body>
                <div class=""body"">
                    <div id=""frontmatter"">
                        <div id=""titlepage"">
                        </div>
                    </div>
                </div>
            <a id=""1"">";

Match match = Regex.Match(htmlText, @"<title>([^<>]*)</title>",
    RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

// Group 1 holds the text between the first <title> and </title>.
if (match.Success && match.Groups.Count > 1)
    Console.WriteLine(match.Groups[1].Value);

// Output:
// Paula - Microsoft Word - Comparison of the different image compression algorithms.doc
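
The question asks for only the first 50 characters, so the extracted title still has to be cut to length. The following is a minimal sketch of that step, not part of the original answer: the class `TitlePreview` and the helper `FirstChars` are hypothetical names, and the input is a shortened version of the sample markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitlePreview
{
    // Hypothetical helper: returns at most n characters of text.
    public static string FirstChars(string text, int n)
    {
        if (string.IsNullOrEmpty(text) || n <= 0)
            return string.Empty;
        return text.Length <= n ? text : text.Substring(0, n);
    }

    public static void Main()
    {
        // Shortened sample input (assumption: only the <title> matters here).
        string htmlText = "<html><head><title>Paula - Microsoft Word - Comparison of the "
            + "different image compression algorithms.doc</title></head></html>";

        Match match = Regex.Match(htmlText, @"<title>([^<>]*)</title>",
            RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

        // Print only the first 50 characters of the captured title.
        if (match.Success)
            Console.WriteLine(FirstChars(match.Groups[1].Value, 50));
    }
}
```

`Substring` counts UTF-16 code units, so a cut can in principle fall inside a surrogate pair; for plain Latin text like the sample this does not matter.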


Please refer to the links below for HTML tag stripping.

For C#:

Convert HTML to Plain Text

HTML Tag Stripper

For SQL:

MS SQL Function
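
The linked articles are not reproduced here. As a rough illustration of the general idea, the sketch below removes tags from the whole document and keeps the first n characters of what remains. It rests on several assumptions that do not come from the original answers: the names `HtmlPreview`, `StripTags`, and `Preview` are hypothetical, the content of `<head>`, `<script>`, and `<style>` is treated as non-displayable and dropped, and the input is a shortened version of the sample markup. Regex-based stripping is fragile on malformed markup, so a real HTML parser may be a better fit for untrusted input.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlPreview
{
    // Hypothetical helper: returns the visible text of an HTML fragment.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Drop elements whose content is not body text (assumption of this sketch).
        string text = Regex.Replace(html, @"<(head|script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop comments, then replace every remaining tag (including DOCTYPE) with a space.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; into characters and collapse runs of whitespace.
        text = WebUtility.HtmlDecode(text);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Hypothetical helper: first n characters of the stripped text.
    public static string Preview(string html, int n)
    {
        string text = StripTags(html);
        if (n <= 0)
            return string.Empty;
        return text.Length <= n ? text : text.Substring(0, n);
    }

    public static void Main()
    {
        string html = "<html><head><title>Paula</title></head><body><div class=\"body\"></div>"
            + "<a id=\"1\"></a><p><pre>\n\nComparison of different image\n\n"
            + "compression formats </pre></p></body></html>";

        // Prints: Comparison of different image compression formats
        Console.WriteLine(Preview(html, 50));
    }
}
```

Tags are replaced with a space rather than an empty string so that words separated only by markup, such as adjacent paragraphs, do not run together; the whitespace collapse afterwards removes the extra spaces.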


Use jQuery.
use

