How to read only the text and get that text as a string?
Question
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<style type="text/css">
<!--
.style3 {font-family: "Trebuchet MS"; font-size: 30px; font-weight: bold; color: #333333; }
body {
background-image: url();
}
.style18 {font-family: "Trebuchet MS"; font-size: 30px; font-weight: bold; color: #F9F9F8; }
.style8 {color: #FFFFFF}
-->
</style>
</head>
<body>
<table width="99%" cellspacing="0" cellpadding="0">
<tr>
<td height="20" valign="top" bgcolor="#719315"> </td>
</tr>
<tr>
<td height="151" valign="top" bgcolor="#719315"><table width="90%" align="center" cellpadding="0" cellspacing="0">
<tr>
<td height="69" bgcolor="#587410"><table width="97%" border="0" align="right" cellpadding="0" cellspacing="0">
<tr>
<td valign="top"><div align="left"><span class="style18">A building project</span></div></td>
</tr>
</table></td>
</tr>
<tr>
<td height="5" bgcolor="#FFFFFF"></td>
</tr>
<tr>
<td bgcolor="#F5EDE3"><table width="100%" align="center" cellpadding="0" cellspacing="0" style="border-collapse:collapse">
<tr>
<td bgcolor="#B9D276"><table width="96%" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td height="29"><span class="style8"></span></td>
</tr>
<tr>
<td height="25"><table width="98%" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td colspan="3"><p class="Style2" style="line-height: 131%; margin-right: 74.95pt"> <span style="line-height: 131%; font-family: Book Antiqua; letter-spacing: -.2pt; font-weight: 700"> <font color="#800000"> <br>
</font> <font color="#000000"> </font><font color="#800000"> </font> </span><b><font color="#800000"> <span style="font-family: Book Antiqua; letter-spacing: -.2pt"> <br>
</span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> Few people are committed to a </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> building </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> project. <br>
They are </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> discussing </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> about ways </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.1pt"> and </span> <span style="line-height: 133%; font-family: Book Antiqua; letter-spacing: -.2pt"> means to fund it.</span></font></b></p>
<p class="Style2" style="line-height: 131%; margin-right: 74.95pt"> <font color="#000000"> <span style="font-family: Book Antiqua; letter-spacing: .3pt"> Member </span> <span style="font-family: Book Antiqua; letter-spacing: 1.05pt"> 1:</span><span style="font-family: Book Antiqua; letter-spacing: .3pt"> Well, girls there aren't many ways open to us to raise money</span><span style="line-height: 150%; font-family: Book Antiqua; letter-spacing: .1nbsp; Wonderful, if all of us work together, surely we will complete this project soon. Yea.</font></span></span></p>
<p></td>
</tr>
</table></td>
</tr>
<tr>
<td height="10"> </td>
</tr>
</table></td>
</tr>
</table></td>
</tr>
</table></td>
</tr>
<tr>
<td valign="top" bgcolor="#719315"> </td>
</tr>
</table>
</body>
</html>
I have a document in HTML format, but I need to read only the text, not the tags used in the document.
Is that possible?
Recommended answer
Read the following:
http://stackoverflow.com/questions/2113651/how-to-extract-text-from-resonably-sane-html
http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c
http://htmlagilitypack.codeplex.com/
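The last link points to Html Agility Pack, a third-party HTML parser for .NET. As a rough sketch of how it could be used for this question (assuming the HtmlAgilityPack NuGet package is referenced; the class and method names `AgilityPackExample` and `ExtractText` are made up for illustration, and exact behaviour may differ between package versions):

```csharp
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

public static class AgilityPackExample
{
    // Hypothetical helper: returns the visible text of an HTML string.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Drop nodes whose content is not meant to be read as text.
        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (hidden != null)
        {
            foreach (var node in hidden)
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

It could then be called as `lblOnlyText.Text = AgilityPackExample.ExtractText(System.IO.File.ReadAllText(path));`, where `path` is the location of the HTML file. A parser tends to cope better than a regular expression with malformed markup like the page in the question.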
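Using C# code
// File.ReadAllText expects a file system path, not a URL.
string htmlContent = System.IO.File.ReadAllText("Path of your html file");
// Remove everything that looks like a tag; entities such as &nbsp; stay in the output.
lblOnlyText.Text = System.Text.RegularExpressions.Regex.Replace(htmlContent, "<[^>]*>", "");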
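The single regular expression above leaves entities, the page title, and any script or style text in the result. A slightly fuller sketch of the same regex idea is shown here; `HtmlTextExtractor` and `ToPlainText` are hypothetical names, not part of the original answer, and this is still a heuristic rather than a real HTML parser:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextExtractor
{
    // Hypothetical helper: strips markup from an HTML string with regular expressions.
    public static string ToPlainText(string html)
    {
        // Remove comments first (the style block in the question wraps its CSS in one).
        string text = Regex.Replace(html, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Remove <script> and <style> elements together with their contents.
        text = Regex.Replace(text, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace each remaining tag with a space so neighbouring words do not run together.
        text = Regex.Replace(text, @"<[^>]*>", " ");

        // Decode entities such as &nbsp; and &amp;.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (including non-breaking spaces) into one space.
        text = Regex.Replace(text, @"[\s\u00A0]+", " ");

        return text.Trim();
    }
}
```

With the variables from the answer, the call would be `lblOnlyText.Text = HtmlTextExtractor.ToPlainText(htmlContent);`. Replacing tags with a space is a trade-off: it keeps table cells apart, but it also splits a word that has inline markup in the middle of it.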
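Using VB.Net code
' File.ReadAllText expects a file system path, not a URL.
Dim htmlContent As String = System.IO.File.ReadAllText("Path of your html file")
' Remove everything that looks like a tag; entities such as &nbsp; stay in the output.
lblOnlyText.Text = System.Text.RegularExpressions.Regex.Replace(htmlContent, "<[^>]*>", "")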
Use jQuery to get the text content. All the content you need is inside a tag, right?
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.min.js"></script>
<script type="text/javascript">