How do you convert HTML to plain text?
Question
I have snippets of HTML stored in a table. Not entire pages, no tags or the like, just basic formatting.
I would like to be able to display that HTML as text only, with no formatting, on a given page (actually just the first 30 to 50 characters, but that's the easy bit).
How do I place the "text" within that HTML into a string as straight text?
So this piece of code:
<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>
becomes:
Hello World. Is there anyone out there?
Recommended answer
If you are talking about tag stripping, it is relatively straightforward as long as you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that with a regular expression:
<[^>]*>
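As a rough illustration of the pattern above, here is a minimal Python sketch. The names `strip_tags` and `TAG_PATTERN` are made up for this example, and two choices are my own assumptions rather than part of the answer: each tag is replaced with a space (so "World." and "Is" do not run together) and whitespace is collapsed afterwards. Entity decoding via `html.unescape` is also an addition.

```python
import html
import re

# The pattern from the answer: "<", then anything that is not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(fragment: str) -> str:
    """Remove tags from an HTML fragment and return plain text."""
    # Assumption: swap each tag for a space so adjacent blocks stay separated.
    without_tags = TAG_PATTERN.sub(" ", fragment)
    # Decode entities such as &amp; into the characters they stand for.
    decoded = html.unescape(without_tags)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(decoded.split())


sample = "<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>"
plain = strip_tags(sample)
preview = plain[:30]  # the "first 30 - 50 characters" part is a plain slice
```

Replacing tags with a space does split a word that has inline markup in the middle of it (for example `<b>Hel</b>lo`), so this only suits the basic-formatting snippets described in the question.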
If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions, because you need to track state: something more like a context-free grammar (CFG). That said, you might be able to accomplish it with 'left to right' or non-greedy matching.
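To show what "tracking state" can look like in practice, here is a hedged sketch using the `HTMLParser` class from Python's standard library. It is not a full CFG-based tool, just a small event-driven parser that remembers whether it is currently inside a `<script>` or `<style>` element and drops that content. The class name `TextExtractor`, the helper `html_to_text`, and the decision to emit a space around every other tag are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0  # the state a plain tag-stripping regex lacks

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        else:
            self._chunks.append(" ")  # keep neighbouring blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        else:
            self._chunks.append(" ")

    def handle_data(self, data):
        # Only keep text that is outside the skipped elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        return " ".join("".join(self._chunks).split())


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()
```

With this approach, markup that appears inside a script body (for example a string literal containing `<p>`) is skipped along with the rest of the script, whereas the simple tag-stripping pattern would leave the surrounding JavaScript in the output.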
If you can use regular expressions, there are many web pages out there with good info:
- http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx
- http://www.google.com/search?hl=en&q=html+tag+stripping+&btnG=Search
If you need the more complex behaviour of a CFG, I would suggest using a third-party tool. Unfortunately, I don't know of a good one to recommend.