How do you convert HTML to plain text?

Question

I have snippets of HTML stored in a table: not entire pages, no page-level tags or the like, just basic formatting.

I would like to be able to display that HTML as text only, with no formatting, on a given page (actually just the first 30-50 characters, but that's the easy bit).

How do I get the "text" within that HTML into a string as plain text?

So this bit of code:

<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>

becomes:

Hello World. Is there anyone out there?

Answer

If you are talking about tag stripping, it is relatively straightforward if you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that with a regular expression:

<[^>]*>
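A minimal sketch of the regex approach, written in Python as an assumption (the question doesn't name a language). `TAG_RE` and `strip_tags` are hypothetical names. Two steps go slightly beyond the answer: each tag is replaced with a space (so `</b><br/><p>` doesn't glue "World." to "Is"), and HTML entities are decoded with `html.unescape` after the tags are removed.

```python
import html
import re

# The answer's pattern: "<", then any run of non-">" characters, then ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(fragment: str) -> str:
    # Replace every tag with a space so adjacent blocks don't run together.
    text = TAG_RE.sub(" ", fragment)
    # Decode entities (&amp;, &lt;, ...) only after tags are gone, so an
    # escaped "&lt;b&gt;" is displayed literally rather than stripped.
    text = html.unescape(text)
    # Collapse the runs of whitespace left behind by removed tags.
    return " ".join(text.split())


snippet = "<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>"
print(strip_tags(snippet))  # Hello World. Is there anyone out there?
```

For the preview, slicing the result (for example `strip_tags(snippet)[:50]`) covers the "first 30-50 characters" part. One trade-off of replacing tags with spaces: a word split by an inline tag, such as `<b>Hel</b>lo`, comes out as "Hel lo".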

If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions, because you need to track state: something more like a context-free grammar (CFG). Although you might be able to accomplish it with 'left-to-right' or non-greedy matching.
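To make the <script> problem concrete, here is a small Python illustration (the script body is a made-up example). Because the pattern `<[^>]*>` has no notion of being inside a script, the `<` in `a < b` is treated as the start of a tag, which then runs all the way to the `>` of `</script>`. Part of the code leaks into the output while the rest is silently eaten.

```python
import re

# The tag-stripping pattern from the answer.
TAG_RE = re.compile(r"<[^>]*>")

fragment = "<p>Hi</p><script>if (a < b) { go(); }</script><p>there</p>"

# Replace tags with spaces, then collapse whitespace.
naive = " ".join(TAG_RE.sub(" ", fragment).split())
print(naive)  # Hi if (a there
```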

If you can use regular expressions, there are many web pages out there with good info:

If you need the more complex behaviour of a CFG, I would suggest using a third-party tool; unfortunately, I don't know of a good one to recommend.
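As one stateful alternative, sketched here as a substitution rather than something the answer recommends: Python's standard-library `html.parser.HTMLParser`. It is a tokenizer, not a full grammar-based parser, but it tracks enough state to deliver the contents of `<script>` and `<style>` as raw text instead of parsing them as tags. `TextExtractor` and `html_to_text` are hypothetical names; the skip list and the space-joining are assumptions about what "plain text" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Join text runs with spaces, then collapse whitespace.
    return " ".join(" ".join(parser.parts).split())


print(html_to_text("<p>Hi</p><script>if (a < b) { go(); }</script><p>there</p>"))
# Hi there
```

On the question's snippet this also yields "Hello World. Is there anyone out there?". Joining every text run with a space has the same trade-off as replacing tags with spaces: `<b>Hel</b>lo` becomes "Hel lo". A fuller version might insert spaces only at block-level tags such as `<p>` and `<br>`.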