How to remove HTML tags from Word content?


Question

I know there are a couple of threads about this that suggest simply using

Regex.Replace(input, "<.*?>", String.Empty);

but I can't use that on text that has been written into a Word document. My code looks like this:

Microsoft.Office.Interop.Word.Document wBelge = oWord.Documents.Add(ref oMissing,
    ref oMissing, ref oMissing, ref oMissing);
Microsoft.Office.Interop.Word.Paragraph paragraf2;
paragraf2 = wBelge.Paragraphs.Add(ref oMissing);
paragraf2.Range.Text ="some long text";

I can remove a single tag with find and replace, like this:

Word.Find findObject = oWord.Selection.Find;
findObject.ClearFormatting();
findObject.Text = "<strong>";
findObject.Replacement.Text = "";
findObject.Replacement.ClearFormatting();               

object replaceAllc = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAllc, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

Do I need to do this for every HTML tag?

Recommended answer

With some help from the comments, I arrived at the following working solution:

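findObject.ClearFormatting();
// In Word's wildcard syntax, < and > mark word boundaries, so escape them with a backslash.
findObject.Text = @"\<*\>";
// Required so that * is treated as a wildcard instead of a literal asterisk.
findObject.MatchWildcards = true;
findObject.Replacement.ClearFormatting();
// Replace every match with nothing, i.e. delete the tag.
findObject.Replacement.Text = "";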

object replaceAll = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

This uses the search pattern \<*\>, which contains the wildcard character *, so findObject.MatchWildcards must be set to true. The backslashes escape < and >, which would otherwise be interpreted as word-boundary operators in Word's wildcard syntax.
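If the text is still available as a plain string before it is assigned to the paragraph, an alternative is to strip the tags in C# first, using the regex mentioned in the question, and only then write the result to the document. The following is a minimal sketch under that assumption; HtmlTagStripper and StripTags are hypothetical names introduced for illustration and are not part of the Word interop API. The pattern is deliberately simple and may misbehave on a literal > inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes anything that looks like an HTML tag from a string.
public static class HtmlTagStripper
{
    // '<', then as few characters as possible (lazy), then '>'.
    // Singleline lets '.' match newlines, so a tag split across lines is still removed.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Singleline);

    public static string StripTags(string input)
    {
        // Treat null or empty input as empty text rather than throwing.
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return TagPattern.Replace(input, string.Empty);
    }
}
```

With this in place, the assignment from the question could become paragraf2.Range.Text = HtmlTagStripper.StripTags(someHtmlText); where someHtmlText stands for whatever string holds the original content. The wildcard find and replace shown above remains the way to go when the tags are already inside the document.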
