Keywords from a Web Page

Published: 2019/6/21 15:26:03 | C# ASP.NET


Problem description

How can I generate keywords and their counts from a web page using C#?

I have loaded the web page into a string using HtmlAgilityPack, split it into words and put them into an ArrayList.

Now I need to filter the keywords by removing the duplicates and keeping a count of each one alongside it.

My Code:

// Uses HtmlAgilityPack
using System.Text.RegularExpressions;
using HtmlAgilityPack;

var webGet = new HtmlWeb();
HtmlDocument doc = webGet.Load(url);

HtmlNode bodyContent = doc.DocumentNode.SelectSingleNode("/html/body");

if (bodyContent != null)
{
    // Strip the markup so only the text of the <body> remains
    pmd.Html = stripHtml(bodyContent.InnerHtml);
}

string wordsOnly = pmd.Html ?? string.Empty;

string[] arrayWordsOnly = wordsOnly.Split(' ');
// Punctuation to trim from both ends of each word
char[] spChar = new char[] { '?', '"', ',', '\'', ';', ':', '.', '(', ')', '!' };

foreach (string word in arrayWordsOnly)
{
    string key = word.Trim(spChar).ToLower();
    // Still to do: remove duplicates and count each key
}
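Splitting only on ' ' leaves newlines and tabs (common in text taken from HTML) inside the "words", and consecutive spaces produce empty strings. A minimal sketch of a more forgiving tokenizer, written as a C# 9+ top-level program; `Tokenize` is a hypothetical helper name, and the punctuation list is the one from the question:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits on any of space, tab, CR or LF, trims the
// question's punctuation from both ends, lower-cases each word and drops
// tokens that end up empty (e.g. a lone "!" or "...").
static List<string> Tokenize(string text)
{
    char[] separators = { ' ', '\t', '\r', '\n' };
    char[] spChar = { '?', '"', ',', '\'', ';', ':', '.', '(', ')', '!' };

    var words = new List<string>();
    foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
    {
        string key = raw.Trim(spChar).ToLowerInvariant();
        if (key.Length > 0)
        {
            words.Add(key);
        }
    }
    return words;
}

List<string> tokens = Tokenize("Hello, world!\nHello (again).");
Console.WriteLine(string.Join("|", tokens)); // prints hello|world|hello|again
```

`ToLowerInvariant` is used instead of `ToLower` so the result does not depend on the current culture; swap it back if culture-specific casing is wanted.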

protected string stripHtml(string strHtml)
{
    // Strips the HTML tags from strHtml (lazy match from '<' to the next '>')
    Regex objRegExp = new Regex("<(.|\n)+?>");
    // Replace all HTML tag matches with the empty string
    string strOutput = objRegExp.Replace(strHtml, "");
    // Encode any stray '<' or '>' left in the text
    strOutput = strOutput.Replace("<", "&lt;");
    strOutput = strOutput.Replace(">", "&gt;");
    return strOutput;
}
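For counting words, as opposed to redisplaying the text as HTML, it may be more useful to decode entities such as `&amp;` than to re-encode `<` and `>`. A minimal sketch under that assumption, as a C# 9+ top-level program; `StripHtmlToText` is a hypothetical name, and the tag pattern is the same lazy one used in `stripHtml`:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each tag with a space (so "<p>a</p><p>b</p>"
// does not collapse into "ab"), then decodes HTML entities to plain characters.
static string StripHtmlToText(string html)
{
    string withoutTags = Regex.Replace(html, "<(.|\n)+?>", " ");
    return WebUtility.HtmlDecode(withoutTags);
}

Console.WriteLine(StripHtmlToText("<p>Fish &amp; chips</p><p>2 &lt; 3</p>"));
```

Note that `&nbsp;` decodes to a non-breaking space (U+00A0), which `Split(' ')` does not treat as a separator.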

Recommended answer

Firstly, you mention using an ArrayList, which is no longer recommended. You should probably use a List<string> (MSDN page).

With that change, something like the following should do the trick:

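List<string> uniqueWords = new List<string>();
foreach (string word in arrayWordsOnly)
{
    // Normalise the word the same way as in the question
    string key = word.Trim(spChar).ToLower();
    // Only add words that are not already in the list
    if (!uniqueWords.Contains(key))
    {
        uniqueWords.Add(key);
    }
}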
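`List<string>.Contains` scans the whole list on every call, so the de-duplication loop gets slower as the vocabulary grows. A sketch of the same idea with a `HashSet<string>`, whose `Add` returns `false` for a value that is already present; `UniqueWords` is a hypothetical name and the program assumes C# 9+ top-level statements:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the first occurrence of each word.
// The HashSet does the membership test; the List keeps first-seen order,
// because HashSet<T> does not guarantee any enumeration order.
static List<string> UniqueWords(IEnumerable<string> words)
{
    var seen = new HashSet<string>();
    var unique = new List<string>();
    foreach (string word in words)
    {
        if (seen.Add(word)) // false when the word was already seen
        {
            unique.Add(word);
        }
    }
    return unique;
}

Console.WriteLine(string.Join(", ", UniqueWords(new[] { "the", "cat", "the", "hat" })));
// prints: the, cat, hat
```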

If you are determined to use an ArrayList, simply replace each occurrence of List<string> with ArrayList.
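The question also asks for the count of each keyword, which a list of unique words does not record. A minimal sketch using `Dictionary<string, int>`, assuming the words have already been trimmed and lower-cased as in the question; `CountWords` is a hypothetical name and the program assumes C# 9+ top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: maps each word to the number of times it occurs.
static Dictionary<string, int> CountWords(IEnumerable<string> words)
{
    var counts = new Dictionary<string, int>();
    foreach (string word in words)
    {
        // TryGetValue leaves count at 0 when the word has not been seen yet
        counts.TryGetValue(word, out int count);
        counts[word] = count + 1;
    }
    return counts;
}

var sample = new[] { "the", "cat", "sat", "on", "the", "mat" };
// Most frequent first, ties broken alphabetically
foreach (var pair in CountWords(sample).OrderByDescending(p => p.Value).ThenBy(p => p.Key))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

With LINQ, `words.GroupBy(w => w).ToDictionary(g => g.Key, g => g.Count())` builds the same dictionary in one expression; the explicit loop just makes the counting step visible.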
