Find keyword in text when keyword matches certain conditions - C#

Question

I'm looking for a nice way to do the following:

I have an article that contains HTML tags such as anchors, paragraphs and so on.
I also have a keyword that I need to find in the article and turn into an anchor (I have a URL to link it to).
If the keyword exists in the article, it must meet the following TWO conditions BEFORE it is turned into an anchor:

  1. It cannot be inside a tag. For example, something like

    <img alt="keyword"> 
    

    will not be valid/matched.

  2. The keyword cannot already be inside an anchor. For example, something like

    <a>keyword</a>
    

    will not be valid/matched.


Any help would be appreciated. Thanks.

Solution

I have managed to get it done!

Many thanks to this post, which helped me a lot with the XPath expression: http://social.msdn.microsoft.com/Forums/en-US/regexp/thread/beae72d6-844f-4a9b-ad56-82869d685037/
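The XPath expression `//text()[not(ancestor::a)]` is what enforces both conditions from the question: attribute values such as `alt="keyword"` are not text nodes at all, and `not(ancestor::a)` drops text that already sits inside a link. The sketch below shows that filtering with the standard `System.Xml` classes instead of HtmlAgilityPack, so it assumes the markup is well-formed XHTML; `TextNodeFilter` and `TextOutsideAnchors` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; assumes the input is well-formed XHTML.
public static class TextNodeFilter
{
    // Returns the value of every text node that has no <a> ancestor.
    public static List<string> TextOutsideAnchors(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var result = new List<string>();
        // Attributes such as alt="..." are not text() nodes, so condition 1 holds automatically;
        // not(ancestor::a) removes text that is already inside a link (condition 2).
        foreach (XmlNode node in doc.SelectNodes("//text()[not(ancestor::a)]"))
        {
            result.Add(node.Value);
        }
        return result;
    }
}
```

In the answer itself the same expression is passed to HtmlAgilityPack's `doc.DocumentNode.SelectNodes`, which also copes with HTML that is not well-formed XML.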

My task was to add X keywords to the article, using a table of keywords and URLs in my database.
Once a keyword has been matched, it is not searched for again; the code moves on to the next keyword instead.
A keyword can consist of more than one word, which is why the spaces in it are replaced with "\s+", so that any run of whitespace between the words still matches.
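A minimal sketch of building the pattern for a multi-word keyword. It passes the keyword through `Regex.Escape` before swapping spaces for `\s+`, so characters such as `.` are matched literally; since `Regex.Escape` turns a space into `\ `, it is that escaped space that gets replaced. The `\w*` prefix used in the answer is left out to keep the example focused, and `KeywordPattern.Build` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class KeywordPattern
{
    // Builds a whole-word regex for a (possibly multi-word) keyword.
    public static Regex Build(string keyword)
    {
        // Escape regex metacharacters, then let each escaped space match any run of whitespace.
        string body = Regex.Escape(keyword.Trim()).Replace("\\ ", "\\s+");
        // \b on both sides keeps the keyword from matching inside a longer word.
        return new Regex(@"\b" + body + @"\b");
    }
}
```

With this pattern, "good day" also matches "good \n  day", while "1.5" no longer matches "125" as an unescaped `.` would.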
I also had to give precedence to the longest keywords. That is, if "good day" and "good" are two different keywords, "good day" always wins.
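Longest-first ordering can also be written with `OrderByDescending(k => k.Length)`, which yields the same order as sorting with a comparer that puts longer strings first, without a custom class. The sketch assumes the keywords are plain strings; `KeywordOrder`, `LongestFirst` and `FirstMatch` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration.
public static class KeywordOrder
{
    // Longest keywords first. OrderByDescending is a stable sort,
    // so keywords of equal length keep their original relative order.
    public static List<string> LongestFirst(IEnumerable<string> keywords)
    {
        return keywords.OrderByDescending(k => k.Length).ToList();
    }

    // Returns the first keyword, in longest-first order, that occurs in the text (null if none).
    public static string FirstMatch(string text, IEnumerable<string> keywords)
    {
        return LongestFirst(keywords).FirstOrDefault(k => text.Contains(k));
    }
}
```

Given the keywords `good` and `good day` in that order, a scan in input order would stop at `good`; after sorting, `FirstMatch("have a good day", ...)` picks `good day`.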

This is my solution:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The post omitted the enclosing class declaration; "ArticleLinker" is a hypothetical name.
public static class ArticleLinker
{
    static public string AddLinksToArticle(string article, int linksToAdd)
    {
        try
        {
            // load keywords and urls (DAL, ArticlesRow and Utils are the author's own types, not shown)
            var dt = new DAL().GetArticleLinks();

            // sort by keyword length, longest first, so "good day" is tried before "good"
            IEnumerable<ArticlesRow> sortedArticles = dt.OrderBy(row => row.keyword, new StringLengthComparer());

            // iterate the keywords and replace each one with an anchor
            foreach (var item in sortedArticles)
            {
                article = FindAndReplaceKeywordWithAnchor(article, item.keyword, item.url, ref linksToAdd);
                if (linksToAdd == 0)
                {
                    break;
                }
            }

            return article;
        }
        catch (Exception ex)
        {
            Utils.LogErrorAdmin(ex);
            return null;
        }
    }

    private static string FindAndReplaceKeywordWithAnchor(string article, string keyword, string url, ref int linksToAdd)
    {
        // parse the article as HTML
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(article);

        // (\w*) - the keyword may be preceded by any run of word characters; captured so it is kept
        // \s+   - each space in a multi-word keyword matches one or more whitespace characters
        // \b    - word boundaries around the whole match
        // Regex.Escape makes characters such as '.' literal; it escapes a space as "\ ", which is then swapped for \s+
        string pattern = @"\b(\w*)(" + Regex.Escape(keyword.Trim()).Replace("\\ ", "\\s+") + @")\b";
        Regex regex = new Regex(pattern);

        // select only text nodes without an <a> ancestor, so attribute values and existing links are skipped
        var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]") ?? new HtmlAgilityPack.HtmlNodeCollection(null);
        foreach (var node in nodes)
        {
            // test the regex itself, so the counter is only decremented when a replacement really happens
            if (regex.IsMatch(node.InnerHtml))
            {
                // replace only the first occurrence, writing the captured prefix back outside the link
                node.InnerHtml = regex.Replace(node.InnerHtml,
                    m => m.Groups[1].Value + "<a href=\"" + url + "\">" + m.Groups[2].Value + "</a>", 1);
                linksToAdd--;
                break;
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}

public class StringLengthComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        return y.Length.CompareTo(x.Length);
    }
}
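In the answer's pattern, `\w*` lets a match start before the keyword itself, so for the keyword `day` the whole word `Monday` matches. If that whole match is then replaced by the bare keyword, the prefix `Mon` disappears from the article. The sketch below contrasts that with a variant that captures the prefix in a group and writes it back outside the link; `PrefixDemo` and its methods are hypothetical names, and a `MatchEvaluator` lambda is used so that a `$` in the URL is not read as a substitution token.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class PrefixDemo
{
    // Replaces the whole match (prefix + keyword) with the bare keyword: the prefix is lost.
    public static string ReplaceWholeMatch(string text, string keyword, string url)
    {
        var regex = new Regex(@"\b\w*" + Regex.Escape(keyword) + @"\b");
        return regex.Replace(text, "<a href=\"" + url + "\">" + keyword + "</a>", 1);
    }

    // Captures the prefix in group 1 and the keyword in group 2, and keeps both.
    public static string ReplaceKeepingPrefix(string text, string keyword, string url)
    {
        var regex = new Regex(@"\b(\w*)(" + Regex.Escape(keyword) + @")\b");
        return regex.Replace(text,
            m => m.Groups[1].Value + "<a href=\"" + url + "\">" + m.Groups[2].Value + "</a>", 1);
    }
}
```

For "See you Monday", the first method returns `See you <a href="u">day</a>`, while the second returns `See you Mon<a href="u">day</a>`.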

Hope it will help someone in the future.
