How to extract HTML in ASP.NET


Problem description




Hi,
I want to extract all the categories and subcategories from this link. Can anyone tell me how to do it, or at least give me a hint?
http://www.codeproject.com/script/Content/SiteMap.aspx

I'm practising with WebClient, so PLEASE ONLY GIVE ME A REGEX FOR IT. All categories and subcategories should be gathered.


The following is the code I have written so far. I only want a regex to pass to it to gather all the data:


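using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public class ExtractHtml
{
    private readonly string code; // HTML source of the downloaded page

    public ExtractHtml(string url)
    {
        // Dispose the client, stream, and reader once the page has been read.
        // UTF-8 is assumed here; Encoding.ASCII would replace non-ASCII characters with '?'.
        using (var client = new WebClient())
        using (var strm = client.OpenRead(url))
        using (var strrdr = new StreamReader(strm, Encoding.UTF8))
        {
            code = strrdr.ReadToEnd();
        }
    }

    // Returns the full text of every match of the given pattern in the page.
    public List<string> Extract(string regex)
    {
        var lines = new List<string>();
        var rgx = new Regex(regex, RegexOptions.IgnoreCase);
        foreach (Match item in rgx.Matches(code))
        {
            lines.Add(item.Value);
        }
        return lines;
    }
}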



I want to extract the link text from markup like this:

<a id="ctl00_MC_TCRp_ctl00_TCNL" href="/Chapters/1/Desktop-Development.aspx">Desktop Development</a>




<li>
<a id="ctl00_MC_TCRp_ctl00_TSRp_ctl01_TSNL" href="/KB/buttons/">Button Controls</a>

</li>

<li>
<a id="ctl00_MC_TCRp_ctl00_TSRp_ctl02_TSNL" href="/KB/clipboard/">Clipboard</a>

</li>

<li>
<a id="ctl00_MC_TCRp_ctl00_TSRp_ctl03_TSNL" href="/KB/combobox/">Combo & List Boxes</a>

</li>

<li>
<a id="ctl00_MC_TCRp_ctl00_TSRp_ctl04_TSNL" href="/KB/dialog/">Dialogs and Windows</a>

</li>

<li>
<a id="ctl00_MC_TCRp_ctl00_TSRp_ctl05_TSNL" href="/KB/gadgets/">Desktop Gadgets</a>

</li>

Solution

How about you build your regex yourself? A good online regex tester will let you build it through trial and error. Since you are demanding that we do it for you, this is all I'm going to give you, but it should be enough for you to move on in your quest for a solution. Cheers.
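
As a starting point for that trial and error, here is a minimal sketch of one pattern that fits the markup quoted in the question. It rests on assumptions read off that sample only: category links have an id ending in `_TCNL`, subcategory links have an id ending in `_TSNL`, the attributes appear in the order `id` then `href` with double quotes, and each category's subcategories follow it in the page. The names `SiteMapLinks` and `Parse` are made up for illustration. Regex matching of HTML is brittle, so the pattern may need adjusting if the real page differs from the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class SiteMapLinks
{
    // Assumption: ids ending in _TCNL are categories, ids ending in _TSNL are
    // subcategories, as in the sample markup from the question.
    private static readonly Regex LinkPattern = new Regex(
        @"<a\s+id=""[^""]*_(?<kind>TCNL|TSNL)""\s+href=""(?<href>[^""]*)""\s*>(?<text>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    // Groups subcategory names under the category link that precedes them.
    public static List<KeyValuePair<string, List<string>>> Parse(string html)
    {
        var result = new List<KeyValuePair<string, List<string>>>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            // Decode entities such as &amp; and trim stray whitespace.
            string text = WebUtility.HtmlDecode(m.Groups["text"].Value).Trim();
            bool isCategory = string.Equals(
                m.Groups["kind"].Value, "TCNL", StringComparison.OrdinalIgnoreCase);

            if (isCategory)
            {
                result.Add(new KeyValuePair<string, List<string>>(text, new List<string>()));
            }
            else if (result.Count > 0)
            {
                // Attach the subcategory to the most recent category.
                result[result.Count - 1].Value.Add(text);
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // A shortened copy of the markup from the question.
        string html =
            "<a id=\"ctl00_MC_TCRp_ctl00_TCNL\" href=\"/Chapters/1/Desktop-Development.aspx\">Desktop Development</a>" +
            "<li><a id=\"ctl00_MC_TCRp_ctl00_TSRp_ctl01_TSNL\" href=\"/KB/buttons/\">Button Controls</a></li>" +
            "<li><a id=\"ctl00_MC_TCRp_ctl00_TSRp_ctl02_TSNL\" href=\"/KB/clipboard/\">Clipboard</a></li>";

        foreach (var category in SiteMapLinks.Parse(html))
        {
            Console.WriteLine(category.Key);
            foreach (string sub in category.Value)
            {
                Console.WriteLine("  " + sub);
            }
        }
    }
}
```

With the `ExtractHtml` class from the question, the same pattern string could be passed to `Extract`, but `item.Value` there returns the whole `<a ...>...</a>` match; reading `item.Groups["text"].Value` instead gives just the link text.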

