How to get every <td> from a webpage

Problem description

I have a collection of webpages that I need to comb through to find values, and I have no idea where to start. I'm pretty new to all this :)

<td><button value="Right" action="Guard" width="80" height="20"></button></td>





From this, I need to extract the values of the button's value and action attributes.

The approach I tried below works OK for one table on one webpage, but the other webpages are structured differently :S

Constructive criticism is welcome :)

What I have tried:

// Look up the cells of the 15th table once, instead of on every iteration.
// (The original index loop started at cell 1, so the first cell was skipped.)
HtmlElementCollection cells = webBrowser1.Document.GetElementsByTagName("table")[14].GetElementsByTagName("td");
foreach (HtmlElement cell in cells)
{
    string html = cell.InnerHtml ?? string.Empty;
    if (!(html.Contains("Right") && html.Contains("Guard")))
        continue;

    // Split on double quotes; indices 9 and 11 depend on this page's attribute order,
    // so check the length instead of swallowing exceptions with an empty catch.
    string[] parts = html.Split('"');
    if (parts.Length > 11)
    {
        richTextBox1.AppendText(parts[11] + " " + parts[9] + Environment.NewLine);
    }
}

Recommended answers

First, store the URLs of all the web pages in an array, navigate to each page in turn, and collect the information you need. In Visual Studio, open the NuGet Package Manager and add the HTML Agility Pack (HtmlAgilityPack) package. Here are some links for reference:
Scraping HTML DOM elements using HtmlAgilityPack (HAP) in ASP.NET
Getting Started With HTML Agility Pack
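A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is installed. The names ButtonScraper, ExtractButtons and ScrapeAll are hypothetical, and the XPath //td/button assumes the buttons sit directly inside table cells, as in the question's snippet. Because attributes are read by name rather than by their position in the markup, it does not depend on how the rest of each page is laid out.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ButtonScraper
{
    // Returns a (Value, Action) pair for every <button> that sits directly inside a <td>.
    public static List<(string Value, string Action)> ExtractButtons(HtmlDocument doc)
    {
        var results = new List<(string Value, string Action)>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection buttons = doc.DocumentNode.SelectNodes("//td/button");
        if (buttons == null)
            return results;

        foreach (HtmlNode button in buttons)
        {
            // Read attributes by name, so their order in the markup does not matter.
            string value = button.GetAttributeValue("value", string.Empty);
            string action = button.GetAttributeValue("action", string.Empty);
            results.Add((value, action));
        }
        return results;
    }

    // Convenience overload for HTML that is already in memory.
    public static List<(string Value, string Action)> ExtractButtons(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return ExtractButtons(doc);
    }

    // Downloads each page in turn and prints the pairs found on it.
    public static void ScrapeAll(IEnumerable<string> urls)
    {
        var web = new HtmlWeb();
        foreach (string url in urls)
        {
            HtmlDocument page = web.Load(url);
            foreach (var (value, action) in ExtractButtons(page))
                Console.WriteLine(url + ": " + value + " " + action);
        }
    }
}
```

Note that HtmlWeb only downloads the raw HTML and does not run scripts, so buttons that a page adds with JavaScript will not be found this way; the WebBrowser control used in the question does see them.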


Try a Regex:
(?<=<td><button value=")(?<Value>.*?)" action="(?<Action>.*?)(?=".*?></button></td>)

That gives you two named groups, "Value" and "Action", containing the values you need.
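The answer does not show how to apply the pattern, so here is a minimal sketch using System.Text.RegularExpressions; RegexButtonScraper and Extract are hypothetical names. Inside a C# verbatim string, "" stands for a single double quote. The pattern only matches markup laid out exactly like the question's snippet (value first, then action, no extra whitespace), so it shares the fragility the question describes when pages are structured differently.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexButtonScraper
{
    // The pattern from the answer, written as a verbatim string.
    private static readonly Regex ButtonPattern = new Regex(
        @"(?<=<td><button value="")(?<Value>.*?)"" action=""(?<Action>.*?)(?="".*?></button></td>)");

    // Returns one (Value, Action) pair per match in the page's HTML.
    public static List<(string Value, string Action)> Extract(string html)
    {
        var results = new List<(string Value, string Action)>();
        foreach (Match m in ButtonPattern.Matches(html))
            results.Add((m.Groups["Value"].Value, m.Groups["Action"].Value));
        return results;
    }
}
```

With the WebBrowser control from the question, the input could be, for example, webBrowser1.DocumentText.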


As luck would have it, I solved it a few minutes after asking :S

// Collect every <button> on the page, regardless of which table it sits in.
// (The original index loop started at button 1, so the first button was skipped.)
HtmlElementCollection buttons = webBrowser1.Document.GetElementsByTagName("button");
foreach (HtmlElement button in buttons)
{
    // Split the button's markup on double quotes; indices 9 and 11 depend on
    // the attribute order in this site's markup.
    string[] parts = (button.OuterHtml ?? string.Empty).Split('"');
    if (parts.Length > 11 && parts[11].Length != 0)
    {
        richTextBox1.AppendText(parts[11] + " " + parts[9] + Environment.NewLine);
    }
}
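Picking indices 9 and 11 out of Split('"') ties the code to one exact attribute order, which is why the first attempt broke on differently structured pages. A hedged alternative that needs only the base class library is a hypothetical AttributeReader.ReadAttribute helper that looks an attribute up by name in a markup string such as HtmlElement.OuterHtml. It is a regex sketch, not a full HTML parser: it accepts double-quoted, single-quoted and unquoted values in case the browser serialises the markup differently, and returns null when the attribute is missing.

```csharp
using System.Text.RegularExpressions;

public static class AttributeReader
{
    // Hypothetical helper: returns the value of the named attribute in a markup
    // fragment, or null if the attribute is not present.
    public static string ReadAttribute(string markup, string name)
    {
        if (string.IsNullOrEmpty(markup))
            return null;

        // The lookbehind stops "action" from matching inside e.g. "data-action";
        // the value may be double-quoted, single-quoted or unquoted.
        string pattern = @"(?<![\w-])" + Regex.Escape(name) +
                         @"\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)'|(?<v>[^\s""'>]+))";
        Match m = Regex.Match(markup, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["v"].Value : null;
    }
}
```

In the loop above, parts[9] and parts[11] could then be replaced by ReadAttribute(button.OuterHtml, "value") and ReadAttribute(button.OuterHtml, "action"), which keep working if the attribute order changes.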

