XPath table changes as the combobox changes too


Problem description

I'm working on an application in C# that goes to a website and gets some content out of a table. It's working fine, but here is the problem: the table I'm reading changes its content when I select a different value in a combobox. The XPath that I use always gets the table that is first shown on the website, and I don't know how to get the other ones. I'm posting here everything I think is useful for you guys to help me.

The web page is:
http://br.soccerway.com/national/brazil/serie-a/2012/regular-season/

XPath / C# code:

HtmlNodeCollection no2 = doc.DocumentNode
   .SelectNodes("//*[@id='page_competition_1_block_competition_matches_summary_6']/div[2]/table/tbody/tr/td[@class='team team-a ' or @class='date no-repetition' or @class='score-time score' or @class='team team-b ']");

On the website, you have to click on the "Por semana de jogo" option, right above the scores, for the combobox to be visible.

I need to get all the scores from all the tables, not just the one that is displayed.

Recommended answer

So when you select a game week from the drop-down (or click the "anterior" or "proximo" links above it), the JavaScript in the page makes a call to the server to get the data for the selected game week. It just sends a URL to the server via GET.

The data is returned in the form of a JSON object, and inside this object is the table HTML. This HTML is loaded into the DOM in the right place and presto, the browser displays the data for that week.

It is a bit of work to get this programmatically, but it can be done. What you can do is determine what the URL is for each week. Hopefully, most of the query string parameters are constant except for the week in question. So you will have a boilerplate URL that you tweak for the week you want and send off to the server. You get the JSON back and parse out the table HTML. Then you're golden: you just feed that HTML into the Agility Pack and work with it as usual.

I did a little investigation using the Network tab of Chrome's Developer Tools, and I found that when I selected a game week, the URL sent to the server looks like this (this one is for week 14):

http://br.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6&callback_params=%7B%22page%22%3A%229%22%2C%22round_id%22%3A%2217449%22%2C%22outgroup%22%3A%22%22%2C%22view%22%3A%221%22%7D&action=changePage&params=%7B%22page%22%3A13%7D

(Note that you can also use other tools to get the URL, such as Firebug in Firefox, or Fiddler.)

By trying other weeks and comparing, it looks like (selected week - 1) appears near the end of the URL, in the params query string: "...%3A13...". So for week 15 you'd use "...%3A14...". Fortunately, it looks like there is only one other area that differs among the URLs for different weeks, and it is in the callback_params query string. Unfortunately, I wasn't able to figure out how it connects to the selected week, but hopefully you can.
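As a rough sketch of the "boilerplate URL" idea, the helper below rebuilds the week 14 URL shown above from its parts. The class and method names (`WeekUrlBuilder`, `Build`) are made up for illustration. It assumes the `params` value is always `{"page": week - 1}`, as observed above, and it takes the `callback_params` JSON as an argument because its relationship to the week is not known; you would have to capture that value from the browser for each week.

```csharp
using System;

public static class WeekUrlBuilder
{
    // Both constants are copied from the week 14 URL captured in the browser.
    private const string Endpoint =
        "http://br.soccerway.com/a/block_competition_matches_summary";
    private const string BlockId =
        "page_competition_1_block_competition_matches_summary_6";

    // week is 1-based. Assumption: "page" inside params is always week - 1.
    // callbackParamsJson is the unencoded JSON for callback_params; how it
    // depends on the week is unknown, so the caller must supply it.
    public static string Build(int week, string callbackParamsJson)
    {
        if (week < 1) throw new ArgumentOutOfRangeException(nameof(week));

        string pageJson = "{\"page\":" + (week - 1) + "}";

        // Uri.EscapeDataString percent-encodes the braces, quotes, colons
        // and commas, which is the encoding seen in the captured URL.
        return Endpoint
            + "?block_id=" + BlockId
            + "&callback_params=" + Uri.EscapeDataString(callbackParamsJson)
            + "&action=changePage"
            + "&params=" + Uri.EscapeDataString(pageJson);
    }
}
```

Calling `WeekUrlBuilder.Build(14, "{\"page\":\"9\",\"round_id\":\"17449\",\"outgroup\":\"\",\"view\":\"1\"}")` should give back the week 14 URL. Whether the same `callback_params` value works for other weeks is something you would need to check against the site.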

So when you feed that URL into your browser, you get back the JSON block. If you search for "<table" and "/table>" you'll see the HTML that you want. In your C# code, you can just use a simple regular expression to pull it out of the JSON string:

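using System.Text.RegularExpressions;

string json = "..."; // load the JSON string here

// Singleline lets '.' match newlines, so the table can span several lines.
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
Regex regx = new Regex( "(?<theTable><table.*/table>)", options );

Match match = regx.Match( json );

if ( match.Success ) {
    // Everything from the first "<table" to the last "/table>".
    string tableHtml = match.Groups["theTable"].Value;
}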
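One thing to watch with the regular expression: it works on the raw response text, so any JSON escaping inside the string value (for example `\/` or `\"`) stays in the result. An alternative sketch, assuming a runtime where `System.Text.Json` is available, is to parse the JSON and take the first string value that contains "<table". The property names in the response are not known here, so the code walks the whole tree instead of naming a field; `JsonTableFinder` and `FindTableHtml` are names invented for this example.

```csharp
using System;
using System.Text.Json;

public static class JsonTableFinder
{
    // Parses the response and returns the first string value containing
    // "<table" (already unescaped by the JSON parser), or null if none.
    public static string FindTableHtml(string json)
    {
        using JsonDocument document = JsonDocument.Parse(json);
        return FindTableHtml(document.RootElement);
    }

    // Depth-first walk over objects and arrays.
    public static string FindTableHtml(JsonElement element)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.String:
                string text = element.GetString();
                bool hasTable = text != null &&
                    text.Contains("<table", StringComparison.OrdinalIgnoreCase);
                return hasTable ? text : null;

            case JsonValueKind.Object:
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    string found = FindTableHtml(property.Value);
                    if (found != null) return found;
                }
                return null;

            case JsonValueKind.Array:
                foreach (JsonElement item in element.EnumerateArray())
                {
                    string found = FindTableHtml(item);
                    if (found != null) return found;
                }
                return null;

            default:
                return null;
        }
    }
}
```

The returned string is the whole value, so it may include markup around the table. That should not matter once it is parsed as HTML, since you can still select the table cells with XPath.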

Feed the HTML string into the Agility Pack and you should be on your way.
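For that last step, a minimal sketch could look like the following. It needs the HtmlAgilityPack NuGet package. `TableReader` and `ReadCells` are names made up for this example, and the XPath is a loosened version of the one in the question: it uses `contains()` on the class attribute instead of exact matches, which is an assumption about the markup and not something verified against the site.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TableReader
{
    // Returns the trimmed text of every cell whose class attribute mentions
    // one of the columns targeted by the question's XPath.
    public static List<string> ReadCells(string tableHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(tableHtml);

        // SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
            "//td[contains(@class,'team-a') or contains(@class,'team-b') " +
            "or contains(@class,'date') or contains(@class,'score')]");

        var result = new List<string>();
        if (cells == null) return result;

        foreach (HtmlNode cell in cells)
        {
            // Decode entities such as &amp; before trimming whitespace.
            result.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return result;
    }
}
```

Running this once per week, on the HTML extracted from each week's response, would collect the cells from every table and not only the one shown first.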
