Pulling data from a webpage, parsing it for specific pieces, and displaying it


Question

I've been using this site for a long time to find answers to my questions, but I wasn't able to find an answer to this one.

I am working with a small group on a class project. We're to build a small "game trading" website that allows people to register, list a game they own and want to trade, and accept trades from others or request a trade.

We have the site functioning well ahead of schedule, so we're trying to add more to it. One thing I want to do myself is to link the listed games to Metacritic.

Here's what I need to do. I need to (using ASP.NET and C# in Visual Studio 2012) get the correct game page on Metacritic, pull its data, parse it for specific parts, and then display the data on our page.

Essentially, when you choose a game you want to trade for, we want a small div to appear with the game's information and rating. I want to do it this way to learn more and to get something out of this project that I didn't have to start with.

I was wondering if anyone could tell me where to start. I don't know how to pull data from a page. I'm still trying to figure out whether I need to write something that automatically searches for the game's title and finds the page that way, or whether I can find some way to go straight to the game's page. And once I've gotten the data, I don't know how to pull the specific information I need from it.

One of the things that doesn't make this easy is that I'm learning C++ along with C# and ASP.NET, so I keep getting my wires crossed. If someone could point me in the right direction it would be a big help. Thanks.

Recommended answer

This small example uses HtmlAgilityPack and XPath selectors to get to the desired elements.

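// Code-behind for an ASP.NET Web Forms page.
// Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
protected void Page_Load(object sender, EventArgs e)
{
    string url = "http://www.metacritic.com/game/pc/halo-spartan-assault";

    // Download the page and parse it into a DOM.
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(url);

    // XPath expressions copied from the browser; the quotes around "main" are escaped.
    // SelectNodes returns null when nothing matches, so [0] throws if the page layout changes.
    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
}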

An easy way to obtain the XPath for a given element is by using your web browser (I use Chrome) Developer Tools:


  • Open the Developer Tools (F12 or Ctrl + Shift + C on Windows or Command + Shift + C for Mac).
  • Select the element in the page that you want the XPath for.
  • Right click the element in the "Elements" tab.
  • Click on "Copy as XPath".

You can paste it into C# exactly as copied (as shown in my code), but make sure to escape the double quotes.
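As a rough illustration of the quoting options, the sketch below writes the same XPath three ways. The short expression `//*[@id="main"]/div[3]` is only a stand-in for the longer copied path, and the class and constant names are made up for this example. XPath string literals may use either single or double quotes, so the single-quoted form avoids escaping entirely.

```csharp
using System;

public static class XPathQuoting
{
    // Backslash-escaped double quotes, as in the answer's code.
    public const string Escaped = "//*[@id=\"main\"]/div[3]";

    // Verbatim string literal: a double quote is written as two double quotes.
    public const string Verbatim = @"//*[@id=""main""]/div[3]";

    // XPath also accepts single-quoted literals, so no C# escaping is needed.
    public const string SingleQuoted = "//*[@id='main']/div[3]";

    // The first two constants are the identical string; the third is a
    // different string that is an equivalent XPath expression.
    public static bool EscapedEqualsVerbatim()
    {
        return Escaped == Verbatim;
    }
}
```

Any of the three can be passed to `SelectNodes`; which one to use is mostly a matter of readability.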

Make sure you add some error handling, because web scraping can fail if the site changes the HTML structure of the page.
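One way to guard against a changed layout is sketched below, assuming the HtmlAgilityPack NuGet package is installed. `ScrapeHelper` and `TextOrDefault` are hypothetical names introduced for this example, not part of the library. The sketch relies on `SelectSingleNode` returning null when the XPath matches nothing, and it avoids the `?.` operator so it also compiles with the C# 5 compiler that ships with Visual Studio 2012.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ScrapeHelper
{
    // Hypothetical helper: returns the trimmed text of the first node that
    // matches the XPath, or the fallback value when nothing matches.
    public static string TextOrDefault(HtmlDocument doc, string xpath, string fallback)
    {
        // SelectSingleNode returns null when the expression matches no node,
        // for example after the site changes its markup.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    // Small demonstration on an inline HTML string, so no network access is needed.
    public static string Demo()
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml("<div id='main'><span class='score'> 85 </span></div>");

        string found = TextOrDefault(doc, "//*[@id='main']/span", "n/a");
        string missing = TextOrDefault(doc, "//*[@id='sidebar']/span", "n/a");
        return found + " / " + missing;
    }
}
```

In `Page_Load`, each `SelectNodes(...)[0].InnerText` call could then become `ScrapeHelper.TextOrDefault(doc, xpath, "n/a")`, so a missing element shows a placeholder instead of throwing. Wrapping `web.Load(url)` in a try/catch would additionally cover network failures.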
