IronPython, Beautiful Soup, Win32 app

Question

Does Beautiful Soup work with IronPython? If so, with which version of IronPython? How easy is it to distribute a Windows desktop app on .NET 2.0 using IronPython (mostly C# calling some Python code for parsing HTML)?

Recommended answer

I was asking myself this same question, and after struggling to follow advice here and elsewhere to get IronPython and BeautifulSoup to play nicely with my existing code, I decided to go looking for an alternative, native .NET solution. BeautifulSoup is a wonderful bit of code, and at first it didn't look like there was anything comparable available for .NET, but then I found the HTML Agility Pack. If anything, I think I've actually gained some maintainability over BeautifulSoup. It takes clean or crufty HTML and produces an elegant XML DOM from it that can be queried via XPath. With a couple of lines of code you can even get back a raw XDocument and then craft your queries in LINQ to XML (see http://vijay.screamingpens.com/archive/2008/05/26/linq-amp-lambda-part-3-html-agility-pack-to-linq.aspx). Honestly, if web scraping is your goal, this is about the cleanest solution you are likely to find.
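As a rough illustration of the XDocument route mentioned above, here is a minimal sketch. It assumes the Html Agility Pack's `OptionOutputAsXml` setting and `Save(TextWriter)` overload, and it needs .NET 3.5 or later because LINQ to XML is not available on .NET 2.0. The class name, the `ToXDocument` helper, and the sample HTML are made up for the example; they are not taken from the linked article.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack;

class XDocumentSketch
{
    // Hypothetical helper: serialize the parsed HTML as XML text,
    // then hand that text to LINQ to XML.
    static XDocument ToXDocument(HtmlDocument doc)
    {
        doc.OptionOutputAsXml = true; // ask the pack to emit well-formed XML
        using (StringWriter writer = new StringWriter())
        {
            doc.Save(writer);
            return XDocument.Parse(writer.ToString());
        }
    }

    static void Main()
    {
        // Invented sample markup standing in for a downloaded page.
        string html = "<html><body><table>"
                    + "<tr><td>Jan 1</td><td>New Year's Day</td></tr>"
                    + "<tr><td>Jul 4</td><td>Independence Day</td></tr>"
                    + "</table></body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        XDocument xdoc = ToXDocument(doc);

        // Query table rows with LINQ to XML instead of XPath,
        // keeping only rows that have at least two cells.
        var rows = from tr in xdoc.Descendants("tr")
                   let cells = tr.Elements("td").ToList()
                   where cells.Count >= 2
                   select new { Date = cells[0].Value, Event = cells[1].Value };

        foreach (var row in rows)
        {
            Console.WriteLine(row.Date + ": " + row.Event);
        }
    }
}
```

Round-tripping through a string is the simplest way to bridge the two object models. For very large pages, a streaming conversion might be preferable, but that is beyond this sketch.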

Edit

Here is a simple (read: not robust at all) example that parses out the US House of Representatives holiday schedule:

using System;
using System.Collections.Generic;
using HtmlAgilityPack;

namespace GovParsingTest
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlWeb hw = new HtmlWeb();
            string url = @"http://www.house.gov/house/House_Calendar.shtml";
            HtmlDocument doc = hw.Load(url);

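            // Locate the main content <div>; SelectSingleNode returns null when nothing matches.
            HtmlNode docNode = doc.DocumentNode;
            HtmlNode div = docNode.SelectSingleNode("//div[@id='primary']");
            if (div == null)
            {
                Console.WriteLine("Could not find div#primary; the page layout may have changed.");
                return;
            }

            // SelectNodes also returns null (not an empty collection) when there are no matches.
            HtmlNodeCollection tableRows = div.SelectNodes(".//tr");
            if (tableRows == null)
            {
                return;
            }

            foreach (HtmlNode row in tableRows)
            {
                // Skip header or spacer rows that lack both a date cell and an event cell.
                HtmlNodeCollection cells = row.SelectNodes(".//td");
                if (cells == null || cells.Count < 2)
                {
                    continue;
                }

                HtmlNode dateNode = cells[0];
                HtmlNode eventNode = cells[1];

                // Descend to the innermost first child to drop wrapper markup around the event text.
                while (eventNode.HasChildNodes)
                {
                    eventNode = eventNode.FirstChild;
                }

                Console.WriteLine(dateNode.InnerText);
                Console.WriteLine(eventNode.InnerText);
                Console.WriteLine();
            }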

            //Console.WriteLine(div.InnerHtml);
            Console.ReadKey();
        }
    }
}
