Searching through an HTML string


Question

Hi.
I have a small problem pulling information out of HTML code that I receive as a string.
I send a POST request to a website and get the page back as an HTML string. It looks like this:

<h3 class="category">Asia</h3>
                                <div class="server-list">
    <div class="server">
        <div class="status-icon up" data-tooltip="Available">
        </div>
        <div class="server-name">



I get around 20 of these in the same string, each with a different name and one of only two possible data-tooltip values, either "Available" or "Unavailable".
What I am having trouble with is searching through the string and checking whether each item is available.
Does anyone know a good method for doing this?

Recommended answer

Assuming all your entries look like this and you want pairs such as "Asia" - "Available", you can use the following regular expression (with the Singleline option):
<h3.*?>(.*?)</h3>.*?data-tooltip="(.*?)"


Iterate over all the matches, and you will have what you need.
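
For illustration, here is a minimal sketch of running that expression over the fragment from the question. The class name CategoryStatusDemo and the helper Extract are made up for this example and are not part of the original answer; the sample string is the question's fragment, shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CategoryStatusDemo
{
    // Hypothetical helper: returns (heading, tooltip) pairs found by the pattern above.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        // Singleline lets '.' match newlines, so one match can span several lines.
        Regex pattern = new Regex(
            @"<h3.*?>(.*?)</h3>.*?data-tooltip=""(.*?)""",
            RegexOptions.Singleline);

        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in pattern.Matches(html))
        {
            // Group 1 is the heading text, group 2 is the tooltip value.
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    static void Main()
    {
        // Fragment from the question (shortened).
        string html = @"<h3 class=""category"">Asia</h3>
<div class=""server-list"">
    <div class=""server"">
        <div class=""status-icon up"" data-tooltip=""Available"">
        </div>";

        foreach (KeyValuePair<string, string> pair in Extract(html))
        {
            Console.WriteLine("{0} - {1}", pair.Key, pair.Value); // prints: Asia - Available
        }
    }
}
```

Note that this pattern pairs each heading only with the first data-tooltip that follows it, so it suits the case of one status per heading. If a heading has several servers under it, a per-server pattern is the better fit.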

Update: the second text sample you gave needs a different regular expression; either way, here is the complete code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace t1
{
    class Program
    {
        public struct ServerStatus
        {
            public string ServerName { get; set; }
            public string Status { get; set; }
        }

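        // Extracts one ServerStatus per server block found in the HTML string.
        public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
        {
            List<ServerStatus> result = new List<ServerStatus>();
            // Group 1 captures the data-tooltip value (the status); group 2 captures the trimmed server name.
            // RegexOptions.Singleline lets '.' match newlines, so a single match can span several lines.
            Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);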

            Match m = r.Match(HTMLString);
            while (m.Success)
            {
                result.Add(new ServerStatus() { ServerName = m.Groups[2].Value, Status = m.Groups[1].Value });
                m = m.NextMatch();
            }

            return result;
        }
        
        static void Main(string[] args)
        {
            string x = @"<div class=""server alt"">
		<div class=""status-icon up"" data-tooltip=""Available"">
		</div>
		<div class=""server-name"">
				Hardcore
		</div>
	<span class=""clear""><!-- --></span>
	</div>
	<div class=""server"">
		<div class=""status-icon down"" data-tooltip=""Maintenance"">
		</div>
		<div class=""server-name"">
				USD
		</div>
	<span class=""clear""><!-- --></span>
	</div>";

            foreach (ServerStatus s in GetStatusFromHtml(x))
            {
                Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
            }
        }
    }
}
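
Since the question is really about checking whether each item is available, here is a rough sketch of that last step. It reuses the pattern from the code above, only with named groups. The class AvailabilityCheck and the method GetAvailability are hypothetical names for this example, and it assumes, as the question states, that "Available" is the tooltip text meaning the server is up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AvailabilityCheck
{
    // Same pattern as in the answer, with named groups for readability.
    static readonly Regex ServerPattern = new Regex(
        @"class=""server.*?data-tooltip=""(?<status>.*?)"".*?class=""server-name"">\s*(?<name>.*?)\s*</div>",
        RegexOptions.Singleline);

    // Hypothetical helper: maps each server name to true when its tooltip is "Available".
    public static Dictionary<string, bool> GetAvailability(string html)
    {
        var result = new Dictionary<string, bool>();
        foreach (Match m in ServerPattern.Matches(html))
        {
            string name = m.Groups["name"].Value;
            string status = m.Groups["status"].Value;
            // Assumption: any tooltip other than "Available" counts as not available.
            result[name] = string.Equals(status, "Available", StringComparison.OrdinalIgnoreCase);
        }
        return result;
    }

    static void Main()
    {
        string html = @"<div class=""server"">
    <div class=""status-icon up"" data-tooltip=""Available""></div>
    <div class=""server-name"">Hardcore</div>
</div>
<div class=""server"">
    <div class=""status-icon down"" data-tooltip=""Unavailable""></div>
    <div class=""server-name"">USD</div>
</div>";

        foreach (KeyValuePair<string, bool> entry in GetAvailability(html))
        {
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value ? "available" : "not available");
        }
    }
}
```

A dictionary keyed by server name makes it easy to look up one specific server. If two servers share a name, the later one overwrites the earlier one here, so a list would be the safer choice in that case.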



Update 2: using the real source.

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;

namespace t1
{
    class Program
    {
        public struct ServerStatus
        {
            public string ServerName { get; set; }
            public string Status { get; set; }
        }

        public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
        {
            List<ServerStatus> result = new List<ServerStatus>();
            Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);

            Match m = r.Match(HTMLString);
            while (m.Success)
            {
                string ServerName = m.Groups[2].Value;
                string Status = m.Groups[1].Value;
                // Skip matches where either capture is empty, so only complete entries are reported.
                if (ServerName != string.Empty && Status != string.Empty)
                {
                    result.Add(new ServerStatus() { ServerName = ServerName, Status = Status});
                }
                m = m.NextMatch();
            }

            return result;
        }
        
        static void Main(string[] args)
        {
            WebRequest request = WebRequest.Create(@"http://us.battle.net/d3/en/status");
            using (WebResponse response = request.GetResponse())
            {
                using (StreamReader reader = new StreamReader
                   (response.GetResponseStream(), Encoding.UTF8))
                {
                    string content = reader.ReadToEnd();
                    
                    foreach (ServerStatus s in GetStatusFromHtml(content))
                    {
                        Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
                    }
                }
            }            
        }
    }
}

