Searching through an HTML string
Question
Hi.
I'm having trouble pulling information out of HTML code that I receive as a string.
I send a POST request to a website and get the page's HTML back as a string. It looks like this:
<h3 class="category">Asia</h3>
<div class="server-list">
<div class="server">
<div class="status-icon up" data-tooltip="Available">
</div>
<div class="server-name">
The string contains around 20 of these blocks, each with a different name and one of only two possible data-tooltip values: "Available" or "Unavailable".
What I'm struggling with is searching through the string and checking whether each item is available.
Does anyone know a good way to do this?
Recommended answer
Assuming all your entries look like this and you want pairs such as "Asia"-"Available", you can use the following regular expression with the Singleline option (so that . also matches line breaks):
<h3.*?>(.*?)</h3>.*?data-tooltip="(.*?)"
Iterate over all the matches and you will have what you need.
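A minimal sketch of how that pattern could be applied in C#, assuming markup like the snippet in the question. RegexOptions.Singleline makes `.` match newlines, so the lazy `.*?` can cross from the `<h3>` into the server `<div>`s. The class and method names (`CategoryStatus`, `FirstTooltipPerCategory`) are hypothetical. Because each match stops at the first `data-tooltip` after an `<h3>`, this pattern yields only one server per category; the server-based regex in the complete program below covers every server.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CategoryStatus
{
    // The pattern from the answer: group 1 = <h3> text, group 2 = first data-tooltip after it.
    // Singleline lets ".*?" span the line breaks between the <h3> and the server <div>s.
    private static readonly Regex Pattern = new Regex(
        @"<h3.*?>(.*?)</h3>.*?data-tooltip=""(.*?)""",
        RegexOptions.Singleline);

    // Hypothetical helper: returns one (category, tooltip) pair per <h3> found.
    public static List<(string Category, string Tooltip)> FirstTooltipPerCategory(string html)
    {
        var result = new List<(string Category, string Tooltip)>();
        foreach (Match m in Pattern.Matches(html))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        string html = @"<h3 class=""category"">Asia</h3>
<div class=""server-list"">
<div class=""server"">
<div class=""status-icon up"" data-tooltip=""Available"">
</div>";
        foreach (var (category, tooltip) in FirstTooltipPerCategory(html))
        {
            Console.WriteLine("{0}-{1}", category, tooltip); // prints "Asia-Available"
        }
    }
}
```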
Update: the second text sample you posted needs a different regular expression; either way, here is a complete program:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
namespace t1
{
class Program
{
public struct ServerStatus
{
public string ServerName { get; set; }
public string Status { get; set; }
}
public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
{
List<ServerStatus> result = new List<ServerStatus>();
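// Group 1 captures the data-tooltip value, group 2 the trimmed text inside <div class="server-name">.
// Singleline lets ".*?" span the line breaks between the nested <div>s.
Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);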
Match m = r.Match(HTMLString);
while (m.Success)
{
result.Add(new ServerStatus() { ServerName = m.Groups[2].Value, Status = m.Groups[1].Value });
m = m.NextMatch();
}
return result;
}
static void Main(string[] args)
{
string x = @"<div class=""server alt"">
<div class=""status-icon up"" data-tooltip=""Available"">
</div>
<div class=""server-name"">
Hardcore
</div>
<span class=""clear""><!-- --></span>
</div>
<div class=""server"">
<div class=""status-icon down"" data-tooltip=""Maintenance"">
</div>
<div class=""server-name"">
USD
</div>
<span class=""clear""><!-- --></span>
</div>";
foreach (ServerStatus s in GetStatusFromHtml(x))
{
Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
}
}
}
}
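To answer the "available or not" part of the question directly, here is a small sketch that reuses the regular expression from GetStatusFromHtml and reduces each tooltip to a boolean. It assumes that any tooltip other than exactly "Available" (for example "Unavailable", or "Maintenance" as in the sample above) means the server is not available. `ServerAvailability` and `GetAvailability` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ServerAvailability
{
    // Same pattern as GetStatusFromHtml: group 1 = tooltip, group 2 = server name.
    private static readonly Regex ServerPattern = new Regex(
        @"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>",
        RegexOptions.Singleline);

    // Hypothetical helper: maps each server name to true when its tooltip is "Available".
    // Assumption: every other tooltip value counts as "not available".
    public static Dictionary<string, bool> GetAvailability(string html)
    {
        var result = new Dictionary<string, bool>();
        foreach (Match m in ServerPattern.Matches(html))
        {
            string name = m.Groups[2].Value;
            string tooltip = m.Groups[1].Value;
            if (name.Length == 0)
            {
                continue; // skip matches where no server name was captured
            }
            result[name] = string.Equals(tooltip, "Available", StringComparison.Ordinal);
        }
        return result;
    }

    public static void Main()
    {
        string html = @"<div class=""server"">
<div class=""status-icon up"" data-tooltip=""Available""></div>
<div class=""server-name"">Hardcore</div>
</div>
<div class=""server"">
<div class=""status-icon down"" data-tooltip=""Maintenance""></div>
<div class=""server-name"">USD</div>
</div>";
        foreach (var pair in GetAvailability(html))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value ? "up" : "down");
        }
    }
}
```

Keying a Dictionary<string, bool> by server name makes lookups such as availability["USD"] simple; if two servers can share a name, a list of pairs would be safer, since a later match overwrites an earlier one here.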
Update 2: the same approach applied to the real page source.
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;
namespace t1
{
class Program
{
public struct ServerStatus
{
public string ServerName { get; set; }
public string Status { get; set; }
}
public static IList<ServerStatus> GetStatusFromHtml(string HTMLString)
{
List<ServerStatus> result = new List<ServerStatus>();
Regex r = new Regex(@"class=""server.*?data-tooltip=""(.*?)"".*?class=""server-name"">\s*(.*?)\s*</div>", RegexOptions.Singleline);
Match m = r.Match(HTMLString);
while (m.Success)
{
string ServerName = m.Groups[2].Value;
string Status = m.Groups[1].Value;
// Only keep matches where both the name and the status were captured.
if(ServerName != string.Empty && Status != string.Empty)
{
result.Add(new ServerStatus() { ServerName = ServerName, Status = Status});
}
m = m.NextMatch();
}
return result;
}
static void Main(string[] args)
{
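// WebRequest is marked obsolete in .NET 6 and later; HttpClient is the recommended replacement there.
WebRequest request = WebRequest.Create(@"http://us.battle.net/d3/en/status");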
using (WebResponse response = request.GetResponse())
{
using (StreamReader reader = new StreamReader
(response.GetResponseStream(), Encoding.UTF8))
{
string content = reader.ReadToEnd();
foreach (ServerStatus s in GetStatusFromHtml(content))
{
Console.WriteLine("{0}:{1}", s.ServerName, s.Status);
}
}
}
}
}
}