Extract all `href`s from a webpage with HtmlAgilityPack / web requests (or anything else)

Views: 87 · Posted: 2020/11/24 19:23:45 · Tags: c#, httpwebrequest, html-agility-pack


Question

I have this webpage source:

<a href="/StefaniStoikova"><img alt="" class="head" id="face_6306494" src="http://img0.ask.fm/assets/054/771/271/thumb_tiny/sam_7082.jpg" /></a>
<a href="/devos"><img alt="" class="head" id="face_18603180" src="http://img7.ask.fm/assets/043/424/871/thumb_tiny/devos.jpg" /></a>
<a href="/frenop"><img alt="" class="head" id="face_4953081" src="http://img1.ask.fm/assets/029/163/760/thumb_tiny/dsci0744.jpg" /></a>


I want to extract the string right after `<a href="`. My main problem is that these strings are all different, and I can't find a way to do it with either HtmlAgilityPack or web requests.


Maybe someone has an idea for a regular expression? Please share it.
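Since the question asks about regular expressions, here is a rough C# sketch that uses only System.Text.RegularExpressions. It assumes every href value is wrapped in single or double quotes; unquoted attributes, HTML comments and script contents are not handled, which is why a real HTML parser (as in the answer below) is usually the safer choice. The names RegexHrefExtractor and ExtractHrefs are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexHrefExtractor
{
    // Matches href="..." or href='...' inside an <a ...> start tag and captures
    // the quoted value in the "url" group. Assumption: values are always quoted.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every captured href value in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var result = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            result.Add(m.Groups["url"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Markup shortened from the question (img src attributes omitted).
        const string html =
            "<a href=\"/StefaniStoikova\"><img alt=\"\" class=\"head\" /></a>\n" +
            "<a href=\"/devos\"><img alt=\"\" class=\"head\" /></a>\n" +
            "<a href=\"/frenop\"><img alt=\"\" class=\"head\" /></a>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Run as-is, this prints /StefaniStoikova, /devos and /frenop, one per line. The `\k<q>` backreference makes the closing quote match whichever quote character opened the value.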

Recommended answer


It should be quite simple to get what you need with the HtmlAgilityPack. Assuming you have your document loaded into an HtmlDocument object named doc:

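// Requires: using System; using HtmlAgilityPack;
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//a[@href]");

// SelectNodes returns null (not an empty collection) when nothing matches.
if (collection != null)
{
    foreach (HtmlNode node in collection)
    {
        // Do what you want with the href value here. As an example, this
        // just prints the value to the console.
        Console.WriteLine(node.GetAttributeValue("href", string.Empty));
    }
}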
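For completeness, a self-contained sketch of the same XPath approach, showing how `doc` can be loaded in the first place. It assumes the HtmlAgilityPack NuGet package is referenced; `LoadHtml` parses markup from a string, and `HtmlWeb.Load` (shown commented out) downloads and parses a URL in one step. The class HapHrefExtractor, its ExtractHrefs method and the example.com URL are hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // from the "HtmlAgilityPack" NuGet package

public static class HapHrefExtractor
{
    // Returns the href of every <a> element that has one, in document order.
    public static List<string> ExtractHrefs(HtmlDocument doc)
    {
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null rather than an empty collection on no match.
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }

    public static void Main()
    {
        // Parse markup held in a string (two lines adapted from the question).
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<a href=\"/StefaniStoikova\"><img alt=\"\" class=\"head\" /></a>\n" +
            "<a href=\"/devos\"><img alt=\"\" class=\"head\" /></a>");

        foreach (string href in ExtractHrefs(doc))
        {
            Console.WriteLine(href);
        }

        // To fetch a live page instead (hypothetical URL):
        // HtmlDocument page = new HtmlWeb().Load("https://example.com/");
    }
}
```

Because the XPath `//a[@href]` only selects anchors that actually carry an href attribute, elements such as `<a name="top">` are skipped without any extra filtering.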
