F#.Data HTML Parser Extracting Strings From Nodes

Question

I am trying to use FSharp.Data's HTML parser to extract a list of link strings from href attributes.

I can get the links printed to the console, but I'm struggling to get them into a list.

A working snippet that prints the wanted links:

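open FSharp.Data

// myUrl is assumed to be defined elsewhere as the address of the page to scrape
let results = HtmlDocument.Load(myUrl)
// Note: Seq.iter returns unit, so links is bound to () here, not to the hrefs
let links = 
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.iter (fun x -> x |> Seq.iter (fun y -> y.AttributeValue("href") |> printf "%A"))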

How do I store those strings in the variable links instead of printing them out?

Cheers

Recommended answer

On the very last line, you end up with a sequence of sequences: for each td.pagenav you have a bunch of <a> elements, each of which has an href. That's why you need two nested Seq.iter calls: first you iterate over the outer sequence, and on each iteration you iterate over the inner sequence.

To flatten a sequence of sequences, use Seq.collect. Further, to convert a sequence to a list, use Seq.toList or List.ofSeq (they're equivalent):

let a = [ [1; 2; 3]; [4; 5; 6] ]
let b = a |> Seq.collect id |> Seq.toList
// F# Interactive reports: val b : int list = [1; 2; 3; 4; 5; 6]
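
As a rough sketch of why this flattens the data: Seq.collect behaves like Seq.map followed by Seq.concat, so it maps each element to a sequence and then joins the results end to end. The names below (nested, viaCollect, viaMapConcat) are made up for illustration and do not come from the question.

```fsharp
// Hypothetical sample data standing in for "a sequence of sequences"
let nested = [ ["a"; "b"]; ["c"]; [] ]

// Seq.collect maps each inner list (here with id) and concatenates the results
let viaCollect = nested |> Seq.collect id |> Seq.toList

// The same thing spelled out in two steps: map first, then flatten with Seq.concat
let viaMapConcat = nested |> Seq.map id |> Seq.concat |> List.ofSeq

printfn "%A" viaCollect              // prints ["a"; "b"; "c"]
printfn "%b" (viaCollect = viaMapConcat)  // prints true
```

Empty inner sequences simply contribute nothing to the result, which is convenient when some cells contain no links.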

Applying this to your code:

let links = 
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.collect (fun x -> x |> Seq.map (fun y -> y.AttributeValue("href")))
    |> Seq.toList

Or you could make it a bit cleaner by applying Seq.collect at the point where you first encounter a nested sequence:

let links = 
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.collect (fun x -> x.Elements("a"))
    |> Seq.map (fun y -> y.AttributeValue("href"))
    |> Seq.toList

That said, I would rather rewrite this as a list comprehension. Looks even cleaner:

let links = [ for td in results.Descendants "td" do
                if td.HasClass "pagenav" then
                  for a in td.Elements "a" ->
                    a.AttributeValue "href"
            ]
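
To try any of these variants without hitting a real page, something like the following script could be used. It is only a sketch: the sampleHtml string is invented for illustration, and it assumes the script is run with dotnet fsi so that the `#r "nuget: ..."` directive can fetch FSharp.Data. It uses HtmlDocument.Parse on an inline string in place of HtmlDocument.Load on a URL.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Made-up HTML standing in for the real page (an assumption, not from the question)
let sampleHtml = """
<html><body><table><tr>
  <td class="pagenav"><a href="/page/1">1</a><a href="/page/2">2</a></td>
  <td class="other"><a href="/ignored">x</a></td>
  <td class="pagenav"><a href="/page/3">3</a></td>
</tr></table></body></html>"""

// Parse the inline string instead of loading from a URL
let results = HtmlDocument.Parse(sampleHtml)

let links =
    results.Descendants("td")
    |> Seq.filter (fun td -> td.HasClass("pagenav"))   // keep only td.pagenav cells
    |> Seq.collect (fun td -> td.Elements("a"))        // flatten all <a> children
    |> Seq.map (fun a -> a.AttributeValue("href"))     // pull out each href
    |> Seq.toList

// links is now a string list; print one href per line
links |> List.iter (printfn "%s")
```

Swapping HtmlDocument.Parse(sampleHtml) back to HtmlDocument.Load(myUrl) should give the same pipeline against the live page.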
