How to select the nth element of a particular type in enlive?
I am trying to scrape some data from a page with a table-based layout. To get some of the data I need something like the 3rd table inside the 2nd table inside the 5th table inside the 1st table inside the body. I am trying to use enlive, but cannot figure out how to use nth-of-type and other selector steps. To make matters worse, the page in question has a single top-level table inside the body, but (select data [:body :> :table]) returns 6 results for some reason. What the hell am I doing wrong?
For nth-of-type, does the following example help?
user> (require '[net.cgrand.enlive-html :as html])
user> (def test-html
"<html><head></head><body><p>first</p><p>second</p><p>third</p></body></html>")
#'user/test-html
user> (html/select (html/html-resource (java.io.StringReader. test-html))
[[:p (html/nth-of-type 2)]])
({:tag :p, :attrs nil, :content ["second"]})
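To make the selector shape a bit more explicit: the outer vector is the chain of steps, while an inner vector such as [:p (html/nth-of-type 2)] combines several conditions on one element. The sketch below contrasts nth-of-type with nth-child on a made-up document (sample-html and the other var names are illustrative, not from the question); it assumes enlive's default parser keeps the markup as written.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Illustrative document (an assumption, not the asker's page):
;; an <h1> precedes the two paragraphs.
(def sample-html
  "<html><head></head><body><h1>title</h1><p>first</p><p>second</p></body></html>")

(def sample-nodes
  (html/html-resource (java.io.StringReader. sample-html)))

;; nth-of-type counts only siblings with the same tag:
;; this is the second <p> among the <p> children of its parent.
(def by-type
  (map html/text (html/select sample-nodes [[:p (html/nth-of-type 2)]])))

;; nth-child counts all element siblings:
;; this is a <p> that is the second child element of its parent.
(def by-child
  (map html/text (html/select sample-nodes [[:p (html/nth-child 2)]])))

(println by-type)  ; prints (second)
(println by-child) ; prints (first)
```

For "the nth table" style requirements, nth-of-type is usually the one you want, since other elements between the tables do not shift the count.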
No idea about the second issue. Your approach seems to work with a naive test:
user> (def test-html "<html><head></head><body><div><p>in div</p></div><p>not in div</p></body></html>")
#'user/test-html
user> (html/select (html/html-resource (java.io.StringReader. test-html)) [:body :> :p])
({:tag :p, :attrs nil, :content ["not in div"]})
Any chance of looking at your actual HTML?
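In the meantime, one way to investigate is to look at what enlive actually sees under the body, since the parsed tree does not always match the markup you expect (that this explains the 6 results is only a guess). A minimal sketch, where child-tags is a hypothetical helper and demo-html just reuses the naive test document:

```clojure
(require '[net.cgrand.enlive-html :as html])

(defn child-tags
  "Hypothetical helper: returns the tags of the direct element children
  of every node matched by parent-selector. Text nodes are skipped
  because they are plain strings and have no :tag."
  [nodes parent-selector]
  (->> (html/select nodes parent-selector)
       (mapcat :content)
       (keep :tag)))

(def demo-html
  "<html><head></head><body><div><p>in div</p></div><p>not in div</p></body></html>")

(def demo-nodes
  (html/html-resource (java.io.StringReader. demo-html)))

(println (child-tags demo-nodes [:body])) ; prints (:div :p)
```

Running (child-tags data [:body]) on the real page should show whether the six tables really are direct children of the body in the parsed tree.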
Update: (in response to the comment)
Here's another example where "the second <p> inside the <div> inside the second <div> inside whatever" is returned:
user> (def test-html "<html><head></head><body><div><p>this is not the one</p><p>nor this</p><div><p>or for that matter this</p><p>skip this one too</p></div></div><span><p>definitely not this one</p></span><div><p>not this one</p><p>not this one either</p><div><p>not this one, but almost</p><p>this one</p></div></div><p>certainly not this one</p></body></html>")
#'user/test-html
user> (html/select (html/html-resource (java.io.StringReader. test-html))
[[:div (html/nth-of-type 2)] :> :div :> [:p (html/nth-of-type 2)]])
({:tag :p, :attrs nil, :content ["this one"]})
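The same idea can be carried over to nested tables. The sketch below is a cut-down version of the chain in the question ("2nd table inside the 1st table inside the body") on an invented document; table-html and the var names are assumptions for illustration. It uses descendant steps rather than :> because nested tables sit inside rows and cells, and the parser may add wrapper elements such as tbody.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Illustrative document: one outer table whose single cell
;; holds two sibling tables, "a" and "b".
(def table-html
  (str "<html><head></head><body>"
       "<table><tr><td>"
       "<table><tr><td>a</td></tr></table>"
       "<table><tr><td>b</td></tr></table>"
       "</td></tr></table>"
       "</body></html>"))

(def table-nodes
  (html/html-resource (java.io.StringReader. table-html)))

;; Each step is a descendant step: a table that is 2nd of its type
;; among its siblings, somewhere inside a table that is 1st of its type,
;; somewhere inside the body.
(def second-inner-table
  (html/select table-nodes
               [:body
                [:table (html/nth-of-type 1)]
                [:table (html/nth-of-type 2)]]))

;; Should list only the text of table "b".
(println (map html/text second-inner-table))
```

One caveat: nth-of-type counts siblings under the same parent, so tables that live in different cells each start counting from 1. If the page lays its tables out that way, the steps need to pick the row or cell by position as well, for example with [:td (html/nth-of-type 3)].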