How to get text from <a href> in nested HTML elements using Jericho?
Problem description
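I have some HTML code like this:
<div class="itm hasOverlay lastrow">
<a id="3:LE343SPABGLIANID" class="itm-link itm-drk trackingOnClick" title="League Sepatu Casual Geof S/L LO - Hitam/Biru" href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html" rel="-standard|">
</a>
<div class="itm-overlay itm-group-mainbox-with-group"></div>
</div>
How can I get the text league-sepatu-casual-geof-sl-lo-hitambiru-68166.html out of the <a href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html"> tag?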
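Answer
That should be rather simple...
import java.io.StringReader;
import net.htmlparser.jericho.*;

// Parse the snippet and read the href of the first <a> element in it.
// Note: the Source(Reader) constructor is declared to throw IOException.
Source source = new Source(new StringReader(inputString));
Element aElement = source.getFirstElement(HTMLElementName.A);
String href = aElement.getAttributeValue("href");
System.out.println(href);
... although this makes some assumptions, of course: namely, that inputString is only the string that you posted (and that this part is not enclosed in other tags), and that this part only contains a single link (a).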
(If these assumptions are not valid, one somehow has to identify this particular div and the correct a tag, for example by searching for a div with the attribute class="itm hasOverlay lastrow" and for an a with the class class="itm-link itm-drk trackingOnClick". In any case, one has to know more about the actual structure of the document from which this information should be extracted.)
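The class-based lookup mentioned above could be sketched roughly as follows. This is only an illustration, not part of the original answer: the class name HrefByClass, the method findHref and the SAMPLE constant are hypothetical, and it assumes the Jericho HTML Parser 3.x jar is on the classpath. It matches the class attributes by exact string comparison, which is only reasonable if the page always emits the classes in this order.

```java
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

public class HrefByClass {

    // The snippet from the question, used here as sample input.
    static final String SAMPLE =
          "<div class=\"itm hasOverlay lastrow\">\n"
        + "<a id=\"3:LE343SPABGLIANID\" class=\"itm-link itm-drk trackingOnClick\" "
        + "title=\"League Sepatu Casual Geof S/L LO - Hitam/Biru\" "
        + "href=\"league-sepatu-casual-geof-sl-lo-hitambiru-68166.html\" rel=\"-standard|\">\n"
        + "</a>\n"
        + "<div class=\"itm-overlay itm-group-mainbox-with-group\"></div>\n"
        + "</div>";

    // Returns the href of the first matching <a> nested inside the first
    // matching <div>, or null if no such pair exists in the document.
    // Assumption: the class attribute values are compared as whole strings.
    static String findHref(String html) {
        Source source = new Source(html);
        for (Element div : source.getAllElements(HTMLElementName.DIV)) {
            if (!"itm hasOverlay lastrow".equals(div.getAttributeValue("class"))) {
                continue; // not the div we are looking for
            }
            // An Element is also a Segment, so the search is limited to this div.
            for (Element a : div.getAllElements(HTMLElementName.A)) {
                if ("itm-link itm-drk trackingOnClick".equals(a.getAttributeValue("class"))) {
                    return a.getAttributeValue("href");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findHref(SAMPLE));
    }
}
```

Unlike getFirstElement(HTMLElementName.A) on the whole document, this keeps working when the snippet is embedded in a larger page with other links, as long as the two class attributes stay unchanged.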