Using a regex in jsoup
Question
I'm trying my first serious project in jsoup and I've gotten stuck on this problem:
I'm trying to get zipcodes from a site. There is a list of zipcodes.
Here is one of the lines that contains a zipcode:
<td align="center"><a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a></td>
So my idea is to go through the page and collect every string that consists of exactly 5 digits from 0-9. The regex is ^[0-9]{5,5}$
The code is:
doc.select("td:matchesOwn(^[0-9]{5,5}$)");
But nothing came out. I can't find a way to get these zipcodes out of that site. Does anyone know how to do it?
The real question here is: how do I get the numbers that are not in any tags, but just written out in the open? (I guess there is a term for that, but I'm not that good with XML terms.)
Recommended answer
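The selector in the question returns nothing because matchesOwn tests only an element's own text, and here the digits are the own text of the nested <a>, not of the <td>. I solved the problem using Element#getElementsMatchingOwnText: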
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ZipCodeExample {
    public static void main(String[] args) {
        final String html = "<td align=\"center\"><a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a></td> ";
        // Find every element whose own text is exactly five digits.
        final Elements elements = Jsoup.parse(html).getElementsMatchingOwnText("^[0-9]{5,5}$");
        for (final Element element : elements) {
            System.out.println("element = [" + element + "]");
            System.out.println("zip = [" + element.text() + "]");
        }
    }
}
Output:
element = [<a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a>]
zip = [33011]
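The same result can probably be reached with a CSS selector, as the question originally attempted, by applying matchesOwn to the <a> instead of the <td>. The following is a minimal sketch, not a definitive implementation: the class name ZipCodeSelectorSketch, the helper extractZipCodes, and the second "Hialeah" cell are made up for illustration, and the cell is wrapped in <table><tr> on the assumption that the real page has a proper table around it.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ZipCodeSelectorSketch {

    // Hypothetical helper: returns the text of every <a> that is a direct
    // child of a <td> and whose own text is exactly five digits.
    static List<String> extractZipCodes(String html) {
        Document doc = Jsoup.parse(html);
        // matchesOwn is attached to the <a>, because that is the element
        // whose own text holds the digits; the <td> has no own text here.
        return doc.select("td > a:matchesOwn(^[0-9]{5}$)").eachText();
    }

    public static void main(String[] args) {
        // Assumed markup: the row from the question plus one non-matching cell.
        String html = "<table><tr>"
                + "<td align=\"center\"><a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a></td>"
                + "<td><a href=\"#\">Hialeah</a></td>"
                + "</tr></table>";
        System.out.println(extractZipCodes(html));
    }
}
```

Restricting the match to td > a keeps the query from picking up five-digit text elsewhere on the page, at the cost of depending on that specific markup.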