Using a regex in jsoup

Views: 160 | Published: 2020/4/24 10:00:35 | Tags: java, jsoup, web-crawler


Problem description

I'm working on my first serious project with jsoup and I've got stuck on the following problem.

I'm trying to get zip codes from a site that shows a list of them.

Here is one of the lines that contains a zip code:

<td align="center"><a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a></td>

So my idea is to go through the page and collect every string that consists of exactly 5 digits (0-9). The regex is ^[0-9]{5,5}$

My code is:

doc.select("td:matchesOwn(^[0-9]{5,5}$)");

But nothing came out. I can't find a way to get these zip codes out of that site. Does anyone know how to do it?

The real question here is: how do I get the numbers that are not in any tag, but just written out in the open? (I guess there is a term for that, but I'm not that good with XML terminology.)

Recommended answer

I solved this using Element#getElementsMatchingOwnText:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ZipCodeExample {
    public static void main(String[] args) {
        final String html = "<td align=\"center\"><a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a></td> ";
        // Find every element whose own text consists of exactly five digits.
        final Elements elements = Jsoup.parse(html).getElementsMatchingOwnText("^[0-9]{5,5}$");

        for (final Element element : elements) {
            System.out.println("element = [" + element + "]");
            System.out.println("zip = [" + element.text() + "]");
        }
    }
}

Output:

element = [<a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a>]
zip = [33011]
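
A likely reason the selector in the question returned nothing is that :matchesOwn only tests an element's own text, and in the sample row the digits belong to the nested <a>, not to the <td>. The sketch below illustrates this with jsoup's :matchesOwn and :matches pseudo-selectors. The class name ZipSelectorSketch and the helpers zips and sampleDocument are hypothetical names introduced only for illustration, and wrapping the row in <table><tr> is an assumption about how the real page is structured.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ZipSelectorSketch {

    // Hypothetical helper: collects the text of every element matched by the CSS query.
    public static List<String> zips(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            result.add(element.text());
        }
        return result;
    }

    // Hypothetical helper: the row from the question, wrapped in <table><tr>
    // (an assumption) so that the parser keeps the <td> element.
    public static Document sampleDocument() {
        String html = "<table><tr><td align=\"center\">"
                + "<a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a>"
                + "</td></tr></table>";
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();

        // The <td> has no text of its own, so nothing is selected.
        System.out.println(zips(doc, "td:matchesOwn(^[0-9]{5}$)"));

        // The digits are the own text of the <a> inside the <td>.
        System.out.println(zips(doc, "td > a:matchesOwn(^[0-9]{5}$)"));

        // :matches also considers descendant text, so the <td> itself is selected.
        System.out.println(zips(doc, "td:matches(^[0-9]{5}$)"));
    }
}
```

Which variant fits best depends on the real page: targeting the <a> is narrower, while td:matches would also pick up cells whose five digits are not wrapped in a link.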
