JSOUP Finding Groups of Words

Published: 2020/4/24 10:00:52. Tags: java, html-parsing, jsoup

Question

For a homework assignment I have to write a program that scrapes HTML from a website and then somehow finds phrases within the page. By phrases I mean some arbitrary way of organizing the text so that words in close proximity to each other are put in the same group. I know this sounds unclear, but the assignment says that how we do this is up to our own interpretation of how to find "phrases".
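
One minimal reading of "words in close proximity" is a fixed window: every run of n consecutive words (an n-gram) is a candidate group, and runs that recur on the page are the likeliest phrases. The sketch below assumes the page text has already been extracted as a plain String (for example with jsoup's `Element.text()`); the class and method names are hypothetical, and it uses only the JDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class: counts word n-grams as a naive "proximity" grouping.
public class NGramPhrases {

    /** Counts every run of n consecutive words (an n-gram) in the text, lowercased. */
    public static Map<String, Integer> countNGrams(String text, int n) {
        // Split on anything that is not a letter, digit or apostrophe, and drop empty tokens.
        String[] words = Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);

        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (int i = 0; i + n <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Java SE and Java EE. Java SE is free.";
        // Recurring word pairs are candidate "phrases".
        countNGrams(text, 2).forEach((gram, count) -> System.out.println(count + "  " + gram));
    }
}
```

On the sample in `main`, "java se" is counted twice and every other pair once. This naive version also pairs words across sentence boundaries ("ee java"), so splitting the text into sentences or blocks first would make the groups cleaner.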

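Currently, my code is as follows:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.connect("http://oracle.com/").get();
// text() returns the visible text with the tags stripped; toString() would return the HTML markup itself
String text = doc.body().text();
System.out.println(text);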

This gives me a decent printout of all the different words that appear on the page, with all the HTML stripped out.

My main problem is I can't think of a way to parse the HTML so that I can somehow get these arbitrary groups together (and I don't know what kind of criteria I can use to arbitrarily form these "groups" of words).
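
For the criteria question, one well-known heuristic is the candidate-phrase step of RAKE (Rapid Automatic Keyword Extraction): a phrase is a run of consecutive words that is broken by punctuation or by a stopword such as "the", "and" or "is". The sketch below is a simplified version of that step only (no scoring), with a hypothetical and deliberately tiny stopword list; it works on plain text and needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class: a simplified version of RAKE's candidate-phrase step (no scoring).
public class PhraseSplitter {
    // Hypothetical, deliberately tiny stopword list; real lists are much longer.
    private static final Set<String> STOPWORDS = Set.of(
            "a", "an", "and", "are", "by", "for", "in", "is", "it", "of", "on", "or", "the", "to", "with");

    /** Splits text into runs of consecutive non-stopwords, treating punctuation as a hard break. */
    public static List<String> candidatePhrases(String text) {
        List<String> phrases = new ArrayList<>();
        // Punctuation always ends a phrase, so each clause is processed on its own.
        for (String clause : text.split("[.,;:!?()\\[\\]\"]+")) {
            List<String> current = new ArrayList<>();
            for (String word : clause.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                if (STOPWORDS.contains(word.toLowerCase(Locale.ROOT))) {
                    endPhrase(current, phrases); // a stopword also ends the current phrase
                } else {
                    current.add(word);
                }
            }
            endPhrase(current, phrases);
        }
        return phrases;
    }

    // Moves the words collected so far into the result as one phrase.
    private static void endPhrase(List<String> current, List<String> phrases) {
        if (!current.isEmpty()) {
            phrases.add(String.join(" ", current));
            current.clear();
        }
    }

    public static void main(String[] args) {
        String text = "Oracle Cloud Infrastructure is built for enterprise workloads, and the Java platform runs on it.";
        candidatePhrases(text).forEach(System.out::println);
    }
}
```

Run on the sample sentence in `main`, it prints `Oracle Cloud Infrastructure`, `built`, `enterprise workloads` and `Java platform runs`, one per line. Full RAKE then ranks such candidates using word frequency and co-occurrence scores, which this sketch leaves out.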

I know this question sounds terrible, but I don't know how else to state it, and I'm really out of ideas. The assignment is extremely unclear, and when I asked for clarification my professor just told me to interpret it myself. Does anyone have ideas on how to parse the HTML so that words close to each other (maybe inside the same or similar HTML tags) are extracted much like my current output, but with a separator such as a newline after every "phrase" that I can then parse?
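
For the "words inside similar HTML tags" idea, one option is to start a new group whenever the parser meets a tag that breaks the flow of text (p, li, h1 to h6, td, br and so on) and to let inline tags such as a, b and span stay inside the current group, so that printing one group per line puts a newline after every "phrase". With jsoup itself this is roughly `doc.select("h1, h2, h3, p, li")` followed by each element's `text()`. To stay dependency-free, the sketch below swaps in the JDK's own `javax.swing.text.html.parser.ParserDelegator` instead; that parser only understands HTML 3.2, so treat it as an illustration of the grouping rule rather than a replacement for jsoup on a real page. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class: groups the text of an HTML page by the block-level tags around it.
public class BlockGrouper {

    /** Returns one group per block of text; a new group starts at every tag that breaks the flow. */
    public static List<String> groupByBlocks(String html) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Text inside inline tags (a, b, span, ...) simply joins the current group.
                current.append(data).append(' ');
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t.breaksFlow()) {
                    endGroup(current, groups);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t.breaksFlow()) {
                    endGroup(current, groups);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t.breaksFlow()) { // empty tags such as <br>
                    endGroup(current, groups);
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        endGroup(current, groups); // text left over after the last tag
        return groups;
    }

    // Collapses whitespace and stores the pending text as one group, skipping empty ones.
    private static void endGroup(StringBuilder current, List<String> groups) {
        String text = current.toString().trim().replaceAll("\\s+", " ");
        if (!text.isEmpty()) {
            groups.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Products</h1><p>Cloud <b>services</b> for business.</p>"
                + "<ul><li>Database</li><li>Java SE</li></ul></body></html>";
        groupByBlocks(html).forEach(System.out::println); // one "phrase" per line
    }
}
```

Because `b` does not break the flow, a bolded word stays in its paragraph's group, which is the behavior the "similar HTML tags" idea asks for.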

Thanks for any ideas or advice.
