Java web site meta data

Question

Using Java, what is the best way to extract metadata from a website?

I am planning to request the entire page and then find where the metadata is located in it. This seems cumbersome; is there a better way to achieve this?

Recommended answer

Cumbersome as it is, it's practically the only way, as far as I know.
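To make the "request the page, then find the meta tags" approach concrete, here is a minimal sketch using only the standard library. The class name `MetaExtractor`, its method names, and the UTF-8 assumption are all hypothetical choices for illustration. The regular expressions are a simplification and will miss unquoted attribute values or a `>` inside an attribute; a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: downloads a page and pulls out its <meta> tags.
class MetaExtractor {
    // Matches a whole <meta ...> tag (assumes no '>' inside attribute values).
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches one quoted attribute, e.g. name="description" or content='x'.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z][a-zA-Z0-9:_-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Reads the entire response body; assumes the page is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns a map from the tag's name/property/http-equiv to its content.
    static Map<String, String> extractMeta(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String key = attrs.get("name");
            if (key == null) key = attrs.get("property");
            if (key == null) key = attrs.get("http-equiv");
            String content = attrs.get("content");
            // Tags without a key or content (e.g. <meta charset=...>) are skipped.
            if (key != null && content != null) {
                result.put(key, content);
            }
        }
        return result;
    }
}
```

A call such as `MetaExtractor.extractMeta(MetaExtractor.fetch(someUrl))` would then return the tags found in the downloaded page, where `someUrl` is whatever address you want to inspect.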

What you can do is read only the first few bytes, say 2000. This might save some time, but it won't guarantee that all meta tags are read.
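The fixed-size read could look roughly like the sketch below. `PrefixReader` and `readPrefix` are hypothetical names, and decoding as UTF-8 is an assumption; a multi-byte character cut off at the limit would decode as a replacement character.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads at most maxBytes from the start of a stream.
class PrefixReader {
    static String readPrefix(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        // A single read() may return fewer bytes than requested, so loop
        // until the buffer is full or the stream ends.
        while (total < maxBytes) {
            int n = in.read(buffer, total, maxBytes - total);
            if (n < 0) {
                break; // end of stream reached before the limit
            }
            total += n;
        }
        // Assumption: the page is UTF-8 encoded.
        return new String(buffer, 0, total, StandardCharsets.UTF_8);
    }
}
```

With a network stream, for example one obtained from `URLConnection.getInputStream()`, calling `readPrefix(stream, 2000)` and then closing the stream avoids downloading the rest of the body. As noted above, any meta tag beyond the limit is simply missed.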

Another way is to read in chunks and scan for the string </head>; if it is not found, continue reading. This could take longer for pages with a large <head> section, though.
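A possible sketch of the chunked variant follows. The names `HeadReader` and `readHead`, the `maxBytes` safety cap, and the case-insensitive match are assumptions added for illustration, not part of the answer. The search restarts a few characters before the previous chunk boundary so that a `</head>` split across two chunks is still found.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: reads a stream chunk by chunk until </head> appears.
class HeadReader {
    private static final String END_OF_HEAD = "</head>";

    static String readHead(InputStream in, int chunkSize, int maxBytes) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        // Lower-cased ISO-8859-1 copy: one char per byte, so indexes are byte offsets.
        StringBuilder searchable = new StringBuilder();
        byte[] chunk = new byte[chunkSize];
        int n;
        // maxBytes is an approximate cap; the last chunk may overshoot it slightly.
        while (collected.size() < maxBytes && (n = in.read(chunk)) != -1) {
            int previousSize = collected.size();
            collected.write(chunk, 0, n);
            searchable.append(
                    new String(chunk, 0, n, StandardCharsets.ISO_8859_1).toLowerCase(Locale.ROOT));
            // Back up so a marker straddling the chunk boundary is not missed.
            int from = Math.max(0, previousSize - (END_OF_HEAD.length() - 1));
            int end = searchable.indexOf(END_OF_HEAD, from);
            if (end >= 0) {
                // Return everything up to and including </head>; assumes UTF-8.
                return new String(collected.toByteArray(), 0,
                        end + END_OF_HEAD.length(), StandardCharsets.UTF_8);
            }
        }
        // No </head> found: return whatever was read.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The returned text can then be searched for meta tags. The cap bounds the cost on pages that never close the head element, which is the slow case mentioned above.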

Raw HTML shouldn't be too long to process anyway.
