Java website metadata
Question
Using Java, what is the best way to extract metadata from a website?
I am planning to request the entire page and then find where the metadata is located in it. This seems cumbersome; is there a better way to achieve this?
Recommended answer
Cumbersome as it is, it's practically the only way, as far as I know.
What you can do is read only the first few bytes, say 2000. This might save some time, but it doesn't guarantee that all meta tags will be read.
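A minimal sketch of the "first few bytes" idea, using only the standard library. The class name PagePrefixReader, the method names, and the choice of UTF-8 for decoding are assumptions made for illustration; the page's real charset may differ, and cutting at a fixed byte count can split a tag or a multi-byte character.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class PagePrefixReader {

    /** Reads at most 'limit' bytes from the stream, stopping early at end of stream. */
    public static byte[] readPrefix(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int remaining = limit;
        while (remaining > 0) {
            // Never ask for more than what is still allowed by the limit.
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    /**
     * Opens an http(s) URL and returns its first 'limit' bytes as text.
     * Assumption: the page is UTF-8 encoded.
     */
    public static String fetchPrefix(String url, int limit) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(readPrefix(in, limit), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling PagePrefixReader.fetchPrefix(someUrl, 2000) would return roughly the first 2000 bytes of the page; whether that covers every meta tag depends on the page.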
Another way is to read in chunks and scan each one for the string </head>; if it is not found, continue reading. This could take longer for pages with a large <head> section, though.
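The chunked approach could be sketched as follows. The class name HeadReader, the chunkSize and maxChars parameters, and the UTF-8 decoding are assumptions for illustration. The sketch reads characters rather than raw bytes, matches </head> case-insensitively, and re-checks the tail of the previous chunk so that a marker split across two chunks is still found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class HeadReader {
    private static final String END_OF_HEAD = "</head>";

    /**
     * Reads the stream chunk by chunk and returns the text up to and including
     * the first "</head>". If the marker never appears, returns whatever was
     * read before end of stream or before roughly maxChars characters.
     * Assumption: the page is UTF-8 encoded; chunkSize must be positive.
     */
    public static String readHead(InputStream in, int chunkSize, int maxChars)
            throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder page = new StringBuilder();
        char[] chunk = new char[chunkSize];
        while (page.length() < maxChars) {
            int n = reader.read(chunk, 0, chunk.length);
            if (n == -1) {
                break; // end of stream without finding the marker
            }
            // Start the scan slightly before the new chunk, in case the
            // marker straddles the boundary between two chunks.
            int from = Math.max(0, page.length() - END_OF_HEAD.length() + 1);
            page.append(chunk, 0, n);
            int idx = indexOfMarker(page, from);
            if (idx >= 0) {
                return page.substring(0, idx + END_OF_HEAD.length());
            }
        }
        return page.toString();
    }

    /** Case-insensitive search for "</head>" starting at index 'from'. */
    private static int indexOfMarker(StringBuilder text, int from) {
        for (int i = from; i + END_OF_HEAD.length() <= text.length(); i++) {
            boolean match = true;
            for (int j = 0; j < END_OF_HEAD.length(); j++) {
                if (Character.toLowerCase(text.charAt(i + j)) != END_OF_HEAD.charAt(j)) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

The maxChars cap is there so that a page with no closing head tag does not get downloaded in full; the result can exceed the cap by up to one chunk.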
Raw HTML shouldn't be too long to process anyway.