How to parse Google CSE results located on a site in Java?
Question
I want to parse a Custom Search Element JavaScript function. Here's a template of this function: https://developers.google.com/custom-search/docs/element#overview
<!-- Put the following javascript before the closing tag. -->
<script>
(function() {
var cx = '123:456'; // Insert your own Custom Search engine ID here
var gcse = document.createElement('script'); gcse.type = 'text/javascript'; gcse.async = true;
gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(gcse, s);
})();
</script>
<!-- Place this tag where you want both of the search box and the search results to render -->
<gcse:search></gcse:search>
I want to parse this function from this site: http://findmusicbylyrics.com/search.php?cx=partner-pub-1936238606905173%3A1893984547&cof=FORID%3A10&ie=UTF-8&q=Love&sa=Search+Lyrics. Its JavaScript is:
<script>
(function() {
var cx = 'partner-pub-1936238606905173:8242090140';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:search></gcse:search>
Now I have no idea where to start. I've done some HTML parsing using Java Jsoup, but this is the first time I've run into this CSE <script> tag to parse.
Any suggestions would be much appreciated.
Recommended answer
"I've done some HTML parsing using Java Jsoup, but this is the first time I've run into this CSE tag to parse."
Fetch the page, then find the script element. Once you have it, call the html() method on that element to get the script's source.
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/**
 * Extract the Custom Search Element JavaScript of a site.
 *
 * @param url
 *            The site URL
 * @param cssQuery
 *            The query for finding the script element
 * @return the content between the {@code <script>} and {@code </script>} tags
 * @throws IOException
 *             If the CSE JavaScript is not found or an error occurred while
 *             fetching {@code url}.
 */
public static String getCustomSearchElementJavascript(String url, String cssQuery) throws IOException {
    // Download and parse the page
    Document doc = Jsoup.connect(url).get();
    // Take the first element matching the query
    Element script = doc.select(cssQuery).first();
    if (script == null) {
        throw new IOException("Unable to find Custom Search Element JavaScript.");
    }
    // For a script element, html() returns the raw script source
    return script.html();
}
Sample code
String url = "http://findmusicbylyrics.com/search.php?cx=partner-pub-1936238606905173%3A1893984547&cof=FORID%3A10&ie=UTF-8&q=Love&sa=Search+Lyrics+";
System.out.println( getCustomSearchElementJavascript(url, "div#content > script") );
Output
(function() {
var cx = 'partner-pub-1936238606905173:8242090140';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
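If the goal is to read a value out of the extracted script rather than just print it, one option is a regular expression over the returned string. The following is a minimal sketch, not part of the original answer: the class name CseScriptParser and the method extractCx are hypothetical, and it assumes the engine ID is assigned as var cx = '...'; as in the script shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original answer): pulls the search
// engine ID out of the CSE script text returned by script.html().
final class CseScriptParser {

    // Matches "var cx = '<id>'" with either single or double quotes;
    // group 1 is the quote character, group 2 is the ID itself.
    private static final Pattern CX_PATTERN =
            Pattern.compile("var\\s+cx\\s*=\\s*(['\"])(.*?)\\1");

    /** Returns the value assigned to cx, or empty if no such assignment exists. */
    static Optional<String> extractCx(String script) {
        Matcher m = CX_PATTERN.matcher(script);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The script text from the output above, inlined so the example
        // runs without a network connection.
        String script = String.join("\n",
                "(function() {",
                "var cx = 'partner-pub-1936238606905173:8242090140';",
                "var gcse = document.createElement('script');",
                "gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;",
                "})();");
        System.out.println(extractCx(script).orElse("cx not found"));
    }
}
```

A regular expression is enough here because the script is a short, fixed template; it would not cope with arbitrary JavaScript, where the ID might be built dynamically.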