How to get text from this HTML page with jsoup?
Question
I am using this code to retrieve the text of the main article on this page.
public class HtmlparserExampleActivity extends Activity {
    String outputtext;
    TagFindingVisitor visitor;
    Parser parser = null;
    private static final String TAG = "TVGuide";
    TextView outputTextView;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        outputTextView = (TextView)findViewById(R.id.outputTextView);
        String id = "main-article-content";
        Document doc = null;
        try {
            doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        Log.i("DOC", doc.toString().toString());
        Elements elementsHtml = doc.getElementsByTag(id);
        String[] temp1 = new String[99];
        int i =0;
        for(Element element: elementsHtml)
        {
            temp1[1] = element.text();
            i++;
            outputTextView.setText(temp1[1]);
        }
    }
}
The problem is that nothing shows up in the TextView. None of the text that I am trying to retrieve is displayed. The Log.i output does appear in the debug log, so I know it is connecting successfully. I just don't know why I'm not getting any text in the TextView.
Recommended answer
Here's a simplified extract of the relevant part of your question:
Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();
Elements elementsHtml = doc.getElementsByTag("main-article-content");
// ...
You're making a fundamental mistake here. There is no HTML tag like <main-article-content> in the document. However, there is a <div id="main-article-content">. According to the CSS selector overview about halfway through this Jsoup cookbook, you should be using the #id selector.
Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();
Element mainArticleContent = doc.select("#main-article-content").first();
// ...
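To make the difference concrete, here is a minimal sketch that runs both lookups against a small inline HTML string instead of the live page. The class name ArticleTextExample, the helper methods textById and countByTag, and the sample markup are hypothetical and only serve the illustration; the jsoup calls themselves (Jsoup.parse, getElementsByTag, select, first, text) are the library's own. Note that first() returns null when nothing matches, so the sketch checks for that before reading the text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleTextExample {

    // Returns the text of the element with the given id, or "" if no such element exists.
    public static String textById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element element = doc.select("#" + id).first(); // null when nothing matches
        return element == null ? "" : element.text();
    }

    // Counts elements whose *tag name* equals tagName (the lookup used in the question).
    public static int countByTag(String html, String tagName) {
        return Jsoup.parse(html).getElementsByTag(tagName).size();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the downloaded page.
        String html = "<html><body><div id=\"main-article-content\">"
                + "<p>Article body text.</p></div></body></html>";

        // Tag lookup finds nothing, because the div's tag name is "div".
        System.out.println(countByTag(html, "main-article-content"));

        // Id selector finds the div and returns its text.
        System.out.println(textById(html, "main-article-content"));
    }
}
```

In the Activity from the question, the string returned by a helper like textById would be what gets passed to outputTextView.setText(...).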