How to parse simple HTML code with jsoup? (Android)
Problem description
This is part of my HTML code:
<div class="entry themeform">
<h3>dr James – opiekun naukowy</h3>
<p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-31" alt="grynia" src="http://www.page.com/picture.jpg" width="200" height="300" /></a></p>
<h3>Kevin – prezes</h3>
<p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-35" alt="prezes" src="http://www.page.com/picture.jpg" width="217" height="300" /></a></p>
<h3>Lucy – wice prezes</h3>
<p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-36" alt="Lucy" src="http://www.page.com/picture.jpg" width="225" height="300" /></a></p>
<h3>Zarząd</h3>
<p><a href="http://www.page.com/picture.jpg"><img class="alignnone wp-image-37" alt="zarzad_KNSE" src="http://www.page.com/picture.jpg" width="489" height="256" /></a></p>
<div class="clear"></div>
</div><!--/.entry-->
First, I want to parse the text from the tags in this div. It would also be nice if you could help me parse the images in this div (I changed the picture URLs for privacy). I am new to jsoup, so I would be grateful if you could write some code for me, just for parsing the text into the Android activity.
EDIT: OK, to begin with, I am trying to parse the title as shown in your (SMR) tutorial.
Here is the code:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import com.example.uwbnewapptest.R;
import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.View;
import android.widget.TextView;
public class KnseActivity extends Activity {
    //TextView title;
    String url="http://www.google.com";
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        // TODO Auto-generated method stub
        super.onCreate(savedInstanceState);
        setContentView(R.layout.knse_main);
        //title = (TextView) findViewById(R.id.textView1);
    }
    public void bt(View v){
        new Title().execute();
    }
    private class Title extends AsyncTask<Void, Void, Void> {
        String title;
        @Override
        protected Void doInBackground(Void... params) {
            try {
                // Connect to the web site
                Document document = Jsoup.connect(url).get();
                // Get the html document title
                title = document.title();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
        @Override
        protected void onPostExecute(Void result) {
            // Set title into TextView
            TextView txttitle = (TextView) findViewById(R.id.textView1);
            txttitle.setText(title);
        }
    }
}
But when I run the app and click the button, I get an error:
EDIT 2:
06-21 16:18:01.808: E/AndroidRuntime(28063): FATAL EXCEPTION: AsyncTask #2
06-21 16:18:01.808: E/AndroidRuntime(28063): Process: com.example.uwbnewapptest, PID: 28063
06-21 16:18:01.808: E/AndroidRuntime(28063): java.lang.RuntimeException: An error occured while executing doInBackground()
06-21 16:18:01.808: E/AndroidRuntime(28063): at android.os.AsyncTask$3.done(AsyncTask.java:300)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:355)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.FutureTask.setException(FutureTask.java:222)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.FutureTask.run(FutureTask.java:242)
06-21 16:18:01.808: E/AndroidRuntime(28063): at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:231)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.lang.Thread.run(Thread.java:841)
06-21 16:18:01.808: E/AndroidRuntime(28063): Caused by: java.lang.NoClassDefFoundError: org.jsoup.Jsoup
06-21 16:18:01.808: E/AndroidRuntime(28063): at com.uwbapp.KnseActivity$Title.doInBackground(KnseActivity.java:43)
06-21 16:18:01.808: E/AndroidRuntime(28063): at com.uwbapp.KnseActivity$Title.doInBackground(KnseActivity.java:1)
06-21 16:18:01.808: E/AndroidRuntime(28063): at android.os.AsyncTask$2.call(AsyncTask.java:288)
06-21 16:18:01.808: E/AndroidRuntime(28063): at java.util.concurrent.FutureTask.run(FutureTask.java:237)
06-21 16:18:01.808: E/AndroidRuntime(28063): ... 4 more
There are various ways to extract data using jsoup. See http://jsoup.org/cookbook/extracting-data/selector-syntax.
In your case, to get the text and the image sources, you could do something like this:
Document doc = Jsoup.connect(url).get();
for (Element div : doc.select("div")) {
    System.out.println(div.text());
    for (Element img : div.select("img")) {
        System.out.println(img.attr("src"));
    }
}
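As a narrower variant of the loop above, the following sketch selects only the h3 headings and the img elements inside the div with class "entry" from the question. It parses an inline string with Jsoup.parse so that it runs without a network connection. The class name EntrySelectors, the constant SAMPLE_HTML, and the two helper methods are hypothetical names chosen for illustration, the markup is shortened from the question, and the jsoup jar is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration; not part of jsoup or of the question's app.
public class EntrySelectors {

    // Shortened copy of the markup from the question
    // (assumption: the real page has the same structure).
    static final String SAMPLE_HTML =
        "<div class=\"entry themeform\">"
        + "<h3>Kevin \u2013 prezes</h3>"
        + "<p><a href=\"http://www.page.com/picture.jpg\">"
        + "<img alt=\"prezes\" src=\"http://www.page.com/picture.jpg\""
        + " width=\"217\" height=\"300\" /></a></p>"
        + "<h3>Lucy \u2013 wice prezes</h3>"
        + "<p><a href=\"http://www.page.com/picture.jpg\">"
        + "<img alt=\"Lucy\" src=\"http://www.page.com/picture.jpg\""
        + " width=\"225\" height=\"300\" /></a></p>"
        + "<div class=\"clear\"></div>"
        + "</div>";

    // Returns the text of every h3 inside a div that has the class "entry".
    static List<String> headings(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<String>();
        for (Element h3 : doc.select("div.entry h3")) {
            result.add(h3.text());
        }
        return result;
    }

    // Returns the src attribute of every img inside that same div.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<String>();
        for (Element img : doc.select("div.entry img[src]")) {
            result.add(img.attr("src"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String heading : headings(SAMPLE_HTML)) {
            System.out.println(heading);
        }
        for (String src : imageSources(SAMPLE_HTML)) {
            System.out.println(src);
        }
    }
}
```

In an Android activity like the one in the question, the same select calls could presumably be run on the Document fetched in doInBackground, with the collected strings shown in the TextView from onPostExecute.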