How to parse simple HTML code with jsoup on Android?


Problem description


This is part of my HTML code:

<div class="entry themeform">
    <h3>dr James &#8211; opiekun naukowy</h3>
    <p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-31" alt="grynia" src="http://www.page.com/picture.jpg" width="200" height="300" /></a></p>
    <h3>Kevin &#8211; prezes</h3>
    <p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-35" alt="prezes" src="http://www.page.com/picture.jpg" width="217" height="300" /></a></p>
    <h3>Lucy &#8211; wice prezes</h3>
    <p><a href="http://www.page.com/picture.jpg"><img class="alignnone size-medium wp-image-36" alt="Lucy" src="http://www.page.com/picture.jpg" width="225" height="300" /></a></p>
    <h3>Zarząd</h3>
    <p><a href="http://www.page.com/picture.jpg"><img class="alignnone  wp-image-37" alt="zarzad_KNSE" src="http://www.page.com/picture.jpg" width="489" height="256" /></a></p>
    <div class="clear"></div>
</div><!--/.entry-->

First, I want to extract the text from the tags inside this div. It would also be nice if you could help me parse the images in this div (I changed the picture URLs for privacy reasons). I am new to jsoup, so I would be grateful if you could write me some code that just parses the text into an Android activity.

EDIT: OK, to begin with, I am trying to parse the page title as shown in your (SMR's) tutorial.

Here is the code:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.example.uwbnewapptest.R;

import android.app.Activity;
import android.os.AsyncTask;
import android.os.Bundle;
import android.view.View;
import android.widget.TextView;

public class KnseActivity extends Activity {

    //TextView title;
    String url = "http://www.google.com";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        // TODO Auto-generated method stub
        super.onCreate(savedInstanceState);
        setContentView(R.layout.knse_main);
        //title = (TextView) findViewById(R.id.textView1);
    }

    // Button click handler: starts the background download
    public void bt(View v) {
        new Title().execute();
    }

    private class Title extends AsyncTask<Void, Void, Void> {
        String title;

        @Override
        protected Void doInBackground(Void... params) {
            try {
                // Connect to the web site
                Document document = Jsoup.connect(url).get();
                // Get the html document title
                title = document.title();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void result) {
            // Set title into TextView
            TextView txttitle = (TextView) findViewById(R.id.textView1);
            txttitle.setText(title);
        }
    }
}

But when I run the app and click the button, I get an error:

EDIT 2:

06-21 16:18:01.808: E/AndroidRuntime(28063): FATAL EXCEPTION: AsyncTask #2
06-21 16:18:01.808: E/AndroidRuntime(28063): Process: com.example.uwbnewapptest, PID: 28063
06-21 16:18:01.808: E/AndroidRuntime(28063): java.lang.RuntimeException: An error occured while executing doInBackground()
06-21 16:18:01.808: E/AndroidRuntime(28063):    at android.os.AsyncTask$3.done(AsyncTask.java:300)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:355)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.FutureTask.setException(FutureTask.java:222)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.FutureTask.run(FutureTask.java:242)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:231)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.lang.Thread.run(Thread.java:841)
06-21 16:18:01.808: E/AndroidRuntime(28063): Caused by: java.lang.NoClassDefFoundError: org.jsoup.Jsoup
06-21 16:18:01.808: E/AndroidRuntime(28063):    at com.uwbapp.KnseActivity$Title.doInBackground(KnseActivity.java:43)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at com.uwbapp.KnseActivity$Title.doInBackground(KnseActivity.java:1)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at android.os.AsyncTask$2.call(AsyncTask.java:288)
06-21 16:18:01.808: E/AndroidRuntime(28063):    at java.util.concurrent.FutureTask.run(FutureTask.java:237)
06-21 16:18:01.808: E/AndroidRuntime(28063):    ... 4 more

Solution

There are various ways to extract data using jsoup. Check http://jsoup.org/cookbook/extracting-data/selector-syntax.

In your case, to get the text and the image sources, you could do something like this:

Document doc = Jsoup.connect(url).get();
for(Element div : doc.select("div")){
    System.out.println(div.text());
    for(Element img : div.select("img")){
        System.out.println(img.attr("src"));
    }
}
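The loop above visits every div on the page. As a minimal sketch of a narrower query, the example below assumes you only want the `h3` headings and `img` sources inside the `entry themeform` div from the question. It parses a trimmed copy of that markup (first two entries only) with `Jsoup.parse` instead of fetching a URL, so it runs without a network connection. The class name `EntryParser` and the methods `headings` and `imageSources` are hypothetical names chosen for illustration, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EntryParser {

    // Trimmed copy of the markup from the question (assumption: first two entries only).
    static final String HTML =
        "<div class=\"entry themeform\">"
        + "<h3>dr James &#8211; opiekun naukowy</h3>"
        + "<p><a href=\"http://www.page.com/picture.jpg\">"
        + "<img alt=\"grynia\" src=\"http://www.page.com/picture.jpg\" width=\"200\" height=\"300\" />"
        + "</a></p>"
        + "<h3>Kevin &#8211; prezes</h3>"
        + "<p><a href=\"http://www.page.com/picture.jpg\">"
        + "<img alt=\"prezes\" src=\"http://www.page.com/picture.jpg\" width=\"217\" height=\"300\" />"
        + "</a></p>"
        + "<div class=\"clear\"></div>"
        + "</div>";

    // Collects the text of every h3 inside the div that has both classes.
    static List<String> headings(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element h3 : doc.select("div.entry.themeform h3")) {
            result.add(h3.text());
        }
        return result;
    }

    // Collects the src attribute of every img inside the same div.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element img : doc.select("div.entry.themeform img")) {
            result.add(img.attr("src"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String heading : headings(HTML)) {
            System.out.println(heading);
        }
        for (String src : imageSources(HTML)) {
            System.out.println(src);
        }
    }
}
```

In the Android activity, the same selectors could be applied to the `Document` returned by `Jsoup.connect(url).get()` inside `doInBackground`, with the collected strings handed to the `TextView` in `onPostExecute`, since the network call has to stay off the main thread.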
