Scraping Data in URL format via Android / jsoup


Question

I'm attempting to scrape a URL from an HTML table. However, every time I do so I get the link's title text instead of the URL in its href attribute. Does anyone know how this can be resolved or avoided?

<table class="datagrid">
        <tr>
            <th>Number</th>
            <th>Name</th>
            <th>Sex</th>
            <th>Location</th>
        </tr>

            <tr>
                <td><a href="redirector.cfm?ID=93bd5121-7a3b-4a56-a576-f432e542047a&page=1&&amp;lname=&amp;fname=" title="501207593">501207593&nbsp;</a></td>
                <td>AARON, JUSTIN COLBY&nbsp;</td>
                <td>M&nbsp;</td>
                <td>Facility 1</td>
            </tr>

            <tr>
                <td><a href="redirector.cfm?ID=c5629a92-7113-487c-ba9b-1e62203ab08d&page=1&&amp;lname=&amp;fname=" title="501302750">501302750&nbsp;</a></td>
                <td>AARONSON, CARY HOWARD&nbsp;</td>
                <td>M&nbsp;</td>
                <td>Facility 2</td>
            </tr>

            <tr>
                <td><a href="redirector.cfm?ID=66d01768-5686-44eb-ac6a-16eb783f52d0&page=1&&amp;lname=&amp;fname=" title="501306284">501306284&nbsp;</a></td>
                <td>ABBOTT, LAUREA &nbsp;</td>
                <td>F&nbsp;</td>
                <td>Facility 3</td>
            </tr>

Source:

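import android.app.Activity;
import android.app.ProgressDialog;
import android.os.AsyncTask;
import android.os.Bundle;
import android.widget.TextView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class MainActivity extends Activity {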

    TextView tv;
    String url = "http://google.com";
    String tr;
    Document doc;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tv = (TextView) findViewById(R.id.TextView01);
        new MyTask().execute(url);
    }

    private class MyTask extends AsyncTask<String, Void, String> {

        ProgressDialog prog;

        String title = "";

        @Override
        protected void onPreExecute() {
            prog = new ProgressDialog(MainActivity.this);
            prog.setMessage("Loading....");
            prog.show();
        }

        @Override
        protected String doInBackground(String... params) {
            try {
                doc = Jsoup.connect(params[0]).get();
                Element tableElement = doc.select(".datagrid").first();

                Elements tableRows = tableElement.select("tr");
                for (Element row : tableRows) {
                    Elements cells = row.select("td");
                    if (cells.size() > 0) {
                        title = cells.get(0).text() + "; "
                                + cells.get(1).text() + "; "
                                + cells.get(2).text() + "; "
                                + cells.get(3).text();
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return title;
        }

        @Override
        protected void onPostExecute(String title) {
            super.onPostExecute(title);
            prog.dismiss();
            tv.setText(title);
        }
    }
}

Current result:

501306284; ABBOTT, LAUREA; F ; Facility 3

Desired result:

redirector.cfm?ID=66d01768-5686-44eb-ac6a-16eb783f52d0&page=1&&lname=&fname=" title="501306284; ABBOTT, LAUREA; F ; Facility 3

Or better yet...

Desired result:

Click HERE for more info (<-URL); ABBOTT, LAUREA; F ; Facility 3

Recommended answer

You appear to be getting only the text of the cell:

cells.get(0).text()

I think this is what you are trying to do:

cells.get(0).child(0).attr("href")
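
To illustrate the suggestion above, here is a minimal sketch that parses a table like the one in the question from a string and reads the `href` attribute of the link in the first cell. The class name `HrefScraper`, the method `extractRows`, and the shortened sample markup are assumptions made for this example, not part of the original code. It also collects one line per row in a list rather than keeping only the last row.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class HrefScraper {

    // Returns one "href; name; sex; location" line per data row of the .datagrid table.
    static List<String> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element row : doc.select("table.datagrid tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 4) {
                continue; // the header row uses <th>, so it has no <td> cells
            }
            // Look up the anchor inside the first cell and read its href attribute,
            // instead of calling text(), which only returns the visible link text.
            Element link = cells.get(0).select("a[href]").first();
            String href = (link != null) ? link.attr("href") : "";
            lines.add(href + "; " + cells.get(1).text() + "; "
                    + cells.get(2).text() + "; " + cells.get(3).text());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shortened sample markup (assumed), modelled on the table in the question.
        String html = "<table class=\"datagrid\">"
                + "<tr><th>Number</th><th>Name</th><th>Sex</th><th>Location</th></tr>"
                + "<tr><td><a href=\"redirector.cfm?ID=abc&amp;page=1\" title=\"501306284\">501306284</a></td>"
                + "<td>ABBOTT, LAUREA</td><td>F</td><td>Facility 3</td></tr>"
                + "</table>";
        for (String line : extractRows(html)) {
            System.out.println(line);
        }
    }
}
```

Using `select("a[href]").first()` rather than `child(0)` avoids an exception when a cell has no link. If an absolute URL is needed, `link.attr("abs:href")` (or `link.absUrl("href")`) resolves the relative value against the document's base URI, which jsoup sets when the page is fetched with `Jsoup.connect(...).get()`.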

See the jsoup documentation for details.
