Using Jsoup to get data from html source code

Question

I need to access a URL and pull some information from it. I am using Android Studio. My code does not throw any errors, but it displays no information. I believe the problem is that I am passing the wrong argument to my .select statement. Please keep in mind that I am very new to Java/Android development. Here is my code:

private class FetchAnton extends AsyncTask<Void, Void, Void> {

    String price;
    String url = "http://www.antoncoop.com/markets/cash.php";


    @Override
    protected Void doInBackground(Void... params) {
        try {

            Document document = Jsoup.connect(url).get();                     
            price = String.valueOf(document.select("quotes['KEH15']"));

        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {

        TextView priceTextView = (TextView) findViewById(R.id.priceTextView);
        priceTextView.setText(price);

    }

}

And here is the HTML section that the "quotes['KEH15']" refers to (scroll to the right):

</thead>
            <tbody>
                    <script language="javascript">

                        writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All','&nbsp;','&nbsp;',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);
                        writeBidRow('Wheat',-65,false,false,false,0.5,'07/01/2015','07/31/2015','All','&nbsp;','&nbsp;',60,'odd','c=2246&l=3519&d=N15',quotes['KEN15'], 0-0);
                </script>

I need to get the value that the "quotes['KEH15']" slot of the HTML represents into the string called price. When I run the program, my text view changes from the default string to a blank. So I think the code is working, but the text view is being updated with an empty string. Can anyone please help me fix this problem?

Thanks for your help.

Keith

Recommended answer

As @njzk2 mentioned, you need a JavaScript engine to do that. Let me elaborate (since you are a beginner, I'm going to keep it painfully detailed here). Jsoup is just a parser. What this means is:


  • It will make an HTTP call to the URL you provided and retrieve an HTTP response. This response, among other things (headers etc.; read more on HTTP if you want details), will include the HTML you are after.
  • It will generate a structured representation of that HTML by creating appropriate Java objects that give you all those nice features you read about in the tutorial (CSS selectors and such).

As mentioned earlier, Jsoup is just a parser. It retrieves information, nothing more, which means it can't execute code to produce new pieces of HTML.

Here is an experiment. Visit a URL (Facebook, Gmail, Stack Overflow, whatever works for you, as long as you are certain it has a lot of JavaScript behind it). While on that page, press Ctrl+U in Chrome. It will open a new tab. This tab shows you exactly what HTML was received from the server, before any JavaScript was executed and produced new HTML (like the notifications you get on Facebook when you have a message). Now go back to the page and press F12 instead. It will open the developer tools. Here you are going to see something different: this is the actual HTML rendered by the browser.

When you use Jsoup, what your program has available is the first HTML, the one before any JavaScript is executed. That's because Jsoup can't execute JavaScript; it is just a parser, not a browser. A browser can render the additional content because it has a JavaScript engine and can therefore execute JavaScript code.
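To make this concrete for the page in the question, here is a minimal sketch that uses only `java.util.regex` on a string copy of the script fragment quoted above. The class name `RawScriptInspector` and the method `referencedSymbols` are made up for illustration. In the real app the script text would come from Jsoup instead, for example by selecting the `script` elements and reading their contents with `Element.data()`. The point is that the raw source only contains the expression `quotes['KEH15']`, not the price it evaluates to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or of the original code.
class RawScriptInspector {

    // Matches a quotes['SYMBOL'] expression and captures SYMBOL.
    private static final Pattern QUOTE_REF =
            Pattern.compile("quotes\\['([A-Z0-9]+)'\\]");

    /** Returns every symbol referenced as quotes['...'] in the script text. */
    static List<String> referencedSymbols(String scriptText) {
        List<String> symbols = new ArrayList<>();
        Matcher m = QUOTE_REF.matcher(scriptText);
        while (m.find()) {
            symbols.add(m.group(1));
        }
        return symbols;
    }

    public static void main(String[] args) {
        // The two writeBidRow calls copied from the question's HTML.
        String script =
            "writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015',"
          + "'All','&nbsp;','&nbsp;',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);\n"
          + "writeBidRow('Wheat',-65,false,false,false,0.5,'07/01/2015','07/31/2015',"
          + "'All','&nbsp;','&nbsp;',60,'odd','c=2246&l=3519&d=N15',quotes['KEN15'], 0-0);";

        // Only the symbol names can be recovered; no price appears in this text.
        System.out.println(referencedSymbols(script)); // prints [KEH15, KEN15]
    }
}
```

The symbol names are recoverable, but the price is whatever the `quotes` object holds once the page's JavaScript has run, so it cannot be read from this fragment with a selector.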

That leaves you with two options.


  1. If the JavaScript you want to execute is something simple that doesn't do any "complex" DOM manipulation and just generates a string or the like, then I suppose you could use the ScriptEngine found in Java 7, which can handle the execution of JavaScript. Mind you, it's JavaScript, not jQuery. ScriptEngine is not a browser. Check a tutorial to see in greater detail what you can accomplish.
  2. If ScriptEngine is not enough, you are left with a headless browser (a browser without a GUI). A headless browser is a browser for automated tasks. Check Selenium WebDriver. Headless browsers are used heavily for testing web applications, sites etc. I don't know if you can use one in your Android application, though. Selenium is quite large (which is perfectly normal, since it offers an awful lot) and has some dependencies that, I believe, do not play well with Android (same classes, different implementations etc.). Anyway, I haven't done it, so I'm not 100% certain about this. You have to check it out yourself. Alternatively, you could build a web application that does all the parsing and exposes a web service for your app to use.
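As a rough sketch of the first option, the snippet below evaluates a JavaScript expression through `javax.script`. Everything about the data is an assumption: the `quotes` object and its `'6.25'` value are invented placeholders, since the script that really defines `quotes` is not shown in the question, and the `QuoteEvaluator` class is hypothetical. Also note that, as far as I know, `javax.script` is not part of the Android SDK and the bundled Nashorn engine was removed in JDK 15, so `getEngineByName("JavaScript")` can return `null`; the code handles that case instead of assuming an engine exists.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper for illustration only.
class QuoteEvaluator {

    /**
     * Runs setupScript, then evaluates expression and returns its value as text.
     * Returns an empty Optional when no JavaScript engine is installed.
     */
    static Optional<String> evaluate(String setupScript, String expression) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return Optional.empty(); // e.g. JDK 15+ without an added engine
        }
        try {
            engine.eval(setupScript);
            Object value = engine.eval(expression);
            return Optional.ofNullable(value).map(String::valueOf);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder data: the real page defines quotes in a script not shown here.
        String setup = "var quotes = { KEH15: '6.25' };";
        System.out.println(
            evaluate(setup, "quotes['KEH15']").orElse("no JavaScript engine available"));
    }
}
```

In practice the setup script would be the text of whichever script on the page populates `quotes`, and that only works if the script runs without a browser DOM.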
