Parsing Problems while Data Scraping

Problem Description

I think I'm having a problem with my parsing. I'm using jsoup and Java in Android Studio. I'm trying to scrape data from a local HTML file and display it in my app, but when I run the app, the data I want doesn't appear. I want to display the times, like "9:00" and "9:15", and also "Sound", "P2016" and "P.Mann". The HTML looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns='http://www.w3.org/1999/xhtml'>
    <head><title>timetable.html</title><meta http-equiv='content-  disposition', content='attachment;filename=timetable.html'>
    <meta http-equiv="Content-Type" content="application/octet-stream" />
    <style>body {background-color:white;} body,td { font-family: arial; }</style></head>
    <data>
    <body>
    <table cellspacing='0' border='0' width='100%' >
    <col align='left' /><col align='center' /><col align='right' />
    </data>
    <tr>
    <td></td><td></td><td></td>
    </tr>
    </table>
    </td>
    </tr><tr>
    <td>
    <table cellspacing='0' border='0' width='100%' >
    <col align='left' /><col align='center' /><col align='right' />
    <tr>
    <td></td><td></td><td></td>
    </tr>
    </table>
    </td>
    </tr><tr>
    <td>
    <table cellspacing='0' border='0' width='100%' >
    <col align='left' /><col align='center' /><col align='right' />
    <tr>
    <td></td><td></td><td></td>
    </tr>
    </table>
    </td>
    </tr><tr>
    <td>
    <table cellspacing='0' border='0' width='100%' >
    <col align='left' /><col align='center' /><col align='right' />
    <tr>
    <td><table border='0' width='100%'><tr>
    <td width='40%' align='left' valign='middle'><font face='arial' size='3'><b>The Year<font size='1'> </td><td width='20%' align='center' valign='middle'><font face='arial' size='1'>ICOM</td><td width='40%' align='right' valign='middle'><font face='arial' size='2'><b>Weeks selected for output: 26 (22 Feb 2016-28 Feb 2016)</td></td><td></td><td></td>
    </tr>
    </table>
    </td>
    </tr>
    </table>
    <table  cellspacing='0'  border='1'>
    <tr>
    <td></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:45</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:45</font></td>
    </tr>
    <tr >
    <td style="border-bottom:3px solid #000000;" rowspan='1'   bgcolor='#C0C0C0'>
        <font color='#FFFFFF'>Mon</font></td>
    <td style="border-bottom:3px solid #000000;"  colspan='12' rowspan='1' >
        <table  cellspacing='0' border='0' width='100%'>
        <col align='left' />
        <tr>
        <td align='left'><font color='#FF0000'>Sound</font></td>
        </tr>
        </table>
        <table  cellspacing='0' border='0' width='100%'>
        <col align='left' />
        <col align='right' />
        <tr>
        <td align='left'><font color='#000000'>P2016</font></td>
        <td align='right'><font color='#008000'>P.Man</font></td>
        </tr>
        </table>
        <table  cellspacing='0' border='0' width='100%'>
        <col align='left' />
        <tr>
        <td align='left'><font color='#000080'>22-29, 32-36</font></td>
        </tr>
        </table>
    </td>
    <td style="border-bottom:3px solid #000000;" >&nbsp;</td>
    <td style="border-bottom:3px solid #000000;" >&nbsp;</td>
    <td style="border-bottom:3px solid #000000;" >&nbsp;</td>
    <td style="border-bottom:3px solid #000000;" >&nbsp;</td>
    <td style="border-bottom:3px solid #000000;"  colspan='4' rowspan='1' >

The Java looks like this:

    import android.app.Activity;
    import android.os.Bundle;
    import java.io.File;
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import android.widget.TextView;

    public class MainActivity extends Activity {

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
            try {
                File input = new File("C:\\Users\\user\\Desktop\\Mobile Newest\\JSoup\\app\\src\\main\\assets\\filename.html");
                Document doc = Jsoup.parse(input, "UTF-8");
                Elements tableElements = doc.select("td");
                TextView textView = (TextView)findViewById(R.id.text_view);
                for (Element td : tableElements) {
                    textView.setText(td.text());
                    System.out.println(td.text());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Does anyone have any idea why it doesn't come up on the screen?

Recommended Answer

The problem is that every time you call textView.setText(td.text()), the TextView's existing text is replaced by the current td.text() instead of being added to. So in the end you only see the text of the LAST td element in your HTML file, which is essentially a blank space.
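The overwrite effect can be reproduced without Android or jsoup. The sketch below is plain Java under stated assumptions: `CellText`, `overwrite` and `accumulate` are hypothetical names, and the input strings stand in for the values `td.text()` would return. `overwrite` mimics calling `setText` inside the loop, so only the last cell survives; `accumulate` collects every non-blank cell into one string that could be passed to `setText` once, after the loop. It also treats a cell holding only a non-breaking space (what an `&nbsp;` cell is assumed to yield) as blank, because `String.trim()` does not remove U+00A0.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch: no Android or jsoup types, so it runs on a desktop JVM.
// CellText and its methods are hypothetical names used for illustration only.
final class CellText {

    // Mirrors the loop in the question: each assignment replaces the previous
    // value (like textView.setText(td.text())), so only the last cell survives.
    static String overwrite(List<String> cellTexts) {
        String shown = "";
        for (String text : cellTexts) {
            shown = text;
        }
        return shown;
    }

    // Collects every non-blank cell, one per line, so a single setText(...)
    // call after the loop can display all of them.
    static String accumulate(List<String> cellTexts) {
        StringBuilder sb = new StringBuilder();
        for (String text : cellTexts) {
            // Assumption: an &nbsp; cell arrives as a non-breaking space, which
            // trim() keeps, so turn it into a normal space before the blank check.
            String cleaned = text.replace('\u00A0', ' ').trim();
            if (cleaned.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(cleaned);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on cells from the question's HTML excerpt.
        List<String> cells = Arrays.asList("", "9:00", "9:15", "Sound", "P2016", "P.Man", "\u00A0");
        System.out.println("overwrite:  [" + overwrite(cells) + "]");
        System.out.println("accumulate:\n" + accumulate(cells));
    }
}
```

In the question's activity, the same idea would mean building the string inside the `for (Element td : tableElements)` loop and calling `textView.setText(...)` once afterwards; `TextView.append(...)` is another option. Two other details in the question's code may also be worth checking. `doc.select("td")` also matches the outer cells that wrap nested tables, and `Element.text()` includes all descendant text, so some values can appear more than once. And a path like `C:\Users\...` on the development machine is generally not readable from an Android device, so a file kept in `src/main/assets` would more typically be opened with `getAssets().open("filename.html")` and passed to `Jsoup.parse(InputStream, String, String)`.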