Parsing Problems while Data Scraping
Problem description
I think I'm having a problem with my parsing. I'm using JSoup and Java in Android Studio. I'm trying to scrape data from a local HTML file and display it in my app, but when I run the app, the data I want doesn't appear. I want to display times like "9:00" and "9:15", and also "Sound", "P2016" and "P.Mann". The HTML looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns='http://www.w3.org/1999/xhtml'>
<head><title>timetable.html</title><meta http-equiv='content- disposition', content='attachment;filename=timetable.html'>
<meta http-equiv="Content-Type" content="application/octet-stream" />
<style>body {background-color:white;} body,td { font-family: arial; } </style></head>
<data>
<body>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
</data>
<tr>
<td></td><td></td><td></td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
<tr>
<td></td><td></td><td></td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
<tr>
<td></td><td></td><td></td>
</tr>
</table>
</td>
</tr><tr>
<td>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
<tr>
<td><table border='0' width='100%'> <tr>
<td width='40%' align='left' valign='middle'><font face='arial' size='3'><b>The Year<font size='1'> </td><td width='20%' align='center' valign='middle'><font face='arial' size='1'>ICOM</td><td width='40%' align='right' valign='middle'><font face='arial' size='2'><b>Weeks selected for output: 26 (22 Feb 2016-28 Feb 2016)</td></td><td></td><td></td>
</tr>
</table>
</td>
</tr>
</table>
<table cellspacing='0' border='1'>
<tr>
<td></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>10:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>11:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>12:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>13:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>14:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>15:45</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:00</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:15</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:30</font></td>
<td bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>16:45</font></td>
</tr>
<tr >
<td style="border-bottom:3px solid #000000;" rowspan='1' bgcolor='#C0C0C0'>
<font color='#FFFFFF'>Mon</font></td>
<td style="border-bottom:3px solid #000000;" colspan='12' rowspan='1' >
<table cellspacing='0' border='0' width='100%'>
<col align='left' />
<tr>
<td align='left'><font color='#FF0000'>Sound</font></td>
</tr>
</table>
<table cellspacing='0' border='0' width='100%'>
<col align='left' />
<col align='right' />
<tr>
<td align='left'><font color='#000000'>P2016</font></td>
<td align='right'><font color='#008000'>P.Man</font></td>
</tr>
</table>
<table cellspacing='0' border='0' width='100%'>
<col align='left' />
<tr>
<td align='left'><font color='#000080'>22-29, 32-36</font></td>
</tr>
</table>
</td>
<td style="border-bottom:3px solid #000000;" > </td>
<td style="border-bottom:3px solid #000000;" > </td>
<td style="border-bottom:3px solid #000000;" > </td>
<td style="border-bottom:3px solid #000000;" > </td>
<td style="border-bottom:3px solid #000000;" colspan='4' rowspan='1' >
The Java code looks like this:
import android.app.Activity;
import android.os.Bundle;
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import android.widget.TextView;

public class MainActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        try {
            File input = new File("C:\\Users\\user\\Desktop\\Mobile Newest\\JSoup\\app\\src\\main\\assets\\filename.html");
            Document doc = Jsoup.parse(input, "UTF-8");
            Elements tableElements = doc.select("td");
            TextView textView = (TextView)findViewById(R.id.text_view);
            for (Element td : tableElements) {
                textView.setText(td.text());
                System.out.println(td.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Does anyone have any idea why it doesn't show up on the screen?
Recommended answer
The problem is that every time you call textView.setText(td.text()), the TextView's content is replaced by the current td.text(), so in the end you will only see the text of the LAST td element in your HTML file, which is essentially a blank space.