Importing tables in Mathematica from web - empty cell problem

Question

I use:
data = Import["http://weburl/", "Data"]
to import data from a site whose page contains tables. This produces nested lists, so you can easily display the data in table form. For example:
Grid[data[[1]]]
gives something like this:

Player Age Shots Goals
  P1    24    10    2 
  P2    22     5    0
  P3    28    11    1
  ...

Now, here is the problem. If a cell in the HTML table is empty, for example an "Age" entry, the HTML looks like this: <td></td>. Mathematica doesn't include that cell in the list at all, not even as, say, a Null value. Instead, the row is represented by a list of length 3 and the remaining data shifts left by one column: "Shots" ends up under "Age", "Goals" ends up under "Shots", and "Goals" is left empty. For example, a player "P4" whose age is unknown (an empty cell in the HTML table), who took 10 shots and scored 0 goals, would be imported as a list of length 3 instead of 4, shifted by one column:

Player Age Shots Goals
  P1    24    10    2 
  P2    22     5    0
  P3    28    11    1
  P4    10     0  
  ...

This poses a difficult problem: if you have a few empty fields, you can't tell from the list which column each value belongs to. Is there a way to put a "Null" in empty HTML table cells when importing in Mathematica? For example, the P4 element in the list would look like this:
data[[1,5]]
{"P4","Null",10,0}
instead of:
{"P4",10,0}

Answer

As lumeng points out, you can use FullData to get the HTML table element to fill out properly. Here's a simpler illustration of this.

in = ImportString["\<<html><table>
   <tr>
   <td>(1,1)</td>
   <td>(1,2)</td>
   <td>(1,3)</td>
   </tr>
   <tr>
   <td>(2,1)</td>
   <td></td>
   <td>(2,3)</td>
   </tr>
   </table></html>\>",
   {"HTML", "FullData"}];
Grid[in[[1, 1]]]

If you want more complete control of the output, I'd suggest that you Import the page as XML. Here's an example.

in = ImportString["\<<html><table>
    <tr>
    <td>(1,1)</td>
    <td>(1,2)</td>
    <td>(1,3)</td>
    </tr>
    <tr>
    <td>(2,1)</td>
    <td></td>
    <td>(2,3)</td>
    </tr>
    </table></html>\>", "XML"];
Column[Last /@ Cases[in,
   XMLElement["td", ___], Infinity]]

You'll need to read up a bit on XML in general and Mathematica's version, namely the XMLObject. It's a delight to work with, once you get the hang of it, though.
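If the goal is exactly what the question asks for (a Null in place of each empty cell), the XML route can produce it directly. Here is a minimal sketch that is not part of the original answer. It walks every <tr>, maps each <td> to its text, and uses Null when the <td> has no children. It assumes the cells hold plain text only (no nested tags such as <b> or <a>), and the helper name cellValue is hypothetical, made up for illustration.

```mathematica
(* Sketch: rebuild table rows from the XML tree, with Null for empty cells.
   Assumption: every <td> contains only plain text, no nested tags. *)
in = ImportString[
   "<html><table>
    <tr><td>(1,1)</td><td>(1,2)</td><td>(1,3)</td></tr>
    <tr><td>(2,1)</td><td></td><td>(2,3)</td></tr>
    </table></html>", "XML"];

(* cellValue is a hypothetical helper: an empty <td></td> has no children *)
cellValue[XMLElement["td", _, {}]] := Null;
(* otherwise join the text children found directly inside the <td> *)
cellValue[XMLElement["td", _, content_List]] := StringJoin[Cases[content, _String]];

(* one list per <tr>, one entry per <td>, so column positions are preserved *)
rows = Cases[in,
   XMLElement["tr", _, cells_] :> cellValue /@ Cases[cells, XMLElement["td", ___]],
   Infinity];
Grid[rows]
```

Because every row keeps one entry per <td>, the values stay aligned with the header row. For a real page, the same Cases pattern would be applied to the XML that you get by importing the page itself rather than to an ImportString literal.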
