How to drop factor levels while scraping data off US Census HTML site


Problem description

Thank you in advance for your help. On the US Census website (below), I am looking for an element in the 6th row, 3rd column of the 4th table.

Here's the code I am writing:

library(XML)  # provides readHTMLTable()
complete_URL <- "http://quickfacts.census.gov/qfd/states/01/01011.html"
temp_TBL <- readHTMLTable(complete_URL, which = 4)
business_number_vector <- temp_TBL[6, 3]
print(business_number_vector)

What I get is:

[1] 417
Levels: 417

What I'd like is:

[1] 417

Thank you again so much for your help!

Solution

It's actually R-FAQ 7.10:

You should be able to see the FAQ with your R-help() system. On my machine it is set up as html:

http://127.0.0.1:23603/doc/manual/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

7.10 How do I convert factors to numeric?

It may happen that when reading numeric data into R (usually, when reading in a file), they come in as factors. If f is such a factor object, you can use

as.numeric(as.character(f))

to get the numbers back. More efficient, but harder to remember, is

as.numeric(levels(f))[as.integer(f)]

In any case, do not call as.numeric() or their likes directly for the task at hand (as as.numeric() or unclass() give the internal codes).
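
To make the FAQ's warning concrete, here is a minimal sketch in base R. The factor `f` and its values are made up for illustration (a hypothetical stand-in for a scraped column), so it runs without the census page or the XML package:

```r
# Hypothetical stand-in for a scraped column: numbers stored as a factor.
f <- factor(c("417", "1200", "35"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers printed in the labels.
codes <- as.numeric(f)

# FAQ 7.10, option 1: convert to character first, then to numeric.
via_character <- as.numeric(as.character(f))

# FAQ 7.10, option 2: convert the levels once, then index by the codes.
via_levels <- as.numeric(levels(f))[as.integer(f)]

print(codes)
print(via_character)
print(via_levels)
```

Applied to the question's code, the same idea would presumably be `as.numeric(as.character(temp_TBL[6, 3]))`, which returns a plain numeric value with no `Levels:` line.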
