TypeError: 'Zipcode' object is not subscriptable
Question
I'm using Python 3 and have a pandas df that looks like this:
zip
0 07105
1 00000
2 07030
3 07032
4 07032
I would like to add state and city columns using the Python package uszipcode:
from uszipcode import SearchEngine
search = SearchEngine(simple_zipcode=False)
def zco(x):
    print(search.by_zipcode(x)['City'])
df['City'] = df[['zip']].fillna(0).astype(int).apply(zco)
However, I get the following error:
TypeError: 'Zipcode' object is not subscriptable
Can someone help with the error? Thank you in advance.
Answer
The call search.by_zipcode(x) returns a Zipcode() instance, not a dictionary, so applying ['City'] to that object fails.
Instead, use either the .major_city attribute or its shorter alias, the .city attribute (the state is likewise available as .state). Also, you want to return that value, not print it:
def zco(x):
    return search.by_zipcode(x).city
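To make the failure concrete without installing uszipcode or downloading its database, here is a minimal sketch using a hypothetical stand-in class, FakeZipcode (not part of uszipcode), which, like the real result object, exposes its values as attributes rather than dictionary keys. It also shows why printing inside zco() is not enough: a function that only prints returns None, so the new column would end up empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeZipcode:
    """Hypothetical stand-in for uszipcode's result object, not the real class."""
    zipcode: str
    major_city: Optional[str] = None
    state: Optional[str] = None

    @property
    def city(self) -> Optional[str]:
        # Shorter alias for major_city, as described in the answer
        return self.major_city


result = FakeZipcode("07030", "Hoboken", "NJ")  # hand-written sample values

try:
    result["City"]  # no __getitem__ is defined, so indexing fails
except TypeError as exc:
    print(exc)  # 'FakeZipcode' object is not subscriptable

print(result.city, result.state)  # Hoboken NJ


def zco_print(zc):
    print(zc.city)  # shows the value, but the function implicitly returns None


def zco_return(zc):
    return zc.city  # this value is what .apply() would put in the new column


print(zco_print(result))   # prints Hoboken, then None
print(zco_return(result))  # Hoboken
```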
If all you are going to use the uszipcode project for is mapping zip codes to state and city names, you don't need to use the full database (a 450MB download). Just stick with the 'simple' version, which is only 9MB, by leaving out the simple_zipcode=False argument to SearchEngine().
Next, this is going to be really, really slow. .apply() uses a simple loop under the hood, and for each row the .by_zipcode() method will query a SQLite database using SQLAlchemy, create a single result object with all the columns from the matching row, then return that object, just so you can get a single attribute from it.
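The cost difference can be sketched with the standard library's sqlite3 module alone. The sketch below assumes a tiny in-memory table named zips with hypothetical columns (it is not uszipcode's actual schema) and uses Connection.set_trace_callback() to record the SQL statements SQLite runs. One lookup per row issues one SELECT per row, while a single SELECT ... IN (...) over the unique values does the same work in one statement. It leaves out the SQLAlchemy and result-object overhead described above, which only adds to the per-row cost.

```python
import sqlite3

# Hypothetical in-memory table standing in for the uszipcode SQLite database;
# the table and column names are assumptions, not uszipcode's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (zipcode TEXT PRIMARY KEY, major_city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?)",
    [("07105", "Newark", "NJ"), ("07030", "Hoboken", "NJ"), ("07032", "Kearny", "NJ")],
)

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed

rows = ["07105", "00000", "07030", "07032", "07032"]

# Row by row: one SELECT per row, like calling .by_zipcode() inside .apply()
per_row = [
    conn.execute("SELECT major_city FROM zips WHERE zipcode = ?", (z,)).fetchone()
    for z in rows
]
per_row_count = len(statements)

# Batched: one SELECT ... IN (...) covering only the unique values
statements.clear()
unique = sorted(set(rows))
placeholders = ", ".join("?" for _ in unique)
batched = dict(
    conn.execute(
        f"SELECT zipcode, major_city FROM zips WHERE zipcode IN ({placeholders})",
        unique,
    ).fetchall()
)
batched_count = len(statements)

print(per_row_count, batched_count)  # statements run: per-row vs batched
print(batched)
```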
You'd be much better off querying the database directly, with the pandas SQL methods. The uszipcode package is still useful here, as it handles downloading the database for you and creating a SQLAlchemy session (the SearchEngine.ses attribute gives you direct access to it), but from there I'd just do:
import pandas as pd
from uszipcode import SearchEngine, SimpleZipcode
search = SearchEngine()
# Select only the needed columns, for all unique zip codes, in a single query
query = (
    search.ses.query(
        SimpleZipcode.zipcode.label('zip'),
        SimpleZipcode.major_city.label('city'),
        SimpleZipcode.state.label('state'),
    ).filter(
        SimpleZipcode.zipcode.in_(df['zip'].dropna().unique())
    )
).selectable
# Let pandas run the SELECT and index the result by zip code
zipcode_df = pd.read_sql_query(query, search.ses.connection(), index_col='zip')
This creates a pandas DataFrame with all your unique zip codes mapped to city and state columns. You can then join your dataframe with the zipcode dataframe:
df = pd.merge(df, zipcode_df, how='left', left_on='zip', right_index=True)
This adds city and state columns to your original dataframe. If you need to pull in more columns, add them to the search.ses.query(...) portion, using .label() to give them a suitable column name in the output dataframe (without a .label(), they'll get prefixed with simple_zipcode_ or zipcode_, depending on the class you are using). Pick from the documented model attributes, but take into account that if you need access to the full Zipcode model attributes, you need to use SearchEngine(simple_zipcode=False) to ensure you get the full 450MB dataset at your disposal, then use Zipcode.<column>.label(...) instead of SimpleZipcode.<column>.label(...) in the query.
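A column .label('name') is rendered as a SQL alias (column AS name), and that alias is the column name pandas reads back from the result. The sketch below runs hand-written SQL approximating what the labelled and unlabelled queries would emit, against a hypothetical simple_zipcode table in an in-memory sqlite3 database. The exact SQL that SQLAlchemy generates may differ, so treat the unlabelled, table-prefixed form as an assumption based on the prefix described above.

```python
import sqlite3

# Hypothetical table; the name simple_zipcode mirrors the prefix mentioned above,
# but the schema is an assumption, not uszipcode's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simple_zipcode (zipcode TEXT, major_city TEXT, state TEXT)")
conn.execute("INSERT INTO simple_zipcode VALUES ('07030', 'Hoboken', 'NJ')")

# Roughly what .label('zip'), .label('city') and .label('state') produce
labelled_sql = (
    "SELECT simple_zipcode.zipcode AS zip, simple_zipcode.major_city AS city, "
    "simple_zipcode.state AS state FROM simple_zipcode"
)
# Roughly what the same query produces without labels: table-prefixed aliases
unlabelled_sql = (
    "SELECT simple_zipcode.zipcode AS simple_zipcode_zipcode, "
    "simple_zipcode.major_city AS simple_zipcode_major_city, "
    "simple_zipcode.state AS simple_zipcode_state FROM simple_zipcode"
)


def column_names(sql):
    """Return the result column names, i.e. what would become DataFrame columns."""
    return [description[0] for description in conn.execute(sql).description]


print(column_names(labelled_sql))    # ['zip', 'city', 'state']
print(column_names(unlabelled_sql))
```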
With the zip codes as the index of the zipcode_df dataframe, that's going to be a lot faster (zippier :-)) than using SQLAlchemy on each row individually.
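The keyed-lookup idea behind that join can be sketched in plain Python, without pandas: build the mapping once from the unique zip codes, then attach city and state to every row in a single pass. This is an illustration of left-join semantics, with a dict standing in for the indexed zipcode_df and hand-written sample values; it is not what pandas does internally. Rows whose zip code has no match, such as 00000, are kept with empty values, just as merge(how='left') keeps them with NaN.

```python
# Hypothetical lookup table keyed by zip code, playing the role of zipcode_df
# (whose index is 'zip'); the values are hand-written sample data.
zipcode_lookup = {
    "07105": {"city": "Newark", "state": "NJ"},
    "07030": {"city": "Hoboken", "state": "NJ"},
    "07032": {"city": "Kearny", "state": "NJ"},
}

df_rows = [{"zip": z} for z in ["07105", "00000", "07030", "07032", "07032"]]

EMPTY = {"city": None, "state": None}

# Left join: every original row is kept, in order; rows whose zip has no match
# get None here (pandas.merge(how='left') fills NaN instead).
joined = [{**row, **zipcode_lookup.get(row["zip"], EMPTY)} for row in df_rows]

for row in joined:
    print(row)
```

The keys must match exactly: if the zip column had been converted with .astype(int), as in the question, 7105 would not match '07105'; str(7105).zfill(5) restores the leading zero.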