How is this different behaviour possible? TypeError: unhashable type: 'Point'

Question

I am trying to find and filter the Points in a GeoDataFrame (df1) that are close to Points in a second GeoDataFrame (df2), and vice versa. I use this piece of code for it:

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

df1 = df1[df1.geometry.isin(ps1)]
df2 = df2[df2.geometry.isin(ps2)]

However, I get an error on the last line: TypeError: unhashable type: 'Point'

But the line above it works like a charm, and the data types of both lines (df1/df2 and ps1/ps2) are exactly the same.

How is that possible? And how can it be solved?

Variable types:

df1         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df1.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps1         :  <class 'list'>
val1        :  <class 'pandas.core.series.Series'>
df2         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df2.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps2         :  <class 'list'>

df1.dtypes
Out[301]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

df2.dtypes
Out[302]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

MWE:

import pandas as pd
from pandas import Timestamp
import geopandas as gpd
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371000):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df1 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.37896949048437,
  2: 52.378654032960824,
  3: 52.37818902922923},
 'lon': {0: 4.88585622453752,
  1: 4.886671616078047,
  2: 4.886413945242339,
  3: 4.885995520636016},
 'time': {0: Timestamp('2019-11-05 11:31:42+0000', tz='UTC'),
  1: Timestamp('2019-11-05 11:32:22+0000', tz='UTC'),
  2: Timestamp('2019-11-05 11:32:49+0000', tz='UTC'),
  3: Timestamp('2019-11-05 11:33:31+0000', tz='UTC')}})
df2 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.369466977365214,
  2: 52.36923115238693,
  3: 52.36898222465506},
 'lon': {0: 4.88585622453752,
  1: 4.9121331184582,
  2: 4.912723204441477,
  3: 4.913505393878495},
 'time': {0: Timestamp('2019-11-05 08:54:32+0000', tz='UTC'),
  1: Timestamp('2019-11-05 08:55:06+0000', tz='UTC'),
  2: Timestamp('2019-11-05 08:55:40+0000', tz='UTC'),
  3: Timestamp('2019-11-05 08:56:22+0000', tz='UTC')}})

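# points_from_xy expects x (longitude) first, then y (latitude),
# so that haversine(p1.y, p1.x, ...) below receives (lat, lon) as intended.
df1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1.lon, df1.lat))
df2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2.lon, df2.lat))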

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

val1 = gpd.GeoDataFrame(df1)
val2 = gpd.GeoDataFrame(df2)
# print(type(df1))
# print(type(df2))
# print(type(ps1))
# print(type(ps2))
print('df1         : ', type(df1))
print('df1.geometry: ', type(df1.geometry))
print('ps1         : ', type(ps1))
val1 = df1.geometry.isin(ps1)
print('val1        : ', type(val1))

print('df2         : ', type(df2))
print('df2.geometry: ', type(df2.geometry))
print('ps2         : ', type(ps2))
val2 = df2.geometry.isin(ps2)
print('val2        : ', type(val2))
# df1 = df1[df1.geometry.isin(ps1)]
# df2 = df2[df2.geometry.isin(ps2)]

Answer

As the error says, Point is not hashable.

It turns out that, for a reason I don't know, the pandas.Series.isin function seems to require the data to be hashable. See the question I just posted.
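To illustrate the difference between hash-based and equality-based membership tests, here is a minimal pure-Python sketch. UnhashablePoint is a hypothetical stand-in written for this example, not the real shapely class; it only assumes that the geometry type defines equality but no hash, which is what the error message suggests.

```python
class UnhashablePoint:
    """Hypothetical stand-in for a geometry type that defines == but no hash."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    # Defining __eq__ without __hash__ sets __hash__ to None,
    # so instances cannot be placed in a set or used as dict keys.


points = [UnhashablePoint(0, 0), UnhashablePoint(1, 1)]
wanted = [UnhashablePoint(1, 1)]

# List membership compares elements with ==, so no hashing is needed.
mask = [p in wanted for p in points]

# Hash-based lookup fails with the same kind of error as in the question.
try:
    set(wanted)
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

print(mask)
print(hash_error)
```

Any lookup that builds a hash table from the values would hit the same TypeError, while a plain `in` test against a list would not.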

As for your question, a workaround would be to use a list comprehension and convert the result back to a Series, like:

val2 = pd.Series([v in ps2 for v in df2.geometry])
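Note that a Series built this way gets a default integer index, so it lines up with df2 only if df2 also has a default index. Another possible approach, sketched below under assumptions, avoids comparing Point objects at all by collecting row labels (which are hashable) instead of geometries. The names haversine_m and close_labels are hypothetical helpers written for this example, and the sample rows are taken from the question's data.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, earth_radius=6371000):
    """Great-circle distance in metres between two points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return earth_radius * 2 * math.asin(math.sqrt(a))


def close_labels(rows1, rows2, max_dist=100):
    """Return the labels of rows in each input that lie within max_dist
    metres of at least one row in the other input.

    rows1 and rows2 are iterables of (label, lat, lon) tuples.
    """
    rows2 = list(rows2)  # rows2 is iterated once per row of rows1
    keep1, keep2 = set(), set()
    for label1, lat1, lon1 in rows1:
        for label2, lat2, lon2 in rows2:
            if haversine_m(lat1, lon1, lat2, lon2) < max_dist:
                keep1.add(label1)
                keep2.add(label2)
    return keep1, keep2


# A few rows from the question's data, as (index label, lat, lon).
rows1 = [(0, 52.378851603519905, 4.88585622453752),
         (3, 52.37818902922923, 4.885995520636016)]
rows2 = [(0, 52.378851603519905, 4.88585622453752),
         (1, 52.369466977365214, 4.9121331184582)]

keep1, keep2 = close_labels(rows1, rows2)
print(keep1, keep2)
```

With the DataFrames from the question, the resulting label sets could then be used for selection, for example df1.loc[sorted(keep1)], assuming the labels are the DataFrame's index values.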
