float() argument must be a string or a number, not 'Timestamp'

Problem description

I can't make scikit-learn work with a datetime series.

I found this post, but it did not help me: Pandas: TypeError: float() argument must be a string or a number

The CSV file has 2 date columns. The dates are strings in the following format: 2017-07-21 06:19:53

I converted the strings to datetime64[ns] so that each date would become a long value I could do calculations on. scikit-learn refuses this type and gives the error: float() argument must be a string or a number, not 'Timestamp'

I also tried pandas.to_datetime(), with no luck.

The model I use in scikit-learn is the KMeans clustering model. Printing the dtypes gives this result:

ip                      int64
date           datetime64[ns]
succesFlag              int64
app                     int64
enddate        datetime64[ns]
user_userid             int64
dtype: object

This is my code:

def getDataframe():
    df = pd.read_csv(filename)
    df['date']=df['date'].astype('datetime64[ns]',inplace=True)
    df['enddate']=df['enddate'].astype('datetime64[ns]',inplace=True)
    df['app']=df['app'].replace({
            "Azure": 0 ,
            "Peoplesoft":1,
            "Office":2 ,
            "DevOps":3 ,
            "Optima":4 ,
            "Ada-Tech": 5 
         },inplace=True)    
    df['ip']=df['ip'].apply(lambda x: int(ip4.ip_address(x))).to_frame('ip')
    print(df.dtypes)
    return df

I expected the KMeans clustering model to work with numerical values, since I had converted the columns, but it did not.

What did I do wrong?

Recommended answer

I suggest changing your solution, and simplifying it a bit as well:

  • add the parse_dates parameter to convert the columns to datetimes, then convert those to numeric Unix timestamps
  • for the app conversion, remove inplace=True or use the faster map; map also creates NaNs for non-matched values, so the output is numeric too
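import ipaddress as ip4  # assumption: ip4 in the question is the standard-library ipaddress module
import numpy as np
import pandas as pd

def getDataframe():
    # parse both date columns as datetimes while reading the CSV
    df = pd.read_csv(filename, parse_dates=['date','enddate'])
    # datetime64[ns] -> nanoseconds since the epoch -> whole seconds (Unix time)
    df[['date','enddate']] = df[['date','enddate']].astype(np.int64) // 10**9

    # map returns a new Series; values missing from the dict become NaN
    df['app']=df['app'].map({
            "Azure": 0 ,
            "Peoplesoft":1,
            "Office":2 ,
            "DevOps":3 ,
            "Optima":4 ,
            "Ada-Tech": 5 
         })    
    # convert each IPv4 string to its integer value
    df['ip']=df['ip'].apply(lambda x: int(ip4.ip_address(x)))
    print(df.dtypes)
    return df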
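
To try the idea without the original file, here is a minimal sketch that runs on a few made-up rows. The sample data, the names CSV_TEXT, APP_CODES, to_unix_seconds and load_numeric_frame, and the "Unknown" app label are all assumptions for illustration, not part of the question. It also swaps the astype(np.int64) // 10**9 step for subtracting the epoch and dividing by one second, which does not depend on the column being stored in nanoseconds.

```python
import io
import ipaddress

import pandas as pd

# Hypothetical sample rows in the format described in the question.
CSV_TEXT = """ip,date,succesFlag,app,enddate,user_userid
10.0.0.1,2017-07-21 06:19:53,1,Azure,2017-07-21 06:25:53,42
192.168.1.7,2017-07-22 10:00:00,0,Office,2017-07-22 10:30:00,7
10.0.0.2,2017-07-23 23:59:59,1,Unknown,2017-07-24 00:10:00,42
"""

APP_CODES = {"Azure": 0, "Peoplesoft": 1, "Office": 2,
             "DevOps": 3, "Optima": 4, "Ada-Tech": 5}

UNIX_EPOCH = pd.Timestamp("1970-01-01")


def to_unix_seconds(col: pd.Series) -> pd.Series:
    """Whole seconds since the Unix epoch, whatever unit the column uses."""
    return (col - UNIX_EPOCH) // pd.Timedelta("1s")


def load_numeric_frame(source) -> pd.DataFrame:
    # Parse the two date columns as datetimes while reading.
    df = pd.read_csv(source, parse_dates=["date", "enddate"])
    for name in ("date", "enddate"):
        df[name] = to_unix_seconds(df[name])
    # map() returns a new Series; labels missing from the dict become NaN.
    df["app"] = df["app"].map(APP_CODES)
    # IPv4 text -> integer value of the address.
    df["ip"] = df["ip"].map(lambda text: int(ipaddress.ip_address(text)))
    return df


numeric_df = load_numeric_frame(io.StringIO(CSV_TEXT))
print(numeric_df.dtypes)
```

Every column of numeric_df is numeric at this point, so it should be usable as KMeans input, with one caveat: the unmatched "Unknown" label leaves a NaN in app, and as far as I know KMeans rejects NaN, so such rows would need to be filled or dropped first. Note also that in the question's code the result of replace(..., inplace=True) was assigned back to the column; with inplace=True that call returns None in the pandas versions I have used, which is why the answer drops inplace or switches to map.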
