尝试将字符串转换为整数的 pandas 错误 [英] Pandas error trying to convert string into integer

查看:70
本文介绍了尝试将字符串转换为整数的 pandas 错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

要求:

DataFrame中的一个特定列是混合类型。它可以具有 123456 ABC12345 之类的值。

One particular column in a DataFrame is 'Mixed' Type. It can have values like "123456" or "ABC12345".

正在使用xlsxwriter将数据框写入Excel。

This dataframe is being written into an Excel using xlsxwriter .

对于 123456 这样的值,熊猫将其转换为 123456.0 (使其看起来像个浮点数)

For values like "123456", down the line Pandas converting it into 123456.0 ( Making it look like a float)

我们需要将其放入如果值是完全数字,则xlsx为123456(即+整数)。

We need to put it into xlsx as 123456 (i.e as +integer) in case value is FULLY numeric.

Effort:

代码段如下图所示

import pandas as pd
import numpy as np
import xlsxwriter
import os
import datetime
import sys
excel_name = str(input("Please Enter Spreadsheet Name :\n").strip())

print("excel entered :   "   , excel_name)
df_header = ['DisplayName','StoreLanguage','Territory','WorkType','EntryType','TitleInternalAlias',
         'TitleDisplayUnlimited','LocalizationType','LicenseType','LicenseRightsDescription',
         'FormatProfile','Start','End','PriceType','PriceValue','SRP','Description',
         'OtherTerms','OtherInstructions','ContentID','ProductID','EncodeID','AvailID',
         'Metadata', 'AltID', 'SuppressionLiftDate','SpecialPreOrderFulfillDate','ReleaseYear','ReleaseHistoryOriginal','ReleaseHistoryPhysicalHV',
          'ExceptionFlag','RatingSystem','RatingValue','RatingReason','RentalDuration','WatchDuration','CaptionIncluded','CaptionExemption','Any','ContractID',
          'ServiceProvider','TotalRunTime','HoldbackLanguage','HoldbackExclusionLanguage']
first_pass_drop_duplicate = df_m_d.drop_duplicates(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType',
                                   'LicenseRightsDescription','FormatProfile','Start','End','PriceType','PriceValue','ContentID','ProductID',
                                   'AltID','ReleaseHistoryPhysicalHV','RatingSystem','RatingValue','CaptionIncluded'], keep=False) 
# We need to keep integer AltID  as is

first_pass_drop_duplicate.loc[first_pass_drop_duplicate['AltID']] =   first_pass_drop_duplicate['AltID'].apply(lambda x : str(int(x)) if str(x).isdigit() == True else x)

我尝试过:

1. using `dataframe.astype(int).astype(str)` # works as long as value is not alphanumeric
2.importing re and using pure python `re.compile()` and `replace()` -- does not work
3.reading DF row by row in a for loop !!! Kills the machine as dataframe can have 300k+ records

每次,我得到的错误是:

Each time, error I get:


raise KeyError('%s not in index'%objarr [mask])

KeyError:'[102711. 102711. 102711。 102711.102711.102711.102711.102711.\n 102711.102711.102711.102711.102711.102711.102711.102711.nn 102711.102711.102711.102711.102711.102711.102711.102711.102711.\ n 102711.102711.102711.102711.102711.102711.102711.102711.nn 102711.102711.102711.102711.102711.102711.102711.102711.nn 102711.102711.102711.102711.102711.102711 102711.102711.n 102711.102711.102711.102711.102711.102711.102711.102711.nn 102711.102711.102711.102711.102711.102711.102711.102711.102n.5337.5337。 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337 。5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\ 5337. 5337. 2124. 2124. 2124. 2124. 2124. 2124.nn 2124. 2124. 6643. 6643. 6643. 6643. 6643. 6643.nn 6643. 6643. 6643. 6643. 6643。 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.]不在索引'

raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 2124. 2124. 2124. 2124. 2124. 2124.\n 2124. 2124. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.] not in index'

我是python / pandas的新手,非常感谢您的帮助和解决方案。

I am newbie in python/pandas , any help, solution is much appreciated.

推荐答案

我认为您需要 to_numeric

I think you need to_numeric:

df = pd.DataFrame({'AltID':['123456','ABC12345','123456'],
                   'B':[4,5,6]})

print (df)
      AltID  B
0    123456  4
1  ABC12345  5
2    123456  6

df.ix[df.AltID.str.isdigit(), 'AltID']  = pd.to_numeric(df.AltID, errors='coerce')

print (df)
      AltID  B
0    123456  4
1  ABC12345  5
2    123456  6

print (df['AltID'].apply(type))
0    <class 'float'>
1      <class 'str'>
2    <class 'float'>
Name: AltID, dtype: object

这篇关于尝试将字符串转换为整数的 pandas 错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆