Pandas error trying to convert string into integer
Problem description
Requirement:
One particular column in a DataFrame is of mixed type. It can hold values like "123456" or "ABC12345".
This DataFrame is being written to an Excel file using xlsxwriter.
For values like "123456", pandas converts them to 123456.0 somewhere down the line (making them look like floats).
We need to write the value to the xlsx file as 123456 (i.e. as an integer) when the value is fully numeric.
Effort:
The code snippet is shown below:
import pandas as pd
import numpy as np
import xlsxwriter
import os
import datetime
import sys
excel_name = str(input("Please Enter Spreadsheet Name :\n").strip())
print("excel entered : " , excel_name)
df_header = ['DisplayName','StoreLanguage','Territory','WorkType','EntryType','TitleInternalAlias',
'TitleDisplayUnlimited','LocalizationType','LicenseType','LicenseRightsDescription',
'FormatProfile','Start','End','PriceType','PriceValue','SRP','Description',
'OtherTerms','OtherInstructions','ContentID','ProductID','EncodeID','AvailID',
'Metadata', 'AltID', 'SuppressionLiftDate','SpecialPreOrderFulfillDate','ReleaseYear','ReleaseHistoryOriginal','ReleaseHistoryPhysicalHV',
'ExceptionFlag','RatingSystem','RatingValue','RatingReason','RentalDuration','WatchDuration','CaptionIncluded','CaptionExemption','Any','ContractID',
'ServiceProvider','TotalRunTime','HoldbackLanguage','HoldbackExclusionLanguage']
first_pass_drop_duplicate = df_m_d.drop_duplicates(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType',
'LicenseRightsDescription','FormatProfile','Start','End','PriceType','PriceValue','ContentID','ProductID',
'AltID','ReleaseHistoryPhysicalHV','RatingSystem','RatingValue','CaptionIncluded'], keep=False)
# We need to keep integer AltID as is
first_pass_drop_duplicate.loc[first_pass_drop_duplicate['AltID']] = first_pass_drop_duplicate['AltID'].apply(lambda x : str(int(x)) if str(x).isdigit() == True else x)
I have tried:
1. using `dataframe.astype(int).astype(str)` # works as long as the value is not alphanumeric
2. importing re and using pure Python `re.compile()` and `replace()`; this does not work
3. reading the DataFrame row by row in a for loop; this kills the machine, as the DataFrame can have 300k+ records
Each time, the error I get is:
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 102711. 102711. 102711. 102711. 102711. 102711. 102711. 102711.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 5337. 5337. 5337. 5337. 5337. 5337.\n 5337. 5337. 2124. 2124. 2124. 2124. 2124. 2124.\n 2124. 2124. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.\n 6643. 6643. 6643. 6643. 6643. 6643. 6643. 6643.] not in index'
I am new to Python/pandas; any help or solution is much appreciated.
Recommended answer
I think you need to_numeric:
import pandas as pd

df = pd.DataFrame({'AltID':['123456','ABC12345','123456'],
                   'B':[4,5,6]})
print (df)
AltID B
0 123456 4
1 ABC12345 5
2 123456 6
# .ix was removed in pandas 1.0; .loc performs the same boolean-mask selection here
df.loc[df.AltID.str.isdigit(), 'AltID'] = pd.to_numeric(df.AltID, errors='coerce')
print (df)
AltID B
0 123456 4
1 ABC12345 5
2 123456 6
print (df['AltID'].apply(type))
0 <class 'float'>
1 <class 'str'>
2 <class 'float'>
Name: AltID, dtype: object
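As the type listing above shows, the numeric entries end up as floats, while the question asks for integers. The following is a minimal sketch of one way to store plain Python integers instead. It is not part of the original answer. The helper name `to_int_if_numeric` and the sample values are hypothetical. It also assigns to the column directly rather than through `.loc[df['AltID']]`, since indexing `.loc` with the column's values makes pandas look them up as row labels, which appears to be what produces the KeyError in the question.

```python
import pandas as pd


def to_int_if_numeric(value):
    """Return an int for whole-number values, otherwise the value unchanged.

    Hypothetical helper for illustration only.
    """
    # Floats without a fractional part, such as 123456.0, become ints.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    # Strings made only of decimal digits, such as "123456", become ints.
    # isdecimal() is used rather than isdigit() so that every accepted
    # string is safe to pass to int().
    if isinstance(value, str) and value.strip().isdecimal():
        return int(value.strip())
    # Everything else (e.g. "ABC12345" or NaN) is left as is.
    return value


# Sample data (assumed): a mixed column holding strings and floats.
df = pd.DataFrame({'AltID': ['123456', 'ABC12345', 123456.0, 5337.0],
                   'B': [4, 5, 6, 7]})

# Assign to the column itself instead of indexing .loc with its values.
df['AltID'] = df['AltID'].apply(to_int_if_numeric)

print(df['AltID'].apply(type))
```

The column keeps the object dtype because it still mixes integers and strings. Because `apply` with a plain Python function runs once per element, it avoids an explicit row-by-row loop, though it is still not a vectorized operation.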