Pandas script modifying numbers to long float numbers when it shouldn't even be modifying that column/element

Problem description

I have the pandas script below, and it's giving me a headache because it keeps modifying data it shouldn't touch. The example below reproduces the issue every time. (It took me forever to find out what was causing this problem.)

Basically, if you compare the original file to the modified testing2.csv, you'll see that numbers like 0.357 on the first line turn into 0.35700000000000004, yet on line 2 the number 0.1128 doesn't change at all...

It should NOT be modifying these numbers; they should all stay exactly as they are.
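
As a quick illustration of what the two quoted values have in common, the sketch below compares them as double-precision floats using only the standard library. It suggests that the column was parsed as floating point and landed one bit away from the intended value, rather than being edited on purpose. The helper name `bits` is made up for this example and is not part of testing.py.

```python
import struct

def bits(x):
    """Return the IEEE 754 bit pattern of a double as an integer (hypothetical helper)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

original = float("0.357")
drifted = float("0.35700000000000004")

# The two strings parse to different doubles...
print(original == drifted)

# ...and the difference between their bit patterns shows how far apart they are,
# counted in units in the last place.
print(bits(drifted) - bits(original))
```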

testing.py

import re
import pandas
# each block in the text file will be one element of this list
matchers = [[]]
i = 0 
with open('testing.txt') as infile:
    for line in infile:
        line = line.strip()
        # Blocks are separated by blank lines
        if len(line) == 0:
            i += 1
            matchers.append([])
            # assume there are always two blank lines between items
            # and just skip to the next line
            next(infile)
            continue
        matchers[i].append(line)


# This regular expression matches the variable number of students in each block
studentlike = re.compile(r'(\d+) (.+) (\d+/\d+)')
# These are the names of the fields we expect at the end of each block
datanames = ['Data', 'misc2', 'bla3']
# We will build a table containing a list of elements for each student
table = []
for matcher in matchers:
    # We use an iterator over the block lines to make indexing simpler
    it = iter(matcher)
    # The first two elements are match values
    m1, m2 = next(it), next(it)
    # then there are a number of students
    students = []
    for possiblestudent in it:
        m = studentlike.match(possiblestudent)
        if m:
            students.append(list(m.groups()))
        else:
            break
    # After the students come the data elements, which we read into a dictionary
    # We also add in the last possible student line as that didn't match the student re
    dataitems = dict(item.split() for item in [possiblestudent] + list(it))
    # Finally we construct the table
    for student in students:
        # We use the dictionary .get() method to return blanks for the missing fields
        table.append([m1, m2] + student + [dataitems.get(d, '') for d in datanames])

textcols = ['MATCH2', 'MATCH1', 'TITLE01', 'MATCH3', 'TITLE02', 'Data', 'misc2', 'bla3']
csvdata = pandas.read_csv('testing.csv')
textdata = pandas.DataFrame(table, columns=textcols)

# Add any new columns
newCols = textdata.columns.difference(csvdata.columns)
for c in newCols:
    csvdata[c] = None

mergecols = ['MATCH2', 'MATCH1', 'MATCH3']
csvdata.set_index(mergecols, inplace=True, drop=False)
textdata.set_index(mergecols, inplace=True,drop=False)
csvdata.update(textdata)
csvdata.to_csv('testing2.csv', index=False)

testing.csv

testing.txt

MData (N/A)
DMATCH1
3 Tommy 144512/23332
1 Jim 90000/222311
1 Elz M 90000/222311
1 Ben 90000/222311
Data $50.90
misc2 $10.40
bla3 $20.20


MData (B/B) 
DMATCH2
4 James Smith 2333/114441
4 Mike 90000/222311
4 Jessica Long 2333/114441
Data $50.90
bla3 $5.44
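
To show how the student pattern separates the lines in a block like the ones above, here is a small standalone check. It reuses the regular expression from testing.py (written as a raw string) on three of the sample lines; the variable names `lines` and `parsed` are only for this illustration.

```python
import re

# Same pattern as in testing.py, written as a raw string.
studentlike = re.compile(r'(\d+) (.+) (\d+/\d+)')

lines = ["4 James Smith 2333/114441", "1 Elz M 90000/222311", "Data $50.90"]

# Student lines yield three groups; the first data line does not match,
# which is what ends the student loop in the script.
parsed = [m.groups() if m else None for m in map(studentlike.match, lines)]
print(parsed)
```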

Anyone have any ideas how to fix this?

Thanks in advance
- Hyflex

Solution

Try this :)

csvdata = pandas.read_csv('testing.csv', dtype={'TITLE5' : 'object', 'TITLE5.1' : 'object', 'TITLE5.2' : 'object', 'TITLE5.3' : 'object'})
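
The `dtype` argument tells `read_csv` to keep the listed columns as text, so values such as 0.357 are never converted to floats and are written back unchanged. The sketch below tries this on a small in-memory CSV; the sample data and its two-column layout are assumptions, since the real testing.csv is not shown. It also tries `float_precision="round_trip"`, an option of the C parser, as a possible alternative when the column should stay numeric.

```python
import io
import pandas

# Hypothetical stand-in for testing.csv (the real file is not shown in the post).
sample = "MATCH1,TITLE5\nDMATCH1,0.357\nDMATCH2,0.1128\n"

# Option 1 (the answer's approach): read the column as text so it is never parsed as a float.
as_text = pandas.read_csv(io.StringIO(sample), dtype={"TITLE5": "object"})
text_values = as_text["TITLE5"].tolist()
print(text_values)

# Check whether writing the frame back out reproduces the original lines.
print(as_text.to_csv(index=False).splitlines() == sample.splitlines())

# Option 2: keep the column numeric but ask for the round-trip float parser.
as_float = pandas.read_csv(io.StringIO(sample), float_precision="round_trip")
float_values = as_float["TITLE5"].tolist()
print(float_values)
```

Columns read as `object` hold strings, so any arithmetic on them needs an explicit conversion first.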
