Pandas script modifying numbers to long float numbers when it shouldn't even be modifying that column/element

Views: 171 | Posted: 2017/2/26 16:07:24 | Tags: python, python-2.7, csv, pandas


Problem description

I've got a pandas script below that is causing me a headache because it keeps modifying my data when it shouldn't. The example below reproduces the issue every time. (It took me forever to find out what was causing this problem.)

Basically, if you compare the original file to the modified testing2.csv, you'll see that numbers like 0.357 on the first line turn into 0.35700000000000004, yet on line 2 the number 0.1128 doesn't change at all...

It should NOT be modifying these numbers, they should all be as they are.

testing.py

import re
import pandas
# each block in the text file will be one element of this list
matchers = [[]]
i = 0 
with open('testing.txt') as infile:
    for line in infile:
        line = line.strip()
        # Blocks are separated by blank lines
        if len(line) == 0:
            i += 1
            matchers.append([])
            # assume there are always two blank lines between items
            # and just skip to the next line
            next(infile, None)
            continue
        matchers[i].append(line)


# This regular expression matches the variable number of students in each block
studentlike = re.compile(r'(\d+) (.+) (\d+/\d+)')
# These are the names of the fields we expect at the end of each block
datanames = ['Data', 'misc2', 'bla3']
# We will build a table containing a list of elements for each student
table = []
for matcher in matchers:
    # We use an iterator over the block lines to make indexing simpler
    it = iter(matcher)
    # The first two elements are match values
    m1, m2 = next(it), next(it)
    # then there are a number of students
    students = []
    for possiblestudent in it:
        m = studentlike.match(possiblestudent)
        if m:
            students.append(list(m.groups()))
        else:
            break
    # After the students come the data elements, which we read into a dictionary
    # We also add in the last possible student line as that didn't match the student re
    dataitems = dict(item.split() for item in [possiblestudent] + list(it))
    # Finally we construct the table
    for student in students:
        # We use the dictionary .get() method to return blanks for the missing fields
        table.append([m1, m2] + student + [dataitems.get(d, '') for d in datanames])

textcols = ['MATCH2', 'MATCH1', 'TITLE01', 'MATCH3', 'TITLE02', 'Data', 'misc2', 'bla3']
csvdata = pandas.read_csv('testing.csv')
textdata = pandas.DataFrame(table, columns=textcols)

# Add any new columns
newCols = textdata.columns.difference(csvdata.columns)
for c in newCols:
    csvdata[c] = None

mergecols = ['MATCH2', 'MATCH1', 'MATCH3']
csvdata.set_index(mergecols, inplace=True, drop=False)
textdata.set_index(mergecols, inplace=True,drop=False)
csvdata.update(textdata)
csvdata.to_csv('testing2.csv', index=False)


testing.csv


http://pastebin.com/raw.php?i=HxVE0nA0 (Uploaded because of file size)


testing.txt

MData (N/A)
DMATCH1
3 Tommy 144512/23332
1 Jim 90000/222311
1 Elz M 90000/222311
1 Ben 90000/222311
Data $50.90
misc2 $10.40
bla3 $20.20


MData (B/B) 
DMATCH2
4 James Smith 2333/114441
4 Mike 90000/222311
4 Jessica Long 2333/114441
Data $50.90
bla3 $5.44

Does anyone have any ideas on how to fix this?

Thanks in advance

- Hyflex

Solution

Try this :)

csvdata = pandas.read_csv('testing.csv', dtype={'TITLE5' : 'object', 'TITLE5.1' : 'object', 'TITLE5.2' : 'object', 'TITLE5.3' : 'object'})
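
Why this helps, as far as can be told from the question: with dtype set to 'object' for those columns, read_csv keeps each cell as the text found in the file instead of parsing it into a float, so to_csv writes the same characters back. Older pandas releases parsed floats with a fast converter that could be off in the last digit, which would be consistent with 0.357 coming back as 0.35700000000000004. The sketch below uses a small in-memory CSV as a hypothetical stand-in for testing.csv (the real file is only on pastebin), and it also shows float_precision='round_trip', an alternative read_csv option that keeps the column numeric.

```python
import io
import pandas as pd

# Hypothetical stand-in for testing.csv; the column names are borrowed from the question.
raw = "MATCH1,TITLE5\nDMATCH1,0.357\nDMATCH2,0.1128\n"

# Option 1: read everything as text, so no float parsing happens at all.
as_text = pd.read_csv(io.StringIO(raw), dtype=str)
buf = io.StringIO()
as_text.to_csv(buf, index=False)
text_round_trip = buf.getvalue()

# Option 2: keep the column numeric, but ask the C parser for exact parsing.
as_float = pd.read_csv(io.StringIO(raw), float_precision="round_trip")

print(as_text["TITLE5"].tolist())
print(as_float["TITLE5"].tolist())
```

Reading as text is the simplest fix when the columns are only passed through to the output file. The trade-off is that any arithmetic on them needs an explicit conversion first.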


                        