Add only new rows to existing csv using pandas?

Question

OK, I don't know how to program this correctly. I have tried many combinations and reached a dead end because my logic is wrong. I have managed to fetch data from the web and put it into a csv file using pandas. The script will run every 15 minutes to fetch data.

In the example below I have created a dummy list called "data". The data will sometimes differ, if new updates are available, and sometimes look the same, depending on changes made by the provider.

However, if I run the script again, it just fills the csv file with the same data again. I don't want that: I only want to append rows when there is new, unique data.

For example:

import os
import requests
import pandas as pd
from datetime import datetime
import html5lib
import csv

data = [('Peter', 18, 7), ('Dick',22,2),
                        ('Riff', 15, 6), 
                        ('John', 17, 8), 
                        ('Michel', 18, 7), 
                        ('Sheli', 17, 5) ]
df = pd.DataFrame(data)

# if file exists....
if os.path.isfile('filename.csv'):
    #Old data
    oldFrame = pd.read_csv('filename.csv')
    
    #Concat
    df_diff = pd.concat([oldFrame, df],ignore_index=True).drop_duplicates(keep=False)

    #Write new rows to csv file
    df_diff.to_csv('filename.csv', mode='a', header=False)
    
else: # else it exists so append
    df.to_csv('filename.csv')

However, this does not work and gives me wrong data, so the logic is wrong. What should I do to achieve what I want? Is there a better method to use?

With help from some good fellows I have changed the script like this:

import os
import requests
import pandas as pd
from datetime import datetime
import html5lib
import csv

data = [('Adam', 18, 7), ('Magnus',22,2),('Lena',22,2),('Gringo', 18, 7)]
df = pd.DataFrame(data)
##
### if file exists....
if os.path.isfile('filename.csv'):
    #Old data
    oldFrame = pd.read_csv('filename.csv', header=None)
    
    #Concat
    df_diff = pd.concat([oldFrame, df], ignore_index=True).drop_duplicates()

    #Write new rows to csv file
    df_diff.to_csv('filename.csv', header=False)
    
else:

    # else it exists so append
    df.to_csv('filename.csv')
    print("File Created...")

I have run the script many times with the same "data" values. However, the dataframe looks like this (when displaying oldFrame):

>>> oldFrame
     0       1       2       3       4     5    6
0    0       0       0     NaN       0   1.0  2.0
1    1       1       1     0.0    Adam  18.0  7.0
2    2       2       2     1.0  Magnus  22.0  2.0
3    3       3       3     2.0    Lena  22.0  2.0
4    4       4       4     3.0  Gringo  18.0  7.0
5    5       5       5    Adam      18   7.0  NaN
6    6       6       6  Magnus      22   2.0  NaN
7    7       7       7    Lena      22   2.0  NaN
8    8       8       8  Gringo      18   7.0  NaN
9    9       9    Adam      18       7   NaN  NaN
10  10      10  Magnus      22       2   NaN  NaN
11  11      11    Lena      22       2   NaN  NaN
12  12      12  Gringo      18       7   NaN  NaN
13  13    Adam      18       7     NaN   NaN  NaN
14  14  Magnus      22       2     NaN   NaN  NaN
15  15    Lena      22       2     NaN   NaN  NaN
16  16  Gringo      18       7     NaN   NaN  NaN

Shouldn't the csv stay unchanged, since the data is the same?

Recommended answer

When you read the existing file, read_csv takes the first row as the header.

Since you are not using a header, tell it not to read one.

Replace

oldFrame = pd.read_csv('filename.csv')

with

oldFrame = pd.read_csv('filename.csv', header=None)
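
The effect of header=None can be checked without touching a real file. The snippet below is only an illustrative sketch: csv_text is a made-up stand-in for the contents of filename.csv, and io.StringIO is used in place of the file on disk.

```python
import io
import pandas as pd

# Hypothetical file contents: three data rows and no header line.
csv_text = "Peter,18,7\nDick,22,2\nRiff,15,6\n"

# Default behaviour: the first line is consumed as column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and the columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_header), list(with_header.columns))
print(len(no_header), list(no_header.columns))
```

With the default settings the "Peter" row becomes the column labels, so it can never match a row in the new data when duplicates are dropped.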


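Also, remove keep=False from drop_duplicates. With keep=False every copy of a duplicated row is dropped, so rows that appear in both the old file and the new data disappear entirely:

df_diff = pd.concat([oldFrame, df],ignore_index=True).drop_duplicates()

and remove mode='a' from to_csv. df_diff already holds the old rows plus the new ones, so the file should be overwritten rather than appended to:

df_diff.to_csv('filename.csv', header=False)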
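
The question's first attempt called drop_duplicates(keep=False). As a small sketch of how that differs from the default, the frames old and new below are invented stand-ins for the stored rows and a freshly fetched batch:

```python
import pandas as pd

# Made-up stand-ins: "Dick" is already stored and is fetched again.
old = pd.DataFrame([("Peter", 18, 7), ("Dick", 22, 2)])
new = pd.DataFrame([("Dick", 22, 2), ("Riff", 15, 6)])
combined = pd.concat([old, new], ignore_index=True)

# Default keep='first': one copy of each distinct row survives.
keep_first = combined.drop_duplicates()

# keep=False: every row that has a duplicate is removed, including the original.
keep_none = combined.drop_duplicates(keep=False)

print(keep_first[0].tolist())
print(keep_none[0].tolist())
```

Here keep_none no longer contains the "Dick" row at all, which is why that variant loses data that was already in the file.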

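Update

Note that I edited both to_csv calls: they now pass header=False and index=False, so neither a header row nor the row index is written to the file.

Final script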
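
The growing block of number columns in the oldFrame output from the question is consistent with the row index being written on every run. A minimal sketch of that effect, using an invented two-row frame and in-memory strings instead of filename.csv:

```python
import io
import pandas as pd

df = pd.DataFrame([("Adam", 18, 7), ("Magnus", 22, 2)])

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv(header=False)                  # index written as first column
without_index = df.to_csv(header=False, index=False)  # data columns only

# Reading the text back shows the extra column the index introduces.
reread_with_index = pd.read_csv(io.StringIO(with_index), header=None)
reread_without_index = pd.read_csv(io.StringIO(without_index), header=None)

print(with_index.splitlines())
print(reread_with_index.shape, reread_without_index.shape)
```

Each write-then-read cycle without index=False adds one more leading column, so rows from different runs stop lining up and are never recognised as duplicates.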

import os
import requests
import pandas as pd
from datetime import datetime
import csv

data = [('Peter', 18, 7), ('Dick',22,2),
                        ('Riff', 15, 6), 
                        ('John', 17, 8), 
                        ('Michel', 18, 7), 
                        ('NEW', 2, 5), 
                        ('other', 2, 5), 
                        ('Sheli', 17, 5) ]
df = pd.DataFrame(data)

# if file exists....
if os.path.isfile('filename.csv'):
    #Old data
    oldFrame = pd.read_csv('filename.csv', header=None)

    #Concat
    df_diff = pd.concat([oldFrame, df],ignore_index=True).drop_duplicates()

    #Write new rows to csv file
    df_diff.to_csv('filename.csv', header=False, index=False)

else: # file does not exist yet, so create it
    df.to_csv('filename.csv', header=False, index=False)
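
The script above rewrites the whole file on every run. If the goal is literally to append only the unseen rows, one possible variation is sketched below. It is not part of the original answer: append_new_rows is a hypothetical helper name, the file is assumed to have no header or index column, and values are assumed to survive the CSV round trip unchanged (floats or dates may not compare equal after being re-read).

```python
import os
import tempfile
import pandas as pd

def append_new_rows(df, path):
    """Append to path only the rows of df that are not already stored there.

    Hypothetical helper; returns the number of rows written.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        old = pd.read_csv(path, header=None)
        combined = pd.concat([old, df], ignore_index=True)
        # duplicated() flags the second and later occurrences of a row, so a
        # fetched row already present in the file (or repeated in df) is True.
        is_repeat = combined.duplicated()
        fetched = combined.iloc[len(old):]
        new_rows = fetched[~is_repeat.iloc[len(old):]]
    else:
        new_rows = df.drop_duplicates()
    # mode='a' creates the file if it is missing and appends otherwise.
    new_rows.to_csv(path, mode="a", header=False, index=False)
    return len(new_rows)

# Usage with a temporary file and made-up batches.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.csv")
    first = append_new_rows(pd.DataFrame([("Peter", 18, 7), ("Dick", 22, 2)]), path)
    second = append_new_rows(pd.DataFrame([("Dick", 22, 2), ("NEW", 2, 5)]), path)
    third = append_new_rows(pd.DataFrame([("Dick", 22, 2), ("NEW", 2, 5)]), path)
    stored = pd.read_csv(path, header=None)

print(first, second, third)
print(stored[0].tolist())
```

This still reads the existing file on every run in order to compare against it; only the write is incremental.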
