pandas dataframe read_csv, specify columns and keep whole line as a string

Question

In pandas read_csv, is there a way to specify e.g. col1, col15, wholeline?

I am trying to import about 700,000 rows of data from a text file that uses carets ('^') as delimiters, has no text qualifiers, and uses a carriage return as the line delimiter.

From the text file I need column 1, column 15, and the whole line, as the three columns of a table/dataframe.
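If only column 1 and column 15 were needed, `read_csv` could skip the other fields with `usecols`. The sketch below is a minimal illustration under stated assumptions, not a full solution: it assumes no header row, the `^` delimiter and `\r` line endings, and it passes `quoting=csv.QUOTE_NONE` because the file has no text qualifiers. The in-memory sample, the `make_line` helper and the names `col1`/`col15` are hypothetical.

```python
import csv
import io

import pandas as pd


def make_line(row_id: int) -> str:
    # Hypothetical 26-field line: row_id in column 1, row_id * 100 in column 15.
    fields = [str(row_id)]                      # column 1
    fields += [f"f{i}" for i in range(2, 15)]   # columns 2-14
    fields.append(str(row_id * 100))            # column 15
    fields += [f"g{i}" for i in range(16, 27)]  # columns 16-26
    return "^".join(fields) + "\r"


sample = make_line(1) + make_line(2)  # stands in for tablefile.txt

df = pd.read_csv(
    io.StringIO(sample),
    sep="^",                 # one character, so it is matched literally (not a regex)
    lineterminator="\r",     # carriage-return line endings (C engine only)
    header=None,             # the file has no header row
    quoting=csv.QUOTE_NONE,  # no text qualifiers: treat '"' as ordinary data
    usecols=[0, 14],         # 0-based positions of column 1 and column 15
)
df.columns = ["col1", "col15"]
print(df)
```

This yields the two numeric columns but not the untouched line, which is the part `read_csv` does not cover by itself.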

I've searched for how to do this in pandas, but I don't know it well enough to work out the logic. I can import all 26 columns fine, but that doesn't solve my problem.

import pandas as pd

# Reads all 26 '^'-delimited columns; lines end in a carriage return
my_df = pd.read_csv("tablefile.txt", sep="^", lineterminator="\r", low_memory=False)

Or I can use standard Python to put the data into a table, but that takes about 4 hours for the 700,000 rows, which is far too long for me.

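import re
import sqlite3

# Assumption: an SQLite database; the question only shows `conn`, `cur` and
# qmark-style (?) placeholders. "mydb.sqlite" is a hypothetical file name.
conn = sqlite3.connect("mydb.sqlite")
cur = conn.cursor()

count_1 = 0
for line in open('tablefile.txt'):
    if count_1 > 70:  # limits the run to the first 71 lines
        break
    else:
        # Raw strings so that \d and \^ reach the regex engine unchanged
        col1id = re.findall(r'^(\d+)\^', line)
        col15id = re.findall(r'^.*\^.*\^(\d+)\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*', line)
        line = line.strip()

        count_1 = count_1 + 1

        cur.execute('''INSERT INTO mytable (mycol1id, mycol15id, wholeline) VALUES (?, ?, ?)''',
                    (col1id[0], col15id[0], line, ))

        conn.commit()  # one commit per row
    print('row count_1=', count_1)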
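The two regular expressions are only there to pull out fields 1 and 15, and `str.split('^')` can do that directly, without the backtracking the long chain of `.*` groups forces on the regex engine. A minimal sketch, assuming every line has at least 15 fields and that columns 1 and 15 hold integers; `parse_line` is a hypothetical helper. It uses `rstrip('\r\n')` instead of `strip()`, so only the line terminator is removed and leading or trailing spaces in the data survive.

```python
def parse_line(line: str) -> tuple[int, int, str]:
    """Return (col1, col15, wholeline) for one '^'-delimited line."""
    wholeline = line.rstrip("\r\n")  # remove only the line terminator
    fields = wholeline.split("^")    # str.split treats '^' literally, not as a regex
    return int(fields[0]), int(fields[14]), wholeline


# Hypothetical 26-field line with 7 in column 1 and 42 in column 15.
line = "7^" + "^".join(["x"] * 13) + "^42^" + "^".join(["y"] * 11) + "\r"
col1, col15, wholeline = parse_line(line)
print(col1, col15)  # 7 42
```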

In pandas read_csv, is there a way to specify e.g. col1, col15, wholeline?

As above, col1 and col15 are digits and wholeline is a string.

  • I don't want to rebuild the string after import, as I may lose some characters in the process.
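That concern holds with `read_csv`'s defaults. A small illustration using a made-up three-field line (not data from the question): default type inference turns `007` into the integer 7, and default quote handling strips the double quotes, so joining the parsed fields back together does not reproduce the original line.

```python
import io

import pandas as pd

raw = '007^"quoted"^2'  # hypothetical line, not from the question's file

df = pd.read_csv(io.StringIO(raw + "\r"), sep="^", lineterminator="\r", header=None)

# Rebuild the line from the parsed values.
rebuilt = "^".join(str(value) for value in df.iloc[0])
print(rebuilt)         # 7^quoted^2
print(rebuilt == raw)  # False
```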

Thanks
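As far as I know, `read_csv` has no option that returns the raw line alongside the parsed fields, so one way to get all three columns is to read the lines yourself and let pandas split them. A minimal sketch, assuming `\r` line endings, at least 16 fields per line, integer values in columns 1 and 15, and a file that fits in memory; `load_three_columns` and `make_line` are hypothetical names.

```python
import pandas as pd


def load_three_columns(text: str) -> pd.DataFrame:
    """Build col1, col15 and the untouched line from carriage-return-terminated text."""
    # Split on the carriage return only and drop the empty string after the last one.
    lines = pd.Series([ln for ln in text.split("\r") if ln], dtype=object)
    # A one-character pattern is split literally; n=15 stops after field 15.
    fields = lines.str.split("^", n=15, expand=True)
    return pd.DataFrame({
        "col1": fields[0].astype("int64"),
        "col15": fields[14].astype("int64"),
        "wholeline": lines,  # the original string, never rebuilt from the fields
    })


def make_line(col1: int, col15: int) -> str:
    # Hypothetical 26-field line with filler (including a space and a quote) elsewhere.
    fields = [str(col1)] + ["a b"] * 13 + [str(col15)] + ['x"y'] * 11
    return "^".join(fields) + "\r"


# For the real file, something like open("tablefile.txt", newline="").read() could
# supply the text; newline="" keeps the '\r' terminators untranslated.
df = load_three_columns(make_line(1, 100) + make_line(2, 200))
print(df[["col1", "col15"]])
```

Only the split happens in pandas; the `wholeline` column holds each line exactly as it was read, so nothing has to be reassembled afterwards.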

Recommended answer

Committing to the database for each line was burning time.

I put the conn.commit() outside the for loop. That reduced the load time to a few minutes, though I'm guessing it's less safe.
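Committing once instead of once per row is the key change. A minimal sketch of the same idea with `executemany`, assuming SQLite through Python's built-in `sqlite3` module (the question only shows `conn`, `cur` and `?` placeholders, so the actual database may differ), an in-memory database, and the table and column names from the question; `load_rows` is a hypothetical helper.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows) -> None:
    """Insert (col1, col15, wholeline) tuples and commit once at the end."""
    # `with conn:` commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.executemany(
            "INSERT INTO mytable (mycol1id, mycol15id, wholeline) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol1id INTEGER, mycol15id INTEGER, wholeline TEXT)")
load_rows(conn, [(1, 100, "1^a^b"), (2, 200, "2^c^d")])
print(conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])  # 2
```

Because the connection's context manager rolls back when an exception escapes the block, a failed load does not leave half a batch committed.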

Anyway, thanks for the help.
