pandas data with double quotes
Question
I am trying to read a large dataset in .csv format, which is updated automatically, using the pandas library. The problem is that in my data the first column is a string without double quotation marks, while the other columns are strings with double quotation marks. It is not possible for me to adjust the .csv file manually.
A simplified dataset would look like this:
- A, B, C, D
- comp_a, tree, house, door
- comp_b, truck, red, blue
I need the data to be stored as separate columns, without the quotation marks, like this:
- A B C D
- comp_a tree house door
- comp_b truck red blue
I tried using
import pandas as pd
df_csv = pd.read_csv(path_to_file, delimiter=',')
which gives me the complete header as a single variable for the last column:
- A , B, C, D
- comp_a tree house door
- comp_b truck red blue
The closest I got to the result I need was with the following
df_csv = pd.read_csv(path_to_file, delimiter=',', quoting=3)
which correctly recognizes each column, but adds in a bunch of extra double quotes.
- A B C D
- comp_a tree house door
- comp_b truck red blue
Setting quoting to any value from 0 to 2 just reads each entire row as a single column.
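For reference, the integers passed as quoting correspond to constants in Python's standard csv module, so quoting=3 is the same as csv.QUOTE_NONE. A minimal sketch that prints the mapping (the dictionary name quoting_names is only illustrative):

```python
import csv

# Map each integer accepted by the `quoting` parameter to its csv constant name.
quoting_names = {
    csv.QUOTE_MINIMAL: "QUOTE_MINIMAL",
    csv.QUOTE_ALL: "QUOTE_ALL",
    csv.QUOTE_NONNUMERIC: "QUOTE_NONNUMERIC",
    csv.QUOTE_NONE: "QUOTE_NONE",
}

# Print the values 0 through 3 next to their names.
for value in sorted(quoting_names):
    print(value, quoting_names[value])
```

Using the named constant instead of the bare number tends to make the intent of the read call easier to follow.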
Does anyone know how I can remove all quotation marks when reading the .csv file?
Recommended answer
Just load the data with pd.read_csv() and then use .replace('"','', regex=True).
In one line, that would be:
df = pd.read_csv(filename, sep=',').replace('"','', regex=True)
To set the column names:
df.columns = df.iloc[0]
Then drop row 0:
df = df.drop(index=0).reset_index(drop=True)
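The steps above can be put together in a small end-to-end sketch. The raw text below is hypothetical: the question does not show the file byte for byte, so the layout used here (each line wrapped in quotes, inner quotes doubled) is an assumption. The header=None and quoting=csv.QUOTE_NONE arguments are also assumptions added on top of the answer, so that the header line arrives as row 0 and can be promoted to column names as described.

```python
import csv
import io

import pandas as pd

# Hypothetical raw text; the real file layout is not shown in the question.
raw = (
    '"A,""B"",""C"",""D"""\n'
    '"comp_a,""tree"",""house"",""door"""\n'
    '"comp_b,""truck"",""red"",""blue"""\n'
)

# QUOTE_NONE treats quotes as ordinary characters, so every comma splits.
# header=None keeps the header line as data row 0 so it is cleaned too.
df = pd.read_csv(
    io.StringIO(raw), sep=",", header=None, quoting=csv.QUOTE_NONE
).replace('"', "", regex=True)

# Promote the cleaned first row to column names, then drop it from the data.
df.columns = df.iloc[0]
df.columns.name = None  # discard the leftover row label used as the index name
df = df.drop(index=0).reset_index(drop=True)

print(df)
```

With this sample the result has the columns A, B, C and D and two data rows. For a real file, io.StringIO(raw) would be replaced by the file path, and the quoting assumptions would need to be checked against the actual content.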