如何使用 pandas 清理坐标系统的CSV文件? [英] How to clean CSV file for a coordinate system using pandas?

查看:51
本文介绍了如何使用 pandas 清理坐标系统的CSV文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个程序来将CSV文件转换为DXF(AutoCAD),但是CSV文件有时带有标头,有时没有,并且有些单元格不能为空,例如坐标,并且我还注意到排除后一些输入的值是nan或NaN,有必要删除它们,所以我为您提供答案,请分享您的意见以实施更好的方法.

I wanted to create a program to convert CSV files to DXF(AutoCAD), but the CSV file sometimes comes with a header and sometimes no and there are cells that cannot be empty such as coordinates, and I also noticed that after excluding some of the inputs the value is nan or NaN and it was necessary to get rid of them so I offer you my answer and please share your opinions to implement a better method.

示例输入

输出

推荐答案

解决方案

import string
import pandas


def pandas_clean_csv(csv_file):
    """
    Function pandas_clean_csv Documentation
    - I Got help from this site, it's may help you as well:
    Get the row with the largest number of missing data for more Documentation
    https://moonbooks.org/Articles/How-to-filter-missing-data-NAN-or-NULL-values-in-a-pandas-DataFrame-/
    """
    try:
        if not csv_file.endswith('.csv'):
            raise TypeError("Be sure you select .csv file")
        
        # get punctuations marks as list !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
        punctuations_list = [mark for mark in string.punctuation]
    
        # import csv file and read it by pandas
        data_frame = pandas.read_csv(
            filepath_or_buffer=csv_file,
            header=None,
            skip_blank_lines=True,
            error_bad_lines=True,
            encoding='utf8',
            na_values=punctuations_list
        )
        
        # if elevation column is NaN convert it to 0
        data_frame[3] = data_frame.iloc[:, [3]].fillna(0)
        
        # if Description column is NaN convert it to -
        data_frame[4] = data_frame.iloc[:, [4]].fillna('-')
        
        # select coordinates columns
        coord_columns = data_frame.iloc[:, [1, 2]]
        
        # convert coordinates columns to numeric type
        coord_columns = coord_columns.apply(pandas.to_numeric, errors='coerce', axis=1)
        
        # Find rows with missing data
        index_with_nan = coord_columns.index[coord_columns.isnull().any(axis=1)]
        
        # Remove rows with missing data
        data_frame.drop(index_with_nan, 0, inplace=True)
        
        # iterate data frame as tuple data
        output_clean_csv = data_frame.itertuples(index=False)
        
        return output_clean_csv
    
    except Exception as E:
        print(f"Error: {E}")
        exit(1)


out_data = pandas_clean_csv('csv_files/version2_bad_headers.csl')

for i in out_data:
    print(i[0], i[1], i[2], i[3], i[4])

在这里您可以下载我的测试CSV 文件

这篇关于如何使用 pandas 清理坐标系统的CSV文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆