Read specific columns with pandas or other Python module


Problem description

    I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner).

    Let's say I want 2 columns:

    • 59 which in the header is star_name
    • 60 which in the header is ra.

    However, for some reason the authors of the webpage sometimes decide to move the columns around.

    In the end I want something like this, keeping in mind that values can be missing.

    data = #read data in a clever way
    names = data['star_name']
    ras = data['ra']
    

    This will keep my program from malfunctioning when the columns are moved again in the future, as long as the column names stay the same.

    Until now I have tried various ways using the csv module and, recently, the pandas module, both without any luck.

    EDIT (added two lines + the header of my datafile. Sorry, but it's extremely long.)

    # name, mass, mass_error_min, mass_error_max, radius, radius_error_min, radius_error_max, orbital_period, orbital_period_err_min, orbital_period_err_max, semi_major_axis, semi_major_axis_error_min, semi_major_axis_error_max, eccentricity, eccentricity_error_min, eccentricity_error_max, angular_distance, inclination, inclination_error_min, inclination_error_max, tzero_tr, tzero_tr_error_min, tzero_tr_error_max, tzero_tr_sec, tzero_tr_sec_error_min, tzero_tr_sec_error_max, lambda_angle, lambda_angle_error_min, lambda_angle_error_max, impact_parameter, impact_parameter_error_min, impact_parameter_error_max, tzero_vr, tzero_vr_error_min, tzero_vr_error_max, K, K_error_min, K_error_max, temp_calculated, temp_measured, hot_point_lon, albedo, albedo_error_min, albedo_error_max, log_g, publication_status, discovered, updated, omega, omega_error_min, omega_error_max, tperi, tperi_error_min, tperi_error_max, detection_type, mass_detection_type, radius_detection_type, alternate_names, molecules, star_name, ra, dec, mag_v, mag_i, mag_j, mag_h, mag_k, star_distance, star_metallicity, star_mass, star_radius, star_sp_type, star_age, star_teff, star_detected_disc, star_magnetic_field
    11 Com b,19.4,1.5,1.5,,,,326.03,0.32,0.32,1.29,0.05,0.05,0.231,0.005,0.005,0.011664,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2008,2011-12-23,94.8,1.5,1.5,2452899.6,1.6,1.6,Radial Velocity,,,,,11 Com,185.1791667,17.7927778,4.74,,,,,110.6,-0.35,2.7,19.0,G8 III,,4742.0,,
    11 UMi b,10.5,2.47,2.47,,,,516.22,3.25,3.25,1.54,0.07,0.07,0.08,0.03,0.03,0.012887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2009,2009-08-13,117.63,21.06,21.06,2452861.05,2.06,2.06,Radial Velocity,,,,,11 UMi,229.275,71.8238889,5.02,,,,,119.5,0.04,1.8,24.08,K4III,1.56,4340.0,,
    

    Solution

    An easy way to do this is using the pandas library like this.

    import pandas as pd

    # Columns to load, selected by header name rather than by position
    fields = ['star_name', 'ra']

    df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
    # See the keys
    print(df.keys())
    # See content in 'star_name'
    print(df.star_name)
    

    The key here is skipinitialspace=True, which strips the space that follows each comma in the header, so ' star_name' becomes 'star_name' and usecols can match the columns by name.
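
To try this without downloading the file, here is a minimal sketch that reads a small in-memory sample instead of 'data.csv'. The SAMPLE text is a made-up, trimmed stand-in for the real file (only five columns, with one 'ra' value deliberately left empty); it is an assumption for illustration, not the actual data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: same ", " separated header
# style, a reduced set of columns, and one missing 'ra' value.
SAMPLE = (
    "# name, mass, star_name, ra, dec\n"
    "11 Com b,19.4,11 Com,185.1791667,17.7927778\n"
    "11 UMi b,10.5,11 UMi,,71.8238889\n"
)

fields = ["star_name", "ra"]

# skipinitialspace=True strips the blank after each comma, so the header
# ' star_name' is read as 'star_name' and usecols can match it by name.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True, usecols=fields)

names = df["star_name"]
ras = df["ra"]

print(list(df.columns))
print(ras.isna().tolist())
```

Because the columns are selected by name, reordering them in the file should not affect the result. The empty 'ra' field is loaded as NaN, so missing values can be detected with isna() or dropped with dropna().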
