Read specific columns with pandas or another Python module

Question

I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded from the upper right corner).

Say I want 2 columns:

  • column 59, which is star_name in the header
  • column 60, which is ra in the header.

However, for some reason the authors of the webpage sometimes decide to move the columns around.

In the end I want something like this, keeping in mind that values can be missing.

data = #read data in a clever way
names = data['star_name']
ras = data['ra']

This will keep my program from malfunctioning when the columns are moved again in the future, as long as the column names stay the same.

So far I have tried various approaches using the csv module and, more recently, the pandas module. Both without any luck.

EDIT (added two lines + the header of my data file. Sorry, but it's extremely long.)

# name,mass,mass_error_min,mass_error_max,radius,radius_error_min,radius_error_max,orbital_period,orbital_period_err_min,orbital_period_err_max,semi_major_axis,semi_major_axis_error_min,semi_major_axis_error_max,eccentricity,eccentricity_error_min,eccentricity_error_max,angular_distance,inclination,inclination_error_min,inclination_error_max,tzero_tr,tzero_tr_error_min,tzero_tr_error_max,tzero_tr_sec,tzero_tr_sec_error_min,tzero_tr_sec_error_max,lambda_angle,lambda_angle_error_min,lambda_angle_error_max,impact_parameter,impact_parameter_error_min,impact_parameter_error_max,tzero_vr,tzero_vr_error_min,tzero_vr_error_max,K,K_error_min,K_error_max,temp_calculated,temp_measured,hot_point_lon,albedo,albedo_error_min,albedo_error_max,log_g,publication_status,discovered, updated, omega, omega_error_min, omega_error_max, tperi, tperi_error_min, tperi_error_max, detection_type, mass_detection_type,radius_detection_type,alternate_names,molecules,star_name,ra,dec,mag_v,mag_i,mag_j,mag_h,mag_k,star_distance,star_metality,star_mass,star_radius,star_sp_type,star_age,star_teff,star_detected_disc,star_magnetic_field
11 Com b,19.4,1.5,1.5,,,,,326.03,0.32,0.32,1.29,0.05,0.05,0.231,0.005,0.005,0.011664,,,,,,,,,,,,,,,,,,,,,,,,,,1,2008,2011-12-23,94.8,1.5,1.5,2452899.6,1.6,1.6,Radial Velocity,,,,,11 Com,185.1791667,17.792.747,17.792.747,,,110.6,-0.35,2.7,19.0,G8 III,,,4742.0,,
11 UMi b,10.5,2.47,2.47,,,,,516.22,3.25,3.25,1.54,0.07,0.07,0.08,0.03,0.03,0.012887,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2009,2009-08-13,117.63,21.06,21.06,2452861.05,2.06,2.06,Radial Velocity,,,,,11 UMi,229.275,3,809.,119.5,0.04,1.8,24.08,K4III,1.56,4340.0,,

Recommended answer

An easy way to do this is to use the pandas library, like this.

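import pandas as pd

# Columns to load, selected by header name rather than by position
fields = ['star_name', 'ra']

# skipinitialspace=True strips the space that follows each comma
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print(df.keys())
# See content in 'star_name'
print(df.star_name)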

The missing piece here was skipinitialspace, which strips the spaces that follow each delimiter, including those in the header. So ' star_name' becomes 'star_name'.
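To see the effect without downloading the file, here is a minimal sketch that runs against a small in-memory sample. The SAMPLE text, its values, and the helper names read_with_pandas and read_with_csv are made up for illustration and are not part of the original data or answer. It assumes a header with a space after each comma and one missing ra value, and it also shows what should be the equivalent approach with the standard csv module.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data: spaces after the commas, one missing 'ra' value.
SAMPLE = (
    "# name, star_name, ra, dec\n"
    "11 Com b, 11 Com, 185.1791667, 17.79\n"
    "11 UMi b, 11 UMi,, 71.82\n"
)


def read_with_pandas(text, fields):
    """Select columns by name; skipinitialspace strips the space after each comma."""
    return pd.read_csv(io.StringIO(text), skipinitialspace=True, usecols=fields)


def read_with_csv(text, fields):
    """Standard-library variant: DictReader keys each row by header name."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{name: row[name] for name in fields} for row in reader]


df = read_with_pandas(SAMPLE, ["star_name", "ra"])
names = df["star_name"]
ras = df["ra"]  # the missing value is read as NaN

rows = read_with_csv(SAMPLE, ["star_name", "ra"])  # the missing value is ''
```

Because both variants look columns up by header name, moving the columns around in the file should not affect the result, as long as the names themselves stay the same.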