Read specific columns with pandas or another Python module
Question
I have a CSV file from this webpage. I want to read some of the columns in the downloaded file (the CSV version can be downloaded in the upper right corner).
Let's say I want 2 columns:
- column 59, which in the header is star_name
- column 60, which in the header is ra
However, for some reason the authors of the webpage sometimes decide to move the columns around.
In the end I want something like this, keeping in mind that values can be missing.
data = ...  # read data in a clever way
names = data['star_name']
ras = data['ra']
This will keep my program from breaking when the columns are moved again in the future, as long as the column names stay the same.
Until now I have tried various ways using the csv module and, recently, the pandas module. Both without any luck.
EDIT (added two lines + the header of my data file. Sorry, but it's extremely long.)
# name,mass,mass_error_min,mass_error_max,radius,radius_error_min,radius_error_max,orbital_period,orbital_period_err_min,orbital_period_err_max,semi_major_axis,semi_major_axis_error_min,semi_major_axis_error_max,eccentricity,eccentricity_error_min,eccentricity_error_max,angular_distance,inclination,inclination_error_min,inclination_error_max,tzero_tr,tzero_tr_error_min,tzero_tr_error_max,tzero_tr_sec,tzero_tr_sec_error_min,tzero_tr_sec_error_max,lambda_angle,lambda_angle_error_min,lambda_angle_error_max,impact_parameter,impact_parameter_error_min,impact_parameter_error_max,tzero_vr,tzero_vr_error_min,tzero_vr_error_max,K,K_error_min,K_error_max,temp_calculated,temp_measured,hot_point_lon,albedo,albedo_error_min,albedo_error_max,log_g,publication_status,discovered, updated, omega, omega_error_min, omega_error_max, tperi, tperi_error_min, tperi_error_max, detection_type, mass_detection_type,radius_detection_type,alternate_names,molecules,star_name,ra,dec,mag_v,mag_i,mag_j,mag_h,mag_k,star_distance,star_metality,star_mass,star_radius,star_sp_type,star_age,star_teff,star_detected_disc,star_magnetic_field
11 Com b,19.4,1.5,1.5,,,,,326.03,0.32,0.32,1.29,0.05,0.05,0.231,0.005,0.005,0.011664,,,,,,,,,,,,,,,,,,,,,,,,,,1,2008,2011-12-23,94.8,1.5,1.5,2452899.6,1.6,1.6,Radial Velocity,,,,,11 Com,185.1791667,17.792.747,17.792.747,,,110.6,-0.35,2.7,19.0,G8 III,,,4742.0,,
11 UMi b,10.5,2.47,2.47,,,,,516.22,3.25,3.25,1.54,0.07,0.07,0.08,0.03,0.03,0.012887,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2009,2009-08-13,117.63,21.06,21.06,2452861.05,2.06,2.06,Radial Velocity,,,,,11 UMi,229.275,3,809.,119.5,0.04,1.8,24.08,K4III,1.56,4340.0,,
Recommended answer
An easy way to do this is to use the pandas library, like this.
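import pandas as pd

# Select columns by header name, so their position in the file does not matter
fields = ['star_name', 'ra']
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)

# See the keys (column names)
print(df.keys())

# See the content of 'star_name'
print(df.star_name)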
The missing piece here was skipinitialspace=True, which skips the spaces that follow each delimiter, including those in the header. So ' star_name' becomes 'star_name', and usecols can match it by name.
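To make the effect of skipinitialspace visible, here is a minimal sketch that reads a small in-memory sample instead of the real file. The sample is an assumption made for illustration: it keeps only four of the columns from the header above, puts a space after every comma, and deliberately leaves the second ra value blank to show how a missing value comes through.

```python
import io

import pandas as pd

# Hypothetical miniature of the catalogue (not the real file): a space follows
# every comma, and the 'ra' value of the second row is deliberately left out.
raw = (
    "# name, mass, star_name, ra\n"
    "11 Com b, 19.4, 11 Com, 185.1791667\n"
    "11 UMi b, 10.5, 11 UMi,\n"
)

# Without skipinitialspace, the header names keep their leading space,
# so asking for 'star_name' by name would not match anything.
plain = pd.read_csv(io.StringIO(raw))
plain_columns = list(plain.columns)

# With skipinitialspace=True the names are clean, and usecols selects the
# columns by name wherever they happen to sit in the file.
fields = ["star_name", "ra"]
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True, usecols=fields)

names = df["star_name"]
ras = df["ra"]  # the blank entry is read as NaN

print(plain_columns)
print(df)
```

Because the columns are selected by name, the same call keeps working if the file's authors reorder them. Missing entries show up as NaN, which can be detected with pd.isna or dropped with dropna.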