statsmodels.api.Logit: ValueError: array must not contain infs or NaNs
Question
I am trying to apply logistic regression in Python using statsmodels.api.Logit. I am running into the error ValueError: array must not contain infs or NaNs.
It happens when I run the following:
import statsmodels.api as sm
data['intercept'] = 1.0
train_cols = data.columns[1:]
logit = sm.Logit(data['admit'], data[train_cols])
result = logit.fit(start_params=None, method='bfgs', maxiter=20, full_output=1, disp=1, callback=None)
The data contains more than 15,000 columns and 2,000 rows. data['admit'] is the target value and data[train_cols] is the list of features. Can anyone please give me some hints to fix this problem?
Recommended answer
By default, Logit does not check your data for unprocessable infinities (np.inf) or NaNs (np.nan). In pandas, the latter normally signifies a missing entry.
To ignore rows with missing data and proceed with the rest, use missing='drop' like so:
sm.Logit(data['admit'], data[train_cols], missing='drop')
See the Logit documentation for other options.
If you do not expect your data to contain any missing entries or infinities, perhaps you loaded it incorrectly. Look at data[data.isnull().any(axis=1)] to see the rows where the problem is. (N.B. isnull() does not flag infs by default; replace them with NaN first if they should register as null.)
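One way to locate the offending entries is sketched below. The toy frame and its column names are hypothetical. Converting infs to NaN with replace is just one option for making them register as null, not necessarily the approach the answer originally linked to.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values and column names are made up for illustration.
data = pd.DataFrame({
    "admit": [1.0, 0.0, 1.0, 0.0],
    "gre": [380.0, np.nan, 800.0, 640.0],
    "gpa": [3.61, 3.67, np.inf, 3.19],
})

# isnull() flags NaN but not inf, so convert infs to NaN first.
cleaned = data.replace([np.inf, -np.inf], np.nan)

bad_mask = cleaned.isnull()

# Columns that contain at least one NaN or inf.
bad_cols = bad_mask.any(axis=0)
bad_columns = list(bad_cols[bad_cols].index)

# Rows that contain at least one NaN or inf.
bad_rows = cleaned[bad_mask.any(axis=1)]

print(bad_columns)
print(bad_rows)
```

With thousands of columns, the per-column summary is usually the more readable place to start, since a single badly parsed column can account for every bad row.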