Why are the scores of my random forest model so different every time? - SofaSofa

I am fitting a random forest regression model:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

data = pd.read_excel("data.xlsx", sheet_name="Sheet1")
data = data.dropna()
X = data.iloc[:, 3:7]   # the 4 feature columns
y = data.iloc[:, 10]    # the target column
# Hold-out split; it is not used by the cross-validation below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# max_features=1.0 matches the old 'auto' setting for regressors,
# which recent scikit-learn versions no longer accept
rfr = RandomForestRegressor(n_estimators=20, max_features=1.0, random_state=0)
# rfr.fit(X_train, y_train)
kf = KFold(n_splits=5, shuffle=True)
score_ndarray = cross_val_score(rfr, X, y, cv=kf)
print(score_ndarray)

There are a little over 400 samples.

These are the scores of the 5 folds:

[ 0.220815 0.6115134 -0.72378707 0.51356938 0.36471651]

constant007 2019-09-19 15:26

2 answers

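Your R² is different every time, probably because your model is not robust, which is related to overfitting. You can increase n_estimators, and you should also set a maximum depth. Since you only have 4 features, max_depth=4 should be enough.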
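A minimal sketch of that suggestion, not run on the asker's data: `make_regression` generates a synthetic stand-in with 400 rows and 4 features (an assumption), and the values `n_estimators=200` and `max_depth=4` are only illustrative. Both models are scored on the same folds so that only the hyperparameters differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the asker's spreadsheet (an assumption): 400 rows, 4 features.
X, y = make_regression(n_samples=400, n_features=4, noise=20.0, random_state=0)

# Fixed folds, shared by both models, so the comparison is like for like.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

settings = {
    "question (20 trees, no depth limit)": dict(n_estimators=20),
    "suggested (200 trees, max_depth=4)": dict(n_estimators=200, max_depth=4),
}

results = {}
for label, params in settings.items():
    rfr = RandomForestRegressor(random_state=0, **params)
    scores = cross_val_score(rfr, X, y, cv=kf)  # R^2 per fold (regressor default)
    results[label] = scores
    print(f"{label}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Comparing the standard deviation across folds, rather than a single fold's score, is what indicates whether the model has become more stable. Whether the depth limit helps depends on the data.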

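Also, for a regressor the default scoring of cross_val_score is R². You can try other metrics, for example mean squared error (negate the score and take the square root to get RMSE):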

score_ndarray = cross_val_score(rfr, X, y, cv=kf, scoring='neg_mean_squared_error')

or MAE:

score_ndarray = cross_val_score(rfr, X, y, cv=kf, scoring='neg_mean_absolute_error')
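Both scorers return negated errors, so that larger is always better. The sketch below shows one way to turn them back into RMSE and MAE. The data is a synthetic stand-in from `make_regression` (an assumption), so the numbers it prints say nothing about the asker's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption): 400 rows, 4 features.
X, y = make_regression(n_samples=400, n_features=4, noise=20.0, random_state=0)

rfr = RandomForestRegressor(n_estimators=100, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# The scorer returns -MSE per fold; flip the sign, then take the square root.
neg_mse = cross_val_score(rfr, X, y, cv=kf, scoring="neg_mean_squared_error")
rmse = np.sqrt(-neg_mse)

# The scorer returns -MAE per fold; flip the sign.
neg_mae = cross_val_score(rfr, X, y, cv=kf, scoring="neg_mean_absolute_error")
mae = -neg_mae

print("RMSE per fold:", np.round(rmse, 2))
print("MAE per fold: ", np.round(mae, 2))
```

Unlike R², these errors are in the units of the target, and a fold whose target values happen to vary little cannot push them to a large negative value.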

SofaSofa Data Science Community DS Interview Question Bank DS Interview Experiences

得得得 2019-09-19 22:38

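A random forest is inherently random, so it is normal for the results to differ from fold to fold. If you increase the number of trees, the results of the folds should get closer to each other. If your dataset is not large enough, I suggest using fewer folds, for example 3.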
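One detail worth separating out: in the question's code the forest already has `random_state=0`, while `KFold(shuffle=True)` has no `random_state`, so the fold assignment itself changes on every run. The sketch below fixes the fold seed to make runs repeatable and also tries 3 folds. It uses synthetic stand-in data from `make_regression`, and the helper `cv_scores` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption): 400 rows, 4 features.
X, y = make_regression(n_samples=400, n_features=4, noise=20.0, random_state=0)

rfr = RandomForestRegressor(n_estimators=100, random_state=0)


def cv_scores(n_splits, seed):
    """Hypothetical helper: R^2 per fold for a given fold count and shuffle seed."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(rfr, X, y, cv=kf)


run_a = cv_scores(5, seed=0)
run_b = cv_scores(5, seed=0)   # same seed: same folds, same scores
run_c = cv_scores(5, seed=1)   # different seed: different folds
three_fold = cv_scores(3, seed=0)

print("5 folds, seed 0:      ", np.round(run_a, 3))
print("5 folds, seed 0 again:", np.round(run_b, 3))
print("5 folds, seed 1:      ", np.round(run_c, 3))
print("3 folds, seed 0:      ", np.round(three_fold, 3))
```

With both seeds fixed, repeated runs return the same scores, so any remaining spread is the difference between folds rather than between runs.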


WinJ 2019-09-30 11:08
