公共自行车使用量预测
标杆:公共自行车使用量预测
简单线性回归模型(Python)
该模型预测结果的RMSE为:39.132
# -*- coding: utf-8 -*-
# 引入模块
from sklearn.linear_model import LinearRegression
import pandas as pd
# 读取数据
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
submit = pd.read_csv("sample_submit.csv")
# 删除id
train.drop('id', axis=1, inplace=True)
test.drop('id', axis=1, inplace=True)
# 取出训练集的y
y_train = train.pop('y')
# 建立线性回归模型
reg = LinearRegression()
reg.fit(train, y_train)
y_pred = reg.predict(test)
# 若预测值是负数,则取0
y_pred = map(lambda x: x if x >= 0 else 0, y_pred)
# 输出预测结果至my_LR_prediction.csv
submit['y'] = y_pred
submit.to_csv('my_LR_prediction.csv', index=False)
决策树回归模型(Python)
该模型预测结果的RMSE为:28.818
# -*- coding: utf-8 -*-
# 引入模块
from sklearn.tree import DecisionTreeRegressor
import pandas as pd
# 读取数据
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
submit = pd.read_csv("sample_submit.csv")
# 删除id
train.drop('id', axis=1, inplace=True)
test.drop('id', axis=1, inplace=True)
# 取出训练集的y
y_train = train.pop('y')
# 建立最大深度为5的决策树回归模型
reg = DecisionTreeRegressor(max_depth=5)
reg.fit(train, y_train)
y_pred = reg.predict(test)
# 输出预测结果至my_DT_prediction.csv
submit['y'] = y_pred
submit.to_csv('my_DT_prediction.csv', index=False)
xgboost回归模型(Python)
该模型预测结果的RMSE为:18.947
# -*- coding: utf-8 -*-
# 引入模块
from xgboost import XGBRegressor
import pandas as pd
# 读取数据
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
submit = pd.read_csv("sample_submit.csv")
# 删除id
train.drop('id', axis=1, inplace=True)
test.drop('id', axis=1, inplace=True)
# 取出训练集的y
y_train = train.pop('y')
# 建立一个默认的xgboost回归模型
reg = XGBRegressor()
reg.fit(train, y_train)
y_pred = reg.predict(test)
# 输出预测结果至my_XGB_prediction.csv
submit['y'] = y_pred
submit.to_csv('my_XGB_prediction.csv', index=False)