Here is the problem, together with the answer it gives.
My Python code for it is as follows:
import numpy as np
from sklearn import naive_bayes

X1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
X2 = np.array([1,2,2,1,1,1,2,2,3,3,3,2,2,3,3])
X_train = list(zip(X1,X2))
y_train = [-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1]
X_test = [[2,1]]
MNBclf = naive_bayes.MultinomialNB()
MNBclf.fit(X_train,y_train)
print(MNBclf.predict(X_test))
print(MNBclf.predict_proba(X_test))
Here is the output:
[1]
[[ 0.40579081 0.59420919]]
Why is this different from the answer in the book?
1 Answer
MultinomialNB models count-like features, so the categorical inputs should be one-hot encoded first.
import numpy as np
import sklearn.naive_bayes as nb
from sklearn import preprocessing

X1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
X2 = np.array([1,2,2,1,1,1,2,2,3,3,3,2,2,3,3])
X_train = list(zip(X1,X2))

# one-hot encode the two categorical features
enc = preprocessing.OneHotEncoder()
enc.fit(X_train)
X_train = enc.transform(X_train).toarray()
print(X_train)

y_train = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])
X_test = [[2,1]]
X_test = enc.transform(X_test).toarray()
print(X_test)

# a tiny alpha effectively disables Laplace smoothing, so the estimates match
# the plain maximum-likelihood frequencies used in the hand calculation
MNBclf = nb.MultinomialNB(alpha=1E-10)
# MNBclf = nb.GaussianNB()
MNBclf.fit(X_train,y_train)
print(MNBclf.predict(X_test))
print(MNBclf.predict_proba(X_test))
The output is:
[-1]
[[0.75 0.25]]
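This is presumably the same hand calculation as in the book: with unsmoothed maximum-likelihood estimates, P(Y=-1)P(X1=2|Y=-1)P(X2=1|Y=-1) = 6/15 · 2/6 · 3/6 = 1/15 and P(Y=1)P(X1=2|Y=1)P(X2=1|Y=1) = 9/15 · 3/9 · 1/9 = 1/45, which normalize to 0.75 and 0.25. A minimal sketch of that calculation on this data (variable names are just for illustration):

import numpy as np

X1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
X2 = np.array([1,2,2,1,1,1,2,2,3,3,3,2,2,3,3])
y  = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])
x1_test, x2_test = 2, 1

scores = {}
for c in (-1, 1):
    mask = (y == c)
    prior = mask.mean()                      # P(Y=c)
    p1 = (X1[mask] == x1_test).mean()        # P(X1=2 | Y=c), unsmoothed frequency
    p2 = (X2[mask] == x2_test).mean()        # P(X2=1 | Y=c), unsmoothed frequency
    scores[c] = prior * p1 * p2              # 1/15 for c=-1, 1/45 for c=+1

total = sum(scores.values())
print({c: s / total for c, s in scores.items()})  # ≈ {-1: 0.75, 1: 0.25}, so predict -1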
If you don't want to switch to one-hot encoding, you can use nb.GaussianNB() on the raw features instead; see the sketch below.
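A minimal sketch of that alternative, assuming the same raw integer features (GaussianNB fits a per-class Gaussian to each feature, so it also predicts -1 here, though its posterior probabilities are only close to, not exactly, 0.75/0.25):

import numpy as np
import sklearn.naive_bayes as nb

X1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
X2 = np.array([1,2,2,1,1,1,2,2,3,3,3,2,2,3,3])
X_train = np.column_stack([X1, X2])    # raw categorical values, no one-hot encoding
y_train = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])

GNBclf = nb.GaussianNB()               # models each feature as a per-class Gaussian
GNBclf.fit(X_train, y_train)
print(GNBclf.predict([[2, 1]]))        # [-1]
print(GNBclf.predict_proba([[2, 1]]))  # close to, but not exactly, [0.75 0.25]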