How to use sklearn's OneHotEncoder - SofaSofa

How exactly is sklearn's OneHotEncoder used? A small example would be ideal: how do I one-hot encode a pandas column that contains strings?

Thanks a lot!

offer雨 2018-09-11 11:12

2 answers

You first need to encode the text as integers, and then one-hot encode those integers.

I wrote a function for this:

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def LabelOneHotEncoder(data, categorical_features):
    """One-hot encode the listed DataFrame columns; pass the others through."""
    blocks = []
    for f in data.columns:
        if f in categorical_features:
            le, ohe = LabelEncoder(), OneHotEncoder()
            # Map strings to integer codes first, then one-hot encode the codes.
            codes = le.fit_transform(data[f]).reshape(-1, 1)
            # fit_transform returns a sparse matrix; convert it to a dense array.
            blocks.append(ohe.fit_transform(codes).toarray())
        else:
            blocks.append(data[[f]].to_numpy())
    # Stack the per-column blocks side by side, preserving the column order.
    return np.hstack(blocks)

For example, suppose you have a df in which the columns f2 and f3 contain strings and need to be one-hot encoded.

The function above is then used like this:

>>> LabelOneHotEncoder(df, ['f2', 'f3'])
array([[ 0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
       [ 2.,  0.,  0.,  1.,  0.,  1.,  0.,  0.],
       [ 3.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 4.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 5.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 6.,  0.,  0.,  1.,  0.,  0.,  0.,  1.],
       [ 7.,  0.,  0.,  0.,  1.,  0.,  0.,  1.]])
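
With scikit-learn 0.20 or later, OneHotEncoder accepts string columns directly, so the LabelEncoder step can be skipped. The following is only a sketch under assumptions: the thread does not show the original df, so the frame below is hypothetical, with the column names f1, f2, f3 taken from the answer and the values ('a' to 'd', 'x' to 'z') made up so that the result has the same layout as the output above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: the original df is not shown in the thread.
df = pd.DataFrame({
    'f1': range(8),
    'f2': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'f3': ['x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
})

ohe = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
onehot = ohe.fit_transform(df[['f2', 'f3']]).toarray()

# Put the untouched numeric column back in front of the one-hot columns.
result = np.hstack([df[['f1']].to_numpy(), onehot])

# categories_ holds the categories learned for each encoded column, in column order.
print(ohe.categories_)
print(result)
```

Inspecting ohe.categories_ is a convenient way to check which output column corresponds to which original value.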


abuu 2018-10-13 00:53

Not sure whether this counts as off-topic, but the get_dummies function in pandas is a more convenient way to do one-hot encoding.

import pandas as pd

df = pd.DataFrame([['green', 10.1],
                   ['red',   13.5],
                   ['blue',  15.3]], columns=['color', 'price'])

get_dummies converts only the string columns; numeric columns are left unchanged.

pd.get_dummies(df)

pd.get_dummies(df, dtype=float).to_numpy()  # as_matrix() was removed in pandas 1.0
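
To make the result of get_dummies concrete, here is a small self-contained sketch on the same toy frame. The dtype=float argument is my addition (not part of the original answer), used so that the indicator columns are 0.0/1.0 regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame([['green', 10.1],
                   ['red',   13.5],
                   ['blue',  15.3]], columns=['color', 'price'])

# dtype=float gives 0.0/1.0 indicators (the default dtype differs across pandas versions).
dummies = pd.get_dummies(df, dtype=float)

# Numeric columns come first, followed by one indicator column per category.
print(list(dummies.columns))  # ['price', 'color_blue', 'color_green', 'color_red']
print(dummies.to_numpy())
```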

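If you want to use sklearn's OneHotEncoder instead, combine it with LabelEncoder (this is needed in scikit-learn versions before 0.20, where OneHotEncoder accepted only integer input).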

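import numpy as np
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

le = LabelEncoder()
ohe = OneHotEncoder()  # the categorical_features argument was removed in scikit-learn 0.22

# Encode the first column (color): strings -> integer codes -> one-hot
codes = le.fit_transform(df['color']).reshape(-1, 1)
onehot = ohe.fit_transform(codes).toarray()

# One-hot columns first, then the remaining numeric column
np.hstack([onehot, df[['price']].to_numpy()])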
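
In scikit-learn 0.20 and later, column selection can be delegated to ColumnTransformer, and OneHotEncoder can take the string column as-is. This is a minimal sketch on the same toy frame, not the answer's original method; the transformer name 'onehot' is arbitrary, and sparse_threshold=0 is set only to force a dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame([['green', 10.1],
                   ['red',   13.5],
                   ['blue',  15.3]], columns=['color', 'price'])

ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), ['color'])],
    remainder='passthrough',  # keep 'price' unchanged, appended after the one-hot columns
    sparse_threshold=0,       # always return a dense array
)

# Columns of the result: blue, green, red indicators, then price.
encoded = ct.fit_transform(df)
print(encoded)
```

Because the fitted ct object remembers the categories, calling ct.transform on new rows later produces columns in the same order.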


xfyx 2018-10-19 17:29
