
Customizing the default English stop_words in CountVectorizer

Tags: Statistics/Machine Learning, Natural Language Processing, Data Preprocessing, Python. Views: 5175


When processing English text with sklearn.feature_extraction.text.CountVectorizer, how can I customize the English stop_words?

Thanks.

newcomer 2018-09-22 15:25

1 answer

Just set the stop_words parameter directly:

from sklearn.feature_extraction import text
# A custom list replaces the built-in English stop words entirely
my_stopwords = ['a', 'ab', 'abc']
vectorizer = text.CountVectorizer(stop_words=my_stopwords)
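To see what the custom list actually does, here is a minimal sketch that fits the vectorizer on a tiny corpus. The two sentences in `docs` are made up for illustration and are not from the original answer.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sample corpus, for illustration only
docs = ["a quick abc test", "ab is not a word"]
my_stopwords = ['a', 'ab', 'abc']

vectorizer = CountVectorizer(stop_words=my_stopwords)
vectorizer.fit(docs)

# vocabulary_ maps each kept token to its column index
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
```

Note that common words such as "is" and "not" stay in the vocabulary here, because a custom list is used instead of the built-in one, not in addition to it. Also, with the default token_pattern, single-character tokens such as "a" are dropped anyway.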

You can also add your own words to the built-in default stop words:

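# union() returns a frozenset; convert it to a list, since newer
# scikit-learn versions expect a list rather than a frozenset here
my_stopwords = list(text.ENGLISH_STOP_WORDS.union(['a', 'ab', 'abc']))
vectorizer = text.CountVectorizer(stop_words=my_stopwords)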
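As a rough check that the extended list removes both the built-in words and the added ones, here is a small sketch. The extra words 'foo' and 'abc' and the sample sentences are hypothetical. The set is converted with list() on the assumption that recent scikit-learn releases only accept 'english', a list, or None for stop_words.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical extra stop words, for illustration only
extra_words = ['abc', 'foo']
my_stopwords = list(ENGLISH_STOP_WORDS.union(extra_words))

# Made-up sample corpus
docs = ["this is the foo example", "the abc example with data"]

vectorizer = CountVectorizer(stop_words=my_stopwords)
counts = vectorizer.fit_transform(docs)

# Only tokens outside the combined stop word list remain
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(counts.toarray())
```

Built-in words like "this", "is", "the" and "with" are filtered together with the added "foo" and "abc", so only "data" and "example" end up as features.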


matt 2018-09-26 12:06

Thanks! - newcomer 2018-10-02 11:29
