If you import the NLTK stop words using from nltk.corpus import stopwords and print them using stopwords.words('english'), you get the list of all the English stop words in the NLTK corpus. I tried that, and the following array is what I got. Hope this helps.

nltk.tokenize.api module. Tokenizer Interface. class nltk.tokenize.api.StringTokenizer. Bases: nltk.tokenize.api.TokenizerI. A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).

Dec 20, 2017 · How to remove stop words from unstructured text data for machine learning in Python.

Stopword list from the nltk/corpora.

rake-nltk. RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.
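To make the RAKE description above more concrete, here is a minimal pure-Python sketch of the scoring idea. It is not the rake-nltk implementation. Candidate phrases are the runs of words left after splitting the text at stop words and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores. The names STOP_WORDS, candidate_phrases and rake_scores are hypothetical, and the tiny stop word set is an assumption standing in for the NLTK list.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word set (an assumption for this sketch). With NLTK
# installed you could pass set(stopwords.words('english')) instead.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "with", "by", "its", "on"}


def candidate_phrases(text, stop_words=STOP_WORDS):
    """Split text into candidate phrases at stop words and punctuation."""
    phrases = []
    # Punctuation ends a phrase, so cut the text into fragments first.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stop_words:
                # A stop word closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stop_words=STOP_WORDS):
    """Return (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text, stop_words)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {p: sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(((s, " ".join(p)) for p, s in scored.items()), reverse=True)


if __name__ == "__main__":
    sample = "Keyword extraction is the task of finding key phrases in a body of text."
    for score, phrase in rake_scores(sample):
        print(f"{score:.1f}  {phrase}")
```

To use the NLTK list instead of the toy set, you would typically run nltk.download('stopwords') once and then pass set(stopwords.words('english')) as the stop_words argument. Longer phrases tend to score higher under this scheme, because every word in a phrase adds the phrase length to its degree.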