= Bag of Words Model =

A '''bag of words model''' represents each document by the counts of the words it contains, ignoring word order.

----

== Data Structure ==

Rows are ''documents'' and columns are ''words'' or ''phrases''. Because any single document uses only a small fraction of the full vocabulary, the result is inevitably a sparse matrix.

----

== Implementation ==

First, the words and phrases across all documents must be tokenized.

Next, the count of each key word/phrase is extracted from each document.

Finally, the output matrix must be interpreted. Semantic meaning is lost, so care must be taken to ensure that the source documents share a common context.

----

CategoryRicottone
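
The tokenize-then-count steps described under Implementation can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `tokenize` and `bag_of_words`, the regular expression used to split words, and the two sample documents are all assumptions made for the example, not part of any particular library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    # Assumption: a token is a run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) with one row per document and one column per word."""
    # Count the tokens of each document separately.
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The columns are the sorted union of all words seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing words, which fills in the empty cells.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Made-up sample documents, for illustration only.
documents = ["the cat sat", "the dog sat on the mat"]
vocabulary, matrix = bag_of_words(documents)
```

Even with two short documents, several cells of `matrix` are zero, which is the sparsity noted under Data Structure. A dense list of lists is used here for readability; a real corpus would likely call for a sparse representation.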