Add n-gram option for embedding training #81
Conversation
- Delete debug pdb and add complete comments for new function
- Add `ngram` argument to control this n-gram feature, set default to off
- Add phrases to enable ngrams for random walk sequences
Thanks for the pull request! I have a question here: considering that random walks are a first-order Markov process, what benefit do we expect to get from using n-grams? Are there any tasks or experimental results showing that n-gram features are beneficial?
Hi GTmac, good question! More specifically, when representing items with a weighted directed graph (as the paper illustrates in Figure 2), I'd like to experiment with this, so I implemented it and created this pull request to share.
Thanks for reading and replying, and please advise :)
Feature implemented:
N-gram Phrases and Phraser from gensim, tested successfully for both in-memory (data_size < max_memory) and out-of-core (data_size > max_memory) computation on a local machine. Controlled via args.ngram, with the default set to 1 (unigram, i.e. the original behavior).
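To illustrate the idea behind applying gensim's Phrases/Phraser to random-walk sequences, here is a minimal pure-Python sketch (not the gensim API itself): adjacent node pairs that occur frequently across walks are merged into single underscore-joined tokens before embedding training. The `min_count` threshold and the helper name are assumptions for illustration; gensim's actual scoring is more involved.

```python
from collections import Counter

def merge_frequent_bigrams(walks, min_count=2):
    """Merge node pairs occurring at least `min_count` times into single
    underscore-joined tokens -- a simplified sketch of what gensim's
    Phrases/Phraser does to random-walk sequences."""
    # Count every adjacent pair of nodes across all walks.
    bigram_counts = Counter()
    for walk in walks:
        for a, b in zip(walk, walk[1:]):
            bigram_counts[(a, b)] += 1
    frequent = {pair for pair, c in bigram_counts.items() if c >= min_count}

    merged_walks = []
    for walk in walks:
        merged, i = [], 0
        while i < len(walk):
            # Greedily join a pair when it is frequent enough.
            if i + 1 < len(walk) and (walk[i], walk[i + 1]) in frequent:
                merged.append(walk[i] + "_" + walk[i + 1])
                i += 2
            else:
                merged.append(walk[i])
                i += 1
        merged_walks.append(merged)
    return merged_walks

walks = [["1", "2", "3"], ["1", "2", "4"], ["5", "1", "2"]]
print(merge_frequent_bigrams(walks))
# → [['1_2', '3'], ['1_2', '4'], ['5', '1_2']]
```

With args.ngram left at 1, this step would simply be skipped and the original unigram walks would be fed to the embedding trainer unchanged.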
Tests have been carried out; please find the logs below:
python setup.py test (after removing 'from deepwalk import deepwalk' in test_deepwalk.py)
tox test: