This is part 2 of a series of writeups on my progress with my NaNoGenMo project, Markov's Fanfiction.
Once the stories have been collected, the next step is to use a Markov chain library to create a collection of words vaguely resembling a novel. My Markov library of choice for this project is markovify. This is because markovify will do its best to split the text into sentences, meaning I don't have to do that work myself.
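For context, building the model is roughly a one-liner; the corpus filename and the reading step here are placeholders for however you stored the scraped stories, not necessarily what my script does:

import markovify

# Hypothetical path to the combined fanfiction corpus from part 1
with open("corpus.txt", encoding="utf-8") as f:
    corpus = f.read()

# markovify.Text handles splitting the corpus into sentences for us
text = markovify.Text(corpus)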
Once markovify has the text, the chain needs to be configured. There's a balance to strike: the generator shouldn't spend hours rejecting candidate sentences, but it also shouldn't spit out big chunks of the original stories verbatim. For this, I decided to set it up like so:
next_sentence = text.make_sentence(tries=1000,
                                   max_overlap_ratio=0.8,
                                   max_overlap_total=(2**64))
Specifically, the chain will try up to 1000 times to produce a sentence that overlaps the original text by at most 80%. Setting max_overlap_total to 2^64 was a simple way to take the total overlap word count out of the equation, since markovify uses the smaller of the two limits when deciding whether to reject a candidate sentence.
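As a rough illustration of why that works (this is my reading of markovify's behaviour, not its exact internals): for any candidate sentence, the effective overlap cap is the smaller of the two settings, so the enormous max_overlap_total never gets a say.

# Assumed illustration, not markovify's actual source code:
# for a 20-word candidate sentence, the ratio-based limit is the one that applies
sentence_length = 20
effective_cap = min(2**64, int(round(0.8 * sentence_length)))  # -> 16 words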
Combine that with a loop that keeps collecting sentences until it hits the 50,000 word limit, and you have yourself a NaNoGenMo project.
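A minimal sketch of that loop, assuming the model built earlier; the exact word counting and output handling are my own simplification:

novel = []
word_count = 0

# Keep generating sentences until we pass NaNoGenMo's 50,000-word threshold
while word_count < 50000:
    next_sentence = text.make_sentence(tries=1000,
                                       max_overlap_ratio=0.8,
                                       max_overlap_total=(2**64))
    if next_sentence is None:
        continue  # make_sentence can fail after all its tries; just try again
    novel.append(next_sentence)
    word_count += len(next_sentence.split())

print(" ".join(novel))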
The only thing I might add beyond this is the ability to split the book into chapters. But for now, it's good enough.