Self-Creating Franken Post
May 4th, 2010
A Markov chain is a system whose next state depends only on its current state. A text generator based on Markov chains applies this idea to words: the next word in a phrase is selected at random from the words that followed the current phrase somewhere in the source text. For example, suppose the current phrase is “6 year”, and in our previous blog posts someone mentioned a “6 year old” or perhaps a “6 year plague” (probably not, but it’s just an example). The generator picks one of those candidates at random. In this case, “old” was selected to follow “6 year”, so the current phrase then becomes “year old”. The entire text above was generated by repeating this step.
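To make the idea concrete, here is a minimal word-level sketch in Ruby, the language of the author's scrubit script. It is not the generator used for this post, and the names `build_chain` and `generate` are made up for illustration. Every observed successor is kept, so a word that followed a prefix more often is proportionally more likely to be picked.

```ruby
# Word-level Markov chain text generator (illustrative sketch).

# Map each `order`-word prefix to every word that followed it in the source.
# Duplicates are kept, so frequent successors are proportionally more likely.
def build_chain(words, order = 2)
  chain = {}
  words.each_cons(order + 1) do |window|
    prefix = window[0...-1]
    (chain[prefix] ||= []) << window[-1]
  end
  chain
end

# Starting from `seed` (an array of `order` words), repeatedly pick a random
# successor of the last `order` words until `max_words` words are produced or
# the current prefix has no recorded successor.
def generate(chain, seed, max_words, rng = Random.new)
  order = seed.length
  output = seed.dup
  while output.length < max_words
    successors = chain[output.last(order)]
    break if successors.nil? # dead end: this prefix was never followed by anything
    output << successors.sample(random: rng)
  end
  output.join(" ")
end

text = "a 6 year old boy met a 6 year old girl and a 6 year plague"
chain = build_chain(text.split, 2)
p chain[%w[6 year]]   # => ["old", "old", "plague"]
puts generate(chain, %w[6 year], 12, Random.new(42))
```

With this toy sentence the prefix “6 year” has the successors “old”, “old” and “plague”, so “old” is chosen about two times out of three.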


Since it was crunch time and I really had no idea what to write about, I decided to create a Franken-post from all of the previous Border Stylo blog posts. First, I downloaded the content with wget.
This grabbed all of the HTML and skipped the image files, so the download finished very quickly. Next, I took all the output files from the /posts directory and ran them through html2text to strip out the markup.
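html2text does considerably more than this, but a rough Ruby stand-in shows the basic idea of the step. It drops `script`/`style` blocks, turns a few block-level closing tags into line breaks, strips every remaining tag, and decodes a handful of common entities. The method name `html_to_text` and the entity list are assumptions for illustration. Regexes are not a real HTML parser, so treat this as a sketch only.

```ruby
# Very rough html2text stand-in (illustrative sketch, not html2text itself).
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " "
}.freeze

def html_to_text(html)
  # Remove script and style blocks entirely, including their contents.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, "")
  # Turn line breaks and the ends of common block elements into newlines.
  text = text.gsub(%r{<br\s*/?>|</(p|div|h[1-6]|li)>}i, "\n")
  # Strip every remaining tag.
  text = text.gsub(/<[^>]+>/, "")
  # Decode a few common entities in a single pass (no double-decoding).
  text = text.gsub(/&(amp|lt|gt|quot|#39|nbsp);/) { |entity| ENTITIES[entity] }
  # Collapse runs of spaces/tabs and excess blank lines.
  text.gsub(/[ \t]+/, " ").gsub(/\n{3,}/, "\n\n").strip
end
```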
The scrubit program is a small Ruby script I wrote that strips out the header, footer, and other non-essential items from each page.
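The scrubit script itself is not shown here, so the following is only a hypothetical sketch of what such a filter might look like. It keeps the lines between a start marker and an end marker and drops everything else. The `scrub` method and both marker strings are assumptions; a real page would need markers chosen from its actual template.

```ruby
# Hypothetical scrubit-style filter: keep only the lines strictly between the
# first line containing `start_marker` and the next line containing
# `end_marker`, dropping the page header and footer.
def scrub(text, start_marker:, end_marker:)
  body = []
  inside = false
  text.each_line do |line|
    unless inside
      inside = true if line.include?(start_marker)
      next # skip header lines and the start-marker line itself
    end
    break if line.include?(end_marker) # everything from here on is footer
    body << line
  end
  body.join
end
```

Using the page's own “Tagged with:” line as the end marker is one plausible choice, since on this page it directly follows the post body.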
Afterwards, I concatenated all the posts into one big file, stripping out non-printing characters in the process.
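A minimal Ruby sketch of this step, assuming UTF-8 input files. It treats “non-printing” as any character outside the POSIX `[:print:]` class other than newline and tab, so carriage returns and other control characters are dropped. `strip_non_printing` and `build_seed` are hypothetical names.

```ruby
# Remove invalid byte sequences and non-printing characters, keeping newlines
# and tabs. (The exact definition of "non-printing" is an assumption.)
def strip_non_printing(text)
  text.scrub("").gsub(/[^[:print:]\n\t]/, "")
end

# Concatenate the cleaned post files into a single seed string.
def build_seed(paths)
  combined = paths.map { |path| File.read(path, encoding: "UTF-8") }.join("\n")
  strip_non_printing(combined)
end
```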
This seed data was fed into a Markov chain text generator of order 6 with an output length of 7,500 characters. I formatted the resulting raw text for line breaks and length (it ended up being about 4,000 characters), and that is the text you see above.
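“Order 6 and length 7,500 characters” suggests a character-level chain, where each 6-character context predicts the next character. That reading is an assumption, as is everything in this sketch beyond the two numbers. `generate_chars` starts from the seed's first six characters and stops at 7,500 characters or at a context with no recorded successor. The hypothetical `wrap` helper re-flows the result into lines, roughly like the line-break formatting described above.

```ruby
# Character-level Markov chain (illustrative sketch). ORDER and MAX_LENGTH
# mirror the settings mentioned in the post; the rest is assumed.
ORDER = 6
MAX_LENGTH = 7_500

# Map each `order`-character context to the characters that followed it.
def char_chain(seed, order = ORDER)
  chain = {}
  (0...(seed.length - order)).each do |i|
    (chain[seed[i, order]] ||= []) << seed[i + order]
  end
  chain
end

# Generate up to `max_length` characters, starting from the seed's first context.
def generate_chars(seed, order: ORDER, max_length: MAX_LENGTH, rng: Random.new)
  chain = char_chain(seed, order)
  output = seed[0, order].dup
  while output.length < max_length
    successors = chain[output[-order, order]]
    break if successors.nil? # context never seen with a successor
    output << successors.sample(random: rng)
  end
  output
end

# Re-flow text into lines of at most `width` characters; a single word longer
# than `width` stays on its own line.
def wrap(text, width = 72)
  text.split.each_with_object([+""]) do |word, lines|
    if lines.last.empty?
      lines.last << word
    elsif lines.last.length + 1 + word.length <= width
      lines.last << " " << word
    else
      lines << word.dup
    end
  end.join("\n")
end
```

For example, `puts wrap(generate_chars(File.read("seed.txt")))` would print a wrapped Franken-post, where `seed.txt` is a placeholder name for the concatenated seed file.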
Tagged with: markov chain, franken post
3 Comments
Interesting. I’ve assessed student dissertations that read just like that. Now I get it.
Lots of fun – could be improved with integration with a service like After The Deadline to check grammar… the one thing that really gives it away is the false verbs, so if there was a way of reading over it and correcting all the wrong tenses and fixing other grammatical quirks…. you’d be unstoppable.
Welcome to the world of spamming circa 1998.