hahaha: Crowdsourced computational humor

About a month ago I made a post on the Wolfram Community sharing some code for a computational joke generator. Now I’ve finished a site that allows users to vote on the jokes they find funny. This post will provide a brief overview of the site’s design.

You can visit the site now at http://jokes.jesse.ws.

Changes from the original Mathematica code

Most of the Mathematica code from the forum post remained intact for generating the jokes in this site, but there are a few key differences. In the post I mention that the lists of rhymes/compound words are truncated to 2 elements to keep the joke list manageable. Since this site isn’t generating jokes on the fly (more on that in the next section), I want to generate all possible jokes using these templates. Instead of truncating the word lists, the new code maps across all possible combinations of the words. Here’s the changed code for joke type 1:

triads =
  {StringSplit[#[[1]]][[2]], StringSplit[#[[2]]][[2]], StringSplit[#[[1]]][[1]]} & /@
    Flatten[Subsets[#, {2}] & /@ groups, 1];

comparisontemp[StringSplit[#[[1]]][[2]], StringSplit[#[[2]]][[2]],
    StringSplit[#[[1]]][[1]]] & /@ RandomSample[pairs, 5]
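
For readers who don’t read Mathematica, the triads expression can be sketched in JavaScript roughly as follows. This is an illustration only: the sample groups are made up, the sketch assumes each entry in groups is a list of two-word phrases, and the helper names pairsOf and buildTriads are hypothetical.

```javascript
// Made-up sample data; the real `groups` list comes from the original post.
const groups = [
  ["fire truck", "fire drill", "fire alarm"],
  ["book worm", "book club"],
];

// All 2-element subsets of a list, in the order Subsets[list, {2}] returns them.
function pairsOf(list) {
  const out = [];
  for (let i = 0; i < list.length; i++) {
    for (let j = i + 1; j < list.length; j++) {
      out.push([list[i], list[j]]);
    }
  }
  return out;
}

// For each pair (a, b): [second word of a, second word of b, first word of a].
function buildTriads(groups) {
  return groups.flatMap(pairsOf).map(([a, b]) => {
    const [aFirst, aSecond] = a.split(/\s+/);
    const bSecond = b.split(/\s+/)[1];
    return [aSecond, bSecond, aFirst];
  });
}

const triads = buildTriads(groups);
console.log(triads.length, triads[0]);
```

Because every pair within every group is used, a group of n phrases contributes n(n-1)/2 triads, which is why the full list is much longer than the truncated one.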

And for type 2:

jokequarts =
  Flatten[
    Table[
      Table[
        Table[
          Table[{syn1, syn2, pair[[1, 1]], pair[[2, 1]]},
            {syn2, pair[[2, 2, 2]]}],
          {syn1, pair[[1, 2, 2]]}],
        {pair, Subsets[pairs, {2}]}],
      {pairs, rhymes}],
    3];

Short[cleanjokequarts = Select[jokequarts,
    (#[[1]] != #[[2]]) &&
    (#[[3]] != #[[4]]) &&
    Quiet[StringTake[#[[1]], -3] != StringTake[#[[2]], -3]] &&
    (Quiet[StringTake[#[[1]], 3] != StringTake[#[[3]], 3]] &&
     Quiet[StringTake[#[[2]], 3] != StringTake[#[[4]], 3]]) &&
    (Length[StringPosition[#[[3]], #[[4]]]] == 0 &&
     Length[StringPosition[#[[4]], #[[3]]]] == 0) &]]
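
The Select filter is easier to follow when each condition is spelled out. Here is a rough JavaScript sketch of the same checks, assuming each quartet is ordered [syn1, syn2, word1, word2] as in jokequarts. The function name keepQuart and the sample quartets are hypothetical. One caveat: for words shorter than three characters, JavaScript’s slice just returns the whole word, which may not match how the Mathematica version treats them.

```javascript
// Returns true when a quartet passes every check in the Select call.
function keepQuart([syn1, syn2, word1, word2]) {
  return (
    syn1 !== syn2 &&                          // the two synonyms differ
    word1 !== word2 &&                        // the two source words differ
    syn1.slice(-3) !== syn2.slice(-3) &&      // synonyms don't share a 3-letter ending
    syn1.slice(0, 3) !== word1.slice(0, 3) && // each synonym starts differently
    syn2.slice(0, 3) !== word2.slice(0, 3) && //   from the word it pairs with
    !word1.includes(word2) &&                 // neither source word contains
    !word2.includes(word1)                    //   the other
  );
}

// Made-up quartets, purely for illustration.
const sample = [
  ["feline", "cap", "cat", "hat"],    // passes every check
  ["kitten", "mitten", "cat", "hat"], // rejected: both synonyms end in "ten"
  ["cat", "cap", "cat", "hat"],       // rejected: syn1 starts like word1
];

const kept = sample.filter(keepQuart);
console.log(kept);
```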

The site

It became clear pretty early on that generating a new joke every time the user requests one would be unfeasibly slow. Instead, I generated the whole space of possible jokes and loaded them into a MySQL database.

Exporting the jokes from Mathematica is pretty simple:

Export["type1triads.csv", triads];

Export["type2quarts.csv", cleanjokequarts];

Then I create a database with two tables, using these schemas:

CREATE TABLE `type1` (
  `id` bigint(11) unsigned not null auto_increment,
  `word1` varchar(60) default null,
  `word2` varchar(60) default null,
  `word3` varchar(60) default null,
  `rating` int(11) not null default '0',
  `type` tinyint(4) unsigned not null default '1',
  PRIMARY KEY (`id`)
);

CREATE TABLE `type2` (
  `id` bigint(11) unsigned not null auto_increment,
  `word1` varchar(60) default null,
  `word2` varchar(60) default null,
  `word3` varchar(60) default null,
  `word4` varchar(60) default null,
  `rating` int(11) not null default '0',
  `type` tinyint(4) unsigned not null default '2',
  PRIMARY KEY (`id`)
);

MySQL has a very useful built-in command for importing tabular data, so I use it to import the CSVs into the tables:

load data infile 'type1triads.csv' into table jokes.type1
  fields terminated by ','
  lines terminated by 0x0A
  (word1, word2, word3);

load data infile 'type2quarts.csv' into table jokes.type2
  fields terminated by ','
  lines terminated by 0x0A
  (word1, word2, word3, word4);

After the import, the tables look like this:

[Screenshot: the joke type 2 table]

Great, now we have a database of computer-generated jokes, most of which are terrible. Now what?

We make a site where users can vote on them, of course!

hahaha runs on Node.js, using Express for routing and template rendering and Zurb Foundation (my go-to library) for interface styling. It’s hosted on IBM Bluemix, just because it’s free (I would prefer Heroku, but they have the whole “app sleeping” thing). And of course, it’s available on GitHub.
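
To give a feel for how a vote could map onto the tables defined earlier, here is a minimal sketch of a helper that builds a parameterized upvote query. It is not taken from the hahaha repository: the function name buildUpvoteQuery is hypothetical, and only the table names and the rating column come from the schemas above. The ? placeholder follows the convention used by common Node MySQL drivers.

```javascript
// Joke type -> table name, taken from the two schemas.
const TABLES = new Map([
  [1, "type1"],
  [2, "type2"],
]);

// Hypothetical helper: returns the SQL and parameters for a single upvote.
function buildUpvoteQuery(type, id) {
  const table = TABLES.get(type);
  if (!table) throw new Error(`unknown joke type: ${type}`);
  if (!Number.isInteger(id) || id < 1) throw new Error(`invalid joke id: ${id}`);
  return {
    // The table name comes from a fixed whitelist; the id is passed as a
    // parameter so user input never ends up inside the SQL string.
    sql: `UPDATE ${table} SET rating = rating + 1 WHERE id = ?`,
    params: [id],
  };
}

const query = buildUpvoteQuery(2, 17);
console.log(query.sql, query.params);
```

Incrementing rating inside the UPDATE statement, rather than reading the value and writing it back, keeps two simultaneous votes from overwriting each other.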

The vast majority of the jokes aren’t particularly funny (or don’t make any sense at all), but my hope is that if enough users participate, some good ones will rise to the top. So please, visit the site and share it with your friends!
