Programmer’s étude: re-emvoweler
Here’s an exercise from dailyprogrammer_ideas; I’m going to start posting them here instead.
Disemvoweling strips all the vowels out of a text: fr xmpl, lk ths. It’s used by texters, forum moderators, and certain programmers who seem to just hate vowels in variable names. Your mission, should you choose to accept it, is to take a disemvoweled input and return the most probable unmangled original text.
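Disemvoweling itself is nearly a one-liner. Here’s a minimal sketch, assuming “vowel” means a, e, i, o, u in either case and that y is kept, as in the example above. The name `disemvowel` is just for illustration.

```python
import re

# Assumption: the vowels are a, e, i, o, u in either case; "y" is kept,
# as in the "fr xmpl, lk ths" example.
VOWELS = re.compile(r"[aeiou]", re.IGNORECASE)


def disemvowel(text: str) -> str:
    """Delete every vowel, leaving consonants, spaces and punctuation alone."""
    return VOWELS.sub("", text)


if __name__ == "__main__":
    print(disemvowel("for example, like this"))  # fr xmpl, lk ths
```

Words made only of vowels, like “a” and “I”, vanish entirely, which is one more thing a re-emvoweler has to guess at.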
How hard this is depends on how accurate you’re trying to make it. Treating each word in isolation, it’ll be easy, but you need context to tell ‘word’ from ‘weirdo’, which both disemvowel to ‘wrd’.
You’ll need some handy text to train your language model on, such as Great Expectations.
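One possible approach, sketched under stated assumptions and not offered as a reference solution: index the training text’s vocabulary by consonant skeleton, estimate a bigram model with simple smoothing toward unigram frequencies, and run a Viterbi search over the candidate words for each disemvoweled token. The class name `Reemvoweler`, the smoothing scheme, and the `alpha` weight are all illustrative choices.

```python
import math
import re
from collections import Counter, defaultdict


def skeleton(word: str) -> str:
    """Consonant skeleton of a word: what disemvoweling leaves behind."""
    return re.sub(r"[aeiou]", "", word)


class Reemvoweler:
    """Bigram language model plus Viterbi search over vowel restorations."""

    def __init__(self, corpus_text: str, alpha: float = 1.0):
        words = re.findall(r"[a-z]+", corpus_text.lower())
        self.unigrams = Counter(words)
        self.bigrams = Counter(zip(words, words[1:]))
        self.total = len(words)
        self.vocab = len(self.unigrams)
        self.alpha = alpha  # weight of the unigram back-off in each bigram estimate
        # Index the vocabulary by skeleton. Vowel-only words like "a" or "i"
        # have an empty skeleton and vanish on disemvoweling, so they can't
        # be restored by this sketch.
        self.by_skeleton = defaultdict(set)
        for w in self.unigrams:
            if skeleton(w):
                self.by_skeleton[skeleton(w)].add(w)

    def p_unigram(self, w: str) -> float:
        # Add-one smoothing so unseen words still get nonzero probability.
        return (self.unigrams[w] + 1) / (self.total + self.vocab)

    def logp(self, w: str, prev) -> float:
        if prev is None:  # first word: no context yet
            return math.log(self.p_unigram(w))
        # Bigram count smoothed toward the unigram estimate.
        num = self.bigrams[(prev, w)] + self.alpha * self.p_unigram(w)
        return math.log(num / (self.unigrams[prev] + self.alpha))

    def decode(self, disemvoweled: str) -> str:
        tokens = re.findall(r"[a-z]+", disemvoweled.lower())
        # best maps each candidate for the latest token to (log prob, path).
        best = {None: (0.0, [])}
        for tok in tokens:
            candidates = self.by_skeleton.get(tok) or {tok}  # unknown: keep as-is
            best = {
                w: max((lp + self.logp(w, prev), path + [w])
                       for prev, (lp, path) in best.items())
                for w in candidates
            }
        return " ".join(max(best.values())[1])


if __name__ == "__main__":
    toy = ("i said the word twice and the word was heard "
           "the weirdo ran away the weirdo ran home")
    model = Reemvoweler(toy)
    print(model.decode("th wrd rn"))      # the weirdo ran
    print(model.decode("th wrd ws hrd"))  # the word was heard
```

To train on Great Expectations, pass the novel’s full text to `Reemvoweler`, for example a plain-text copy from Project Gutenberg. The toy corpus shows context at work: “wrd” comes back as “weirdo” before “rn” and as “word” before “ws”. The sketch lowercases everything and drops punctuation; a real solution would want to keep both.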
I’ll be a little surprised if anyone posts solutions right away (not that it necessarily takes long, but, you know, all three of you). Still, let’s try collecting any answers at this gist. I’ll put mine there once I get to it!
As algorists assert, a tree
hath smaller trees that from it shoot,
and downward springs this canopy,
a-dangle from a groundless root.
For all they made it downward sprout,
be thankful ’tisn’t inside out.
March 9 2011
Words, words, words
amountainous auburnt certaint clanguage cracket flightning havenue hybridge interpretty marqueen ravenue smotherapy spellet sportrait tracket
coarsenal navalanche smartial
ambulancet aprong sharpoon smalleable stackle stoolbox
cosmosis deternity ephemerald neveryone
advertisingularity brighteous davidity diasporadic freudianetics heavenue karmageddon leviathank parfaith plightning spheresy wheresy worthodox
accruelty apartner checkle chequestrian consultan commercenary entrepreneuropsychology evictim holocausterity interestitution moccasino neverthelessee
adolescentennial approximaternity maestrogen othermometry participaternity
adjourney answerve archivalry argumentality conferencephalomyelitis neglecture obstructure simplement translatency
alibido flaccident hasterisk pelvish risqueeze wienergy
blocket camisoleil chenillegal clocket eskimono fabrick fabulousy gartery marathong inspectator nudisturbance ploverall sconcept staffeta thighest thumble tranquilt
convictim misconductivity obtaint suspectator
gaffect hypocriteria insultan schooligan slightning videologue
frenchilada holocaustrian illinoisome tantamountain texasperating
amplifierce charmony chorde mnemonica perhapsody porchestra
cinnamong connectar gardenial saffront saffrontier scentral scoffee splate swallowance tasterisk wheath
blightning easiesta sweather weatherosclerosis
adepth comfortunate competentative difficultivate equalm indulgentle intelligentle misunderstandard respectacle respectrometer savantage
ambassadorable dissidentifier hormonetary parliamentality preferendum princessant scampaign spendulum suggestapo supremedy votingle
chancestry elephantom parallele pedigreen spheredity wheredity
pimplement plaqueous problemish scratchet
Here’s the code. Unlike Wordmerge, it doesn’t expect a subject word; instead it looks at all pairs of words in the dictionary and ranks the candidate portmanteaus by a notion of interestingness, letting you survey a whole dictionary’s worth pretty quickly. On the other hand, it’s less capable, in that it knows nothing about pronunciation, only spelling. There’s code using the CMU pronouncing dictionary elsewhere in the same languagetoys repo, which I never got around to incorporating into this program.
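A minimal sketch of the spelling-only idea, not the languagetoys code: index every proper prefix of every word, then look up each word’s proper suffixes in that index, which finds every overlapping pair without explicitly testing all n² combinations. The names `portmanteaus` and `ranked`, the `min_overlap` cutoff, and ranking by overlap length are assumptions standing in for the original’s unspecified interestingness measure.

```python
from collections import defaultdict


def portmanteaus(words, min_overlap=3):
    """Yield (blend, first, second, overlap) for each pair of dictionary words
    where a proper suffix of `first` equals a proper prefix of `second`,
    so both words stick out past the shared letters."""
    vocab = {w.lower() for w in words}
    # Index every proper prefix (at least min_overlap letters) of every word.
    by_prefix = defaultdict(list)
    for w in vocab:
        for k in range(min_overlap, len(w)):
            by_prefix[w[:k]].append(w)
    for first in vocab:
        for k in range(min_overlap, len(first)):
            overlap = first[-k:]
            for second in by_prefix.get(overlap, ()):
                blend = first + second[k:]
                # Skip self-overlaps and blends that are already real words.
                if second != first and blend not in vocab:
                    yield blend, first, second, overlap


def ranked(words, min_overlap=3):
    """Hypothetical interestingness: longer shared overlaps rank first."""
    return sorted(portmanteaus(words, min_overlap), key=lambda t: -len(t[3]))


if __name__ == "__main__":
    sample = ["smother", "therapy", "flight", "lightning", "haven", "avenue", "cat"]
    for blend, first, second, overlap in ranked(sample):
        print(f"{blend:12} {first} + {second} (shared '{overlap}')")
```

On that sample it finds flightning, smotherapy and havenue from the list above. Fed a full word list, the overlap-length ranking will surface plenty of dull blends too; a better score might weigh word frequency or how much of each word the overlap covers.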
January 1 2011
About these weird poems
(Or poem-like language artifacts.) The last couple of posts were built out of anagrams, every line an anagram of the title. It’s a fun challenge I fell short of, insofar as the results resemble mad ravings more than Homer.
I’d start with a phrase, like “Two thousand eleven A.D.”, and feed it to my generator. It emits a zillion anagrams sorted by naturalness, that is, by their cross entropy under a bigram model of English (with the words within each anagram ordered to minimize that score). The raw anagram generation is standard. I haven’t heard of anyone else applying a language model this way, though it can’t be a new idea; presumably the commercial anagram programs work in the same vein. (I haven’t tried any.)
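Here’s a minimal sketch of that pipeline, not the generator used for these poems: a standard recursive search over the phrase’s letter multiset, then, for each anagram, a brute-force search over word orders for the one with the lowest cross entropy (bits per word) under an add-one-smoothed bigram model, then a sort by that score. The smoothing, the names `anagrams`, `BigramModel` and `ranked_anagrams`, and the permutation search are all assumptions.

```python
import math
import re
from collections import Counter
from itertools import permutations


def letters(text):
    """Multiset of the letters in `text`, ignoring case, spaces and punctuation."""
    return Counter(c for c in text.lower() if c.isalpha())


def anagrams(phrase, dictionary):
    """Yield each multiset of dictionary words spelling exactly the letters
    of `phrase`, as a tuple in alphabetical order (so no duplicates)."""
    target = letters(phrase)
    if not target:
        return
    words = sorted({w.lower() for w in dictionary if letters(w)})
    bags = [(w, letters(w)) for w in words]

    def search(remaining, start, chosen):
        if not remaining:
            yield tuple(chosen)
            return
        for i in range(start, len(bags)):
            word, bag = bags[i]
            if all(remaining[c] >= n for c, n in bag.items()):
                # Counter subtraction drops letters whose count reaches zero.
                yield from search(remaining - bag, i, chosen + [word])

    yield from search(target, 0, [])


class BigramModel:
    """Add-one-smoothed bigram model trained on a plain-text corpus."""

    def __init__(self, corpus_text):
        words = re.findall(r"[a-z']+", corpus_text.lower())
        self.unigrams = Counter(words)
        self.bigrams = Counter(zip(words, words[1:]))
        self.total = len(words)
        self.v = len(self.unigrams) + 1  # vocabulary plus one slot for unseen words

    def cross_entropy(self, seq):
        """Bits per word of `seq`: -(1/n) * sum of log2 P(w_i | w_{i-1})."""
        logp = math.log2((self.unigrams[seq[0]] + 1) / (self.total + self.v))
        for prev, w in zip(seq, seq[1:]):
            logp += math.log2((self.bigrams[(prev, w)] + 1) / (self.unigrams[prev] + self.v))
        return -logp / len(seq)

    def best_order(self, words):
        """Ordering of `words` with the lowest cross entropy. Brute force over
        permutations, so only practical for anagrams of a handful of words."""
        return min(set(permutations(words)), key=self.cross_entropy)


def ranked_anagrams(phrase, dictionary, model):
    """All anagrams of `phrase`, most natural-sounding (lowest bits/word) first."""
    scored = []
    for bag in anagrams(phrase, dictionary):
        order = model.best_order(bag)
        scored.append((model.cross_entropy(order), " ".join(order)))
    return [line for _, line in sorted(scored)]


if __name__ == "__main__":
    model = BigramModel("the dirty room was a dirty room")
    print(ranked_anagrams("Dormitory", ["dirty", "room", "dormitory"], model))
    # ['dirty room', 'dormitory']
```

Normalizing by the number of words keeps one-word and five-word anagrams comparable. For real phrases the permutation step needs pruning, since a twelve-word anagram already has hundreds of millions of orderings.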
This reordering of the words and the lines helps a lot. I load it all into Emacs and scan the first pages, clipping any lines that look interesting. Some words may catch my eye in unsuitable anagrams, and I’ll search for them further. After a dozen or two clippings, a couple of them will want to go together; from there it’s like a ridiculous crossword puzzle, tantalizing and with no solution, unless you’ll indulge a vague, frantic wave in the general direction of a meaning. Or unless you have wordskill and fancier tools: focusing on the best few pages by cross entropy still leaves the vast majority of interesting lines buried in the babble. I’d like to hear from serious anagrammatists how they do it.
So, I mentioned a subject for today. How’d it come out?
Two Thousand Eleven A.D.
Was haunted love noted?
A-wounded, let’s not have
A new love that sounded
As heaven would tend to —
To have owned and let us.
And those we don’t value,
As heaven would tend to:
None would have tasted
On haunted love, wasted
Soul want to have ended —
Would have to set an end,
As even death would not:
And thus we do not leave.
Hey, it scanned! Try again?
Two Thousand Eleven
We have not done lust.
We hadn’t even soul to
Let us down to heaven:
We have untold notes.
We don’t have one lust,
Slow even unto death.
Not even a whole stud
Owned us to the navel.
Do not value the news.
Don’t love the new USA.
Shut down. Leave note.
I don’t plan to write any more of these, hurray.
April 24 2010
To Me in a Curt Pity
I am pretty. I count.
Court I my patient.
A curt pity—no time.
Put in a time to cry.
Me, I try to can it up.
Or I cut my patient.
React to impunity.
I to my aunt I crept.
Copy it in a mutter.
“Pure intimacy, tot.”
Cut a pity no merit.
Many to picture it.
May not picture it.
A minute pit to cry.
Time, act on purity:
Ripe, it may not cut.
April 24 2010
In an old favorite room
I flavored into a moron;
In a room not for a devil,
I learn to avoid no form.
I am not in overload for
One normal, idiot favor:
Fan mail or devotion or—
Or I, for one, am not valid.
To avoid moral inferno
I’m overlord of a nation;
For a liar, do not move in
A role of iron and vomit.
Void not, aloof mariner!
Vomit in an aloof order:
In motion, adore flavor,
In into a marvel of odor.
Learn of Monrovia! Do it!
Over-load, minion of art,
Or avoid online format
To avoid a mine forlorn.