August 11

Programmer’s étude: re-emvoweler

Here’s an exercise from dailyprogrammer_ideas; I’m going to start posting them here instead.

Disemvoweling strips all the vowels out of a text: fr xmpl, lk ths. It’s used by texters, forum moderators, and certain programmers who seem to just hate vowels in variable names. Your mission, should you choose to accept it, is to take a disemvoweled input and return the most probable unmangled original text.

How hard this is depends on how accurate you’re trying to make it. Treating each word in isolation will be easy, but you need context to tell ‘word’ from ‘weirdo’ (both strip to ‘wrd’).

You’ll need some handy text to train your language model on, such as Great Expectations.
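To make the word-by-word baseline concrete, here’s a minimal sketch: build a table from a training text mapping each disemvoweled form to its most frequent original word, then restore word by word. (The corpus here is a toy stand-in; point `build_model` at Great Expectations for real use. No context is used, so ‘word’ vs. ‘weirdo’ is settled purely by frequency.)

```python
import re
from collections import Counter

def disemvowel(word):
    """Strip the vowels out of a word: 'example' -> 'xmpl'."""
    return re.sub(r'[aeiou]', '', word)

def build_model(text):
    """Map each disemvoweled form to its most frequent original word."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    best = {}
    for word, n in counts.items():
        key = disemvowel(word)
        if key not in best or n > counts[best[key]]:
            best[key] = word
    return best

def reemvowel(mangled, model):
    """Guess the original text, word by word (no context used)."""
    return ' '.join(model.get(w, w) for w in mangled.split())

# Toy corpus standing in for a real training text.
model = build_model("the word is the word and the weirdo is a weirdo and the word wins")
print(reemvowel("th wrd", model))  # -> 'the word'
```

For the full exercise you’d replace the per-word lookup with something context-aware, e.g. a bigram model over candidate restorations.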

I’ll be a little surprised if anyone posts solutions right away (not that it necessarily takes long, but, you know, all three of you) — but let’s try collecting any answers at this gist. I’ll put mine there once I get to it!

If you’re wanting more, try Programming Praxis, dailyprogrammer, and CodingBat. This post titled in fond memory of the book by Charles Wetherell. :)


March 5

Swift complaint

As algorists assert, a tree
hath smaller trees that from it shoot,
and downward springs this canopy,
a-dangle from a groundless root.
For all they made it downward sprout,
be thankful ’tisn’t inside out.


March 9 2011

Words, words, words

On Hacker News they’re discussing a portmanteau generator. I wrote one too a few years ago and never got around to posting the results. Some of the better discoveries, loosely grouped by theme:

amountainous auburnt certaint clanguage cracket flightning havenue hybridge interpretty marqueen ravenue smotherapy spellet sportrait tracket

coarsenal navalanche smartial

ambulancet aprong sharpoon smalleable stackle stoolbox

localendar shalloween

cosmosis deternity ephemerald neveryone

advertisingularity brighteous davidity diasporadic freudianetics heavenue karmageddon leviathank parfaith plightning spheresy wheresy worthodox

accruelty apartner checkle chequestrian consultan commercenary entrepreneuropsychology evictim holocausterity interestitution moccasino neverthelessee

adolescentennial approximaternity maestrogen othermometry participaternity

adjourney answerve archivalry argumentality conferencephalomyelitis neglecture obstructure simplement translatency

alibido flaccident hasterisk pelvish risqueeze wienergy

blocket camisoleil chenillegal clocket eskimono fabrick fabulousy gartery marathong inspectator nudisturbance ploverall sconcept staffeta thighest thumble tranquilt

convictim misconductivity obtaint suspectator

gaffect hypocriteria insultan schooligan slightning videologue

frenchilada holocaustrian illinoisome tantamountain texasperating

amplifierce charmony chorde mnemonica perhapsody porchestra

cinnamong connectar gardenial saffront saffrontier scentral scoffee splate swallowance tasterisk wheath

blightning easiesta sweather weatherosclerosis

adepth comfortunate competentative difficultivate equalm indulgentle intelligentle misunderstandard respectacle respectrometer savantage

ambassadorable dissidentifier hormonetary parliamentality preferendum princessant scampaign spendulum suggestapo supremedy votingle

chancestry elephantom parallele pedigreen spheredity wheredity

pimplement plaqueous problemish scratchet

Here’s the code. Unlike Wordmerge, it doesn’t expect a subject word; it looks at all pairs in the dictionary instead and ranks the candidate portmanteaus by a notion of interestingness, letting you survey a whole dictionary’s worth pretty quickly. On the other hand, it’s less capable, in that it knows nothing about pronunciation, only spelling. There’s code using the CMU pronouncing dictionary elsewhere in the same languagetoys repo, which I never got around to incorporating into this program.
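The all-pairs idea can be sketched in a few lines. The actual program’s “interestingness” score isn’t reproduced here; overlap length is a crude stand-in, and the wordlist is a toy.

```python
def blends(words, min_overlap=3):
    """Return (overlap, blend, a, b) tuples, longest overlaps first."""
    results = []
    for a in words:
        for b in words:
            if a == b:
                continue
            # Find the longest proper suffix of a that is also a prefix of b.
            for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                if a.endswith(b[:k]):
                    results.append((k, a + b[k:], a, b))
                    break
    return sorted(results, reverse=True)

words = ['smother', 'therapy', 'light', 'lightning', 'slight']
for overlap, blend, a, b in blends(words)[:3]:
    print(overlap, blend, a, b)  # e.g. 4 smotherapy smother therapy
```

Ranking by raw overlap already surfaces blends like “smotherapy”; a real interestingness measure would also penalize trivial containments and reward phonetic fit.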


January 1 2011

About these weird poems

(Or poem-like language artifacts.) The last couple of posts were built out of anagrams, every line an anagram of the title. It’s a fun challenge I fell short of, insofar as the results resemble mad ravings more than Homer.

I’d start with a phrase, like “Two thousand eleven A.D.”, and feed it to my generator. It emits a zillion anagrams sorted by naturalness — that is, by the cross entropy according to a bigram model of English (ordering the words within each anagram to minimize that score). The raw anagram generation is standard; the part applying the language model, I haven’t heard of anyone doing, though it can’t be a new idea; presumably the commercial anagram programs work the same vein. (I haven’t tried any.)
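The reordering step can be sketched as follows, with a hypothetical hand-filled bigram table standing in for a model trained on real English: score each permutation of an anagram’s words by cross entropy and keep the most natural order. (Brute-force permutation only works for short lines, which these are.)

```python
import math
from itertools import permutations

# Hypothetical bigram probabilities; anything unseen gets a small floor.
BIGRAMS = {
    ('<s>', 'we'): 0.2, ('we', 'have'): 0.3, ('have', 'not'): 0.2,
    ('not', 'done'): 0.1, ('done', 'lust'): 0.05,
}
FLOOR = 1e-6

def cross_entropy(words):
    """Negative log-probability per word under the bigram model."""
    cost, prev = 0.0, '<s>'
    for w in words:
        cost -= math.log(BIGRAMS.get((prev, w), FLOOR))
        prev = w
    return cost / len(words)

def best_order(words):
    """The permutation of `words` the model finds most natural."""
    return min(permutations(words), key=cross_entropy)

print(best_order(['lust', 'not', 'have', 'done', 'we']))
# -> ('we', 'have', 'not', 'done', 'lust')
```

The same score, applied across all the generated anagrams, gives the naturalness ordering described above.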

This reordering of the words and the lines helps a lot. I load it all into Emacs and scan the first pages, clipping any lines that look interesting. Some words may catch my eye in unsuitable anagrams, and I’ll search for them further. After a dozen or two clippings there’ll be a couple that want to go together — until it’s like a ridiculous crossword puzzle tantalizing with no solution, unless you’ll indulge a vague frantic wave in the general direction of a meaning. Or unless you have wordskill and fancier tools: focusing on the best few pages by the cross entropy still leaves the vast majority of interesting lines buried in the babble. I’d like to hear from serious anagrammatists how they do it.

So, I mentioned a subject for today. How’d it come out?

Two Thousand Eleven A.D.

Was haunted love noted?
A-wounded, let’s not have
A new love that sounded
As heaven would tend to —
To have owned and let us.

And those we don’t value,
As heaven would tend to:
None would have tasted
On haunted love, wasted
Soul want to have ended —

Would have to set an end,
As even death would not:
And thus we do not leave.

Hey, it scanned! Try again?

Two Thousand Eleven

We have not done lust.
We hadn’t even soul to
Let us down to heaven:
We have untold notes.

We don’t have one lust,
Slow even unto death.
Not even a whole stud
Owned us to the navel.

Do not value the news.
Don’t love the new USA.
Shut down. Leave note.

I don’t plan to write any more of these, hurray.


April 24 2010

To Me in a Curt Pity

I am pretty. I count.
Court I my patient.
A curt pity—no time.
Put in a time to cry.
Me, I try to can it up.
Or I cut my patient.
React to impunity.
I to my aunt I crept.
Copy it in a mutter.
“Pure intimacy, tot.”
Cut a pity no merit.
Many to picture it.
May not picture it.
A minute pit to cry.
Time, act on purity:
Ripe, it may not cut.

A homage to Permutation City, made out of the same letters.

April 24 2010

Information Overload

In an old favorite room
I flavored into a moron;
In a room not for a devil,
I learn to avoid no form.

I am not in overload for
One normal, idiot favor:
Fan mail or devotion or—
Or I, for one, am not valid.

To avoid moral inferno
I’m overlord of a nation;
For a liar, do not move in
A role of iron and vomit.

Void not, aloof mariner!
Vomit in an aloof order:
In motion, adore flavor,
In into a marvel of odor.

Learn of Monrovia! Do it!
Over-load, minion of art,
Or avoid online format
To avoid a mine forlorn.

Venom. Tornado. Airfoil.

invalid maroon footer