[personal profile] sniffnoy
So for comp ling, we're supposed to write a program to recognize compound words. Now we need a gold standard to test it against, so we each made a list of 100 words, marked whether or not each was a compound, and sent it in to be merged into one big list.

Now I don't expect my program to do too well, but it's rather worrying when the standard itself has problems. Let's take a look...

(Note: compound words here are listed with spaces in them, i.e. "rail road", as that's the output format of our program.)

Disagreement: There is disagreement on whether "bypass" is a compound; apparently "bypassing" is, but "bypass" is not. I'd go with it being a compound, but either way there's something wrong here. Presumably different people sent in the words - but we were told, if you're not sure if something's a compound, don't send it in.
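This sort of inconsistency could probably be caught mechanically. Here's a rough sketch, assuming the merged list is one entry per line in the output format described above (spaces inside compounds, none otherwise). The function name and the tiny suffix list are made up for illustration; a real check would want a proper stemmer.

```python
# Hypothetical, deliberately tiny list of inflectional endings.
INFLECTIONS = ("s", "ed", "ing")

def find_inflection_conflicts(entries):
    """Find words marked as non-compounds whose inflected form is marked as a compound."""
    # Map each word with its spaces removed back to the form it was listed in.
    joined = {entry.replace(" ", ""): entry for entry in entries}
    conflicts = []
    for word, form in joined.items():
        if " " in form:
            continue  # only start from entries listed as non-compounds
        for ending in INFLECTIONS:
            other = joined.get(word + ending)
            if other is not None and " " in other:
                conflicts.append((form, other))
    return conflicts

merged = ["by passing", "bypass", "rail road", "headless"]
print(find_inflection_conflicts(merged))
```

It only catches the direction seen here (bare word unsplit, inflected word split), and it would miss anything with a spelling change at the join.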

Bizarrities: "hacki staff" and "hacki work". I suspect these are supposed to be "hack staff" and "hack work"? Though I don't think "hackstaff" is a word...

Sheer failure: "headless" isn't a compound, it's a use of the "-less" suffix! Similarly with "shapeless". And "supervision", which is just the "super-" prefix.
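A crude filter for this kind of mistake might look like the following. This is only a sketch: the affix sets are my own short, assumed lists, and since things like "less" and "super" are also words in their own right, anything it flags would still need a human to look at it.

```python
# Assumed, far-from-complete affix lists, for illustration only.
DERIVATIONAL_PREFIXES = {"super", "un", "re", "pre"}
DERIVATIONAL_SUFFIXES = {"less", "ness", "ful", "ly"}

def affix_suspects(entries):
    """Return entries marked as compounds whose first or last part is a known affix."""
    suspects = []
    for entry in entries:
        parts = entry.split()
        if len(parts) < 2:
            continue  # not marked as a compound, nothing to check
        if parts[0] in DERIVATIONAL_PREFIXES or parts[-1] in DERIVATIONAL_SUFFIXES:
            suspects.append(entry)
    return suspects

gold = ["head less", "shape less", "super vision", "rail road", "bypass"]
print(affix_suspects(gold))
```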

WTF?: Line 765 reads "fastKader". Line 1058: "hallucinatingschools". And if those somehow were words, they would certainly be compounds...

You'd think this would be the easy part of the assignment.

-Harry
