So for comp ling, we're supposed to write a program to recognize compound words. Now we need a gold standard to test it against, so we each made a list of 100 words, marked whether or not each one was a compound, and sent it in to be merged into one big list.
Now I don't expect my program to do too well, but it's rather worrying when the standard itself has problems. Let's take a look...
(Note: compound words here are listed with spaces in them, e.g. "rail road", as that's the output format of our program.)
Disagreement: The list can't agree with itself on whether "bypass" is a compound; apparently "bypassing" is, but "bypass" is not. I'd go with it being a compound, but either way there's something wrong here. Presumably different people sent in the two words - but we were told, if you're not sure if something's a compound, don't send it in.
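For what it's worth, this kind of inconsistency is easy enough to hunt for automatically. Here's a rough sketch, not anything we were actually given: it assumes the merged list is just one entry per line in the "rail road" format, that an entry with a space in it means "compound" and one without means "not a compound", and it uses a made-up `crude_stem` helper (chop off "-ing", "-ed", or a single "-s") as a very poor man's stemmer. Any stem that shows up both ways gets flagged.

```python
from collections import defaultdict


def crude_stem(word):
    """Hypothetical helper: strip one inflectional ending, nothing smarter."""
    for suffix in ("ing", "ed"):
        # Keep at least three letters of stem so short words survive intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    # Strip a plural "s", but leave words like "bypass" or "headless" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def find_disagreements(entries):
    """Return stems that are marked both as compound and as non-compound.

    Assumption: an entry containing a space was marked as a compound.
    """
    labels = defaultdict(set)  # stem -> set of True/False labels seen
    forms = defaultdict(set)   # stem -> original entries for that stem
    for entry in entries:
        is_compound = " " in entry
        stem = crude_stem(entry.replace(" ", "").lower())
        labels[stem].add(is_compound)
        forms[stem].add(entry)
    return {stem: sorted(forms[stem]) for stem in labels if len(labels[stem]) > 1}


if __name__ == "__main__":
    sample = ["bypass", "by passing", "rail road", "headless"]
    print(find_disagreements(sample))
```

It won't catch everything (a real stemmer would do better, and it says nothing about which label is right), but it would at least have pointed at "bypass".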
Bizarrities: "hacki staff" and "hacki work". I suspect these are supposed to be "hack staff" and "hack work"? Though I don't think "hackstaff" is a word...
Sheer failure: "headless" isn't a compound, it's a use of the "-less" suffix! Similarly with "shapeless". And "supervision".
WTF?: Line 765 reads "fastKader". Line 1058: "hallucinatingschools". And if those somehow were words, they would certainly be compounds...
You'd think this would be the easy part of the assignment.
-Harry