coro results from the CPAN

coro
Games-Dissociate
view release on metacpan or search on metacpan
But I do recall Lojban having features that made quite clear the syntactic
constituency of any sentence, such as would disambiguate the two ways to
parse "pretty girls' school".  (I don't recall rigorously demonstrating to
myself that these features would disambiguate ALL kinds of ambiguities of
synactic ambiguity, but I took the author's word for it.)  I vaguely recall
the features being something like asserting that default attachment should
always be as low as possible in the syntax tree, but then providing ways to
specify higher attachment.

However, as I read the description of these features, I had the very strong
impression that while one could propose/hypothesize a language with these
features, the result would not be something anyone could learn.  I.e., I
felt sure that if I had to produce a sentence involving attachment, I'd
always have to stop and picture a syntax tree, and then I'd have to
carefully picture where attachment would go, and then trying to remember
how it is I'd need to specify that kind of attachment.  I could imagine
being able to do it, but /never/ being able to do it unconsciously, i.e.,
"naturally".

This led me to come up with a conjecture, in the spirit of mathematical
conjectures -- i.e., maybe "I don't see why this has to be true, but I
can't prove it's not", or maybe "I think this MUST be true, but I can't see
any way to prove it".  I call it the "Ambiguity Conjecture" -- or, in case
there's any other different like-named conjectures out there, "Burke's
Ambiguity Conjecture":


Burke's Ambiguity Conjecture
----------------------------

Natural languages exhibit many phenomena which make them very hard for them
to be parsed by machines.  These phenomena may include polysemy/homophony
of words (e.g., "bank" meaning either a place where your money is kept, or
the side of a river) or other morphemes (e.g.: that the subject marker and
the object marker in a given language, ususally distinct, may be
homophonous in some situation), or that a given surface form of a word can
represent more than one part of speech ("flies" as a noun, or verb; ditto
"saw").  Complex morphophology may also complicate parsing.  However, the
difficulty of these phenomena vary from language to language, and from
language type or language type.  A phenomenon that poses a significant
parseability obstacle in one language may be totally unproblematic or even
absent in others.

But I think that one problem area is common to all languages: syntactic
ambiguity.  This is the problem underlying the classic phrase "pretty
girls' school" -- where the sense and part of speech of all the words is
clear, but where it's unclear whether this should be parsed as 1 or as 2:

1) [[pretty girls]' school]  (the school for/with girls who are pretty)
2) [pretty [girls' school]]  (the school for/with girls which is pretty)

Agreement systems (as in the English paraphrases) in some languages might
disambiguate this particular case (i.e., if "pretty" showed agreement with
"school" or with "girls"), but such systems are not a general solution.  In
the languages I've read grammars of, I've never found anything approaching
a general solution to this.  So I conject:

Burke's Ambiguity Conjecture:
* All natural languages are subject to syntactic ambiguity.

And a corollary:
* Artificial languages constructed to make syntactic ambiguity impossible
will be so unnatural as to be unlearnable by humans.  (I.e., that aspect of
the language will be unlearnable.  The rest of the language might well be
quite learnable.)


Another way to say this is: the mechanism that the brain uses to generate
synactic sentences is /incapable of reliably recognizing/ (or learning to
reliably recognize) when it has produced a synactic structure that contains
ambiguity.  Or: that it's incapable of distinguishing the kinds of
high-versus-low attachment that are the sources of these kinds of ambiguity.

Formally speaking, this conjecture has problems -- but hey, it's just a
conjecture.  Most notably, it's basically asserting a negative: nowhere
does (nor could there) there exist a natural language is free of syntactic
ambiguity.  I don't think it can be /proven/ true, because no-one can
examine all existing or potential natural languages, nor can anyone prove
that there's /no/ way to naturally teach someone a language free of
synactic ambiguity.
However, it can be /disproven/ to various degrees: one could simply point
out a natural language (such as I've just happened never to come across)
that does distinguish low-versus-high attachment.  Hopefully such a
mechanism wouldn't be idiosyncratic to attachment in a particular
structure, but would be general to much or all of the language's syntax.



Now, suppose this conjecture is true.
Back in the real world, one way to get around it in NLP is to have the
interface to the user /refuse/ any sentences which it finds to be
syntactically ambiguous.  This might make interaction in natural English
impossible; but interaction in some constructed language designed to
minimize (if not eliminate) ambiguity might still be feasible.
Another way, not so different, is that when ambiguous sentences are
encountered, the computer should ask the user for disambiguation.

--
Sean M. Burke sburke@netadventure.net http://www.netadventure.net/~sburke/

};

ok dissociate( $input ), '/\w/';
ok dissociate( $input ), '/\w/';
ok dissociate( $input ), '/\w/';
ok dissociate( $input ), '/\w/';
ok dissociate( $input ), '/\w/';

print "# Okay, that's that.\n";
ok 1;
( run in 0.807 second using v1.01-cache-2.11-cpan-df04353d9ac )