| Abstract: |
Conditional random fields (Lafferty, McCallum, and Pereira, 2001) are
quite effective at sequence labeling tasks like shallow parsing (Sha
and Pereira, 2003) and named-entity extraction (McCallum and Li,
2003). CRFs are *log-linear*, allowing the incorporation of arbitrary
features into the model. To train on *unlabeled* data, we
require *unsupervised* estimation methods for log-linear models; few
exist. We describe a novel approach, contrastive estimation (CE). We
show that the new technique can be intuitively understood as
exploiting implicit negative evidence and is computationally efficient
(unlike log-linear EM). In fact, CE generalizes EM and a variety of
other objective functions. By engineering classes of implicit
negative evidence, CE can be adapted for specific applications.
We describe applications to two natural language learning
problems---POS tagging of unlabeled text with a dictionary (Merialdo,
1994) and dependency grammar induction (Klein and Manning, 2004)---and
show that contrastive estimation outperforms EM (with the same feature
sets). Further, contrastive estimation is more robust to loss of
domain knowledge (dictionary degradation or uninformative
initialization) and can recover by modeling additional, nonorthogonal
features.
This is joint work with Jason Eisner and was presented at ACL and
the IJCAI Workshop on Grammatical Inference Applications this summer. |