Abstract:
Rare category detection is an open challenge for active learning, especially
in the de novo case (no labeled examples), but it is of significant practical
importance for data mining, e.g. detecting new financial transaction fraud
patterns, where normal legitimate transactions dominate. This paper develops
a new method for detecting an instance of each minority class via an
unsupervised local-density-differential sampling strategy. Essentially, a
variable-scale nearest-neighbor process is used to optimize the probability
of sampling tightly grouped minority classes, subject to a local smoothness
assumption of the majority class. Results on both synthetic and real data
sets are very positive: each minority class is detected with only a fraction
of the actively sampled points required by random sampling and by Pelleg's
Interleave method, the prior best technique in the sparse literature on this
topic.
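The idea of local-density-differential sampling can be illustrated with a minimal sketch. It is not presented as the paper's exact algorithm, and every choice below is an assumption: Euclidean distance, a neighbor count k taken as the assumed minority-class size minus one, a base radius r equal to the smallest k-th-nearest-neighbor distance in the data, and a scale that grows linearly as t * r on query round t. The function names are hypothetical. Each unqueried point is scored by the largest drop in local density from it to any neighbor within the current scale; under a locally smooth majority class these drops stay near zero, so a tight minority group stands out.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def pairwise_distances(points: Sequence[Point]) -> List[List[float]]:
    """Full Euclidean distance matrix (O(n^2), adequate for a small sketch)."""
    return [[math.dist(p, q) for q in points] for p in points]


def radius_and_counts(dist: List[List[float]], k: int) -> Tuple[float, List[int]]:
    """Choose the tightest radius r that encloses k neighbors of some point,
    then count each point's neighbors within r (its local density).

    Assumes 1 <= k < number of points.
    """
    n = len(dist)
    kth = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    r = min(kth)
    counts = [sum(1 for j in range(n) if j != i and dist[i][j] <= r) for i in range(n)]
    return r, counts


def density_differential_queries(points: Sequence[Point], k: int, budget: int) -> List[int]:
    """Return up to `budget` point indices in the order they would be queried.

    On round t, every unqueried point is scored by the largest density drop
    counts[i] - counts[j] over neighbors j within t * r; the top scorer
    (lowest index on ties) is queried next.
    """
    dist = pairwise_distances(points)
    r, counts = radius_and_counts(dist, k)
    n = len(points)
    queried: List[int] = []
    for t in range(1, budget + 1):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in queried:
                continue
            drops = [counts[i] - counts[j] for j in range(n)
                     if j != i and dist[i][j] <= t * r]
            score = max(drops, default=0)  # points with no neighbors score 0
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # every point has already been queried
        queried.append(best)
    return queried


if __name__ == "__main__":
    # Toy data: a sparse 5x5 majority grid plus a tight 3-point minority group.
    majority = [(4.0 * a, 4.0 * b) for a in range(5) for b in range(5)]
    minority = [(9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
    print(density_differential_queries(majority + minority, k=2, budget=3))
    # prints [25, 0, 26]; index 25 is (9.0, 9.0), found on the first query
```

On this toy data the first query already lands in the minority group, whereas a uniformly random first query would hit it with probability 3/28. Growing the scale each round is one simple way to realize the "variable-scale" aspect; how the actual method sets the radius and schedule should be taken from the paper itself.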