6 Comments

Very clear explanations throughout the article. Thank you

Thanks for the kind words, that's very motivating to hear!

One interesting approach that I have come across for Active Learning is Label Dispersion. It's a good way of quantifying model uncertainty. TL;DR: have a model predict an input's class a bunch of times. If it gets different outputs each time, your model is unsure. It turns out this works a lot better than using confidence.
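In case it helps, here is a minimal sketch of how that dispersion number can be computed, assuming the class predictions come from repeated stochastic forward passes (for example with Dropout left on at inference) or from epoch snapshots; the helper name is just for illustration:

```python
import numpy as np

def label_dispersion(predicted_labels):
    """Fraction of predictions that disagree with the most frequent label.

    predicted_labels: class predictions for one input, collected over
    repeated stochastic forward passes (or over training-epoch snapshots).
    Returns 0.0 when every pass agrees, and grows toward 1.0 as the model
    keeps changing its mind.
    """
    labels = np.asarray(predicted_labels)
    _, counts = np.unique(labels, return_counts=True)
    return 1.0 - counts.max() / labels.size

# Example: 10 stochastic passes on a single input
print(label_dispersion([3, 3, 3, 7, 3, 3, 1, 3, 3, 7]))  # 0.3
```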

The idea was introduced in the paper "When Deep Learners Change Their Mind: Learning Dynamics for Active Learning": https://arxiv.org/abs/2107.14707

My breakdown of the paper- https://medium.com/mlearning-ai/evaluating-label-dispersion-is-it-the-best-metric-for-evaluating-model-uncertainty-e4a2b52c7fa1

This idea works for classification, but I've had success extending it to regression as well. The process is simple: use an ensemble of diverse models, and their spread is the uncertainty of your prediction. You can take it a step further and use probabilistic models + multiple inferences for more thoroughness.
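A rough sketch of that ensemble-spread idea for regression, assuming a list of already-fitted, scikit-learn-style models with a .predict method (the function name is mine, not from the paper):

```python
import numpy as np

def ensemble_uncertainty(models, X):
    """Disagreement across a diverse ensemble as per-sample uncertainty.

    models: fitted regressors exposing .predict (assumed interface);
    X: batch of inputs, shape (n_samples, n_features), as a numpy array.
    Returns the mean prediction and the spread; the samples with the
    largest spread are the ones worth sending out for labeling.
    """
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)
```

With probabilistic models you can sample each one several times before stacking, which folds the "multiple inferences" part into the same spread.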

Thanks for sharing. I wonder if you could elaborate a bit on "have a model predict an input's class a bunch of times". Why would a model give different labels for the same input each time? Are you including sources of randomness like Dropout in your model?

It's been a while since I went through the paper, but IIRC it does reference Bayesian methods as part of its strategy.

I can explain how I go through the process. Dropout and other sources of randomness are a big part of it. I also use an ensemble (and lots of probabilistic models) for the final outcomes. It works like a charm for classification and regression, and acts as a pretty good filter for figuring out which samples need to be labeled for active learning. We also combine this with a little bit of clustering, using each sample's distance to its cluster as an extra metric. How much weight each of these components gets ends up depending on the problem (lots of trial and error). I wish I could give you something more rigorous, but that is pretty much all I have figured out so far.
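For what it's worth, here is one hypothetical way to fold those pieces (model disagreement plus a clustering distance) into a single score for picking samples to label. The weights and the min-max normalization are pure placeholders, since as noted above the right balance comes down to trial and error per problem:

```python
import numpy as np

def acquisition_score(spread, cluster_distance, w_spread=1.0, w_dist=0.5):
    """Blend model disagreement with a clustering-based distance term.

    spread: per-sample uncertainty (ensemble std, label dispersion, ...);
    cluster_distance: per-sample distance to the nearest cluster centroid
    in some embedding space. The weights are illustrative defaults only.
    """
    def minmax(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    # Normalize each signal so neither dominates purely by scale,
    # then take a weighted sum; label the top-k scoring samples next.
    return w_spread * minmax(spread) + w_dist * minmax(cluster_distance)
```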

I see, that makes sense. K-fold ensembling is also a nice additional way to create neural network ensembles for an approach like this.
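A quick sketch of that, assuming scikit-learn's KFold and a make_model factory that returns a fresh, untrained model (both are placeholders for whatever framework is in use):

```python
from sklearn.model_selection import KFold

def kfold_ensemble(make_model, X, y, k=5):
    """Train one model per fold; the k models form the ensemble.

    X, y: numpy arrays; make_model: zero-argument factory for a new model.
    """
    models = []
    for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = make_model()                   # fresh, untrained model per fold
        model.fit(X[train_idx], y[train_idx])  # each sees a different k-1 folds
        models.append(model)
    return models  # plug straight into the ensemble-spread code above
```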