6 Comments
Feb 6, 2023 · Liked by Sebastian Raschka, PhD

Very clear explanations throughout the article. Thank you

author

Thanks for the kind words, that's very motivating to hear!


One interesting approach that I have come across for Active Learning is Label Dispersion. It's a good way of quantifying model uncertainty. TL;DR: have a model predict an input's class a bunch of times. If it gets different outputs each time, your model is unsure. Turns out this works a lot better than using confidence.
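A minimal sketch of one way to compute such a dispersion score, using dropout at inference as the source of randomness (which comes up later in this thread); the `model` and `x` names are placeholders, and this is just one possible reading of the idea:

```python
import torch

def label_dispersion_score(model, x, n_passes=20):
    """Run the same input through the model several times with dropout
    kept active and measure how often the predicted label changes."""
    model.train()  # keep dropout layers stochastic at inference time
    preds = []
    with torch.no_grad():
        for _ in range(n_passes):
            logits = model(x.unsqueeze(0))            # add a batch dimension
            preds.append(logits.argmax(dim=-1).item())
    # 0.0 -> same label every pass (model is sure);
    # values near 1.0 -> labels scattered across classes (model is unsure)
    most_common = max(preds.count(c) for c in set(preds))
    return 1.0 - most_common / n_passes
```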

The idea was introduced in the paper "When Deep Learners Change Their Mind: Learning Dynamics for Active Learning": https://arxiv.org/abs/2107.14707

My breakdown of the paper: https://medium.com/mlearning-ai/evaluating-label-dispersion-is-it-the-best-metric-for-evaluating-model-uncertainty-e4a2b52c7fa1

This idea works for classification, but I've had success expanding it to regression as well. The process is simple: use an ensemble of diverse models, and treat their spread as the uncertainty of your prediction. You can take it a step further and use probabilistic models plus multiple inferences for more thoroughness.
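A rough sketch of that regression variant, assuming a list of already-trained, diverse regressors with a scikit-learn-style `predict` method (the `models` list is an assumption, not something from the paper):

```python
import numpy as np

def ensemble_spread(models, X):
    """Use the spread of predictions across a diverse ensemble of
    regressors as a per-sample uncertainty estimate."""
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)      # prediction, uncertainty
```

The samples with the largest spread are the ones I'd queue up for labeling first.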

author

Thanks for sharing. I wonder if you could elaborate on "have a model predict an input's class a bunch of times" a bit. Why would a model give different labels for the same input each time? Are you including sources of randomness like Dropout in your model?


It's been a while since I went through the paper, but IIRC it does reference Bayesian methods as part of its strategy.

I can explain how I go through the process. Dropout and other sources of randomness are a big part of it. I also use an ensemble (with lots of probabilistic models) for the final outcomes. It works like a charm for classification and regression, and acts as a pretty good filter for figuring out which samples need to be labeled for active learning. We also combine this with a little bit of clustering, using each sample's distance to the clusters as an additional signal. How much importance each of these components ends up getting depends on the problem (lots of trial and error). I wish I could give you something more rigorous, but that is pretty much all I have figured out so far.
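To make that a bit more concrete, here is a rough sketch of how those pieces could be combined into a single ranking for active learning. The function name, the 50/50 default weights, and the scikit-learn-style interface are all illustrative; as mentioned above, the actual weighting comes down to trial and error:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def rank_for_labeling(models, X_unlabeled, X_labeled,
                      w_disagreement=0.5, w_distance=0.5):
    """Score unlabeled samples by mixing ensemble disagreement with
    distance to the already-labeled data; higher score = label sooner."""
    # Disagreement: spread of the ensemble's predictions per sample
    preds = np.stack([m.predict(X_unlabeled) for m in models])
    disagreement = preds.std(axis=0)

    # Coverage: distance from each unlabeled sample to its nearest labeled one
    dist = pairwise_distances(X_unlabeled, X_labeled).min(axis=1)

    def minmax(v):  # put both signals on a comparable [0, 1] scale
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    score = w_disagreement * minmax(disagreement) + w_distance * minmax(dist)
    return np.argsort(score)[::-1]  # most informative samples first
```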

author

I see, that makes sense. K-fold ensembling is also a nice additional way of creating neural network ensembles for an approach like this.
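For instance, something along these lines; the `make_model` factory and the fit/predict interface are just placeholders, and the resulting models can be plugged into a spread-based uncertainty score like the one sketched above:

```python
from sklearn.model_selection import KFold

def kfold_ensemble(make_model, X, y, n_splits=5, seed=0):
    """Train one model per fold; together the fold models form an ensemble
    whose disagreement can serve as an uncertainty estimate."""
    models = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, _ in kf.split(X):
        model = make_model()                  # fresh model for each fold
        model.fit(X[train_idx], y[train_idx])
        models.append(model)
    return models
```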
