I hope you had a successful start to the new year, as did AI and deep learning research. In this edition of Ahead of AI #5, I wanted to showcase recent advancements in computer vision rather than simply covering the increasing popularity of large language models. This newsletter aims to revive ideas and take convolutional neural networks to new heights.

Very clear explanations throughout the article. Thank you

Thanks for the kind words, that's very motivating to hear!

One interesting approach that I have come across for Active Learning is Label Dispersion. It's a good way of quantifying model uncertainty. TL;DR: Have a model predict an input's class a bunch of times. If it gets different outputs each time, your model is unsure. Turns out this works a lot better than relying on the model's predicted confidence.

The idea was introduced in the paper "When Deep Learners Change Their Mind: Learning Dynamics for Active Learning": https://arxiv.org/abs/2107.14707

My breakdown of the paper- https://medium.com/mlearning-ai/evaluating-label-dispersion-is-it-the-best-metric-for-evaluating-model-uncertainty-e4a2b52c7fa1
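To make the TL;DR concrete, here is a minimal sketch of a dispersion score. It assumes the repeated predictions are already collected in an array of shape (T, N): T prediction rounds for N unlabeled samples. As I read the paper, the rounds come from snapshots of the model taken during training, but the same formula applies to any source of repeated predictions. The function name `label_dispersion` and the toy data are illustrative, not taken from the paper's code.

```python
import numpy as np

def label_dispersion(predicted_labels: np.ndarray) -> np.ndarray:
    """Score each sample by how often its predicted label changes.

    predicted_labels: int array of shape (T, N), T prediction rounds for N samples.
    Returns an array of shape (N,): 1 - (count of the most frequent label) / T.
    A score of 0 means the model always predicted the same label.
    """
    n_rounds, n_samples = predicted_labels.shape
    dispersion = np.empty(n_samples)
    for i in range(n_samples):
        _, counts = np.unique(predicted_labels[:, i], return_counts=True)
        dispersion[i] = 1.0 - counts.max() / n_rounds
    return dispersion

# Toy data (made up): 4 prediction rounds for 3 samples.
preds = np.array([
    [0, 1, 2],
    [0, 1, 0],
    [0, 2, 1],
    [0, 1, 2],
])
scores = label_dispersion(preds)

# Samples with the highest dispersion are the first candidates for labeling.
query_order = np.argsort(-scores)
print(scores, query_order)
```

The first sample never changes label, so it scores 0; the third flips the most and would be sent for labeling first.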

This idea works for classification, but I've had success extending it to regression as well. The process is simple: use an ensemble of diverse models, and the spread of their predictions is the uncertainty of your prediction. You can take it a step further and use probabilistic models plus multiple inference passes for more thoroughness.
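A rough sketch of the regression variant, under assumptions of my own choosing: the "diverse models" here are polynomial fits of different degrees trained on bootstrap resamples, and the uncertainty is the standard deviation of their predictions. The data, the degrees, and the helper names `fit_ensemble` and `predict_with_spread` are all hypothetical; any set of sufficiently different regressors could stand in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (made up): a noisy sine wave on [-1, 1].
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = np.sin(3.0 * x_train) + rng.normal(0.0, 0.1, size=200)

def fit_ensemble(x, y, degrees=(1, 2, 3, 5), n_bootstrap=5):
    """Fit one polynomial per (degree, bootstrap resample) pair."""
    members = []
    for degree in degrees:
        for _ in range(n_bootstrap):
            idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
            members.append(np.polyfit(x[idx], y[idx], deg=degree))
    return members

def predict_with_spread(members, x):
    """Return the ensemble mean and the per-point standard deviation."""
    preds = np.stack([np.polyval(coefs, x) for coefs in members])  # (M, N)
    return preds.mean(axis=0), preds.std(axis=0)

members = fit_ensemble(x_train, y_train)

# One query inside the training range, one far outside it.
x_query = np.array([0.0, 3.0])
mean, spread = predict_with_spread(members, x_query)
print(mean, spread)
```

The members roughly agree inside the training range and diverge far outside it, so the spread is larger for the second query point. That is the signal used to decide which samples are worth labeling.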

Thanks for sharing. I wonder if you could elaborate on "TL;DR- Have a model predict an input's class a bunch of times" a bit. Why would a model give different labels for the same input each time? Are you including sources of randomness like Dropout in your model?

It's been a while since I went through the paper, but IIRC, it does reference Bayesian methods as part of its strategy.

I can explain how I go through the process. Dropout and other sources of randomness are a big part of it. I also use an ensemble (and lots of probabilistic models) for the final outcomes. It works really well for both classification and regression, and acts as a pretty good filter for figuring out which samples need to be labeled for active learning. We also combine this with a little bit of clustering, using each sample's cluster distance as an extra signal. How much weight each of these components gets ends up depending on the problem (lots of trial and error). I wish I could give you something more rigorous, but that is pretty much all I have figured out so far.
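To illustrate how dropout alone can make the same input produce different labels, here is a toy sketch in plain NumPy. It is not the setup described above: the network is a hypothetical, untrained two-layer classifier with random weights, and the only point is that keeping dropout active at inference time turns every forward pass into a slightly different model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained classifier: 4 input features, 16 hidden units, 3 classes.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))
N_CLASSES = 3

def predict_with_dropout(x, drop_prob=0.5):
    """One stochastic forward pass with dropout left ON at inference time."""
    hidden = np.maximum(x @ W1, 0.0)                   # ReLU
    keep_mask = rng.random(hidden.shape) >= drop_prob  # fresh random mask per call
    hidden = hidden * keep_mask / (1.0 - drop_prob)    # inverted-dropout scaling
    logits = hidden @ W2
    return logits.argmax(axis=-1)

x = rng.normal(size=(5, 4))  # 5 made-up input samples
n_passes = 50
labels = np.stack([predict_with_dropout(x) for _ in range(n_passes)])  # (50, 5)

# Fraction of passes that disagree with each sample's most common label.
disagreement = np.array([
    1.0 - np.bincount(labels[:, i], minlength=N_CLASSES).max() / n_passes
    for i in range(x.shape[0])
])
print(disagreement)
```

With a trained network the same loop applies unchanged; samples with high disagreement are the ones the model is least sure about.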

I see, that makes sense. K-fold ensembling is also a nice additional way to create neural network ensembles for an approach like this.
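A small sketch of what k-fold ensembling could look like, with one loud assumption: a linear least-squares model stands in for the neural network so the example stays short and dependency-free. Each member is trained with a different fold held out, so every member sees a slightly different training set. The data and the helper name `kfold_ensemble` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (made up).
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=100)

def kfold_ensemble(X, y, k=5):
    """Train k models, each on the data with a different fold held out."""
    folds = np.array_split(rng.permutation(len(X)), k)
    members = []
    for held_out in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), held_out)
        # Stand-in for "train a neural network on this subset".
        weights, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        members.append(weights)
    return np.stack(members)  # (k, n_features)

members = kfold_ensemble(X, y)

# Predict a few samples with every member; the spread is the uncertainty signal.
preds = X[:4] @ members.T  # (4, k)
mean, spread = preds.mean(axis=1), preds.std(axis=1)
print(mean, spread)
```

A side benefit of this scheme is that each member also has a held-out fold available for validation.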