E-commerce metadata provides a rich source of image-to-size pairs because Amazon vendors specify the product's dimensions and provide multiple images of their product in different contexts and with varying camera angles, lighting conditions, etc. This makes it possible to learn metric scale estimation by training on e-commerce data like the Amazon Berkeley Objects dataset.
A from-scratch implementation of ResNet's trained on CIFAR-10/100. Dataloading accelerated by FFCV. Loss landscapes visualized using "filter normalization".
Made using Jon Barron's source code. Long live The Barron!