How do ML practitioners select hyperparameters, architectures, etc. for self-supervised representation learning when the loss is non-monotonic? [D]

📰 Reddit r/MachineLearning

Learn to select hyperparameters and architectures for self-supervised representation learning with non-monotonic loss functions

advanced Published 24 May 2026

Action Steps

Apply non-contrastive SSL methods like BYOL, JEPA, or data2vec to your dataset
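Evaluate linear probe or kNN results during training to track how well the representations transfer to your supervised tasks
Use RankMe, a label-free effective-rank measure of the embeddings, as a selection signal when the training loss is not informative
Tune hyperparameters against those evaluation signals rather than against the non-monotonic SSL loss
Compare architectures on the same evaluation signals to find the best one for your task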
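
The kNN evaluation step above could look roughly like the sketch below. It assumes embeddings have already been extracted from a frozen encoder checkpoint as NumPy arrays and that labels are non-negative integers. The function name `knn_accuracy`, the cosine similarity metric, and the default `k=5` are illustrative choices, not something the post specifies.

```python
import numpy as np


def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=5):
    """Cosine-similarity kNN probe on frozen embeddings (hypothetical helper)."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)

    # Similarity of every test point to every train point: (n_test, n_train).
    sims = test @ train.T

    # Indices of the k most similar training points for each test point.
    nearest = np.argsort(-sims, axis=1)[:, :k]

    # Majority vote over the neighbours' labels (labels must be ints >= 0).
    neighbor_labels = train_labels[nearest]
    preds = np.array([np.bincount(row).argmax() for row in neighbor_labels])

    return float((preds == test_labels).mean())
```

Calling this on each saved checkpoint gives a curve that can be compared across runs. One common precaution, which is an assumption here rather than advice from the post, is to select on a validation split and report on a separate test split, so that repeated probing does not leak into the final number.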

Who Needs to Know This

ML practitioners and researchers can use this to improve their self-supervised learning models and to evaluate them when the training loss is not a reliable guide.

Key Insight

💡 Non-contrastive SSL methods and tools like RankMe can help address the challenges of non-monotonic loss functions in self-supervised representation learning.
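
For reference, RankMe is usually described as the effective rank of an embedding matrix: the exponential of the Shannon entropy of its normalized singular values. A minimal sketch follows, assuming an `(n_samples, dim)` NumPy array of embeddings. The function name `rankme` and the `eps` default are illustrative assumptions.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix (sketch)."""
    # Singular values only; the singular vectors are not needed.
    singular_values = np.linalg.svd(embeddings, compute_uv=False)

    # Normalize to a distribution over singular values; eps avoids log(0).
    p = singular_values / (singular_values.sum() + eps) + eps

    # exp(entropy) ranges from about 1 (collapsed) up to min(n_samples, dim).
    return float(np.exp(-np.sum(p * np.log(p))))
```

A score near 1 suggests the embeddings have collapsed onto a single direction, while a score near the embedding dimension suggests the dimensions are used more evenly. It is a label-free proxy, so it is best read alongside probe results rather than as a replacement for them.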

Full Article

Non-contrastive SSL methods like BYOL/JEPA/data2vec seem promising, but I have no idea what is being learned, or how well; it’s models all the way down. Maybe I’ve got supervised tasks for which I’d like to see transfer, and I can evaluate linear probe/KNN results during training, but that seems like a way to efficiently abuse researcher degrees of freedom. I know RankMe is meant to help address this: embed s
