How do ML practitioners select hyperparameters, architectures, etc. for self-supervised representation learning when the loss is non-monotonic? [D]
📰 Reddit r/MachineLearning
Learn to select hyperparameters and architectures for self-supervised representation learning with non-monotonic loss functions
Action Steps
- Apply non-contrastive SSL methods like BYOL, JEPA, or data2vec to your dataset
- Evaluate linear probe or KNN results during training to assess transfer learning
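- Track RankMe, a label-free measure of the effective rank of the embeddings, as a proxy for representation quality when the training loss is non-monotonic
- Tune hyperparameters against an evaluation signal such as probe accuracy or RankMe rather than against the SSL training loss
- Compare architectures using the same evaluation signal to find the best one for your task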
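The probe step above can be sketched as a small k-NN evaluation on frozen embeddings. This is a minimal illustration, not the method from the post: the function name `knn_probe_accuracy`, the choice of cosine similarity, `k=5`, and the synthetic two-class data are all assumptions made for the example. In practice the embeddings would come from the encoder checkpoint being evaluated.

```python
import numpy as np

def knn_probe_accuracy(train_z, train_y, val_z, val_y, k=5):
    """Top-1 accuracy of a cosine-similarity k-NN classifier on frozen embeddings."""
    # L2-normalize rows so the dot product equals cosine similarity.
    train_z = train_z / np.linalg.norm(train_z, axis=1, keepdims=True)
    val_z = val_z / np.linalg.norm(val_z, axis=1, keepdims=True)
    sims = val_z @ train_z.T                        # shape (n_val, n_train)
    # Indices of the k most similar training points for each validation point.
    nearest = np.argsort(-sims, axis=1)[:, :k]
    neighbor_labels = train_y[nearest]              # shape (n_val, k)
    # Majority vote: count how many neighbors fall in each class.
    n_classes = int(train_y.max()) + 1
    votes = (neighbor_labels[:, :, None] == np.arange(n_classes)).sum(axis=1)
    preds = votes.argmax(axis=1)
    return float((preds == val_y).mean())

# Hypothetical stand-in for encoder outputs: two well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = rng.integers(0, 2, size=400)
z = centers[labels] + rng.normal(scale=0.5, size=(400, 2))

acc = knn_probe_accuracy(z[:300], labels[:300], z[300:], labels[300:], k=5)
print(f"k-NN probe accuracy: {acc:.3f}")
```

If probe accuracy is used to pick checkpoints or hyperparameters, one way to limit the researcher-degrees-of-freedom concern raised in the post is to keep a separate test split that is only evaluated once, after selection is finished.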
Who Needs to Know This
ML practitioners and researchers who train self-supervised models and need a way to evaluate them when the training loss is not a reliable guide.
Key Insight
💡 The training loss of non-contrastive SSL methods can be non-monotonic and says little about representation quality; label-free metrics like RankMe, alongside linear probe or KNN evaluation, can guide model selection instead
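A minimal sketch of a RankMe-style score, assuming it is computed as the exponential of the Shannon entropy of the normalized singular values of an embedding matrix. The function name `rankme`, the `eps` value, and the synthetic "healthy" and "collapsed" embeddings are assumptions for illustration and do not come from the post.

```python
import numpy as np

def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix."""
    # Singular values of the embedding matrix.
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize to a distribution; eps guards against log(0).
    p = singular_values / (singular_values.sum() + eps) + eps
    # Exponential of the entropy: close to dim when the spectrum is flat,
    # close to 1 when a single direction dominates.
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
# Hypothetical healthy embeddings: variance spread over all 32 dimensions.
full_rank = rng.normal(size=(512, 32))
# Hypothetical collapsed embeddings: every row lies along one direction.
collapsed = np.outer(rng.normal(size=512), rng.normal(size=32))

score_full = rankme(full_rank)
score_collapsed = rankme(collapsed)
print(f"full-rank: {score_full:.2f}, collapsed: {score_collapsed:.2f}")
```

Because the score needs no labels, it can be logged on a fixed batch of held-out inputs at each checkpoint and compared across runs, which is useful precisely when the loss curve itself is hard to interpret.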
Full Article
Non-contrastive SSL methods like BYOL/JEPA/data2vec seem promising, but I have no idea what is being learned, or how well; it’s models all the way down. Maybe I’ve got supervised tasks for which I’d like to see transfer, and I can evaluate linear probe/KNN results during training, but that seems like a way to efficiently abuse researcher degrees of freedom. I know RankMe is meant to help address this: embed s
DeepCamp AI