A Treasure Trove of Plant Images: 96.1M Rows of iNaturalist Research-Grade Data

I recently came across a huge plant image dataset on Reddit: 96.1M rows of iNaturalist Research-Grade plant images, each with a species name, coordinates, license, and more. The best part? It’s been cleaned and packaged as a Hugging Face dataset, which makes it much easier to use in machine learning projects.

The creator of the dataset, /u/Lonely-Marzipan-9473, was working with GBIF (Global Biodiversity Information Facility) data and found it to be messy and difficult to use for ML. They decided to take matters into their own hands and create a more usable dataset.

The dataset is the plant subset of the iNaturalist Research-Grade dataset. Each row includes an image, species name, coordinates, and license, and the data has been filtered to remove broken media. It’s a great resource for anyone looking to test vision models on real-world, noisy data.
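To give a feel for working with rows like these, here is a minimal sketch of a filter that keeps only records with an image link, a species name, an allowed license, and plausible coordinates. The column names (`species`, `latitude`, `longitude`, `license`, `image_url`) and the license strings are my assumptions for illustration, not the dataset’s confirmed schema, so check the dataset card before relying on them.

```python
from typing import Iterable, Iterator

# Hypothetical column names; the real schema may differ (see the dataset card).
SPECIES_KEY = "species"
LAT_KEY = "latitude"
LON_KEY = "longitude"
LICENSE_KEY = "license"
URL_KEY = "image_url"


def has_valid_coordinates(row: dict) -> bool:
    """Return True if the row has numeric latitude/longitude within Earth bounds."""
    lat, lon = row.get(LAT_KEY), row.get(LON_KEY)
    if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
        return False
    # NaN fails both comparisons, so it is rejected here as well.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def usable_rows(rows: Iterable[dict], allowed_licenses: set) -> Iterator[dict]:
    """Yield rows with an image URL, a species name, an allowed license, and valid coordinates."""
    for row in rows:
        if not row.get(URL_KEY) or not row.get(SPECIES_KEY):
            continue
        if row.get(LICENSE_KEY) not in allowed_licenses:
            continue
        if not has_valid_coordinates(row):
            continue
        yield row


# Made-up rows for demonstration only; not taken from the dataset.
SAMPLE_ROWS = [
    {"species": "Quercus robur", "latitude": 51.5, "longitude": -0.1,
     "license": "CC-BY", "image_url": "https://example.org/1.jpg"},
    {"species": "Acer rubrum", "latitude": 123.0, "longitude": 10.0,
     "license": "CC-BY", "image_url": "https://example.org/2.jpg"},
    {"species": "Bellis perennis", "latitude": 48.8, "longitude": 2.3,
     "license": "all-rights-reserved", "image_url": "https://example.org/3.jpg"},
    {"species": "", "latitude": 40.0, "longitude": -74.0,
     "license": "CC0", "image_url": "https://example.org/4.jpg"},
]

kept = list(usable_rows(SAMPLE_ROWS, {"CC0", "CC-BY"}))
print([row["species"] for row in kept])
```

Because the filter is a generator over plain dictionaries, it should also work on a dataset streamed with the Hugging Face `datasets` library (`load_dataset(..., streaming=True)` yields one dict per row), which avoids downloading all 96.1M rows up front.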

The creator also fine-tuned Google’s ViT-Base on 2M data points covering 14k species classes. Both the model and the dataset are available on Hugging Face.

If you’re interested in plant identification or machine learning, this dataset is definitely worth checking out. And if you have any questions or feedback, the creator is happy to hear from you.
