Lily-belle Sweet
@lilybellesweet.bsky.social
PhD student at UFZ - interested in explainable machine learning, agriculture and food security, compound climate events 🌾
lilybellesweet.bsky.social
So excited to be organising this. Don't forget to register if you'd like to attend! We want to bring together people from different disciplines, so if your background is ML without the Ag, or vice versa, but you're interested in learning a bit about the other side - join us in Leipzig (or online)
lilybellesweet.bsky.social
📢 The 2nd AgML Workshop will be held at @ufz.de Leipzig from November 3–5th! Let's bring together researchers and practitioners in agriculture and machine learning to share knowledge & create open benchmark datasets and model evaluation protocols.
Register by July 31: events.hifis.net/e/agml2025
lilybellesweet.bsky.social
If they aren't to be used for decisions, then what are they for? The NE blog says they'll be used to calculate emissions, plan conservation, make landscape management decisions, and that they mark 'a step change in our ability to make national, regional and local scale plans for England's peatlands'
lilybellesweet.bsky.social
This quote in particular!
lilybellesweet.bsky.social
📢 Tomorrow (27th May) at 16:00 CET, join our next online seminar on knowledge-guided machine learning for crop growth modelling! We will be hearing from Jingye Han, a researcher at Wageningen University. Open to all - get in touch for the Zoom link or join the AgML mailing list.
Knowledge-guided machine learning for crop growth modelling

The process-based crop modeling paradigm originated in the 1960s. Despite their widespread use, crop models are constrained by fixed representations of underlying processes and the difficulty of parameter estimation. Moreover, improvements to these models can typically only be made by a small group of experts, leading to inefficiencies. Data-driven approaches offer the potential to become the next-generation modeling paradigm for crops, enabling greater use of data while reducing reliance on manual effort. This talk will explore data-driven crop growth modeling, with an emphasis on how to incorporate domain knowledge to address overfitting issues caused by sparse data, and will offer perspectives on the future development of data-driven crop models.
lilybellesweet.bsky.social
The peat map issues are in the Times today (@catframpton.bsky.social and I are quoted) - www.thetimes.com/uk/environme...
lilybellesweet.bsky.social
More and more peat map issues identified. If only all ML maps used in academic research got the kind of painstaking sanity-checking that Cat's been doing
catframpton.bsky.social
And I have even found an SSSI bog (brilliantly named Max Bog!), assessed by Natural England and looked after by @the-wildlifetrusts.bsky.social, that is not mapped at all!

But the map shows it's an SSSI and that it's a bog

Yet not peat?
But bogs are by their very nature made of peat, aren't they??
lilybellesweet.bsky.social
Also, this wasn't your question, but for extra context (because the climate change impact of AI has been brought up in the other thread a few times) - the type of model they're using here is really not energy-intensive. I think with this size dataset you could probably train it on your laptop :)
lilybellesweet.bsky.social
A big concern with maps is not just how they are produced, but how their limitations are communicated and what decisions they are used to make. I think the authors did a great job detailing the potential uncertainties of the model. Hopefully the map is used appropriately with those caveats in mind!
lilybellesweet.bsky.social
I don't agree with their statement towards the end that the model could potentially be used outside of the study area, because their evaluation isn't testing performance under those conditions, but for use inside that specific area it looks ok! And it reads like they're not actually planning that
lilybellesweet.bsky.social
... to similar products, and investigate what might be causing different accuracy between locations. Modelling temporal behaviour as well as spatial can be a bit tricky, so hopefully they evaluated that separately and carefully (didn't read that part in enough detail sorry!)
lilybellesweet.bsky.social
I had a quick look! Difference here is that they labelled their own data and carefully sampled the locations randomly over the study area. In this case, and when the model isn't planned to be used outside the study area, it's probably fine. The scores look way more reasonable & they compare them ...
Reposted by Lily-belle Sweet
bartoszbartk.com
The @ufz.de is currently offering funded fellowships for international researchers willing to visit and collaborate with us. I'm involved in this one, with a focus on "Socio-Environmental Modelling of Land and Water Systems": recruitingapp-5128.de.umantis.com/Vacancies/32...
lilybellesweet.bsky.social
Thank you for letting me rant about model evaluation! My colleagues are sick of it by now.

I've never been to Dartmoor but it looks beautiful! Despite having possibly less peat (or in different places) than advertised
lilybellesweet.bsky.social
Yeah, and I am guessing also a lack of experience in applying ML on real, messy datasets? Because seeing an accuracy over 90% on a problem like this, where data quality is probably not super great, should have set off massive alarm bells imo
lilybellesweet.bsky.social
The issue is that the data is super clustered, and they've used random sampling, so they have no idea how good the model is outside of the clusters. Judging from Cat's examples in the thread (& she's not the only one to have spotted errors) it could be pretty bad. And it's being used to guide policy
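To see how badly clustering can mislead a random-split evaluation, here's a small numpy sketch (entirely synthetic data, hypothetical coordinates and scales): it compares how far survey points sit from each other versus how far the rest of the map sits from any survey point. When the second number dwarfs the first, a random train/test split is grading a much easier task than the one the published map actually performs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "survey": points collected in 5 tight clusters,
# then used to predict over the whole 100 x 100 map.
centres = rng.uniform(0, 100, size=(5, 2))
survey = np.vstack([c + rng.normal(0, 1.0, size=(80, 2)) for c in centres])
map_grid = np.stack(np.meshgrid(np.linspace(0, 100, 40),
                                np.linspace(0, 100, 40)), axis=-1).reshape(-1, 2)

# Distance from each survey point to its nearest *other* survey point
d_self = np.linalg.norm(survey[:, None, :] - survey[None, :, :], axis=2)
np.fill_diagonal(d_self, np.inf)
within_survey = d_self.min(axis=1)

# Distance from each map cell to the nearest survey point
d_map = np.linalg.norm(map_grid[:, None, :] - survey[None, :, :], axis=2)
to_nearest_survey = d_map.min(axis=1)

print(f"survey point -> nearest survey point (median): {np.median(within_survey):.1f}")
print(f"map cell     -> nearest survey point (median): {np.median(to_nearest_survey):.1f}")
```

A random split only ever tests at the first, tiny distance; most of the map lives at the second one.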
lilybellesweet.bsky.social
Definitely. And in that case you just have to push for more data collection, or accept that you have no idea how accurate your model is in that location and ideally not publish any predictions there
lilybellesweet.bsky.social
If you train your model on some areas and then test that it works at another area really far away, you can be more confident that it works in the places in-between. Ofc it's a bit more complicated than that, because different places have different landscapes and trees etc, but that's the idea
lilybellesweet.bsky.social
Ideally the data would be distributed more evenly, but often you can't control that. But you can evaluate the model in a tougher way. The trick is to split the data so that the distance between training and test points is similar to the distance to the places you plan to use the model.
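One common way to do that kind of tougher split is to hold out whole spatial clusters at a time, e.g. with scikit-learn's GroupKFold. Here's a minimal sketch on synthetic data (hypothetical "field campaign" clusters with made-up labels, not any real peat data): random CV scores the model on test points right next to its training points, while the grouped split mimics predicting in an unvisited area.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic clustered survey: 5 field campaigns, each with one dominant
# class (hypothetical stand-in for peat / not-peat).
n_clusters, n_per = 5, 100
centres = rng.uniform(0, 100, size=(n_clusters, 2))
cluster_label = rng.integers(0, 2, size=n_clusters)
X = np.vstack([c + rng.normal(0, 1.0, size=(n_per, 2)) for c in centres])
y = np.repeat(cluster_label, n_per)
groups = np.repeat(np.arange(n_clusters), n_per)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Random 5-fold CV: every test point has near neighbours in training
random_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Spatial CV: hold out whole clusters, mimicking prediction in unvisited areas
spatial_scores = cross_val_score(
    model, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"random CV accuracy:  {random_scores.mean():.2f}")
print(f"spatial CV accuracy: {spatial_scores.mean():.2f}")
```

GroupKFold is the blunt version of the idea; distance-buffered splits go further by also removing training points within a chosen radius of the test fold.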
lilybellesweet.bsky.social
This is a great bit of writing from a paper by Meyer & Pebesma, which discusses this in more detail: www.nature.com/articles/s41...
Maps, and in particular global maps, create a strong feeling of satisfaction, suggesting we now know it all. They are however also used, enlarged, torn apart, read in detail, and may form the basis for local decisions of all kinds, or even form the inputs for follow-up models. If a global map does not come with clear instructions about its value, like a prescription for subsequent use, it is easy to abuse it. Wyborn and Evans rightly ask about “what changes are global maps, and their creators, trying to bring about in the world?”, and suggest a re-engagement with empirical studies of local and regional contexts while seeking co-construction with those having local knowledge. The fact that creating global maps of anything nowadays is so easy does not mean these maps are always useful.
lilybellesweet.bsky.social
It's also possible the model has actually learned really useful relationships and performs great everywhere! But unless you test specifically on data far away from data used to train, you have no idea if this is the case. And judging by the mistakes you've identified, it's not...
lilybellesweet.bsky.social
If the model predicts the same value as the closest datapoint in the training set, it will probably achieve really high accuracy (perhaps even 94%...). But as soon as you use the model on data far away from any location it's seen, that won't work any more.
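As a toy illustration of that failure mode (entirely synthetic data, hypothetical labels): a "model" that does nothing but copy the label of the nearest training point scores almost perfectly under a random split of clustered data, while telling you nothing about skill far from the surveyed sites.

```python
import numpy as np

rng = np.random.default_rng(42)

# Clustered survey: six well-separated field sites, each with one label
# (hypothetical stand-in for peat / not-peat).
centres = np.array([[15, 15], [15, 85], [85, 15], [85, 85], [50, 50], [85, 50]])
site_label = np.array([0, 1, 0, 1, 0, 1])
coords = np.vstack([c + rng.normal(0, 1.0, size=(50, 2)) for c in centres])
y = np.repeat(site_label, 50)

def nearest_label(train_xy, train_y, query_xy):
    """'Model' that just copies the label of the nearest training point."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

# Random split: every test point sits inside an already-surveyed cluster,
# so its nearest training point is a few metres away and shares its label.
idx = rng.permutation(len(y))
train, test = idx[:240], idx[240:]
preds = nearest_label(coords[train], y[train], coords[test])
acc = (preds == y[test]).mean()
print(f"random-split accuracy of the nearest-point 'model': {acc:.2f}")
```

The same "model" applied far from any cluster just parrots whichever site happens to be closest, and the random-split score says nothing about how wrong that is.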