Lily-belle Sweet
@lilybellesweet.bsky.social
PhD student at UFZ - interested in explainable machine learning, agriculture and food security, compound climate events 🌾
lilybellesweet.bsky.social
So excited to be organising this. Don't forget to register if you'd like to attend! We want to bring together people from different disciplines, so if your background is ML without the Ag, or vice versa, but you're interested in learning a bit about the other side - join us in Leipzig (or online)
lilybellesweet.bsky.social
📢 The 2nd AgML Workshop will be held at @ufz.de Leipzig from November 3–5th! Let's bring together researchers and practitioners in agriculture and machine learning to share knowledge & create open benchmark datasets and model evaluation protocols.
Register by July 31: events.hifis.net/e/agml2025
lilybellesweet.bsky.social
If they aren't to be used for decisions, then what are they for? The NE blog says they'll be used to calculate emissions, plan conservation, make landscape management decisions, and that they mark 'a step change in our ability to make national, regional and local scale plans for England's peatlands'
lilybellesweet.bsky.social
This quote in particular!
lilybellesweet.bsky.social
📢 Tomorrow (27th May) at 16:00 CET, join our next online seminar on knowledge-guided machine learning for crop growth modelling! We will be hearing from Jingye Han, a researcher at Wageningen University. Open to all - get in touch for the Zoom link or join the AgML mailing list.
Knowledge-guided machine learning for crop growth modelling

The process-based crop modeling paradigm originated in the 1960s. Despite their widespread use, crop models are constrained by fixed representations of underlying processes and the difficulty of parameter estimation. Moreover, improvements to these models can typically only be made by a small group of experts, leading to inefficiencies. Data-driven approaches offer the potential to become the next-generation modeling paradigm for crops, enabling greater use of data while reducing reliance on manual effort. This talk will explore data-driven crop growth modeling, with an emphasis on how to incorporate domain knowledge to address overfitting issues caused by sparse data, and will offer perspectives on the future development of data-driven crop models.
lilybellesweet.bsky.social
The peat map issues are in the Times today (@catframpton.bsky.social and I are quoted) - www.thetimes.com/uk/environme...
lilybellesweet.bsky.social
More and more peat map issues identified. If only all ML maps used in academic research got the kind of painstaking sanity-checking that Cat's been doing
catframpton.bsky.social
And I have even found an SSSI bog (brilliantly named Max Bog!), assessed by Natural England and looked after by @the-wildlifetrusts.bsky.social, that is not mapped at all!

But the map shows it's an SSSI and that it's a bog

Yet not peat?
But bogs are by their very nature made of peat, aren't they??
lilybellesweet.bsky.social
Also, this wasn't your question, but for extra context (because the climate change impact of AI has been brought up in the other thread a few times) - the type of model they're using here is really not energy-intensive. I think with this size dataset you could probably train it on your laptop :)
lilybellesweet.bsky.social
A big concern with maps is not just how they are produced, but how their limitations are communicated and what decisions they are used to make. I think the authors did a great job detailing the potential uncertainties of the model. Hopefully the map is used appropriately with those caveats in mind!
lilybellesweet.bsky.social
I don't agree with their statement towards the end that the model could potentially be used outside of the study area, because their evaluation isn't testing performance under those conditions, but for use inside that specific area it looks ok! And it reads like they're not actually planning that
lilybellesweet.bsky.social
... to similar products, and investigate what might be causing different accuracy between locations. Modelling temporal behaviour as well as spatial can be a bit tricky, so hopefully they evaluated that separately and carefully (didn't read that part in enough detail sorry!)
lilybellesweet.bsky.social
I had a quick look! Difference here is that they labelled their own data and carefully sampled the locations randomly over the study area. In this case, and when the model isn't planned to be used outside the study area, it's probably fine. The scores look way more reasonable & they compare them ...
Reposted by Lily-belle Sweet
bartoszbartk.com
The @ufz.de is currently offering funded fellowships for international researchers willing to visit and collaborate with us. I'm involved in this one, with a focus on "Socio-Environmental Modelling of Land and Water Systems": recruitingapp-5128.de.umantis.com/Vacancies/32...
lilybellesweet.bsky.social
Thank you for letting me rant about model evaluation! My colleagues are sick of it by now.

I've never been to Dartmoor but it looks beautiful! Despite having possibly less peat (or in different places) than advertised
lilybellesweet.bsky.social
Yeah, and I am guessing also a lack of experience in applying ML on real, messy datasets? Because seeing an accuracy over 90% on a problem like this, where data quality is probably not super great, should have set off massive alarm bells imo
lilybellesweet.bsky.social
The issue is that the data is super clustered, and they've used random sampling, so they have no idea how good the model is outside of the clusters. Judging from Cat's examples in the thread (& she's not the only one to have spotted errors) it could be pretty bad. And it's being used to guide policy
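To see how badly clustering can mislead a random-split evaluation, here's a small numpy sketch (entirely synthetic data, hypothetical coordinates and scales): it compares how far survey points sit from each other versus how far the rest of the map sits from any survey point. When the second number dwarfs the first, a random train/test split is grading a much easier task than the one the published map actually performs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "survey": points collected in 5 tight clusters,
# then used to predict over the whole 100 x 100 map.
centres = rng.uniform(0, 100, size=(5, 2))
survey = np.vstack([c + rng.normal(0, 1.0, size=(80, 2)) for c in centres])
map_grid = np.stack(np.meshgrid(np.linspace(0, 100, 40),
                                np.linspace(0, 100, 40)), axis=-1).reshape(-1, 2)

# Distance from each survey point to its nearest *other* survey point
d_self = np.linalg.norm(survey[:, None, :] - survey[None, :, :], axis=2)
np.fill_diagonal(d_self, np.inf)
within_survey = d_self.min(axis=1)

# Distance from each map cell to the nearest survey point
d_map = np.linalg.norm(map_grid[:, None, :] - survey[None, :, :], axis=2)
to_nearest_survey = d_map.min(axis=1)

print(f"survey point -> nearest survey point (median): {np.median(within_survey):.1f}")
print(f"map cell     -> nearest survey point (median): {np.median(to_nearest_survey):.1f}")
```

A random split only ever tests at the first, tiny distance; most of the map lives at the second one.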
lilybellesweet.bsky.social
Definitely. And in that case you just have to push for more data collection, or accept that you have no idea how accurate your model is in that location and ideally not publish any predictions there
lilybellesweet.bsky.social
If you train your model on some areas and then test that it works at another area really far away, you can be more confident that it works in the places in-between. Ofc it's a bit more complicated than that, because different places have different landscapes and trees etc, but that's the idea
lilybellesweet.bsky.social
Ideally the data would be distributed more evenly, but often you can't control that. But you can evaluate the model in a tougher way. The trick is to split the data so that the distance between training and test points is similar to the distance to the places you plan to use the model.
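One common way to do that kind of tougher split is to hold out whole spatial clusters at a time, e.g. with scikit-learn's GroupKFold. Here's a minimal sketch on synthetic data (hypothetical "field campaign" clusters with made-up labels, not any real peat data): random CV scores the model on test points right next to its training points, while the grouped split mimics predicting in an unvisited area.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic clustered survey: 5 field campaigns, each with one dominant
# class (hypothetical stand-in for peat / not-peat).
n_clusters, n_per = 5, 100
centres = rng.uniform(0, 100, size=(n_clusters, 2))
cluster_label = rng.integers(0, 2, size=n_clusters)
X = np.vstack([c + rng.normal(0, 1.0, size=(n_per, 2)) for c in centres])
y = np.repeat(cluster_label, n_per)
groups = np.repeat(np.arange(n_clusters), n_per)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Random 5-fold CV: every test point has near neighbours in training
random_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Spatial CV: hold out whole clusters, mimicking prediction in unvisited areas
spatial_scores = cross_val_score(
    model, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"random CV accuracy:  {random_scores.mean():.2f}")
print(f"spatial CV accuracy: {spatial_scores.mean():.2f}")
```

GroupKFold is the blunt version of the idea; distance-buffered splits go further by also removing training points within a chosen radius of the test fold.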
lilybellesweet.bsky.social
This is a great bit of writing from a paper by Meyer & Pebesma, which discusses this in more detail: www.nature.com/articles/s41...
Maps, and in particular global maps, create a strong feeling of satisfaction, suggesting we now know it all. They are however also used, enlarged, torn apart, read in detail, and may form the basis for local decisions of all kinds, or even form the inputs for follow-up models. If a global map does not come with clear instructions about its value, like a prescription for subsequent use, it is easy to abuse it. Wyborn and Evans rightly ask about “what changes are global maps, and their creators, trying to bring about in the world?”, and suggest a re-engagement with empirical studies of local and regional contexts while seeking co-construction with those having local knowledge. The fact that creating global maps of anything nowadays is so easy does not mean these maps are always useful.
lilybellesweet.bsky.social
It's also possible the model has actually learned really useful relationships and performs great everywhere! But unless you test specifically on data far away from data used to train, you have no idea if this is the case. And judging by the mistakes you've identified, it's not...
lilybellesweet.bsky.social
If the model predicts the same value as the closest datapoint in the training set, it will probably achieve really high accuracy (perhaps even 94%...). But as soon as you use the model on data far away from any location it's seen, that won't work any more.
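As a toy illustration of that failure mode (entirely synthetic data, hypothetical labels): a "model" that does nothing but copy the label of the nearest training point scores almost perfectly under a random split of clustered data, while telling you nothing about skill far from the surveyed sites.

```python
import numpy as np

rng = np.random.default_rng(42)

# Clustered survey: six well-separated field sites, each with one label
# (hypothetical stand-in for peat / not-peat).
centres = np.array([[15, 15], [15, 85], [85, 15], [85, 85], [50, 50], [85, 50]])
site_label = np.array([0, 1, 0, 1, 0, 1])
coords = np.vstack([c + rng.normal(0, 1.0, size=(50, 2)) for c in centres])
y = np.repeat(site_label, 50)

def nearest_label(train_xy, train_y, query_xy):
    """'Model' that just copies the label of the nearest training point."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

# Random split: every test point sits inside an already-surveyed cluster,
# so its nearest training point is a few metres away and shares its label.
idx = rng.permutation(len(y))
train, test = idx[:240], idx[240:]
preds = nearest_label(coords[train], y[train], coords[test])
acc = (preds == y[test]).mean()
print(f"random-split accuracy of the nearest-point 'model': {acc:.2f}")
```

The same "model" applied far from any cluster just parrots whichever site happens to be closest, and the random-split score says nothing about how wrong that is.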