Fascinating stuff.
A high-resolution dataset of roads of the Roman Empire: this study claims to have identified thousands more miles of Roman roads than were previously known.
www.nature.com/articles/s41...
Itiner-e: A high-resolution dataset of roads of the Roman Empire - Scientific Data
www.nature.com
November 8, 2025 at 9:05 PM
This will spark some great papers: an open digital dataset of roads in the Roman Empire
www.nature.com/articles/d41...
‘Google Maps’ for Roman roads reveals vast extent of ancient network
A high-resolution digital map nearly doubles the known length of the ancient road network.
www.nature.com
November 9, 2025 at 1:38 PM
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...
November 10, 2025 at 5:30 PM
“authors & publishers who filed a lawsuit against the Sam Altman-led firm have secured access to internal Slack messages… discussing the mass deletion of a pirated books dataset… A NY district court ordered OpenAI to hand over the communications regarding data deletion”
futurism.com/artificial-i...
OpenAI in Danger After Authors Suing It Gain Access to Its Internal Slack Messages
Authors and publishers, who are suing OpenAI, secured access to internal Slack messages and emails discussing the deletion of pirated books.
futurism.com
November 6, 2025 at 7:57 AM
Researchers used archaeological & historical records, topographic maps & satellite imagery to create Itiner-e, a new high-resolution digital dataset & map of the Roman Empire’s roads around 150 AD. It charts 185,896 miles (299,171 km) of roads across almost 1,544,409 square miles (4,000,000 square km)
The Roman Empire’s Entire Road Network Just Got Mapped, and It's Mind-Blowing
A new study identified over 68,000 more miles of ancient Roman roads than were previously known.
gizmodo.com
November 8, 2025 at 5:00 PM
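A quick sanity check on the unit conversions quoted in the post above, as a minimal sketch. The function names are made up for illustration; the only outside fact used is the international mile definition of exactly 1.609344 km. It also suggests that the very precise-looking square-mile figure is just a round 4,000,000 square km converted, so it should not be read as a measurement to the nearest square mile.

```python
# Exact by definition of the international mile.
KM_PER_MILE = 1.609344


def km_to_miles(km: float) -> float:
    """Convert a length in kilometres to miles."""
    return km / KM_PER_MILE


def sq_km_to_sq_miles(sq_km: float) -> float:
    """Convert an area in square kilometres to square miles."""
    return sq_km / KM_PER_MILE ** 2


# Figures quoted in the post.
road_miles = km_to_miles(299_171)
area_sq_miles = sq_km_to_sq_miles(4_000_000)

print(round(road_miles))     # total road length in miles
print(round(area_sq_miles))  # covered area in square miles
```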
It's a statistical mirage. The 10th percentile spiked because millions of the lowest-paid workers got wiped out of the dataset in 2020. If you delete the lowest rungs of a ladder, the ‘bottom’ suddenly looks higher.
November 10, 2025 at 11:00 AM
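The ladder argument in the post above can be sketched with made-up numbers. Everything here is hypothetical: the wage values, the 20% cut, and the nearest-rank percentile definition are assumptions chosen for illustration, not the method or data behind the statistic being discussed. The point is only that the 10th percentile can jump even though no remaining worker earns a cent more.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, kept at 1 or above.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical hourly wages for 100 workers: 8.0, 8.5, ..., 57.5.
wages = [8 + 0.5 * i for i in range(100)]

before = percentile(wages, 10)

# The 20 lowest-paid workers drop out of the dataset; nobody gets a raise.
survivors = sorted(wages)[20:]
after = percentile(survivors, 10)

print(before, after)
```

With these assumed numbers the reported "bottom" rises purely because of who is left in the sample, which is the composition effect the post describes.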
😔
"even the lower end of the predicted range for 2025 would result in a 3-year average above 1.5°C, according to the ERA5 dataset. This would be the first time that this has occurred in the instrumental period."
@ecmwf.int
@copernicusecmwf.bsky.social
November 8, 2025 at 11:41 PM
It's wild that there's omegaverse "AI" dataset training in the Cuomo campaign
November 5, 2025 at 10:20 PM
Embark Studio already pulled the same "It's Ethical" excuse in their previous game (The Finals)
Even if they applied a dataset of their own actors (with consent), the software was still trained on stolen work originally.
GenAI cannot be ethical. The software is rotten by default.
November 9, 2025 at 1:00 AM
Grok was evidently trained on a dataset containing a lot of Tokyo Ghoul fans
Elon has never experienced this and neither have his biggest fans but now they can create these visuals with the help of a computermachine
November 8, 2025 at 5:02 PM
Apparently a high-resolution dataset of the roads of the Roman Empire.
Nice.
Itiner-e: A high-resolution dataset of roads of the Roman Empire | Scientific Data
www.nature.com/articles/s41...
Itiner-e: A high-resolution dataset of roads of the Roman Empire - Scientific Data
www.nature.com
November 8, 2025 at 3:39 PM
8 months ago I had a vision: what if we had a unified, track-based dataset of all severe events?
Still a lot of improvements I can make to the algorithm itself, but otherwise, looks like we have one now. I can't wait to see what we can do with this 👀
November 7, 2025 at 7:42 PM
We are looking for #rstats community feedback on 3 new dplyr functions!
We're aiming to expand the `filter()` family:
- `filter()` to keep rows
- `filter_out()` to drop rows
- `when_any()` and `when_all()` as modifiers
Read more and leave feedback here:
github.com/tidyverse/ti...
November 7, 2025 at 4:03 PM
“We conduct the largest pretraining poisoning experiments to date, pretraining models from 600M to 13B parameters on chinchilla-optimal datasets (6B to 260B tokens). We find that 250 poisoned documents similarly compromise models across all model and dataset sizes…”
arxiv.org/abs/2510.07192
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversar...
arxiv.org
November 7, 2025 at 12:39 PM
Itiner-e: A high-resolution dataset of roads of the Roman Empire | Scientific Data
www.nature.com/articles/s41...
Itiner-e: A high-resolution dataset of roads of the Roman Empire - Scientific Data
www.nature.com
November 7, 2025 at 7:06 PM
New dataset: sea ice drift, tracked daily. Discover daily gridded data from satellite sensors showing how Arctic and Antarctic ice moves. Essential for tracking climate change impacts.
Access the new dataset: cds.climate.copernicus.eu/datasets/sat...
⬇️
November 6, 2025 at 10:22 AM
OpenAI pirated large numbers of books and used them to train models.
OpenAI then deleted the dataset with the pirated books, and employees sent each other messages about doing so.
A lawsuit could now force the company to pay $150,000 per book, adding up to billions in damages.
November 4, 2025 at 3:30 PM
Oh fun, when you export a Google Doc (it seems to be limited to that), it replaces all the links you put there with redirects that Google can use to track link use. On an isolated document that's already not great, but with the complete dataset they have, you can build many relation graphs.
November 8, 2025 at 8:55 PM
Suggest the Queensland parliament suspend blame here & sensibly discuss the solutions. How can our state reduce the two biggest crimes in the ABS dataset: DV & sexual assault? What preventative measures can our state take? It's a house of debate, not blame. #qldpol
Queenslanders have said there's less police presence on our roads. Data backs that up. Officers are stretched. Prior to the election the Commissioner said they were attending large volumes of DV cases. Crisafulli has them on youth crime now as well. #qldpol
www.abc.net.au/news/2025-11...
Police enforcement on Qld roads drops almost 50pc in five years
Queensland Police concedes less time is being spent enforcing road safety than before the COVID pandemic, amid a rising number of traffic tragedies.
www.abc.net.au
November 9, 2025 at 12:25 AM
Can LLMs accurately aggregate information over long, information-dense texts? Not yet…
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
November 7, 2025 at 5:07 PM
🎨 Recording now live!
Developing Your #DataVisualisation Skills in #ggplot2 with @nrennie.bsky.social
Watch the creative process exploring the #TidyTuesday cats dataset 🐈
📚 Materials & links:
🔗 nrennie.rbind.io/talks/rladies-rome-ggplot2
#RLadies #ggplot2 #TidyTuesday @fgazzelloni.bsky.social
Developing your data visualisation skills in ggplot2 – Nicola Rennie
We’ll be working through the process of developing a visualisation in ggplot2. Some knowledge of the tidyverse will be helpful, but all of the code in this session will be explained and available…
nrennie.rbind.io
November 8, 2025 at 8:54 PM
Roman road network was twice as large as previously thought, new mapping project finds
Paper: Itiner-e: A high-resolution dataset of roads of the Roman Empire
doi.org/10.1038/s415...
Roman road network was twice as large as previously thought, new mapping project finds
The new digital map increases the Roman road network by nearly 100%.
www.livescience.com
November 7, 2025 at 3:21 AM
A new high-resolution digital dataset and map, named Itiner-e, of roads throughout the Roman Empire around the year 150 CE is presented in research published in Scientific Data. The findings increase the known length of the Empire’s road system by over 100,000 kilometres. 🏺 🧪
Itiner-e: A high-resolution dataset of roads of the Roman Empire - Scientific Data
The Roman Empire’s road system was critical for structuring the movement of people, goods and ideas, and sustaining imperial control. Yet, it remains incompletely mapped and poorly integrated across sources despite centuries of research. We present Itiner-e, the most detailed and comprehensive open digital dataset of roads in the entire Roman Empire. It was created by identifying roads from archaeological and historical sources, locating them using modern and historical topographic maps and remote sensing, and digitising them with road segment-level metadata and certainty categories. The dataset nearly doubles the known length of Roman roads through increased coverage and spatial precision, and reveals that the location of only 2.737% are known with certainty. This resource is transformative for understanding how mobility shaped connectivity, administration, and even disease transmission in the ancient world, and for studies of the millennia-long development of terrestrial mobility in the region.
go.nature.com
November 6, 2025 at 11:21 PM
Well this is fab - Itiner-e – The Digital Atlas of Ancient Roads: the most detailed open digital dataset of roads in the entire Roman Empire. Hasn't quite reached Scotland yet (plenty of Roman roads up here yet to be added), but fun to explore! itiner-e.org
itiner-e
itiner-e.org
November 7, 2025 at 12:39 PM