Lightnews — Scholar-powered news

LLM360

@llm360.bsky.social

1.2K followers 32 following 14 posts

Working on fully open-source LLMs and training data. We believe in community-owned AI.

https://www.llm360.ai

Posts Replies Media Videos

LLM360

@llm360.bsky.social

Building on FineWeb’s global deduplication findings, we introduce a strategic upsampling recipe which outperforms FineWeb using TxT360. Full details are in the Upsampling Experiment section of the release blog.

November 19, 2024 at 10:51 PM

LLM360

@llm360.bsky.social

🪟🛠️LLM360 is committed to making open source AI accessible, transparent, and reproducible.

High-quality data is the first step toward better open source models...and we are excited to join the party contributing the first globally deduplicated dataset containing 5.7T tokens!

November 19, 2024 at 10:51 PM

LLM360

@llm360.bsky.social

📢📢 Check out:

TxT360: a globally deduplicated dataset for LLM pretraining

🌐 99 Common Crawls
📘 14 Curated Sources
👨‍🍳 recipe to easily adjust data weighting and train the most performant models

Dataset:
huggingface.co/datasets/LLM...

Blog:
llm360-txt360.hf.space

Banner image showing the TxT360 project.

November 19, 2024 at 10:51 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news