Common Crawl Foundation
@commoncrawl.bsky.social
Common Crawl is a non-profit foundation dedicated to the Open Web.
commoncrawl.bsky.social
Common Crawl’s Web Languages initiative has had many contributions since its introduction. We’re calling for native speakers of certain languages to review language contributions, to ensure that links we’re adding to our seed crawl are of good quality.

commoncrawl.org/blog/web-lan...
Common Crawl - Blog - Web Languages Needing Review by Native Speakers
commoncrawl.org
commoncrawl.bsky.social
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of July, August, and September 2025. The host-level graph consists of 628.7 million nodes and 6.9 billion edges, and the domain-level graph consists of 184.6 million nodes and 5.4 billion edges.
Common Crawl - Blog - Host- and Domain-Level Web Graphs July, August, and September 2025
commoncrawl.org
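For a rough sense of scale, the node and edge counts quoted in the announcement can be turned into an average number of edges per node. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the inputs are the rounded figures from the post, so the results are approximate.

```python
# Rounded figures as quoted in the web graph announcement.
HOST_NODES = 628.7e6
HOST_EDGES = 6.9e9
DOMAIN_NODES = 184.6e6
DOMAIN_EDGES = 5.4e9


def average_edges_per_node(edges: float, nodes: float) -> float:
    """Return the mean number of edges per node (hypothetical helper)."""
    return edges / nodes


host_avg = average_edges_per_node(HOST_EDGES, HOST_NODES)
domain_avg = average_edges_per_node(DOMAIN_EDGES, DOMAIN_NODES)

print(f"host-level:   about {host_avg:.1f} edges per node")
print(f"domain-level: about {domain_avg:.1f} edges per node")
```

On these rounded numbers the domain-level graph is smaller but noticeably denser than the host-level graph.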
commoncrawl.bsky.social
The era of traditional search engine optimization is rapidly evolving into "AIO" (AI optimization), where businesses must ensure their content exists in AI training datasets to remain discoverable as users increasingly turn to AI assistants for answers.

commoncrawl.org/blog/from-se...
Common Crawl - Blog - From SEO to AIO: Why Your Content Needs to Exist in AI Training Data
commoncrawl.org
commoncrawl.bsky.social
Publishers and brands are shifting from SEO to AIO. Many SEOs unknowingly block their sites from AI search by restricting CCBot in robots.txt. As Search 2.0 transforms discovery, ensuring content can train AI models becomes as crucial as traditional SEO.

commoncrawl.org/blog/ai-opti...
Common Crawl - Blog - AI Optimization Is Here: Are You Ready for Search 2.0?
commoncrawl.org
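To see whether a site's robots.txt restricts CCBot, the rules can be checked with Python's standard `urllib.robotparser`. The following is a minimal sketch, not Common Crawl's own tooling: the helper name `allows_ccbot`, the two sample robots.txt bodies, and the `example.com` URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allows_ccbot(robots_txt: str, url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text lets CCBot fetch `url`.

    Hypothetical helper: it parses robots.txt content supplied as a string,
    so no network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("CCBot", url)


# Sample rules (assumed for illustration): this one blocks CCBot site-wide.
BLOCKING = """\
User-agent: CCBot
Disallow: /
"""

# Sample rules (assumed for illustration): only /private/ is off limits.
PERMISSIVE = """\
User-agent: *
Disallow: /private/
"""

print(allows_ccbot(BLOCKING))
print(allows_ccbot(PERMISSIVE))
print(allows_ccbot(PERMISSIVE, "https://example.com/private/report.html"))
```

To check a live site instead, `RobotFileParser` can be pointed at the site's robots.txt with `set_url()` followed by `read()` before calling `can_fetch()`.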
commoncrawl.bsky.social
"MOIC will also partner with Common Crawl, one of the largest free and open repositories of web crawled data. MOIC will fund work at Common Crawl, leveraging native speakers to annotate and seed European language data in the publicly available Common Crawl data set."
Unlocking data to advance European commerce and culture - Microsoft On the Issues
Microsoft launches 2 initiatives to open Europe’s languages and culture, building on AI, cloud, and digital sovereignty commitments.
blogs.microsoft.com