Common Crawl Foundation
@commoncrawl.bsky.social
310 followers 52 following 63 posts
Common Crawl is a non-profit foundation dedicated to the Open Web.
Reposted by Common Crawl Foundation
wmdqs.bsky.social
If you were able to join us, let us know about your experience: docs.google.com/forms/d/e/1F...
Reposted by Common Crawl Foundation
wmdqs.bsky.social
Thank you everyone for coming to WMDQS (pronounced "whim ducks")!
Reposted by Common Crawl Foundation
wmdqs.bsky.social
WMDQS is underway! Come join us in Room 520A at @colmweb.org! #COLM2025
Reposted by Common Crawl Foundation
juliakreutzer.bsky.social
Looking forward to tomorrow's #COLM2025 workshop on multilingual data quality! 🤩
wmdqs.bsky.social
In collaboration with @commoncrawl.bsky.social, MLCommons, and @eleutherai.bsky.social, the first edition of WMDQS at @colmweb.org starts tomorrow in Room 520A! We have an updated schedule on our website, including a list of all accepted papers.
commoncrawl.bsky.social
Common Crawl’s Web Languages initiative has had many contributions since its introduction. We’re calling for native speakers of certain languages to review language contributions, to ensure that links we’re adding to our seed crawl are of good quality.

commoncrawl.org/blog/web-lan...
Common Crawl - Blog - Web Languages Needing Review by Native Speakers
commoncrawl.bsky.social
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of July, August, and September 2025. The host-level graph consists of 628.7 million nodes and 6.9 billion edges, and the domain-level graph consists of 184.6 million nodes and 5.4 billion edges.
Common Crawl - Blog - Host- and Domain-Level Web Graphs July, August, and September 2025
commoncrawl.bsky.social
The era of traditional search engine optimization is rapidly evolving into "AIO" (AI optimization), where businesses must ensure their content exists in AI training datasets to remain discoverable as users increasingly turn to AI assistants for answers.

commoncrawl.org/blog/from-se...
Common Crawl - Blog - From SEO to AIO: Why Your Content Needs to Exist in AI Training Data
commoncrawl.bsky.social
Publishers and brands are shifting from SEO to AIO. Many SEOs unknowingly block their sites from AI search by restricting CCBot in robots.txt. As Search 2.0 transforms discovery, ensuring content can train AI models becomes as crucial as traditional SEO.

commoncrawl.org/blog/ai-opti...
Common Crawl - Blog - AI Optimization Is Here: Are You Ready for Search 2.0?
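For readers who want to check whether a site's robots.txt restricts CCBot, as the post above describes, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `ccbot_allowed`, the sample robots.txt bodies, and the `example.com` URLs are illustrative assumptions, not taken from the linked blog post.

```python
from urllib.robotparser import RobotFileParser


def ccbot_allowed(robots_txt: str, url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text lets CCBot fetch `url`.

    `ccbot_allowed` is a hypothetical helper written for illustration.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("CCBot", url)


# Hypothetical robots.txt that blocks CCBot from the whole site.
blocking = """User-agent: CCBot
Disallow: /
"""

# Hypothetical robots.txt with no CCBot-specific rule; only /private/ is off limits.
permissive = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print("blocking rules allow CCBot:", ccbot_allowed(blocking))
    print("permissive rules allow CCBot:", ccbot_allowed(permissive))
```

To test a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the robots.txt before `can_fetch()` is called.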