Lightnews — Scholar-powered news

Cara DeCarlo-Wirscham

@caradecarlo.bsky.social

Webcrawler too

November 11, 2025 at 5:50 PM

There!Are!Four!Lights!

@heliophobe.bsky.social

I see your mapquest and raise you a webcrawler

November 11, 2025 at 1:24 AM

Mineral Cup & Ore Cup

@mineralcup.bsky.social

The captcha only activates if it suspects a webcrawler-bot; if you have any cookie crumbs, it won’t pop. That includes just loitering around the site for a little bit.

But weeeeiiiird that it didn’t just display? SquareSpace works in mysterious ways, and breaks in even more mysterious ones. Sorry!

November 8, 2025 at 5:11 AM

Rustin Richtarik 🖖

@dieseldog97.bsky.social

I'd tell my high school self to invest in Google, Netflix and Amazon.

Sure, high school self has no idea what those things are and I'm too busy using Webcrawler, but it'll be worth it.

Jonathan Edward Durham @thisone0verhere.bsky.social · 4d

I read this and immediately thought “well that would actually be awesome because then Marty can go back to the 90s where he probably had a really awesome time as a teenager” and then I was like oh wait his PARENTS would have been teenagers in the 90s and then I just sat here for like 45 minutes

November 8, 2025 at 3:06 AM

The Aarogerigegege

@aao.bsky.social

Every time I search for anything now, I have to put absolutely every single word or phrase in separate quotation marks to get anything remotely relevant. This takes me back 30 years to the pre-Google search engines like AltaVista or Webcrawler, where you had to use booleans and whatnot

November 7, 2025 at 5:19 PM

Tobias Frauscher

@softwarerat.com

Reminds me of the aggressive webcrawlers that are the reason people have to write or deploy things like Anubis (idk if I've got the name right)

November 4, 2025 at 10:58 AM

ExoVonDus

@exovondus.bsky.social

- acquirers, from what I know, don't like to gamble with what is legal or not. They'll use keywords and detection done by webcrawler, and if there is a doubt, they'll kick the content. That's mostly what happened to Itchio, from what I understood: a ban of legal content due to fear of potential penalties.

November 3, 2025 at 7:54 PM

M15c4t0n1c's Meltdown Monitor

@m15c4t0n1c.bsky.social

Anyone who wants to stay informed about Brienz will find a webcrawler with the latest reports here:

rss.app/embed/v1/lis...

October 31, 2025 at 1:09 AM

Ruth Adar

@rruth.bsky.social

For those of us who remember Webcrawler, there's a chuckle.

October 29, 2025 at 2:55 AM

Niko Stratis

@nikostratis.com

it's an indictment of how financially challenging it is to run an outlet in the modern era that some places have to rely on design that feels like I'm browsing on WebCrawler in the computer lab at school in order to subsidize the good work they do.

October 27, 2025 at 2:01 PM

Mishell Baker thankfully

@mishellbaker.bsky.social

I've found I have to put a little more effort into getting the right results with Mojeek, but I'm trying it out because it's the only one that's not just piggybacking off Google or Microsoft and has its own webcrawler.

October 24, 2025 at 10:44 PM

Stoats

@st0ats.bsky.social

Mojeek is also an option for a search engine with its own webcrawler. Can’t speak to its use though, I gave up on it pretty quick. They made a visualization some years ago: www.searchenginemap.com

Ben Hart, Gender Archivist 🏳️‍⚧️ @pavlovforgoths.com · 22d

here's how you remove the AI from the latest Firefox update

October 21, 2025 at 6:25 PM

Acorn Baden

@acornbaden.bsky.social

CCC submissions are the smallest kinda data they get. They mostly use a webcrawler to get news articles or they gather social media posts. We've got one of the only crowdsourced crowd counts for yesterday. What did you do to try to get an effective or sufficient or non sus crowd count for yesterday?

October 19, 2025 at 5:07 PM

Jens

@jdede.norden.social.ap.brid.gy

Why has the polite notice in robots.txt, asking the ubiquitous webcrawlers to limit data collection on one's own website, had its day?

How are the AI players turning the web from a collaborative space into a pure extraction zone for information? […]

[Original post on norden.social]

A symbolic image of a robots.txt with the following fictional content:

User-agent: Crawler1
Disallow: /

User-agent: Crawler2
Disallow: /

User-agent: *
Disallow: /default.html
Disallow: /tmp
Disallow: /private/index.html
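
To make the fictional rules above concrete, here is a minimal sketch of how a crawler that chooses to honor robots.txt might read them, using Python's standard `urllib.robotparser`. The host `example.com`, the agent name `SomeBot`, and the helper `build_parser` are placeholders invented for illustration; they do not appear in the post. The file is only a request, so this shows what a compliant crawler would conclude, not what any real crawler does.

```python
from urllib.robotparser import RobotFileParser

# The fictional robots.txt from the post, reproduced as-is.
ROBOTS_TXT = """\
User-agent: Crawler1
Disallow: /

User-agent: Crawler2
Disallow: /

User-agent: *
Disallow: /default.html
Disallow: /tmp
Disallow: /private/index.html
"""


def build_parser(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder host; not taken from the post.
BASE = "https://example.com"

# (user agent, path) pairs to check against the rules.
checks = [
    ("Crawler1", "/"),                    # named agent, whole site disallowed
    ("SomeBot", "/index.html"),           # falls under "*", no matching rule
    ("SomeBot", "/tmp/cache.html"),       # falls under "*", prefix /tmp matches
    ("SomeBot", "/private/index.html"),   # falls under "*", exact path listed
    ("SomeBot", "/private/other.html"),   # falls under "*", not listed
]

for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, BASE + path) else "disallowed"
    print(f"{agent:8} {path:22} {verdict}")
```

The rules are matched by path prefix, which is why `/tmp/cache.html` is covered by `Disallow: /tmp` while `/private/other.html` is not covered by `Disallow: /private/index.html`.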

October 16, 2025 at 8:45 PM

The Smartest Floridian

@stryc9fuego.bsky.social

Makes me wonder if it could be found with a different, older, less money-focused browser, like Webcrawler or something...

October 13, 2025 at 3:09 PM

Lord David of Bell-End

@mungobrown.bsky.social

First time I ever used the internet was in 1994 at uni. They had one room on campus with connected PCs.

I remember Ask Jeeves and Webcrawler, that’s about it.

October 12, 2025 at 1:29 PM

Mike Smith

@doodlebags.bsky.social

The rendered webcrawler swinging by #spiderman

October 2, 2025 at 10:11 PM

Michael Bench-Capon

@mikebenchcapon.bsky.social

Sometimes I imagine a search engine which has a vetting process for its webcrawler instead of just including everything it can lay its hands on

October 2, 2025 at 2:31 PM

The Smartest Floridian

@stryc9fuego.bsky.social

Tech company: "we're introducing [stupid bullshit] into your favorite app!"
Me: open webcrawler: >how to permanently disable [stupid bullshit]...

October 1, 2025 at 6:41 PM

Ed Wilson

@eddwilson.bsky.social

The first web search I remember doing (on my Lynx text browser, presumably using Webcrawler) was "Thomas Aquinas". I think the article came from Michigan university.

David Thompson @davesthompson.bsky.social · Sep 28

Fascinating @theguardian.com piece here from Tim Berners-Lee @timbl.bsky.social about where the Internet goes from here...
www.theguardian.com/technology/2...

Why I gave the world wide web away for free | Tim Berners-Lee

My vision was based on sharing, not exploitation – and here’s why it’s still worth fighting for

September 28, 2025 at 4:21 PM

Where Chuck once stood, only I will remain.

@chux0r.infosec.exchange.ap.brid.gy

Cannot believe that search bias is still the fucking problem.

I mean, in the early aughts it was "algorithm bias" and now it's "intentional bigcorp, bigtech, and political bias". Having seen both, I'd happily go back to sifting through random Webcrawler or AltaVista results.

They were flawed […]

[Original post on infosec.exchange]

September 23, 2025 at 4:25 PM

Admiral of Aotearoa™️

@muskeyb8927.bsky.social

They think current AI is bordering on Skynet, when it’s really just a webcrawler and database scraper that’s great at plagiarism and getting math wrong.

September 18, 2025 at 11:44 PM