Jake Thomas
@jakthom.bsky.social
Messing around with boats and databases.

Sometimes data systems, sometimes security, sometimes ai/ml, sometimes a blend of it all.
@jackhcable.bsky.social discussing product security in the AI era 🔥
April 27, 2025 at 7:24 PM
@ethanrosenthal.com speaking 🔥
April 24, 2025 at 12:17 AM
NYC Yellow Taxi Trips 2024 is live in the hive 🐝 🍯

(includes all months currently available from nyc.gov)
December 1, 2024 at 10:20 PM
Foursquare places data is live in the hive 🐝 🍯

@hachej.bsky.social @seifert.blue
November 30, 2024 at 12:04 AM
Gothic arches are so stinkin' pretty
November 22, 2024 at 5:00 PM
@bsky.app is currently running at:

~28k follows per minute (p99)
~37k likes per minute (p99)
~5k reposts per minute (p99)
~350 signups per minute (p99)

#databs from a @prometheus.io exporter, running in a @github.com codespace:
November 21, 2024 at 3:58 PM
abuse.ch malware feed is now available in the hive:

attach 'https://hive.buz.dev/abuse_ch/catalog' as abuse_ch;

select * from abuse_ch.malware limit 10;
November 19, 2024 at 1:40 PM
Do the same with 2 lines of SQL:
November 19, 2024 at 5:41 AM
ipinfo datasets are now available (for free!) in hive.buz.dev!

From a @duckdb.org session, running:

attach 'https://hive.buz.dev/ipinfo/catalog' as ipinfo;

will load the following tables:

- asn
- country
- country_asn

ip-enrich to your ❤️'s content...
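
A minimal sketch of a first look at those tables, assuming only the attach statement and table names from this post. Column names aren't listed here, so the sketch inspects the schema before querying anything specific:

```sql
-- Attach the remote catalog (statement taken from the post above).
attach 'https://hive.buz.dev/ipinfo/catalog' as ipinfo;

-- Column names are not given in the post, so look at the schema first.
describe ipinfo.country_asn;

-- Peek at a few rows, same pattern as the abuse_ch example.
select * from ipinfo.country_asn limit 10;
```

The same two steps should apply to ipinfo.asn and ipinfo.country.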
November 18, 2024 at 8:32 PM
These represent the same data at all times now.

If you want the alternative tables (checkpoint tbl and soon-to-be-pre-modeled views) use the remote catalog 😉
November 18, 2024 at 7:23 PM
Alright folks, new changes are live:

1. Catalog (along w/ underlying data) is now updated every ~5min. The system also purges every 500k records, so if volume goes 📈 this time goes 📉

2. There's a secondary table in the catalog (bluesky.checkpoint) which represents the most current time_us
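
A rough sketch of checking that checkpoint from DuckDB. The attach URL below is an assumption: it follows the pattern of the other hive catalogs and is not confirmed in this post. Only the table name bluesky.checkpoint comes from the post itself:

```sql
-- ASSUMPTION: catalog URL guessed from the abuse_ch / ipinfo pattern.
attach 'https://hive.buz.dev/bluesky/catalog' as bluesky;

-- The checkpoint table is described as holding the most current time_us.
select * from bluesky.checkpoint;
```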
November 18, 2024 at 7:19 PM
November 18, 2024 at 2:09 PM
lol well that escalated quickly....
November 17, 2024 at 4:58 PM
TIL what happens when you smash cdn-powered object storage and wee little databases together 🤯🤯🤯🤯🤯🤯

thanks @cloudflare.social
November 17, 2024 at 4:55 PM
I like this. I like this a lot.

tl;dr:

full @bsky.app jetstream feed

landing in @cloudflare.social R2 (and available to you!)

accessible using two lines of @duckdb.org sql
November 17, 2024 at 8:24 AM
it's definitely neat data
November 15, 2024 at 5:45 AM
but also streaming is kinda unnecessary...
November 15, 2024 at 5:20 AM
Forget data warehousing! Let's instead....

run a @prometheusio.bsky.social metrics exporter

powered by @duckdb.org

in a @github.com codespace

backed by @bsky.app api data

to calculate @duckdb.org @bsky.app profile stats on the fly (in 250ms)

Using sql

github.com/jakthom/herc...
November 4, 2024 at 10:23 PM