Tim Allison
@tallison314159.bsky.social
Files, search, crawling, security. #ApacheTika among others...
On #ApacheTika, we're migrating all configuration to json for 4.x. If you use tika-server, please join the conversation on runtime configuration:

lists.apache.org/thread/jlt8j...
December 10, 2025 at 10:26 PM
In 4 hours (noon EST), I'm hosting a demo with office hours for #ApacheTika in belated celebration of World Digital Preservation Day #wdpd2025!

www.meetup.com/apache-tika-...

Please dm me for the meeting info.
Apache Tika -- What's New/Office Hours, Thu, Nov 13, 2025, 12:00 PM | Meetup
This will be an expansion of my presentation at the Digital Preservation Bake Off (Tools Demonstration) #iPres2025 and a late entry to celebrate World Digital Preservation
November 13, 2025 at 1:03 PM
Reposted by Tim Allison
Make your 2025 words-of-the-year nominations for the only vote that matters! bit.ly/2025WOTYNOMS
November 6, 2025 at 2:20 PM
Reposted by Tim Allison
New: Google says it has discovered at least 5 malware families that use AI to rewrite their code and generate new capabilities on the fly, suggesting AI-powered malware is finally starting to take off. cloud.google.com/blog/topics/...

Report also has interesting stories about state actors' AI use.
November 5, 2025 at 3:46 PM
If you're attending #iPres2025, make sure to check out @petervwyatt.bsky.social 's tutorial on Monday: "A forensic spotlight on PDF/A"!

twelve.eventsair.com/QuickEventWe...
iPRES 2025 - TUTORIAL 3: A forensic spotlight on PDF/A
October 30, 2025 at 6:42 PM
Reposted by Tim Allison
Is AI fueling the old 'Dead Internet' conspiracy theory?
Yes! AI is building a fake internet just for you.

#ai #psychology #cybersecurity #society #internet

www.toxsec.com/p/ai-is-buil...
The Dead Internet - AI is Building a Fake Internet Just for You
How Generative AI is Fueling the "Dead Internet Theory," Creating an Authenticity Crisis, and Why AI Detection Can't Save Us.
October 28, 2025 at 1:32 PM
In belated celebration of World Digital Preservation Day, I'm throwing a "What's new with Apache Tika/Office hours" meetup: November 13, noon EST.

Everyone interested in files is welcome to join!

#ApacheTika #wdpd2025 #digipres #fileForensics #reverseEngineering

www.meetup.com/apache-tika-...
October 28, 2025 at 1:40 PM
Reposted by Tim Allison
We're officially announcing our speakers for DistrictCon Year 1! Check out our incredible lineup: www.districtcon.org/speakers

This also includes our Day 1 & Day 2 Keynotes from Ian Levy and Dan Ridge.

And don't forget, GA tickets go on sale November 16! See you in January! 🪩
October 27, 2025 at 4:41 PM
Reposted by Tim Allison
It's your responsibility - but how do you even get started fixing search? A blog for Search Product Managers and other search leads thesearchjuggler.com/its-your-res...
It's your responsibility - but how do you even start fixing search? - Charlie Hull - The Search Juggler
How to get started fixing search - looking for zero result searches, low click queries and how to prioritise
October 22, 2025 at 10:12 AM
So, the news for #ApacheTika and #ipres2025: I implemented fully recursive extraction of raw embedded files from the command line.

issues.apache.org/jira/browse/...
October 22, 2025 at 10:28 AM
Reposted by Tim Allison
goddamn is there anything Wikipedia editors can’t do www.nytimes.com/2025/10/17/n...
Wikipedia Volunteers Avert Tragedy by Taking Down Gunman at Conference
October 18, 2025 at 4:53 AM
Reposted by Tim Allison
Everyone tests in production. Some people just don’t know it yet
October 17, 2025 at 2:13 PM
Amazing work, as always, @seeinglogic.bsky.social ! #AIxCC
Want to see the vulnerabilities added for AIxCC?

Interested in seeing how well-known C & Java repositories differ in structure/distribution of code?

Check out the interactive repo visualizer we made for exploring the scale & detail of AIxCC challenges: archive.aicyberchallenge.com/repoviz/
October 12, 2025 at 11:47 AM
And, y, I'm late to the game, but I'm really excited for this course, @softwaredoug.bsky.social !
October 8, 2025 at 2:43 PM
Reposted by Tim Allison
F3: The Open-Source Data File Format for the Future

Packaging WASM code to read an evolving file format with the data. Interesting approach and a good idea to test the sandbox abilities of the execution engine. Also mentions of a lot of alternatives to parquet/ORC.
db.cs.cmu.edu
October 8, 2025 at 12:56 PM
Reposted by Tim Allison
A biological 0-day? Threat-screening tools may miss AI-designed proteins. arstechnica.com/science/2025...
Ordering DNA for AI-designed toxins doesn’t always raise red flags.
October 4, 2025 at 12:13 PM
Reposted by Tim Allison
Anyone in #bugbounty looking to connect?
October 2, 2025 at 7:44 PM
Reposted by Tim Allison
New 7-8B OCR model release from Alibaba. Integrated structures data approach looks promising for specialized use cases with complex visual inputs. huggingface.co/Logics-MLLM/...
September 29, 2025 at 12:01 PM
Reposted by Tim Allison
Tomorrow I'll be talking about vector retrieval, continuing Cheat at Search Essentials. Full details on my blog article

softwaredoug.com/blog/2025/07...
Free course: Cheat at Search Essentials
A free introductory search course for anyone who wants better search without all the hard work
September 25, 2025 at 2:56 PM
Reposted by Tim Allison
📣This #WebArchiveWednesday, plan your proposal for #iipcWAC26, “Sustainable #WebArchiving,” at KBR, Royal Library of Belgium! netpreserve.org/ga2026/CfP

🗓️ Deadline for proposals: OCT 15

#webarchives #DigitalPreservation #DigitalHumanities
September 24, 2025 at 6:48 PM
Reposted by Tim Allison
This Wednesday I'll be discussing how to Cheat at Query Understanding using LLMs with Jason Liu. If you want a taste of "Cheat at Search with LLMs", please come hang out!

maven.com/p/eebe98
Cheating at Query Understanding with LLMs
LLMs transformed query understanding from months-long NLP projects into simple prompting tasks. Students learn practical skills for modern search, RAG, and e-commerce systems. This positions you for h...
September 21, 2025 at 3:05 PM
Reposted by Tim Allison
The annual award ceremony features miniature operas, scientific demos, and 24/7 lectures. www.wired.com/story/say-he...
Say Hello to the 2025 Ig Nobel Prize Winners
September 20, 2025 at 10:06 AM