Tim Allison
@tallison314159.bsky.social
Files, search, crawling, security. #ApacheTika among others...
Reposted by Tim Allison
We're officially announcing our speakers for DistrictCon Year 1! Check out our incredible lineup: www.districtcon.org/speakers

This also includes our Day 1 & Day 2 Keynotes from Ian Levy and Dan Ridge.

And don't forget, GA tickets go on sale November 16! See you in January! 🪩
...even if the PDF is embedded in an email that was then added to a zip file.
We treat PDF's incremental updates as a special type of attachment. This means that with just `java -jar tika-app.jar -Z input-file.zip output_dir`

You'll be able to recover the earlier versions of a PDF if saved with incremental updates...
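
For anyone curious why earlier versions are recoverable at all: an incremental update appends a new body, cross-reference section, and trailer to the end of the file, so each revision normally ends at its own `%%EOF` marker. Here is a minimal sketch of that idea in Python. It is not how Tika implements it; the function name `split_incremental_versions` is made up for illustration, and the approach assumes `%%EOF` never appears inside stream data and that the file is not linearized (linearized PDFs can carry an extra `%%EOF` without any update).

```python
from typing import List


def split_incremental_versions(data: bytes) -> List[bytes]:
    """Return one candidate revision per %%EOF marker, oldest first.

    Hypothetical helper for illustration only. Each returned item is the
    prefix of the file that ends at a %%EOF marker (plus its line ending).
    """
    marker = b"%%EOF"
    versions = []
    pos = 0
    while True:
        idx = data.find(marker, pos)
        if idx == -1:
            break
        end = idx + len(marker)
        # Keep the end-of-line bytes that follow the marker, if any.
        if data[end:end + 2] == b"\r\n":
            end += 2
        elif data[end:end + 1] in (b"\n", b"\r"):
            end += 1
        # A revision is everything from the start of the file up to here.
        versions.append(data[:end])
        pos = end
    return versions
```

Calling it on the bytes of a PDF saved twice with incremental updates should give a list whose last item is the full file and whose earlier items are the candidate prior revisions. Each one still needs to be opened with a real PDF parser to confirm it is valid.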
input.zip
So, the news for #ApacheTika and #ipres2025: I implemented fully recursive extraction of raw embedded files from the command line.

issues.apache.org/jira/browse/...
Reposted by Tim Allison
Everyone tests in production. Some people just don’t know it yet
Amazing work, as always, @seeinglogic.bsky.social ! #AIxCC
Want to see the vulnerabilities added for AIxCC?

Interested in seeing how well-known C & Java repositories differ in structure/distribution of code?

Check out the interactive repo visualizer we made for exploring the scale & detail of AIxCC challenges: archive.aicyberchallenge.com/repoviz/
And, y, I'm late to the game, but I'm really excited for this course, @softwaredoug.bsky.social !
Reposted by Tim Allison
F3: The Open-Source Data File Format for the Future

Packaging WASM code to read an evolving file format with the data. Interesting approach and a good idea to test the sandbox abilities of the execution engine. Also mentions of a lot of alternatives to parquet/ORC.
db.cs.cmu.edu
We have PDF. What else do we need? 🤣🤣🤣😅
Reposted by Tim Allison
Anyone in #bugbounty looking to connect?
Reposted by Tim Allison
New 7-8B OCR model release from Alibaba. The integrated structured-data approach looks promising for specialized use cases with complex visual inputs. huggingface.co/Logics-MLLM/...
Reposted by Tim Allison
📣This #WebArchiveWednesday, plan your proposal for #iipcWAC26, “Sustainable #WebArchiving,” at KBR, Royal Library of Belgium! netpreserve.org/ga2026/CfP

🗓️ Deadline for proposals: OCT 15

#webarchives #DigitalPreservation #DigitalHumanities
Reposted by Tim Allison
Great paper on finding and exploiting parser differentials between ZIP parsers to bypass signature validation, malware detection, or VSCode extension ID validation.

www.usenix.org/conference/u...