thesiegfried.bsky.social
@thesiegfried.bsky.social
I need to refine my workflow so that I can have agents building the codebase while I spend my time on project vision and team standards. I also need to identify and ensure the creation of the artifacts needed to enable those... AKA communication, I suppose.
May 26, 2025 at 1:25 PM
Okay, and that wraps up my experiment. Worked pretty well. Hard to get the grid density high enough for the precision I was looking for, but that might be resolved by enlarging the image. Overall though, I'm satisfied with this foray into leveraging images.
May 18, 2025 at 11:21 PM
Making progress. Instead of asking for top-left/bottom-right style coordinates, I'm just asking for the closest grid label. I'm pairing that with a field in the output that describes the specific thing it found before listing the coordinate, and I'm getting much better results.
May 18, 2025 at 6:52 PM
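The "describe first, then locate" format above could be sketched like this. The prompt wording, the field names `found`/`nearest_label`, and the reply are all illustrative assumptions, not real model output:

```python
import json

# Illustrative sketch of the output format: ask the model to describe the
# specific feature it found BEFORE naming the nearest grid label, instead
# of requesting top-left/bottom-right bounding-box coordinates.
PROMPT = (
    "For each door in the blueprint, return a JSON object with:\n"
    '  "found": a short description of the specific feature you see,\n'
    '  "nearest_label": the grid label closest to its center.\n'
    "Return a JSON list."
)

def parse_findings(raw: str) -> list[dict]:
    """Parse the model's JSON reply, keeping only well-formed entries."""
    findings = json.loads(raw)
    return [f for f in findings if "found" in f and "nearest_label" in f]

# Example reply shape (made up for illustration):
reply = '[{"found": "single swing door in the north wall", "nearest_label": "C4"}]'
```

Forcing the description field first seems to act like a grounding step: the label is tied to a concrete feature rather than guessed in isolation.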
Not having luck with coordinates. I think the llm is trying to 'think' through it and confusing itself, but I'm not sure. The further from the top-left it gets, the crazier its coordinates are. I'm revamping it and asking it to just list all grid labels that overlay the identified object.
May 18, 2025 at 4:24 PM
Still failing. I had thought this would be much more accurate, but I'm still having trouble. Interesting. Trying with o3 now and considering changing the directions so it just notes any cells that have doors instead of requesting coordinates.
May 18, 2025 at 3:55 PM
Hrm. It worked okay, but the llm loses track of the grid as it gets further from the labels. Trying again with grid labels embedded in cell.
May 18, 2025 at 3:28 PM
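A manual overlay with a label embedded in every cell could be sketched with Pillow. The cell size, colors, and A1-style labeling scheme are assumptions (and this naive column lettering only covers 26 columns):

```python
from PIL import Image, ImageDraw

def overlay_grid(img: Image.Image, cell: int = 100) -> Image.Image:
    """Draw a grid over the image with a label (e.g. 'B3') inside each cell,
    so the model never has to count cells away from an edge label."""
    out = img.convert("RGB")
    draw = ImageDraw.Draw(out)
    w, h = out.size
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            col = chr(ord("A") + x // cell)  # columns A, B, C, ... (<= 26 assumed)
            row = y // cell + 1              # rows 1, 2, 3, ...
            draw.rectangle([x, y, x + cell, y + cell], outline="red")
            draw.text((x + 4, y + 4), f"{col}{row}", fill="red")
    return out
```

Embedding the label in each cell trades some visual clutter for never requiring the model to track position relative to distant edge labels.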
So this doesn't make them useless at all, but the bounding boxes would be a big help. I'm trying a manual overlay now to see how that goes.
May 18, 2025 at 3:00 PM
OpenAI's 4o and o3 do a wonderful job reading the blueprint. Even heavily annotated. Unfortunately they are VERY BAD at returning bounding box coordinates, which makes sense. That's not really their deal. I've tried all the tricks I found online, like pre-padding and passing the size as context. No dice.
May 18, 2025 at 2:58 PM
First of all, YOLOv8 has a pretty good open-source model for blueprint recognition. It does seem a bit particular about the size of the image, and the image has to be pretty free of annotations. That makes sense, but it seems to mean it needs a lot more training to handle 'IRL' blueprints.
May 18, 2025 at 2:57 PM
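Since the detector was particular about image size, one workaround is letterboxing the blueprint to a fixed square before inference. This is a sketch under assumptions: the 640px input size and the weights file `blueprint.pt` are both made up, not the actual model used above:

```python
from PIL import Image, ImageOps

def prepare_blueprint(img: Image.Image, size: int = 640) -> Image.Image:
    """Letterbox the blueprint to the square size the detector expects,
    padding with white so the drawing isn't distorted by a plain resize."""
    return ImageOps.pad(img.convert("RGB"), (size, size), color="white")

# Inference with the ultralytics API (not run here; weights path is assumed):
# from ultralytics import YOLO
# model = YOLO("blueprint.pt")
# results = model(prepare_blueprint(Image.open("plan.png")))
```

Padding rather than stretching preserves the aspect ratio, which matters for detectors trained on undistorted drawings.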