thesiegfried.bsky.social
@thesiegfried.bsky.social
I need to refine my workflow so that I can have agents building the codebase while I spend my time on project vision and team standards. I also need to identify and ensure the creation of the artifacts needed to enable those... AKA communication, I suppose.
May 26, 2025 at 1:25 PM
Okay, and that wraps up my experiment. Worked pretty well. Hard to get the grid density high enough for the precision I was looking for, but that might be resolved by enlarging the image. Overall though, I'm satisfied with this foray into leveraging images.
May 18, 2025 at 11:21 PM
Making progress. Instead of asking for top-left/bottom-right style coordinates, I'm just asking for the closest grid label. I'm pairing that with a field in the output that describes the specific thing it found before listing the coordinate, and I'm getting much better results.
May 18, 2025 at 6:52 PM
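The "describe first, then locate" format above could be sketched like this. The prompt wording, the field names `found`/`nearest_label`, and the reply are all illustrative assumptions, not real model output:

```python
import json

# Illustrative sketch of the output format: ask the model to describe the
# specific feature it found BEFORE naming the nearest grid label, instead
# of requesting top-left/bottom-right bounding-box coordinates.
PROMPT = (
    "For each door in the blueprint, return a JSON object with:\n"
    '  "found": a short description of the specific feature you see,\n'
    '  "nearest_label": the grid label closest to its center.\n'
    "Return a JSON list."
)

def parse_findings(raw: str) -> list[dict]:
    """Parse the model's JSON reply, keeping only well-formed entries."""
    findings = json.loads(raw)
    return [f for f in findings if "found" in f and "nearest_label" in f]

# Example reply shape (made up for illustration):
reply = '[{"found": "single swing door in the north wall", "nearest_label": "C4"}]'
```

Forcing the description field first seems to act like a grounding step: the label is tied to a concrete feature rather than guessed in isolation.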
Not having luck with coordinates. I think the llm is trying to 'think' through it and confusing itself, but I'm not sure. The further from the top-left it gets, the crazier its coordinates are. I'm revamping it and asking it to just list all grid labels that overlay the identified object.
May 18, 2025 at 4:24 PM
Still failing. I had thought this would be much more accurate, but I'm still having trouble. Interesting. Trying with o3 now and considering changing the directions so it just notes any cells that have doors instead of requesting coordinates.
May 18, 2025 at 3:55 PM
Hrm. It worked okay, but the llm loses track of the grid as it gets further from the labels. Trying again with grid labels embedded in cell.
May 18, 2025 at 3:28 PM
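A manual overlay with a label embedded in every cell could be sketched with Pillow. The cell size, colors, and A1-style labeling scheme are assumptions (and this naive column lettering only covers 26 columns):

```python
from PIL import Image, ImageDraw

def overlay_grid(img: Image.Image, cell: int = 100) -> Image.Image:
    """Draw a grid over the image with a label (e.g. 'B3') inside each cell,
    so the model never has to count cells away from an edge label."""
    out = img.convert("RGB")
    draw = ImageDraw.Draw(out)
    w, h = out.size
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            col = chr(ord("A") + x // cell)  # columns A, B, C, ... (<= 26 assumed)
            row = y // cell + 1              # rows 1, 2, 3, ...
            draw.rectangle([x, y, x + cell, y + cell], outline="red")
            draw.text((x + 4, y + 4), f"{col}{row}", fill="red")
    return out
```

Embedding the label in each cell trades some visual clutter for never requiring the model to track position relative to distant edge labels.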
So this doesn't make them useless at all, but the bounding boxes would be a big help. I'm trying a manual overlay now to see how that goes.
May 18, 2025 at 3:00 PM
OpenAI's 4o and o3 do a wonderful job reading the blueprint. Even heavily annotated. Unfortunately they are VERY BAD at returning bounding box coordinates, which makes sense. That's not really their deal. I've tried all the tricks I found online, like pre-padding and passing the size as context. No dice.
May 18, 2025 at 2:58 PM
First of all, YOLOv8 has a pretty good open-source model for blueprint recognition. It does seem a bit particular about the size of the image, and the image has to be pretty free of annotations. That makes sense, but it seems to mean it needs a lot more training to handle 'IRL' blueprints.
May 18, 2025 at 2:57 PM
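Since the detector was particular about image size, one workaround is letterboxing the blueprint to a fixed square before inference. This is a sketch under assumptions: the 640px input size and the weights file `blueprint.pt` are both made up, not the actual model used above:

```python
from PIL import Image, ImageOps

def prepare_blueprint(img: Image.Image, size: int = 640) -> Image.Image:
    """Letterbox the blueprint to the square size the detector expects,
    padding with white so the drawing isn't distorted by a plain resize."""
    return ImageOps.pad(img.convert("RGB"), (size, size), color="white")

# Inference with the ultralytics API (not run here; weights path is assumed):
# from ultralytics import YOLO
# model = YOLO("blueprint.pt")
# results = model(prepare_blueprint(Image.open("plan.png")))
```

Padding rather than stretching preserves the aspect ratio, which matters for detectors trained on undistorted drawings.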