Lightnews — Scholar-powered news

simon

@trillymopena.bsky.social

A whole lot of managering going on

October 28, 2025 at 5:05 AM

TrustedReveler

@trustedreveler.bsky.social

I think Phil is not good at the managering

February 23, 2025 at 10:20 PM

girl machine (inanimate mannequin) 🔞

@braixentoy.bsky.social

tasque managering here too because why not

July 28, 2025 at 6:36 AM

girl machine (inanimate mannequin) 🔞

@braixentoy.bsky.social

straight up managering that thang

August 3, 2025 at 12:10 PM

Luke m

@hoopsmccann.bsky.social

Did Neville and Rob Gale go to the same school of managering where they teach you to constantly be yelling instructions at your team the entire match? Seems like it would be really annoying as a player

August 17, 2025 at 3:27 AM

Gene Locicero (he/they)

@code-kobold.bsky.social

I was in Providence today for my haircut (bc I cannot find a barber in Boston I like) but also at Heartleaf for Indie Bookstore day! I got to say hi to Penny and got a whole bunch of new books

Penny the Manager (bookstore cat) of Heartleaf Books in Providence, RI. She is a grey cat with a purple collar sitting on a shelf under a table. Her back is to us, she was hiding away from people to take her break from managering, it's very hard work

My book haul stacked up with larger books at the bottom and smaller ones on top. From top to bottom: On Tyranny by Timothy Snyder, Log Off by Katherine Cross, Protect Bookstores by Danny Caine, Feminist Antifascism by Ewa Majewska, The Color of Law by Richard Rothstein, Disability Intimacy by Alice Wong, Braiding Sweetgrass by Robin Wall Kimmerer, Let This Radicalize You by Kelly Hayes and Mariame Kaba, Scarred by L. Ayu Saraswati, Monumental Lies by Robert Bevan, and Careless People by Sarah Wynn-Williams.
It was a tall stack it was pretty close to my chin but not quite

Another picture of the haul from the bookstore. I have 2 "I met Penny at Heartleaf Books" bookmarks with Penny pictured. A cute sticker of a cat with a book. A Rhode Island license plate sticker with BOOKS on it. And 3 I ❤️ my indie bookstore stickers for indie bookstore day

April 26, 2025 at 10:50 PM

🇵🇷Javi No Takes🇵🇷

@melhyphen.bsky.social

That’s what my FFXIV hours look like if you add them up, but that was across like 6 years, and it’s an MMO.

This is a lot of Football Managering.

December 11, 2024 at 12:28 AM

bentasker.co.uk Feed

@mastodon.bentasker.co.uk

New #blog: Categorising My Daily Todo List with Deepseek-r1
Author: Ben Tasker

https://www.bentasker.co.uk/posts/blog/software-development/ai-todo-list-categorisation.html?utm_source=rss

#ai #deepseek #genai #llm #managering #ollama #softwaredevelopment #victoriametrics

Categorising My Daily Todo List with Deepseek-r1

I've made no _particular_ secret of the fact that I am **not** a fan of large language models: they're overhyped, under-deliver and were built off the back of ripping off others' work at a scale that (for anyone else) would have resulted in jail and/or bankruptcy (even before the more incontrovertible acts of copyright infringement are considered).

But, I also work in an industry which is going through a phase of significant AI use (even if a lot of companies do come to regret being over-eager). As a generalist, still decades from retirement, I _have_ to keep my hand in. If nothing else, doing so means that I'm better positioned to explain _why_ and _when_ they seem to be ill-suited.

Moving on, I've finally got around to reading the book that I mentioned while procrastinating recently. In one section, the author talks about reviewing how their time is spent:

> Every quarter, I spend a few hours categorizing my calendar from the past three months to figure out how I’ve invested my time. This is useful for me to reflect on the major projects I’ve done, and also to get a sense of my general allocation of time.

This struck me as a _particularly_ sensible thing to start doing. However, I needed a slightly different approach: although it plays a part, my calendar is not the main organiser of my day because I work from a daily todo list in Obsidian. Although small/quick tasks might sometimes get omitted, they're still far more likely to appear in my TODO list than in my calendar.

The trade-off, of course, was that I would only be able to collect the number of tasks completed rather than a measure of time spent.

Pivoting back to AI, I decided that categorising TODO list items **should** be a piece of cake for any reasoning model. This post discusses how I developed metrics about my todo list along with where (and why) I found the LLM unreliable.

* * *

### Requirements

I had a simple (but strict) set of requirements:

1. **The LLM must be local** : I'm a manager and my todo list sometimes contains references to team members. I'm not comfortable sending _any_ of that to one of the hosted providers[1]
2. **The script should be cron-able** : the whole point is that it should just run without me needing to do anything
3. **It must use my existing format** : Shuffling the TODO list structure around is OK, but changing the way that I write TODOs is likely to be unrealistic - it's supposed to be somewhere to quickly note a need to do something, rather than a document which requires a linter
4. **Tasks should be sorted into a finite set of categories** : LLMs are quite inconsistent in their output, so it'd be unwise to rely on the LLM to also define the categories: I'd end up with tasks spread over "discussions", "email" and "correspondence"
5. **The output should be simple counts** : The aim here wasn't to create a second copy of my TODO list, but instead to express it in terms of simple metrics

* * *

### TODO List Structure

My list uses a simple markdown template:

    # Daily Notes: {{date}}

    ### TODO
    - [ ] Backlog item

    ### Today
    - [ ] Incomplete task
    - [x] Complete task
    - [ ] Morning Sync (09:00)

    **Calls**

    **Morning Routine**
    - [ ] Slack
    - [ ] Email
    .. etc ..

    **Follow Up Tomorrow**

The `TODO` section contains my backlog, which is rolled into the current day's notes by the Daily todos rollover plugin (along with any incomplete items under `Today`).

As I work through my morning routine, I populate the `Today` section with calls broken out under their own subheading.

* * *

### Processing

Because they're stored as markdown, the lists are easy to process and the headings make it easy to see which section an item falls under.
I've used a similar format in previous roles and previously wrote a simple burndown stats script which built metrics by walking through line by line:

    for line in fh.readlines():
        # Process the line
        if "### TODO" in line:
            in_section = "main"
            print("Started")
            continue

        if "**Calls**" in line:
            in_section = "calls"
            continue

        if "**Morning Routine**" in line:
            in_section = "morning-routine"
            continue

        elif in_section and line.startswith('- [ ]'):
            counters["items"] += 1
        elif in_section and line.startswith('- [x]'):
            counters["items"] += 1
            counters["complete"] += 1

For this project, I added a function call to the conditional block for completed items:

    category = categorise_item(line, in_section, ai, categories)

* * *

### Adding AI

I spun up an ollama docker container and loaded Deepseek-r1 into it:

    ollama run deepseek-r1:7b

Use of the 7 billion parameter model was a compromise: the solution had to run on a laptop (I actually started with the much smaller `phi3:mini-4k` but it _really_ struggled to give accurate responses).

I have a python wrapper that I previously wrote to provide a command-line REPL for interacting with Ollama[3], so I imported that. Although there's a bit of class initialisation to do, it makes sending a message simple:

    response = ai.chat_with_model("Hello, are you there?")

Next, I needed to write a prompt to tell the LLM what to do with the messages that would be sent to it:

    You are a task categorisation specialist and are being employed in order to identify what categories of work each of the user's completed todo list items falls into.

    The user will provide you with a file in markdown format which consists of a list of checkboxes with todo list items next to them.

    Using the rules below, you are to categorise any task items that the user passes you.

    **File Structure**

    * An item starting with - [ ] indicates a task that has not yet been completed, ignore these.
    * An item starting with - [x] indicates that the task was completed, you should categorise the item
    * Items that follow the text Calls are calls - these need further classification (see the calls selection below)
    * Items under Morning Routine are routine items, these should be classified as "daily admin"

    You may encounter a heading with the text Defunct. If so, ignore everything that follows it.

    **Calls**

    There are two types of call classificiations: call-meetings and call-11

    If the todo item includes the text 1:1, it's a call-11, otherwise it should be classified as call-meeting.

    **Categories**

    ALL completed items must be categorised but must ONLY be categorised into one of the following categories

    * call-meeting
    * call-11
    * daily admin
    * incident response
    * PR work
    * documentation and planning
    * other

To test, I passed it a single item[2]:

    - [x] Swear at AI

The response and the reasoning seemed in line with what I wanted:

> Okay, let's tackle this problem step by step. The user has provided a task list with several items marked as completed (indicated by - [x]). My job is to categorize each of these items into one of the specified categories based on the given rules.
>
> First, I need to go through each item and determine its category. Let's look at each item:
>
> 1. "- [x] Swear at AI" This task clearly states a negative action but doesn't provide any specific context like calls or routine tasks. Since it's under "Other" as per the categories list, this should be classified as "other".
>
> I don't see any other items provided in the example besides this one. Each item must be categorized without inventing new ones that aren't present in the user's message.
>
> So, the only category needed here is "other".
>
> { "category": "other" }

I threw it a few more items and all came back correctly.

* * *

#### Problems

However, issues soon began to creep in. Despite having been passed a list of acceptable categories, the model invented a new one: security.
Expanding my prompt to include explicit rules (you MUST NOT return a category not listed above) helped a little, but not much.

To improve the chances of getting a successful response, I adjusted my script to add the ability to trigger a reprompt:

    def get_ai_category(item, ai, section, ffs=False):
        ''' Ask the AI to categorise

        We often need to strip the response though, because
        many reasoning models include their reasoning too
        '''
        msg = []
        if ffs:
            msg.append("Reminder: you MUST stick to the categories provided in the prompt. You're a very naughty LLM, your previous response made changes - please try again but stick specifically to the rules in the prompt")

        msg.append(item)
        response = ai.chat_with_model("\n".join(msg)).split("\n")

However, it didn't help much: the LLM often replied with the same thing the second time.

Taking a closer look at the responses revealed something even more frustrating though: sometimes it selected the correct category but then fucked up the spelling. For example, instead of `documentation and planning` it returned the following:

* document and planning
* documentaton and planning
* documenta...
* documentations and planning

Sometimes the response would contain a valid category name but prefixed by some other text:

    *category*: documentation and planning
    *final decision*: calls-11

I messed around with making it respond with JSON, but nothing _really_ helped. In the end, I created a function called `unfuck_ai_response` and started adding string processing to account for its common mistakes.

As a side note, the AI also **staunchly** refused to allow me to use non-words as category names. I defined a category with the name `managering` but it insisted on returning `managerial`. That behaviour makes sense, because _actual words_ are obviously going to have more weight, but it doesn't help with the feeling that AI is a fun sponge.
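As an illustration of the kind of string processing described above, here is a minimal sketch of mapping a messy response back onto the allowed category list. It is not the post's `unfuck_ai_response`: the function name `normalise_category`, the `cutoff` value and the use of `difflib` are all assumptions made for this example.

```python
import difflib

# The category list from the prompt
CATEGORIES = [
    "call-meeting",
    "call-11",
    "daily admin",
    "incident response",
    "PR work",
    "documentation and planning",
    "other",
]


def normalise_category(raw, categories=CATEGORIES, cutoff=0.8):
    """Map a raw LLM answer onto an allowed category, or return None.

    Hypothetical helper: the name and the cutoff are assumptions.
    """
    # Reasoning models tend to emit their working first and the answer
    # last, so only the final non-empty line is considered
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return None
    candidate = lines[-1].replace("*", "")

    # Drop any "label:" prefix such as "category:" or "final decision:"
    if ":" in candidate:
        candidate = candidate.rsplit(":", 1)[1]

    # Trim whitespace, quotes and braces (covers a one-line JSON answer)
    candidate = candidate.strip(" \"'{}").lower()

    lookup = {c.lower(): c for c in categories}
    if candidate in lookup:
        return lookup[candidate]

    # Fall back to fuzzy matching to absorb small misspellings
    matches = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

Returning `None` for anything that can't be matched (an invented category such as `security`, for example) leaves the caller free to decide whether to reprompt or fall back to `other`.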
* * *

#### Taking Load Off the AI

Even with the new function, there were still far too many items that weren't being categorised correctly.

Quite a few of the regular items in my todo list can be categorised by looking for specific strings, so I adjusted the script to start handling those simpler items itself:

    if in_section == "calls":
        if "1:1" in item:
            return "call-11"
        elif "adhoc" in item:
            return "call-unplanned"
        else:
            return "call-meeting"

I adjusted the script so that all todo items categorised as `other` would be written to a file before triggering a full run across all of my notes.

Even if it didn't make sense to adjust the script for every `other` item, the list allowed me to spot patterns and build the ruleset out.

* * *

### Collecting Metrics

With the script now capable of emitting semi-reliable metrics, I needed to collect the data somewhere.

For this project, I decided to give VictoriaMetrics a try: it accepts a variety of input formats, including Influx line protocol, so making the script generate LP gave me the ability to try VictoriaMetrics whilst retaining the ability to pivot back to InfluxDB if needed.
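To make the line protocol step concrete, the sketch below turns per-category counts into LP lines. The measurement name `todo_list`, the field name `items_count`, the tag names and the helper names are assumptions for illustration, not taken from the script; with VictoriaMetrics' default behaviour of joining measurement and field name, this layout should surface as a metric called `todo_list_items_count`.

```python
from collections import Counter


def escape_tag(value):
    """Escape the characters that are special in LP tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def build_lp(counts, hostname, timestamp_ns, measurement="todo_list"):
    """Build one line protocol entry per category.

    Hypothetical helper: measurement, tag and field names are assumptions.
    """
    lines = []
    for category, count in sorted(counts.items()):
        tags = f"hostname={escape_tag(hostname)},category={escape_tag(category)}"
        # The trailing "i" marks the field value as an integer
        lines.append(f"{measurement},{tags} items_count={count}i {timestamp_ns}")
    return lines


# Example: two categories counted for a single day
lp_buffer = build_lp(
    Counter({"daily admin": 3, "call-11": 1}),
    hostname="flibble",
    timestamp_ns=1700000000000000000,
)
print("\n".join(lp_buffer))
```

Escaping matters here because category names such as `daily admin` contain spaces, which would otherwise end the tag set early.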
* * *

#### Standing VictoriaMetrics up

Using the VictoriaMetrics docs as a guide, I was able to come up with a `docker-compose` snippet to stand a container up:

    victoria_metrics:
        restart: always
        image: victoriametrics/victoria-metrics:v1.122.0
        container_name: victoria_metrics
        command:
            - "-selfScrapeInterval=5s"
            - "-storageDataPath=victoria-metrics-data"
            - "-retentionPeriod=100y"
            - "-disablePerDayIndex"
        ports:
            - 8428:8428
        volumes:
            - /home/ben/docker_files/victoria/data:/victoria-metrics-data

Writing line protocol in was easy:

    curl \
        -d 'metric,foo=bar field1=3.4' \
        http://127.0.0.1:8428/api/v2/write?db=bar

So I updated my script:

    print("Submitting to VictoriaMetrics")
    res = requests.post(
        f"{vm_url}/api/v2/write?db=workload_stats",
        data = '\n'.join(lp_buffer)
    )
    print(res.status_code)
    print(res.text)

Note: VictoriaMetrics doesn't have the concept of separate databases and the name of the DB provided instead gets added as a label.

After the next script run, I was able to query data:

    curl -s 'https://vm.83n.uk/api/v1/series?match[]=todo_list_completed'
    {
        "status":"success",
        "data":[{
            "__name__":"todo_list_completed",
            "db":"workload_stats",
            "hostname":"flibble"
        }]
    }

I could also delete that series - something that I used _quite a bit_ whilst continuing to refine the script's output:

    curl -s 'https://vm.83n.uk/api/v1/admin/tsdb/delete_series?match[]=todo_list_completed'

* * *

#### Visualising

I installed the VictoriaMetrics plugin for Grafana and started to build a dashboard.

Historically, I've written quite a lot of Flux on this blog, but VictoriaMetrics uses MetricsQL. It's a PromQL-alike language, which itself is similar to LogQL, so I didn't experience any real issues writing queries to drive the graphs:

    todo_list_items_count{db="workload_stats", category!="error", category!="scheduled-for-tomorrow"}

The visual query builder is also pretty good.
* * *

#### Metrics: use of AI

The TODO processing script also writes metrics about itself into VictoriaMetrics, so the dashboard includes a section specifically for script run stats.

One graph indicates the proportion of TODO items categorised by the script vs categorised by AI. Although I'd been careful to only add general rules (rather than targeting specific items), the ruleset embedded within the script accounted for two thirds of all categorisations.

I also wanted to understand how often AI was misbehaving, so I adjusted the script to calculate an AI reliability score:

    # Calculate a reliability score for the AI
    # for every retry we had to send the count goes down.
    if source == "ai":
        # So the logic here is that we want to generate
        # a percentage reliability score based on
        # comparison to the max retry depth.
        #
        # We add 1 onto max_retry_depth to account for
        # the first request (which doesn't get treated
        # as a retry)
        #
        # So, if max_retry_depth is 4, the max becomes 5
        # and scores would look like this:
        #
        # * we didn't retry at all: 100%
        # * we retried once: 80%
        # * we retried 4 times: 20%
        # * we burned through the retries: 0%
        #
        score = (1 - (RETRY_DEPTH / (max_retry_depth + 1))) * 100

For each TODO list run, the script reports mean, min and max reliability scores. It seems that it _generally_ behaves, but when it misbehaves, it does so quite stubbornly.

* * *

### Conclusion

I now have a script which can _roughly_ categorise my TODO items from each day, ready for visualisation in a dashboard that shows where the majority of my efforts are spent.

My initial hopes for AI making it quick and easy were quickly, if predictably, dashed: the LLM does a "good enough" job most of the time, but only because substring matching within the script handles ~66% of items.
One of the **huge** benefits of having opted for a local model is that I don't have to worry about results suddenly drifting as the result of a provider deploying an update which results in the model responding differently.

That consistency has its limit though: the dashboard has looked slightly different every time that I've triggered a full re-run - the LLM is rarely consistent in the categories that it selects.

Equally, though, there probably isn't any _reasonable_ way that I could have script logic correctly categorise the 1/3rd that ends up being passed off to the LLM. Although I could either limit the language used or include a category alongside items, the likely result of that added burden is that the list would fail to include so much of my workday.

* * *

1. It doesn't _matter_ what their terms say. Data protection by design and by default, if it can be done without sending to OpenAI/Google/Musk then it should be
2. It's possible that I've skipped a few iterations in this post and was growing fed up at this point
3. In case you haven't picked up on this: the privacy benefits mean that local AI will always be > remote AI

www.bentasker.co.uk

July 31, 2025 at 7:30 AM

Valery North

@valerynorth.bsky.social

Kicking off my (usually) weekly bout of Football Managering in about 10 minutes or so - would love to have some company as I try to steer a lowly Vanarama League North side to promotion! www.twitch.tv/valerynorth

valerynorth - Twitch

Genderfluid enby leftie/socialist writer/musician/game creator in my mid-40s. Watch a lot of sports, play a lot of video games, play a lot of musical instruments. Love TTRPGs, especially slightly quir...

www.twitch.tv

April 16, 2024 at 6:23 PM

John Young

@johnyounger.bsky.social

Tell me this is good managering then.

May 11, 2025 at 3:29 PM

Ed

@ed3d.net

I work on a team where, when you ask PMs if they've talked to developers about the semantics and behavior of the developer tool they are product managering, they say "yes" because they talked to downstream developers' PMs

fucking maddening

September 10, 2024 at 12:31 AM

Jono

@rattleduser.bsky.social

Into these sister fingering, kopparberg avoiding, blue seat dressing, self harm managering, bin sniffing, paedo harbouring, massive advertise board using, 19 premier league rival letter writing cheats

December 1, 2024 at 12:04 PM

The Jupiterka

@thejupiterka.bsky.social

Our manager was on annual leave for 3 weeks and said he'd be back on Wednesday, which is tomorrow.

The guy was casually in the office this morning at exactly 8, watching us arrive one by one only after 9.

And he laughed at us - sleepy heads, coming in so late, hm?

Managering done right ✅

November 7, 2023 at 10:17 AM

Andrew Dragon LL32 🐲 -- 🟦🟨🟥

@andrewdragonll32.bsky.social

My art journey this year, not much, altho i could have a Miji bingo if something funny comes up hehe

This year got my big commission that i'm proud of and other piece to even be small

Myself, i prefer to do it with my own hands
SP in editing and drawing and little in managering

September 19, 2025 at 3:35 PM

abby ❤️‍🔥

@abbyonahill.bsky.social

it’s taking me back to junior high stage managering… she would be a supporting character who was shocked to not get the lead. Every show i would be telling her to be quiet backstage and put her props back where they go!!

July 22, 2025 at 1:54 AM

Sir

@buddhaowl.bsky.social

It must be a lamptard thing managering dirty players, Derby were the same when he was in charge #lufc

February 5, 2025 at 8:27 PM

Boltedfruit

@boltedfruit.bsky.social

The Corroded Coffin guys reunite to go back on tour with them and Eddie is working double time but everyone is happy and a big family and they take care of each other and make a lot of money the end.

Steve is just number one fan/managering off the side like "that's my family!" with the biggest grin

February 12, 2025 at 10:10 PM

katie dzwierzynski

@kdzwierzynski.bsky.social

is counsell a better manager than ross?

yes but also i've spent years mocking counsell for being a crybaby about rain delays, at least when i mocked ross i actually liked him and just didn't like his managering

November 6, 2023 at 11:37 PM

Noraa Cap

@noraacap.bsky.social

shes tasqué managering it

September 3, 2025 at 5:04 PM

Max Legroom

@crabitha.slutmo.de

ugh i'm mister managering so hard and so effectively today it makes me fucking sick

August 28, 2023 at 7:20 PM

Ishkur

@ishkur.bsky.social

Seagull managering.

He did this all the time as a developer, too. He'd show up to the site, inspect things like he knows what he's looking at, make demands as if he's some genius architect, berate the foreman, give one of those "no excuses, get it done" commands, and then leave.

August 7, 2025 at 6:35 PM

Bridget Kromhout

@bridgetkromhout.bsky.social

Making an offer on the way to the airport to go give a talk about hiring: next-level managering by @marisabrandt.😂 #velocityconf

November 14, 2024 at 1:41 PM

🎸Kel Grayson💋

@kelgrayson.bsky.social

LYDIA: *rolls her eyes*
CHRIS: Question is, are YOU a diva? Are you gonna refute every bit of gossip that goes around about your managering style? *teases* Are you going to deny the absolute reaming you give anyone that catches you without your 'face' on? *chuckles* Or the stomp in your stride
8/?

April 30, 2025 at 2:18 PM

JaysonDoesSpooky

@jaysondoesmusic.net

I've done so much managering today

May 12, 2025 at 1:07 PM

mount gocks

@lindsay40k.bsky.social

Whilst working at a Warhammer store, every new manager had to do what we called ‘managering’, finding something to change. One of them was concerned that a pot of demo paint was going missing a week. His solution was to superglue the paint pots to the tables. Next manager had to write off the tables

February 18, 2025 at 7:11 PM
