Dr James Ravenscroft
@jamesravey.me
Software Engineering Leader, ML/NL-Proc specialist, big fan of coffee, food and reading
Crucial Track for August 19, 2025: "Saudade" by Zayaz

#CrucialTracks #TuneTuesday #SongsFromTheFuture I recently stumbled across Zayaz and decided to pick up his discography on bandcamp. Zero regrets.

https://app.crucialtracks.org/profile/jamesravey/20250819
August 19, 2025 at 4:07 PM
Crucial Track for July 13, 2025: "Where You Lead" by Carole King

In a move that surprises no one, "song that reminds you of your favourite person" is the song I had at my wedding...

https://app.crucialtracks.org/profile/jamesravey/20250713
July 13, 2025 at 4:07 PM
Crucial Track for June 13, 2025: "Permanent Vacation" by Aerosmith

Today's #CrucialTracks entry is helping me prepare for my holiday

https://app.crucialtracks.org/profile/jamesravey/20250613
June 13, 2025 at 3:36 PM
Crucial Track for June 8, 2025: "Killing In the Name" by Rage Against the Machine

Today's #CrucialTracks entry is a song for the moment.

https://app.crucialtracks.org/profile/jamesravey/20250608
June 8, 2025 at 4:45 PM
Crucial Track for June 7, 2025: "Hypersonic Missiles" by Sam Fender

Today's #CrucialTracks entry: a modern-day protest song by the Geordie Bruce Springsteen.

https://app.crucialtracks.org/profile/jamesravey/20250607
June 7, 2025 at 8:19 AM
Crucial Track for June 6, 2025: "Life's Been Good" by Joe Walsh

Today's #CrucialTracks entry: what a young, naive pre-teen thought was a song about how awesome it is to be famous.

https://app.crucialtracks.org/profile/jamesravey/20250606
June 6, 2025 at 12:58 PM
Crucial Track for June 5, 2025: "Don't Look Back" by Boston

Today's #CrucialTrack is the larger-than-life phenomenon I call Boston's wall of sound.

https://app.crucialtracks.org/profile/jamesravey/20250605
June 5, 2025 at 6:29 AM
Crucial Track for May 31, 2025: "Different Strings" by Rush

#CrucialTrack for today is a Rush Deep Cut

https://app.crucialtracks.org/profile/jamesravey/20250531
May 31, 2025 at 8:42 PM
Crucial Track for May 29, 2025: "Dança Ma Mi Criola" by Tito Paris

#CrucialTrack featuring a song and artist very few people in my life have heard of

https://app.crucialtracks.org/profile/jamesravey/20250529
May 29, 2025 at 9:49 PM
Crucial Track for May 19, 2025: "Seasons (Waiting On You)" by Future Islands

A song that grew on me over time.

Came for the novelty of the funny dancing man, stayed for the solid bop and warm synth noises.

https://app.crucialtracks.org/profile/jamesravey/20250519
May 19, 2025 at 8:08 AM
Reposted by Dr James Ravenscroft
New inks have arrived before the pen, but I can always try them in my Sheaffer.

#FountainPens #Journalling
May 10, 2025 at 7:17 AM
Crucial Track for May 5, 2025: "Starship Syncopation" by Cory Wong, Metropole Orkest & Jules Buckley

For today's Crucial Tracks - a song that helps me concentrate

https://app.crucialtracks.org/profile/jamesravey/20250505
May 5, 2025 at 5:50 AM
Reposted by Dr James Ravenscroft
Spent more time than I’d have liked trying to work out how to get docker builds working inside Forgejo actions. I’ve added my notes to my digital garden. https://notes.jamesravey.me/Software/Forgejo#docker-in-docker

#docker #softeng #gitea #forgejo
April 28, 2025 at 8:19 AM
I wrote about my recent experiences talking to my computer. Not conversing with it, but talking at it... Put another way, I'm a born-again speech-to-text fanatic. It lowers the barrier to entry for blogging and journalling for me. #blogging #stt #journalling #whisper brainsteam.co.uk/2025/4/14/vo...
Voice Input Is Awesome
From frustrating early attempts to today's surprisingly seamless voice dictation, I've come a long way in my relationship with talking to computers – and it's changed the way I write.
brainsteam.co.uk
April 14, 2025 at 2:57 PM
Reposted by Dr James Ravenscroft
Went for a walk this morning before work and took photos of some pretty tree blossoms

#Personal #Gardening #Nature
April 14, 2025 at 11:38 AM
Thoroughly enjoying @aptshadow.bsky.social 's Service Model, which I didn't know anything about when I picked it up. It's the perfect send-up of modern bureaucratic life from a robot's POV. It's giving Hitchhiker's Guide meets I, Robot. #scifi #bookstadon
April 10, 2025 at 9:30 PM
The latest version of @continue.dev supports fully self-hosted agentic development with #Ollama and #VSCode but if you use a #LiteLLM proxy for model access you won't be able to use it just yet. github.com/continuedev/...
Support of "Agents" for models with "openai" provider · Issue #5044 · continuedev/continue
Validations I believe this is a way to improve. I'll try to join the Continue Discord for questions I'm not able to find an open issue that requests the same enhancement Problem Currently extension...
github.com
April 10, 2025 at 10:59 AM
Reposted by Dr James Ravenscroft
Adding Voice to your self-hosted AI Stack
I’ve recently found AI, and small language models in particular, useful for doing boring jobs like transcribing handwritten notes and speech to text. OpenAI is probably best known for their GPT series of large language models. However, one of their biggest contributions, which has consistently flown under the radar for most people, is their Whisper speech-to-text model. Whisper is a really great model that is open and free to use and can run with a relatively small memory footprint. I’ve found that Whisper is incredibly useful for allowing me to just dictate my notes, thoughts and feelings. If you’re so inclined, you can also use technologies like Whisper to verbally converse with a large language model. Some people use these tools for brainstorming and talking to the model while they’re out and about. ChatGPT allows you to do this, but of course everything you say is being shared with OpenAI. In this post, I’m going to show you how to set up a speech-to-text and text-to-speech pipeline as part of your self-hosted AI infrastructure, building on my previous article, which you can find here.

## Prerequisites

This post assumes that you already have OpenWebUI, LiteLLM, and Ollama set up, just like the setup that I described in my earlier blog post on the subject. I’m also going to assume that you have a GPU with enough VRAM to run these additional models, as well as the large language model that you want to talk to. You’ll be able to have an audio conversation with a model in a completely self-hosted setup without ever sending any data to OpenAI or other companies. To give you an idea of what’s possible, my full stack with speech-to-text, text-to-speech and a Llama 3.1 8 billion parameter model all runs on a single NVIDIA 3060 graphics card with 12GB of VRAM. If you’re looking to talk with larger, more capable models, like Gemma 27b for example, you might need a larger graphics card or a separate machine to run the language model.

## Updated Stack Architecture

In this post, we’re going to introduce a new component into the existing stack. This component is called Speaches (formerly faster-whisper-server). It provides speech-to-text via Whisper models and text-to-speech capabilities via Kokoro-82M and piper. Since LiteLLM also supports audio models, we are going to hook Speaches up to LiteLLM so that we can serve both STT and TTS capabilities through our Caddy reverse proxy out to users on the internet. We can also optionally hook up these capabilities to OpenWebUI, which will allow us to talk to locally hosted language models using our voice.

```mermaid
graph TD
    subgraph "Server"
        subgraph "Docker Containers"
            OW[OpenWebUI]
            SP[Speaches]
            OL[Ollama]
            LL[LiteLLM]
        end
    end
    subgraph "Internet"
        USER[Internet Users]
    end

    %% External connections
    USER -->|HTTPS| CADDY[Caddy]
    CADDY -->|Reverse Proxy| OW
    CADDY -->|Reverse Proxy| LL

    %% Internal connections
    OW -->|API Calls| LL
    SP -->|API| LL
    OL -->|API| LL

    %% Connection styling
    classDef docker fill:#1D63ED,color:white;
    classDef internet fill:#27AE60,color:white;
    classDef proxy fill:#F39C12,color:white;
    class OW,SP,OL,LL docker;
    class USER internet;
    class CADDY proxy;
```
## Adding Speaches to Docker Compose

If you’ve already followed my previous post, you should have a Docker Compose YAML with all the services that already exist on your system set up and defined. We are going to add a new service definition for speaches to this file:

```yaml
services:
  # ...
  # your other services like ollama...
  # ...

  speaches:
    container_name: speaches
    restart: unless-stopped
    ports:
      - 8014:8000
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://0.0.0.0:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s
    # NOTE: slightly older cuda version is available under 'latest-cuda-12.4.1' tag
    image: ghcr.io/speaches-ai/speaches:latest-cuda
    environment:
      - WHISPER__COMPUTE_TYPE=int8
      - WHISPER__TTL=-1
      - LOOPBACK_HOST_URL=http://192.168.1.123:8014
    volumes:
      - ./hf-cache:/home/ubuntu/.cache/huggingface/hub
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
```

**Key details from this step:**

* We are pulling the `latest-cuda` build. If you find that you have problems, check that your CUDA runtime is not old/out of date. You can check this by running `nvidia-smi -q | grep 'CUDA'`. They do offer builds for older runtimes.
* The `LOOPBACK_HOST_URL` is used to tell the app running inside the container what the host machine’s IP address is.
* I’m passing `WHISPER__COMPUTE_TYPE=int8` to quantize the models to 8 bit. You can try other options, but you may find that they take up more memory and that inference takes longer.
* `WHISPER__TTL=-1` forces the server to keep the model loaded in memory all the time. This is usually desirable if you have enough VRAM, since loading the model can take a few seconds. If the model is in VRAM I usually get realtime transcription. It’s lightning fast.
* I mapped port `8014` on my host machine to port `8000` that the app runs on inside the container. You can use any free TCP port, it doesn’t have to be 8014.
* We persist the Hugging Face cache directory to disk so that we don’t have to re-download the models every time the container restarts.

## First Run

Once you’ve added the service, we can simply execute `docker compose up -d speaches` to get it running for the first time. We can test the transcription service by uploading an audio file. Try recording a short voice clip or converting a short video from YouTube using a service like this one:

```bash
curl http://<server_ip>:8014/v1/audio/transcriptions -F "file=@/path/to/file/audio.wav"
```

The first time you do this it could take a little while, since Speaches will have to download the models from Hugging Face. Then you’ll get some JSON output containing the transcript. Here’s an example from when I ran this command using a Simpsons audio clip I found on YouTube:

```
> curl http://myserver.local:8014/v1/audio/transcriptions -F "file=@/home/james/Downloads/Bart Simpson Ay Caramba.wav"

{"text":"Barg, you really shouldn't be looking through other people's things. Find anything good? I said it before and I'll say it again. Ay, carumba! Elise, bang bang! Aw, Barg, that's a blackhead gun! Eww!"}
```
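Because Speaches exposes an OpenAI-compatible API, you can also call it from a script instead of curl. Here is a minimal sketch using the official `openai` Python client (v1.x); the host, the assumption that no API key is enforced, the served model ID (taken from the LiteLLM config later in this post) and the audio file path are all placeholders to swap for your own values:

```python
# A minimal sketch, not a definitive recipe: host, model ID and file path below
# are assumptions - adjust them to match your own Speaches setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://myserver.local:8014/v1",  # your Speaches instance
    api_key="not-needed",  # assumed: Speaches itself does not check the key
)

# Send a local audio file to the transcription endpoint and print the text
with open("/home/james/Downloads/clip.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
```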
If you plan to use TTS, you will need to follow some extra steps documented on the Speaches website next:

```bash
export KOKORO_REVISION=c97b7bbc3e60f447383c79b2f94fee861ff156ac

# Download the ONNX model (~346 MBs)
docker exec -it speaches huggingface-cli download hexgrad/Kokoro-82M --include 'kokoro-v0_19.onnx' --revision $KOKORO_REVISION

# Download the voices.bin (~5.5 MBs) file
docker exec -it speaches curl --location --output /home/ubuntu/.cache/huggingface/hub/models--hexgrad--Kokoro-82M/snapshots/$KOKORO_REVISION/voices.bin https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
```

If you would prefer to use the piper series of models, you can run **one** of the following commands to download voice models for it:

```bash
# Download all voices (~15 minutes / 7.7 GBs)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices

# Download all English voices (~4.5 minutes)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'

# Download all qualities of a specific voice (~4 seconds)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'

# Download specific quality of a specific voice (~2 seconds)
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
```

We can test that it worked by running a request against the speech endpoint:

```bash
curl http://myserver.local:8014/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
```

## Adding Models to LiteLLM

Next we need to add the audio models to LiteLLM. We’re going to edit the existing `config.yaml` file and add the new models:

```yaml
model_list:
  # ...
  # your other models go here...
  # ...

  - model_name: whisper
    litellm_params:
      model: openai/Systran/faster-whisper-large-v3
      api_base: http://192.168.1.123:8014/v1
    model_info:
      mode: audio_transcription

  - model_name: Kokoro-82M
    litellm_params:
      model: openai/hexgrad/Kokoro-82M
      api_base: http://192.168.1.123:8014/v1

  - model_name: piper
    litellm_params:
      model: openai/hexgrad/Kokoro-82M
      api_base: http://192.168.1.123:8014/v1
```

Once you restart LiteLLM, you should be able to run the same tests from the previous section, but using your LiteLLM endpoint and credentials instead.

Testing speech-to-text with LiteLLM:

```bash
curl https://litellm.yoursite.example/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-token" \
  -F model=whisper \
  -F "file=@/home/james/Downloads/Bart Simpson Ay Caramba.wav"
```

Testing text-to-speech with LiteLLM:

```bash
curl https://litellm.yoursite.example/v1/audio/speech \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  --data '{"model":"Kokoro-82M", "input": "Hello World! ROFLMAO", "voice":"bf_isabella", "language":"en_gb"}' \
  -o test.wav
```
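The same calls work from Python too. Here is a minimal sketch using the `requests` library that mirrors the text-to-speech curl command above; the endpoint URL, API key, model name and voice are the placeholder values from the example config, so substitute your own:

```python
# A minimal sketch mirroring the TTS curl example above - the URL, token,
# model and voice are placeholders taken from this post's example config.
import requests

resp = requests.post(
    "https://litellm.yoursite.example/v1/audio/speech",
    headers={"Authorization": "Bearer sk-your-token"},
    json={
        "model": "Kokoro-82M",
        "input": "Hello World!",
        "voice": "bf_isabella",
        "language": "en_gb",  # see the prosody quirk section below
    },
    timeout=120,
)
resp.raise_for_status()

# Write the returned audio bytes to disk
with open("test.wav", "wb") as f:
    f.write(resp.content)
```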
## Connecting OpenWebUI to LiteLLM

To add voice capability to OpenWebUI, log in as the admin user (by default the first account you set up when you installed OpenWebUI), click on your username in the bottom left-hand corner of the screen and go to Admin Panel. Navigate to the Settings tab and then to Audio. Here you can populate the STT and TTS settings:

* You can use the same endpoint for both - it is your LiteLLM instance URL with `/v1` appended, e.g. `https://litellm.mydomain.example/v1`
* You can use the same LiteLLM API key for both - I like having different keys for different apps so that I can see usage across my software stack, but you can also just use LiteLLM’s admin user password as a key if you prefer.
* For the STT model, enter the corresponding name from the YAML - `whisper` in the example above.
* For the TTS model, enter the model name you used in the LiteLLM config - either `piper` or `Kokoro-82M` in the example above.
* In the TTS settings you also need to specify a voice. A full list of available voices can be found by going to the Speaches demo Gradio app (likely running on `http://yourserver.local:8014`) and looking at the Text-To-Speech tab.
* My personal preference is `bf_isabella` with `Kokoro-82M`, or the `en_GB-alba-medium` voice with the `piper` model. I tend to stick to piper at the moment due to a weird quirk/bug I found with the voice prosody and pronunciation (see below).

## Testing Calls

To talk to a model, go to OpenWebUI, select the model you want to interact with and click the call icon. In this mode, OpenWebUI will attempt to “listen” to your microphone and pass the audio to the whisper endpoint. When you stop talking, whisper will detect a break and what you said so far will be processed by the language model. A response is generated and passed to the TTS endpoint before it is played back to you.

I noticed that this doesn’t always work perfectly in Firefox Mobile; it seems to get stuck and not play back the response. However, Chromium-based browsers seem to get this right.

## Prosody and Language Quirk/Bug

As of writing there is a weird quirk with Speaches where the `Kokoro-82M` UK English voices will default to American prosody/pronunciation of words if you do not specifically set the language as part of your request. For example, try running the following command against your own server and you’ll see that the model pronounces “common” as “carmen” and “problem” as “pra-blem”, which sounds weird in an English accent.

```bash
curl https://litellm.yoursite.example/v1/audio/speech \
  -H "Authorization: Bearer sk-your-token" \
  -H "Content-Type: application/json" \
  --data '{"model":"Kokoro-82M", "input": "This is quite a common problem", "voice":"bf_isabella"}' \
  -o test.wav
```

Unfortunately OpenWebUI does not currently have an option for passing the user’s language preference to the model, which means that the pronunciation is always off for me. There are a couple of possible solutions I can think of:

1. Have OpenWebUI pass a `language` param when it is making TTS API calls to LiteLLM.
2. Have Speaches map the language of the voice automatically. Voices in Kokoro have a naming convention with the country of origin and a gender attached (e.g. American female voices prefixed `af`, British male voices prefixed `bm` and so on). Speaches could infer the language from that (a toy sketch of this idea follows below). Alternatively, language could be stored in the metadata somewhere.

I might make some pull requests if I’m feeling cute later.
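To make option 2 a bit more concrete, here is a purely illustrative Python sketch of how a default language could be guessed from the Kokoro voice naming convention. This is not how Speaches actually behaves, and the prefix-to-language mapping is my own assumption:

```python
# Illustrative only: a hypothetical helper that guesses a default language
# from the Kokoro voice naming convention (country + gender prefix).
KOKORO_PREFIX_LANGUAGES = {
    "af": "en_us",  # American female
    "am": "en_us",  # American male
    "bf": "en_gb",  # British female
    "bm": "en_gb",  # British male
}


def default_language_for_voice(voice: str, fallback: str = "en_us") -> str:
    """Return a default language code based on the voice name's prefix."""
    prefix = voice.split("_", 1)[0]
    return KOKORO_PREFIX_LANGUAGES.get(prefix, fallback)


print(default_language_for_voice("bf_isabella"))  # -> en_gb
```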
## Conclusion

I’m pretty blown away by how accurate and realistic these small models that run on a single consumer GPU can be. It’s really useful to be able to have a full speech-to-text and text-to-speech stack running locally and not have to worry about privacy. If you wanted to, you could swap out the language model in this stack for an externally hosted one like Claude Sonnet or a Groq-hosted open-ish model like Llama 3.3 70b. You could even make ironic use of GPT-4o.

In future articles I’ll be covering some other use cases for these tools, including external voice transcription apps and tools that can plug in to your Whisper API, and my self-hosted Home Assistant stack, which I am using to replace Alexa with fully self-hosted home automation tooling.
brainsteam.co.uk
April 6, 2025 at 9:17 AM
Reposted by Dr James Ravenscroft
Weeknote 13
Today is Mother’s Day and temporal confusion day, when the clocks go forward for reasons nobody really understands or cares about any more. Yesterday we drove up to Purton on the outskirts of Swindon to meet my mum and her partner for an early Mother’s Day lunch. We went to The Bell at Purton last year too. It is quite a nice little pub, tucked away behind the A419 and far enough away from Swindon that you can barely hear the chaos of the Magic Roundabout. In the evening we went to see Novocaine, a silly film about a guy who can’t feel pain and has to endure a lot of physical abuse to save his girlfriend. It was funny, a little dry for my tastes, but Jack Quaid plays a personable everyman.

This week was fairly uneventful on the work front. I met one of my reports for the first time, as he had been off on paternity leave since I joined the company. I also had a few interesting conversations about how people in my team and wider department are using, or failing to use, AI development tools. I started exploring SLM server benchmarking tools for measuring throughput, latency etc. of models, and I led discussions with the team about using small/specialised models for an NEP-like use case instead of huge frontier models, which feel a bit overkill.

_**The little gnome that my father-in-law hid in our garden at Christmas, resting against the back of our tree.**_

I’ve been doing some gardening this week too. Nothing too fancy, just tidying the raised beds ready to put some veggies in and a bit of pruning. I put down some lawn weed-and-feed stuff last week and decided to buy myself an electric lawn scarifier to rake up dead moss rather than doing this manually. I also found a little drunk gnome that my father and uncle made, hidden in our garden at Christmas time, resting against the back of our tree.

I’ve found that I’ve been quite good at journaling this week. I’ve been writing in my physical notebook most days and then using my local vision/language model setup to transcribe. It’s quite interesting to see my thoughts preserved for posterity, and I am aiming to keep it up. I’ve been reading Tiny Experiments and taking notes about that too. The journaling is a bit of a side effect from this book tbh. If I keep track of my thoughts better, I should be better able to track the results of my tiny experiments. The garden bed was one of my experiments this week. Another was doing High Intensity Interval Training twice before work. I managed HIIT on Wednesday but, despite doing a warm-down routine and stretching, I felt quite sore and achy the rest of the week and didn’t manage another session. I will try again this week and see if my fitness has improved.

Next week I’m also planning to be in London twice as usual. I’ve got the dentist tomorrow, which is always a joy, and aside from that I’m hoping for another relatively quiet week. My main goal is to get the lawn sorted and maybe try to plant some veggies.
brainsteam.co.uk
March 30, 2025 at 3:46 PM
Reposted by Dr James Ravenscroft
Having a go at setting up home assistant with ESPHome since it’s high time I chucked Alexa out

#HomeAssistant #ESPHome
March 22, 2025 at 10:02 AM
Reposted by Dr James Ravenscroft
Weeknote 11/2025
Week 11 of 2025 has happened already. This week I was in London on Monday and Thursday and got the opportunity to spend some time with colleagues from TR’s Zug office who had travelled over to London for work. On Thursday, my manager even took us out for sushi, which was an unexpected bonus.

I started the week by doing battle with AWS Lambda Authorizer functions, which took me a long time to wrap my head around. I made some notes about my experience in my digital garden. After this, I also started to experiment with importing some data into neo4j and found that it wouldn’t work on my Mac M4 machine until I fiddled with some Java settings. I’m still enjoying my new role and trying to find the right balance between my management responsibilities and opportunities for individual contributions.

At home we had our new laminate flooring put down in the living room and finally got rid of the cheap off-white carpet that the previous owner had put down, which was tattered and had threads pulled in it from a few years of use. The new floor is lovely and should prove much easier to clean and more hard-wearing. We are planning on getting a couple of rugs to put down and make the room a bit cosier eventually. One particularly amusing thing is that the cats have been playing on the new floor and they are having a great time. They keep skittering across the floor like Road Runner and Coyote, running on the spot before they get any purchase.

On Friday we went to see Black Bag, a British cerebral spy movie starring Cate Blanchett and Michael Fassbender as a couple who both work for MI6. It focusses on their ability to compartmentalise their personal and work lives as they are embroiled in a conspiracy. It was a really good, fairly twisty film. It’s a little bit of a slow-burner but it does a great job of building tension and suspense.

Following their annoying and anti-consumer move to prevent you from downloading Kindle books, Amazon this week announced that they are turning off local processing of voice commands for some Alexa devices. As Terence Eden points out, this is something of a nothing-burger for most Alexa users, since most devices have always sent the voice commands to Amazon servers. However, it reminded me about my ambition to drop Amazon and spin up some local voice assistant software, namely Home Assistant. I finally got started this weekend, installing Home Assistant on my Raspberry Pi and setting up a few basic automations. It’s been interesting to get to know how the system works and connect it to my existing “smart” devices. The main use cases I have for Alexa are turning my lights on and off, setting timers for cooking and playing music through Bluetooth speakers. It looks like Home Assistant isn’t brilliant for music stuff, but I will be exploring setting up a music player satellite with a Pi Zero and Squeezelite.

I’m also continuing to use Hoarder to create a personal web archive. One feature that Hoarder doesn’t yet have is the ability to export EPUBs of articles that I’ve captured for later reading. I still use Wallabag and KOReader on my Kobo device for this purpose, and managing both is a bit of a pain. This week, I created HoardBag, a Python-based cron script that periodically checks a list in my Hoarder instance and syncs new items over to Wallabag for later reading. The script copies the captured content from Hoarder straight over to Wallabag without the need for any further scraping or web requests, which means it works nicely with paywalled content captured using the Hoarder SingleFile plugin. I’ve got the script set up to run once every 15 minutes on my server. This will allow me to keep track of all the articles that I’ve captured and easily access them later from my Kobo e-ink reader when I’m ready.

I initially struggled for a while trying to get my Python tests to run in CI due to mis-configured Forgejo runner labels. The official documentation currently recommends setting up the runner so that `ubuntu-latest` uses a Node.js docker image, but this does not work with setup-python or other Python action steps. Eventually I realised that aliasing `ubuntu-latest` to catthehacker’s ubuntu act images would do what I needed. I made notes on this in my digital garden.

I’m still (slowly) reading the first of Michael Connelly’s Bosch books, ‘The Black Echo’, but I’m excited to get stuck into Anne-Laure’s Tiny Experiments when I have the headspace.

Next week, I hope that the weather will continue to be fine and sunny and I can get some garden work done. I’ll be in London on my usual schedule for work, and on the weekend we are planning to head to Twickenham Stadium to see Sailawaze Live, a cruise expo that Mrs R won tickets for a couple of weeks ago. Hopefully we’ll get plenty of free samples and to try out some of the activities, including archery and zip lining.
brainsteam.co.uk
March 16, 2025 at 2:23 PM
AI code assistants can introduce hidden security risks. I observed that 4 frontier models add hard-to-spot but potentially catastrophic HTTPS vulnerabilities when fixing "broken" code (see the sketch below for the kind of pattern involved). #infosec #AI #CodeSafety #curl brainsteam.co.uk/2025/2/12/ai...
Getting AI Assistants to generate insecure CURL requests
Testing AI code assistants' willingness to generate insecure CURL requests
brainsteam.co.uk
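To illustrate the kind of pattern the post refers to, here is a hypothetical Python sketch (my own example, not taken from the article, and shown with the requests library rather than curl): silently disabling TLS certificate verification makes a failing request "work" while removing the protection.

```python
# Hypothetical illustration (not from the linked post): the "fix" below makes
# the request succeed but silently disables TLS certificate verification,
# exposing the call to man-in-the-middle attacks.
import requests

# Original "broken" code: fails against a host with an untrusted certificate
# resp = requests.get("https://internal.example.com/api/data")

# Subtly dangerous "fix": verify=False suppresses the error and the protection
resp = requests.get("https://internal.example.com/api/data", verify=False)
print(resp.status_code)
```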
February 12, 2025 at 1:20 PM
Reposted by Dr James Ravenscroft
Estimates on the number of professional software engineers range around 20-35 million, globally.

With GenAI, anyone can instruct an agent to code. That doesn’t make them a professional: but eventually it might force them to hire one when complexity gets out of hand!
December 21, 2024 at 8:49 AM
Reposted by Dr James Ravenscroft
Replacing bus drivers with "self-driving" buses that actually require 2 drivers on board for safety seems like a little glimpse into the future for any companies thinking of replacing software developers with "cheaper" "A.I." devs.

www.youtube.com/watch?v=97wr...
The AI Employee Era Has Begun
YouTube video by ThePrimeTime
www.youtube.com
December 19, 2024 at 1:52 PM
Reposted by Dr James Ravenscroft
My body is a machine that turns small tasks into weeks of anxiety
December 15, 2024 at 6:14 PM