# OCamlPlanet
Giving hub.cl an upgrade
published 2025-09-07

For a few years now we've been running hub.cl.cam.ac.uk, a JupyterHub instance, for the first-year course "Foundations of Computer Science". It serves as a hosting site for the lecture notes, which come in the form of Jupyter notebooks, as a playground where students can try OCaml, and as the platform for the assessed exercises that are a mandatory part of the course.

Since I set it up back in 2018 or so, it has accumulated some cruft over the years, and has also fallen somewhat behind the bleeding edge of the Jupyter software stack. So I thought this year, as I'm actually lecturing the course, I'd give it a bit of loving care and attention. We were still on JupyterHub 1.5.3 whereas the current release is 5.3.0, so there was quite a bit of work to do. A brief play with putting things on the latest version seemed to break quite a lot, so I thought it might be better to go back to the drawing board and start the config again from scratch.

So with some help from Claude, I've now managed to hugely simplify the whole JupyterHub config, and even given it a makeover to try to match the style of www.cst.cam.ac.uk as well. The improvements include:

* Using Caddy as a reverse proxy for TLS termination, meaning I don't have to manually renew the Let's Encrypt cert every 3 months
* Unifying the configuration of the two container images used for students and instructors
* Upgrading to much newer jupyterhub, notebook and nbgrader images
* Simplifying the configuration required to make it work on a new server - persistent user directories are now Docker volumes rather than bind mounts on the local filesystem
* Updating the authentication method to use Raven via OAuth2 rather than the unmaintained jupyterhub-raven-auth, for which I'd had to maintain a patch
* Rebasing my patch to nbgrader to verify all of the output of the cells when grading answers

As ever, this took longer than I'd anticipated, but I'm mostly there now. There are a few more steps to try:

* trial the new patch for using ocaml-jupyter with OCaml 5.x
* see how to upgrade to Notebook v7, as I've stuck with v6 in order to keep the extensions we're using going

Continue reading here
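The Caddy setup behind the first improvement above can be tiny; a sketch along these lines (the upstream port is an illustrative assumption, not taken from the post):

```
hub.cl.cam.ac.uk {
    reverse_proxy localhost:8000
}
```

Caddy obtains and renews the Let's Encrypt certificate for the named host automatically, which is what removes the manual three-monthly renewal.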
dlvr.it
September 8, 2025 at 2:38 PM
OCaml Weekly 2025 w32 to w35
I have been working on a few different OCaml-related projects over the last few weeks. This is also coinciding with me moving across the UK, which has made finding time to write weeklies and posts a little tricky. Nevertheless, here are some of the things I have been thinking about and working on! I managed to publish one significant post this month: a retrospective on Irmin.

Eio

Increasingly, I'm feeling the dream of a unified framework for asynchronous IO slipping through the OCaml community's fingers. It is perhaps not such a bad thing, and I think with the right library authoring we can at least get to a place where it isn't so bad, for example providing read functions as opposed to using an opinionated IO library directly. That being said, I am a very happy user of Eio when those choices do not matter, as is the case in building your own application (e.g. Shelter). To this end, I have spent a good bit of time upstreaming support for various missing pieces in Eio's API, including:

* Setuid and setgid fork actions for the process API.
* Set-process-group support for job control in the process API.
* Responding to a Buf_write.of_flow request, and tinkering with the example there. I think this does highlight the awkwardness of making code portable across concurrency mechanisms, particularly with Eio's structured concurrency.
* Some investigation into an EINTR bug, which seems to stem from a known issue on Uring: writes are not buffered, which usually does not matter except perhaps when there are parallel writes to stdout.
* Some time thinking about the fiber-local storage across domains issue; I've passed on some thoughts to the folks working on this.

Vpnkit

You might recall I was interested in using vpnkit. Hannes has done an amazing amount of work (patching and releasing) on a series of packages to get this into a much better place that could be considered soon for merging.
This defunctorisation is actually very useful for the Eio port I wrote a long time ago.

Papers and Talks at ICFP

Somehow, I have ended up on lots of papers and talks at ICFP and the co-located events in October. The vaguely OCaml-related ones include:

* Essentially a vpnkit experience report.
* An extended abstract on generating a corpus of ill-typed Hazel programs, accepted into the TyDe workshop.
* Relatedly, the work that project supported was accepted into HATRA; this was the Part II project I supervised: Decomposable Type Highlighting for Bidirectional Type and Cast Systems.
* And two PROPL talks!

Outreachy

We have come to the end of another Outreachy round! I will write more on this soon in its own separate post, but for now I am very grateful to this round's mentors gridbugs and mdales, and also to our fantastic interns. If you are interested, please do watch our demo day presentations. The next round is fast approaching and we still need to work out the logistics, but I had a good conversation with mdales about possible Geocaml projects that I intend to submit!
September 6, 2025 at 6:35 PM
Mosaic Terminal User Interface
In testing various visual components, terminal resizing, keyboard handling and the use of hooks, I inadvertently wrote the less tool in Mosaic. Below are my notes on using the framework.

use_state is a React-style hook that manages local component state. It returns a tuple of (value, set, update) where:

* count - the current value
* set_count - sets to a specific value (takes a value)
* update_count - transforms the current value (takes a function)

Thus, you might have

```ocaml
let (count, set_count, update_count) = use_state 0;;
count                          (* returns the current value - zero in this case *)
set_count 5                    (* set the value to 5 *)
update_count (fun x -> x + 1)  (* adds 1 to the current value *)
```

In practice, this could be used to keep track of the selected index in a table of values:

```ocaml
let directory_browser dir_info window_height window_width set_mode =
  let open Ui in
  let selected_index, set_selected_index, _ = use_state 0 in
  use_subscription
    (Sub.keyboard_filter (fun event ->
         match event.Input.key with
         | Input.Up ->
             set_selected_index (max 0 (selected_index - 1));
             None
         | Input.Down ->
             set_selected_index (min (num_entries - 1) (selected_index + 1));
             None
         | Input.Enter ->
             set_mode (load_path entry.full_path);
             Some ()
         | _ -> None));
```

Any change in the value of a state causes the UI component to be re-rendered. Consider this snippet, which uses the subscription Sub.window to update the window size, calling set_window_height and set_window_width.
```ocaml
let app path =
  let mode, set_mode, _ = use_state (load_path path) in
  let window_height, set_window_height, _ = use_state 24 in
  let window_width, set_window_width, _ = use_state 80 in
  (* Handle window resize *)
  use_subscription
    (Sub.window (fun size ->
         set_window_height size.height;
         set_window_width size.width));
  (* Return a Ui.element using window_height and window_width *)
  directory_browser dir_info window_height window_width set_mode

let () = run ~alt_screen:true (fun () -> app path)
```

In my testing, this worked but left unattached text fragments on the screen, which forced me to add a Cmd.clear_screen to manually clear the screen. Cmd.repaint doesn't seem strictly necessary. The working subscription was:

```ocaml
use_subscription
  (Sub.window (fun size ->
       set_window_height size.height;
       set_window_width size.width;
       dispatch_cmd (Cmd.batch [ Cmd.clear_screen; Cmd.repaint ])));
```

It is also possible to monitor values using use_effect. In the example below, the scroll position is reset when the filename is changed. The effect is triggered only when the component is rendered and the value differs from the value on the previous render.

```ocaml
use_effect
  ~deps:(Deps.keys [ Deps.string content.filename ])
  (fun () ->
    set_scroll_offset 0;
    set_h_scroll_offset 0;
    None);
```

The sequence is:

* Component renders (first time or re-render due to state change)
* Framework checks if any values in ~deps changed since the last render
* If they changed, run the effect function
* If the effect returns a cleanup, that cleanup runs before the next effect

For some widgets, I found I needed to perform manual calculations on the size to fill the space and correctly account for panel borders, header, dividers, and status: window_height - 6. In other cases, ~expand:true was available.

```ocaml
scroll_view
  ~height:(`Cells (window_height - 6))
  ~h_offset:h_scroll_offset
  ~v_offset:scroll_offset
  file_content;
```

Colours can be defined as RGB values and then composed into Styles with the ++ operator.
Styles are then applied to elements such as table headers:

```ocaml
module Colors = struct
  let primary_blue = Style.rgb 66 165 245 (* Material Blue 400 *)
end

module Styles = struct
  let header = Style.(fg Colors.primary_blue ++ bold)
end

table ~header_style:Styles.header ...
```

The panel serves as the primary container for our application content, providing both visual framing and structural organisation:

```ocaml
panel
  ~title:(Printf.sprintf "Directory Browser - %s" (Filename.basename dir_info.path))
  ~box_style:Rounded
  ~border_style:Styles.accent
  ~expand:true
  (vbox [ (* content goes here *) ])
```

Mosaic provides the table widget, which I found had a layout issue when the column widths exceeded the table width. It worked pretty well, but it takes about 1 second per 1000 rows on my machine, so consider pagination.

```ocaml
let table_columns =
  [
    Table.{ (default_column ~header:"Name") with style = Styles.file };
    Table.{ (default_column ~header:"Type") with style = Styles.file };
    Table.{ (default_column ~header:"Size") with style = Styles.file; justify = `Right };
  ]
in
table
  ~columns:table_columns
  ~rows:table_rows
  ~box_style:Table.Minimal
  ~expand:true
  ~header_style:Styles.header
  ~row_styles:table_row_styles
  ~width:(Some (window_width - 4))
  ()
```

The primary layout primitives are vbox and hbox.

Vertical Box (vbox) - for stacking components vertically:

```ocaml
vbox
  [
    text "Header";
    divider ~orientation:`Horizontal ();
    content;
    text "Footer";
  ]
```

Horizontal Box (hbox) - for arranging components horizontally:

```ocaml
hbox ~gap:(`Cells 2) [ text "Left column"; text "Right column" ]
```

As I mentioned earlier, Mosaic has a subscription-based event-handling system; for example, a component can subscribe to keyboard events.
```ocaml
use_subscription
  (Sub.keyboard_filter (fun event ->
       match event.Input.key with
       | Input.Char c when Uchar.to_int c = 0x71 ->
           (* 'q' *)
           dispatch_cmd Cmd.quit;
           Some ()
       | Input.Enter ->
           (* handle enter *)
           Some ()
       | _ -> None))
```

The keyboard_filter function allows components to selectively handle keyboard events, returning Some () for events that are handled and None for events that should be passed to other components.

Mosaic provides a command system for handling side effects and application lifecycle events; some of these you will have seen in earlier examples.

```ocaml
dispatch_cmd Cmd.quit     (* Exit the application *)
dispatch_cmd Cmd.repaint  (* Force a screen repaint *)
dispatch_cmd (Cmd.batch [ (* Execute multiple commands *)
    Cmd.clear_screen;
    Cmd.repaint ])
```

I found that using Unicode characters in strings caused alignment errors, as their length was the number of data bytes, not the visual space used on the screen.

The mless application is available on GitHub for further investigation or as a starter project.
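Not from the post, but a minimal demonstration of that byte-length vs display-width mismatch: String.length counts bytes, so a run of braille blocks that occupies three terminal columns still reports nine.

```ocaml
let () =
  Printf.printf "%d\n" (String.length "abc");  (* 3 bytes, 3 columns *)
  Printf.printf "%d\n" (String.length "⣿⣿⣿")   (* 9 bytes, 3 columns *)
```

Correct alignment needs a display-width calculation (e.g. a wcwidth-style function over Unicode code points) rather than String.length.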
September 5, 2025 at 10:29 PM
Terminal GUI for ocluster monitoring
I've been thinking about terminal-based GUI applications recently and decided to give notty a try. I decided to write a tool to display the status of the ocurrent/ocluster in the terminal by gathering the statistics from ocluster-admin. I want to have histograms showing each pool's current utilisation and backlog. The histograms will resize vertically and horizontally as the terminal size changes. And yes, I do love btop. It's functional, but still a work in progress. mtelvers/ocluster-monitor

The histogram module uses braille characters (U+2800-U+28FF) to create dense visualizations where each character can represent up to 2x4 data points using the dots of a braille cell. In the code, these positions map to bit values:

| Left column bits | Right column bits |
|------------------|-------------------|
| 0x01 (dot 1)     | 0x08 (dot 4)      |
| 0x02 (dot 2)     | 0x10 (dot 5)      |
| 0x04 (dot 3)     | 0x20 (dot 6)      |
| 0x40 (dot 7)     | 0x80 (dot 8)      |

1. Bit Mapping

The code defines bit arrays for each column:

```ocaml
let left_bits = [ 0x40; 0x04; 0x02; 0x01 ]  (* Bottom to top *)
let right_bits = [ 0x80; 0x20; 0x10; 0x08 ] (* Bottom to top *)
```

2. Height to Dots Conversion

```ocaml
let level = int_of_float (height *. 4.0)
```

This converts a height value (0.0-1.0) to the number of dots to fill (0-4).

3. Dot Pattern Generation

For each column, the algorithm:

* Iterates through the bit array from bottom to top
* Sets each bit if the current level is high enough
* Uses bitwise OR to combine all active dots

4. Character Assembly

```ocaml
let braille_char = braille_base lor left_dots lor right_dots
```

* braille_base = 0x2800 (the base braille character)
* left_dots and right_dots are OR'd together
* The result is converted to a Unicode character

5. Multi-Row Histograms

For taller displays, the histogram is split into multiple rows:

* Each row represents a fraction of the total height
* Data values are normalized to fit within each row's range
* Rows are generated from top to bottom
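Putting steps 1-4 together, here is a self-contained sketch of the cell assembly (the function names are my own, not necessarily those in mtelvers/ocluster-monitor):

```ocaml
(* Bit positions for each column of a braille cell, bottom to top,
   matching the table above. *)
let left_bits = [ 0x40; 0x04; 0x02; 0x01 ]
let right_bits = [ 0x80; 0x20; 0x10; 0x08 ]

(* Turn a height in 0.0-1.0 into a dot pattern for one column:
   fill [level] dots from the bottom, OR-ing the active bits. *)
let column_dots bits height =
  let level = int_of_float (height *. 4.0) in
  fst
    (List.fold_left
       (fun (acc, i) bit -> ((if i < level then acc lor bit else acc), i + 1))
       (0, 0) bits)

(* OR the two columns onto the braille base character U+2800. *)
let braille_cell ~left ~right =
  0x2800 lor column_dots left_bits left lor column_dots right_bits right

let () =
  (* A full left column next to a half-full right column. *)
  Printf.printf "U+%04X\n" (braille_cell ~left:1.0 ~right:0.5)
```

Converting the resulting code point to a Uchar.t and printing it as UTF-8 gives one character per pair of data points.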
September 4, 2025 at 10:29 PM
A ZFS Scaling Adventure
The FreeBSD workers have been getting [slower](https://github.com/ocurrent/opam-repo-ci/issues/449): jobs that should take a few minutes are now timing out after 60 minutes. My first instinct was that ZFS was acting strangely. I checked the classic ZFS performance indicators:

* Pool health: zpool status - ONLINE, no errors
* ARC hit ratio: sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses - 98.8% (excellent!)
* Fragmentation: zpool list - 53% (high but not catastrophic)
* I/O latency: zpool iostat -v 1 3 and iostat -x 1 3 - 1ms read/write (actually pretty good)

But the sync command was taking 70-160ms when it should be under 10ms for an SSD. We don't need sync, as the disk holds disposable CI artefacts, so why not try:

```shell
zfs set sync=disabled obuilder
```

The sync times improved to 40-50ms, but the CI jobs were still crawling. I applied some ZFS tuning to try to improve things:

```shell
# Crank up those queue depths
sysctl vfs.zfs.vdev.async_read_max_active=32
sysctl vfs.zfs.vdev.async_write_max_active=32
sysctl vfs.zfs.vdev.sync_read_max_active=32
sysctl vfs.zfs.vdev.sync_write_max_active=32

# Speed up transaction groups
sysctl vfs.zfs.txg.timeout=1
sysctl vfs.zfs.dirty_data_max=8589934592

# Optimize for metadata
zfs set atime=off obuilder
zfs set primarycache=metadata obuilder
sysctl vfs.zfs.arc.meta_balance=1000
```

However, these changes made no measurable difference to the actual performance.
For comparison, I ran one of the CI steps on an identical machine running Ubuntu with BTRFS:

```shell
opam install astring.0.8.5 base-bigarray.base base-domains.base base-effects.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bechamel.0.5.0 camlp-streams.5.0.1 cmdliner.1.3.0 cppo.1.8.0 csexp.1.5.2 dune.3.20.0 either.1.0.0 fmt.0.11.0 gg.1.0.0 jsonm.1.0.2 logs.0.9.0 mdx.2.5.0 ocaml.5.3.0 ocaml-base-compiler.5.3.0 ocaml-compiler.5.3.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-version.4.0.1 ocamlbuild.0.16.1 ocamlfind.1.9.8 optint.0.3.0 ounit2.2.2.7 re.1.13.2 repr.0.7.0 result.1.5 seq.base stdlib-shims.0.3.0 topkg.1.1.0 uutf.1.0.4 vg.0.9.5
```

This took under 3 minutes, but the worker logs showed the same step took 35 minutes. What could cause such a massive difference on identical hardware? On macOS, I've previously seen problems when the number of mounted filesystems got to around 1000: mount would take minutes to complete. I wondered, how many file systems are mounted?

```shell
# mount | grep obuilder | wc -l
33787
```

Now, that's quite a few file systems. Historically, our FreeBSD workers had tiny SSDs, circa 128GB, but with the move to a new server with a 1.7TB SSD and the same 25% prune threshold, the number of mounted file systems has become quite large. I gradually increased the prune threshold and waited for ocurrent/ocluster to prune jobs. With the threshold at 90%, the number of file systems was down to ~5,000, and performance was restored.

It's not really a bug; it's just an unexpected side effect of having a large number of mounted file systems. On macOS, the resolution was to unmount all the file systems at the end of each job, but that's easy when the concurrency is limited to one and more tricky when the concurrency is 20 jobs.
September 4, 2025 at 2:37 PM
Label Maker in js_of_ocaml using Claude
I've taken a few days off, and while I've been travelling, I've been working on a personal project with Claude. I've used Claude Code for the first time, which is a much more powerful experience than using claude.ai, as Claude can apply changes to the code and use your build tools directly to quickly iterate on a problem. In another first, I used js_of_ocaml, which has been awesome.

The project isn't anything special; it's a website that creates sheets of Avery labels. It is needed for a niche educational environment where the only devices available are iPads, which are administratively locked down, so no custom applications or fonts can be loaded. You enter what you want on the label, and it initiates the download of the resulting PDF.

The original implementation, written in OCaml (of course), uses a cohttp web server, which generates a reStructuredText file that is processed via rst2pdf with custom page templates for the different label layouts. The disadvantage of this approach is that it requires a server to host it. I have wrapped the application into a Docker container, so it isn't intrusive, but it would be easier if it could be hosted as a static file on GitHub Pages.

On OCaml.org, I found camlpdf, otfm and vg, which, when combined with js_of_ocaml, should give me a complete tool in the browser. The virtual file system embeds the TTF font into the JavaScript code! I set Claude to work, which didn't take long, but the custom font embedding proved problematic. I gave Claude an example PDF from the original implementation, and after some debugging, we had a working project.

Let's look at the code! I should add that the labels can optionally have a box drawn on them, which the student uses to provide feedback on how they got on with the objective. Claude produced three functions for rendering text: one for a single line, one for multiline text with a checkbox, and one for multiline text without a checkbox.
I pointed out that these three functions were similar and could be combined. Claude agreed and created a merged function, with the original three functions calling the new merged function. It took another prompt to update the calling locations to call the new merged function directly rather than going through the stub functions.

While Claude had generated code that compiles in a functional language, the code tends to look imperative; for example, there were several instances like this:

```ocaml
let t = ref 0 in
let () = List.iter (fun v -> t := !t + v) [ 1; 2; 3 ] in
!t
```

Where we would expect to see a List.fold_left! Claude can easily fix these when you point them out.

As I mentioned earlier, Claude Code can build your project and respond to dune build errors for you; however, some fixes suppress the warning rather than actually fixing the root cause. A classic example of this is:

```
% dune build
File "bin/main.ml", line 4, characters 4-5:
4 | let x = List.length lst
        ^
Error (warning 32 [unused-value-declaration]): unused value x.
```

The proposed fix is to discard the value of x, thus let _x = List.length lst, rather than realising that the entire line is unnecessary, as List.length has no side effects.

I'd been using Chrome 139 for development, but thought I'd try the native Safari on my Monterey-based Mac Pro, which has Safari 17.6. This gave me this error on the JavaScript console:

```
[Error] TypeError: undefined is not an object (evaluating 'k.UNSIGNED_MAX.udivmod')
  db (label_maker.bc.js:1758)
  (anonymous function) (label_maker.bc.js:1930)
  Global Code (label_maker.bc.js:2727:180993)
```

I found that since js_of_ocaml 6.0.1 the minimum browser version is Safari 18.2, so I switched to js_of_ocaml 5.9.1 and that worked fine. The resulting project can be found at mtelvers/label-maker-js and is published at mtelvers.github.io/label-maker-js.
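For reference, the idiomatic version of that ref-and-iter accumulator is a single fold:

```ocaml
(* Sum a list with List.fold_left instead of a mutable ref. *)
let t = List.fold_left (fun acc v -> acc + v) 0 [ 1; 2; 3 ]
let () = Printf.printf "%d\n" t
```

No mutation, and the accumulator's scope is exactly the fold.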
September 4, 2025 at 2:37 AM
OCaml Program Specification for Claude
I have a dataset that I would like to visualise using a static website hosted on GitHub Pages. The application that generates the dataset is still under development, which results in frequently changing data formats. Therefore, rather than writing a static website generator and needing to revise it continually, could I write a specification and have Claude create a new one each time there was a change? Potentially, I could do this cumulatively by giving Claude the original specification and code and then the new specification, but my chosen approach is to see if Claude can create the application in one pass from the specification. I've also chosen to do this using Claude Sonnet's web interface; obviously, the code I will request will be in OCaml.

I wrote a detailed 500-word specification that included the file formats involved, example directory tree layouts, and what I thought was a clear definition of the output file structure. The resulting code wasn't what I wanted: Claude had inlined huge swathes of HTML and was using Printf.sprintf extensively. Each file included the stylesheet as an inline <style> element. However, the biggest problem was that Claude had chosen to write the JSON parser from scratch, and this code had numerous issues and wouldn't even build. I directed Claude to use yojson rather than handcraft a parser. I had intended, but did not state in my specification, that I wanted the code to generate HTML using tyxml. I updated my specification, requesting that the code be written using tyxml, yojson, and timedesc to handle the ISO date format. I also thought of some additional functionality around extracting data from a Git repo.

Round 2 - Possibly a step backwards, as Claude struggled to find the appropriate functions in the timedesc library to parse and sort dates. There were also some issues extracting data using git.
I have to take responsibility here, as I gave the example command as git show --date=iso-strict ce03608b4ba656c052ef5e868cf34b9e86d02aac -C /path/to/repo, but git requires the -C /path/to/repo to precede the show command. However, the fact that my example had overridden Claude's knowledge was potentially interesting. Could I use this to seed facts I knew Claude would need? Claude still wasn't creating a separate stylesheet.css.

Round 3 - This time, I gave examples of how to use the timedesc library, i.e.:

> To use the timedesc library, we can call Timedesc.of_iso8601 to convert the Git ISO strict output to a Timedesc object and then compare it with compare (Timedesc.to_timestamp_float_s b.date) (Timedesc.to_timestamp_float_s a.date).

Also, in addition to stating that all the styles should be shared in a common stylesheet.css, I gave a file tree of the expected output, including the stylesheet.css. Claude now correctly used the timedesc library and tried to write a stylesheet. However, Claude had hallucinated css and css_rule functions in tyxml to do this, where none exist. Furthermore, adding the link to the stylesheet was causing problems, as link had multiple definitions in scope and needed to be explicitly referenced as Tyxml.Html.link. Claude's style was to open everything at the beginning of the file:

```ocaml
open Yojson.Safe
open Yojson.Safe.Util
open Tyxml.Html
open Printf
open Unix
```

The compiler picked Unix.link rather than Tyxml.Html.link:

```
File "ci_generator.ml", line 347, characters 18-33:
347 |   link ~rel:[ `Stylesheet ] ~href:"/stylesheet.css" ();
                    ^^^^^^^^^^^^^^^
Error: The function applied to this argument has type ?follow:bool -> string -> unit
This argument cannot be applied with label ~rel
```

Stylistically, please can we only open things in functions where they are used:

```ocaml
let foo () =
  let open Tyxml.Html in
  ....
```
This will avoid global opens at the top of the file and avoid any confusion where libraries have functions with the same name, e.g., Unix.link and Tyxml.Html.link.

Furthermore, I had two JSON files in my input, each with a field called name. Claude converted these into OCaml types; however, when referencing these later as function parameters, the compiler frequently picked the wrong one. This can be fixed by adding a specific type to the function parameter: let f (t : foo) = .... I've cheated here and renamed the field in one of the JSON files.

```ocaml
type foo = {
  name : string;
  x : string;
}

type bar = {
  name : string;
  y : string;
}
```

Claude chose to extract the data from the Git repo using git show --pretty=format:'%H|%ai|%s'; this ignores the --date=iso-strict directive. The correct format specifier is %aI. I updated my guidance on the use of git show.

My specification now comes in at just under 1000 words. From that single specification document, Claude produces a valid OCaml program on the first try, which builds the static site as per my design. wc -l shows me there are 662 lines of code. It's amusing to run it more than once to see the variations in styling!
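To make the field-name clash concrete, here is a minimal sketch of the annotation fix (the record contents are hypothetical, not from my dataset):

```ocaml
type foo = { name : string; x : string }
type bar = { name : string; y : string } (* shadows foo's [name] label *)

(* Without the [: foo] annotation, [t.name] is resolved against [bar],
   the most recently defined record type with a [name] field, and the
   use of [x] elsewhere then fails to type-check. *)
let f (t : foo) = t.name

let () = print_endline (f { name = "widget"; x = "1" })
```

The annotation gives the compiler the disambiguating context up front, which is why renaming the field in one of the JSON files also makes the problem disappear.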
September 3, 2025 at 2:37 PM
Moving to opam 2.4
opam 2.4.0 was released on 18th July, followed by opam 2.4.1 a few days later. This update needs to be propagated through the CI infrastructure. The first step is to update the base images for each OS.

Linux ocurrent/docker-base-images

The Linux base images are created using the Docker base image builder, which uses ocurrent/ocaml-dockerfile to know which versions of opam are available. Kate submitted PR#235 with the necessary changes to ocurrent/ocaml-dockerfile. This was released as v8.2.9 under PR#28251. With v8.2.9 released, PR#327 can be opened to update the pipeline to build images which include opam 2.4. Rebuilding the base images takes a good deal of time, particularly as it's marked as a low-priority task on the cluster.

macOS ocurrent/macos-infra

Including opam 2.4 on macOS required PR#56, which adds 2.4.1 to the list of opam packages to download. There are Ansible playbooks that build the macOS base images and recursively remove the old images and their (ZFS) clones. They take about half an hour per machine. I run the Intel and Apple Silicon updates in parallel, but process each pool one at a time. The Ansible command is:

```shell
ansible-playbook update-ocluster.yml
```

FreeBSD (rosemary.caelum.ci.dev) ocurrent/freebsd-infra

The FreeBSD update parallels the macOS update, requiring that 2.4.1 be added to the loop of available versions: PR#15. The Ansible playbook for updating the machine is named update.yml. However, we have been suffering from some reliability issues with the FreeBSD worker (see issue#449), so I took the opportunity to rebuild the worker from scratch. The OS reinstallation is documented in this post, and it's definitely worth reading the README.md in the repo for the post-installation steps.

Windows (thyme.caelum.ci.dev) ocurrent/obuilder

The Windows base images are built using a Makefile which runs unattended builds of Windows using QEMU virtual machines. The Makefile required PR#198. The command is make windows.
Once the new images have been built, stop the ocluster worker and move the new base images into place. The next step is to remove results/*, as these layers will link to the old base images, and to remove state/* so obuilder will create a new empty database on startup. Avoid removing cache/*, as this is the download cache for opam objects. The unattended installation can be monitored via VNC by connecting to localhost:5900.

OpenBSD (oregano.caelum.ci.dev) ocurrent/obuilder

The OpenBSD base images are built using the same Makefile used for Windows. There is a separate commit in PR#198 for the changes needed for OpenBSD, which include moving from OpenBSD 7.6 to 7.7. Run make openbsd. Once the new images have been built, follow the same procedure as for Windows: stop the ocluster worker, move the new base images into place, remove results/* and state/*, and keep cache/*. As with Windows, the unattended installation can be monitored via VNC by connecting to localhost:5900.

OCaml-CI

OCaml-CI uses ocurrent/ocaml-dockerfile as a submodule, so the submodule needs to be updated to the released version. Edits are needed to lib/opam_version.ml to include V2_4, and then the pipeline needs to be updated in service/conf.ml to use version 2.4 rather than 2.3 for all the different operating systems. Linux is rather more automated than the others. Lastly, since we now have OpenBSD 7.7, I have also updated references to OpenBSD 7.6: PR#1020.

opam-repo-ci

opam-repo-ci tests using the latest tagged version of opam, which is called opam-dev within the base images. It also explicitly tests against the latest release in each of the 2.x series.
With 2.4 now tagged, it will automatically become the dev version in use once the base images are updated; but over time, 2.4 and the latest tagged version will diverge, so PR#448 is needed to ensure we continue to test with the released version of 2.4.
September 3, 2025 at 2:36 AM
Tarides Website
Bella was in touch, as the tarides.com website was no longer building. The initial error was that cmarkit was missing, which I assumed was due to an outdated PR which needed to be rebased.

```
#20 [build 13/15] RUN ./generate-images.sh
#20 0.259 + dune exec -- src/gen/main.exe file.dune
#20 2.399     Building ocaml-config.3
#20 9.486 File "src/gen/dune", line 7, characters 2-9:
#20 9.486 7 |   cmarkit
#20 9.486       ^^^^^^^
#20 9.486 Error: Library "cmarkit" not found.
#20 9.486 -> required by _build/default/src/gen/main.exe
#20 10.92 + dune build @convert
#20 18.23 Error: Alias "convert" specified on the command line is empty.
#20 18.23 It is not defined in . or any of its descendants.
#20 ERROR: process "/bin/sh -c ./generate-images.sh" did not complete successfully: exit code: 1
```

The site recently moved to Dune Package Management, so this was my first opportunity to dig into how that works. Comparing the current build to the last successful build, I can see that cmarkit was installed previously but isn't now.

```
#19 [build 12/15] RUN dune pkg lock && dune build @pkg-install
#19 25.39 Solution for dune.lock:
...
#19 25.39 - cmarkit.dev
...
```

Easy fix: I added cmarkit to the .opam file. Oddly, it's in the .opam file as a pinned depend.
However, the build now failed with a new message:

```
#21 [build 13/15] RUN ./generate-images.sh
#21 0.173 + dune exec -- src/gen/main.exe file.dune
#21 2.582     Building ocaml-config.3
#21 10.78 File "src/gen/grant.ml", line 15, characters 5-24:
#21 10.78 15 |   |> Hilite.Md.transform
#21 10.78        ^^^^^^^^^^^^^^^^^^^
#21 10.78 Error: Unbound module "Hilite.Md"
#21 10.81 File "src/gen/blog.ml", line 142, characters 5-24:
#21 10.81 142 |   |> Hilite.Md.transform
#21 10.81         ^^^^^^^^^^^^^^^^^^^
#21 10.81 Error: Unbound module "Hilite.Md"
#21 10.82 File "src/gen/page.ml", line 52, characters 5-24:
#21 10.82 52 |   |> Hilite.Md.transform
#21 10.82        ^^^^^^^^^^^^^^^^^^^
#21 10.82 Error: Unbound module "Hilite.Md"
#21 10.94 + dune build @convert
#21 19.46 Error: Alias "convert" specified on the command line is empty.
#21 19.46 It is not defined in . or any of its descendants.
#21 ERROR: process "/bin/sh -c ./generate-images.sh" did not complete successfully: exit code: 1
```

Checking the hilite package, I saw that there had been a new release last week. The changelog lists:

* Separate markdown package into an optional hilite.markdown package

Ah, commit aaf60f7 removed the dependency on cmarkit by including the function buffer_add_html_escaped_string in the hilite source.

Pausing for a moment: if I constrain hilite to 0.4.0, does the site build? Yes. Ok, so that's a valid solution. How hard would it be to switch to 0.5.0? I hit a weird corner case, as I was unable to link against hilite.markdown. I chatted with Patrick, recreated my switch, and everything worked.

```
File "x/dune", line 3, characters 20-35:
3 | (libraries cmarkit hilite.markdown))
                        ^^^^^^^^^^^^^^^
Error: Library "hilite.markdown" not found.
-> required by library "help" in _build/default/x
-> required by _build/default/x/.help.objs/native/help__X.cmx
-> required by _build/default/x/help.a
-> required by alias x/all
-> required by alias default
```

Talking with Jon later about a tangential issue of docs for optional submodules gave me a sudden insight into the corner I'd found myself in. The code base depends on hilite, so after running opam update (to ensure I would get version 0.5.0), I created a new switch with opam switch create . --deps-only, and opam installed 0.5.0. When I ran dune build, it reported a missing dependency on cmarkit, so I dutifully added it as a dependency and ran opam install cmarkit. Do you see the problem? hilite only builds the markdown module when cmarkit is installed at the time hilite is built. If both packages are listed in the opam file when the switch is created, everything works as expected.

The diff turned out to be pretty straightforward:

```diff
   let html_of_md ~slug body =
     String.trim body
     |> Cmarkit.Doc.of_string ~strict:false
-    |> Hilite.Md.transform
+    |> Hilite_markdown.transform
     |> Cmarkit_html.of_doc ~safe:false
     |> Soup.parse
     |> rewrite_links ~slug
```

Unfortunately, the build still did not complete successfully. When Dune Package Management builds hilite, it does not build the markdown module even though cmarkit is installed. I wish there were a dune pkg install command! I tried to split the build by creating a .opam file which contained just ocaml and cmarkit, but this meant running dune pkg lock a second time, and that caused me to run straight into issue #11644.

Perhaps I can patch hilite to make Dune Package Management deal with it as opam does? Jon commented earlier that cmarkit is listed as a with-test dependency. opam would use it if it were present, but perhaps Dune Package Management needs to be explicitly told that it can? I will add cmarkit as an optional dependency.
```
depends: [
  "dune" {>= "3.8"}
  "mdx" {>= "2.4.1" & with-test}
  "cmarkit" {>= "0.3.0" & with-test}
  "textmate-language" {>= "0.3.3"}
  "odoc" {with-doc}
]
depopts: [
  "cmarkit" {>= "0.3.0"}
]
```

With my branch of hilite, the website builds again with Dune Package Management. I have created PR#27 to see if Patrick would be happy to update the package.

A feature request for Dune Package Management would be an equivalent of `opam option --global archive-mirrors="https://opam.ocaml.org/cache"`, as a lengthy `dune pkg lock` may fail because of a single curl failure and then need to be restarted from scratch.
dlvr.it
September 2, 2025 at 10:29 PM
Package Tool
Would you like to build every package in opam in a single Dockerfile using BuildKit? In mtelvers/package-tool, I have combined various opam sorting and graphing functions into a CLI tool that will work on a checked-out opam-repository. Many of these flags can be combined.

Package version

```
package-tool --opam-repository ~/opam-repository
```

The package can be given as `0install.2.18` or `0install`. The former specifies a specific version, while the latter processes the latest version. `--all-versions` can be specified to generate files for all package versions.

Dependencies

Dump the dependencies for the latest version of 0install into a JSON file:

```
package-tool --opam-repository ~/opam-repository --deps 0install
```

Produces `0install.2.18-deps.json`:

```
{"yojson.3.0.0":["dune.3.19.1"],
 "xmlm.1.4.0":["topkg.1.0.8"],
 "topkg.1.0.8":["ocamlfind.1.9.8","ocamlbuild.0.16.1"],
 ...
 "0install-solver.2.18"]}
```

Installation order

Create a list showing the installation order for the given package:

```
package-tool --opam-repository ~/opam-repository --list 0install
```

Produces `0install.2.18-list.json`:

```
["ocaml-compiler.5.3.0",
 "ocaml-base-compiler.5.3.0",
 ...
 "0install.2.18"]
```

Solution DAG

Output the solution graph in Graphviz format, which can then be converted into a PDF with dot:

```
package-tool --opam-repository ~/opam-repository --dot 0install
dot -Tpdf 0install.2.18.dot > 0install.2.18.pdf
```

OCaml version

By default, OCaml 5.3.0 is used, but this can be changed using the `--ocaml 4.14.2` parameter.

Dockerfile

The `--dockerfile` argument creates a Dockerfile to test the installation:

```
package-tool --opam-repository ~/opam-repository --dockerfile --all-versions 0install
```

For example, the above command line outputs 5 Dockerfiles.
* 0install.2.15.1.dockerfile
* 0install.2.15.2.dockerfile
* 0install.2.16.dockerfile
* 0install.2.17.dockerfile
* 0install.2.18.dockerfile

As an example, `0install.2.18.dockerfile` contains:

```dockerfile
FROM debian:12 AS builder_0install_2_18
RUN apt update && apt upgrade -y
RUN apt install -y build-essential git rsync unzip curl sudo
RUN if getent passwd 1000; then userdel -r $(id -nu 1000); fi
RUN adduser --uid 1000 --disabled-password --gecos '' opam
ADD --chown=root:root --chmod=0755 [ "https://github.com/ocaml/opam/releases/download/2.3.0/opam-2.3.0-x86_64-linux", "/usr/local/bin/opam" ]
RUN echo 'opam ALL=(ALL:ALL) NOPASSWD:ALL' >> /etc/sudoers.d/opam
RUN chmod 440 /etc/sudoers.d/opam
USER opam
WORKDIR /home/opam
ENV OPAMYES="1" OPAMCONFIRMLEVEL="unsafe-yes" OPAMERRLOGLEN="0" OPAMPRECISETRACKING="1"
ADD --chown=opam:opam --keep-git-dir=false [ ".", "/home/opam/opam-repository" ]
RUN opam init default -k local ~/opam-repository --disable-sandboxing --bare
RUN opam switch create default --empty
RUN opam install ocaml-compiler.5.3.0 >> build.log 2>&1 || echo 'FAILED' >> build.log
RUN opam install ocaml-base-compiler.5.3.0 >> build.log 2>&1 || echo 'FAILED' >> build.log
...
RUN opam install 0install-solver.2.18 >> build.log 2>&1 || echo 'FAILED' >> build.log
RUN opam install 0install.2.18 >> build.log 2>&1 || echo 'FAILED' >> build.log
ENTRYPOINT [ "opam", "exec", "--" ]
CMD bash
```

This can be built using Docker in the normal way. Note that the build context is your checkout of opam-repository.
```
docker build -f 0install.2.18.dockerfile ~/opam-repository
```

Additionally, it outputs `Dockerfile`, which contains the individual package builds as a multi-stage build and an aggregation stage as the final layer:

```dockerfile
FROM debian:12 AS results
WORKDIR /results
RUN apt update && apt upgrade -y
RUN apt install -y less
COPY --from=builder_0install_2_15_1 [ "/home/opam/build.log", "/results/0install.2.15.1" ]
COPY --from=builder_0install_2_15_2 [ "/home/opam/build.log", "/results/0install.2.15.2" ]
COPY --from=builder_0install_2_16 [ "/home/opam/build.log", "/results/0install.2.16" ]
COPY --from=builder_0install_2_17 [ "/home/opam/build.log", "/results/0install.2.17" ]
COPY --from=builder_0install_2_18 [ "/home/opam/build.log", "/results/0install.2.18" ]
CMD bash
```

Build all the versions of 0install in parallel using BuildKit's layer caching:

```
docker build -f Dockerfile -t opam-results ~/opam-repository
```

We can inspect the build logs in the Docker container:

```
$ docker run --rm -it opam-results
root@b28da667e754:/results# ls -l
total 76
-rw-r--r-- 1 1000 1000 12055 Jul 22 20:17 0install.2.15.1
-rw-r--r-- 1 1000 1000 15987 Jul 22 20:19 0install.2.15.2
-rw-r--r-- 1 1000 1000 15977 Jul 22 20:19 0install.2.16
-rw-r--r-- 1 1000 1000 16376 Jul 22 20:19 0install.2.17
-rw-r--r-- 1 1000 1000 15150 Jul 22 20:19 0install.2.18
```

Annoyingly, Docker doesn't seem to be able to cope with all of opam at once. I get various RPC errors.

```
[+] Building 2.9s (4/4) FINISHED                              docker:default
 => [internal] load build definition from Dockerfile
 => => transferring dockerfile: 10.79MB
 => resolve image config for docker-image://docker.io/docker/dockerfile:1
 => CACHED docker-image://docker.io/docker/dockerfile:1@sha256:9857836c9ee4268391bb5b09f9f157f3c91bb15821bb77969642813b0d00518d
 => [internal] load build definition from Dockerfile
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: connection error: COMPRESSION_ERROR
```
September 2, 2025 at 6:34 PM
Depth-first topological ordering
Over the last few months, I have written several posts on package installation graphs, specifically Topological Sort of Packages, Installation order for opam packages and Transitive Reduction of Package Graph. In this post, I'd like to cover an alternative ordering solution.

Consider the graph above, first presented in Topological Sort of Packages, which produces the installation order below:

* base-threads.base
* base-unix.base
* ocaml-variants
* ocaml-config
* ocaml
* dune

The code presented processed nodes when all their dependencies were satisfied (i.e., when their in-degree became 0). This typically means we process "leaf" nodes (nodes with no dependencies) first and then work our way up. However, it may make sense to process the leaf packages only when required rather than as soon as they can be processed. The easiest way to achieve this is to reverse the edges in the DAG, perform the topological sort, and then install the packages in reverse order.

```ocaml
let reverse_dag (dag : PackageSet.t PackageMap.t) : PackageSet.t PackageMap.t =
  (* Start with an empty dependent set for every package. *)
  let initial_reversed =
    PackageMap.fold
      (fun package _ acc -> PackageMap.add package PackageSet.empty acc)
      dag PackageMap.empty
  in
  (* For each edge package -> dependency, add the reversed edge
     dependency -> package. *)
  PackageMap.fold
    (fun package dependencies reversed_dag ->
      PackageSet.fold
        (fun dependency acc ->
          let current_dependents = PackageMap.find dependency acc in
          PackageMap.add dependency
            (PackageSet.add package current_dependents)
            acc)
        dependencies reversed_dag)
    dag initial_reversed
```

With such a function, we can write this:

```ocaml
reverse_dag dune |> topological_sort |> List.rev
```

* ocaml-variants
* ocaml-config
* ocaml
* base-unix.base
* base-threads.base
* dune

Now, we don't install base-unix and base-threads until they are actually required for the installation of dune.
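For completeness, the `topological_sort` used above can be sketched as a Kahn-style in-degree sort. This is my reconstruction rather than the code from the earlier post, and it assumes packages are plain strings with `PackageMap`/`PackageSet` built from `Map.Make(String)`/`Set.Make(String)`:

```ocaml
(* Kahn's algorithm: repeatedly emit packages whose remaining
   dependency count is zero, then relax their dependents. *)
let topological_sort (dag : PackageSet.t PackageMap.t) : string list =
  let dependents = reverse_dag dag in
  (* Remaining unprocessed dependencies per package. *)
  let in_degree = ref (PackageMap.map PackageSet.cardinal dag) in
  (* Packages with no dependencies are ready immediately. *)
  let ready =
    PackageMap.fold
      (fun pkg deps acc ->
        if PackageSet.is_empty deps then pkg :: acc else acc)
      dag []
  in
  let rec loop ready acc =
    match ready with
    | [] -> List.rev acc
    | pkg :: rest ->
        let next =
          PackageSet.fold
            (fun dependent ready ->
              let d = PackageMap.find dependent !in_degree - 1 in
              in_degree := PackageMap.add dependent d !in_degree;
              if d = 0 then dependent :: ready else ready)
            (PackageMap.find pkg dependents)
            rest
        in
        loop next (pkg :: acc)
  in
  loop ready []
```

Each package appears only after all of its dependencies, which is exactly the installation-order property the posts rely on.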
September 2, 2025 at 2:37 PM
Q2 Summary
I am grateful for Tarides' sponsorship of my OCaml work. Below is a summary of my activities in Q2 2025.

OCaml Infrastructure and Development

OCaml Maintenance Activities

General maintenance work on OCaml's infrastructure spanned many areas, including updating minimum supported OCaml versions from 4.02 to 4.08 and addressing issues with opam-repo-ci job timeouts. Platform-specific work included resolving compatibility issues with Fedora 42 and GCC 15, addressing Ubuntu AppArmor conflicts affecting runc operations, and managing macOS Sequoia upgrades across the Mac Mini CI workers. Complex build issues were investigated and resolved, including C++ header path problems in macOS workers and FreeBSD system upgrades for the CI infrastructure.

OCaml Infrastructure Migration

Due to the impending sunset of the Equinix Metal platform, the OCaml community services needed to be migrated. Services including OCaml-CI, opam-repo-ci, and the opam.ocaml.org deployment pipeline were migrated to new blade servers. The migration work was planned to minimise service disruption, which was kept to just a few minutes. Complete procedures were documented, including Docker volume transfers and rsync strategies.

opam2web Deployment

Optimisation work was undertaken on the deployment pipeline for opam2web, which powers opam.ocaml.org, to address the more-than-two-hour deployment time. The primary issue was the enormous size of the opam2web Docker image, which exceeded 25GB due to the inclusion of complete opam package archives. The archive was moved to a separate layer, allowing Docker to cache the layer and reducing the deployment time to 20 minutes.

opam Dependency Graphs

Algorithms for managing OCaml package dependencies were investigated, including topological sorting to determine the optimal package installation order. This work extended to handling complex dependency scenarios, including post-dependencies and optional dependencies.
A transitive reduction algorithm was implemented to create a dependency graph with a minimal edge count while preserving the same dependency relationships, enabling more efficient package management and installation processes.

OCaml Developments under Windows

Significant work was undertaken to bring containerization technologies to OCaml development on Windows. This included implementing a tool to create host compute networks via the Windows API, tackling limitations with NTFS hard links, and implementing a copy-on-write reflink tool for Windows.

OxCaml Support

Support for the new OxCaml compiler variant included establishing an opam repository and testing which existing OCaml packages successfully built with the new compiler.

ZFS Storage and Hardware Deployment

Early in the quarter, a hardware deployment project centred around Dell PowerEdge R640 servers with large-scale SSD storage was undertaken. The project involved deploying multiple batches of Kingston 7.68TB SSD drives and creating automated deployments for Ubuntu using network booting with EFI and cloud-init configuration. ZFS was trialled as a root filesystem, which was possible but ultimately discarded, and dm-cache was explored for SSD acceleration of spinning disk arrays. Using ZFS as a distributed storage archive system was also investigated, with an Ansible-based deployment strategy based upon a YAML description.

Talos II Repairs

Significant hardware reliability issues affected two Raptor Computing Talos II POWER9 machines. The first system experienced complete lockups after as little as 20 minutes of operation, while the second began exhibiting similar problems requiring daily power cycling. Working with Raptor Computing support to isolate the issues, upgrading the firmware and eventually swapping CPUs between the systems resolved the issue.
Concurrently, this provided an opportunity to analyse the performance of OBuilder operations on POWER9 systems, comparing OverlayFS on tmpfs versus Btrfs on NVMe storage, resulting in optimised build performance.

EEG Systems Investigations

Various software solutions and research platforms were explored as part of a broader system evaluation. This included investigating the Slurm Workload Manager for compute resource scheduling, examining Gluster distributed filesystem capabilities, and implementing Otter Wiki with Raven authentication integration for collaborative documentation. Research extended to modern research data management platforms, exploring InvenioRDM for scientific data archival and BON in a Box for biodiversity analysis workflows. To support the Teserra workshop, a multi-user Jupyter environment was set up using Docker containerization.

Miscellaneous Technical Explorations

Diverse technical explorations included implementing a Bluesky Personal Data Server and developing innovative SSH authentication mechanisms using the ATProto network by extracting SSH public keys from Bluesky profiles. Additional projects included developing OCaml-based API tools for Box cloud storage, creating Real Time Trains API integrations, exploring various file synchronisation and backup solutions, and investigating reflink copy mechanisms for efficient file operations using OCaml Multicore.
September 1, 2025 at 10:29 PM
Reflink Copy
I hadn't intended to write another post about traversing a directory structure, or even to think about it again, but weirdly, it just kept coming up! Firstly, Patrick mentioned Eio.Path.read_dir and Anil mentioned bfs. Then Becky commented about XFS reflink performance, and I commented that the single-threaded nature of `cp -r --reflink=always` was probably hurting our obuilder performance tests.

Obuilder is written with Lwt, which has Lwt_unix.readdir. What if we had a pool of threads that would traverse the directory structure in parallel and create a reflinked copy?

Creating a reflink couldn't be easier: there's an ioctl call that just does it. Such a contrast to the ReFS copy-on-write implementation on Windows! (The header names below are my reconstruction; the original includes were lost in formatting.)

```c
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/unixsupport.h>

#ifndef FICLONE
#define FICLONE 0x40049409
#endif

value caml_ioctl_ficlone(value dst_fd, value src_fd) {
  CAMLparam2(dst_fd, src_fd);
  int result;
  result = ioctl(Int_val(dst_fd), FICLONE, Int_val(src_fd));
  if (result == -1) {
    uerror("ioctl_ficlone", Nothing);
  }
  CAMLreturn(Val_int(result));
}
```

We can write a reflink copy function as shown below. (Excuse my error handling.) Two interesting points to note: the permissions set via Unix.openfile are filtered through the umask, and you need to call Unix.fchown before Unix.fchmod if you want the suid bit to be set.

```ocaml
external ioctl_ficlone : Unix.file_descr -> Unix.file_descr -> int
  = "caml_ioctl_ficlone"

let copy_file src dst stat =
  let src_fd = Unix.openfile src [ O_RDONLY ] 0 in
  let dst_fd = Unix.openfile dst [ O_WRONLY; O_CREAT; O_TRUNC ] 0o600 in
  let _ = ioctl_ficlone dst_fd src_fd in
  Unix.fchown dst_fd stat.st_uid stat.st_gid;
  Unix.fchmod dst_fd stat.st_perm;
  Unix.close src_fd;
  Unix.close dst_fd
```

My Lwt code created a list of all the files in a directory and then processed the list with Lwt_list.map_s (serially), returning promises for all the file operations and creating threads for new directory operations up to a defined maximum (8).
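To link a C stub like this into an OCaml executable with dune, a stanza along these lines would do it. This is a sketch rather than the project's actual build file, and the `name`/`names` values are assumptions on my part:

```
(executable
 (name reflink)
 (libraries unix)
 (foreign_stubs
  (language c)
  (names ioctl_ficlone_stubs)))
```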
If there was no thread capacity, it just recursed in the current thread. Copying a root filesystem, this gave me threads for var, usr, etc., just as we'd want.

Wow! This was slow. Nearly 4 minutes to reflink 1.7GB! What about using the threads library rather than Lwt threads? This appeared significantly better, bringing the execution time down to 40 seconds. However, I think a lot of that was down to my (bad) Lwt implementation versus my somewhat better threads implementation.

At this point, I should probably note that `cp -r --reflink=always` on 1.7GB and 116,000 files takes 8.5 seconds on my machine using a loopback XFS. A sequential OCaml version, without the overhead of threads or any need to maintain a list of work to do, takes 9.0 seconds.

Giving up and getting on with other things was very tempting, but there was that nagging feeling of not having bottomed out the problem. Using OCaml Multicore, we can write a true multi-threaded version. I took a slightly different approach, having a work queue of directories to process and N worker threads taking work from the queue.

```
Main Process: Starts with root directory
        ↓
WorkQueue: [process_dir(/root)]
        ↓
Domain 1: Takes work → processes files → adds subdirs to queue
Domain 2: Takes work → processes files → adds subdirs to queue
Domain 3: Takes work → processes files → adds subdirs to queue
        ↓
WorkQueue: [process_dir(/root/usr), process_dir(/root/var), ...]
```

Below is a table showing the performance when using multiple domains compared to the baseline operation of cp and a sequential copy in OCaml.

| Copy command | Duration (sec) |
| --- | --- |
| `cp -r --reflink=always` | 8.49 |
| Sequential | 8.80 |
| 2 domains | 5.45 |
| 4 domains | 3.28 |
| 6 domains | 3.43 |
| 8 domains | 5.24 |
| 10 domains | 9.07 |

The code is available on GitHub in mtelvers/reflink.
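The work-queue scheme in the diagram can be expressed with a mutex-and-condition-variable queue shared between domains. This is my own minimal sketch of the idea, not the code from mtelvers/reflink; `process_dir` is assumed to copy one directory's files and return the list of its subdirectories:

```ocaml
(* N domains pull directories from a shared queue; a pending counter
   tracks directories queued or in flight so workers know when to stop. *)
let run ~domains ~process_dir root =
  let queue = Queue.create () in
  let mutex = Mutex.create () in
  let cond = Condition.create () in
  let pending = ref 1 in
  Queue.push root queue;
  let worker () =
    let rec loop () =
      Mutex.lock mutex;
      while Queue.is_empty queue && !pending > 0 do
        Condition.wait cond mutex
      done;
      if !pending = 0 then begin
        (* All directories processed: wake any other waiters and exit. *)
        Mutex.unlock mutex;
        Condition.broadcast cond
      end else begin
        let dir = Queue.pop queue in
        Mutex.unlock mutex;
        let subdirs = process_dir dir in
        Mutex.lock mutex;
        List.iter (fun d -> incr pending; Queue.push d queue) subdirs;
        decr pending;
        Condition.broadcast cond;
        Mutex.unlock mutex;
        loop ()
      end
    in
    loop ()
  in
  let ds = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join ds
```

As the table shows, throughput improves up to around 4 domains and then degrades, presumably as the domains contend on the lock and on the filesystem itself.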
September 1, 2025 at 6:34 PM