OCaml
@ocaml.org
1.2K followers 15 following 400 posts
https://ocaml.org
Reposted by OCaml
sabine.sh
Hey everyone, we're down to 7 PRs.

In particular, that means all of the cookbook PRs waiting for review were addressed and we're now ready to take more cookbook contributions! 🧡🐫

github.com/ocaml/ocaml....
github.com
ocaml.org
Backstage OCaml: You Can Try the Experimental Branch of Merlin That Uses Domains and Effects
The Merlin team is excited to share that you can now try out an experimental branch of Merlin that leverages OCaml 5's domains and effects! This is Merlin-domains, and we'd love for you to test it and share your feedback.

What is Merlin-domains?

Merlin-domains is an experimental branch that uses domains and effects to implement two optimisations to improve performance in large buffers: partial typing and cancellation. As a reminder, Merlin is the editor service that powers OCaml's IDE features; if you're using the OCaml Platform extension with VS Code or ocaml-eglot with Emacs, you're already using Merlin under the hood through OCaml LSP Server.

Why This Matters

While Merlin has had relatively few performance complaints over the years, in some contexts like very large files, the parsing-typing-analysis mechanism could sometimes cause slowdowns. The experimental branch addresses this in a clever way. When you run an analysis command on a very large file, the type-checker will progress up to the location that makes the analysis possible, run the analysis phase, return the result, and then continue typing the file. This separation is made possible through control flow management enabled by effects, with two domains interacting with each other. The result? Analysis phases become much more efficient! This is a great example of migrating a regular OCaml application to take advantage of multicore.

Learn More at Lambda World

Want to understand the technical details? Sonja Heinze and Carine Morel will present their talk "When magic meets multicore - OCaml and its elegant era of parallelism" at Lambda World, where they'll dive into how this experimental branch works internally.

How to Test It

Currently, the branch is in its incubation phase. To test it, pin the branch in the switches where you want to experiment:

    opam pin add https://github.com/ocaml/merlin#merlin-domains

Although this experimental branch passes the test suite, your feedback is very important to help collect potential bugs we may have missed. The team has added a Bug/Merlin-domains label to organize tickets related to this branch.

What's Next

The goal is for this branch to eventually become the main branch, so that all users can benefit from these improvements. The rest of the ecosystem depending on Merlin, including OCaml LSP Server, will be adapted to take full advantage of these new features.

We need you! Try out merlin-domains with your real-world OCaml projects and share your experience on the Discuss thread. Your testing and feedback will help shape the future of Merlin!
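The "type up to the location, analyse, then keep typing" idea can be illustrated with a toy effect handler. This is only a sketch under loud assumptions: `Typed_up_to`, `type_buffer` and `type_with_early_analysis` are hypothetical names, not Merlin's API, and everything runs in a single domain, whereas the post describes two domains cooperating.

```ocaml
(* Hypothetical sketch: none of these names come from Merlin. *)
open Effect
open Effect.Deep

(* Performed by the pretend type-checker after each top-level item is typed. *)
type _ Effect.t += Typed_up_to : int -> unit Effect.t

(* Pretend type-checker: walks the items in order, announcing its progress,
   and returns the number of items it typed. *)
let type_buffer items =
  List.iteri (fun i _item -> perform (Typed_up_to i)) items;
  List.length items

(* Run [analysis] as soon as item [target] has been typed, then let typing
   carry on to the end of the buffer. *)
let type_with_early_analysis ~target ~analysis items =
  let result = ref None in
  let typed =
    try_with type_buffer items
      { effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Typed_up_to i ->
            Some (fun (k : (a, _) continuation) ->
              if i = target && Option.is_none !result then
                result := Some (analysis i);
              (* Resume typing exactly where it was suspended. *)
              continue k ())
          | _ -> None) }
  in
  (!result, typed)
```

In this sketch the analysis result is available before typing finishes, which is the property the post attributes to partial typing; cancellation is not modelled.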
dlvr.it
ocaml.org
A second foray into agentic coding
Continuing the previous theme of dabbling with matters agentic. Previously, I’d quite assiduously kept my fingers away from files. This time, I wanted to try something exploratory, switching to the agent for things I was actively stuck on. I was still (very) curious at the latent remaining bug in Lucas’s excellent work. There were some corners which had been cut in the prototype, and I had a brief foray into this problem, with a view this time to ensuring artefact equivalence between what OCaml’s build system would produce and what our altered driver program was doing. If you have a pre-built compiler and a clean (of binary artefacts) OCaml source tree, you can actually build the bytecode compiler in just three, ahem, short commands (I’m intentionally glossing over all the generated source files): $ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats -linkall -a -o compilerlibs/ocamlcommon.cma utils/config.mli utils/build_path_prefix_map.mli utils/format_doc.mli utils/misc.mli utils/identifiable.mli utils/numbers.mli utils/arg_helper.mli utils/local_store.mli utils/load_path.mli utils/profile.mli utils/clflags.mli utils/terminfo.mli utils/ccomp.mli utils/warnings.mli utils/consistbl.mli utils/linkdeps.mli utils/strongly_connected_components.mli utils/targetint.mli utils/int_replace_polymorphic_compare.mli utils/domainstate.mli utils/binutils.mli utils/lazy_backtrack.mli utils/diffing.mli utils/diffing_with_keys.mli utils/compression.mli parsing/location.mli parsing/unit_info.mli parsing/asttypes.mli parsing/longident.mli parsing/parsetree.mli parsing/docstrings.mli parsing/syntaxerr.mli parsing/ast_helper.mli parsing/ast_iterator.mli parsing/builtin_attributes.mli parsing/camlinternalMenhirLib.mli parsing/parser.mli 
parsing/pprintast.mli parsing/parse.mli parsing/printast.mli parsing/ast_mapper.mli parsing/attr_helper.mli parsing/ast_invariants.mli parsing/depend.mli typing/annot.mli typing/value_rec_types.mli typing/ident.mli typing/path.mli typing/type_immediacy.mli typing/outcometree.mli typing/primitive.mli typing/shape.mli typing/types.mli typing/data_types.mli typing/rawprinttyp.mli typing/gprinttyp.mli typing/btype.mli typing/oprint.mli typing/subst.mli typing/predef.mli typing/datarepr.mli file_formats/cmi_format.mli typing/persistent_env.mli typing/env.mli typing/errortrace.mli typing/typedtree.mli typing/signature_group.mli typing/printtyped.mli typing/ctype.mli typing/out_type.mli typing/printtyp.mli typing/errortrace_report.mli typing/includeclass.mli typing/mtype.mli typing/envaux.mli typing/includecore.mli typing/tast_iterator.mli typing/tast_mapper.mli typing/stypes.mli typing/shape_reduce.mli file_formats/cmt_format.mli typing/cmt2annot.mli typing/untypeast.mli typing/includemod.mli typing/includemod_errorprinter.mli typing/typetexp.mli typing/printpat.mli typing/patterns.mli typing/parmatch.mli typing/typedecl_properties.mli typing/typedecl_variance.mli typing/typedecl_unboxed.mli typing/typedecl_immediacy.mli typing/typedecl_separability.mli lambda/debuginfo.mli lambda/lambda.mli typing/typeopt.mli typing/typedecl.mli typing/value_rec_check.mli typing/typecore.mli typing/typeclass.mli typing/typemod.mli lambda/printlambda.mli lambda/switch.mli lambda/matching.mli lambda/value_rec_compiler.mli lambda/translobj.mli lambda/translattribute.mli lambda/translprim.mli lambda/translcore.mli lambda/translclass.mli lambda/translmod.mli lambda/tmc.mli lambda/simplif.mli lambda/runtimedef.mli file_formats/cmo_format.mli middle_end/internal_variable_names.mli middle_end/linkage_name.mli middle_end/compilation_unit.mli middle_end/variable.mli middle_end/flambda/base_types/closure_element.mli middle_end/flambda/base_types/var_within_closure.mli 
middle_end/flambda/base_types/tag.mli middle_end/symbol.mli middle_end/flambda/base_types/set_of_closures_id.mli middle_end/flambda/base_types/set_of_closures_origin.mli middle_end/flambda/parameter.mli middle_end/flambda/base_types/static_exception.mli middle_end/flambda/base_types/mutable_variable.mli middle_end/flambda/base_types/closure_id.mli middle_end/flambda/projection.mli middle_end/flambda/base_types/closure_origin.mli middle_end/clambda_primitives.mli middle_end/flambda/allocated_const.mli middle_end/flambda/flambda.mli middle_end/flambda/freshening.mli middle_end/flambda/base_types/export_id.mli middle_end/flambda/simple_value_approx.mli middle_end/flambda/export_info.mli middle_end/backend_var.mli middle_end/clambda.mli file_formats/cmx_format.mli file_formats/cmxs_format.mli bytecomp/instruct.mli bytecomp/meta.mli bytecomp/opcodes.mli bytecomp/bytesections.mli bytecomp/dll.mli bytecomp/symtable.mli driver/pparse.mli driver/compenv.mli driver/main_args.mli driver/compmisc.mli driver/makedepend.mli driver/compile_common.mli utils/config.ml utils/build_path_prefix_map.ml utils/format_doc.ml utils/misc.ml utils/identifiable.ml utils/numbers.ml utils/arg_helper.ml utils/local_store.ml utils/load_path.ml utils/clflags.ml utils/profile.ml utils/terminfo.ml utils/ccomp.ml utils/warnings.ml utils/consistbl.ml utils/linkdeps.ml utils/strongly_connected_components.ml utils/targetint.ml utils/int_replace_polymorphic_compare.ml utils/domainstate.ml utils/binutils.ml utils/lazy_backtrack.ml utils/diffing.ml utils/diffing_with_keys.ml utils/compression.ml parsing/location.ml parsing/unit_info.ml parsing/asttypes.ml parsing/longident.ml parsing/docstrings.ml parsing/syntaxerr.ml parsing/ast_helper.ml parsing/ast_iterator.ml parsing/builtin_attributes.ml parsing/camlinternalMenhirLib.ml parsing/parser.ml parsing/lexer.mli parsing/lexer.ml parsing/pprintast.ml parsing/parse.ml parsing/printast.ml parsing/ast_mapper.ml parsing/attr_helper.ml parsing/ast_invariants.ml 
parsing/depend.ml typing/ident.ml typing/path.ml typing/primitive.ml typing/type_immediacy.ml typing/shape.ml typing/types.ml typing/data_types.ml typing/rawprinttyp.ml typing/gprinttyp.ml typing/btype.ml typing/oprint.ml typing/subst.ml typing/predef.ml typing/datarepr.ml file_formats/cmi_format.ml typing/persistent_env.ml typing/env.ml typing/errortrace.ml typing/typedtree.ml typing/signature_group.ml typing/printtyped.ml typing/ctype.ml typing/out_type.ml typing/printtyp.ml typing/errortrace_report.ml typing/includeclass.ml typing/mtype.ml typing/envaux.ml typing/includecore.ml typing/tast_iterator.ml typing/tast_mapper.ml typing/stypes.ml typing/shape_reduce.ml file_formats/cmt_format.ml typing/cmt2annot.ml typing/untypeast.ml typing/includemod.ml typing/includemod_errorprinter.ml typing/typetexp.ml typing/printpat.ml typing/patterns.ml typing/parmatch.ml typing/typedecl_properties.ml typing/typedecl_variance.ml typing/typedecl_unboxed.ml typing/typedecl_immediacy.ml typing/typedecl_separability.ml typing/typeopt.ml typing/typedecl.ml typing/value_rec_check.ml typing/typecore.ml typing/typeclass.ml typing/typemod.ml lambda/debuginfo.ml lambda/lambda.ml lambda/printlambda.ml lambda/switch.ml lambda/matching.ml lambda/value_rec_compiler.ml lambda/translobj.ml lambda/translattribute.ml lambda/translprim.ml lambda/translcore.ml lambda/translclass.ml lambda/translmod.ml lambda/tmc.ml lambda/simplif.ml lambda/runtimedef.ml bytecomp/meta.ml bytecomp/opcodes.ml bytecomp/bytesections.ml bytecomp/dll.ml bytecomp/symtable.ml driver/pparse.ml driver/compenv.ml driver/main_args.ml driver/compmisc.ml driver/makedepend.ml driver/compile_common.ml $ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats -a -o compilerlibs/ocamlbytecomp.cma 
bytecomp/bytegen.mli bytecomp/printinstr.mli bytecomp/emitcode.mli bytecomp/bytelink.mli bytecomp/bytelibrarian.mli bytecomp/bytepackager.mli driver/errors.mli driver/compile.mli driver/maindriver.mli bytecomp/instruct.ml bytecomp/bytegen.ml bytecomp/printinstr.ml bytecomp/emitcode.ml bytecomp/bytelink.ml bytecomp/bytelibrarian.ml bytecomp/bytepackager.ml driver/errors.ml driver/compile.ml driver/maindriver.ml
$ ocamlc -I utils -I parsing -I typing -I bytecomp -I file_formats -I lambda -I middle_end -I middle_end/closure -I middle_end/flambda -I middle_end/flambda/base_types -I driver -I runtime -g -compat-32 -o ocamlc -strict-sequence -principal -absname -w +a-4-9-40-41-42-44-45-48 -warn-error +a -bin-annot -strict-formats compilerlibs/ocamlcommon.cma compilerlibs/ocamlbytecomp.cma driver/main.mli driver/main.ml

I wanted to try a different angle on the Load_path, and this time produced a function which predicts the files in the tree. The rules for this were pretty easy for me to define, and I wasn’t sure I could face watching Claude special-case everything. 130 lines of verifiably correct hacked OCaml later, I had my load path function. A little bit more code later, those three commands above were translated into an OCaml script (based on the ocamlcommon and ocamlbytecomp libraries) which should perform exactly the same build.

It ran - and it built the compiler. ocamlc was, pleasingly, exactly the same. The .cma files, however, were not. For ocamlcommon.cma, that turned out to be me being sloppy with my commands. ocamlcommon.cma is linked with -linkall, but ocamlc -a foo.cma -linkall bar.cmo is not the same as ocamlc -a foo.cma -linkall bar.ml, because -linkall gets recorded in the .cmo file as well. Easy fix - but the files were still different. A bit more tweaking and I could see that actually the .cmo files were different.
A bit more poking and checking with ocamlobjinfo and a few other flags and tricks, and I observed that:

$ ocamlc -g -c utils/config.ml

resulted in slightly different debug information from:

$ ocamlc -g -c utils/config.mli utils/config.ml

(it’s observably to do with the debug information - omit the -g and they’re all identical). Lots to suspect here, but time for…

$ claude
╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code! │

The problem was easy to state, but not quite so quick to come up with a conclusive explanation. Claude, like most of these models, appears not to have been trained on this old cartoon, and very merrily buzzes along for a few rounds of investigation, followed by a highly dubious explanation for how it was probably something to do with marshalling and, mumble mumble, the final binaries are the same so this bug is probably OK. Hmm. A few rounds of, “no, this needs to be equivalent as otherwise it’s not reproducible” (“You’re so right!”), and we had a lot of test programs, a frequent need for reminders that debugging OCaml’s Marshalling format was possibly not going to help, but we weren’t very much closer to an answer.

Stepping back, I re-framed the problem, instead asking Claude to produce a program which would give a textual dump of the debug information in each file, so we could compare it. This was interesting - especially the occasional hallucinations at having analysed “all the fields”, but we got there. What was interesting was that we were struggling to perceive differences between anything. Claude at this point was desperate to delve into the runtime code and start doing hex-dumps of the marshal format to see what was actually different. I appear to be a little older than Claude, and was more reticent about this approach. I suggested we look at the polymorphic hash of some of these fields instead.
At this point, we started to see some differences - Claude’s inferences at this point were working well, and there was a strong suggestion to add all sorts of accessor functions into the Types module to be able to introspect some of the values in more detail than normally intended (i.e. polymorphic hash was telling us that some abstract values were different, but we wanted to see what the differences really were). Reader, I told it to use Obj.magic instead 🫣

However, what happened next was truly fascinating and definitely very efficient. The value being returned for one of the type IDs was simply not believable. It was far too high. Claude also correctly observed that it was in fact a block, and not an integer, which was what we were expecting. The human brain at this point cuts in, and looks at the type: Types.get_id: t -> int. No, that accessor looks right. Brain slowly whirring; look at the code: let get_id t = (repr t).id Oh - it’s not an accessor (in another life, I could possibly have performed Claude’s responses…). All I had to point out was that Types.get_id was not an accessor, it was normalising the result (to walk Tlink members of the type representation), and Claude was on it, replacing semi-elegant OCaml code with a sea of calls to Obj functions. But we had our answer - the type chain was different, if semantically equivalent and, more importantly, Claude then leaped to the problem. The internal Types.new_id reference isn’t reset between compilations 💥

A quick rebuild later, and the same debug information was given regardless of whether utils/config.mli was compiled at the same time as utils/config.ml. Go Claude. My contribution was keeping the explorations looking at relevant parts of the system, and not disappearing off on sometimes ridiculous and unbelievable tangents. Maybe it would have got there on its own, but who knows the tokens required and the GPUs scorched… Plug that back into my little script. ocamlcommon.cma still different.
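To make the two observations above concrete, here is a toy model, not the compiler's real data structures. The names `ty`, `desc`, `newty`, `compile_unit` and `reset_ids` are all made up for illustration; only `repr`, `get_id`, `new_id` and `Tlink` echo names mentioned in the post, and their definitions here are simplified assumptions.

```ocaml
(* Toy model only: the real definitions live in the compiler's Types module. *)
type ty = { mutable desc : desc; id : int }
and desc =
  | Tvar
  | Tlink of ty  (* "this node has been unified with that one" *)

(* A global counter, standing in for the state behind Types.new_id. *)
let next_id = ref 0
let new_id () = incr next_id; !next_id
let reset_ids () = next_id := 0  (* hypothetical reset hook *)

let newty desc = { desc; id = new_id () }

(* Follow Tlink nodes to the canonical representative. *)
let rec repr t = match t.desc with Tlink t' -> repr t' | Tvar -> t

(* Reads like a field accessor, but normalises before reading the field. *)
let get_id t = (repr t).id

(* One pretend "compilation": a variable and a second node linked to it. *)
let compile_unit () =
  let a = newty Tvar in
  let b = newty (Tlink a) in
  (a, b)
```

In this model `get_id b` and `b.id` disagree, because `get_id` walks the link to `a` first. Marshalling the result of `compile_unit ()` also gives different bytes on a second call unless `reset_ids ()` is called in between, which is the same shape of problem as identifiers leaking into debug information.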
At this point, a line from Four Weddings and a Funeral could be heard loud and clear in the human mind. It’s the one which follows “Dear Lord, forgive me for what I am about to, ah, say in this magnificent place of worship…”. The fix was definitely working. But a quick bit of further experimentation revealed that including other .mli files before utils/config.ml (and there are a lot) was causing the information to change. So: $ claude -c As a human of hopefully normal emotional response to situations, the feeling of being back at square one would normally have meant I’d have at least needed a coffee before being able to face dusting off all the tools and scripts which had been constructed in the previous investigations. But here of course the LLM doesn’t care and was straight into using the tools previously constructed to look at the revised problem. A lot more Obj.magic-like investigations later looking at the shape of some debugging information, and Claude found another bit to reset, this time in Ctype. All the level information in the type-checker isn’t reset between compilations. Not a semantic issue, because the type checker uses those numbers relatively, but again they leak into the representation of some of the debugging information. And it was working 🥳 Next up was trying to put those fixes into something resembling a commit series that might one day be an acceptable PR. What I really wanted was a test. Claude was great for this, although it lacks anything approximating taste (and this is me writing…!). However, with no feelings to be hurt, the pointers were easy to issue and the results impressive - especially constructing a non-trivial ocamltest block. The result is previewable in dra27/ocaml#237 on my GitHub fork, and the test is entirely Claude’s. Having got to this stage, I extended the compiler with some of Lucas’s patches, and started passing just the .ml files for compilation, allowing the compiler to compile the .mli files on demand, as before. 
With some idle tinkering, I got to the end of “coreall”, which is the point in OCaml’s build process where ocamlc, the bytecode versions of everything in tools/ and ocamllex have all been compiled, along with the Standard Library. That was all being done from a single compiler process, where the OCaml script driving the compiler consisted mostly of the list of .ml files. Coupled with the predictive load path I’d already put together, at this stage the “plumbing” needed in the scheduler is just:

    let compile_file source_file () =
      Compenv.readenv Format.std_formatter (Before_compile source_file);
      let output_prefix = Compenv.output_prefix source_file in
      if Filename.extension source_file = ".mli" then
        Compile.interface ~source_file ~output_prefix
      else
        let start_from = Clflags.Compiler_pass.Parsing in
        Compile.implementation ~start_from ~source_file ~output_prefix

    let rec execute task =
      try task ()
      with effect (Load_path.Missing path), k ->
        let file = Filename.chop_extension path ^ ".mli" in
        execute (compile_file file);
        execute (Effect.Deep.continue k)

(as an aside, when it goes to being done with Domains I’ll possibly switch it to a shallow handler, because the call stack with the deep handlers isn’t as reasonable as I’d hoped for, but to be honest I just wanted to see it work!)

Fascinatingly, all the artefacts (.cma and binaries) being produced were identical except for the Lazy module in the Standard Library!

$ claude -c

Claude was simultaneously amazing and useless at this. Amazing, because I was prompting some of this while cooking a meal, so being able to bark an instruction (actually, I hadn’t set it up for voice - I was just quickly typing) and then leave it to think for a minute or two was strangely efficient, because investigating this on my own would have taken too much continuous concentration. It was useless because we didn’t get anywhere near a believable explanation, despite various efforts at resetting things.
Sometimes you just have to say /exit (and eat a meal…). However, after the aforementioned meal, I dug into it a bit further. The issue here was clearly to do with some state in the compiler - if ocamlcommon.cma or ocamlmiddleend.cma were compiled, then the Lazy module differed. Incidentally, at this point this wasn’t debug information which varied, it was the actual module, but it was still semantically the same. Claude had correctly identified that it was to do with the marshalling, and we had identified that there was a difference in string sharing (so not entirely useless, in fairness). I carried on poking and, with a little bit of jerry-rigging, managed to determine the relatively small set of files in flambda and in ocamlcommon whose compilation caused the change in Lazy. I was highly suspicious it was to do with compilation of lazy values.

$ claude -c

Feeding this information to Claude was a much better trick - the reasoning at this point would contradict its own tangents (“I should look at … but wait, the user has given me the list of affected files”). Impressively, we did home in on the much more complex explanation for this third issue, which is to do with lazy values used in globals in the Matching module. In this particular case, if the compiler has compiled a file which matched on a lazy, causing Matching.code_force_lazy_block to be forced in the compiler and thus the CamlinternalLazy identifier to be added to the current persistent environment, then a subsequent module (in this case lazy.ml in the Standard Library) which both pattern matches on a lazy and which also refers to CamlinternalLazy ends up with two extern’d string representations of CamlinternalLazy instead of one. The reason is that the forced code block in Matching still refers to a string used in a previous persistent environment. It’s not a semantic issue at all, but it manifests itself because the string is not shared when the subsequent file looks up the CamlinternalLazy identifier.
It was a battle to update the test to show this behaviour, but in fairness that would have been a battle anyway! However, we got there too. Three reproducibility issues identified, and a viable PR produced - with tests!
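The string-sharing effect behind the third issue can be seen with the standard Marshal module alone. This is a small illustration of the general mechanism, not a reproduction of the compiler bug; `fresh`, `name`, `copy`, `shared` and `unshared` are names invented for this sketch.

```ocaml
(* Build the string at runtime so each call yields a distinct heap block. *)
let fresh () = Bytes.to_string (Bytes.of_string "CamlinternalLazy")

let name = fresh ()
let copy = fresh ()  (* equal contents, different block *)

(* Marshal preserves sharing by physical identity: the second occurrence of
   [name] is written as a back-reference, whereas [copy] is written in full. *)
let shared = Marshal.to_string (name, name) []
let unshared = Marshal.to_string (name, copy) []

(* Both byte strings decode to structurally equal values. *)
let decoded_shared : string * string = Marshal.from_string shared 0
let decoded_unshared : string * string = Marshal.from_string unshared 0
```

Here `shared` and `unshared` are different byte strings of different lengths, yet `decoded_shared = decoded_unshared` holds, which matches the post's point that the difference is not semantic but still breaks byte-for-byte reproducibility.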
dlvr.it
ocaml.org
OCaml @ocaml.org · 10d
Retrofitting a build system into a compiler
Over the summer, Lucas Ma has been investigating ideas surrounding using effects in the OCaml compiler itself. He’s blogged some of his discoveries and adventures. The technical core of this work leads towards being able to use the OCaml compiler as a library on-demand to create a longer-lived “compiler service”. Of itself, that’s not at all revolutionary, but it is quite hard to do that with a 30 year old codebase that really was designed for single-shot separate compilation. Lucas got to grips pretty swiftly with OCaml’s build system, and initially looked at generalising a core internal part of the compiler called the Load_path. This is used by the compiler for scanning the various “include” directories for files, principally typing information. For example, if your code contains a call to Unix.stat, then the type checker needs the typing information for a module called Unix which will cause it to request unix.cmi from the Load_path and which will then hopefully resolve that to, say, ~/.opam/switch/lib/ocaml/unix/unix.cmi. Effects provide an elegant way of inverting the control for this lookup, as the program calling the compiler can then change the way these files are looked up. It also provides the opportunity to “lie” to the compiler about the files which are actually present, and this was the first thing Lucas started to do with this change. In particular, it allows us to ignore the dependency graph. When compiling a module, OCaml requires all the type information that a module refers to have been compiled beforehand. If you have a module in bar.ml with interface in bar.mli and where the code refers to Foo.value, then OCaml requires foo.mli and bar.mli both to have been compiled before bar.ml is compiled. However, thanks to this effectful trick, Lucas could instead allow the compiler to start with just bar.ml. 
When Foo.value is encountered, there’s a request made for foo.cmi, at which point, in the first prototype, the compiler then quickly spawned another instance of itself to compile foo.mli and then resumed compilation for bar.ml, with the same trick then happening at the end of the compilation with bar.cmi. i.e. three files (foo.mli, bar.mli and bar.ml) all compiled just from ocamlc -c bar.ml. Possibly neat for being able to remove monstrosities like this from OCaml’s source tree one day, but so far not so exciting.

However, effects give us more than just hooks into the compiler’s operations. We’ve got an entire suspended compilation packaged up in a continuation… which means that that same compiler “process” can now do something else. The next trick was to have it that instead of spawning a new compiler, the current process itself returned back into the compiler and itself compiled the required interface file and then simply resumed the continuation of the previous file.

At this point, the 30-year-old codebase rears its head again. For reasons of speed and space, many parts of the compiler, especially in the type checker, feature a lot of global mutable state. In particular, the compilation pipeline is not re-entrant. Luckily, thanks to the Merlin project, there is a mechanism in the type-checker for taking snapshots of all this global state. Lucas was able to piggy-back on this so that, just before the compiler performs an effect to request a .cmi file (that doesn’t yet exist), it snapshots all its global state, performs the effect and then, when resumed, restores that state again.
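The snapshot, perform, restore sequence described above can be sketched in a few lines. This is a toy under stated assumptions: `Need_interface`, `current_unit`, `compiled`, `log`, `compile` and `execute` are invented stand-ins, a single string reference plays the role of the compiler's global state, and unit names replace real .mli and .cmi files.

```ocaml
(* Hypothetical sketch: a one-variable stand-in for the compiler's state. *)
open Effect
open Effect.Deep

(* Requested when a unit refers to an interface that is not compiled yet. *)
type _ Effect.t += Need_interface : string -> unit Effect.t

let current_unit = ref ""                      (* "global mutable state" *)
let compiled : (string, unit) Hashtbl.t = Hashtbl.create 16
let log = ref []                               (* completion order, newest first *)

(* "Compile" [unit_name]; [deps] maps a unit to the units it refers to. *)
let compile deps unit_name =
  current_unit := unit_name;
  List.iter
    (fun dep ->
      if not (Hashtbl.mem compiled dep) then begin
        let snapshot = !current_unit in  (* save the global state *)
        perform (Need_interface dep);    (* the handler re-enters [compile] *)
        current_unit := snapshot         (* restore it on resumption *)
      end)
    (Option.value (List.assoc_opt unit_name deps) ~default:[]);
  Hashtbl.replace compiled !current_unit ();
  log := !current_unit :: !log

(* Scheduler: compile the missing dependency, then resume the suspended unit. *)
let rec execute deps task =
  try_with task ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Need_interface dep ->
          Some (fun (k : (a, _) continuation) ->
            execute deps (fun () -> compile deps dep);
            continue k ())
        | _ -> None) }
```

Running `execute deps (fun () -> compile deps "bar")` with `deps = [("bar", ["foo"])]` finishes "foo" first and then "bar". If the restore line is removed from this sketch, the nested compilation leaves `current_unit` set to "foo" and the outer unit is recorded under the wrong name, which is the re-entrancy hazard the snapshot guards against.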
Using this to interrupt type-checking and start on something else isn’t quite what this Local_store mechanism was originally intended for, and there was a bit of debugging to find a few more pieces of global state which weren’t being “registered”, but Lucas was able to get a means of building the OCaml bytecode compiler with nothing pre-compiled where all the compiler had to be given was the list of .ml files required. From a toolchain perspective, we’re essentially retiring ocamldep.

So far, still mostly just so neat: one single compiler process (just about) successfully recompiling the compiler. However, that’s equivalent to compiling with make -j1 - a sequential, and therefore slow, build. The awesome part came next - Domains. In the final version Lucas was working on, multiple domains were started up, each one beginning compilation of one of the .ml files required for the compiler in parallel, with a scheduler handling effects coming from each of these in turn when .mli files needed compiling, and despatching those. The Local_store mechanism came in handy here - Lucas extended it to use Domain Local Storage, combined with the snapshotting. The prototype - for simplicity - featured no sharing between these domains. By the end of the summer, this was very nearly working, which is a result considerably further than I’d expected in the time available!

As is so often the case with these investigations, Lucas’s work had revealed some new facets to this area that weren’t clear to me before. I had previously been wondering how we would be exposing this kind of multi-threaded compiler to the user via the driver programs, but it became increasingly clear that this wasn’t something that would be necessary - the program that we were working on to build the compiler itself was of course not the compiler driver, but a build system. To me, there are two particularly exciting things about that:

* It’s a really simple build system. Hopefully when the last few kinks in the parallel type checker are ironed out (read on…), we may be able to add that it’s really simple and performant.
* It’s fundamentally portable. It leads to the possibility of bootstrapping OCaml trivially with itself. This has been done before with ocamlbuild, but the result was a maintenance disaster. However, the sheer simplicity of the multi-domain effect-scheduling approach is making this perennial build system hacker tinker…
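As a rough picture of "one domain per batch of files, no sharing", the standard `Domain` and `Domain.DLS` modules are enough. This is a minimal sketch, assuming a counter as a stand-in for per-domain compiler state; `counter_key`, `compile_all` and `build` are hypothetical names and no real compilation or effect scheduling happens.

```ocaml
(* Each domain lazily gets its own copy of the "global" state. *)
let counter_key : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

(* Pretend to compile [files], counting them in this domain's private state. *)
let compile_all files =
  let counter = Domain.DLS.get counter_key in
  List.iter (fun _file -> incr counter) files;
  !counter

(* Start one domain per batch and collect how many files each one handled. *)
let build batches =
  batches
  |> List.map (fun files -> Domain.spawn (fun () -> compile_all files))
  |> List.map Domain.join
```

Because the key is created without a `split_from_parent` function, every spawned domain starts from a fresh counter and the main domain's counter is untouched, mirroring the prototype's choice of no sharing between domains.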
dlvr.it
ocaml.org
OCaml @ocaml.org · 13d
Parsimoni Joins Techstars' Autumn 2025 Programme!
We are thrilled to announce that our sister company, Parsimoni, is part of Techstars’ autumn 2025 space accelerator programme in Los Angeles! They are in great company, with 4 other very synergistic space startups: ANT61, Azora, CISGAM, and Translunar Exports and Servicing Incorporated. Kick off was on the 8th of September. For the next 3 months, Parsimoni will benefit from Techstars’ mentorship, network, and investment to drive their mission of making satellite-based resources accessible to all.

Parsimoni’s SpaceOS

Parsimoni is developing SpaceOS, a next-generation operating system tailored to satellites and their payloads. By using unikernel technology (see MirageOS for another example), the size of each application is optimised to its minimal viable state, reducing the attack surface and improving performance. The resulting efficiency is valuable for users – reducing costs, extending mission capabilities, and allowing for more advanced processing in orbit – and also better for the planet. More efficient, multi-purpose satellites make better use of limited resources and create new possibilities for the satellite industry. These advantages, along with its security-by-design approach (including PQC) and high adaptability to new features and applications, make SpaceOS the perfect candidate for use cases like AI-powered earth observation, secure mission operations, and next-generation space marketplaces.

Parsimoni uses OCaml to build their software, benefitting from the language's security guarantees, and this approach is garnering attention from a world-leading accelerator. It illustrates how Tarides' mission of building mission-critical systems in OCaml is a recipe for success – in the FinTech sector, space sector, and beyond! Visit Parsimoni’s website to learn more about SpaceOS and its future development and deployment.

Techstars

Techstars is an accelerator with a global network that has been helping founders launch and grow their companies since 2006.
Their three-month programme is designed to help startups put together all the pieces they need for success. This includes assigning mentors to each company, fostering business storytelling, providing fundraising opportunities and guidance on strategy, workshops, and networking. Techstars alumni include the very successful Chainanalysis, Zipline, DataRobot, and Alloy. For their autumn 2025 cohort, Techstars have selected startups with a strong focus on innovation and meeting future markets. Parsimoni is in the space accelerator group, and other groups include healthcare, the future of food, and the future of finance. Over 50 of the startups are also using AI in their offers. What’s Next We’re looking forward to seeing how Parsimoni will rise to the challenge over the next few months. If you want to keep up with them and SpaceOS, follow them on LinkedIn and keep an eye out for updates! Stay tuned to our blog – we will keep you updated about how Parsimoni is enabling the next revolution of space-based innovation. You can connect with us on Bluesky, Mastodon, Threads, and LinkedIn or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!
dlvr.it
ocaml.org
OCaml @ocaml.org · 14d
Caching opam solutions - part 2
* published 2025-09-23
* notanotebook

Some results from the previous post. This time I've run day10 on 144 or so commits from opam-repository to see how well the cache performs. The results are quite interesting.

First let's talk about the "examination map". This is a map from package name to a list of other packages whose solutions should be recalculated if the package in question is altered. It's built by first recording the packages that the solver asks about while solving for each package, and then taking all of the solutions and 'inverting' the map. For example, if both packages 'a' and 'b' ask about package 'c' during their solutions, then altering 'c' means that the solutions for both 'a' and 'b' need to be recalculated. The examination map entry for 'c' would then be 'a'; 'b'.

Plotting a histogram of the sizes of the entries in the examination map shows some interesting features:

* The most common number of observers is 1, meaning that the package is not involved in the solution of any other package. There are approximately 2,000 such packages.
* Most (~80%) packages have fewer than 100 observers. This means that if we alter one of these packages, we only need to recalculate the solutions for fewer than 100 other packages.
* A very small number of packages are observed in all 4,400 solutions. This is actually a bit artificial, as the solver adds the ocaml-compiler package as an input to all solves to ensure we get the correct compiler version. There's another way to do this which would avoid this particular problem.
* A small number of packages have a very large number of observers, around 3,800. This mostly corresponds with dune and its dependencies and associated packages. There are around 350 such packages, and any change to these means we need to recalculate most of the solutions.
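The inversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not day10's actual code: the function names and the `asked_about` input (a record of which packages the solver queried while solving each package) are hypothetical.

```python
from collections import defaultdict


def build_examination_map(asked_about):
    """Invert {package: packages its solve asked about} into
    {package: sorted list of packages whose solves observed it}."""
    examination = defaultdict(set)
    for package, queried in asked_about.items():
        for name in queried:
            examination[name].add(package)
    return {name: sorted(observers) for name, observers in examination.items()}


def solutions_to_recalculate(examination_map, changed):
    """Return every package whose solution observed a changed package."""
    stale = set()
    for name in changed:
        stale.update(examination_map.get(name, []))
    return sorted(stale)


# Hypothetical solver traces matching the example in the text:
# the solves for 'a' and 'b' both asked about 'c'.
asked_about = {"a": {"c"}, "b": {"c"}}

examination_map = build_examination_map(asked_about)
print(examination_map)                                   # {'c': ['a', 'b']}
print(solutions_to_recalculate(examination_map, ["c"]))  # ['a', 'b']
```

A commit that touches several packages needs the union of their entries, which is why a single change to a widely observed package dominates the cost.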
Needing to recalculate around 3,800 solutions doesn't mean that we actually recompile 3,800 packages, just that we need to recalculate their solutions, which might then lead to a cache hit of the layer and no actual compilation. Recalculating the solutions of all of the packages takes (on my computer) around 10,000 seconds of CPU time, or roughly 5 minutes of wall-clock time as I've got 32 threads. However, if the package that's changed isn't one of those 350 packages, then the number of solutions that need to be recalculated is dramatically reduced.

I ran the logic over the last few weeks of commits to opam-repository, from commit 109398e2fd61803126becd398df0f1eabc9f3ca2 of the 10th September up until commit 3f21ebe342ce440d9c9142ffe1185d8e5a326085 from the 22nd. In this time there were 144 commits (counting only those from git log --first-parent). Of these, only 4 resulted in a full re-solve: the first commit, since obviously we have no cache at that point; the release of OCaml 5.4.0 beta2 by Florian Angeletti; a fix of ocaml-base-compiler for MSVC by David; and a fix for BER-OCaml by Jeremy Yallop. A further 25 commits resulted in recalculating solutions for around 3,800 packages as they hit dune-adjacent packages, 5 commits resulted in recalculating between 100 and 300 packages, and the remaining 110 commits resulted in recalculating fewer than 100 packages, the majority of them fewer than 5.

Overall, at a rough estimate, this means that over this period, using this caching strategy gave us a 5x speedup in the solver! Continue reading here
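As a rough check of the figures quoted above, the arithmetic can be reproduced as follows. This is a back-of-the-envelope sketch, not the author's calculation: it assumes every solution costs about the same, takes 200 as a midpoint for "between 100 and 300", and uses 100 as a pessimistic upper bound for "fewer than 100".

```python
TOTAL_SOLUTIONS = 4_400  # solutions in a full solve (figure from the post)
DUNE_ADJACENT = 3_800    # solutions recalculated for a dune-adjacent change

# (number of commits, assumed solutions recalculated per commit)
buckets = [
    (4, TOTAL_SOLUTIONS),  # full re-solves
    (25, DUNE_ADJACENT),   # commits touching dune-adjacent packages
    (5, 200),              # assumption: midpoint of "between 100 and 300"
    (110, 100),            # assumption: upper bound for "fewer than 100"
]

commits = sum(count for count, _ in buckets)
uncached = commits * TOTAL_SOLUTIONS  # every commit re-solves everything
cached = sum(count * per_commit for count, per_commit in buckets)
speedup = uncached / cached

# Wall-clock time for one full solve: CPU seconds spread over 32 threads.
wall_minutes = 10_000 / 32 / 60

print(commits)                      # 144
print(f"{speedup:.1f}x")            # 5.1x
print(f"{wall_minutes:.1f} min")    # 5.2 min
```

Because most of the 110 small commits recalculated fewer than 5 solutions, the real cached cost is lower than this bound, so the estimate errs on the conservative side.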
dlvr.it