Sam Westrick
@shwestrick.bsky.social
620 followers 150 following 180 posts
assistant professor @NYU Courant CS :: programming languages :: parallel computing :: music :: lead dev of the MaPLe compiler (https://github.com/mpllang/mpl) https://cs.nyu.edu/~shw8119/
shwestrick.bsky.social
We got really fantastic feedback during reviews (thank you POPL reviewers!) and we're looking forward to updating this preprint. Stay tuned for the final paper
shwestrick.bsky.social
Of course, there are tons of juicy details and intricacies that cannot fit into (even this many) tweets. Take a look at the preprint to see all the nitty gritty stuff!

cs.nyu.edu/~am15509/pub...
shwestrick.bsky.social
One of the surprising things we discovered is that this "backwards-in-time" / "up-pointer invariant" concept can be very elegantly encoded in the type system as a form of subtyping that we call subtiming.

The whole system is essentially just standard unification + subtiming

⬇️
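Here's a rough sketch of one way subtiming might be read as subtyping, in Python (the encoding below is my own illustration, not TypeDis's actual formalism): timestamps are paths in the fork tree, δ1 happens-before δ2 when δ1's path is a prefix of δ2's, and `string@δ1` is a subtype of `string@δ2` exactly when δ1 happens-before δ2.

```python
# Illustrative sketch only: timestamps modeled as paths in the fork tree.

def happens_before(d1, d2):
    """Partial order on timestamps: d1 is an ancestor of (or equal to) d2."""
    return len(d1) <= len(d2) and d2[:len(d1)] == d1

def subtime(t1, t2):
    """t = (base_type, timestamp). One plausible reading of subtiming:
    string@d1 <: string@d2 when d1 happens-before d2, i.e. data allocated
    earlier may flow into a later context (pointers go backwards in time)."""
    (b1, d1), (b2, d2) = t1, t2
    return b1 == b2 and happens_before(d1, d2)

root = ()       # root task
left = (0,)     # first child after a fork
right = (1,)    # second (concurrent) child

# A string allocated by the root may be used inside either child...
assert subtime(("string", root), ("string", left))
# ...but data of one sibling may not flow into the other (incomparable times).
assert not subtime(("string", left), ("string", right))
```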
shwestrick.bsky.social
Thinking about a tree of tasks (where parents fork into their children), this corresponds to an **up-pointer invariant**: the data of a child task can only contain pointers to ancestor data.

In this way, you can never have pointers between concurrent siblings/cousins/etc.

⬇️
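As a toy model (my own illustration, not MaPLe internals), the up-pointer invariant can be phrased as a check over a snapshot of the object graph: every pointer must target data owned by the pointing task itself or one of its ancestors.

```python
# Toy model of the up-pointer invariant (illustration, not MaPLe internals).
# Each allocated object records the task that owns it; a heap snapshot is
# just a map from each object to the objects it points to.

def is_ancestor(a, b):
    # Tasks are named by their path from the root of the fork tree.
    return len(a) <= len(b) and b[:len(a)] == a

def disentangled(owner, points_to):
    """Check that every pointer targets data owned by an ancestor task
    (or the task itself) -- i.e. all pointers go 'up' the task tree."""
    return all(is_ancestor(owner[dst], owner[src])
               for src, dsts in points_to.items()
               for dst in dsts)

root, left, right = (), (0,), (1,)
owner = {"shared": root, "a": left, "b": right}

# Children pointing up at shared ancestor data: fine.
assert disentangled(owner, {"a": ["shared"], "b": ["shared"]})
# A pointer between concurrent siblings violates the invariant.
assert not disentangled(owner, {"a": ["b"]})
```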
shwestrick.bsky.social
At compile time, TypeDis computes a partial order over timestamps (derived from the nested fork-join structure of parallel tasks) and enforces that all pointers must flow _backwards_ in time.

⬇️
shwestrick.bsky.social
Yes! Enter TypeDis.

similar to region types, TypeDis associates a task identifier or "timestamp", δ, with every allocation.

e.g. the type string@δ indicates that the string was allocated at time δ

Unboxed types don't need these annotations (e.g. raw integers, booleans, etc.)

⬇️
shwestrick.bsky.social
Can we reason about disentanglement _statically_, at the source level, and enforce it automatically at compile-time?

In other words: can we statically guarantee that the GC remains fully parallel, and never has to synchronize across concurrent tasks??

⬇️
shwestrick.bsky.social
Nearly-zero cost only holds if your program is disentangled; as soon as a violation is detected, the GC has to start synchronizing across concurrent tasks, and we're back in the tricky world of reasoning about the synchronization overheads of GC.

So, the question is... ⬇️
shwestrick.bsky.social
Today, in MaPLe, disentanglement is checked and managed dynamically.

We've put a ton of work into ensuring that this dynamic management cost is nearly zero (see dl.acm.org/doi/10.1145/... and dl.acm.org/doi/10.1145/...)

However, ⬇️
shwestrick.bsky.social
The idea is to (1) give every task its own local heap, and (2) enforce that the heaps of any two _concurrent_ tasks never directly point to each other, i.e., the heaps are "disentangled".

We've been exploring this idea over the past ~10 years in MaPLe (github.com/mpllang/mpl)
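Here's a tiny toy encoding of the idea (my own sketch, not how MaPLe actually represents heaps): each task gets a heap with a link to the heap of the task that forked it, and a reference is legal only if it stays local or points into an ancestor heap.

```python
# Sketch of task-local heaps (a toy encoding of the idea, not MaPLe's).
class Heap:
    def __init__(self, parent=None):
        self.parent = parent      # heap of the forking (ancestor) task
        self.objects = {}         # the task's own allocations

    def ancestors(self):
        h = self
        while h is not None:
            yield h
            h = h.parent

def ok_reference(src_heap, dst_heap):
    """Disentangled: a reference may stay local or go into an ancestor
    heap, but never into the heap of a concurrent task."""
    return dst_heap in src_heap.ancestors()

root = Heap()
child_a = Heap(parent=root)     # two concurrent children forked from root
child_b = Heap(parent=root)

assert ok_reference(child_a, root)         # up into the parent heap: OK
assert not ok_reference(child_a, child_b)  # across concurrent heaps: not OK
```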
shwestrick.bsky.social
Disentanglement is all about parallel GC. It decomposes the heap into pieces that can be traced independently, in parallel, with no synchronization

How does it work?
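To see why the decomposition helps, here's a minimal sketch (my own, not MaPLe's collector): if no edge ever crosses between heaps, each heap can be marked by its own worker with a private mark set, so the tracers share no state and need no locks.

```python
# Sketch: disjoint heaps can each be traced by an independent marker
# with no locks or shared state (toy illustration only).
from concurrent.futures import ThreadPoolExecutor

def trace(heap, roots):
    """Standard mark phase over one heap: heap maps object -> children."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])
    return marked

# Two disentangled heaps: every edge stays within a single heap.
heap1 = {"r1": ["x"], "x": ["y"], "y": [], "dead1": []}
heap2 = {"r2": ["z"], "z": [], "dead2": []}

with ThreadPoolExecutor() as pool:
    m1, m2 = pool.map(trace, [heap1, heap2], [["r1"], ["r2"]])

print(sorted(m1), sorted(m2))  # the unreachable "dead" objects stay unmarked
```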
shwestrick.bsky.social
in TypeDis (conditionally accepted at POPL!), we develop a type system for enforcing **disentanglement** statically at compile-time. This project was led by Alexandre Moine here at NYU, in close collaboration with Stephanie Balzer at CMU.
shwestrick.bsky.social
absolutely thrilled to announce 2 papers (conditionally) accepted at POPL!

TypeDis: A Type System for Disentanglement
(Moine, Balzer, Xu, Westrick)

All for One and One for All: Program Logics for Exploiting Internal Determinism in Parallel Programs
(Moine, Westrick, Tassarotti)
Reposted by Sam Westrick
yminsky.bsky.social
Excited to say that we're looking to hire someone to focus on OxCaml education! We're doing enough to change the language that we have a pretty big internal education task ahead of us, and we want to hire someone to focus on it!

Please share this with others!

www.janestreet.com/join-jane-st...
OxCaml Educator :: Jane Street
Reposted by Sam Westrick
anil.recoil.org
And if you’re interested in OxCaml, we have a tutorial on Sunday at ICFP walking through it conf.researchr.org/track/icfp-s... (materials will be online for anyone afterwards. Just the minor detail of finishing writing them first)
shwestrick.bsky.social
Both projects led by Alexandre Moine here at NYU.

Preprints available on Alexandre's website!
cs.nyu.edu/~am15509/
shwestrick.bsky.social
(if you consider Standard ML readable) it does!!!
shwestrick.bsky.social
(The idea is to compute all prefix sums in parallel with respect to this associative operator. Parallel prefix sums can be done in O(N) work and O(log N) span.)
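For reference, here's a sketch of the classic divide-and-conquer (contraction) scan; it does O(n) work and would have O(log n) span if the pair-combining loops and the recursive call's independent parts ran in parallel, though it's written sequentially here for clarity.

```python
# Contraction-based inclusive scan over any associative operator
# (sequential rendering of the O(n)-work, O(log n)-span algorithm).

def scan(op, xs):
    """Inclusive prefix 'sums' of xs under the associative op."""
    n = len(xs)
    if n == 0:
        return []
    if n == 1:
        return [xs[0]]
    # Contract: combine adjacent pairs (parallelizable loop)...
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, n - 1, 2)]
    # ...recurse on the half-sized array (log-depth recursion)...
    partial = scan(op, pairs)
    # ...then expand back out (another parallelizable loop).
    out = []
    for i in range(n):
        if i == 0:
            out.append(xs[0])
        elif i % 2 == 1:
            out.append(partial[i // 2])
        else:
            out.append(op(partial[i // 2 - 1], xs[i]))
    return out

from operator import add
print(scan(add, [1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```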
shwestrick.bsky.social
There are quite a few examples of “surprising” monoids. Perhaps my favorite is:

copy (a,b) = if b < 0 then a else b

This function is associative over the integers, and you can use it (for example) to compute, for every element of an array in parallel, the nearest preceding non-negative number.
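Here's the operator in runnable form, with a quick associativity spot-check; `itertools.accumulate` stands in for the parallel scan.

```python
# The 'copy' operator from the post, plus a spot-check and a scan.
from itertools import accumulate

def copy(a, b):
    return a if b < 0 else b

# Spot-check associativity on a few triples of integers.
vals = [-5, -1, 0, 2, 7]
assert all(copy(copy(a, b), c) == copy(a, copy(b, c))
           for a in vals for b in vals for c in vals)

# Scanning with 'copy' yields, at each position, the most recent
# non-negative element at or before it.
xs = [3, -1, -5, 7, -2]
print(list(accumulate(xs, copy)))  # [3, 3, 3, 7, 7]
```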
shwestrick.bsky.social
That’s a great question — I’m sure there are more examples but off the top of my head I’m not coming up with anything right now.
shwestrick.bsky.social
One of the subtle and interesting bits is that the merging function isn't associative, so the parallel algorithm is (in a weak sense) non-deterministic, depending on how the reduction is implemented.

And yet, all possible combination orders still yield a correct overall result.
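To make this concrete, here's my own rendering of the folklore combine step (partial results as (candidate, count) pairs), with a witness that grouping can change the output, and a check that different reduction orders still agree on a true majority.

```python
# Sketch of the pairwise merge used in parallel Boyer-Moore voting
# (my own rendering of the folklore combine step, not the MaPLe code).
from functools import reduce

def merge(p, q):
    (a, i), (b, j) = p, q
    if a == b:
        return (a, i + j)
    return (a, i - j) if i >= j else (b, j - i)

# Not associative: grouping can change the surviving candidate...
x, y, z = (1, 1), (2, 2), (3, 1)
assert merge(merge(x, y), z) != merge(x, merge(y, z))

# ...yet any combination order still finds a true majority when one exists.
votes = [(v, 1) for v in [1, 2, 1, 1, 2, 1, 1]]      # majority element: 1
left_to_right = reduce(merge, votes)
pairwise = reduce(merge, [merge(votes[i], votes[i + 1])
                          for i in range(0, len(votes) - 1, 2)] + [votes[-1]])
assert left_to_right[0] == pairwise[0] == 1
```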
shwestrick.bsky.social
so this leaves me wondering -- has this parallel Boyer-Moore algorithm always been folklore? Is there a definitive reference for it?
shwestrick.bsky.social
And Adam Blank alludes to the parallel algorithm at the bottom of page 6 of this set of lecture notes:
courses.cs.washington.edu/courses/cse3...
shwestrick.bsky.social
parallel Boyer-Moore majority selection is a nice bit of code; here it is in MaPLe.

this algorithm seems to be folklore -- the original Boyer-Moore algorithm is sequential, but I've found at least two mentions of the parallel algorithm in the wild: ⬇️
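Since the MaPLe snippet itself isn't reproduced here, here's a hedged Python sketch of the parallel algorithm as I understand it: map each element to a (candidate, count) pair, combine with the Boyer-Moore merge in a balanced reduction tree (where the parallelism would live), then confirm the candidate with one counting pass.

```python
# Hedged Python sketch of parallel Boyer-Moore majority selection
# (my own illustration, not the MaPLe code from the post).

def combine(p, q):
    (a, i), (b, j) = p, q
    if a == b:
        return (a, i + j)
    return (a, i - j) if i >= j else (b, j - i)

def tree_reduce(xs):
    # Balanced reduction; the two halves could run in parallel.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return combine(tree_reduce(xs[:mid]), tree_reduce(xs[mid:]))

def majority(xs):
    """Return the majority element of xs, or None if there isn't one."""
    candidate, _ = tree_reduce([(x, 1) for x in xs])
    # Boyer-Moore only guarantees the candidate IS the majority when a
    # majority exists, so a final counting pass is still needed.
    return candidate if xs.count(candidate) * 2 > len(xs) else None

print(majority([3, 1, 3, 3, 2, 3, 3]))  # 3
print(majority([1, 2, 3]))              # None
```

The verification pass matters: when no majority exists, different reduction orders can surface different candidates, and only the count distinguishes a real majority from a survivor.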