`BTreeMap<A, B>` -> `BTreeMap<A, C>`: Fast and memory efficient
Considering the following problem:
* I have a `BTreeMap<A, B>`
* I have a `Fn(B) -> C`
* I want a `BTreeMap<A, C>`
I currently have two options, to pick from:
* `BTreeMap::from_iter(map.into_iter().map(f))` - cpu efficient but memory inefficient: since `from_iter` can't assume the iterator is already sorted, it consumes the entire iterator into a temporary heap allocation, then sorts it, then calls the internal `bulk_build_from_sorted_iter` function (which is very fast). Assuming a `BTreeMap<String, String>` with 10_000_000 entries on a 64bit computer, this is going to cause an additional 457MiB allocation for the Vec however.
* `new_map.extend(old_map.into_iter().map(f))` - memory efficient but cpu inefficient: this calls `.insert()` for each item, effectively copying them over one-by-one. This doesn't have any unexpected spikes in memory usage, but each insert is going to traverse the tree to find the correct insert position, over and over.
There was some prior discussion about exposing `bulk_build_from_sorted_iter` so we can make our own function that is both fast+efficient:
Feature Request] Expose bulk_build_from_sorted_iter in BTreeMap [libs
> BTreeMap's from_iter implementation collects all values into a vec and sorts them, before passing into bulk_build_from_sorted_iter. Sometimes, the data comes sorted. For my use case, from_iter is already insanely fast; I am looking mainly to avoid the extra Vec allocation necessary for sorting. Use case is a dictionary of 1mil+ terms each containing some misc info, parsed using a custom iterator (not in contiguous Vec<_> form so I think this can't possibly be compiler-optimized away). This is a…
github.com/rust-lang/rfcs
#### Make BTreeMap's from_sorted_iter public
opened 01:18AM - 02 Dec 16 UTC
PlasmaPower
T-libs-api
Use case: I'm trying to merge `BTreeMap`s, but in the case of a duplicate key, I…'d prefer to add the values instead of ignoring the first. To do this, I was planning on implementing a custom MergeIter, which seems simple enough. However, I encountered one problem: BTreeMap's `from_sorted_iter` is private, meaning that I cannot efficiently use this iterator to merge maps. I think the reasoning behind this being private is that being passed an unsorted iterator would result in a logic error in the BTreeMap. However, it's already possible to cause this in safe code with an unreliable Cmp implementation. If the function is made public, I'd recommend a couple of changes to make it ready for use without other internal functions. First, don't take `self`, but instead start with an empty map. Note that we might also consider just asserting that the map is empty to start with, as I think this change would require either an unnecessary BTreeMap creation or `mem::uninitialized` in `BTreeMap::append`. Second, add `fix_right_edge()` to the end of the function. This change shouldn't cause any problems for `BTreeMap::append` as it's already calling it afterwords.
To avoid the discussion about "should it panic, should it be unsafe { }" (me personally would prefer "may panic" over "unsafe"), and since `BTreeMap<A, B>` already gives us `Ord` guarantees, there could be the following APIs (slightly pseudo-code-y):
* `BTreeMap::map_values<K, V, F: Fn(K) -> V>(self, F) -> Self<K, V>`
* `BTreeMap::try_map_values<K, V, E, F: Fn(K) -> Result<V, E>>(self, F) -> Result<Self<K, V>, E>`
This is the first time I'm proposing language improvements like this (also first post in the internals forum), let me know what you think!