#rstats developer | geospatial engineering | soil microbiome and crop health mapping | product development
https://github.com/BlasBenito - www.blasbenito.com
These ensure that the training data represents the spatial correlation structure of the full dataset.
spatialRF::thinning() controls spatial clustering in point data to mitigate spatial autocorrelation and sampling bias.
The ugly figure shows the before and after of an extreme thinning run with a distance of 5 degrees on a global dataset with 30k points.
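A minimal sketch of the idea behind distance-based thinning, assuming a greedy rule: walk the points in order and keep one only if it lies at least `min_distance` from every point already kept. Distances are plain Euclidean on the coordinates (degrees, as in the 5-degree example), ignoring longitude wraparound. The function name `thin_points` is hypothetical and this is not spatialRF's actual implementation.

```python
import math


def thin_points(points, min_distance):
    """Greedy brute-force thinning (hypothetical helper, not spatialRF).

    Keeps a point only if it is at least min_distance away from every
    point kept so far. Cost is O(n * k) distance checks, where k is the
    number of kept points.
    """
    kept = []
    for x, y in points:
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky in kept):
            kept.append((x, y))
    return kept


# (1, 0) is 1 unit from (0, 0) and (6, 4) is 4 units from (6, 0), so both go.
print(thin_points([(0, 0), (1, 0), (6, 0), (6, 4), (12, 0)], 5))
```

The result depends on point order; a random shuffle before thinning is a common way to avoid biasing which point of a cluster survives.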
The benchmark uses 30k points to compare spatialRF::thinning() (plain R), its C++ version, and an optimized C++ algorithm using spatial indexing.
Result: ~500x speed-up 🚀
Additional outcome: I didn't waste my morning!
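The post doesn't say which spatial index the optimized C++ version uses. A uniform grid is one common choice, assumed here in a minimal Python sketch: kept points are bucketed into square cells of side `min_distance`, so a candidate only needs to be checked against the 3x3 block of cells around it instead of every kept point. The function name `thin_points_grid` is hypothetical, and the ~500x figure above comes from the author's benchmark, not from this sketch.

```python
import math


def thin_points_grid(points, min_distance):
    """Greedy distance thinning with a uniform-grid spatial index.

    Kept points are stored in square cells of side min_distance, so any
    kept point closer than min_distance to a candidate must sit in the
    3x3 block of cells around the candidate's cell. Returns the same
    points, in the same order, as the brute-force greedy rule.
    """
    cells = {}  # (col, row) -> list of kept points in that cell
    kept = []
    for x, y in points:
        cx = math.floor(x / min_distance)
        cy = math.floor(y / min_distance)
        # Only the 9 neighbouring cells can contain a conflicting point.
        too_close = any(
            math.hypot(x - kx, y - ky) < min_distance
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for kx, ky in cells.get((cx + dx, cy + dy), ())
        )
        if not too_close:
            kept.append((x, y))
            cells.setdefault((cx, cy), []).append((x, y))
    return kept


print(thin_points_grid([(0, 0), (1, 0), (6, 0), (6, 4), (12, 0)], 5))
```

Because kept points are at least `min_distance` apart, each cell can only hold a handful of them, so the work per candidate stays roughly constant instead of growing with the number of kept points.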
This feature sounded like sci-fi BS to me just weeks ago.
The agrometeorological dataset AgERA5 (URL: cds.climate.copernicus.eu/datasets/sis...) is losing technical support.
The new function step_collinear() lets you add multicollinearity filtering directly into your {recipes} pipelines.
This integration omits target-encoding, as it doesn’t fit well with how recipes work.
collinear() now takes you from raw data to model-ready output:
✅ Filtered data frame
✅ Ranking of predictors resulting from preference_order()
✅ Names of the selected predictors
✅ Model formulas to kickstart exploratory modelling.
By default, {collinear} analyzes the data's correlation structure to configure multicollinearity thresholds automatically.
Learn more here: blasbenito.github.io/collinear/ar...
Writing it was a long and rather boring effort (no LLMs were harmed), but it's helping me catch internal inconsistencies and quickly identify the splash area of new features.
I don't remember ever seeing this thing in the wild!
#rstats
Version 3.0 is coming soon, with a few significant improvements and some changes, so we'll see how things go after that.
#rstats
There is a clear drop after releasing v2.0 in Jan 2025, but I get it: this was a full rewrite with no backward compatibility.
Breaking changes break trust!
#rstats
Experiment:
- Subset df (30k rows, 249 cols) to random dimensions.
- Filter using a random max correlation.
- Find the VIF threshold producing the selection most similar to the step above.
- Repeat 10k times.
#rstats 📦 {collinear}
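A minimal Python sketch of this experiment, assuming both filters are greedy forward selections over the columns in their given order (standing in for a preference order), that a candidate's VIF is computed against the already-kept columns only, and that "most similar" means the highest Jaccard similarity between the selected column sets. All function names, the VIF grid, and the random ranges in `run_experiment` are hypothetical; this is not {collinear}'s implementation.

```python
import numpy as np


def cor_select(X, max_cor):
    """Keep a column if its |Pearson r| with every kept column is <= max_cor."""
    r = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(r[j, k] <= max_cor for k in kept):
            kept.append(j)
    return kept


def vif_of(X, j, others):
    """VIF of column j regressed (with intercept) on the columns in others."""
    if not others:
        return 1.0
    A = np.column_stack([np.ones(X.shape[0]), X[:, others]])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / tss
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def vif_select(X, max_vif):
    """Keep a column if its VIF against the kept columns is <= max_vif."""
    kept = []
    for j in range(X.shape[1]):
        if vif_of(X, j, kept) <= max_vif:
            kept.append(j)
    return kept


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def best_matching_vif(X, max_cor, vif_grid):
    """Return the grid VIF whose selection best matches cor_select, and its score."""
    target = cor_select(X, max_cor)
    scores = [jaccard(target, vif_select(X, v)) for v in vif_grid]
    best = int(np.argmax(scores))
    return vif_grid[best], scores[best]


def run_experiment(X, n_iter, vif_grid, seed=0):
    """Random row/column subsets and random max correlation per iteration."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_iter):
        n_rows = rng.integers(50, X.shape[0] + 1)  # assumes >= 50 rows
        n_cols = rng.integers(3, X.shape[1] + 1)   # assumes >= 3 columns
        rows = rng.choice(X.shape[0], size=n_rows, replace=False)
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)
        max_cor = rng.uniform(0.5, 0.9)
        best_vif, score = best_matching_vif(X[np.ix_(rows, cols)], max_cor, vif_grid)
        results.append((max_cor, best_vif, score))
    return results
```

For a single pair of predictors the two thresholds are linked exactly, since VIF = 1 / (1 - r²): a max correlation of 0.8 matches a VIF of about 2.78. With more predictors the VIF also captures joint collinearity, which is why the match has to be found empirically.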
I've used it at work before, but now that I am using it for my own stuff, I can say this out loud: Jenkins is pretty cool!
These franken-workflows combined Bash, GRASS GIS, R, and even Octave.
And ran simulations for months on a few of my lab's computers!
It combined species distribution models, cellular automata, and least-cost paths. If my memory doesn't fail me, I used openModeller for the SDM and GRASS GIS for the simulation.
Fun times!
You can run your package unit tests with {testthat} in parallel with two simple steps (see testthat.r-lib.org/articles/par...):
tldr:
1. Add `Config/testthat/parallel: true` to DESCRIPTION.
2. Add `TESTTHAT_CPUS=8` to your .Renviron and restart R.
#rstats
It runs Ubuntu 24.04 on a 16-core i9 and 62 GB RAM, with a total storage of 5 TB.
I installed RStudio Server there last week and bookmarked the server's address in my laptop's browser.
It took me 5 minutes!