Etienne Bacher
@etiennebacher.bsky.social
120 followers 69 following 68 posts
PhD in economics from LISER, Luxembourg, now looking for research software engineer or data science positions. Mostly here to talk about #rstats https://github.com/etiennebacher
etiennebacher.bsky.social
I might have something for you in a few weeks
Reposted by Etienne Bacher
vincentab.bsky.social
{tinytable} 0.14.0 for #RStats makes it super easy to draw tables in html, tex, docx, typ, md & png.

There are only a few functions to learn, but don't be fooled! Small 📦s can still be powerful.

Check out the new gallery page for fun case studies.

vincentarelbundock.github.io/tinytable/vi...
Images: a table about lemurs, a table about students and schools, and a table about wines.
etiennebacher.bsky.social
Other features include:
* improved support for lubridate functions
* experimental support for fallback to R when a function cannot be translated to polars
* performance improvements when handling thousands of columns

and more.

Full changelog: www.tidypolars.etiennebacher.com/news/

(2/2)
Changelog
www.tidypolars.etiennebacher.com
etiennebacher.bsky.social
#rstats tidypolars 0.14.0 is now available!

tidypolars provides the tidyverse syntax while using polars for efficient computations in the background.

This release brings compatibility with polars 1.0.0, released earlier this week, as well as several new features and bug fixes.

(1/2)
etiennebacher.bsky.social
Would love to, but it looks unlikely. The upstream polars library requires very recent Rust versions, which are not available on CRAN machines, and it updates the Rust version it uses quite frequently.

R-universe / R-multiverse is the best way to get polars binaries for now.
etiennebacher.bsky.social
polars can be used via webR, thanks to the work of @gws.phd both on webR and on the Rust component of polars.

It also uses savvy, developed by @yutannihilation-en.bsky.social, to connect Rust and R.
etiennebacher.bsky.social
polars 1.0.0 contains many breaking changes, either because the internals have changed or because we now match the function and argument names of Python polars more closely.

We encourage you to refer to the Python polars documentation if some functions or arguments you use no longer exist.
etiennebacher.bsky.social
The way to handle custom R functions has been revamped and no longer relies on map_elements() or map_batches(). Instead, if your code cannot be implemented using polars syntax, we encourage the use of purrr and mirai.

A new vignette explains this in more detail: pola-rs.github.io/r-polars/vig...
Using custom functions - Polars R Package
pola-rs.github.io
etiennebacher.bsky.social
Catching up with the Python implementation means that we now have access to the new streaming engine with collect(engine = "streaming").

This is in many cases faster than the default engine and allows processing data that is larger than RAM. It will also continue to improve in future releases.
Images: code joining flight data with weather data takes about 11s to finish with the standard engine and about 6s with the streaming engine.
etiennebacher.bsky.social
polars now uses rlang to take advantage of dynamic dots, better argument checks, and clearer error messages.

Compare, for instance, the old (polars0) and new (polars) errors when a function receives an invalid argument:
A picture of code that shows an error message with the old version of polars:

> library(polars0)
> pl$DataFrame(x = "a")$select(pl$col("a")$str$pad_end(5, "multiple_chars"))
Error:
! Execution halted with the following contexts
   0: In R: in $select()
   0: During function call [pl$DataFrame(x = "a")$select(pl$col("a")$str$pad_end(5, "multiple_chars"))]
   1: The argument [fillchar] caused an error
   2: Expected a value of type [char]
   3: Got value [Rvalue: ["multiple_chars"], Rsexp: Strings, Rclass: ["character"]]

A picture showing the error message with the new version of polars:

> library(polars)
> pl$DataFrame(x = "a")$select(pl$col("a")$str$pad_end(5, "multiple_chars"))
Error:
! Evaluation failed in `$select()`.
Caused by error:
! Evaluation failed in `$select()`.
Caused by error in `pl$col("a")$str$pad_end()`:
! Evaluation failed in `$pad_end()`.
Caused by error:
! Expected a string with one character only, currently has 14 (from "multiple_chars").
etiennebacher.bsky.social
#rstats polars 1.0.0 is now available!

polars provides bindings to the eponymous Rust library, a very efficient dataframe library.

This is the outcome of a year-long rewrite led by @eitsupi to refactor the internals and match the API of Python polars.

pola-rs.github.io/r-polars/

Details ⬇️
Polars R Package
pola-rs.github.io
etiennebacher.bsky.social
You might also be interested in tidypolars, especially for (but not limited to) parquet datasets.

(Disclaimer: I'm the author)

As with duckplyr, this depends on whether the functions you use are translated.

The major drawback is that it's not on CRAN: tidypolars.etiennebacher.com
More Efficient Tidyverse Code, Using Polars in the Background
Polars is a cross-language tool for manipulating very large data. However, one drawback is that the R implementation has a syntax that will look odd to many R users who are not used to Python syntax. ...
tidypolars.etiennebacher.com
etiennebacher.bsky.social
Wow, that's surprisingly low to me.
etiennebacher.bsky.social
Is this about dependencies mentioned in DESCRIPTION or also recursive ones?
etiennebacher.bsky.social
fix_*() functions now have an argument `interactive` to show suggested changes in a Shiny app before applying them (see the gif below). This is useful to check the changes instead of modifying several files at once.

(3/3)
A gif that shows some lints in two R files (e.g. any(is.na(x))). Then it shows that flir::fix(c("foo1.R", "foo2.R"), interactive = TRUE) opens a Shiny app in the viewer so that the user can decide whether the proposed changes should be accepted or not.
etiennebacher.bsky.social
Package developers can now provide customized rules for {flir} users, for instance to detect and replace deprecated or superseded functions.

This is detailed in the following vignette and is still an early feature, so feedback is much appreciated: flir.etiennebacher.com/articles/sha...

(2/3)
Sharing rules across packages
flir.etiennebacher.com
etiennebacher.bsky.social
#rstats {flir} 0.5.0 is now available!

{flir} can detect and rewrite code patterns.

This is the 1st CRAN release and has two nice features:
- share custom rules across projects
- show changes in Shiny app before applying them

flir.etiennebacher.com/news/#flir-050

More details in the 🧵 (1/3)
Changelog
flir.etiennebacher.com
Reposted by Etienne Bacher
lrberge.bsky.social
Just gave a presentation at the REGIS summer school we organised in Bordeaux.

Algorithms for empirical research

Slides and material here: github.com/lrberge/2025...

The conclusion is that, even in the age of AI, using your brain is still relevant!

#RStats #EconSky
etiennebacher.bsky.social
Nope, but someone is writing rv, which I haven't tried: github.com/A2-ai/rv
etiennebacher.bsky.social
Used to agree, but uv really changes that. It's very good.
etiennebacher.bsky.social
...it's better to let the students "suffer" with projects that don't have version control. To me, it's easier to see the benefits of this tool after you have spent some time without it, because then it's clearer what gap it fills in the workflow.

So in an intro course, I would teach projects but not git. (2/2)
etiennebacher.bsky.social
Good question. I think it depends on the tool you teach. For instance, RStudio projects shouldn't be too hard to understand (despite the caveat about tech-savviness you mentioned), so I try to introduce them early.

Other tools are harder to understand, such as git. In this case, I think ... (1/2)
etiennebacher.bsky.social
Totally agree with you. My point was just that, regardless of whether you teach them RStudio or Positron, you will have to teach them how to use a project setup properly.

There are reasons to prefer RStudio over Positron; I just don't think RStudio projects are the main one.