Noah Greifer
@noahgreifer.bsky.social
3.7K followers 170 following 400 posts
Statistical consultant and programmer at Harvard IQSS. Author/maintainer of the #Rstats packages 'MatchIt', 'WeightIt', and 'cobalt' for causal inference, among many others | He/him ngreifer.github.io
Pinned
noahgreifer.bsky.social
So grateful for the new followers! I'll definitely be posting more on here, especially about #Rstats, causal inference, R package development, and my statistical interests more generally. Also happy to answer questions about any of the above. Feel free to tag me in stuff!
noahgreifer.bsky.social
My hot take is that "fixed effects" has a single, clear meaning that is equivalent across all subdisciplines of statistics.
noahgreifer.bsky.social
Thinking about odds ratios...

An odds is a ratio of events to non-events. For example, if the event is survival, the odds of survival is the number of survivors per death. If the event is getting a disease, the odds is the number of diseased individuals per healthy individual.
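The arithmetic is simple enough to sketch in a few lines. A toy example in Python with made-up counts (the thread is about R, but the computation is language-agnostic):

```python
# An odds is events per non-event; an odds ratio compares odds between groups.
def odds(events, non_events):
    """Return the odds of the event: events per non-event."""
    return events / non_events

# Example: 80 survivors and 20 deaths -> odds of survival = 4 (4 survivors per death)
surv_odds = odds(80, 20)

# Odds ratio: 80 survivors / 20 deaths in one group vs. 50/50 in another.
odds_ratio = odds(80, 20) / odds(50, 50)

print(surv_odds)   # 4.0
print(odds_ratio)  # 4.0
```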
noahgreifer.bsky.social
These methods are new and bespoke, but I'd love to hear whether they inspire or help you in your own research! Feel free to let me know if you have any questions about the methodology. How would you have solved these problems?
noahgreifer.bsky.social
All this can be done using my {MatchingFrontier} #Rstats package, which isn't yet on CRAN. This is part of my growing body of "cool new methods I've programmed but am too lazy to write a paper about". Please get in touch if you want to collab on some.
Computation of the Balance-Sample Size Frontier in Matching Methods for Causal Inference
Returns the subset of the data with the minimum imbalance for every possible subset size (N - 1, N - 2, ...), down to the data set with the minimum possible imbalance. Also includes tool...
iqss.github.io
noahgreifer.bsky.social
We used g-computation with a cluster-robust SE for pair membership. Because we dropped units from both groups, this analysis targeted the ATO, which is the best we could do given the lack of overlap.

I won't speak about the results (IMO they are less cool than the methods 😉)
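The g-computation step amounts to: fit an outcome model on the matched sample, predict each unit's outcome with treatment set to 1 and then to 0, and average the difference. A minimal sketch in Python with simulated data (not the paper's code, which was presumably in R; the cluster-robust SE step is omitted):

```python
import numpy as np

# Simulate a matched-sample-like dataset with a known effect of 2.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)               # a confounder
a = rng.integers(0, 2, size=n)       # treatment indicator
y = 2 * a + x + rng.normal(size=n)   # outcome

# Fit the outcome model y ~ a + x by OLS.
X = np.column_stack([np.ones(n), a, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# g-computation: predict under a=1 and a=0 for everyone, contrast the means.
X1 = np.column_stack([np.ones(n), np.ones(n), x])
X0 = np.column_stack([np.ones(n), np.zeros(n), x])
ate = np.mean(X1 @ beta - X0 @ beta)
print(round(ate, 1))  # close to the true effect of 2
```

For a linear model without interactions this contrast equals the coefficient on treatment; g-computation earns its keep once the model has interactions or nonlinearities.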
noahgreifer.bsky.social
After selecting our subset, we did a 2:1 pair match on the scaled Euclidean distance to assign each control unit a treated unit, which supplied its initiation date. Finally, we regressed the 90-day outcomes on the treatment and selected covariates in the matched sample and estimated the effect.
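A hedged sketch of that pairing step: 2:1 greedy nearest-neighbor matching on the scaled (standardized) Euclidean distance. All names here are made up for illustration; MatchIt's actual matching is more sophisticated:

```python
import numpy as np

def scaled_euclidean(X):
    """Standardize columns so each covariate contributes comparably."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def match_2to1(X_treat, X_ctrl):
    """Greedily give each treated unit its 2 nearest unused controls."""
    Z = scaled_euclidean(np.vstack([X_treat, X_ctrl]))
    Zt, Zc = Z[: len(X_treat)], Z[len(X_treat):]
    unused = set(range(len(X_ctrl)))
    pairs = {}
    for i, zt in enumerate(Zt):
        d = {j: np.linalg.norm(zt - Zc[j]) for j in unused}
        nearest = sorted(d, key=d.get)[:2]
        pairs[i] = nearest          # these controls inherit treated i's date
        unused -= set(nearest)
    return pairs

rng = np.random.default_rng(1)
pairs = match_2to1(rng.normal(size=(3, 2)), rng.normal(size=(6, 2)))
print(pairs)  # each treated index -> its two matched control indices
```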
noahgreifer.bsky.social
The lowest energy distance was achieved with too small a sample to make inferences, given our hypothesized effect sizes. Instead, we chose the largest sample along the 2:1 energy distance frontier that had all SMDs below .1, and adjusted for remaining imbalance with regression.
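The SMD criterion mentioned here is the standardized mean difference: the difference in group means over a pooled standard deviation. A quick sketch (one common pooling convention; others exist):

```python
import numpy as np

def smd(x_treat, x_ctrl):
    """Standardized mean difference with a pooled SD denominator."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_ctrl.var(ddof=1)) / 2)
    return (x_treat.mean() - x_ctrl.mean()) / pooled_sd

rng = np.random.default_rng(0)
xt = rng.normal(0.0, 1, 500)
xc = rng.normal(0.5, 1, 500)
print(abs(smd(xt, xc)) < 0.1)  # False: a 0.5-SD shift is far from balanced
```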
noahgreifer.bsky.social
At each step, we dropped the unit whose removal yielded the largest decrease in the energy distance, subject to the constraint that the ratio of control units to treated units remained 2:1 so we could perform 2:1 matching in the final selected subset. This process is greedy, not optimal, but it worked.
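A rough sketch of that greedy step, with made-up names and a stand-in imbalance measure (the real method uses the energy distance and handles the 2:1 constraint more carefully than this toy does):

```python
# Greedy frontier sketch: repeatedly drop the single unit whose removal
# most reduces imbalance, only allowing a control drop when controls
# outnumber 2x the treated units.
def imbalance(treated, controls):
    mean = lambda v: sum(v) / len(v)
    return abs(mean(treated) - mean(controls))  # stand-in for energy distance

def greedy_drop(treated, controls, min_n=3):
    path = [(len(treated) + len(controls), imbalance(treated, controls))]
    while len(treated) > min_n:
        candidates = []
        for i in range(len(treated)):           # try dropping each treated unit
            t = treated[:i] + treated[i + 1:]
            candidates.append((imbalance(t, controls), t, controls))
        if len(controls) > 2 * len(treated):    # room to drop a control
            for j in range(len(controls)):
                c = controls[:j] + controls[j + 1:]
                candidates.append((imbalance(treated, c), treated, c))
        best, treated, controls = min(candidates)
        path.append((len(treated) + len(controls), best))
    return path  # the (sample size, imbalance) frontier

path = greedy_drop([1.0, 2.0, 5.0, 8.0],
                   [1.1, 1.9, 4.8, 3.0, 2.5, 7.0, 0.5, 6.0])
print(path[0], path[-1])  # imbalance shrinks as the sample shrinks
```

The returned path is exactly the frontier idea: one imbalance value for every sample size, from which you pick the point that best trades off balance against precision.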
noahgreifer.bsky.social
We dropped units to minimize the energy distance between the groups. The energy distance was proposed for balancing by Huling & Mak (2024); it is a scalar, multivariate measure of the difference between two *joint* distributions. In this case, those are the confounder distributions in the groups.
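For concreteness, here is a sketch of the sample energy distance between two covariate matrices (the V-statistic form; this is illustrative code, not what {MatchingFrontier} does internally):

```python
import numpy as np

def energy_distance(X, Y):
    """2*E||x - y|| - E||x - x'|| - E||y - y'|| over sample pairs."""
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).mean()
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2).mean()
    dyy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2).mean()
    return 2 * dxy - dxx - dyy

rng = np.random.default_rng(0)
same = energy_distance(rng.normal(size=(100, 3)), rng.normal(size=(100, 3)))
shifted = energy_distance(rng.normal(size=(100, 3)),
                          rng.normal(size=(100, 3)) + 1.0)
print(same < shifted)  # True: larger when the joint distributions differ
```

Because it compares the joint distributions, it is zero (in expectation) only when the two groups' covariates match in every moment and dependency, not merely in their means.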
noahgreifer.bsky.social
We dropped units from the sample one at a time so that the units that remained formed a balanced sample. How did we decide which unit to drop at each step and when to stop dropping units? The framework for this is described by King et al (2017) as the "matching frontier".
noahgreifer.bsky.social
In traditional matching, pairing is used to identify a matched subset of the original sample that is balanced. Though we want pairs and we want balance, it turns out we can do the subset selection first and the pairing second and get better results. Cho et al (2013) describe subset selection.
noahgreifer.bsky.social
References for the above, since these papers are not well known, but should be:

King et al (2017): doi.org/10.1111/ajps...
Huling & Mak (2024): doi.org/10.1515/jci-...
Cho et al (2013): doi.org/10.1111/stan...
noahgreifer.bsky.social
To solve this, we combined several existing methods to develop our own. This was a combination of the matching frontier developed by King et al (2017), energy balancing developed by Huling & Mak (2024), and subset selection by Cho et al (2013).
noahgreifer.bsky.social
Traditional and even advanced matching methods failed. PS matching, Mahalanobis distance matching, and cardinality matching all failed to achieve adequate balance or maintain sufficient sample size, even after trying many variations (calipers, etc.). We needed a more specialized approach.
noahgreifer.bsky.social
Here were our main obstacles:

1) Control units need to be paired with a treated unit to ascertain an initiation date
2) We had severe imbalance and lack of overlap on key confounders
3) Hypothesized effect sizes were small, so we needed to retain a large sample
noahgreifer.bsky.social
The paper estimates the effect of an intervention applied to some city "blocks" over the course of several years. Treated blocks have an initiation date, and control blocks don't. We want the effect of the treatment 90 days after initiation. We needed to adjust for many confounding variables.
noahgreifer.bsky.social
A new paper I worked on is out in Justice Quarterly! I won't speak on the substantive nature of the paper as I worked solely as the methodologist, but I developed a new matching method not otherwise described in the literature, and I want to tell you about it!

#statssky #casualsky
The Effects of a Place-Based Intervention on Resident Reporting of Crime and Service Needs: A Frontier Matching Approach
Prior research has found that reporting of crime incidents and service needs remains low in many U.S. cities. This study employs a matching strategy using observational data from a large public repo...
doi.org
Reposted by Noah Greifer
noahgreifer.bsky.social
I wrote {optweight} because I read Zubizarreta (2015) and felt like Jose was taking too long to make an R package that implemented such a cool method. Instead of doing what I was supposed to do at my SAS internship in 2018, I wrote {optweight}, and it became the basis of my dissertation.

7/
Estimating Balancing Weights for Continuous Treatments Using Constrained Optimization
cdr.lib.unc.edu
noahgreifer.bsky.social
{optweight} has so many new features, they won't fit in a thread, but check them out here: ngreifer.github.io/optweight/ne...

I also made a vignette that I think does a good job of demonstrating its capabilities. I'd love feedback on how to make the documentation and package more accessible.

6/
Using optweight to Estimate Stable Balancing Weights
ngreifer.github.io
noahgreifer.bsky.social
Unlike standard entropy balancing, you can choose how balanced you want each covariate to be, allowing you to prioritize balance on some in order to maximize the precision of your estimate. {optweight} supports investigation of the "dual variables" that allow one to fine-tune this choice.

4/
noahgreifer.bsky.social
The variability of the weights can be measured using one of a few options. When it's the variance, {optweight} implements the stable balancing weights of Zubizarreta (2015). When it's the negative entropy, it's the entropy balancing of Hainmueller (2012). Each choice has some theoretical benefit.

3/
noahgreifer.bsky.social
{optweight} solves the following optimization problem:

Minimize the variability of the weights such that the weighted standardized mean difference for each covariate is less than some user-supplied threshold.

That's it. No propensity score model, no hoping you achieve balance; it just works.

2/
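That optimization problem can be sketched with a generic constrained solver. A toy version in Python using scipy's SLSQP on one covariate ({optweight} uses a proper QP solver and handles many covariates, estimands, and per-covariate thresholds; all names here are made up):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data with built-in imbalance.
rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=n)             # one covariate
a = rng.integers(0, 2, size=n)     # treatment indicator
x[a == 1] += 0.8                   # induce imbalance
delta = 0.1                        # user-supplied SMD threshold
sd = x.std(ddof=1)

def weighted_smd(w):
    m1 = np.sum(w[a == 1] * x[a == 1]) / np.sum(w[a == 1])
    m0 = np.sum(w[a == 0] * x[a == 0]) / np.sum(w[a == 0])
    return (m1 - m0) / sd

res = minimize(
    lambda w: np.var(w),           # minimize the variability of the weights
    x0=np.ones(n),
    bounds=[(0, None)] * n,        # weights must be nonnegative
    constraints=[
        # |weighted SMD| <= delta, written as two smooth inequalities
        {"type": "ineq", "fun": lambda w: delta - weighted_smd(w)},
        {"type": "ineq", "fun": lambda w: delta + weighted_smd(w)},
        # keep each group's weights summing to its sample size
        {"type": "eq", "fun": lambda w: np.sum(w[a == 1]) - np.sum(a == 1)},
        {"type": "eq", "fun": lambda w: np.sum(w[a == 0]) - np.sum(a == 0)},
    ],
)
w = res.x
print(abs(weighted_smd(np.ones(n))))  # before weighting: well above delta
print(abs(weighted_smd(w)))           # after: at or below delta
```

The appeal is exactly as described in the post: balance is a hard constraint of the optimization, not something you hope a propensity score model delivers.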