Crystal Lewis
@cghlewis.bsky.social
5.5K followers 1.7K following 1.3K posts
Research Data Management Consultant | cghlewis.com
Co-organizer @r-ladies-stl.bsky.social
Co-organizer POWER Data Management Hub | https://osf.io/ap3tk/
Author of DMLSER: https://datamgmtinedresearch.com/
RDM Weekly: https://rdmweekly.substack.com/
Pinned
cghlewis.bsky.social
Re-introduction for new followers!
Hello! 👋
I am currently a freelance research data management consultant. I also co-organize R-Ladies St. Louis. I mostly post about data management and #rstats data wrangling tips. I also recently wrote this book.
datamgmtinedresearch.com
Welcome | Data Management in Large-Scale Education Research
This is the in-progress version of Data Management in Large-Scale Education Research.
datamgmtinedresearch.com
Reposted by Crystal Lewis
simonpcouch.com
ICYMI, @sara-altman.bsky.social and I have been writing a biweekly newsletter on AI and open source data science on the @posit.co blog!

A bit about how that came to be on my #rstats blog: www.simonpcouch.com/blog/2025-10...
A screenshot of the Posit Blog homepage showing statistics (936+ posts, 22+ categories, 386+ tags) and two featured blog post cards. Both posts are AI Newsletter roundups from September 2025 by Sara Altman and myself, featuring AI-related R package hexes in their hero images.
Reposted by Crystal Lewis
researchcultureuol.bsky.social
Day 2 of #Hiddenref2025 and a reminder that people are as important as outputs.
Reposted by Crystal Lewis
elipousson.bsky.social
Before I code something from scratch, does anyone have a #rstats function like setdiff but it works with named lists and/or data frame rows? Optionally dropping duplicate values and keeping the differences from the first list?
Reposted by Crystal Lewis
noamross.net
A month or two ago someone posted a link to their really amazing set of LLM system instructions for writing #rstats code with good tidy/NSE patterns. (They were also good for humans!) Does anyone recall who or where that was?
cghlewis.bsky.social
No one prepares you for the fact that when you work for yourself, you no longer have IT support available to you. But today, thanks to YouTube, I became my own IT person. 😂🙏
cghlewis.bsky.social
All blessing, no curse. :)
Reposted by Crystal Lewis
kbriney.bsky.social
Today on my blog: some thoughts on the best data management strategies for collaboration: dataabinitio.com?p=1204

What's your best data tip for collaborative research?
Data Management for Collaborations » Data Ab Initio
dataabinitio.com
cghlewis.bsky.social
Thanks for checking the newsletter out! Ooof, I don't think I can choose a favorite because they all are very interesting and helpful for different reasons. But I think the AI-generated participant data article is one that probably piques a lot of interest right now.
cghlewis.bsky.social
Issue 16 of RDM Weekly is out! 📬

It includes:
- Data is Not Available Upon Request @ianhussey.mmmdata.io
- AI Generated Participants in Social Science @jamiecummins.bsky.social @science.org
- Why’s it Hard to Teach Data Cleaning? @randyau.com
and more!

rdmweekly.substack.com/p/rdm-weekly...
RDM Weekly - Issue 016
A weekly roundup of Research Data Management resources.
rdmweekly.substack.com
Reposted by Crystal Lewis
weedenkim.bsky.social
Just filled out a web survey with a bunch of Likert-type items. The response categories were in the same order on each item, but not the order you'd expect:

fairly important
important
unimportant
very important

Pretty sure "alphabetize response categories" is not best practice in survey design.
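A minimal sketch of the problem described in the post above, in Python. The four labels come from the post; the intended order (least to most important) and the helper name order_responses are assumptions made for illustration, not anything the survey or the post specifies.

```python
# Labels are taken from the post; the intended scale order is an assumption.
INTENDED_ORDER = ["unimportant", "fairly important", "important", "very important"]


def order_responses(labels, intended=INTENDED_ORDER):
    """Sort labels by their position in the intended scale, not alphabetically."""
    rank = {label: position for position, label in enumerate(intended)}
    return sorted(labels, key=rank.__getitem__)


# A plain alphabetical sort reproduces the order seen in the survey.
alphabetical = sorted(INTENDED_ORDER)

# Sorting by explicit rank restores the assumed intended order.
restored = order_responses(alphabetical)
```

Storing the scale order explicitly, rather than relying on the default sort of the label text, is what keeps the categories in a meaningful sequence.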
cghlewis.bsky.social
When you've been working with someone for a while and you start to see the little ways that you are impacting how they work with data. 🤩

The name of a file someone just shared with me
"feedback_survey_raw_2025-08-15"
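A file name like the one above can be generated rather than typed by hand. This is a minimal Python sketch; the split into topic, stage, and date, and the helper name build_filename, are assumptions read off that single example, not a convention stated in the post.

```python
from datetime import date


def build_filename(topic, stage, on):
    """Join lowercase, underscore-separated parts and append an ISO 8601 date."""
    # Hypothetical pattern: <topic>_<stage>_<YYYY-MM-DD>
    parts = [topic.strip().lower().replace(" ", "_"), stage.strip().lower()]
    return "_".join(parts) + "_" + on.isoformat()


name = build_filename("feedback survey", "raw", date(2025, 8, 15))
```

Here name reproduces the shared file name, and because the date is written year first, files named this way sort chronologically in a folder listing.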
Reposted by Crystal Lewis
devezer.bsky.social
It's so deflating to lose an irreplaceable staff member. It's worse when you lose them to another unit on campus. I view that as a clear administrative failure and so should the admin. Academic staff is the glue that holds everything together yet they're so routinely underpaid and underappreciated.
cghlewis.bsky.social
Oh no! 😅 I'm sorry, John!
cghlewis.bsky.social
That is definitely a way to look on the bright side!
cghlewis.bsky.social
Does it mean you're doing too much when you get the late start date wrong and you get your kiddo to school 1.5 hours late? 🤦
Reposted by Crystal Lewis
bakerdphd.bsky.social
"Deloitte Australia will issue a partial refund to the federal government after admitting that artificial intelligence had been used in the creation of a $440,000 report littered with errors including three nonexistent academic references and a made-up quote from a Federal Court judgement."
cghlewis.bsky.social
You know you're watching something from the 90s when you hear the term "The Net".
cghlewis.bsky.social
Prioritize documentation that has the biggest ROI for you, integrate documentation into your project workflow (assigning team members as responsible for it and setting aside times to update it), and also automate what you can (for instance, versioning).
cghlewis.bsky.social
I think teams know it takes time and they struggle to keep up with it. Also, some teams are just unsure how to get started with this type of documentation.
cghlewis.bsky.social
The questions you ask are dependent on the data and the issues you run into. If you don't want to slow down a workflow, make sure you obtain all the documentation necessary to allow you to understand data lineage. Otherwise, be prepared to start asking those questions. :)
cghlewis.bsky.social
If you want to be a good data manager, you have to get really comfortable with asking a lot of questions. When something is unclear or doesn't seem right, you can't settle or make assumptions. That's how you end up with bad data. Stay curious.