Ryan Morton
mortonanalytics.bsky.social
I deadlift, and I know stuff
I could do a short tutorial for a small group (?)
October 21, 2025 at 5:42 PM
I think that depends on the service and model, but I hear ya
September 15, 2025 at 10:10 PM
Do you agree? Should a data engineer, analyst, machine learning engineer, etc. consult with a subject matter expert before deciding what to do with outliers or missing data?

How do we all think Gen AI solutions handle this issue?
September 2, 2025 at 4:24 PM
I'm often asked by prospective clients how I handle outliers or missing data. Among other things, I always mention that I consult with subject matter experts before making a final decision on imputation, filtering, floor/ceiling, or anything else. It's not always received well.
September 2, 2025 at 4:24 PM
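For anyone unsure what the options in that post look like in practice, here is a minimal sketch in plain Python. The function names, the readings, and the 0 to 100 bounds are all made up for illustration; which option (if any) is right is exactly the question for the subject matter expert.

```python
from statistics import median


def impute_missing(values, fill=None):
    """Imputation: replace missing (None) values with `fill`.

    If no fill is given, the median of the observed values is used.
    """
    observed = [v for v in values if v is not None]
    if fill is None:
        fill = median(observed)
    return [fill if v is None else v for v in values]


def filter_range(values, low, high):
    """Filtering: drop missing values and anything outside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


def clamp(values, low, high):
    """Floor/ceiling: pull out-of-range values to the nearest bound.

    Missing values are left as None so they can be handled separately.
    """
    return [v if v is None else max(low, min(high, v)) for v in values]


# Hypothetical sensor readings: one missing value, one suspicious spike.
readings = [10.0, 12.0, None, 11.0, 250.0]

imputed = impute_missing(readings)
filtered = filter_range(readings, 0, 100)
clamped = clamp(readings, 0, 100)
```

The three results differ a lot on the same five numbers, which is why the decision should not be made without someone who knows whether 250 is a sensor fault or a real event.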
The challenge: getting the data models correct so everyone can use them. I prefer one column = one data element, one row = one observation in one big table or star schema.
August 28, 2025 at 5:57 PM
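A small sketch of the "one column = one data element, one row = one observation" idea, using plain Python and made-up county figures. The `to_long` helper and the field names are hypothetical; the point is only that year values stored as column headers become values in a single `year` column.

```python
def to_long(wide_rows, id_field, name_field, value_field):
    """Reshape wide rows into one row per (id, name) observation."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the id is carried onto every output row
            long_rows.append(
                {id_field: row[id_field], name_field: key, value_field: value}
            )
    return long_rows


# Made-up wide table: the years are column names, not data.
wide = [
    {"county": "A", "2022": 100, "2023": 110},
    {"county": "B", "2022": 200, "2023": 190},
]

tidy = to_long(wide, id_field="county", name_field="year", value_field="population")
```

Each row of `tidy` is now a single county-year observation, which is the shape that filters, joins, and charting tools generally expect.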
What do you wish you knew before you started?
What makes a scripting language good for pre-processing data before handing off the spreadsheet?
Did coworkers embrace the output as part of their own workflow?

I want to know!
August 27, 2025 at 9:58 PM
I think the best part is that the GitHub Actions could be integrated with the rest of the repo. No additional integrations or vendors.

So, 8/10 for GitHub Actions on a very small data ETL. If you've used GitHub Actions for ETL, what did you think?
August 25, 2025 at 9:23 PM
We then set up 4 scripts to run in a sequence using simple YAML and GitHub secrets for username/password, database URL, and other items. Each script printed its own log, but I didn't see any way to export the logs - may need to research.
August 25, 2025 at 9:23 PM
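On the script side, secrets defined in the repository typically reach a job as environment variables mapped in the workflow file. A minimal sketch of how each script might read them and fail fast when one is missing; the variable names `DB_URL`, `DB_USER`, and `DB_PASSWORD` are assumptions, not the names used in the project above.

```python
import os

# Hypothetical variable names; a workflow would map repository secrets to these.
REQUIRED = ("DB_URL", "DB_USER", "DB_PASSWORD")


def load_settings(env=None):
    """Return the required settings, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# Example with an explicit mapping instead of the real environment.
settings = load_settings(
    {"DB_URL": "postgres://example", "DB_USER": "etl", "DB_PASSWORD": "not-a-real-secret"}
)
```

Failing at the top of the first script keeps a misconfigured run from getting halfway through the sequence before it breaks.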
Forcing the enterprise subscription rather than just paying for the service is probably the biggest downside to me.
August 25, 2025 at 9:23 PM
To whitelist the IP address for our database, we had to pay for an enterprise account in order to set up a runner with a dedicated IP address. This ensured a basic level of firewall security for the database.
August 25, 2025 at 9:23 PM
GitHub Actions enables scheduled and triggered processes to run on GitHub servers (runners). I recently used it to schedule ETL processing for vessel location data that updated every hour for 6 vessels - very small data that did not need a large setup.
August 25, 2025 at 9:23 PM
4. Want fast, reliable data analysis pipelines? Inquire within!
August 22, 2025 at 5:35 PM
3. Lesson I learned from building out a report using software. Repeatable is more manageable. Writing functions that could be reused with parameters made modifications, additional requests from the client, and consistency 𝗳𝗮𝘀𝘁𝗲𝗿, 𝗰𝗵𝗲𝗮𝗽𝗲𝗿, 𝗮𝗻𝗱 𝗯𝗲𝘁𝘁𝗲𝗿.
August 22, 2025 at 5:35 PM
2. How to get a robust analysis from the US Census in under 40 hrs? Use repeatable processes for API data requests, data pre-processing, charts, maps, and tables.
August 22, 2025 at 5:35 PM
⚬ I used R's purrr library to programmatically build a 10-year data set in under an hour for 20 different concepts and over 100 variables.
⚬ Wrapped the report in a Quarto markdown document for repeatable document production.
August 22, 2025 at 5:35 PM
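The original work was done in R with purrr; the sketch below shows the same general pattern in Python, mapping one request function over every year and concept combination and stacking the results. `fake_fetch` is a hypothetical stand-in that returns made-up records and makes no network call, and the concept names are invented.

```python
from itertools import product


def build_dataset(years, concepts, fetch):
    """Call `fetch(year, concept)` for every combination and stack the rows."""
    rows = []
    for year, concept in product(years, concepts):
        for record in fetch(year, concept):
            # Tag each record with the parameters that produced it.
            rows.append({"year": year, "concept": concept, **record})
    return rows


def fake_fetch(year, concept):
    """Hypothetical stand-in for an API request; returns one made-up record."""
    return [{"variable": f"{concept}_001", "estimate": 0}]


dataset = build_dataset(range(2014, 2024), ["income", "age"], fake_fetch)
```

Because the year and concept lists are just parameters, extending the series or adding a concept is a one-line change rather than another round of copy and paste.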
1. How did I build a complete demographic and economic time series from the US Census American Community Survey in under 40 hrs?
⚬ I used the tidycensus R library as a wrapper for the Census API.
August 22, 2025 at 5:35 PM