Lightnews — Scholar-powered news

Ryan Morton

@mortonanalytics.bsky.social

I could do a short tutorial for a small group (?)

October 21, 2025 at 5:42 PM

Ryan Morton

@mortonanalytics.bsky.social

I think that depends on the service and model, but I hear ya

September 15, 2025 at 10:10 PM

Ryan Morton

@mortonanalytics.bsky.social

Do you agree? Should a data engineer, analyst, machine learning engineer, etc. consult with a subject matter expert before deciding what to do with outliers or missing data?

How do we all think Gen AI solutions handle this issue?

September 2, 2025 at 4:24 PM

Ryan Morton

@mortonanalytics.bsky.social

I'm often asked by prospective clients how I handle outlier or missing data. Among other things, I always mention that I consult with subject matter experts prior to a final decision on imputation, filtering, floor/ceiling, or anything else. It's not always received well.

September 2, 2025 at 4:24 PM
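
To make the options named in the post concrete (imputation, filtering, floor/ceiling), here is a minimal Python sketch. The function names, the sample readings, and the 0 to 50 bounds are all assumptions for illustration; which option and which thresholds are appropriate is exactly the question a subject matter expert would weigh in on.

```python
from statistics import median


def impute_median(values):
    """Imputation: replace missing readings (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_missing(values):
    """Filtering: keep only the readings that are present."""
    return [v for v in values if v is not None]


def clip(values, floor, ceiling):
    """Floor/ceiling: pull values outside [floor, ceiling] back to the bounds.

    Missing readings (None) are left untouched.
    """
    return [v if v is None else max(floor, min(ceiling, v)) for v in values]


# Hypothetical sensor readings: one missing value and one suspicious spike.
readings = [12.0, None, 11.0, 250.0, 13.0]

imputed = impute_median(readings)
filtered = drop_missing(readings)
clipped = clip(readings, floor=0.0, ceiling=50.0)  # bounds are placeholders
```

Each function gives a different answer for the same input, which is why the choice is worth a conversation rather than a default.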

Ryan Morton

@mortonanalytics.bsky.social

The challenge: getting the data models correct so everyone can use them. I prefer one column = one data element, one row = one observation in one big table or star schema.

August 28, 2025 at 5:57 PM
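
As an illustration of "one column = one data element, one row = one observation", the sketch below reshapes a table with one column per year into one row per observation. It uses plain Python; the `to_long` helper, the column names, and the numbers are made up for the example.

```python
def to_long(rows, id_column, value_name):
    """Turn one-column-per-year rows into one row per (id, year) observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_column: row[id_column], "year": int(column), value_name: value}
            )
    return long_rows


# Hypothetical wide table: the year is hidden in the column names.
wide = [
    {"county": "A", "2022": 100, "2023": 110},
    {"county": "B", "2022": 200, "2023": 190},
]

long_table = to_long(wide, id_column="county", value_name="population")
```

After the reshape, "year" is a data element with its own column, so filtering or joining on it no longer requires knowing the column layout.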

Ryan Morton

@mortonanalytics.bsky.social

What do you wish you knew before you started?
What makes a scripting language good for pre-processing data before handing off the spreadsheet?
Did coworkers embrace the output as part of their own workflow?

I want to know!

August 27, 2025 at 9:58 PM

Ryan Morton

@mortonanalytics.bsky.social

I think the best part is that the GitHub Actions could be integrated with the rest of the repo. No additional integrations or vendors.

So, 8/10 for GitHub Actions on a very small data ETL. If you've used GitHub Actions for ETL, what did you think?

August 25, 2025 at 9:23 PM

Ryan Morton

@mortonanalytics.bsky.social

We then set up 4 scripts to run in a sequence using simple YAML and GitHub secrets for username/password, database URL, and other items. Each script printed its own log, but I didn't see any way to export the logs - may need to research.

August 25, 2025 at 9:23 PM
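
The pattern in the post above (several scripts run in order, settings supplied from outside, one log per step) can be sketched in Python as follows. This is not the original workflow: the step names and the `run_pipeline` helper are hypothetical, and the sketch assumes the workflow passes secrets to the script as environment variables. Writing the log to a file is one possible way to keep it after the run, since a file on the runner can be collected by a later workflow step.

```python
import os
import tempfile
from datetime import datetime, timezone


def run_pipeline(steps, log_path, required_env=()):
    """Run (name, function) steps in order, appending one log line per step."""
    # Fail early if a setting the steps rely on was not provided.
    missing = [name for name in required_env if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))

    completed = []
    with open(log_path, "a", encoding="utf-8") as log:
        for name, step in steps:
            stamp = datetime.now(timezone.utc).isoformat()
            try:
                step()
            except Exception as error:
                # Record the failure, then stop so later steps do not run.
                log.write(f"{stamp} {name} FAILED {error}\n")
                raise
            log.write(f"{stamp} {name} ok\n")
            completed.append(name)
    return completed


# Four placeholder steps standing in for the four scripts.
calls = []
steps = [
    ("extract", lambda: calls.append("extract")),
    ("transform", lambda: calls.append("transform")),
    ("load", lambda: calls.append("load")),
    ("report", lambda: calls.append("report")),
]

log_file = os.path.join(tempfile.mkdtemp(), "etl.log")
done = run_pipeline(steps, log_file)
```

In a real run, `required_env` would list whatever names the workflow uses for the database URL and credentials, for example `required_env=("DB_URL", "DB_USER", "DB_PASSWORD")`; those names are assumptions here.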

Ryan Morton

@mortonanalytics.bsky.social

Forcing the enterprise subscription rather than just paying for the service is probably the biggest downside to me.

August 25, 2025 at 9:23 PM

Ryan Morton

@mortonanalytics.bsky.social

To whitelist the IP address for our database, we had to pay for an enterprise account in order to set up a runner with a dedicated IP address. This ensured a basic level of firewall security for the database.

August 25, 2025 at 9:23 PM

Ryan Morton

@mortonanalytics.bsky.social

GitHub Actions enable scheduled and triggered processes to run on GitHub servers (runners). I recently used this to schedule ETL processing for vessel location data that updated every hour for 6 vessels - very small data that did not need a large setup.

August 25, 2025 at 9:23 PM

Ryan Morton

@mortonanalytics.bsky.social

4. Want fast, reliable data analysis pipelines? Inquire within!

August 22, 2025 at 5:35 PM

Ryan Morton

@mortonanalytics.bsky.social

3. Lesson I learned from building out a report using software: repeatable is more manageable. Writing functions that could be reused with parameters made modifications, additional requests from the client, and consistency 𝗳𝗮𝘀𝘁𝗲𝗿, 𝗰𝗵𝗲𝗮𝗽𝗲𝗿, 𝗮𝗻𝗱 𝗯𝗲𝘁𝘁𝗲𝗿.

August 22, 2025 at 5:35 PM

Ryan Morton

@mortonanalytics.bsky.social

2. How to get a robust analysis from the US Census in under 40 hrs? Use repeatable processes for API data requests, data pre-processing, charts, maps, and tables.

August 22, 2025 at 5:35 PM

Ryan Morton

@mortonanalytics.bsky.social

⚬ I used R's purrr library to programmatically build a 10 year data set in under an hour for 20 different concepts and over 100 variables.
⚬ Wrapped the report in a Quarto markdown document for repeatable document production.

August 22, 2025 at 5:35 PM
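
The post above describes mapping one request function over many years and variables with R's purrr. The same idea can be sketched in Python, the language used for the sketches here. This is an illustration only: `build_dataset` and `fake_fetch` are hypothetical, `fake_fetch` stands in for a real API request and returns made-up placeholder rows, and the year range and variable names are assumptions, not the ones used in the original analysis.

```python
from itertools import product


def build_dataset(fetch, years, variables):
    """Call fetch(year, variable) for every combination and stack the rows."""
    rows = []
    for year, variable in product(years, variables):
        for record in fetch(year, variable):
            # Tag each returned row with the parameters that produced it.
            rows.append({"year": year, "variable": variable, **record})
    return rows


def fake_fetch(year, variable):
    """Stand-in for a real API request; returns a made-up placeholder row."""
    return [{"geography": "example", "estimate": 0}]


years = range(2014, 2024)          # ten placeholder years
variables = ["var_a", "var_b"]     # placeholder variable names

dataset = build_dataset(fake_fetch, years, variables)
```

Because the request logic lives in one parameterized function, adding a year or a variable means changing an input list rather than copying code.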

Ryan Morton

@mortonanalytics.bsky.social

1. How did I achieve a complete demographic and economic time series from the US Census American Community Survey in under 40 hrs?
⚬ I used the tidycensus R library as a wrapper for the Census API.

August 22, 2025 at 5:35 PM
