Jay_the_analyst
@brosjay.bsky.social
Data Scientist | Chelsea Fan | Sports Lover | Music
That's it.

Implement these changes and keep an eye out for new features and ways to improve the efficiency of your dashboards.

It benefits both you and your users.

If you enjoyed this, follow me:

@brosjay.bsky.social

for more data-related content.
September 29, 2023 at 6:09 PM
TL;DR:

Start with a clear purpose, consolidate tables, optimise relationships and DAX measures, and prioritise the user experience.

Regularly review and refine your model to keep it in top shape.
September 29, 2023 at 6:09 PM
7 - Prioritise user experience.

Make sure your report is intuitive for users.

Organise visuals logically, provide clear instructions, and use bookmarks or tooltips for guided navigation.

A user-friendly report reduces confusion.
September 29, 2023 at 6:08 PM
6 - Don't forget about the model post-deployment.

As your report evolves, revisit the data model.

Remove unused tables, columns, or measures.

Refactor DAX code for clarity and efficiency.

Keeping your model tidy is an ongoing process.
September 29, 2023 at 6:08 PM
5 - Streamline DAX measures.

Simplify DAX measures by breaking complex calculations into smaller, reusable parts.

Organise them into groups for easier management.

Keep measures concise and named appropriately for easy understanding.
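
The same principle works anywhere, not just in DAX. Here's a quick Python sketch of the idea (made-up column names): small base "measures" that a higher-level one reuses.

import pandas as pd

# Hypothetical sales data; "qty", "price" and "cost" are assumed columns.
sales = pd.DataFrame({"qty": [2, 5, 3], "price": [10.0, 4.0, 7.5], "cost": [6.0, 2.5, 5.0]})

# Base "measures": small, reusable building blocks.
def revenue(df):
    return (df["qty"] * df["price"]).sum()

def total_cost(df):
    return (df["qty"] * df["cost"]).sum()

# Higher-level measure composed from the base ones,
# instead of one long, duplicated expression.
def margin_pct(df):
    return (revenue(df) - total_cost(df)) / revenue(df)

print(f"Margin: {margin_pct(sales):.1%}")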
September 29, 2023 at 6:07 PM
4 - Optimise relationships.

↔️ Ensure relationships are simple and logical.

Avoid creating circular relationships or overly complex chains.

Stick to one-to-many relationships wherever possible (many-to-one is the same link seen from the other side) and avoid many-to-many.
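
To illustrate the shape you're aiming for (a sketch in pandas, not Power BI itself - table and column names are made up): split a flat export into a dimension and a fact table so the join between them is a clean one-to-many.

import pandas as pd

# Hypothetical flat export with customer details repeated on every order row.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Acme", "Acme", "Bolt"],
    "country":  ["UK", "UK", "DE"],
    "amount":   [120.0, 80.0, 45.0],
})

# Dimension table: one row per customer (the "one" side).
dim_customer = (orders[["customer", "country"]]
                .drop_duplicates()
                .reset_index(drop=True))
dim_customer["customer_id"] = dim_customer.index

# Fact table keeps only the key (the "many" side).
fact_orders = (orders.merge(dim_customer, on=["customer", "country"])
                     [["order_id", "customer_id", "amount"]])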
September 29, 2023 at 6:07 PM
3 - Use calculated columns sparingly.

While calculated columns can be useful, excessive use can slow down your model.

Reserve them for essential calculations; move the rest to measures or even source queries for better performance.
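
For example, a derived column can often be computed upstream in the source query rather than stored as a calculated column. A minimal sketch of that idea in pandas (column names are made up):

import pandas as pd

sales = pd.DataFrame({"qty": [2, 5], "unit_price": [10.0, 4.0]})

# Computed once at load time in the source query,
# instead of as a calculated column evaluated inside the model.
sales["line_total"] = sales["qty"] * sales["unit_price"]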
September 29, 2023 at 6:07 PM
2 - Limit the number of tables.

Avoid creating too many tables.

Consolidate similar data into fewer tables to simplify the model.

Fewer tables mean easier navigation and less overhead.
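
For instance, per-region exports that share a schema can be stacked into one table before loading. A sketch in pandas - the file names here are hypothetical:

import pandas as pd

# Hypothetical per-region exports with identical columns.
frames = []
for region in ["north", "south", "east"]:
    df = pd.read_csv(f"sales_{region}.csv")  # assumed file naming
    df["region"] = region                    # keep the origin as a column
    frames.append(df)

# One consolidated table instead of three near-duplicates.
sales = pd.concat(frames, ignore_index=True)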
September 29, 2023 at 6:07 PM
1 - Start with a clear purpose.

Define the key objectives of your report.

Knowing what you want to achieve helps you focus on the necessary data and relationships, avoiding unnecessary complexity.
September 29, 2023 at 6:06 PM
Thank you for your time.

If you got something from this thread, consider following me: @brosjay.bsky.social

I post data-related content daily.
September 27, 2023 at 4:28 PM
And there you have it - my data cleaning 101.

TL;DR:

- Handle duplicates and missing data.
- Investigate outliers.
- Fix naming conventions in values and columns.
- Validate and test - does it make sense/look right?
September 27, 2023 at 4:25 PM
Finally, validate and test.

This is the most crucial part.

Do some test aggregations and visualisations, and ask yourself:

Does the data make sense?
Does it prove/disprove your theory?
Is there anything that doesn't "feel" right when you look at it?
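
A few quick sanity checks make this concrete. A pandas sketch (the data and column names are made up):

import pandas as pd

df = pd.DataFrame({
    "genre":   ["fiction", "fiction", "non-fiction"],
    "revenue": [120.0, 80.0, 300.0],
    "qty":     [12, 8, 30],
})

# Test aggregation: group totals should add up to the overall total.
by_genre = df.groupby("genre")["revenue"].sum()
assert abs(by_genre.sum() - df["revenue"].sum()) < 1e-6

# Range checks: values that can't be negative shouldn't be.
assert (df["revenue"] >= 0).all() and (df["qty"] >= 0).all()

print(by_genre.sort_values(ascending=False))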
September 27, 2023 at 4:25 PM
Missing data is the next thing to look for.

Why?

Because missing data can cause complications when you perform calculations or grouping.

It's important to understand how you want to handle missing data.

You could remove them or fill them with a value.
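
Both options in pandas - a quick sketch with made-up columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({"price": [10.0, np.nan, 7.5], "genre": ["fiction", None, "fiction"]})

# Option 1: remove rows with missing values in key columns.
dropped = df.dropna(subset=["genre"])

# Option 2: fill the gaps with a sensible value instead.
filled = df.assign(
    price=df["price"].fillna(df["price"].median()),  # numeric: median
    genre=df["genre"].fillna("unknown"),             # categorical: placeholder
)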
September 27, 2023 at 4:24 PM
Investigate outliers.

Sometimes outliers give us a crucial bit of info to base our work on.

Sometimes they are truly an anomaly (something odd or different to what's expected).

But it's important to work out whether it's valid or not, and then deal with it appropriately.
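
One common way to flag candidates for investigation is the IQR rule. A sketch in pandas - the 1.5x multiplier is a convention, not a law, and the data is made up:

import pandas as pd

prices = pd.Series([9.5, 10.0, 10.5, 11.0, 10.2, 250.0])  # one suspicious value

q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag, don't delete: each outlier still needs a human judgement call.
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)  # 250.0 - valid luxury item or data-entry error?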
September 27, 2023 at 4:24 PM
Next, fix any naming conventions, typos, or incorrect capitalisation.

Consistency is key here.

You might have 0 and "Zero" - it's worth fixing and grouping these values correctly.

This applies to column names as well as values - make it easy for anyone to understand.
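
Both fixes in pandas - value mapping and column renaming (the data and names here are made up):

import pandas as pd

df = pd.DataFrame({"Stock Count ": [0, "Zero", 3], "GENRE": ["Fiction", "fiction", "FICTION"]})

# Tidy column names: trimmed, lower-case, snake_case.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Group equivalent values: "Zero" and 0 should be one thing.
df["stock_count"] = df["stock_count"].replace({"Zero": 0})

# Consistent capitalisation in text values.
df["genre"] = df["genre"].str.lower()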
September 27, 2023 at 4:24 PM
Check for duplicate or irrelevant values and remove them.

Duplicate values are most common when you combine datasets.

If you sell books and want to analyse how the fiction genre is performing, non-fiction data is irrelevant (for this project) and can be removed.
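
Using that book example, here's a pandas sketch of both steps (made-up data and column names):

import pandas as pd

df = pd.DataFrame({
    "title": ["Dune", "Dune", "Atomic Habits"],
    "genre": ["fiction", "fiction", "non-fiction"],
    "sales": [120, 120, 300],
})

# Drop exact duplicate rows (common after combining datasets).
df = df.drop_duplicates()

# Keep only the rows relevant to this project: fiction.
fiction = df[df["genre"] == "fiction"]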
September 27, 2023 at 4:24 PM
There is no "one size fits all" when it comes to data cleaning.

But it is important to understand the different components so you can spot scenarios to apply the most relevant techniques.

Let's get stuck in...
September 27, 2023 at 4:23 PM
Well, datasets might have incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data.

Especially when working with more than one dataset (joining or merging), there are many opportunities for data to be duplicated or mislabelled.
September 27, 2023 at 4:23 PM
First, why do we even need to clean data anyway?

How can you have "dirty data"?
September 27, 2023 at 4:22 PM