- Real data projects rarely begin with a clean, well-structured data set: data cleaning and preparation, sometimes called data wrangling, is commonly estimated to take up 60 to 80 per cent of the time spent on a data analysis project.
- A structured analytical workflow, moving from problem definition through data collection, cleaning, exploration, analysis, visualisation and communication, provides a reliable framework for tackling any data challenge.
- Exploratory data analysis (EDA) is the process of getting to know a new data set by examining its structure, distributions, outliers and potential data quality issues before applying more sophisticated analytical techniques.
- The framing of analytical findings for a business audience is as important as the quality of the analysis itself: the most rigorous analysis has no value if the people who need to act on it cannot understand or trust it.
- Building a portfolio of completed data projects, even from self-directed practice using publicly available data sets, is one of the most effective ways to demonstrate your capabilities to potential employers in the data field.
Alex: Hello and welcome back to The Study Podcast. Today we're closing out Unit 5 with a practical end-to-end walkthrough of working with data, from raw data through to insight. Sam, why is this kind of practical synthesis so valuable?
Sam: Because the individual techniques we've covered in previous lessons only really come alive when you see how they fit together in a real workflow. There's a significant gap between knowing what a scatter plot is and knowing when to use one in the context of a real analytical challenge. This lesson is about bridging that gap.
Alex: So let's say we have a data set. What's the very first thing you do?
Sam: Before you do anything analytical, you need to understand the data. What is it measuring, where did it come from, when was it collected, what does each field mean, are there known quality issues? This exploratory phase is where you get familiar enough with the data to ask sensible questions of it. It's tempting to skip this and dive straight into analysis, but that usually leads to time-wasting mistakes later.
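As a rough illustration of this first look at a data set, the sketch below profiles a small CSV extract using only Python's standard library: it counts rows, lists the fields and, for each field, reports missing values, distinct values and the most common value. The sample data, the `RAW_CSV` string and the `profile` helper are all hypothetical; on a real project you would read an actual file and might well use a dedicated library such as pandas instead.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a freshly received file.
RAW_CSV = """order_id,region,order_date,amount
1001,North,2023-01-05,120.50
1002,north,05/01/2023,89.99
1003,South,2023-01-07,
1003,South,2023-01-07,
1004,East,2023-01-09,-5000
"""


def profile(rows):
    """Summarise each field: missing values, distinct values, most common value."""
    fields = rows[0].keys() if rows else []
    summary = {}
    for field in fields:
        values = [row[field] for row in rows]
        summary[field] = {
            "missing": sum(1 for v in values if v.strip() == ""),
            "distinct": len(set(values)),
            "most_common": Counter(values).most_common(1)[0][0],
        }
    return summary


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
print(f"{len(rows)} rows, fields: {list(rows[0].keys())}")
for field, stats in profile(rows).items():
    print(field, stats)
```

Even on five rows, a summary like this prompts sensible questions before any analysis starts: why two `amount` values are blank, and why `order_id` 1003 appears more than once.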
Alex: And then the cleaning phase.
Sam: Data cleaning is unglamorous but essential. In a typical real-world data set you'll find missing values in some records, duplicates, values that are clearly errors, text fields with inconsistent capitalisation or spelling, dates in multiple formats. Each of these needs to be handled before you can trust your analysis. You need to decide, for each issue, whether to correct it, remove the affected records, impute a value or flag it as uncertain. These decisions need to be documented so they can be reviewed and replicated.
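A minimal sketch of the cleaning decisions described above, again in standard-library Python. It assumes a hypothetical extract with the issues Sam lists, and it makes one illustrative choice per issue: exact duplicates are removed, capitalisation is normalised, dates in two assumed formats are parsed (treating slash-separated dates as day/month/year), and missing or implausible amounts are kept but flagged rather than imputed. The names `RAW_CSV`, `DATE_FORMATS`, `parse_date` and `clean` are hypothetical. Every decision is appended to a log so it can be reviewed and replicated.

```python
import csv
import io
from datetime import date, datetime
from typing import Optional

# Hypothetical raw extract: inconsistent capitalisation, mixed date formats,
# an exact duplicate, a missing value and an implausible negative amount.
RAW_CSV = """order_id,region,order_date,amount
1001,North,2023-01-05,120.50
1002,north,05/01/2023,89.99
1003,South,2023-01-07,
1003,South,2023-01-07,
1004,East,2023-01-09,-5000
"""

# Assumption: slash-separated dates are day/month/year (UK convention).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    """Return cleaned records plus a log of every decision taken."""
    cleaned, log, seen = [], [], set()
    for row in rows:
        key = tuple(row.values())
        if key in seen:  # decision: drop exact duplicates
            log.append(f"order {row['order_id']}: exact duplicate removed")
            continue
        seen.add(key)
        amount_text = row["amount"].strip()
        record = {
            "order_id": row["order_id"],
            "region": row["region"].strip().title(),  # 'north' -> 'North'
            "order_date": parse_date(row["order_date"].strip()),
            "amount": float(amount_text) if amount_text else None,
            "flags": [],
        }
        # Decision: keep questionable records but flag them rather than guess.
        if record["order_date"] is None:
            record["flags"].append("unparseable date")
        if record["amount"] is None:
            record["flags"].append("missing amount")
        elif record["amount"] < 0:
            record["flags"].append("negative amount")
        for flag in record["flags"]:
            log.append(f"order {row['order_id']}: {flag}, kept and flagged")
        cleaned.append(record)
    return cleaned, log


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned, log = clean(rows)
for entry in log:
    print(entry)
```

Flagging rather than deleting is only one option; depending on the project you might instead impute a value or drop the record, but whichever you choose, recording it in the log is what makes the cleaning reviewable.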
Alex: Once you have clean data, what does the analytical phase look like?
Sam: It starts with exploratory data analysis: getting a feel for the distributions of key variables, looking for correlations, identifying outliers and starting to form hypotheses about what the data might tell you. Then you move into more focused analysis: applying the specific statistical or visualisation techniques that are most appropriate for the questions you're trying to answer. And then you iterate: the results of one analysis often prompt new questions that require different approaches.
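The exploratory step can be sketched with Python's `statistics` module: summarise a variable's distribution, flag outliers, and check a correlation. The spend and sales figures below are hypothetical, and the outlier rule used here (Tukey's fences at 1.5 times the interquartile range) is one common convention among several. The Pearson coefficient is computed by hand for transparency; Python 3.10 and later also provide `statistics.correlation`. The helper names `describe`, `iqr_outliers` and `pearson` are illustrative.

```python
import math
import statistics

# Hypothetical example data: weekly marketing spend and sales (both in £k).
spend = [10, 12, 15, 18, 20, 22, 25, 28, 30, 95]
sales = [105, 118, 140, 160, 178, 190, 212, 235, 250, 300]


def describe(values):
    """Basic distribution summary for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print("spend:", describe(spend))
outliers = iqr_outliers(spend)
print("spend outliers:", outliers)

r_all = pearson(spend, sales)
# Re-check the relationship with the flagged points set aside (for exploration only).
pairs = [(x, y) for x, y in zip(spend, sales) if x not in outliers]
r_clean = pearson([x for x, _ in pairs], [y for _, y in pairs])
print(f"correlation with outliers: {r_all:.3f}, without: {r_clean:.3f}")
```

Comparing the two correlation values shows how much a single extreme point can shift the picture. Setting the point aside here is a way of asking a question, not a decision to delete it; whether it is an error or a genuine event is exactly the kind of new question that drives the next iteration.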
Alex: And the communication phase at the end is often undervalued, isn't it?
Sam: Enormously undervalued. You can do brilliant analysis and have it completely ignored because you communicated it poorly. Good data communication starts with understanding your audience: what decisions do they need to make, what do they already know, what level of technical detail can they absorb? Then you build your narrative: what is the key insight, what evidence supports it, what are the caveats and limitations? The visualisations you choose should serve that narrative, not just display all the analysis you did.
Alex: Any advice on developing these skills practically?
Sam: Get your hands on real data sets and practise. There are excellent freely available data sets on platforms like Kaggle, the UK government's open data portal and the ONS. Set yourself a question, gather and clean the relevant data, analyse it and build a visualisation or report of your findings. Do this regularly and you'll develop the practical intuition that no amount of reading can substitute for.
Alex: Brilliant advice to close out Unit 5. We'll move into Unit 6 on cloud fundamentals in our next lesson. Thanks, Sam.