Guide to Portfolio Projects for Data Analysts

Dec 10

Portfolio projects are becoming more and more relevant as the field of data analytics matures and becomes more saturated. The field is becoming increasingly difficult for employers to navigate due to the number of certifications available (not to mention the varying amount of effort needed to acquire them). Considering all of this, portfolio projects are a fantastic way to stand out from the crowd and prove your skills.

But where do you start? Is the best route to scrape a bunch of data yourself using a custom Python script, spend the time to clean it and do some feature engineering, and then build a standalone app that an employer can play with?

Definitely not. Let me explain why.

Portfolio projects have one of two distinct outcomes

Depending on where you are in your journey, a project should fully cover one of two use cases: learning or showing off your skills. While I’ve seen many people combine these two, I believe that does a disservice to both.

If it’s a learning project, you usually concentrate on one skill.

A great way to learn any skill is to combine structured learning, like a course, with a follow-up project to solidify the learning and expose any gaps in your knowledge. This is how I was taught math - a lecture and then a bunch of practice questions at the end. If I was confused then I could go back and spend more time learning that section. This has the additional effect of creating better connections between the knowledge and its application because you necessarily need to do both. Applying your knowledge makes it sink in better, increasing your retention.

Contrast this with a portfolio project you’re working on to try and land a job. In this scenario, you want to have a polished final product that you’re happy to show off. It should be brag-worthy just from looking at it.

Why shouldn’t these be the same project? Because a learning project should be messy and ugly. You should be trying things out and experimenting with different methods and ways of working with data. When you’ve learned how to do one specific thing, say building a bar chart, you don’t need to spend time making that look clean and pretty. I’ll go so far as to call that a waste of your time. Time that you could instead be spending on learning how to do another chart (as long as it isn’t a pie chart). A learning project that has 5 charts in it that are all functional and ugly is a great use of your time. That same project with only 1 beautiful chart is a waste of your time unless the skill you’re working on is making wonderful visualizations.

Projects to learn a skill

When you’re learning a skill, projects are your bread and butter. They force you to do everything yourself and they expose your knowledge gaps. They also put you in more of a real-world situation - you’re working with a dataset and need to transform it into your desired shape.

I strongly believe that learning projects should only cover 1-2 skills at a time, maximum. When you try to incorporate more skills, you do one of two things:

1. You stay at the surface level with each skill and fail to fully understand it and learn more deeply.

2. You do a good job of going deep on learning and it takes you 6 months to do a single project.

I built massive projects early in my learning journey and I wish I could go back and shake myself.

A project this large also spreads your learning thin. Once you have your data in the correct format, you likely won’t go back and touch data gathering and cleaning again, so there’s a good chance that you’ll forget a lot of what you worked on and learned in those early stages.

This follows similar logic to having ugly projects - making them beautiful is a skill. Unless you’re working on that skill, you can safely leave most of it out.

Projects to show off a skill

Similar to learning projects, projects that show off your skills should focus on only 1-2 skills at a time. If you have 1 mega project that shows off your prowess with SQL, Tableau, Python, web scraping, and so on, then explaining it will be harder. You won’t be able to go as deep into any one skill without extensive documentation. I’ve also found it harder to remember the details of how I built a project as it gets larger and more complex.

In contrast, projects that focus on only 1 skill are easy for me to remember. Even if they get a bit larger, like an all-encompassing dashboard, I can still keep the details relatively straight, and the few details I forget tend to be the less important ones anyway. Compare that with the time I completely forgot to mention that I had used SQL in a larger project.

I also believe that you should use the cleanest data that you can find for these projects unless you’re specifically trying to work on data cleaning, feature engineering, scraping, or anything else of that sort. Otherwise, you’ll spend a significant amount of time cleaning your data. This is a waste of time because an interviewer doesn’t care that you scraped and cleaned a bunch of data yourself if you’re trying to show them that you’re an expert in Tableau. Mentioning the scraping and cleaning would take attention away from your Tableau skills, which is the opposite of what you intend.

What if I want to scrape my own data though?

That’s awesome - go ahead and do it! Just be aware that it’s going to take much longer than you expect.

When I set out to scrape data for a project, I expected it to take me a month at most. I worked tirelessly on it and by the end of the month I was able to scrape some data, but then I realized that I needed more data for my analysis, sending me back to the scraping stage. On top of that, when I finally got all of my data scraped I had to go about cleaning it. Because this was my first time scraping data, I wasn’t thinking ahead and ended up doing my scraping in a way that was faster but meant messier data. I had loads of random brackets and other special characters in my data. Cleaning that up took me another few weeks alone. It ended up being much more involved than I expected and if I knew all of this going into the project then I wouldn’t have scraped my own data.
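To give a sense of what that kind of cleanup involves, here is a minimal Python sketch. It is an illustration only, not the code from that project: the sample values and the `clean_field` helper are hypothetical, and it assumes the scraped fields arrived wrapped in list-style brackets and quotes.

```python
import re


def clean_field(raw: str) -> str:
    """Strip stray brackets, quotes, and extra whitespace from one scraped field."""
    # Remove square brackets and quote characters left over from stringified lists.
    text = re.sub(r"[\[\]'\"]", "", raw)
    # Collapse runs of whitespace (including non-breaking spaces) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Hypothetical examples of messy scraped values.
messy = ["['Toronto, ON']", '["  $1,250\xa0/ month "]', "[' 3 bed ']"]
cleaned = [clean_field(value) for value in messy]
print(cleaned)
```

Note the trade-off: stripping every quote character would also mangle legitimate values such as a surname with an apostrophe, which is exactly the kind of edge case that makes cleaning take longer than planned.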

If you do decide to scrape your own data, set yourself limitations to avoid scope creep. Decide what the actual minimum data you need is and stick to that. I suggest setting a deadline if you want to use this to apply for jobs. You can thank me later.

Where should I showcase a portfolio?

In the place that is most easily accessed by everyone, especially non-technical people. The last thing you want is to put all of this effort into your wonderful projects and have recruiters and hiring managers get scared off because they’re intimidated by GitHub. Or worse, they try to figure out how to access your project only to get confused and frustrated - effectively throwing your application into the trash bin.

In my opinion, the best place to showcase a portfolio is on a personal blog or website. This allows you to have all the pieces in 1 place - your code, your final product/dashboard, and also a write-up of the project itself. As a general rule, any medium that can give you the flexibility to do all 3 things will work so use what you’re most comfortable with.

You also don’t need to go out and pay for your own domain. There are a number of free options - for example, GitHub Pages gives you a free github.io subdomain. Services like freenom.com have also offered domains with your name in them for free for a year, though availability changes, so check before you rely on one. It just likely won’t be a dot com.

Where do I find my data?

If this is your first project, go with a source that has clean, well-documented data. My current recommendation is government websites that are created specifically for this purpose. In Canada, I point folks to Statistics Canada (often called Stats Canada), the national statistical agency. All the data here can be downloaded as a CSV and worked on from there. You can certainly use a dataset from Kaggle, but unless you do something extra interesting you may find it harder to stand out from the crowd with one of these.

If you want to work with a more complex dataset, combine a few of these! One that I recently worked through with a student was demographic data combined with income and housing data. We used this to see if some areas of Canada were building faster or slower than the population changed, and we used the income data to see if that had any effect. All 3 datasets were very straightforward and boring alone, but combined they were super interesting and gave us enough data to play around.
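As a rough sketch of what combining datasets like these can look like, the Python below joins three small tables on a shared region column and compares housing growth with population growth. Everything in it is an assumption made for illustration: the column names, the region labels, and the numbers are invented, not real Statistics Canada figures. It uses only the standard library; in practice a tool such as pandas with `DataFrame.merge` would do the same join.

```python
import csv
import io

# Hypothetical, made-up numbers purely for illustration.
POPULATION_CSV = """region,pop_2016,pop_2021
Region A,100000,110000
Region B,50000,51000
"""
HOUSING_CSV = """region,dwellings_2016,dwellings_2021
Region A,40000,42000
Region B,20000,21000
"""
INCOME_CSV = """region,median_income
Region A,72000
Region B,65000
"""


def load_by_region(text):
    """Parse CSV text into a dict keyed by the shared 'region' column."""
    return {row["region"]: row for row in csv.DictReader(io.StringIO(text))}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (float(new) - float(old)) / float(old) * 100


population = load_by_region(POPULATION_CSV)
housing = load_by_region(HOUSING_CSV)
income = load_by_region(INCOME_CSV)

combined = []
# Keep only regions present in all three datasets (an inner join on region).
for region in sorted(population.keys() & housing.keys() & income.keys()):
    pop_growth = pct_change(population[region]["pop_2016"], population[region]["pop_2021"])
    home_growth = pct_change(housing[region]["dwellings_2016"], housing[region]["dwellings_2021"])
    combined.append({
        "region": region,
        "pop_growth_pct": round(pop_growth, 1),
        "housing_growth_pct": round(home_growth, 1),
        # Positive gap: housing grew faster than population; negative: slower.
        "gap_pct_points": round(home_growth - pop_growth, 1),
        "median_income": int(income[region]["median_income"]),
    })

for row in combined:
    print(row)
```

Once the tables share a key, the interesting question (is building keeping up with population, and does income relate to the gap?) becomes a single derived column you can chart or sort by.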

You can find other data to integrate in this same way. Weather data can be quite useful for a range of topics, including grocery sales, agricultural data, and even flight data. While many of these datasets would be somewhat boring on their own, joining them with a second one can bring a lot more intricacy!

If you get stumped trying to do this, send me a LinkedIn DM and I’ll give some suggestions for data to integrate and questions to try and answer.

How do I make this relevant to an actual job?

Treat it as if you’re doing this for a company! Identify what stakeholders you would have and what questions they would want to answer with this data. To be honest, I recommend working in the other direction: identify what questions you CAN answer with this data, then decide who the stakeholders would be and what actions they could take as a result of your work. With projects like this, you’re often limited by the data you have and some questions might be out of reach no matter how hard you try.

If this seems silly, try going through the below thought exercise:

Think about two different scenarios in a job interview.

Scenario 1 - You built a cool project with some interesting data. You’ve been able to show off that you can do some fancy things and it’s impressive.

Scenario 2 - You used boring data but at the end of the project you explained what actions the business can take and a few ways to measure their impact. You even went so far as to give an estimate of the impact and backed that up.

Which would get a recruiter or hiring manager more excited? Which of these two people would you want to hire?

This isn’t to say you’ll immediately get a job offer, but it does show that you truly understand what the role of a data analyst is. In the current, saturated market, that is likely to give you a leg up.

Bonus: How to enjoy working on projects

One of the constants that I’ve found when working on any sort of project is that following my curiosity brings more joy and a better outcome than trying to force it. In the end, looking for a dataset that you’ll be interested in brings about a much better outcome than just using any old one. Part of the reason is that as humans we get bored easily and when we’re bored we don’t try as hard or put as much effort in. Another part of this is more functional - working with data is hard without any context. When you go and learn a bit more about your data, or find a dataset in an area that you’re already excited about, you have additional knowledge to help you.

If you’re struggling to find your curiosity, try just focusing on one aspect of the dataset that could be interesting. For example, in the housing data project I talked about above, we were interested in where building was happening the most. We’re both around the age where buying a house is on our minds and we wanted to know if we could gain a small edge by understanding the relationship between the variables we looked at. This isn’t to say that we had much interest in the rest of the data, but searching for an answer to this one question made the rest of the data more enjoyable to work with. Finding areas of excitement makes the whole process better!

If you learned from this and feel that you could benefit from my advice, feel free to go check out my Coaching & Resources page!

Dylan Deppiesse