Taking you behind the scenes of Portfolio creation 🤸‍♀️

Portfolios for Data Science

You get a portfolio, and you get a portfolio. Everybody gets a portfolio!

That’s what I imagine when I picture myself building portfolios with amazing women who have been working hard to learn new data skills but are unsure how to create a portfolio.

So what better way to start than by sharing a Data Science portfolio 101 (a mini introduction) and how I would tackle creating a portfolio to get a job or internship in Data Science or AI?

I think it’s a lot of fun to share with you the ups and downs of creating a portfolio from scratch.

First things first: what is a Data Science portfolio?

I know, I know, it may be obvious, but sometimes, starting with the most essential piece of information is the best path to understanding the end result and the required steps to get there.

So, a DS portfolio is a collection of all your shiny skills and the achievements you feel most proud of.

It is the one place for anyone who wants to get to know you better (professionally speaking).

If someone in the DS realm asks you about your skills and the tools you master, then just glancing at your portfolio should answer their questions.

And that leads me to the next question.

What should be included in a portfolio?

A portfolio could include all your data projects, but realistically, almost no one will take hours to review an extensive list of projects to understand whether your skills would be valuable for the company or if you’re the right person for the job.

So, a portfolio should include the projects you’re the most proud of, the projects that highlight your skills in the best possible way.

What is a good project for a Portfolio?

Thousands (if not millions) of projects could fit in the “good portfolio project” category.

Almost any project has the potential to be a good addition to your portfolio. What separates a project that’s worth adding from one that isn’t is the story, the algorithms, the data source, the plots, and how interesting the results are.

Did I explore the data and extract the best possible insights, or did I make a superficial exploration that led to halfway conclusions?

Did I make beautiful, self-explanatory graphs or put together 20 graphs that almost no one would understand unless I stood there to explain them one by one?

Short, clear results that can be understood at a glance are a must in today’s fast-paced world.

As I wrote in a previous post (you can read it in full here):

You can take a simple data set and create something remarkable.

Looking for the golden egg (in terms of datasets) will only make us procrastinate and add unnecessary pressure to our project. In other words, an extensive dataset alone won’t create an outstanding portfolio.

Have you read a beautiful poem, a short description, or a fascinating explanation of something? What do they have in common?

They are easy to read and understand and flow seamlessly to the end result without you even noticing it.

Creating something simple involves a deep understanding, and that’s why when we lack clarity, we usually end up overcomplicating things or adding unnecessary layers. (I’m guilty of this, and you can read more about it here.)

What is currently missing in the DS portfolios sphere?

The most important missing piece, both for getting into DS and for creating an outstanding portfolio, is something I wrote about some months ago (read it in full here):

“I learned that we need programs that not only teach technical skills but also boost our self-confidence and teach us to speak up for ourselves and defend our points of view.

What is missing is learning how to be so confident that it’s obvious we should apply to that interesting job, EVEN when we don’t meet ALL the requirements.

What is missing is learning how to act and feel confident even AFTER learning all the technical skills.

What is missing is to be heard. What is missing is to embrace our emotions and remember that we are not just logical beings but also beings of heart, love, and compassion.

And I know this is, and will be, my most valuable contribution to the world: Empowering women with technical skills and self-confidence to get into DS.”

Now let’s look behind the scenes. For the sake of clarity, let’s imagine I’m looking for a job in Data Science.

What is my first step in creating a portfolio?

Creating a portfolio is the stepping stone to getting into Data Science jobs or internships.

And the first thing for me is to be clear about the industry (or industries) in which I would love to get a job.

That is because the topic and skills in my portfolio should ideally align with the industry in which I’m planning to get a job.

Let’s say I’m planning to get a job in an industry focused on forecasting. Then, creating a project that includes ML skills would make a lot of sense.
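To make this a bit more concrete, here is a minimal sketch of the core of a forecasting project, assuming a single monthly time series. The series is synthetic (trend plus yearly seasonality plus noise), and the 12 lag features, the plain linear regression solved with NumPy’s `np.linalg.lstsq`, and the 24-month hold-out are illustrative assumptions rather than a recipe. What matters is the chronological train/test split and the comparison against a simple seasonal-naive baseline.

```python
import numpy as np


def make_lag_features(series: np.ndarray, n_lags: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (X, y): row j of X holds the n_lags values that precede y[j]."""
    X = np.column_stack([series[i : len(series) - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic monthly data (hypothetical): upward trend + yearly cycle + noise.
rng = np.random.default_rng(seed=0)
t = np.arange(120)
series = 10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=t.size)

N_LAGS = 12  # one year of history per prediction
N_TEST = 24  # hold out the last two years

X, y = make_lag_features(series, N_LAGS)
# Chronological split: shuffling a time series would let the model "see the future".
X_train, X_test = X[:-N_TEST], X[-N_TEST:]
y_train, y_test = y[:-N_TEST], y[-N_TEST:]

# Baseline: predict the value from the same month one year earlier (column 0).
baseline_pred = X_test[:, 0]

# Linear regression with an intercept column, solved by least squares.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
model_pred = np.column_stack([np.ones(len(X_test)), X_test]) @ coef

baseline_mae = mean_absolute_error(y_test, baseline_pred)
model_mae = mean_absolute_error(y_test, model_pred)
print(f"Seasonal-naive MAE: {baseline_mae:.2f}")
print(f"Linear model MAE:   {model_mae:.2f}")
```

A forecasting model only becomes interesting for a portfolio when it beats a naive baseline on data it has never seen, so showing both numbers side by side tells the story at a glance.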

This first step helps me define the skills, tools, and libraries I should use and gives me an overall idea of the projects I should work on.

This step aims to get to something like: I really want to get a job working with Natural Language Processing, and my desired industry is healthcare. (I’m assuming I know NLP and have some experience with the healthcare industry).

Second step: the project idea and datasets

Now comes a fun moment of exploring and evaluating the potential of my project ideas. I would explore datasets online and see if any spark my curiosity. The ideal scenario here is to work with a dataset that makes me feel excited and interested in the insights I can get.

I see it as a treasure hunt or a puzzle. We start with a lot of data and, step by step, organize it so we can clearly see what is “hidden.”

I also consider information that I can scrape. This approach has the added benefit that, because I scraped the data myself, there probably aren’t many other projects using my specific dataset (although there might be).
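Before scraping anything, it is worth checking what a site allows. Here is a minimal, offline sketch using only Python’s standard library: `urllib.robotparser.RobotFileParser` reads robots.txt rules, and a small `html.parser.HTMLParser` subclass collects the text inside `<h2>` tags. The robots.txt rules, the user-agent name `portfolio-bot`, the example.com URLs, and the sample HTML are all hypothetical. A real project would download the page (for example with `urllib.request`), also respect the site’s terms of service, and pause between requests.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice you would fetch the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page content standing in for a downloaded HTML page.
SAMPLE_HTML = """
<html><body>
  <h2>First article</h2><p>Some text.</p>
  <h2>Second article</h2><p>More text.</p>
</body></html>
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "portfolio-bot") -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


class HeadingCollector(HTMLParser):
    """Collect the text inside every <h2> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._inside_h2 = False
        self.headings: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._inside_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside_h2 = False

    def handle_data(self, data):
        if self._inside_h2:
            self.headings[-1] += data


print(is_allowed(ROBOTS_TXT, "https://example.com/articles/1"))
print(is_allowed(ROBOTS_TXT, "https://example.com/private/data"))

collector = HeadingCollector()
collector.feed(SAMPLE_HTML)
print([heading.strip() for heading in collector.headings])
```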

Also, in this step, I consider how much time the project would take and check whether I can invest that time to finish it. This last point is very important because, speaking from experience, there is nothing worse for me than starting a project, investing several hours, and then leaving it unfinished.

At the end of this step, I’m looking for something like a dataset that sounds very interesting and intriguing. Of course, I need the data in text form because I want to use NLP. Knowing the data roughly also helps me define the possible insights and plots I can create. But keep in mind that at this stage, it is still a rough idea.
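Once a candidate text dataset is on the table, a quick profile helps turn that rough idea into something more concrete. This minimal sketch uses only the standard library to count the documents, compute their average length in tokens, and list the most frequent words. The three sample notes, the regex tokenizer, and the tiny stopword list are illustrative assumptions; a real NLP project would likely use a proper tokenizer and a fuller stopword list.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (illustrative only).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def quick_text_profile(docs: list[str], top_n: int = 5) -> dict:
    """Summarize a list of documents: count, average length, frequent words."""
    token_lists = [tokenize(doc) for doc in docs]
    lengths = [len(tokens) for tokens in token_lists]
    word_counts = Counter(
        token for tokens in token_lists for token in tokens if token not in STOPWORDS
    )
    return {
        "n_docs": len(docs),
        "avg_tokens": sum(lengths) / len(lengths) if lengths else 0.0,
        "top_words": word_counts.most_common(top_n),
    }


# Hypothetical example documents, e.g. short clinical-style notes.
notes = [
    "The patient reported mild pain.",
    "Pain improved with rest.",
    "No pain and no fever.",
]
profile = quick_text_profile(notes, top_n=3)
print(profile)
```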

I could add many more details to this step. Let me know if you want to hear more about choosing or scraping datasets and planning your projects.

Third step: exploring the data and extracting insights

This step will take the majority of my time.

First, cleaning the data and handling missing values.

Then, exploring the possible graphs and the story the data unveils.

This will take time, but it will be worth it!
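For the cleaning part, here is a minimal pandas sketch, assuming a tabular dataset already loaded into a DataFrame. The helper names, the 50% threshold for dropping mostly-empty columns, and the median / `"unknown"` imputation are illustrative assumptions; the right strategy always depends on why the data is missing in the first place.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Share of missing values per column, worst first."""
    return df.isna().mean().sort_values(ascending=False)


def basic_clean(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop duplicate rows and mostly-empty columns, then fill remaining gaps."""
    df = df.drop_duplicates()
    keep = df.isna().mean() <= max_missing
    df = df.loc[:, keep].copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # The median is robust to outliers; one simple choice among many.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Make missing categories explicit instead of silently dropping rows.
            df[col] = df[col].fillna("unknown")
    return df


# Hypothetical toy data with gaps.
raw = pd.DataFrame(
    {
        "age": [34.0, np.nan, 51.0, 51.0],
        "city": ["Lima", None, "Quito", "Quito"],
        "score": [np.nan, np.nan, np.nan, 7.0],
    }
)
print(missing_report(raw))
clean = basic_clean(raw)
print(clean)
```

Looking at the missing-value report before deciding how to clean keeps those decisions visible, which also makes them easy to explain in the final write-up.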

In this step, I focus on applying as much of my knowledge and skills as I can. This is also the moment when keeping up with new Python libraries becomes a valuable asset.

For example, I recently discovered a Python library called Anywidget that simplifies creating and publishing custom Jupyter Widgets.

So, I started to explore the many options available and got particularly excited to try the statistical visualization library called Vega-Altair:

Source: Vega-Altair https://github.com/vega/altair

Isn’t that amazing?

Fourth step: Presenting the results

The last step is to present the results cohesively. At this stage, I may have many graphs, many results, and some insights. But I must make sure that my results tell a story that is understandable and makes sense.

So, I start with a draft of my results. I select plots that are very clear and speak for themselves. If a plot requires a long explanation to be understood, I don’t use it. I focus on finding a way to share the insights without a long explanation, because nowadays almost no one will read a project with many pages of text.
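Here is one way to apply that rule of thumb, as a minimal matplotlib sketch: the title states the takeaway, the axis label carries its unit, and color highlights the one bar the reader should look at. The helper name `takeaway_bar_chart`, the labels, and the numbers are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, e.g. when running as a script
import matplotlib.pyplot as plt


def takeaway_bar_chart(labels, values, highlight, takeaway, ylabel):
    """Bar chart whose title states the insight and whose color points at it."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    colors = ["tab:orange" if label == highlight else "lightgray" for label in labels]
    ax.bar(labels, values, color=colors)
    ax.set_title(takeaway, loc="left")  # the insight itself, not a generic label
    ax.set_ylabel(ylabel)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove non-essential lines
    fig.tight_layout()
    return fig, ax


# Hypothetical numbers, for illustration only.
fig, ax = takeaway_bar_chart(
    labels=["North", "South", "East"],
    values=[120, 310, 140],
    highlight="South",
    takeaway="South sells more than North and East combined",
    ylabel="Units sold (thousands)",
)
# fig.savefig("takeaway.png", dpi=150) would export it for a report or README.
```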

Final thoughts

I hope this step-by-step process helps clear up your doubts about creating a portfolio.

And please keep in mind that I’m sharing a simplification of the process. But no worries: let me know in the comments if you would like more details on any of these steps, and I’ll do my best to share more in a future post.

Thanks for reading! I’d love to hear your thoughts. Also, do you want to see more behind-the-scenes content?

4 thoughts on “Taking you behind the scenes of Portfolio creation 🤸‍♀️”

  1. Hello Lina!
    I am currently reading through all your posts as I am interested in switching careers to data science. I am very much still in the beginning stages of developing different technical skills, but it is so exciting already. Your texts are all packed with great insights into the field, but this post in particular is incredibly helpful! So thank you for making data science a bit more accessible to all the women who wish to embark on this journey!
    Nevertheless, I was wondering whether you could expand on step 2 on how to find datasets or scrape them. I have a few datasets I have worked with—mostly regarding the current state of European democracies—but once I step outside of what I already know, I don’t even know where to begin looking. I am also a bit lost on how to best evaluate whether a dataset will be useful for projects for my portfolio. Any pointers you could give us?
    Again, thank you for all your work!

    1. Hello Area!
      Thanks a lot for your kind words! It means a lot you’re finding the posts helpful for switching careers.
      And even though you are just in the beginning stages of switching to DS, you’re definitely on the right track feeling excited already 🙂

      Regarding your question about step 2, starting with datasets on familiar topics is a good idea so you can leverage some of your skills and knowledge from other areas. For example, suppose that from your previous studies you already know about European democracies. In that case, starting with that topic makes a lot of sense because you can think of information that would be insightful or interesting in that area. Basically, you already know the interesting questions to ask. Use the advantage you have in areas of knowledge that aren’t related to data science.

      Then, to step outside what you already know, my best suggestion is to first be clear about the area of data science you want to focus on.
      That will allow you to, for example, focus on getting images if you’re planning to create a project with Computer Vision or to get some data in text form if you want to explore NLP.
      A good source of datasets for beginners is Kaggle, so you can start there if you aren’t already. Then, when you move a few steps up, think about scraping some information. But I suggest getting some experience with already available datasets before embarking on this task, for a couple of reasons: because of AI training, there are now a lot of restrictions on what you can and can’t scrape; also, it can be time-consuming if you aren’t very familiar with scraping.

      At the beginning, to evaluate whether a dataset will be useful for your portfolio, first think about the libraries and tools you want to use or learn.
      To give you an example, if you want to learn more about Machine Learning and train a model, you can focus on finding datasets with a high number of rows, numeric or (preferably) categorical features, and relatively little missing information (one can dream ;)) on a topic that is familiar to you. Next, split the data into training, validation, and test sets, and train, tune, and evaluate your model. Then, consider applying the same process to datasets on another topic to expand your knowledge.

      But overall, I always like to highlight that learning DS and building a portfolio is a process that takes time. But with every step, you get closer and closer to your goals.
      Area, my best wishes to you in switching your career to DS 😀

      1. Hello Lina,

        Many thanks for your very detailed response! Once again, your insight has made me grow excited for the prospect of continuing to hone my skills in data science. I will follow your advice as I build my portfolio.

        Thank you again and have a wonderful week!
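The training, validation, and test workflow mentioned in the reply above can be sketched in a few lines with scikit-learn. This is a minimal sketch under stated assumptions: the bundled Iris dataset stands in for your own data, logistic regression stands in for your model, and the 60/20/20 split and the candidate values of `C` are illustrative choices. The validation set is used to pick a setting; the test set is touched only once, at the very end.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off 20% as the final test set, then split the rest 75/25,
# which gives 60% train, 20% validation, and 20% test overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Use the validation set to choose the regularization strength C.
best_C, best_val_acc = None, -1.0
for C in (0.1, 1.0, 10.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# Refit on train + validation with the chosen C, then evaluate once on the test set.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_rest, y_rest)
test_acc = final_model.score(X_test, y_test)
print(f"sizes: train={len(X_train)}, val={len(X_val)}, test={len(X_test)}")
print(f"best C={best_C}, validation accuracy={best_val_acc:.2f}, test accuracy={test_acc:.2f}")
```

Keeping the test set out of every decision is what makes its score an honest estimate of how the model would do on new data.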
