One Must-Have Skill for Aspiring Data Scientists

Hey friends!

When looking at job posts for data science-related positions, you are going to come across requirements like:

  • Expertise in GIT and the data science Python stack.
  • Familiarity with collaborative tools (such as Git)

Nowadays, many data science roles consider version control an essential skill.

There are two main reasons for that:

  1. For tracking changes or reverting files if you accidentally overwrite something important.
  2. For smooth collaborations.

Currently, Git is the most widely used version control system out there.

Given its importance, let’s examine some of the most critical things you need to know about version control with Git.

What is Git, and how can it help with your daily tasks or work?

In short, Git acts like an unlimited Ctrl+Z or Undo for your documents, folders, or computer files.

It means that Git allows you to go back to previous versions of a document.

And if you are wondering what a “Git repository” is, it refers to the group of files and folders you have asked Git to track, so they can have the magic of unlimited Undo.

To mention just one example: with Git, we can save hours of work if we need to get back to that paragraph or piece of information we deleted some months ago, as long as an earlier version was saved (committed) in the repository. It also means that you can avoid having several copies of the same file with minor changes.

Do the file names “finalversion1,” “finalversion2,” “realfinalversion,” etc., sound familiar to you?

Well, with Git, those file names will be unnecessary, and now you can have just one file or folder together with all its history of changes.
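To make the “unlimited Undo” idea a bit more concrete, here is a minimal sketch in Python that drives the regular `git` command line. It assumes Git is already installed on your computer. The file name `notes.txt`, the commit messages, and the helper names `git` and `undo_demo` are made up for this illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return what it printed."""
    result = subprocess.run(
        [
            "git",
            # Throwaway identity so the demo works on a fresh machine.
            "-c", "user.name=Demo",
            "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def undo_demo() -> tuple[str, str, int]:
    """Save two versions of one file, then bring the first one back."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        notes = repo / "notes.txt"  # hypothetical file name

        git(repo, "init", "--quiet")

        # Version 1: write the file and save a snapshot (a commit).
        notes.write_text("First draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "--quiet", "-m", "Add first draft")

        # Version 2: overwrite the same file and commit again.
        notes.write_text("Second draft\n")
        git(repo, "commit", "--quiet", "-am", "Rewrite draft")
        latest = notes.read_text()

        # The "Undo": restore the file as it was one commit ago.
        git(repo, "checkout", "HEAD~1", "--", "notes.txt")
        restored = notes.read_text()

        # Both snapshots are still part of the history.
        commits = len(git(repo, "log", "--oneline").splitlines())
        return latest, restored, commits
```

Calling `undo_demo()` returns the text of the file before and after the restore, plus the number of commits in the history. There is one file name throughout, and no “finalversion2” in sight.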

Git vs GitHub

If you are wondering, Git and GitHub are not the same.

Git is an open-source version control system (or unlimited Undo). You can think of it as a time machine for your files.

Meanwhile, GitHub is a website that focuses on sharing and hosting repositories.

GitHub is a platform built on top of Git that lets you share your code and projects with the world (or just your team). And this is gold for collaboration!

Another major difference is that Git is meant to be downloaded and installed on your computer, while GitHub offers a web interface.
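The split between the two can be sketched in Python as well. In the example below, a local “bare” repository stands in for a repository hosted on GitHub, so nothing touches the network. With a real GitHub repository, the remote would be a URL instead of a folder. The folder names (`hosted.git`, `alice`, `bob`), the file `analysis.py`, and the helper names are all assumptions made for this illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def git(cwd: Path, *args: str) -> str:
    """Run a git command inside `cwd` and return what it printed."""
    result = subprocess.run(
        [
            "git",
            # Throwaway identity so the demo works on a fresh machine.
            "-c", "user.name=Demo",
            "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def share_demo() -> str:
    """Push work from one folder and pick it up in another."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        hosted = base / "hosted.git"  # stand-in for a GitHub repository
        alice = base / "alice"        # first collaborator's computer
        bob = base / "bob"            # second collaborator's computer
        hosted.mkdir()
        alice.mkdir()

        # The shared copy that everyone pushes to and clones from.
        git(hosted, "init", "--bare", "--quiet")

        # Alice works locally with Git, then publishes her history.
        git(alice, "init", "--quiet")
        (alice / "analysis.py").write_text("print('hello')\n")
        git(alice, "add", "analysis.py")
        git(alice, "commit", "--quiet", "-m", "Add analysis script")
        git(alice, "remote", "add", "origin", str(hosted))
        git(alice, "push", "--quiet", "origin", "HEAD:refs/heads/main")

        # Bob gets the project, history included, with one clone.
        git(base, "clone", "--quiet", "--branch", "main", str(hosted), str(bob))
        return (bob / "analysis.py").read_text()
```

Everything Alice does before the push happens with Git alone on her own computer. The hosted copy only comes into play when it is time to share.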

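Here is a summary:

  • Git: an open-source version control system that you download and install on your computer.
  • GitHub: a website built on top of Git for hosting and sharing repositories through a web interface.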

Some other advantages

There are other significant advantages of using Git and GitHub for your work or even your job search:

  1. A well-maintained GitHub profile showcasing your data science projects with clean code and version control demonstrates professionalism and proficiency to potential employers.
  2. Popular cloud platforms used in data science, like Google Cloud Platform (GCP) and Amazon Web Services (AWS), offer tooling that works with Git repositories. This makes it easier to manage and deploy projects.

Where to learn Git and GitHub?

If you are wondering how to get started with Git and GitHub, my recommendation is to audit this Coursera course by IBM, which covers fundamentals like creating a repository and using Git commands.

You can find the link to the course below.

Getting Started with Git and GitHub by IBM

Have a great week!

Lina Marieth xx


What I’m reading

Retrieval-Augmented Generation for Large Language Models: A Survey

Large Language Models (LLMs) showcase impressive capabilities, but they face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. In this context, Retrieval-Augmented Generation (RAG) emerges as a cutting-edge solution for improving the accuracy and credibility of the generation, particularly for knowledge-intensive tasks.

Read the full text.

Quote of the week

Those who wait, wait. Waiting for anything is a dangerous game because there is no guarantee that conditions will ever be just right. You won’t regret the things you tried and failed at. But you will regret a life spent waiting. Those who wait, wait. You have a life to live.

From the book I Wish I Knew by Donna Ashworth.
