Cloud Hosted Notebook Showdown

Many of us have been using Google Colab to share Jupyter notebooks and run our Python code on free cloud GPU compute from Google. Recently, AWS released SageMaker Studio Lab, its competitor to Google Colab.

I dove into comparing Google Colab to Studio Lab and here is what I found.

What is a Cloud Hosted Notebook?

A cloud hosted Python notebook is a relatively new concept in the world of data science and machine learning. To fire up a Jupyter notebook, you used to have to install Jupyter on your own machine and run the jupyter notebook command to open a notebook in your local browser.

Local notebook: Jupyter Notebook running on a local machine (cite)

The Problems with Local Notebooks

Local notebooks suffer from two limitations.

1) Local notebooks are not easily shareable. Jupyter notebooks are saved as large JSON files with cell outputs embedded, which makes them difficult to version and commit, so it is hard to share them via a normal code repository. This has gotten better over time, but it still kind of sucks.

2) Many machine learning applications need hardware acceleration via a GPU, and your local machine likely does not have one. To solve this, you can boot up a cloud instance, install Jupyter there, and find a way to expose the Jupyter server to a machine where you have a web browser. But that is quite a headache.

Enter Hosted Notebooks

To address the issues with local notebooks, Google released Colab, a notebook running on one of their cloud instances, where you can select CPU and GPU runtimes.

Hosted notebook: Google Colab, dark mode

Shortly thereafter, Paperspace introduced a competitor, Gradient Notebooks (outside the scope of this blog post).

Hosted notebook: Paperspace Gradient Jupyter Notebook

More recently, AWS released its competitor, Studio Lab.

Hosted notebook: SageMaker Studio Lab

Future development will decide which service ends up leading the space. For the time being, here is our comparison.

Colab vs Studio Lab: Hardware

Winner: Studio Lab

The number one thing we should care about when using a hosted notebook is the hardware that we get for free. If you are training a machine learning model, your experiments and development will be bottlenecked by the hardware you are using.

On the Studio Lab free tier you get a Tesla T4.

On the Google Colab free tier you get a Tesla P100 or Tesla K80.

We can compare the purchase prices of these GPUs and stop there.

Free hardware on Studio Lab is significantly more valuable (cite)
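If you want to confirm which GPU a given session actually handed you, you can ask the NVIDIA driver from inside the notebook. The following is a minimal sketch, assuming the nvidia-smi tool is on the PATH (it should be wherever the NVIDIA drivers are installed). The function names parse_gpu_names and detect_gpu_names are made up for this example and are not part of either service.

```python
import shutil
import subprocess


def parse_gpu_names(smi_output: str) -> list[str]:
    """Turn nvidia-smi's one-name-per-line output into a clean list."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def detect_gpu_names() -> list[str]:
    """Return the GPU names nvidia-smi reports, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # CPU runtime, or NVIDIA drivers not installed
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but could not report a GPU
    return parse_gpu_names(result.stdout)


if __name__ == "__main__":
    print(detect_gpu_names() or "No GPU detected")
```

Run it in a cell right after you select a GPU runtime. An empty result usually means you are on a CPU runtime.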

Colab vs Studio Lab: Shareability

Winner: Google Colab

One of the major improvements cloud hosted notebooks have over local notebooks is that you can easily share your code with others.

Google Colab excels in the shareability category. To share a notebook you can use the same sharing and authentication that you use with Google Drive.

Sharing code and execution results in Colab is a breeze

To share code in Studio Lab, you need to commit code to a repository and clone it when you are opening a notebook.

Opening a notebook file from a GitHub repository on Studio Lab

Colab vs Studio Lab: Environment

Winner: Studio Lab

When you boot up a cloud notebook, the server underneath comes with a lot of software preinstalled, which you would otherwise need to set up from scratch on a fresh instance.

Both Google Colab and Studio Lab come with NVIDIA drivers and related libraries installed. This saves you a lot of time, and it is not the kind of thing you will likely care to customize.

Google Colab also comes with a number of machine learning libraries preinstalled in its Python environment, such as PyTorch and TensorFlow. You have to install these on your own in Studio Lab.
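A quick way to see what a fresh environment is missing is to check which libraries can be imported before you start installing anything. Here is a small sketch using only the standard library. The helper name missing_packages and the list of libraries are just examples, so swap in whatever your project needs.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Example list only: use top-level import names, not pip package names.
    wanted = ["torch", "tensorflow"]
    missing = missing_packages(wanted)
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Anything it reports as missing is something you will need to install yourself before running the notebook.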

A huge advantage of Studio Lab is that it saves your project's machine image and spins that up for you, so you have a stable install base.

I personally have found the pre-installed Colab environment quite frustrating, because Google will often shift it underneath you. There is no versioning of Colab environments, so Google can change them at will. However, having everything pre-installed can be nice if you are picking something up quickly and just want to get started.
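One way to notice when a pre-installed environment has shifted is to record the installed versions when your notebook works and compare them later. This is a rough sketch of that idea, not a feature of Colab or Studio Lab. The function names snapshot_versions and changed_versions are invented for this example, and the package list is only a placeholder.

```python
from importlib.metadata import PackageNotFoundError, version


def snapshot_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    snapshot = {}
    for name in distributions:
        try:
            snapshot[name] = version(name)
        except PackageNotFoundError:
            snapshot[name] = None
    return snapshot


def changed_versions(before, after):
    """Return {name: (old, new)} for every package whose version differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    # Placeholder list: use the pip distribution names your notebook depends on.
    print(snapshot_versions(["torch", "tensorflow", "numpy"]))
```

You could save the snapshot next to the notebook and run changed_versions against a fresh snapshot the next time something breaks.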

Colab vs Studio Lab: UI

Winner: Studio Lab

When you program in an IDE, the UI seems to matter little at first, but over time you learn its nuances and come to rely on its full utility.

Both Google Colab and Studio Lab offer a Jupyter-style notebook UI.

Google Colab UI
Studio Lab UI

In my experience, Studio Lab feels slicker and more responsive. Google Colab can lag when you are interacting with UI elements and spinning up an instance underneath.

Colab vs Studio Lab: Resources

Winner: Studio Lab

Cloud notebooks are most widely used by students and practitioners who are learning.

Studio Lab was released with a bunch of resources on machine learning, including the Dive into Deep Learning course, which I would highly recommend.

Studio Lab deep learning content

When to Use SageMaker Studio Lab or Google Colab?

If you are working on a project where you want to get started quickly and share your work easily, you should look at using Google Colab.

If you are starting a longer project in data science or machine learning, you should look at using Studio Lab. You will be using better hardware and working in a programming environment that you can tailor.

Good luck!