How to deal with large dataset using Kaggle while working on Google Colab

Bhadresh Savani
3 min read · Aug 20, 2019

Machine learning projects sometimes involve really big datasets. The problem with using one on Colab is that you have to upload it to the Colab runtime, and when the runtime is reset or recycled, the files are gone. You need to upload the dataset again.

One solution is to upload the dataset to Google Drive and access it from Colab, but Google Drive has a storage limit of 15 GB for free accounts.

There is one more neat solution: upload the data to Kaggle as a dataset and access it from Colab using the Kaggle API.

Dataset transfer From Kaggle to Colab

Here are the steps for using a Kaggle dataset on Google Colab:

1. Download kaggle.json: To use a Kaggle dataset, we need a Kaggle API key. After signing in to Kaggle, click on My Account in the user profile section. In the API section, click on the “Create New API Token” link. It will download a kaggle.json file, which contains the API key details.
The Create New API Token link in the API section

2. Upload the kaggle.json file to the Colab notebook.

kaggle.json is uploaded
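One common way to do the upload from inside the notebook is files.upload() from the google.colab package, which opens a file picker and saves the chosen file to the current directory. The sketch below wraps that call and then checks that the file looks like a Kaggle token, assuming the token is a JSON object with "username" and "key" fields. The helper names upload_token and check_token are made up for illustration.

```python
import json
from pathlib import Path


def upload_token():
    """Open Colab's file picker. This only works inside a Colab runtime."""
    from google.colab import files  # imported lazily: not available outside Colab
    return files.upload()  # the selected files are written to the current directory


def check_token(path="kaggle.json"):
    """Return the username stored in the token, or raise if a field is missing.

    Assumption: the token is a JSON object with "username" and "key" fields.
    """
    data = json.loads(Path(path).read_text())
    missing = {"username", "key"} - set(data)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return data["username"]
```

In a Colab cell you would call upload_token() first and then check_token() to confirm that the right file was picked.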

3. Install the Kaggle API client.

!pip install -q kaggle

4. The Kaggle API client expects this file to be in ~/.kaggle, so we need to copy it there.

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

5. Change the file permissions to avoid a warning when the Kaggle tool starts.

!chmod 600 ~/.kaggle/kaggle.json
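Steps 4 and 5 can also be done in plain Python, which is handy if you want a single reusable setup cell. This is a minimal sketch that mirrors the three shell commands above; the function name install_token and its home parameter are hypothetical and exist only so the target directory can be overridden.

```python
import os
import shutil
from pathlib import Path


def install_token(src="kaggle.json", home=None):
    """Copy the token to <home>/.kaggle/kaggle.json with owner-only access."""
    home = Path(home) if home else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # like: mkdir -p ~/.kaggle
    target = target_dir / "kaggle.json"
    shutil.copy(src, target)                       # like: cp kaggle.json ~/.kaggle/
    os.chmod(target, 0o600)                        # like: chmod 600 ~/.kaggle/kaggle.json
    return target
```

Calling install_token() with no arguments uses the kaggle.json in the current directory and the real home directory.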

6. Get the dataset path and store it in the ‘dataset_path’ variable. To find it, go to https://www.kaggle.com/datasets and click on your dataset. The path is the owner name followed by the dataset name, as they appear in the URL. I am using the ‘goodreadsbooks’ dataset uploaded by ‘jealousleopard’, so my dataset path is ‘jealousleopard/goodreadsbooks’.

dataset_path="jealousleopard/goodreadsbooks"
Path in the URL
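If you prefer to paste the whole URL, the path can be pulled out of it with a few lines of Python. This sketch assumes the URL has the form https://www.kaggle.com/<owner>/<dataset> or https://www.kaggle.com/datasets/<owner>/<dataset>; the function name dataset_path_from_url is hypothetical.

```python
from urllib.parse import urlparse


def dataset_path_from_url(url):
    """Return "<owner>/<dataset>" from a Kaggle dataset page URL.

    Assumption: the owner and dataset name are the first two path segments,
    optionally preceded by a "datasets" segment.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"cannot find owner/dataset in {url!r}")
    return "/".join(parts[:2])


dataset_path = dataset_path_from_url(
    "https://www.kaggle.com/jealousleopard/goodreadsbooks"
)
print(dataset_path)  # jealousleopard/goodreadsbooks
```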

7. Copy the dataset locally. The curly braces tell Colab to substitute the value of the Python variable into the shell command.

!kaggle datasets download -d {dataset_path}

That’s it. Now we can use this dataset in Colab.
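The download usually arrives as a zip archive, so one more step is often needed before the files can be read. The sketch below builds the same CLI call as an argument list (adding the client's -p option to choose a download folder) and unpacks the archive with the standard zipfile module. The function names, the "data" folder, and the archive name goodreadsbooks.zip are assumptions for illustration, not something the Kaggle client guarantees.

```python
import zipfile
from pathlib import Path


def download_command(dataset_path, dest="."):
    """Build the CLI call as an argument list, ready for subprocess.run."""
    return ["kaggle", "datasets", "download", "-d", dataset_path, "-p", str(dest)]


def extract_dataset(zip_path, dest="data"):
    """Unpack the downloaded archive and return the names of the extracted files."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, after running subprocess.run(download_command(dataset_path), check=True), a call like extract_dataset("goodreadsbooks.zip") would place the files under data/, assuming the archive is named after the dataset.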

Here is the Demo Notebook Link of the above code,


Keep Learning!!
