Saving Files from Google Colab to Google Cloud Storage
By Eric Antoine Scuccimarra

I have previously written about Google Colab, which offers free access to Nvidia K80 GPUs, but only for 12 hours at a time. After a few months of running up a substantial bill on Google Cloud GPU instances, I have reverted to using Colab whenever possible. The main problem with Colab is that the instance is terminated after 12 hours, taking all of its files with it, so anything you want to keep needs to be saved elsewhere.
Until recently I had been saving my files to Google Drive, but while it is easy to write files to Drive, it is much harder to read them back. As far as I can tell, the Drive API requires you to look up each file's ID, and even then getting the files into Colab is not straightforward. To work around this, I had been uploading frequently accessed files to an AWS S3 bucket and downloading them to Colab with wget, which works fine, but there is a much simpler way to do the same thing: use Google Cloud Storage instead of S3.
First, authenticate Colab to your Google account:
from google.colab import auth
auth.authenticate_user()
Once this is done, set your project and bucket names and update the gcloud config:
project_id = 'my-project'    # replace with your project ID
bucket_name = 'my-bucket'    # replace with your bucket name
!gcloud config set project {project_id}
After this has been done, files can be quickly uploaded to or downloaded from the bucket with simple gsutil commands:
# download
!gsutil cp gs://{bucket_name}/foo.bar ./foo.bar
# upload
!gsutil cp ./foo.bar gs://{bucket_name}/foo.bar
I have actually added the upload line to my training code so that the weights are automatically copied to GCS every couple of epochs, which removes the need for me to back them up manually throughout the day.
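A minimal sketch of that pattern, shelling out to gsutil from the training loop (the bucket name, weights filename, and epoch interval here are placeholders, not from my actual code):

```python
import subprocess

bucket_name = 'my-bucket'  # placeholder: substitute your GCS bucket name


def should_backup(epoch, every=2):
    """Return True on every `every`-th epoch (0-indexed)."""
    return epoch % every == 0


def backup_weights(weights_path, epoch, every=2):
    """Copy the local weights file to the GCS bucket every few epochs."""
    if should_backup(epoch, every):
        subprocess.run(
            ['gsutil', 'cp', weights_path, f'gs://{bucket_name}/{weights_path}'],
            check=True,
        )

# Inside the training loop, something like:
# for epoch in range(num_epochs):
#     train_one_epoch(model)
#     model.save_weights('weights.h5')
#     backup_weights('weights.h5', epoch)
```

Calling gsutil via subprocess rather than `!gsutil` keeps the backup logic inside a function, so it can run unattended from the loop.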
Labels: coding, python, machine_learning, google, google_cloud