
Directly Access Files in Gigantum Projects or Datasets

All data is stored on disk, so you can access your materials without using Gigantum

If you want to get important files out of Gigantum, you can directly access your code, environment and data in a Gigantum Project or Dataset that is stored locally. You just need to know where to look in the local file system.

Getting files out of a Gigantum Project

📘

You must import Projects locally in order to have access to the files.

Remember to first import them from within the Client or download them directly from the Hub. You can see more on importing Projects from the Hub here.

For Gigantum Projects on a local machine, you can access the code, environment and data directly on the file system. Everything held locally on disk is contained in the Gigantum working directory, which is typically the gigantum folder in your user directory (e.g. /Users/<username>/gigantum on Mac, C:/Users/<username>/gigantum on Windows). The working directory has a fixed structure that lets you find the Project you need.

To find Project files on disk you just need the following identifiers:

  • The name of the Project
  • The server ID for the storage hub (for most users this will be gigantum-com; if using a Team Server, it will be a unique identifier)
  • The username of the account you are logged into
  • The username of the owner that created the Project

The path to the desired project is then ~/gigantum/servers/<server_id>/<logged_in_username>/<owner_username>/labbooks/<project_name>.
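The path pattern above can be assembled with a small shell helper. This is a minimal sketch for a POSIX shell (macOS, Linux, or a Bash-like shell on Windows), not part of Gigantum itself. The function name and the GIGANTUM_WORKDIR override are hypothetical names introduced here for illustration; the default assumes the working directory is the usual ~/gigantum.

```shell
# Build the on-disk path of a Project from its four identifiers.
# GIGANTUM_WORKDIR is a hypothetical override used only by this sketch;
# when unset, the typical ~/gigantum working directory is assumed.
gigantum_project_path() {
  if [ "$#" -ne 4 ]; then
    echo "usage: gigantum_project_path <server_id> <logged_in_username> <owner_username> <project_name>" >&2
    return 1
  fi
  printf '%s/servers/%s/%s/%s/labbooks/%s\n' \
    "${GIGANTUM_WORKDIR:-$HOME/gigantum}" "$1" "$2" "$3" "$4"
}

# Example call using the example values from this guide:
gigantum_project_path gigantum-com tylercasablanca tylercasablanca model-tracking-example-1
```

You could then change into the Project with cd "$(gigantum_project_path <server_id> <logged_in_username> <owner_username> <project_name>)".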

In the screenshot below you can see a specific example (on Windows) for the following identifiers:

  • Project name model-tracking-example-1
  • Server ID gigantum-com
  • Logged in username tylercasablanca
  • Project owner username tylercasablanca.

Example Project contents

Once you are in the Project folder, you can access all of the files in the Project. Check out the Project Structure section for more information on how data is organized.

Accessing Environment Configuration

You will find the environment configuration data in the .gigantum/env subfolder of the Project. This folder has a representation of the environment configuration, and when a Project is imported or the environment is changed using the Client, the Dockerfile is rendered.

You can build and run this Dockerfile once it has been rendered as long as you include all of the required volumes and environment variables. Review the Manually Run a Project section if you wish to run a Project as-is outside of Gigantum.

You can see the contents of this subfolder in the screenshot below.

Example environment configuration

Copying Base Images

To keep building a Project without Gigantum managed and hosted base images, you should copy the image into an image repository that you control and update your Dockerfile.

First, look at the top of your Project's Dockerfile, as shown in the section above. Make a note of the base image specified by the line that starts with FROM (e.g. gigantum/python3-minimal:f415d5dff3-2021-04-29).
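If you would rather read the base image from the command line than open the file, the following sketch prints the image named on the first FROM line. The function name is hypothetical. The sketch also assumes the rendered file is named Dockerfile inside the Project's .gigantum/env folder, so adjust the path if your Project differs.

```shell
# Print the image named on the first FROM line of a Dockerfile.
# Limitation of this sketch: it prints the second word of the line, so a
# FROM line that carries flags such as --platform is not handled.
base_image_of() {
  awk 'toupper($1) == "FROM" { print $2; exit }' "$1"
}

# Usage (assumed file location, see the note above):
# base_image_of <project_folder>/.gigantum/env/Dockerfile
```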

Make sure this image exists locally using:

docker pull gigantum/<base-image>:<tag>
docker pull gigantum/python3-minimal:f415d5dff3-2021-04-29

Push the image to your own Docker Hub account. In the example below, let's assume your Docker Hub username is "user123".

docker image tag gigantum/<base-image>:<tag> <your-user>/<base-image>:<tag>
docker push <your-user>/<base-image>:<tag>
docker image tag gigantum/python3-minimal:f415d5dff3-2021-04-29 user123/python3-minimal:f415d5dff3-2021-04-29
docker push user123/python3-minimal:f415d5dff3-2021-04-29

Finally, update the FROM instruction to point to your image instead of the Gigantum image.
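The FROM update can also be scripted. The sketch below replaces the image on the first FROM line and leaves every other line untouched. The function name is hypothetical and the file path is passed in by the caller. Because the Client renders the Dockerfile when a Project is imported or its environment changes, it may be safer to apply this to a copy of the file that you keep outside the Project.

```shell
# Point the first FROM line of a Dockerfile at a different image.
# The body runs in a subshell so its variables do not leak to the caller.
# It writes to a temporary file first, avoiding the non-portable "sed -i".
retarget_from() (
  dockerfile="$1"
  new_image="$2"
  tmp="$(mktemp)" || exit 1
  awk -v img="$new_image" '
    !seen && toupper($1) == "FROM" { $2 = img; seen = 1 }
    { print }
  ' "$dockerfile" > "$tmp" || exit 1
  # Copy the contents back so the original file keeps its permissions.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
)

# Usage, continuing the "user123" example:
# retarget_from ./Dockerfile user123/python3-minimal:f415d5dff3-2021-04-29
```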

Getting files out of a Gigantum Dataset

📘

Make sure the files for the Dataset have been downloaded

Datasets handle files differently than Projects. If you have imported a Dataset in the Client, you will still need to make sure that the files are downloaded and present on disk. You can see more about downloading Datasets here.

Gigantum Datasets store files differently to increase the storage efficiency of large data. After importing your Dataset, you must also download all of the files via the Client UI, which will create a directory that has all of the files for the Dataset at the current version. You can then copy the files out of this directory as needed.

To access the files you need to know some simple identifiers:

  • The name of the Dataset
  • The server ID that was used as the storage hub (for most users this will be gigantum-com. If using a Team Server, it will be a unique identifier)
  • The username of the account you are logged into
  • The username of the owner that created the Dataset

The path to the desired Dataset's files will then be ~/gigantum/.labmanager/datasets/<server_id>/<logged_in_username>/<owner_username>/<dataset_name>.

In this directory you will see two or more folders. One folder is called objects. Each of the others has a name that looks like a long random hash, which is the git repository hash of the Dataset at a particular version. If you have used the Dataset at different versions, there may be more than one hash directory. Inside these hash directories you will find all of the files.
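The Dataset layout described above can be explored with two small shell helpers. This is a sketch under stated assumptions rather than a Gigantum tool. The function names and the GIGANTUM_WORKDIR override are hypothetical, and the default assumes the usual ~/gigantum working directory.

```shell
# Build the on-disk path of a Dataset from its four identifiers.
# GIGANTUM_WORKDIR is a hypothetical override used only by this sketch.
gigantum_dataset_path() {
  if [ "$#" -ne 4 ]; then
    echo "usage: gigantum_dataset_path <server_id> <logged_in_username> <owner_username> <dataset_name>" >&2
    return 1
  fi
  printf '%s/.labmanager/datasets/%s/%s/%s/%s\n' \
    "${GIGANTUM_WORKDIR:-$HOME/gigantum}" "$1" "$2" "$3" "$4"
}

# List the version directories of a Dataset folder, that is, every
# subdirectory except "objects". Runs in a subshell to keep variables local.
dataset_version_dirs() (
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    if [ "$name" != "objects" ]; then
      printf '%s\n' "${dir%/}"
    fi
  done
)

# Usage: list the versions, then copy one of them to a folder of your choice.
# dataset_version_dirs "$(gigantum_dataset_path <server_id> <logged_in_username> <owner_username> <dataset_name>)"
# cp -R <version_dir>/. <destination_folder>/
```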

In the screenshot below you can see a specific example for the following identifiers:

  • Dataset name hymenoptera-data
  • Server ID gigantum-com
  • Logged in username tylercasablanca
  • Dataset owner username tinydav

The files are found in the folder with the hash as its name.