Gigantum

Gigantum Documentation

You'll find comprehensive guides and documentation to help you start working with the Gigantum platform as quickly as possible. If you get stuck, there is help for that too. Just reach out to the team or get help from the community using our Spectrum channel.


Untracked Folders

Untracked folders don't get versioned or synced

Each of the input, code, and output directories contains a directory called untracked. As of client v1.3.0, these directories are created automatically if they don't already exist. Anything written to one of these directories, whether from inside a running project or on your host OS, is ignored when versioning, detecting changes, and syncing.

There are several use cases for untracked directories outlined below, and many more we haven't listed. We'd love to hear of any interesting uses for untracked directories or if you have any questions via our Spectrum Chat forum.

Untracked folder located in each file browser widget

Large, Intermediate or Sensitive Data in Untracked Folders

You can use untracked folders to store files that are too big to upload because of the size restrictions that keep a project from becoming too slow to use. Note that uploads through the browser are currently still limited, but you can always copy data into an untracked folder directly on your host.

Sometimes you want to write out intermediate data that you don't necessarily want to version and keep around. This is a great reason to write to untracked folders in input or output.
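As a rough illustration of writing intermediate data to an untracked folder, the sketch below builds the path to output/untracked and saves a small JSON file there. The helper names untracked_dir and save_intermediate are hypothetical, and project_root is whatever path your project directory has in your environment; the demo uses a temporary directory so it does not touch a real project.

```python
import json
import tempfile
from pathlib import Path

SECTIONS = ("input", "code", "output")


def untracked_dir(project_root, section="output"):
    """Return <project_root>/<section>/untracked, creating it if missing."""
    if section not in SECTIONS:
        raise ValueError(f"section must be one of {SECTIONS}, got {section!r}")
    path = Path(project_root) / section / "untracked"
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_intermediate(project_root, name, data):
    """Write JSON-serializable data to output/untracked and return the file path."""
    target = untracked_dir(project_root, "output") / name
    target.write_text(json.dumps(data))
    return target


# Demo against a throwaway directory standing in for a real project root
# (an assumption for illustration; pass your actual project path instead).
demo_root = Path(tempfile.mkdtemp())
saved = save_intermediate(demo_root, "step1-features.json", {"rows": 3})
print(saved.relative_to(demo_root))
```

Files written this way stay on the machine where they were created, so anything you need to reproduce results elsewhere should be regenerated by your code rather than expected to sync.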

Finally, you might have sensitive data that you do not want synced and shared. If you place your data in the untracked folders, it will not be versioned or shared. Remember, your collaborators will need to obtain the files some other way and place them in the same location for your code to work! You may also want to read the Including Sensitive Information section, which outlines a few more ways to manage sensitive data.
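Because collaborators have to place untracked files by hand, it can help to check for them early and print a clear message. The following is a minimal sketch under assumptions: the function name missing_untracked_files and the file names patients.csv and keys.json are made up for illustration, and the demo runs against a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path


def missing_untracked_files(project_root, relative_paths):
    """Return the expected files that are absent under input/untracked."""
    base = Path(project_root) / "input" / "untracked"
    return [rel for rel in relative_paths if not (base / rel).is_file()]


# Demo against a throwaway directory standing in for a real project root.
# Only one of the two (hypothetical) files is present.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "input" / "untracked").mkdir(parents=True)
(demo_root / "input" / "untracked" / "patients.csv").write_text("id\n1\n")

missing = missing_untracked_files(demo_root, ["patients.csv", "keys.json"])
if missing:
    print("Ask the project owner for these files and place them in "
          "input/untracked:", ", ".join(missing))
```

Running a check like this at the top of a notebook turns a confusing file-not-found error deep in the analysis into an instruction a collaborator can act on.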

Mounting External Data via Untracked Folders

In some cases you may have your data on a network share or a large external drive. There are multiple ways to get access to such data inside your project; a general approach that works on macOS or Linux is to create a mount on the host inside an untracked folder. For example, you could run one of the following from your project directory (projects are located in ~/gigantum/your-username/project-owner/labbooks/project-name). The first uses a bind mount and the second an NFS mount. Note that bind mounts are unfortunately only readily available on Linux:

# Option 1: bind mount (Linux only)
cd ~/gigantum/your-username/project-owner/labbooks/project-name
mkdir input/untracked/my-huge-dataset
mount -o bind /mnt/my-huge-external-dataset input/untracked/my-huge-dataset

# Option 2: NFS mount
cd ~/gigantum/your-username/project-owner/labbooks/project-name
mkdir input/untracked/my-huge-dataset
mount -t nfs localhost:/my-huge-dataset input/untracked/my-huge-dataset

You could likewise create a mount in output/untracked if you were going to generate a large amount of output data. Once you've created such a mount, you'll be able to access those files the next time you launch your project. If you need to do something like this on Windows or want to discuss other alternatives, please drop us a line on Spectrum and we'll help you out! There are many other ways this could work that are not documented here.

For example, another option is to mount an entire volume into your untracked folder. This may be something you need to do on macOS, again depending on your setup. In this example, a USB drive is mounted into a directory under the output/untracked folder of a project.

sudo mount -t msdos /dev/disk2s1 output/untracked/some-dir

In any case, we recommend documenting what you did inside your README. Note that you will need to manage synchronizing data between machines yourself. The mounts must also be created manually on each machine where you use the project, and again after every reboot (or you can use /etc/fstab - but that's beyond the scope of this article).
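Since mounts disappear after a reboot, a project can guard against silently reading from an empty mount point. The sketch below is a heuristic only, and the name mount_looks_ready is hypothetical: it treats a directory as ready when it exists and contains at least one entry, which is what a forgotten mount typically fails.

```python
import tempfile
from pathlib import Path


def mount_looks_ready(path):
    """Heuristic: True if `path` is a directory containing at least one entry."""
    path = Path(path)
    return path.is_dir() and any(path.iterdir())


# Demo with a temporary directory standing in for input/untracked/my-huge-dataset.
mount_point = Path(tempfile.mkdtemp()) / "my-huge-dataset"
mount_point.mkdir()
before = mount_looks_ready(mount_point)   # empty directory: nothing mounted yet

(mount_point / "part-0001.bin").write_bytes(b"\x00")  # simulate mounted content
after = mount_looks_ready(mount_point)

print(before, after)
```

Python's os.path.ismount is a stricter alternative, but its documentation notes that it cannot reliably detect bind mounts on the same filesystem, so the emptiness check may be the more forgiving choice here.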

Using External Git Repositories via Untracked Folders

Another use case for untracked folders is checking out code from a Git repository to use in your Project. Our automated versioning doesn't currently play nice with embedded Git repositories if you aren't careful. Additional Git repositories must be placed in untracked directories or manually added to the Project's .gitignore file.

You can easily add a notebook inside your repository (e.g. 00-setup.ipynb or 00-setup.Rmd) that includes a command like git clone https://github.com/me/my-repo untracked/my-repo. You can explain in a comment that this only needs to be done once. Depending on your audience, you might also include content in other notebooks that checks for the existence of that directory, additional comments, etc.

An example using a notebook to clone a repository from GitHub
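The existence check mentioned above can be folded into the setup step itself, so the cell is safe to re-run. This is a sketch under assumptions: clone_once is a hypothetical helper, and its run parameter exists only so the git call can be swapped out (for example in a dry run); by default it uses subprocess.run to invoke the git command line.

```python
import subprocess
from pathlib import Path


def clone_once(repo_url, dest, run=subprocess.run):
    """Clone repo_url into dest unless dest already exists.

    Returns True if a clone was started, False if dest was already present.
    """
    dest = Path(dest)
    if dest.exists():
        return False  # already cloned on this machine; nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if git exits with a non-zero status.
    run(["git", "clone", repo_url, str(dest)], check=True)
    return True
```

In a setup notebook you might call clone_once("https://github.com/me/my-repo", "untracked/my-repo"), using the same URL and destination as the command above. Note that the relative destination is resolved against the notebook's working directory.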

Updated 6 months ago
