
Data De-Duplication

Datasets de-duplicate data to help you save local disk space. This is done in two ways. First, files are stored based on a hash of their content and are then hard-linked for actual use. This means that if two files have exactly the same content, only one copy is stored.
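To make the first mechanism concrete, here is a minimal sketch of content-addressed storage with hard links. It is an illustration only, not the actual Datasets implementation: the SHA-256 hash, the `objects` directory layout, and the helper names `store_file`, `link_for_use`, and `demo` are all assumptions made for this example.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Tuple


def store_file(src: Path, object_dir: Path) -> Path:
    """Store src under its content hash; skip the write if it is already there.

    Hypothetical helper: SHA-256 and this layout are assumptions.
    """
    data = src.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    obj = object_dir / digest
    if not obj.exists():
        obj.write_bytes(data)
    return obj


def link_for_use(obj: Path, dest: Path) -> None:
    """Expose a stored object at dest via a hard link (no data is copied)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.link(obj, dest)


def demo() -> Tuple[int, bool]:
    """Store two files with identical content and link both for use."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        objects = root / "objects"
        objects.mkdir()

        # Two differently named files with exactly the same content.
        a = root / "a.csv"
        b = root / "b.csv"
        a.write_text("x,y\n1,2\n")
        b.write_text("x,y\n1,2\n")

        link_for_use(store_file(a, objects), root / "view" / "a.csv")
        link_for_use(store_file(b, objects), root / "view" / "b.csv")

        stored = len(list(objects.iterdir()))
        same = os.path.samefile(root / "view" / "a.csv", root / "view" / "b.csv")
        return stored, same
```

Calling `demo()` stores a single object for both inputs, and both linked paths refer to the same underlying file. Hard links require the object store and the link destination to be on the same filesystem.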

Second, files are de-duplicated across Projects. This is the much more common case: you want to use the same data in many Projects. If you embed the data in a Project, each Project has to keep a complete copy of the data. With Datasets, since the files are only linked into the Project at runtime, only one copy of the files is needed.
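The difference between embedding and linking can be sketched as follows. This is a rough illustration under stated assumptions: the text does not say how files are linked into a Project at runtime, so hard links are used here as a stand-in, and the directory names and the helpers `unique_files` and `demo` are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def unique_files(paths):
    """Count distinct underlying files (device, inode pairs) among paths."""
    return len({(os.stat(p).st_dev, os.stat(p).st_ino) for p in paths})


def demo():
    """Compare embedding a file in two projects with linking it into both."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        dataset_file = root / "dataset" / "measurements.csv"
        dataset_file.parent.mkdir()
        dataset_file.write_text("t,value\n0,1.5\n")

        embedded, linked = [], []
        for name in ("project_a", "project_b"):
            # Embedded: each project keeps its own complete copy.
            copy_path = root / name / "embedded" / "measurements.csv"
            copy_path.parent.mkdir(parents=True)
            shutil.copy(dataset_file, copy_path)
            embedded.append(copy_path)

            # Linked: each project points at the single dataset copy.
            # Assumption: a hard link stands in for the runtime linking.
            link_path = root / name / "linked" / "measurements.csv"
            link_path.parent.mkdir(parents=True)
            os.link(dataset_file, link_path)
            linked.append(link_path)

        return unique_files(embedded), unique_files(linked)
```

Here the embedded paths resolve to two separate files on disk, while the linked paths resolve to one, so adding more Projects does not add more copies of the data.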