Why Git DRS?¶
Git DRS exists to extend Git LFS with a scalable, standards-based data access layer without changing the Git user experience. It is explicitly aligned with the GA4GH Data Repository Service (DRS) specification so Git-native workflows can interoperate with the broader GA4GH ecosystem.
At a high level, every data flow follows the same model:
Git → Git LFS → Git DRS → Object Store
flowchart LR
%% Top row: main components
Git[Git] --> LFS[Git LFS] --> DRS[Git DRS] --> Store[Object Store]
%% Second row: responsibilities (aligned under each box)
Git_r[commits]
LFS_r[pointers]
DRS_r[auth + resolve]
Store_r[blobs]
%% Third row: concrete artifacts
Git_a[data files]
LFS_a[transfer queue]
DRS_a[signed URLs / DRS]
Store_a[S3 / GCS / Azure / on-prem]
%% Vertical alignment
Git_a --> Git_r --> Git
LFS --> LFS_r --> LFS_a
DRS --> DRS_r --> DRS_a
Store --> Store_r --> Store_a
%% Styling to mimic ASCII boxes
classDef main fill:#ffffff,stroke:#333,stroke-width:1px;
classDef note fill:#f9f9f9,stroke:#999,stroke-dasharray:3 3;
classDef artifact fill:#eef2ff,stroke:#4c6ef5;
class Git,LFS,DRS,Store main
class Git_r,LFS_r,DRS_r,Store_r note
class Git_a,LFS_a,DRS_a,Store_a artifact
This separation of concerns is intentional.
The Problem Git DRS Solves¶
Git LFS solves how large files integrate with Git, but it intentionally does not define:
- How objects are globally identified beyond a repository
- How access is authorized across organizations and environments
- How storage backends are abstracted (S3, GCS, Azure, on‑prem)
- How metadata, lineage, and reuse are tracked at scale
Git DRS fills these gaps while remaining fully compatible with Git LFS.
By leveraging the GA4GH DRS standard, Git DRS also unlocks common DRS use cases:
- Federated discovery and access across multiple data repositories
- Controlled-access datasets with consistent authN/authZ patterns
- Cross-institution data sharing for genomics and other biomedical data
- Stable identifiers that survive repo moves and lifecycle changes
Architecture Overview¶
1. Git (Source Control)¶
- Tracks commits, branches, and history
- Stores Git LFS pointer files in the repository
- Remains unaware of large object storage details
2. Git LFS (User Experience Layer)¶
- Replaces large files with SHA‑256 pointer files
- Manages clean/smudge filters
- Schedules uploads and downloads
- Invokes a custom transfer adapter
Git LFS defines when data moves — not where or how it is authorized.
3. Git DRS (Resolution & Authorization)¶
- Implements a Git LFS custom transfer adapter
- Resolves SHA‑256 OIDs to DRS objects
- Enforces authorization and access policy
- Issues signed URLs or delegated credentials
- Records metadata and lineage when required
Git DRS is where platform policy lives.
4. Object Store (Persistence)¶
- Stores immutable, content‑addressed blobs
- Typically S3, GCS, Azure Blob, or on‑prem equivalents
- Optimized for durability and throughput, not Git semantics
Why This Matters¶
This layered design enables:
- Content‑addressed deduplication across repos and projects
- Cross‑environment portability (dev → staging → prod)
- Standards alignment with GA4GH DRS
- Clear security boundaries between Git users and storage credentials
- Future extensibility without breaking Git workflows
Most importantly, it preserves the developer experience:
From the user’s perspective, it’s still just
git add,git commit, andgit push.
Summary¶
Git DRS is not a replacement for Git or Git LFS.
It is the missing architectural layer that allows Git LFS to operate safely, portably, and at scale in regulated and multi‑tenant environments.
Git → Git LFS → Git DRS → Object Store is the contract.