@tomwhite
Collaborator

Fixes #41


This specification depends on definitions and terminology from [The Variant Call Format Specification, VCFv4.3 and BCFv2.2](https://samtools.github.io/hts-specs/VCFv4.3.pdf),
and [Zarr storage specification version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html).
and [Zarr storage specification version 2](https://zarr.readthedocs.io/en/stable/spec/v2.html) or [Zarr core specification version 3](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html) [experimental].
Collaborator Author

Marked as "experimental" until we've got experience using implementations of this. (Also, I'm not planning on merging this soon.)

| `bool` | `\|b1` | `bool` | Flag |
| `int` | `<i1`, `<i2`, `<i4`, `<i8` or `>i1`, `>i2`, `>i4`, `>i8` | `int8`, `int16`, `int32`, `int64` | Integer |
| `float` | `<f4`, `<f8` or `>f4`, `>f8` | `float32`, `float64` | Float |
| `char` | `\<U1` or `\>U1` | `string` | Character |
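The correspondence between the Zarr v2 and Zarr v3 columns in the rows above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `v3_data_type` and the dictionary are hypothetical, not part of the specification, and the dtype strings are written without the markdown escapes used in the table (`|b1` rather than `\|b1`, `<U1` rather than `\<U1`).

```python
# Hypothetical helper built only from the table rows above; not part of the spec.
# Keys are the type code that follows the byte-order character in a Zarr v2 dtype.
_CODE_TO_V3 = {
    "i1": "int8",
    "i2": "int16",
    "i4": "int32",
    "i8": "int64",
    "f4": "float32",
    "f8": "float64",
    "U1": "string",  # v3 has no char type, so Character maps to string
}


def v3_data_type(v2_dtype: str) -> str:
    """Return the Zarr v3 data type name listed next to a Zarr v2 dtype string."""
    if v2_dtype == "|b1":
        return "bool"
    if len(v2_dtype) < 2 or v2_dtype[0] not in ("<", ">"):
        raise ValueError(f"unsupported dtype: {v2_dtype!r}")
    code = v2_dtype[1:]
    if code not in _CODE_TO_V3:
        raise ValueError(f"unsupported dtype: {v2_dtype!r}")
    return _CODE_TO_V3[code]


print(v3_data_type("<i4"))
print(v3_data_type(">f8"))
print(v3_data_type("<U1"))
```

Byte order is dropped in the lookup because the v3 column of the table lists data type names without an endianness prefix.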
Collaborator Author

Zarr format v3 doesn't use NumPy dtypes any more, and it doesn't have a "char" type, so I've represented VCF Character as a Zarr string. This would work, but would mean that converting VCF Zarr back to VCF would be lossy in the sense that a VCF Character field would be converted back to a VCF String. This may not matter, but it could potentially be fixed by either i) adding a "char" type/fixed-length string type to Zarr, or ii) adding an attribute to VCF Zarr to record the fact that it is logically a single character.
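To make option ii) concrete, here is a rough sketch of how an array attribute could keep the Character/String distinction on the way back to VCF. Everything named here is an assumption for illustration: the attribute key `vcf_logical_type`, its value `"char"`, and the function `vcf_type` are hypothetical and are not defined by this PR or by the VCF Zarr specification. Array attributes are modelled as a plain dict so the snippet does not depend on any Zarr library.

```python
# Hypothetical attribute key and value; the PR does not define either.
LOGICAL_TYPE_ATTR = "vcf_logical_type"

# Zarr v3 data type name -> VCF type, following the table in this PR.
V3_TO_VCF = {
    "bool": "Flag",
    "int8": "Integer",
    "int16": "Integer",
    "int32": "Integer",
    "int64": "Integer",
    "float32": "Float",
    "float64": "Float",
    "string": "String",
}


def vcf_type(data_type: str, attrs: dict) -> str:
    """Pick the VCF type for an array, honouring the hypothetical char marker."""
    if data_type == "string" and attrs.get(LOGICAL_TYPE_ATTR) == "char":
        return "Character"
    return V3_TO_VCF[data_type]


# Without the marker the round trip is lossy: Character comes back as String.
print(vcf_type("string", {}))
# With the marker the original VCF type can be recovered.
print(vcf_type("string", {LOGICAL_TYPE_ATTR: "char"}))
```

Option i), a fixed-length string or char type in Zarr itself, would make the attribute unnecessary, but it depends on a change outside VCF Zarr.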

Successfully merging this pull request may close these issues.

Support Zarr v3 format

1 participant