Quick start
Input and output data formats
The following data formats can be prepared for use with biodem modules.
Tabular data
`DEMDataset` (see Modules) can read tabular data in the CSV and Parquet formats.
- The first column contains sample IDs that will be read as string type.
- The first column's name must be `"ID"`.
For the detailed requirements, see Modules > Utilities > Preprocessing Data > DEMDataset.
Example:
ID | gene_1 | gene_2 |
---|---|---|
id_1 | 0.87 | 0.03 |
id_2 | 0.34 | 0.65 |
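
A minimal sketch of producing a CSV file with this layout, using only the Python standard library. The values are the illustrative ones from the table above, and the header check is a hypothetical sanity check, not part of biodem's API.

```python
import csv
import io

# Illustrative rows matching the example table (not real data).
rows = [
    {"ID": "id_1", "gene_1": 0.87, "gene_2": 0.03},
    {"ID": "id_2", "gene_1": 0.34, "gene_2": 0.65},
]

# Write the table with "ID" as the first column.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["ID", "gene_1", "gene_2"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Hypothetical sanity check: the first column's name must be "ID".
header = csv_text.splitlines()[0].split(",")
first_column_ok = header[0] == "ID"
```

To save the result, write `csv_text` to a file of your choice and pass that file to the reader described in the Modules documentation.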
VCF
Genotype data in Variant Call Format (VCF).
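
For orientation, the sketch below shows the general shape of a VCF file: `##` meta lines, a `#CHROM` header line naming the samples, then one tab-separated line per variant. The variant, the sample names, and the helper `read_vcf_genotypes` are made up for illustration. The helper assumes each sample field holds only a genotype (`GT`), and it does not represent how biodem reads VCF files.

```python
# Illustrative VCF content with two hypothetical samples and one variant.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tid_1\tid_2\n"
    "1\t1001\tsnp_1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n"
)


def read_vcf_genotypes(text):
    """Return {variant_id: {sample_id: genotype}} for a simple GT-only VCF."""
    samples = []
    genotypes = {}
    for line in text.splitlines():
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            continue
        variant_id = fields[2]
        genotypes[variant_id] = dict(zip(samples, fields[9:]))
    return genotypes


genotypes = read_vcf_genotypes(vcf_text)
```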
GFF
Genomic annotations in Generic Feature Format Version 3 (GFF3).
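
As a reminder of the layout, a GFF3 feature line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The sketch below parses one made-up feature. The helper `read_gff3` is hypothetical and only illustrates the format, not biodem's own reader.

```python
# Illustrative GFF3 content with a single hypothetical gene feature.
gff_text = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene_1;Name=gene_1\n"
)


def read_gff3(text):
    """Parse GFF3 feature lines into a list of dictionaries."""
    features = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        seqid, source, ftype, start, end, score, strand, phase, attrs = (
            line.split("\t")
        )
        # Attributes are semicolon-separated key=value pairs.
        attributes = dict(
            item.split("=", 1) for item in attrs.split(";") if item
        )
        features.append(
            {
                "seqid": seqid,
                "type": ftype,
                "start": int(start),
                "end": int(end),
                "strand": strand,
                "attributes": attributes,
            }
        )
    return features


features = read_gff3(gff_text)
```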
Running tests
A simple usage example is provided in `./tests/test_biodem.py`. Please refer to the script and the "Modules" documentation for more details.