biodem.utils.uni
Some universal functions.
CollectFitLog
Source code in src\biodem\utils\uni.py
__init__(dir_log)
Collect training logs from optuna db files and ckpt files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir_log | str | Directory containing the model fitting logs. | required |
collect()
Collect training logs from optuna db files and ckpt files.
collect_ckpt()
Collect info from ckpt files and tensorboard events.
collect_optuna_db()
Collect info of optuna db files.
get_df_csv(dir_output, overwrite_collected_log=False)
Collect trained models for each fold in nested cross-validation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir_output | str | Directory to save the collected logs. | required |
overwrite_collected_log | bool | Whether to overwrite existing collected logs. | False |
read_tensorboard_events(dir_events, get_test_loss=True)
Read tensorboard events from the directory.
remove_inferior_models()
Remove inferior models based on the collected result table.
search_ckpt()
Search checkpoints in the directory and its subdirectories.
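A recursive checkpoint search of this kind can be sketched with the standard library alone. The function name `find_checkpoints` and the `.ckpt` suffix default are assumptions made for illustration; the actual `search_ckpt()` method may match files differently and return a different structure.

```python
from pathlib import Path
from typing import List


def find_checkpoints(dir_log: str, suffix: str = ".ckpt") -> List[Path]:
    """Hypothetical helper: list files ending in `suffix` under `dir_log`.

    `rglob` walks the directory and all of its subdirectories.
    """
    root = Path(dir_log)
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
```

Calling `find_checkpoints("some/log/dir")` returns the matching paths in sorted order, which keeps the result stable between runs.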
ProcOnTrainSet
__init__(df_in, ind_for_fit, n_feat2save=None, df_labels=None)
Process all data points based on the training set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df_in | DataFrame | Input dataframe. | required |
ind_for_fit | Optional[List[Any]] | Sample indices for fitting the preprocessors. | required |
n_feat2save | Optional[int] | Number of features to save. | None |
df_labels | Optional[DataFrame] | Labels dataframe. | None |
How to use
- Initialize the class.
- Call the method pr_xxxxx to process the data.
- Call the method save_processors to save the processors (as a dict) to a pickle file.
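The general pattern behind these steps (fit each preprocessor on the training samples only, apply it to every sample, and keep the fitted steps in a dict with 0-based integer keys) can be sketched as follows. This is a stdlib-only illustration, not the class's implementation: `fit_center`, `fit_scale`, and `apply_step` are hypothetical helpers, and the processors are stored as plain dicts of parameters rather than the objects the class actually pickles.

```python
import pickle
from statistics import mean, pstdev
from typing import Any, Dict, List


def fit_center(values: List[float]) -> Dict[str, Any]:
    # Hypothetical step: remember the mean of the fitting samples.
    return {"kind": "center", "mean": mean(values)}


def fit_scale(values: List[float]) -> Dict[str, Any]:
    # Hypothetical step: remember the standard deviation (fall back to 1.0 if zero).
    return {"kind": "scale", "std": pstdev(values) or 1.0}


def apply_step(step: Dict[str, Any], values: List[float]) -> List[float]:
    if step["kind"] == "center":
        return [v - step["mean"] for v in values]
    if step["kind"] == "scale":
        return [v / step["std"] for v in values]
    raise ValueError(f"unknown step: {step['kind']}")


data = [1.0, 2.0, 3.0, 10.0]   # all samples
ind_for_fit = [0, 1, 2]        # training samples only

processors: Dict[int, Dict[str, Any]] = {}
current = list(data)
for fit in (fit_center, fit_scale):
    step = fit([current[i] for i in ind_for_fit])  # fit on the training set
    processors[len(processors)] = step             # 0-based key follows the order
    current = apply_step(step, current)            # apply to every sample

# Round-trip through pickle, then replay the steps in key order.
restored = pickle.loads(pickle.dumps(processors))
replayed = list(data)
for key in sorted(restored):
    replayed = apply_step(restored[key], replayed)
```

Because the keys record the order of the steps, replaying them in sorted key order reproduces the processed values on new data.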
keep_preprocessors(x_value)
Each processor is stored under an integer key (0-based) that is generated automatically from the order in which the processors are applied, so that the data processing steps can be reproduced.
RFSelector
__init__(n_feat2save, random_states, n_estimators, n_jobs, save_processors=False)
Select features based on random forest.
After fit, the selector recognizes the column names of the input dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_feat2save | int | The number of features to save. | required |
random_states | List[int] | The random states for the random forest. | required |
n_estimators | int | The number of trees in the random forest. | required |
n_jobs | int | The number of jobs to run in parallel. | required |
save_processors | bool | Whether to save the random forest models. | False |
keep_preprocessor(x_processor)
The processor is stored under an integer key (0-based) that is generated automatically from the order in which the processors are applied, so that the data processing steps can be reproduced.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x_processor | Any | The processor to be kept. | required |
VarThreSelector
__init__(threshold)
Select features based on variance.
After fit, the selector recognizes the column names of the input dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
threshold | float | The threshold for variance. | required |
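To illustrate what a variance-based selector that remembers column names does, here is a minimal stdlib sketch. The class `VarianceColumnSelector`, its `fit`/`transform` methods, and the dict-of-columns input are assumptions for illustration only; the real `VarThreSelector` operates on dataframes and its exact rule (for example, whether the comparison is strict) is not stated here.

```python
from statistics import pvariance
from typing import Dict, List


class VarianceColumnSelector:
    """Hypothetical selector: keep columns whose variance exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.kept_columns: List[str] = []

    def fit(self, table: Dict[str, List[float]]) -> "VarianceColumnSelector":
        # Remember the names of the columns that pass, not their positions.
        self.kept_columns = [
            name for name, col in table.items()
            if pvariance(col) > self.threshold
        ]
        return self

    def transform(self, table: Dict[str, List[float]]) -> Dict[str, List[float]]:
        # Select by name, so column order in the new table does not matter.
        return {name: table[name] for name in self.kept_columns}


train = {
    "snp_a": [0.0, 0.0, 0.0, 0.0],   # variance 0
    "snp_b": [0.0, 1.0, 2.0, 1.0],   # variance 0.5
    "snp_c": [1.0, 1.0, 1.0, 2.0],   # variance 0.1875
}
selector = VarianceColumnSelector(threshold=0.2).fit(train)

new_table = {"snp_c": [1.0, 2.0], "snp_b": [2.0, 2.0], "snp_a": [0.0, 1.0]}
selected = selector.transform(new_table)
```

Selecting by name rather than by position is what allows the fitted selector to be applied to a table whose columns arrive in a different order.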
get_indices_ncv(k_outer, k_inner, which_outer_test, which_inner_val)
Get indices of fragments for NCV.
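For readers unfamiliar with nested cross-validation (NCV), the sketch below shows one common way to derive train, validation, and test indices from an outer and an inner fold number. It is a generic illustration and not the scheme used by `get_indices_ncv`: the function `nested_cv_split`, its extra `n_samples` argument, and the interleaved fold assignment are all assumptions.

```python
from typing import List, Tuple


def split_into_folds(indices: List[int], k: int) -> List[List[int]]:
    # Interleaved assignment: element j goes to fold j % k.
    return [indices[i::k] for i in range(k)]


def nested_cv_split(
    n_samples: int,
    k_outer: int,
    k_inner: int,
    which_outer_test: int,
    which_inner_val: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper returning (train, val, test) sample indices."""
    outer = split_into_folds(list(range(n_samples)), k_outer)
    test = outer[which_outer_test]
    # Everything outside the outer test fold is split again for the inner loop.
    remaining = [i for f, fold in enumerate(outer) if f != which_outer_test for i in fold]
    inner = split_into_folds(remaining, k_inner)
    val = inner[which_inner_val]
    train = [i for f, fold in enumerate(inner) if f != which_inner_val for i in fold]
    return train, val, test


train_idx, val_idx, test_idx = nested_cv_split(
    n_samples=10, k_outer=5, k_inner=2, which_outer_test=0, which_inner_val=1
)
```

The three index lists are disjoint and together cover every sample, which is the property any NCV split needs to preserve.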
idx_convert(indices, onehot_bits=const.default.snp_onehot_bits)
Convert the indices to the corresponding indices in the one-hot vector.
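One plausible reading of this conversion is that each SNP occupies `onehot_bits` consecutive positions in the flattened one-hot vector, so SNP index `i` maps to positions `i * onehot_bits` through `i * onehot_bits + onehot_bits - 1`. The sketch below encodes that assumption; the helper name `expand_to_onehot_indices` is hypothetical, and the layout actually used by `idx_convert` may differ.

```python
from typing import List


def expand_to_onehot_indices(indices: List[int], onehot_bits: int) -> List[int]:
    """Hypothetical helper: map SNP indices to positions in a flattened
    one-hot vector, assuming each SNP takes `onehot_bits` adjacent slots."""
    return [i * onehot_bits + b for i in indices for b in range(onehot_bits)]


positions = expand_to_onehot_indices([0, 2], onehot_bits=3)
```

With three bits per SNP, SNPs 0 and 2 map to positions 0 to 2 and 6 to 8 respectively.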
intersect_lists(lists, get_indices=True, to_sorted=True)
Find the shared elements between multiple lists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lists | List[List[Any]] | A list of lists. | required |
get_indices | bool | Whether to return the indices of the shared elements in each list. | True |
to_sorted | bool | Whether to sort the shared elements. | True |
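The behavior described by these parameters can be approximated with the sketch below. It is an illustration under assumptions, not the library's code: the name `shared_elements`, the `(shared, indices)` return shape, and the choice to report the first position of each shared element are all hypothetical, and elements must be hashable.

```python
from typing import Any, List, Tuple


def shared_elements(
    lists: List[List[Any]],
    get_indices: bool = True,
    to_sorted: bool = True,
) -> Tuple[List[Any], List[List[int]]]:
    """Hypothetical helper: elements present in every list, plus their positions."""
    if not lists:
        return [], []
    common = set(lists[0]).intersection(*lists[1:])
    if to_sorted:
        shared = sorted(common)
    else:
        # Keep the order of first appearance in the first list.
        shared, seen = [], set()
        for item in lists[0]:
            if item in common and item not in seen:
                shared.append(item)
                seen.add(item)
    # For each input list, the position of every shared element (first match).
    indices = [[lst.index(x) for x in shared] for lst in lists] if get_indices else []
    return shared, indices


samples = [["s3", "s1", "s2"], ["s2", "s3", "s9"], ["s3", "s2"]]
shared, idx = shared_elements(samples)
unsorted_shared, _ = shared_elements(samples, get_indices=False, to_sorted=False)
```

The per-list indices make it possible to reorder several data tables so that their shared samples line up row by row.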
onehot_encode_snp_mat(snp_matrix, onehot_bits=None, genes_snps=None)
One-hot encode the SNP matrix.
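As a rough illustration of one-hot encoding a genotype matrix, the sketch below assumes that genotypes are integer codes in the range `0` to `onehot_bits - 1` and that each row is a sample. Both are assumptions, as is the helper name `onehot_encode_genotypes`; the real function also accepts `genes_snps` and may handle missing values or other codings that are not covered here.

```python
from typing import List


def onehot_encode_genotypes(
    snp_matrix: List[List[int]], onehot_bits: int
) -> List[List[int]]:
    """Hypothetical helper: expand each genotype code into `onehot_bits`
    binary values and concatenate them per sample."""
    encoded = []
    for row in snp_matrix:
        flat: List[int] = []
        for genotype in row:
            if not 0 <= genotype < onehot_bits:
                raise ValueError(f"genotype code out of range: {genotype}")
            bits = [0] * onehot_bits
            bits[genotype] = 1
            flat.extend(bits)
        encoded.append(flat)
    return encoded


encoded = onehot_encode_genotypes([[0, 2], [1, 1]], onehot_bits=3)
```

Each sample's row grows from one value per SNP to `onehot_bits` values per SNP, so a matrix with two SNPs and three bits yields rows of length six.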
read_labels(path_label, col2use=None)
Read labels from a csv file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path_label | str | Path to the labels file. | required |
col2use | Optional[List[Any]] | A list of column names or indices. If | None |
read_omics(data_path)
Read omics data from various formats.
read_omics_xoxi(data_path, which_outer_test, which_inner_val, trnvaltst=const.abbr_train, file_ext=None, prefix=None)
Read processed data from a directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_path | str | Path to the directory containing the data. | required |
which_outer_test | int | Which outer test set to read. | required |
which_inner_val | int | Which inner validation set to read. | required |
trnvaltst | str | The abbreviation of the training/validation/test set. | abbr_train |
file_ext | Optional[str] | The file extension of the data files. If None, the default extension will be used. | None |
prefix | Optional[str] | The prefix of the file name. (Optional) | None |
read_pkl_gv(path_pkl)
Read processed VCF data from a pickle file.
train_model(model, datamodule, es_patience, max_epochs, min_epochs, log_dir, devices=const.default.devices, accelerator=const.default.accelerator, in_dev=False)
Fit the model.