byconaut¶
The byconaut package contains scripts for processing data for, and based on, the bycon package. The main use cases are:
- generation of utility collections for the standard Progenetix data model
  - collations
  - frequencymaps
    - provide binned CNV frequency values for samples belonging to a given collation code
- I/O & transformations for bycon generated files
Installation¶
byconaut depends on the bycon package, which can be downloaded from its repository. Please see the repository and the corresponding documentation site.
While installation through pip is also possible (pip3 install bycon), this will not include the local configuration files necessary, e.g., for processing the databases.
Test with examplez database from ¶
- download
- unpack somewhere & restore with (your paths etc.):
mongosh examplez --eval 'db.dropDatabase()'
mongorestore --db $database .../mongodump/examplez/
- proceed with step 4 ... below
Create your own databases¶
Core Data¶
A basic setup for a Beacon compatible database - as supported by the bycon package - consists of the core data collections mirroring the Beacon default data model:
- variants
- analyses (which covers parameters from both Beacon analysis and run entity schemas)
- biosamples
- individuals
Databases are implemented in an existing MongoDB setup by importing data from tab-delimited files, using the utility applications contained in the importers directory. In principle, only 2 import files are needed for inserting and updating records:
* a file for the non-variant metadata [1] with specific header values, where, as the absolute minimum, id values for the different entities have to be provided
* a file for genomic variants, again with specific headers but also containing the upstream ids for the corresponding analysis, biosample and individual
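The link between the two files can be illustrated with a small consistency check. This is a minimal sketch and not part of byconaut: the helper names `read_tsv` and `find_orphan_variants` are hypothetical, the variant rows are reduced to their id columns, and it is assumed that the variant file uses the same `individual_id`, `biosample_id` and `analysis_id` column names as the metadata file. The rows for `BRCA-patient-009` are invented purely to trigger the check.

```python
import csv
import io

# Assumed id column names, shared by both files in this sketch.
ID_COLUMNS = ("individual_id", "biosample_id", "analysis_id")


def read_tsv(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_orphan_variants(metadata_rows, variant_rows):
    """Return variant rows whose upstream id triple is absent from the metadata."""
    known = {tuple(row[c] for c in ID_COLUMNS) for row in metadata_rows}
    return [v for v in variant_rows if tuple(v[c] for c in ID_COLUMNS) not in known]


# Illustrative in-memory data; a real variant file carries more columns.
metadata_text = (
    "individual_id\tbiosample_id\tanalysis_id\n"
    "BRCA-patient-001\tbrca-001\tbrca-001-cnv\n"
)
variant_text = (
    "individual_id\tbiosample_id\tanalysis_id\n"
    "BRCA-patient-001\tbrca-001\tbrca-001-cnv\n"
    "BRCA-patient-009\tbrca-009\tbrca-009-cnv\n"
)

orphans = find_orphan_variants(read_tsv(metadata_text), read_tsv(variant_text))
print([v["analysis_id"] for v in orphans])
```

Running such a check before an import would flag variants that point to analyses, biosamples or individuals missing from the metadata file.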
Examples¶
Minimal metadata file¶
individual_id biosample_id analysis_id
BRCA-patient-001 brca-001 brca-001-cnv
BRCA-patient-001 brca-001 brca-001-snv
BRCA-patient-002 brca-002 brca-002-cnv
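A file like the one above can be read with the Python standard library. The following is a minimal sketch, not the importer shipped with byconaut: the function names `load_metadata` and `analyses_by_biosample` are hypothetical, and the columns are assumed to be separated by tabs, in line with the tab-delimited import files described earlier.

```python
import csv
import io
from collections import defaultdict

# The three id columns of the minimal metadata example.
REQUIRED_COLUMNS = {"individual_id", "biosample_id", "analysis_id"}

METADATA_TSV = (
    "individual_id\tbiosample_id\tanalysis_id\n"
    "BRCA-patient-001\tbrca-001\tbrca-001-cnv\n"
    "BRCA-patient-001\tbrca-001\tbrca-001-snv\n"
    "BRCA-patient-002\tbrca-002\tbrca-002-cnv\n"
)


def load_metadata(text):
    """Parse tab-delimited metadata and verify that all id columns exist."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return list(reader)


def analyses_by_biosample(rows):
    """Group analysis ids under the biosample they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["biosample_id"]].append(row["analysis_id"])
    return dict(grouped)


rows = load_metadata(METADATA_TSV)
grouped = analyses_by_biosample(rows)
print(grouped)
```

The grouping shows how one biosample (`brca-001`) can carry several analyses, here one CNV and one SNV analysis, while each row still names its individual.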
Variant file¶
Further and optional procedures¶
1. Create database and variants collection
2. Update the local bycon installation for your database information and local parameters
   - database name(s)
   - filter_definitions for parameter mapping
3. Create metadata collections - analyses, biosamples and individuals
4. Create statusmaps and CNV statistics for the analyses collection
   - only relevant for CNV database use cases
5. Create the collations collection which uses filter_definitions and the corresponding values to aggregate information for query matching, term expansion ...
6. Create frequencymaps for binned CNV data
   - relies on existence of statusmaps in analyses and collations
   - only needed for CNV data
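The idea behind binned CNV frequencies can be sketched as follows. This is an illustration under stated assumptions, not the bycon or byconaut implementation: the function `bin_frequencies` is hypothetical, and each status map is assumed to be a simple list with one entry per genomic bin holding `"gain"`, `"loss"` or `None`. The actual statusmaps format in the analyses collection may differ.

```python
def bin_frequencies(status_maps):
    """Return per-bin gain and loss percentages across a set of analyses."""
    if not status_maps:
        return {"gain": [], "loss": []}
    n_bins = len(status_maps[0])
    if any(len(m) != n_bins for m in status_maps):
        raise ValueError("all status maps must cover the same number of bins")
    total = len(status_maps)
    # Percentage of analyses with the given status in each bin.
    gain = [100 * sum(m[i] == "gain" for m in status_maps) / total for i in range(n_bins)]
    loss = [100 * sum(m[i] == "loss" for m in status_maps) / total for i in range(n_bins)]
    return {"gain": gain, "loss": loss}


# Four hypothetical analyses of one collation, each with four bins.
maps = [
    ["gain", None, "loss", None],
    ["gain", None, None, None],
    [None, None, "loss", "loss"],
    ["gain", "loss", "loss", None],
]
freqs = bin_frequencies(maps)
print(freqs["gain"])  # [75.0, 0.0, 0.0, 0.0]
print(freqs["loss"])  # [0.0, 25.0, 75.0, 25.0]
```

In this picture, a frequency map for a collation code is the result of applying such an aggregation to the status maps of all analyses assigned to that code.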
Data maintenance scripts¶
Please see the helper apps documentation.
[1] Metadata in biomedical genomics is "everything but the sequence variation".