Key to insitro’s approach to rethinking drug development is the production of high-quality genomic data and its integration with patient genotypes and molecular phenotypes. As a Genomic Data Scientist, you will lead the development and application of cutting-edge approaches and workflows to analyze multi-dimensional, multi-modality genomic data. You will work with high-quality functional genomic data such as bulk and single-cell RNA-seq and ATAC-seq of cells from diverse genetic backgrounds under multiple genetic and chemical perturbations. You will also analyze whole-genome and whole-exome sequencing data to characterize patients and cell lines. You will play a key role in helping develop and refine novel functional genomics assays and sequencing data pipelines. You will work closely with a cross-functional team of life scientists, bioengineers and machine learning scientists to design and analyze our in-house experiments to collect high-throughput in vitro genomic data. You will also analyze human-level data from clinical trials (including genetics, transcriptomics and pathology) and integrate it with our in-house genomic data to identify therapeutic targets and develop drugs that have high efficacy and low toxicity.
You will be joining the founding team of a biotech startup that has long-term stability due to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
M.S. or Ph.D. in computational biology, genetics or a related discipline, or equivalent practical experience
3-5+ years of demonstrated experience using and developing cutting-edge methods for analyzing NGS datasets, including both variant calling from WGS/WES and extracting signal from functional genomic assays (RNA/DNase/ATAC/ChIP-seq, etc.)
Experience with modeling sequencing artifacts (e.g. GC content, fragment length bias, overdispersion) and interpretation of QC measurements to guide assay development
Significant expertise with NGS data processing tools (samtools, GATK, IGV, etc.)
Experience working with large numbers of samples and modern workflow management frameworks (Snakemake, Cromwell/CWL/WDL, Nextflow, etc.) running on either HPC systems (e.g. using SLURM) or on cloud services (e.g. AWS or GCP)
Strong fundamentals in applied multivariate statistics
Proficiency in working with large-scale datasets in Linux/Bash and Python; experience with R, C/C++ or other compiled, statically typed languages is a plus
Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
Passion for making a difference in the world
Nice to Have
Experience working with genetic data, genotype imputation, and LD expansion
Domain knowledge in immunological, neurological or metabolic disorders
Experience working with single-cell sequencing analysis
Experience with quantitative trait loci (QTL) analysis
Experience developing or helping develop novel genomic assays
Experience with database languages (e.g. SQL) and with version control practices and tools (e.g. Git)
Benefits at insitro