(Senior) Genomic Data Scientist - Computational Biologist

Computing · South San Francisco, California
Department Computing
Employment Type Full-Time

The Opportunity

Key to insitro’s approach to rethinking drug development is the use of cutting-edge machine learning to model the space of cell states in health and disease. To enable that, we produce high-quality, high-throughput genomic data from iPSC-derived cellular disease models under genetic and chemical perturbation. We integrate this data with patient genotypes and clinical and molecular phenotypes to identify molecular targets for impactful therapeutics.


As a Genomic Data Scientist, you will develop and apply cutting-edge methods to analyze multi-dimensional, multi-modality genomic data to uncover new disease biology. You will work with high-quality functional genomic data, such as single-cell RNA-seq, from cells of diverse genetic backgrounds under multiple genetic and chemical perturbations. You will work closely with a cross-functional team of life scientists, bioengineers and machine learning scientists to design and analyze in-house experiments. You will integrate genomic data with other data modalities, such as microscopy and genetics, to investigate iPSC-derived cellular models of various diseases. Finally, you will analyze data from human cohorts and combine it with our in-house results to identify therapeutic targets and develop drugs that have high efficacy and low toxicity.


You will be joining an agile, fast-growing biotech startup that has long-term stability due to significant funding. You will have ample opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!


About You

  • M.S. or Ph.D. in computational biology, genetics or a related discipline, or equivalent practical experience
  • 3-5+ years of demonstrated experience using and developing cutting-edge methods for analyzing NGS datasets, especially from single-cell RNA sequencing-based assays
  • Strong fundamentals in applied multivariate statistics
  • Strong programming skills in Python, or strong programming skills in R together with experience in Python
  • Interest in uncovering novel disease biology
  • Experience working in cloud-based computing environments (especially AWS) or HPC systems (e.g. using SLURM)
  • Experience with modeling sequencing artifacts (e.g. GC content, fragment length bias, overdispersion) and interpretation of QC measurements to guide assay development
  • Proficiency in working with large-scale datasets in Linux/Bash and experience working with large numbers of samples and modern workflow management frameworks (Snakemake, Cromwell/CWL/WDL, Nextflow, etc.)
  • Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions in a fast-paced startup environment
  • Passion for providing better medicine to patients in need


Nice to Have

  • Experience working with CRISPR-based assays and in drug discovery
  • Expertise in deep learning modeling and computer vision; familiarity with common deep learning toolkits such as TensorFlow, PyTorch and Keras
  • Expertise with NGS data processing tools (samtools, GATK, IGV, etc.)
  • Experience analyzing various NGS datasets, such as variant calling from WGS/WES and extracting signal from functional genomic assays (RNA/DNase/ATAC/ChIP-seq, etc.)
  • Experience working with genetic data, genotype imputation, LD expansion and quantitative trait loci (QTL) analysis
  • Domain knowledge in immunological, neurological or metabolic disorders
  • Experience with version control practices and tools (e.g. Git)
  • Experience developing or helping develop novel genomic assays
  • Experience with C/C++ or other compiled, statically typed languages
  • Experience with database languages (e.g. SQL)


Benefits at insitro

  • Excellent medical, dental, and vision coverage
  • Open vacation policy
  • Team lunches (catered daily)
  • Commuter benefits
  • Paid parental leave


About insitro

insitro is a data-driven drug discovery and development company using machine learning and high-throughput biology to transform the way that drugs are discovered and delivered to patients. The company is applying state-of-the-art technologies from bioengineering to create massive data sets that enable the power of modern machine learning methods to be brought to bear on key bottlenecks in pharmaceutical R&D. The resulting predictive models are used to accelerate target selection, to design and develop effective therapeutics, and to inform clinical strategy. insitro was launched in 2018 with a Series A of $100M funded by top investors including a16z, Arch Venture Partners, Foresite Capital, GV, and Third Rock Ventures.

The company has announced collaborations with Gilead Sciences in the area of NASH (2019) and Bristol Myers Squibb in the area of ALS (2020). In mid-2020, it completed a Series B financing of $143M with participation from current investors and new investors Canada Pension Plan Investment Board (CPP Investments), T. Rowe Price, BlackRock, Casdin Capital and other leading investors. The company is located in South San Francisco, CA. For more information about insitro, please visit the company’s website at www.insitro.com
