Data Engineer

Data Engineering · South San Francisco, California
Department Data Engineering
Employment Type Full-Time
Minimum Experience Experienced

The Opportunity

Data Engineering plays a key role in insitro’s approach to rethinking drug development. Our team is responsible for ensuring that our biological data factory’s robots and instruments produce high-quality data; optimizing storage, queries, and analysis of petabytes of scientific experimental results; and building the infrastructure to train powerful models that solve key problems in the drug development process. You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to identify areas where data engineering can make a difference, developing data architectures and systems on cutting-edge, high-throughput platforms that enable our scientists to be maximally productive. You will design, implement, and deploy novel methods that use a broad spectrum of data engineering approaches, including techniques at the forefront of the field. You will work as part of a team to rigorously design our data platform, identify key architectural performance improvements, and support ongoing discovery and automation platforms.

You will be joining the founding team of a biotech startup that has long-term stability thanks to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!


About You

  • BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience
  • Expertise in one or more general-purpose programming languages (such as Python, C/C++, or Go) 
  • Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs)
  • Familiarity with cloud computing services (AWS or GCP)
  • Familiarity with database technologies, data pipelines, workflow engines, and distributed computing technologies (Spark, Hadoop, etc.)
  • Familiarity with web services and application frameworks (Django, Flask)
  • Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
  • Proficiency in a Linux environment (including shell scripting), experience with database languages (e.g., SQL, NoSQL), and experience with version control practices and tools (Git, Mercurial, etc.)
  • Passion for making a difference in the world


Nice to Have

  • Experience with biological data (DNA sequences, RNAseq, proteomics, microscopy images, etc.) 
  • Experience with medium-sized data sets (100TB+)
  • Experience with the SciPy/PyData ecosystem (NumPy, pandas, SciPy, Dask, etc.)
  • Demonstrated ability to develop novel data engineering methods that go beyond assembling existing code, and to apply problem-solving skills to complex issues
  • 4+ years of real-world work experience in software development for high-end data processing engines


Benefits at insitro

  • Excellent medical, dental, and vision coverage
  • Open vacation policy
  • Team lunches (catered daily)
  • Commuter benefits
  • Paid parental leave



About insitro

insitro is a data-driven drug discovery and development company using machine learning and high-throughput biology to transform the way that drugs are discovered and delivered to patients. The company is applying state-of-the-art technologies from bioengineering to create massive data sets that enable the power of modern machine learning methods to be brought to bear on key bottlenecks in pharmaceutical R&D. The resulting predictive models are used to accelerate target selection, to design and develop effective therapeutics, and to inform clinical strategy. insitro was launched in 2018 with a Series A of $100M funded by top investors including a16z, Arch Venture Partners, Foresite Capital, GV, and Third Rock Ventures. In 2019, the company announced a collaboration with Gilead Sciences in the area of NASH and, in mid-2020, announced a Series B financing of $143M including current investors and new investors Canada Pension Plan Investment Board (CPP Investments), T. Rowe Price, BlackRock, Casdin Capital, and other leading investors. The company is located in South San Francisco, CA. For more information about insitro, please visit the company’s website at www.
