Ginkgo Bioworks Launches Model API to Democratize Access to Biological AI Models

by Roman Kasianov       News


  
Topics: AI & Digital   

Ginkgo Bioworks has announced the launch of its model API, a tool aimed at making biological AI models accessible to researchers, developers, and machine learning scientists. This API, developed in partnership with Google Cloud, provides a new way to interact with models trained on Ginkgo’s proprietary data, including complex protein and DNA sequences.

Bringing AI Models to the Biological Research Community

The new model API is designed to be programmer-friendly and cost-effective. Users can access Ginkgo's AI tools directly through the company's website, and Ginkgo plans to make the models available in Google’s Model Garden as well.

The API provides a scalable platform for working with models trained on large datasets, beginning with Ginkgo’s first release: AA-0, a machine learning model trained on over 2 billion protein sequences from Ginkgo’s proprietary data. Leveraging Ginkgo’s expertise in synthetic biology, AA-0 is tailored for tasks such as iterative protein design and feature extraction for clustering algorithms.

Understanding the Technology: How the Model API Works

Ginkgo’s API includes capabilities such as Masked Language Modeling and Embedding Calculation:

  1. Masked Language Modeling: Given a sequence of amino acids with missing segments indicated by mask tokens, the model predicts and fills in the gaps, allowing for precise protein sequence generation.

  2. Embedding Calculation: This function computes numerical representations (embeddings) of protein sequences from the model's final hidden layer. These embeddings can be used as input features for downstream tasks such as clustering and classification.
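
To make the two capabilities concrete, here is a minimal, self-contained Python sketch of the data handling around them. It does not call Ginkgo's API, whose endpoints and client library are not described in this article. The `<mask>` token, the helper names, and the mean-pooling step are all assumptions made for illustration, and the "predictions" are hard-coded stand-ins for what a model would return.

```python
from math import sqrt

MASK = "<mask>"  # assumed mask token; the real API's token may differ


def mask_positions(sequence, positions):
    """Replace residues at the given 0-based positions with the mask token."""
    hidden = set(positions)
    return [MASK if i in hidden else aa for i, aa in enumerate(sequence)]


def fill_masks(tokens, predictions):
    """Fill each mask token, left to right, with the next predicted residue."""
    remaining = iter(predictions)
    return "".join(next(remaining) if t == MASK else t for t in tokens)


def mean_pool(hidden_states):
    """Average per-residue vectors into one fixed-length sequence embedding.

    Mean pooling is one common choice, assumed here for illustration.
    """
    n = len(hidden_states)
    return [sum(column) / n for column in zip(*hidden_states)]


def cosine_similarity(a, b):
    """Similarity between two embeddings, usable as a clustering signal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hide two residues of a toy sequence, then restore them from stand-in predictions.
masked = mask_positions("MKTAYIAK", [2, 5])
restored = fill_masks(masked, ["T", "I"])

# Pool toy per-residue vectors (not real model output) into one embedding.
embedding = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

In practice, the masked token list would be sent to a masked language model and the pooled vectors would come from its hidden states. The helpers above only show the shape of the inputs and outputs involved.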

By making these models available, Ginkgo hopes to empower the scientific community to develop new tools and applications, such as designing novel proteins or optimizing research pipelines. This is part of Ginkgo's broader mission to make biology easier to engineer, opening up advanced machine learning tools to a wider audience and promoting innovation in biological research.

Impact on Drug Discovery and Beyond

The launch of Ginkgo’s model API could accelerate work in drug discovery, synthetic biology, and genomics. By using AI models to analyze protein sequences, researchers may be able to streamline steps such as lead identification and optimization, potentially bringing treatments to market faster.

The availability of models trained on Ginkgo’s proprietary data may offer a competitive edge, enabling companies to uncover hidden patterns and therapeutic targets that might remain elusive with public datasets alone.

The platform offers access to both Ginkgo’s proprietary models and publicly available ones, such as ESM2, allowing users to compare approaches within a single platform. This flexibility is intended to serve diverse research needs and encourage experimentation without the barrier of high costs.

Accessibility and Competitive Pricing

Ginkgo is committed to making these advanced tools accessible and affordable. The API comes with a competitive pricing structure and a free tier, which includes 2,000 sequences (about 1 million tokens) of free inference on the initial language model.

The introductory pricing is set at approximately $0.18 per million tokens, so predictions on around 2,000 protein sequences (about 1 million tokens) cost roughly 20 cents, making the API a cost-effective option for small-scale and exploratory research projects.
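
The arithmetic behind these figures can be sketched in a few lines of Python. The rate and the free-tier size come from the numbers quoted above. The average of 500 tokens per sequence is derived from "2,000 sequences, about 1 million tokens", and both the function name and the way the free allowance is subtracted are assumptions for illustration, not Ginkgo's actual billing logic.

```python
PRICE_PER_MILLION_TOKENS = 0.18  # USD, introductory rate quoted in the article
FREE_TIER_TOKENS = 1_000_000     # about 2,000 sequences, per the article


def estimate_cost(num_sequences, avg_tokens_per_sequence=500, free_tokens=0):
    """Estimate inference cost in USD for a batch of sequences.

    avg_tokens_per_sequence=500 is inferred from the article's figures;
    real sequences vary in length. Subtracting free_tokens up front is an
    assumed billing model.
    """
    total_tokens = num_sequences * avg_tokens_per_sequence
    billable_tokens = max(total_tokens - free_tokens, 0)
    return billable_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# 2,000 sequences at the paid rate: about 18 cents, i.e. roughly 20 cents.
paid_batch = estimate_cost(2_000)

# The same batch is fully covered if the free tier has not been used yet.
free_batch = estimate_cost(2_000, free_tokens=FREE_TIER_TOKENS)

# A larger batch of 10,000 sequences after applying the free allowance.
larger_batch = estimate_cost(10_000, free_tokens=FREE_TIER_TOKENS)
```

Under these assumptions, a batch five times the size of the free tier would still cost well under a dollar.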
