Databricks offers automation throughout the machine learning lifecycle
The Databricks Unified Analytics Platform now offers automation and augmentation throughout the machine learning lifecycle.
The broader augmented analytics offering not only automates machine learning model building, but also extends to automated data preparation and model deployment. The new automated machine learning (AutoML) capabilities empower expert and citizen data scientists alike.
“Gartner predicts by 2020, more than 40% of data science tasks will be automated, resulting in increased productivity and broader use by citizen data scientists.” To accelerate this automation and help data science teams deliver value to their business, Databricks’ Unified Analytics Platform uses machine learning to augment data preparation, visualization, feature engineering, hyperparameter tuning, model search, automatic model tracking, reproducibility, and deployment. Centered on an integration with the open source framework MLflow, this AutoML offering enables citizen data scientists, not just experts, to augment their data science and machine learning workflows at scale.
“Data scientists and machine learning engineers are continuously looking for ways to accelerate and scale their machine learning initiatives,” said Adam Conway, vice president of product management at Databricks. “By introducing the concept of ‘low-code’ and ‘no-code’, AutoML represents a fundamental shift in the way organizations approach machine learning and data science. With the right automation, AutoML can dramatically shorten time-to-value for data science teams.”
This offering provides AutoML capabilities at different levels of control and automation:
AutoML Toolkit: An automated end-to-end machine learning pipeline, including feature engineering, model search, and deployment, available via Databricks Labs custom solutions. AutoML Toolkit executions are automatically tracked in MLflow.
Automated Model Search: Optimized and distributed conditional hyperparameter search with enhanced Hyperopt and automated tracking to MLflow.
Automated Hyperparameter Tuning: Optimized and distributed hyperparameter search with enhanced Hyperopt and automated tracking to MLflow. Deep integration with PySpark MLlib’s Cross Validation to automatically track MLlib experiments in MLflow.
Integration with Azure Machine Learning: Building upon the open source MLflow collaboration between Databricks and Microsoft announced in April, this integration allows customers access to the automated machine learning capabilities offered by Azure Machine Learning.
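To make the idea of a conditional hyperparameter search with tracked trials more concrete, here is a minimal sketch in plain Python. It is not the Databricks, Hyperopt, or MLflow API: it swaps in a simple random search for Hyperopt's optimizer, uses an in-memory list in place of MLflow tracking, and scores configurations with a toy loss rather than a trained model. All names (`SEARCH_SPACE`, `sample_config`, `toy_loss`, `random_search`) and the value ranges are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical conditional search space: each model family has its own
# hyperparameters, so the second-level draw depends on the first-level choice.
SEARCH_SPACE = {
    "ridge": {"alpha": (0.01, 10.0)},
    "forest": {"max_depth": (2, 12), "n_trees": (10, 200)},
}


def sample_config(rng):
    """Pick a model family, then sample only that family's hyperparameters."""
    family = rng.choice(sorted(SEARCH_SPACE))
    params = {}
    for name, (low, high) in SEARCH_SPACE[family].items():
        if isinstance(low, int):
            params[name] = rng.randint(low, high)   # integer-valued range
        else:
            params[name] = rng.uniform(low, high)   # continuous range
    return {"family": family, "params": params}


def toy_loss(config):
    """Stand-in for a validation loss; a real run would train and score a model."""
    p = config["params"]
    if config["family"] == "ridge":
        return (p["alpha"] - 1.0) ** 2
    return abs(p["max_depth"] - 6) + abs(p["n_trees"] - 100) / 100.0


def random_search(n_trials, seed=0):
    """Run n_trials draws, record every trial, and return (best, all_trials)."""
    rng = random.Random(seed)
    trials = []  # assumption: an in-memory log standing in for experiment tracking
    for i in range(n_trials):
        config = sample_config(rng)
        trials.append({"trial": i, "config": config, "loss": toy_loss(config)})
    best = min(trials, key=lambda t: t["loss"])
    return best, trials


best, trials = random_search(n_trials=50)
print(best["config"]["family"], round(best["loss"], 4))
```

The point of the sketch is the shape of the workflow: the model family is itself a searchable choice, each family exposes only its own hyperparameters, and every trial is recorded so that the best run can be identified and reproduced later.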