MLOps 101: Why Your Business Needs It and How to Get Started
Artificial intelligence (AI) recently got a boost with the release of ChatGPT from OpenAI. Now, everyone’s talking about the potential of AI to transform enterprise processes across business functions.
Less often discussed is what it takes to move from developing AI models to productionizing them with machine learning (ML) to deliver significant enterprise value. To date, data scientists have spent much of their time preparing and managing data used in AI models, a slow, costly process that has limited machine learning gains. In addition, many ML models have failed to make it out of development into production.
Enterprises that set up MLOps disciplines can productionize and scale ML models, gaining new advantages such as improved forecasting and operational efficiencies. As a result, there’s never been a better time for enterprises to evolve their MLOps capabilities to move faster with AI and leverage it for more use cases.
To date, just 50 percent of enterprises have adopted AI in at least one business unit, deploying an average of 3.8 capabilities. That journey can accelerate with MLOps.
A machine learning (ML) model is a computer program that has been trained to recognize patterns. By doing so, it can then make predictions about what will happen next.
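As a minimal sketch of what "trained to recognize patterns" can mean, the example below fits a straight line to a few past observations and uses it to predict the next one. The sales figures and the function names `fit_line` and `predict` are hypothetical, and real ML models are usually far more complex than a single line.

```python
from statistics import mean


def fit_line(xs, ys):
    """Learn a slope and intercept from example (x, y) pairs by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: units sold in weeks 1 through 5.
weeks = [1, 2, 3, 4, 5]
units = [10, 12, 14, 16, 18]

model = fit_line(weeks, units)   # "training": recognize the pattern
forecast = predict(model, 6)     # "prediction": estimate what happens next
```

Here the model learns that sales rise by two units per week, so it forecasts 20 units for week 6.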
Machine learning operations (MLOps) combines data engineering, machine learning, and DevOps into a single discipline. MLOps encompasses the skills, frameworks, technologies, and best practices that equip data engineering, data science, and IT teams to industrialize ML models and evolve processes over time. MLOps also integrates data sources and data sets, an AI model repository, an automated ML platform, and software containers.
“MLOps is the key to making machine learning projects successful at scale,” says John 'JG' Chirapurath, VP of Azure at Microsoft.
What does MLOps mean? MLOps is the short form of the phrase “machine learning and information technology operations.”
IT innovation and operations have gotten more complicated with the industry trend of naming every discipline “ops.” Let’s explore the difference between three similar-sounding terms: DevOps, AIOps, and MLOps.
Development operations (DevOps) is the practice of using Agile, lean processes to increase the pace of IT development, while taking a systems approach. Organizations using DevOps seek to create a strong development culture, use automation to streamline processes, and provide iterative value with each code release.
Gartner coined the term AIOps, short for AI operations. According to the analyst firm, “AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” Or to put it more simply, AIOps automates IT processes to make them more efficient.
Machine learning operations, or MLOps, standardizes the process of developing, deploying, and maintaining ML models. MLOps seeks to industrialize or productionize these capabilities, so that teams can progress from deploying ML models using bespoke processes to ones built for scale.
The global MLOps market size was valued at $983.6 million in 2021, but is growing at a heady CAGR of 37.5%. As a result, it’s expected to reach $23.1 billion by 2031, according to Allied Market Research. That’s because enterprises are investing in developing MLOps architectures, deploying MLOps tools and platforms, and leveraging MLOps best practices to rapidly evolve their competency with this important emerging discipline.
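The growth figures above can be sanity-checked with the standard compound-growth formula. The sketch below assumes ten full years of compounding from 2021 to 2031; the source does not state the exact forecast window, so the helper names and the ten-year assumption are illustrative only.

```python
def project(value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


start_musd = 983.6    # 2021 market size in millions of USD (from the text)
end_musd = 23_100.0   # 2031 forecast in millions of USD (from the text)

# Assumption: ten compounding periods between 2021 and 2031.
projected = project(start_musd, 0.375, 10)
rate = implied_cagr(start_musd, end_musd, 10)
```

Under that assumption, compounding at 37.5% gives roughly $23.8 billion, and the rate implied by the two quoted endpoints is about 37.1%. The small gap is consistent with rounding or a slightly different base year in the original report.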
Any organization that uses big-data analytics to drive decision-making and optimize processes needs ML models and machine learning ops. To date, because of the cost and commitment of standing up an MLOps function, this discipline has been embraced by well-funded research organizations, startups, and enterprises.
Organizational leaders should consider investing in MLOps when they are ready to develop and productionize ML models for multiple use cases. Developing a strong MLOps practice can help organizations overcome the following issues:
Enterprises that deploy ML models unlock significant new business advantages, including:
Gaining a repeatable process for ML deployment: As with DevOps, MLOps teams seek to automate as much as possible, creating standardized processes that are easy to reproduce and scale.
Focusing time on activities that generate business value: Teams focus less time on data preparation and more time on ML model deployment and optimization, activities that improve enterprise ML value creation.
Ensuring ML models are deployed: Industry statistics show that up to 90 percent of ML models don’t make it to production, representing wasted investment. With such low success rates, business stakeholders lose interest in funding ML projects. Using standardized processes and automation decreases project risk and error, ensuring that ML models are deployed and realize intended business value.
Increasing operational efficiency: Teams that use machine learning ops become faster at producing and deploying ML models. While many enterprises may currently have a low number of models in production, leaders want to be able to deploy and manage more and more ML use cases.
Reducing ML development costs: Currently, ML models are resource-intensive, requiring large teams of data engineers, data scientists, and IT professionals to develop, deploy, and maintain. Organizations can reduce costs by using standardized processes; cloud-based platforms that can scale with business growth; and partners who provide critical capabilities, such as frameworks, MLOps platforms, and teams to manage ML programs.
Accelerating time to market: Being able to industrialize ML capabilities and launch new models in days to weeks is critical to achieving business transformation goals.
By 2024, IDC predicts that 60% of enterprises will have operationalized their MLOps workflows. Those that move ahead now can create competitive advantage by using data, analytics, and automation to predict and respond to business changes.
So, who’s adopting AI and ML and what have they accomplished thus far with this exciting new discipline?
The worldwide market for AI is growing fast. Sales of AI software, hardware, and services are expected to break the $500 billion mark in 2023.
However, deploying AI and ML is still an emerging discipline. Only about one in three enterprises has adopted clear frameworks for AI governance (38%), follows standard protocols for building and delivering AI tools (36%), or has capability-building programs to enhance team member skills (36%).
Despite challenges, such as insufficient talent and difficulties productionizing ML models, more than half (56 percent) have adopted AI in at least one business function, finds McKinsey.
Machine learning operations helps organizations overcome challenges that have plagued AI and ML initiatives to date. These issues include:
Competing for skilled talent: Enterprises’ appetite to pursue AI and ML programs is creating an enormous demand for data scientists. LinkedIn jobs for data science have skyrocketed 650%. However, only 20% of enterprise leaders believe their data science teams are ready for AI. Enterprises need to hire, develop, and retain data engineers and data scientists, necessitating new cross-skilling and upskilling.
Overcoming data challenges: Enterprises have a wealth of data that’s siloed in repositories and business processes. Migrating, ingesting, preparing and provisioning data on modern cloud platforms like Databricks and Snowflake is a necessary prerequisite to moving forward with AI and ML.
Controlling costs: ML models are costly to deploy in part due to the high cost of talent and lack of repeatable processes. The number of data scientists employed by the average organization grew from 28 to 50 in 2021. With a one-off approach, this expense is incurred with every model.
Machine learning ops enables enterprises to develop scalable frameworks that reduce costs over time. Enterprises need to be able to control costs and achieve efficiencies as they scale models.
ML models can be slow to deploy: According to IDC, deploying AI and ML solutions can take up to nine months on average, by which time the business has almost certainly changed.
“MLOps brings model velocity down to weeks —sometimes days,” says IDC analyst Sriram Subramanian. “Just like the average time to build an application is accelerated with DevOps, this is why you need MLOps.”
Too many ML models fail in production: The ability to use AI and ML across industry challenges will be critical for future success in business. Those organizations that can automate end-to-end processes, predict events, and respond to demand changes, as well as continuously learn from data, will outpace those that can’t. However, the majority of ML models fail in production, representing wasted time and cost. The reasons include losing long-term executive support, choosing manual processes over automation, being unable to motivate functions to change operational processes, or being unable to deploy and manage ML models over the long term.
“Many organizations are constrained by artisanal development and deployment techniques, with star data scientists frequently treated as virtuosos and given considerable creative control.” Source: Deloitte
Machine learning operations provides a framework, standardized processes, and automation and other tools for the entire ML model lifecycle, including development, training, packaging, validation, deployment, monitoring, and evolution.
Ensuring ML models are reproducible: Even the best ML models will be of limited use if they aren’t reproducible. Manual processes often aren’t repeatable, which is another argument for MLOps. MLOps centralizes resources, helps orchestrate the many pipelines required to develop and deploy ML models, and automates critical processes, among other gains.
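One way to make reproducibility concrete is to pin everything that determines a training run (data, parameters, random seed) and record a fingerprint of it. The sketch below is illustrative only: `run_fingerprint` and `train` are hypothetical names, and the "training" step is a stand-in that averages a seeded random sample.

```python
import hashlib
import json
import random


def run_fingerprint(data_rows, params, seed):
    """Hash the inputs that determine a training run's outcome."""
    payload = json.dumps(
        {"data": data_rows, "params": params, "seed": seed}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def train(data_rows, params, seed):
    """Stand-in for real training: the mean of a seeded random sample."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    sample = rng.sample(data_rows, k=params["sample_size"])
    return sum(sample) / len(sample)


data = [3, 5, 8, 13, 21, 34]
params = {"sample_size": 4}

first = train(data, params, seed=42)
second = train(data, params, seed=42)  # same inputs, same result
fingerprint = run_fingerprint(data, params, seed=42)
```

Storing the fingerprint next to each model version lets a team confirm later that a rerun used the same data, parameters, and seed.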
Planning for monitoring and training: ML models and data drift over time, so they need to be closely monitored and trained on new data. After being retrained, models will need to be revalidated and redeployed. So, enterprises need to ensure they have the teams, processes, and automated capabilities in place to move to a continuous improvement model.
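The article does not name a specific drift measure. As one commonly used option, the sketch below computes a population stability index (PSI) between the data a model was trained on and the data it sees in production. The bin count, the `1e-4` floor, and the `RETRAIN_THRESHOLD` value are assumptions chosen for illustration, and the function assumes the training sample is not constant.

```python
import math


def psi(expected, actual, bins=5):
    """Population stability index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins  # assumes hi > lo

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the edge bins.
            counts[max(0, min(idx, bins - 1))] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [x / 10 for x in range(100)]        # values 0.0 to 9.9
unchanged = [x / 10 for x in range(100)]       # same distribution
shifted = [x / 10 + 4 for x in range(100)]     # values 4.0 to 13.9

RETRAIN_THRESHOLD = 0.2  # hypothetical cut-off, not taken from the source
needs_retraining = psi(training, shifted) > RETRAIN_THRESHOLD
```

A monitoring job could run a check like this on a schedule and trigger the retrain, revalidate, and redeploy cycle when the threshold is crossed.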
Scaling ML requires close collaboration: Collaborators want to work together to explore data, conduct model experiments, perform feature engineering, manage models and get them ready for deployment and ongoing monitoring. With an MLOps framework, everyone uses the same processes, ensuring consistency across work.
Ensuring enterprise readiness: Enterprise teams need to lay some important groundwork before they deploy ML models, including ensuring adequate security for data and models; applying governance; and capturing data on decision making and processes for compliance. The top ML model risks include adversarial attacks that cause models to make false predictions; data poisoning, or manipulation; training manipulation; transfer learning attacks; and malicious data extraction from models. In addition, enterprises need a clear audit trail for all ML activities, to demonstrate compliance with relevant requirements and regulations.
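An audit trail for ML activities can take many forms. One possible approach, sketched below, is an append-only log in which each entry's hash covers the previous entry's hash, so later edits become detectable. The field names, actors, and helper functions are hypothetical, and a production system would also need durable storage and access control.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Stable SHA-256 hash of an entry's content."""
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()


def append_event(log, actor, action, detail):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "detail": detail, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "detail", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, "data-scientist", "train", {"model": "churn", "version": 1})
append_event(audit_log, "ml-engineer", "deploy", {"model": "churn", "version": 1})
intact = verify(audit_log)
```

If anyone later changes a recorded detail, `verify` returns `False`, which gives compliance teams a simple integrity check.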
Types of ML Models
[Table comparing three model types (Universal/One fits all; Specific/One fits one; Specific for most/One fits one) on scalability, ease and speed of development and deployment, and cost. Surviving cell values: models can be scaled; high development costs + high deployment costs; medium development costs + low deployment costs; medium to high.]
MLOps spans myriad use cases and strategies for developing models. However, what they all have in common are:
Navigating an exploratory phase: Teams prepare or acquire data, create AI models, test algorithms to see which ones work, and create models that can be productionized.
Productionizing the model: Next, it’s time to reproduce the process of creating the model, by leveraging CI/CD pipelines and training models. After that, teams will freeze the ML pipeline to prepare for deployment.
Deploying the model: Teams will then push the ML model to a centralized store, package it to run in different environments, validate it, and deploy it to the target system/server.
Implementing MLOps: Teams may use manual processes to develop and deploy a few models. However, to improve speed and scale of deployments, they’ll need to automate the ML lifecycle.
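The phases above can be sketched as a small automated pipeline: train a model, push it to a centralized store, validate it, and promote it only if it passes. This is a toy illustration under stated assumptions. `Registry`, `ModelVersion`, `run_pipeline`, and the error threshold are hypothetical, and the "model" simply predicts the mean label.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    weights: tuple
    stage: str = "registered"


@dataclass
class Registry:
    """In-memory stand-in for a centralized model store."""
    versions: list = field(default_factory=list)

    def register(self, name, weights):
        version = 1 + sum(1 for m in self.versions if m.name == name)
        model = ModelVersion(name, version, weights)
        self.versions.append(model)
        return model


def train(rows):
    """Toy training step: learn the mean of the label column."""
    labels = [label for _, label in rows]
    return (sum(labels) / len(labels),)


def validate(model, holdout, max_error):
    """Gate deployment on mean absolute error against held-out rows."""
    (prediction,) = model.weights
    error = sum(abs(label - prediction) for _, label in holdout) / len(holdout)
    return error <= max_error


def run_pipeline(registry, name, rows, holdout, max_error=1.0):
    """Train, register, validate, then promote or reject the model."""
    model = registry.register(name, train(rows))
    model.stage = "production" if validate(model, holdout, max_error) else "rejected"
    return model


registry = Registry()
good = run_pipeline(registry, "demand", [(1, 10.0), (2, 12.0)], [(3, 11.5)])
bad = run_pipeline(registry, "demand", [(1, 10.0), (2, 12.0)], [(3, 40.0)])
```

The first run passes validation and is promoted to production. The second fails the same gate and is rejected, and both versions remain in the registry for traceability.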
Nearly all (88%) of MLOps providers are startups. MLOps startups have already received $3.8 billion and provide data governance, data monitoring, ML monitoring, ML platforms, and serving platforms. Providers include hyperscalers and data platforms (listed below), as well as Tredence, which recently received $175M from private equity firm Advent International to accelerate data-fueled growth and AI value realization for industry companies.
All of the major cloud providers, including AWS, Microsoft Azure, and Google Cloud Platform, provide MLOps tools.
Using MLOps with Databricks: Databricks Machine Learning is a data-native, collaborative and full-lifecycle ML platform that does data engineering work, so that you can focus on building scalable and replicable models. Built natively with MLflow and Delta Lake, two of the world’s most popular open-source projects, Databricks Machine Learning accelerates machine learning efforts all the way from featurization to training, tuning, serving and monitoring.
Using MLOps with Microsoft Azure: Microsoft Azure provides tools that data scientists and IT professionals can use to accelerate the automation, collaboration, and reproducibility of ML workflows. Teams can build MLOps workflows and models with MLflow and Azure Machine Learning, easily deploy highly accurate models anywhere, efficiently manage the entire ML lifecycle, and improve team collaboration. In addition, Microsoft Azure helps implement governance across all ML assets.
Using MLOps with AWS: Amazon SageMaker enables teams to deliver high-performance ML models at scale. SageMaker provides repeatable training workflows to accelerate model development, catalogues ML artifacts centrally to enable model reproducibility and governance, integrates ML workflows with CI/CD pipelines for faster time to production, and continuously monitors data and models in production to maintain quality.
Using MLOps with Google Cloud Platform: Google Cloud Platform provides Vertex AI, a machine learning (ML) platform that empowers teams to train and deploy ML models and AI applications. Teams can use the platform to orchestrate workflows, track metadata, identify the best model for a use case, manage model versions and features, and monitor model quality.
Getting started with MLOps
Here are some general recommendations on capabilities you need to have in place to get started with machine learning ops.
Now, you’re ready to build. Here’s how to develop and deploy an ML model.
How to improve ML model accuracy?
Data scientists use a variety of strategies to improve ML model accuracy. Only highly accurate ML models can be used for critical processes, such as improving decision making. Here are some strategies from Towards Data Science.
[Table: functional responsibilities before MLOps versus responsibilities after MLOps, by team, including data governance teams.]
Many industries are experimenting with MLOps, but just a few have an extensive number of models in production. The software and technology (13.7%) and consumer packaged goods (8.2%) industries lead other verticals in having 100 or more ML models in production.
Source: HBR/Capital One report
Here are some examples of how industry companies are using ML and MLOps to create competitive advantage.
All industries: Analyzing data to uncover new customer needs and product development opportunities. Using insights into operational processes to create new efficiencies.
Consumer packaged goods (CPG): CPG companies need to understand customer demand to produce enough goods, place them in the right locations, and innovate products. By deploying more ML models with MLOps, CPG manufacturers can improve forecasting, focus investments on the right product development opportunities, and dynamically adjust inventory levels and pricing across markets as demand fluctuates. They do this by leveraging reinforcement learning, which seeks to optimize behaviors to improve cumulative rewards, or supervised learning models, which teach models to achieve a desired result.
Tredence partnered with a global CPG company to automate 80 percent of ML model development, deploy 100k+ ML models across 20+ markets, reduce model support costs by over 50 percent, and reduce weekly execution time by 22 percent.
E-commerce: E-commerce companies can use ML models to categorize products correctly and recommend products to customers. This can mean making product recommendations to customers based on their search history, offering related products, or providing reordering prompts. Companies can improve their personalization strategies by using natural language processing (NLP) algorithms that detect customers’ preferences as they interact with their website and app.
Supply chain and logistics: Supply chain and logistics companies can use ML models to determine the right amount of inventory to keep on hand and automate picking and packing, enabling them to offer same-day shipping. They also can optimize warehouse processes, determine the best routes for delivery trucks to take to minimize traffic, and track shipments. MLOps processes bring new visibility and efficiencies to a business that often has razor-thin margins.
Retail: Retailers can use ML to solve a wide array of challenges, such as improving customer segmentation, predicting which customers will churn, cross-selling and upselling goods, detecting and preventing fraud, reducing product returns, optimizing pricing, and more. MLOps enables retailers to manage their businesses strategically, driving profitability.
Tredence helped a global retailer scale its MLOps practice across multiple business functions. The retailer reduced the time to build new models and complete feature engineering by 40 percent and the time to onboard new models by 60 percent.
The 15 percent of companies leading on AI get 3.4X greater returns than laggards.
Tredence offers some compelling advantages to its partners:
We are a data analytics company: Many companies offer data analytics as one of many services. We focus deeply and exclusively on data analytics, which gives us unique insights into enterprises’ pain points with data and the problems they want to solve.
We offer a centralized monitoring tool to industrialize processes: Our MLOps platform, MLWorks, provides enterprises with an ML framework, including automated workflows, pre-built accelerators, and a centralized monitoring tool for MLOps observability. Using MLWorks makes model management simpler and more accessible for teams.
Leverage AI/ML accelerators: Tredence offers ATOM.AI, an accelerator ecosystem with pre-built ML models, a standardized data architecture, implementable use cases, and automated infrastructure provisioning solutions to speed the path to value.
Other MLOps accelerators include templatized notebooks with integrated feature store support, standardized frameworks for tracking experimentation, extendable ML libraries that provide a simplified view of model predictions, along with metrics customized for user personas and the most relevant information for key business functions. A visual workflow graph provides end-to-end model visibility and pipeline traceability, while automated alerts enhance production model monitoring and optimization.
Benefit from plug and play capabilities: MLWorks is completely native to all leading cloud platforms. As a result, it can seamlessly integrate into your data and ML ecosystem and automate model monitoring.
Use managed services: When you work with Tredence, you can focus your team on innovating. Tredence data science teams can provide 24/7 model management, to meet production SLAs and improve operational efficiency.
Leverage insights to create a data culture: MLWorks provides a simplified view of model predictions, along with metrics that are customized for user personas and provide the most relevant information for business functions. Additional tools improve end-to-end model visibility, pipeline traceability, and production model monitoring.
The past few years have witnessed incredible advances, as companies have digitized business models, processes, products and services. Now, they stand ready to reap the gains of generative AI, ML, and MLOps.
Now is the time to set up and evolve an MLOps practice with the right tools, processes, and partner. You can move ahead of competitors, creating new visibility into your business and the ability to sense and respond to demand and other changes. Drive revenues, create cost savings, and continually enhance profitability with new capabilities.