<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Prince Canuma, Author at neptune.ai</title>
	<atom:link href="https://neptune.ai/blog/author/prince-canuma/feed" rel="self" type="application/rss+xml" />
	<link></link>
	<description>The experiment tracker for foundation model training.</description>
	<lastBuildDate>Tue, 29 Apr 2025 12:26:27 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://i0.wp.com/neptune.ai/wp-content/uploads/2022/11/cropped-Signet-1.png?fit=32%2C32&#038;ssl=1</url>
	<title>Prince Canuma, Author at neptune.ai</title>
	<link></link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">211928962</site>	<item>
		<title>Machine Learning Model Management: What It Is, Why You Should Care, and How to Implement It</title>
		<link>https://neptune.ai/blog/machine-learning-model-management</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Wed, 24 Aug 2022 10:11:58 +0000</pubDate>
				<category><![CDATA[ML Tools]]></category>
		<category><![CDATA[MLOps]]></category>
		<guid isPermaLink="false">https://neptune.test/machine-learning-model-management/</guid>

					<description><![CDATA[Machine learning is on the rise. With that, new issues keep popping up, and ML developers along with tech companies keep building new tools to take care of these issues.&#160; If we look at ML in a very basic way, we can say that ML is conceptually software with a bit of added intelligence but&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Machine learning is on the rise. With that, new issues keep popping up, and ML developers, along with tech companies, keep building new tools to address them.</p>



<p>If we look at ML in a very basic way, we can say that ML is conceptually software with a bit of added <strong>intelligence</strong>, but unlike traditional software, ML is experimental in nature. Compared to traditional software development, it has some new components in the mix: robust data, model architecture, model code, hyperparameters, and features, to name a few. So, naturally, the tools and development cycles are different, too. Software has DevOps; machine learning has MLOps.</p>



<p>If it sounds unfamiliar, here’s a short overview of <a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">DevOps and MLOps</a>:</p>



<p><strong>DevOps</strong> is a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.</p>



<p><strong>MLOps</strong> is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases end-quality, simplifies the management process, and automates the deployment of machine learning and deep learning models in large-scale production environments. It makes it easier to align models with business needs and regulatory requirements.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img data-recalc-dims="1" fetchpriority="high" decoding="async" width="1200" height="628" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=1200%2C628&#038;ssl=1" alt="MLOps cycle" class="wp-image-40345" srcset="https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?w=1200&amp;ssl=1 1200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=768%2C402&amp;ssl=1 768w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=200%2C105&amp;ssl=1 200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=220%2C115&amp;ssl=1 220w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=120%2C63&amp;ssl=1 120w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=160%2C84&amp;ssl=1 160w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=300%2C157&amp;ssl=1 300w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=480%2C251&amp;ssl=1 480w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=1020%2C534&amp;ssl=1 1020w" sizes="(max-width: 1000px) 100vw, 1000px" /><figcaption class="wp-element-caption">MLOps cycle</figcaption></figure>
</div>


<p>The key phases of MLOps are:</p>



<ul class="wp-block-list">
<li>Data gathering</li>



<li>Data analysis</li>



<li>Data transformation/preparation</li>



<li>Model development</li>



<li>Model training</li>



<li>Model validation&nbsp;</li>



<li>Model serving&nbsp;</li>



<li>Model monitoring&nbsp;</li>



<li>Model re-training</li>
</ul>
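<p>The phases above can be sketched as a linear pipeline. The following is a minimal, illustrative sketch: every function name and the toy dataset are made up, and a real pipeline would run each phase as a separate, monitored job.</p>

```python
# Minimal sketch of the MLOps phases as a linear pipeline.
# All names and the toy dataset are illustrative, not a real framework API.

def gather_data():
    # Data gathering: in practice, pulled from databases, APIs, logs, etc.
    return [(x, 2 * x + 1) for x in range(100)]

def prepare(data):
    # Data analysis + transformation/preparation: train/validation split.
    split = int(0.8 * len(data))
    return data[:split], data[split:]

def train(train_set):
    # Model development + training: fit (slope, intercept) from the data,
    # standing in for a real training loop.
    pairs = list(zip(train_set, train_set[1:]))
    slope = sum(y2 - y1 for (_, y1), (_, y2) in pairs) / len(pairs)
    intercept = train_set[0][1]  # y at x = 0 in this toy dataset
    return slope, intercept

def validate(model, val_set):
    # Model validation: mean absolute error on held-out data.
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in val_set) / len(val_set)

train_set, val_set = prepare(gather_data())
model = train(train_set)
mae = validate(model, val_set)
# Serving, monitoring, and re-training would follow in production.
```

<p>Here the held-out error comes out to zero because the toy data is exactly linear; on real data, the validation and monitoring phases are where problems surface.</p>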



<p>We’re going to do a deep dive into this process, so grab a cup of your favorite drink and let’s go!</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-what-is-machine-learning-model-management">What is Machine Learning Model Management?</h2>



<p>Model management is a part of MLOps. ML models should be consistent and meet all business requirements at scale. To make this happen, a logical, easy-to-follow policy for model management is essential. ML model management is responsible for the <strong>development, training, versioning,</strong> and <strong>deployment</strong> of ML models.</p>



<p><strong><em>Note</em></strong><em>: Versioning also includes </em><strong><em>data</em></strong><em>, so we can track which dataset, or subset of the dataset, we used to train a particular version of the model.</em></p>
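<p>To make that note concrete: a lightweight way to tie a model version to the exact data it was trained on is to store a content hash of the dataset alongside the model. This is a hedged sketch, not the API of any particular data-versioning tool, and the version tag is invented:</p>

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """Deterministic hash of a dataset (or subset) used for training.

    Storing this alongside the model version lets you answer
    "which data trained this model version?" later.
    """
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Record the model version together with its data fingerprint.
training_rows = [[0, 1], [1, 3], [2, 5]]
model_record = {
    "model_version": "v1.3",  # illustrative version tag
    "data_hash": dataset_fingerprint(training_rows),
}
```

<p>Any change to the dataset produces a different fingerprint, so a model record can always be traced back to the exact data that produced it.</p>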



<p>When researchers work on novel ML models, or apply them to a new domain, they run countless experiments (model training &amp; testing) with different model architectures, optimizers, loss functions, parameters, hyperparameters, and data. They use these experiments to arrive at the configuration that generalizes best, or that offers the best performance-to-accuracy compromise on the dataset.</p>



<p>But without a way to track model performance and configurations across experiments, all hell can (and will) break loose, because we won’t be able to compare runs and choose the best solution. Even for a single researcher experimenting independently, keeping track of all experiments and results is hard.</p>



<p>That’s why we do model management. It lets us, our teams and our organizations:</p>



<ul class="wp-block-list">
<li>Proactively address common business concerns (such as regulatory compliance);</li>



<li>Enable reproducible experiments by tracking metrics, losses, code, data and model versioning;</li>



<li>Package and deliver models in repeatable configurations to support reusability.</li>
</ul>
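<p>As a toy illustration of the reproducibility point above: if the RNG seed is logged together with the configuration, the same (seed, config) pair regenerates identical results. All names here are hypothetical:</p>

```python
import random

def reproducible_run(seed, config):
    """Sketch of a reproducible experiment: fix the RNG seed and log it
    with the config, so the run can be regenerated exactly."""
    random.seed(seed)
    score = round(random.random(), 6)  # stands in for a training result
    return {"seed": seed, "config": config, "score": score}

run_a = reproducible_run(42, {"lr": 0.01})
run_b = reproducible_run(42, {"lr": 0.01})
assert run_a == run_b  # same seed + config -> identical record
```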



<h2 class="wp-block-heading" class="wp-block-heading" id="h-why-does-machine-learning-model-management-matter">Why does Machine Learning Model Management matter?</h2>



<p>As I mentioned previously, Model Management is a fundamental part of any ML pipeline (MLOps). It makes it easier to manage the ML life-cycle, from creation, configuration, and experimentation, through tracking the different experiments, all the way to model deployment.</p>



<p>Now, let’s go a little deeper by making a clear distinction between the different parts of ML Model Management. It is important to note that within ML Model Management we manage two things:</p>



<ul class="wp-block-list">
<li><strong>Models</strong>: Here we take care of model packaging, model lineage, model deployment &amp; deployment strategies (A/B testing, etc.), monitoring, and model retraining (which happens when the deployed model’s performance drops below a set threshold).</li>



<li><strong>Experiments:</strong> Here we take care of logging training metrics, loss, images, text, or any other metadata you might have, as well as code, data &amp; pipeline versioning.</li>
</ul>



<p>Without model management, data science teams would have a very hard time creating, tracking, comparing, recreating, and deploying models.&nbsp;</p>



<p>The alternative to model management is a set of ad-hoc practices, which lead researchers to create ML projects that are not repeatable, sustainable, scalable, or organized.</p>



<p>Besides that, research conducted by Amy X. Zhang (MIT) et al. on <a href="https://arxiv.org/abs/2001.06684v2" target="_blank" rel="noreferrer noopener nofollow">how DS workers collaborate</a> shows that data science workers collaborate extensively when leveraging ML to extract insights from data, rather than working alone. To collaborate effectively, they employ collaborative best practices (i.e., documentation, code versioning, and so on) and tools, with the former highly dependent on the latter.</p>



<p>MLOps facilitates collaboration, but most of today’s understanding of data science collaboration focuses only on the data scientist’s perspective and on tools, such as version control of code, that support globally dispersed and asynchronous collaboration among data scientists. The technical collaborations afforded by such tools only scratch the surface of the many ways collaboration can happen within a data science team, such as:</p>



<ul class="wp-block-list">
<li>When stakeholders discuss the framing of an initial problem before any code is written or data collected</li>



<li>Commenting on experiments</li>



<li>Taking over someone else’s notebook or code as a baseline to build upon</li>



<li>Researchers and data scientists train, evaluate, and tag models so that an ML engineer knows a model should be reviewed (i.e., A/B tested) and promoted to production (model deployment)</li>



<li>Having a shared repository where business stakeholders can review production models.</li>
</ul>






<p><strong>What is the extent of collaboration on data science teams?</strong></p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Data-science-team.png?ssl=1" alt="" class="wp-image-41760"/><figcaption class="wp-element-caption"><em><a href="https://arxiv.org/pdf/2001.06684v2.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p><strong>Rates of Collaboration</strong>: Among the five data science roles in the figure above, three reported collaboration at rates of 95% or higher. As you can see, these are the core roles in an ML team.</p>



<p>The study also shows that researchers, data scientists, and ML engineers collaborate extensively and play a key role throughout the <strong>development, training, evaluation (i.e., accuracy, performance, bias), versioning,</strong> and <strong>deployment</strong> of ML models (ML Model Management).</p>



<p>Not convinced yet? Here are six more reasons why model management matters:</p>



<ul class="wp-block-list">
<li>Allows for a single source of truth;</li>



<li>Allows for versioning of the code, data, and model artifacts for benchmarking and reproducibility;</li>



<li>It’s easier to debug/mitigate problems ( i.e. overfitting, underfitting, performance and/or bias) &#8212; thus making the ML solution easily traceable and compliant with the regulations;</li>



<li>You can do faster, better research and development;</li>



<li>Teams become efficient and have a clear sense of direction.</li>



<li>ML Model management can facilitate intra team and/or inter team collaboration around code, data and documentation through the use of various best practices and tools (JupyterLab, Colab, neptune.ai, MLflow,  etc);</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-ml-model-management-components">ML Model Management components</h2>



<p>Before we continue, here is a glossary of the common components of an ML Model Management workflow:</p>



<ul class="wp-block-list">
<li><a href="/blog/best-7-data-version-control-tools-that-improve-your-workflow-with-machine-learning-projects" target="_blank" rel="noreferrer noopener"><strong>Data Versioning</strong></a>: Version control systems help developers manage changes to source code. Data version control adapts that process to the data world: a set of tools and processes for managing changes to models in relation to datasets, and vice versa.</li>



<li><strong>Code Versioning/Notebook checkpointing</strong>: It is used to manage changes to the model’s source code.</li>



<li><a href="/blog/best-ml-experiment-tracking-tools"><strong>Experiment Tracker</strong></a>: It is used for collecting, organizing, and tracking model training/validation information/performance across multiple runs with different configurations (lr, epochs, optimizers, loss, batch size and so on) and datasets (train/val splits and transforms).</li>



<li><a href="/blog/ml-model-registry-best-tools"><strong>Model Registry</strong></a><strong>:</strong> A centralized tracking system for trained, staged, and deployed ML models.</li>



<li><a href="/blog/ml-model-monitoring-best-tools"><strong>Model Monitoring</strong></a><strong>:</strong> It is used to track the model’s inference performance and identify signs of serving skew, which occurs when data changes cause the deployed model’s performance to degrade below the score/accuracy it displayed in the training environment.</li>
</ul>
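<p>As a toy illustration of the last component, model monitoring can be as simple as comparing a rolling production accuracy against the score the model achieved in training and flagging the gap. The class, threshold, and window size below are assumptions made for this sketch, not a reference implementation:</p>

```python
from collections import deque

class SkewMonitor:
    """Flags serving skew: production accuracy drifting below the
    accuracy the model showed in the training environment."""

    def __init__(self, training_accuracy, tolerance=0.05, window=100):
        self.training_accuracy = training_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)

    def skewed(self):
        if not self.outcomes:
            return False
        prod_accuracy = sum(self.outcomes) / len(self.outcomes)
        return prod_accuracy < self.training_accuracy - self.tolerance

monitor = SkewMonitor(training_accuracy=0.92)
for correct in [True] * 90 + [False] * 10:  # 90% production accuracy
    monitor.record(correct)
# 0.90 is still within tolerance of 0.92; more errors would trip the alarm.
```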



<p>Now that we know the different components of model management&nbsp;and what they do, let&#8217;s look into some of the best practices.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-best-practices-for-machine-learning-model-management">Best practices for Machine Learning Model Management</h2>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Model-management-best-practices.png?ssl=1" alt="Model management best practices" class="wp-image-41761" style="width:563px;height:450px"/><figcaption class="wp-element-caption"><em><a href="https://www.sqlsoldier.com/wp/wp-content/uploads/2012/02/BestPractices.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>The following is a list of <strong>ML model management best practices</strong>:</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-model">Model&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_4_keep_the_first_model_simple_and_get_the_infrastructure_right" target="_blank" rel="noreferrer noopener nofollow">Keep the first model simple and get the infrastructure right</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_14_starting_with_an_interpretable_model_makes_debugging_easier" target="_blank" rel="noreferrer noopener nofollow">Starting with an interpretable model makes debugging easier</a></li>



<li><strong>Training</strong>
<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/02-train_metric/" target="_blank" rel="noreferrer noopener nofollow">Capture the Training Objective in a Metric that is Easy to Measure and Understand</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-archive_old_feature/" target="_blank" rel="noreferrer noopener nofollow">Actively Remove or Archive Features That are Not Used</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-peer_review_mdl/" target="_blank" rel="noreferrer noopener nofollow">Peer Review Training Scripts</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-parallel_training/" target="_blank" rel="noreferrer noopener nofollow">Enable Parallel Training Experiments</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-auto_hyperparams/" target="_blank" rel="noreferrer noopener nofollow">Automate Hyper-Parameter Optimisation</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-measure_mdl_quality/" target="_blank" rel="noreferrer noopener nofollow">Continuously Measure Model Quality and Performance</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-data_version/" target="_blank" rel="noreferrer noopener nofollow">Use Versioning for Data, Model, Configurations and Training Scripts</a></li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-code">Code</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/blog/2020/regr_test/" target="_blank" rel="noreferrer noopener nofollow">Run Automated Regression Tests</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-use_static_analysis/" target="_blank" rel="noreferrer noopener nofollow">Use Static Analysis to Check Code Quality</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-cont-int/" target="_blank" rel="noreferrer noopener nofollow">Use Continuous Integration</a></li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-deployment">Deployment</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_16_plan_to_launch_and_iterate" target="_blank" rel="noreferrer noopener nofollow">Plan to launch and iterate</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-auto_model_packaging/" target="_blank" rel="noreferrer noopener nofollow">Automate Model Deployment</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-monitor_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Continuously Monitor the Behaviour of Deployed Models</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-rollback_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Automatic Rollbacks for Production Models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_41_when_performance_plateaus_look_for_qualitatively_new_sources_of_information_to_add_rather_than_refining_existing_signals" target="_blank" rel="noreferrer noopener nofollow">When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-shadow_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Shadow Deployment</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_40_keep_ensembles_simple" target="_blank" rel="noreferrer noopener nofollow">Keep ensembles simple</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-log_production/" target="_blank" rel="noreferrer noopener nofollow">Log Production Predictions with the Model&#8217;s Version, Code Version and Input Data</a></li>



<li><strong>Human Analysis of the System &amp; Training-Serving Skew</strong>
<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_23_you_are_not_a_typical_end_user" target="_blank" rel="noreferrer noopener nofollow">You are not a typical end user</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_24_measure_the_delta_between_models" target="_blank" rel="noreferrer noopener nofollow">Measure the delta between models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_25_when_choosing_models_utilitarian_performance_trumps_predictive_power" target="_blank" rel="noreferrer noopener nofollow">When choosing models, utilitarian performance trumps predictive power.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_37_measure_trainingserving_skew" target="_blank" rel="noreferrer noopener nofollow">Perform evolving data profiles checks</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_33_if_you_produce_a_model_based_on_the_data_until_january_5th_test_the_model_on_the_data_from_january_6th_and_after" target="_blank" rel="noreferrer noopener nofollow">If you produce a model based on the data until January 5th, test the model on the data from January 6th and after</a></li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-ml-model-management-vs-experiment-tracking">ML Model Management vs Experiment Tracking</h2>



<p><a href="/experiment-tracking" target="_blank" rel="noreferrer noopener">Experiment tracking</a> is a part of model management, so it’s also a part of the larger MLOps approach. Experiment tracking is about collecting, organizing, and tracking model training/validation information across multiple runs with different configurations (hyperparameters, model size, data splits, parameters, etc).&nbsp;</p>



<p>As mentioned earlier, ML/DL is experimental in nature, and we use experiment tracking tools for benchmarking different models.</p>



<p>Experiment tracking tools have 3 main features:</p>



<ul class="wp-block-list">
<li><strong>Logging</strong>: log experiment metadata (metrics, loss, configurations, images and so on);</li>



<li><strong>Version Control: </strong>track both data and model versions, which is very useful in a production environment and can help with debugging and future improvements;</li>



<li><strong>Dashboard</strong>: visualize all logged and versioned data, use visual components (graphs) to compare performance and rank different experiments.</li>
</ul>
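<p>The three features map onto a tiny in-memory sketch. Real trackers (neptune.ai, MLflow, and others) do far more; every name below is invented for illustration:</p>

```python
class MiniTracker:
    """Toy experiment tracker covering the three core features:
    logging, version-control hooks, and a (text-only) dashboard."""

    def __init__(self):
        self.runs = {}

    def log(self, run_id, **metadata):
        # Logging: metrics, loss, config, data/model versions, etc.
        self.runs.setdefault(run_id, {}).update(metadata)

    def compare(self, metric):
        # "Dashboard": rank runs by a logged metric, best first.
        scored = [(r, m[metric]) for r, m in self.runs.items() if metric in m]
        return sorted(scored, key=lambda item: item[1], reverse=True)

tracker = MiniTracker()
tracker.log("run-1", lr=0.01, data_version="sha256:ab12", val_acc=0.88)
tracker.log("run-2", lr=0.001, data_version="sha256:ab12", val_acc=0.91)
best_run, best_acc = tracker.compare("val_acc")[0]
```

<p>Because the data version is logged with each run, ranking the runs also tells you exactly which dataset and configuration produced the winner.</p>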





<h2 class="wp-block-heading" class="wp-block-heading" id="h-how-to-implement-ml-model-management">How to implement ML Model Management</h2>



<p>Before we move on, let me tell you a short story.</p>



<p>Last year I had a lot of problems with some of my customers because I didn’t track my experiments:</p>



<ul class="wp-block-list">
<li>I couldn’t compare different experiments effectively, and since I did everything from memory, projects got delayed.</li>



<li>I relied heavily on ensembling to try to patch the flaws of the individual models which only partially worked but mainly led nowhere.</li>



<li>Not logging the results of experiments also created problems long-term: I couldn’t recall the performance of previous versions of the model.</li>



<li>Deploying the right model was tricky because it was never clear which one was best, or which code, transformations, and data were used.</li>



<li>Reproducibility was impossible.</li>



<li>CI/CD and CT (continuous training) were impossible to implement with such artisanal model management.</li>
</ul>



<p>I did some research, found out about ML model management, and decided to try an actual experiment tracking tool to speed up my process. Now, I don’t even start a project without my favorite experiment tracking tool, <a previewlistener="true" href="/" target="_blank" rel="noreferrer noopener"><strong>neptune.ai</strong></a>.&nbsp;</p>



<p>I keep using it both in production and research, such as custom ML model projects I develop for my customers, and in my final year CSE degree project.</p>



<p>There are many other tools out there, some of which are full-blown platforms for managing the whole ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. We’ll discuss these tools in a bit.</p>



<p>So, after using an experiment tracking tool coupled with a model lifecycle platform (in my case, MLflow) on projects of different scales and needs, I found 4 ways of implementing ML model management:</p>



<ul class="wp-block-list">
<li><strong>Level-0</strong><strong>&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging</li>
</ul>
</li>



<li><strong>Level-1</strong>
<ul class="wp-block-list">
<li>Logging + Model and Data version control</li>
</ul>
</li>



<li><strong>Level-2&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging + Code, Model and Data version control</li>
</ul>
</li>



<li><strong>Level-3&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging + Code, Model and Data version control +&nbsp;Model deployment and monitoring</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-0">Level-0</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>ad-hoc research model management</em>. At this level, you’re just using an experiment tracking tool for <strong>logging</strong>. Great for beginners starting out with ML, or advanced researchers doing rapid prototyping to prove if a hypothesis is worth pursuing.</p>



<p>This level allows individuals, teams, and organizations to record and query their experiments:</p>



<ul class="wp-block-list">
<li>Metrics (accuracy, IoU, Bleu score and so on)</li>



<li>Loss (MSE, BCE, CE and so on)</li>



<li>Config (parameters, hyperparameters)</li>



<li>Model performance results from training and testing</li>
</ul>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Ad-hoc data science</li>



<li>Research- and rapid prototype-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No data versioning</li>



<li>No model versioning</li>



<li>No notebook checkpointing&nbsp;</li>



<li>No CI/CD Pipeline</li>



<li>Lack of Reproducibility</li>
</ul>



<h4 class="wp-block-heading">Challenges</h4>



<p>We data scientists usually enjoy running multiple experiments to test different ideas, code and model configurations, and datasets. At this level, this is quite challenging.</p>



<ul class="wp-block-list">
<li>First, you don’t follow any DS Project Management methodology that will give you a clear direction. Therefore, without <a href="https://www.kdnuggets.com/2019/02/data-science-agile-cycles-method-managing-projects-hi-tech-industry.html" target="_blank" rel="noreferrer noopener nofollow">standardised methodologies for managing data science projects,</a> you will often rely on ad hoc practices that are not repeatable, not sustainable, and unorganized.</li>



<li>Second, datasets are constantly being updated, so even though you log the metrics, loss and configuration, you don’t know which version of the dataset was used to train a specific model.</li>



<li>Third, the code might also change with each experiment run, so despite saving all the model configuration, you might not know which code was used in which experiment.</li>



<li>Fourth, even if you save the model’s weights, you might not know which model was trained using a specific configuration and dataset.</li>
</ul>



<p>All of these challenges make it impossible to reproduce the results of any particular experiment. In order to address the challenges of this level, a good start is to add versioning to our models and data &#8211; some experiment tools do this out-of-the-box. This way we can make partial reproducibility possible.</p>
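<p>To make the versioning idea concrete, here is a minimal sketch (using only the Python standard library) of logging a content hash of the dataset next to each run&#8217;s metrics. The <code>fingerprint</code> helper and the record fields are hypothetical, not part of any particular experiment tracking tool:</p>

```python
import hashlib
import json

def fingerprint(obj) -> str:
    """Return a short, deterministic content hash for a dataset or config.

    Serializing to JSON with sorted keys first means two logically
    identical objects always hash to the same value.
    """
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

# Log the data version alongside the usual metrics, so every run
# records exactly which dataset snapshot it was trained on.
train_split = [{"x": 0.1, "y": 0}, {"x": 0.9, "y": 1}]
run_record = {
    "metrics": {"accuracy": 0.91},
    "config": {"lr": 0.01, "epochs": 10},
    "data_version": fingerprint(train_split),
}
```

<p>Two runs trained on byte-identical data now share the same <code>data_version</code>, so a changed hash immediately tells you that the dataset, not the code, is what differs between experiments.</p>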



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-1">Level-1</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>partial model management</em>. Generally used for well-structured teams doing rapid prototyping. At this level, besides experiment tracking, you’re also storing the model and its metadata (configuration), as well as the dataset or data split used to train it, in a central repository that will be used as a single source of truth.&nbsp;</p>
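<p>As an illustration of such a single source of truth, the sketch below writes a per-run manifest, tying together the model artifact, the training configuration, and the dataset version, into a shared directory. The function name, fields, and layout are assumptions for this example, not a real tool&#8217;s API:</p>

```python
import json
import time
from pathlib import Path

def register_run(repo: Path, run_id: str, model_path: str,
                 config: dict, data_version: str) -> Path:
    """Write a single run's manifest into a central repository directory.

    The manifest ties the model artifact, the training configuration,
    and the dataset version together, so any past run can be looked up
    from one place.
    """
    manifest = {
        "run_id": run_id,
        "model_path": model_path,
        "config": config,
        "data_version": data_version,
        "created_at": time.time(),
    }
    repo.mkdir(parents=True, exist_ok=True)
    out = repo / f"{run_id}.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out
```

<p>With one manifest per run in a shared location, anyone on the team can answer &#8220;which config and data produced this model?&#8221; without asking the person who trained it.</p>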



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Experiments are partially reproducible</li>



<li>Ad-hoc data science</li>



<li>Research and rapid prototype-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CI/CD pipeline</li>



<li>Lack of reproducibility</li>



<li>No notebook checkpointing</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>



<p>This level is good for testing ideas quickly without fully committing to any of them. It might work great in a research setting, where the goal is just to try out interesting ideas and compare experiments across different individuals, teams, or companies. We’re not yet thinking about shipping them to production.</p>



<p>Although we can reproduce the experiment from the model metadata and dataset used to train it, at this level we still haven’t fully solved reproducibility. We just partially solved it. In order to go full circle, we need one more component &#8211; notebook checkpointing, so that we can track code changes.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-2">Level-2</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>semi-complete model management</em>. It’s great for individuals, teams and companies who want to not only quickly test their hypothesis, but also deploy their models to a production environment.</p>



<p>This level allows individuals and organizations to keep a full history of experiments by storing and versioning their notebooks/code, data and model, besides just logging metadata. This takes us full circle, making reproducibility a reality and easy to achieve regardless of the ML/DL frameworks or toolset used. At this level, you usually also apply standardised methodologies for managing data science projects.</p>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Has notebook checkpointing</li>



<li>Experiments are fully reproducible</li>



<li>Coupled with a DS project management approach</li>



<li>Production-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CI/CD pipeline</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>



<p>You have automated everything at this level, except one thing: model deployment. This creates stress. Every time you have a new trained model ready for deployment, you have to manually deploy it. In order to complete the ML model management pipeline, you need to integrate CI/CD.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-3">Level-3</h3>



<h4 class="wp-block-heading">Characteristics&nbsp;</h4>



<p>I call this <em>end-to-end model management</em>. At this level, you have a completely automated pipeline, from model development, versioning, to deployment. This level offers a production-grade setup, and is great for individuals, teams and organizations looking for a complete, automated workflow. Once you set it up, you don’t have to do ops work anymore. You can focus on tweaking and improving the model and data sources.&nbsp;&nbsp;</p>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Has notebook checkpointing</li>



<li>Experiments are fully reproducible</li>



<li>Coupled with a DS project management approach</li>



<li>Production-driven</li>



<li>CI/CD pipeline</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CT pipeline</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/ML-lifecycle-model-management.png?ssl=1" alt="ML lifecycle model management" class="wp-image-41775"/><figcaption class="wp-element-caption"><em><a href="https://towardsdatascience.com/model-management-in-productive-ml-software-110d2d2cb456" target="_blank" rel="noreferrer noopener nofollow">ML lifecycle</a></em></figcaption></figure>
</div>


<p>There is only one thing missing at this point &#8211; a way to continuously monitor deployed models. Also known as a CT (continuous training) pipeline, it’s used to monitor a deployed model and automatically retrain and serve a new one if the currently deployed model’s performance drops below a set threshold. Let’s take a computer vision model, like ResNet, in a production environment. In order to add CT, it would be as simple as monitoring and logging the following:</p>



<ul class="wp-block-list">
<li>Data sent to the server (image, video, mp3, text, and so on)&nbsp;</li>



<li>The model’s prediction</li>



<li>Confidence score</li>



<li><a href="https://heartbeat.fritz.ai/class-activation-maps-visualizing-neural-network-decision-making-92efa5af9a33" target="_blank" rel="noreferrer noopener nofollow">Class activation maps (CAM)</a>, or the improved <a href="https://www.pyimagesearch.com/2020/03/09/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning/" target="_blank" rel="noreferrer noopener nofollow">Grad-CAM</a> for better explainability as to why it predicted a certain label and where it focused</li>
</ul>
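<p>Assuming confidence scores are already being logged at inference time, the retraining trigger itself can be sketched in a few lines. The class name, threshold, and window size below are illustrative choices, not part of any monitoring product:</p>

```python
from collections import deque

class DriftMonitor:
    """Flag when a deployed model should be retrained, based on the
    confidence scores logged at inference time."""

    def __init__(self, threshold: float = 0.7, window: int = 100):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def log(self, confidence: float) -> None:
        self.scores.append(confidence)

    def should_retrain(self) -> bool:
        # Wait for a full window to avoid noisy alarms on a few requests.
        if len(self.scores) < self.scores.maxlen:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold
```

<p>In a real pipeline, a <code>True</code> result would kick off the automated training job and promote the newly trained model once it passes validation.</p>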



<p>To add this functionality to the mix, you can re-use the same code from Level-0 or Level-1 for logging metadata during training, and use it for inference.&nbsp;</p>



<p>Tools like Neptune and MLflow let you install their software locally, so you can add this capability to your deployment server. Neptune is more robust here and offers a second option: a lightweight web version of their software for both individuals and teams, so there’s no need to install or configure anything. Just create a new project on their dashboard, add a few lines to your deployment code, and it’s done.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-building-vs-using-existing-ml-model-registry-tools">Building vs using existing ML Model Registry tools&nbsp;</h2>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Databrick-model-registry.png?ssl=1" alt="Databrick model registry" class="wp-image-41776"/><figcaption class="wp-element-caption"><a href="https://databricks.com/wp-content/uploads/2019/10/model-registry-dash.png" target="_blank" rel="noreferrer noopener nofollow">Source</a></figcaption></figure>
</div>


<h3 class="wp-block-heading" class="wp-block-heading" id="h-what-is-machine-learning-ml-model-registry">What is Machine Learning (ML) model registry?</h3>



<p>An ML model registry is simply a centralized tracking system for trained, staged and deployed ML models. It also tracks who created the model, as well as the data used to train it. It does this by using a database to store model lineage, versioning, metadata and configuration.</p>



<p>It’s relatively easy to build your own simple model registry. You can do it by combining a few native or cloud services, such as an AWS S3 bucket and an RDBMS (PostgreSQL, Mongo&#8230;), and writing a simple Python API that makes it easy to update the database records whenever something changes.</p>
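<p>To show just how &#8220;relatively easy&#8221; the simple version is, here is a toy registry backed by SQLite (standing in for the RDBMS), with artifact URIs that would point at object storage such as S3. The schema and function names are made up for illustration and gloss over authentication, concurrency, and stage transitions:</p>

```python
import sqlite3

def create_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Create the registry database (SQLite stands in for the RDBMS)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS models (
               name TEXT,
               version INTEGER,
               stage TEXT,             -- e.g. staging / production
               artifact_uri TEXT,      -- e.g. an s3:// path
               data_version TEXT,
               PRIMARY KEY (name, version)
           )"""
    )
    return conn

def register(conn: sqlite3.Connection, name: str,
             artifact_uri: str, data_version: str) -> int:
    """Insert a new model version, auto-incremented per model name."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM models WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO models VALUES (?, ?, ?, ?, ?)",
        (name, version, "staging", artifact_uri, data_version),
    )
    conn.commit()
    return version
```

<p>Everything beyond this toy &#8211; access control, a UI, stage promotion, scalability &#8211; is exactly the part that gets expensive, which is the point of the build-vs-buy discussion below.</p>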



<p>But although a model registry is relatively easy to build, does that mean you should build one? Is it really worth your time, money, and resources?</p>



<p>To answer these questions, let’s first look at the reasons why you might want to build your own ML model registry:</p>



<ul class="wp-block-list">
<li><strong>Privacy:</strong> Your data can’t leave your premises.</li>



<li><strong>Curiosity:</strong> Like me, you enjoy building things.</li>



<li><strong>Business:</strong> You run or work for a company that builds ML tools, and you want to add it to an existing product, or as a new service for customers.</li>



<li><strong>Cost:</strong> Existing tools are too expensive for your budget.</li>



<li><strong>Performance: </strong>Existing tools don’t meet your performance requirements.</li>
</ul>



<p>All valid reasons, except maybe cost, because most existing tools are open-source or freemium.&nbsp;</p>



<p>If your concern is performance, some tools deliver it through dedicated cloud server instances, with very little setup on your part.</p>



<p>Now, if your concern is privacy, most tools also offer an on-prem version of their software, which you can download and install on your organisation&#8217;s servers to get full control over the data coming in and out. This way you can comply with laws and regulations, and keep your data safe.</p>



<p>In my honest opinion, there is a common misconception when it comes to build vs. buy &#8211; something more mature teams and developers usually understand right off the bat, but which the ML community at large still doesn&#8217;t really get.&nbsp;</p>



<p>The cost of hosting, maintaining, documenting, fixing, updating and adjusting the open-source software is usually orders of magnitude larger than the cost of vendor tools.&nbsp;</p>



<p>The thing is, it is usually relatively easy to build a simple, non-scalable, undocumented system for yourself&#8230;&nbsp;</p>



<p>&#8230;but going from this to a system your entire team can work on very quickly becomes awfully expensive.&nbsp;</p>



<p>Also, when you decide to build it yourself (not even open-source it), someone will end up needing to build and maintain it, and the salaries of ML engineers and DevOps folks are not cheap.&nbsp;</p>



<p>Generally, there is a good rule of thumb: if the system (like an ML model registry) is not your core business, and it usually isn&#8217;t, then you should focus on your core business (for example, building models for autonomous cars) and hire or buy a solution for the parts you don&#8217;t build your competitive advantage on.&nbsp;</p>



<p>Think of it this way: would you go and build Gmail just because you can?</p>



<p>Or a mail-sending system like Mailchimp?</p>



<p>Or a CMS like WordPress?</p>



<p>Some companies do, even though it is not their core business. It is usually a big mistake, as you end up building shovels rather than digging for gold :). </p>



<p>Companies have invested billions of dollars to create great, free and/or premium tools. Most of these tools you can easily extend to fit your own use case, saving your time, money, resources and headaches.</p>



<p>Now, let’s take a detailed look at some of the most popular tools.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-tools-for-machine-learning-model-management">Tools for Machine Learning Model Management</h2>



<p>Keep in mind, I have my personal preference when it comes to the tools described below, but I tried to be as objective as possible.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-neptune-ai"><a href="/" target="_blank" rel="noreferrer noopener">neptune.ai</a></h3>



<div id="app-screenshot-block_dcb969b7e87a931ba6730a24ffaf27de"
	class="block-app-screenshot js-block-with-image-full-screen-modal "
	data-video-url=""
	data-show-controls="false"
	data-unmute="false"
	data-button-icon="https://neptune.ai/wp-content/themes/neptune/img/icon-close.svg"
	data-image-full-screen-modal="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=1020%2C700&#038;ssl=1"
>

			<div class="block-app-screenshot__image-wrapper">
			<div class="block-app-screenshot__bar">
				<figure class="block-app-screenshot__bar-buttons-wrapper">
					<img
						src="https://neptune.ai/wp-content/themes/neptune/img/blocks/app-screenshot/bar-buttons.svg"
						width="34"
						height="9"
						class="block-app-screenshot__bar-buttons"
						alt="">
				</figure>
			</div>

			
				<img
					srcset="
					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=480%2C329&#038;ssl=1 480w,					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=768%2C527&#038;ssl=1 768w,					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=1020%2C700&#038;ssl=1 1020w"
					alt=""
					style=""
					width="1020"
					height="700"
					class="block-app-screenshot__image"
				>

			
			<div class="block-app-screenshot__overlay">

				
					<a
						href="https://demo.neptune.ai/o/neptune/org/LLM-training-example/runs/compare?viewId=9c57c497-1131-4644-827f-0fcff4f28ad2&#038;detailsTab=metadata&#038;dash=charts&#038;type=run&#038;experimentsOnly=true&#038;experimentOnly=true&#038;runsLineage=FULL&#038;compare=IwGl7SOrYhmOSYwCwCYSIJyQKy4zyHKmiaipnUqx54gVhV5VXlO1ewDsNpTTA2Co%2B3WpiaJQ0gY2ToS4-stUAGYSq2qk2bIjwR0DKgDZE6ABxhrEXGLCZ7YeKbOVLQA"
						class="c-button c-button--primary c-button--small c-button--cta">
						<img
							decoding="async"
							loading="lazy"
							src="https://neptune.ai/wp-content/themes/neptune/img/icon-button--test-tube.svg"
							width="16"
							height="19"
														class="c-button__icon"
							alt=""
						/>

													<span class="c-button__text">
								See in the app							</span>
						
					</a>

				
														<button
						class="js-c-image-full-screen-modal c-button c-button--tertiary c-button--small">
						<img
							decoding="async"
							loading="lazy"
							src="https://neptune.ai/wp-content/themes/neptune/img/icon-zoom.svg"
							width="16"
							height="17"
							class="c-button__icon"
							alt="zoom"
						/>

						<span class="c-button__text">
							Full screen preview						</span>
						
					</button>
									
			</div>

		</div>

			
</div>



<div id="separator-block_bdc223b7ef1904938b62bedbd1ddbaa5"
         class="block-separator block-separator--10">
</div>



<p>Neptune is the experiment tracker for teams that train foundation models with a <a href="https://neptune.ai/product/team-collaboration" target="_blank" rel="noreferrer noopener">strong focus on collaboration</a> and scalability. The tool is known for its user-friendly interface and flexibility, enabling teams to adopt it into their existing workflows with minimal disruption. Neptune gives users a lot of freedom when defining data structures and tracking metadata. </p>



<p><span style="margin: 0px; padding: 0px;">With Neptune, ML/AI researchers and engineers can monitor, visualize, compa</span>re, and query&nbsp;all their model-building metadata&nbsp;in a single place. It handles data such as model metrics and parameters, model checkpoints, images, videos, audio files, dataset versions, and visualizations.&nbsp; Furthermore, Neptune makes sharing results with team members, outside collaborators, and stakeholders easy.</p>



<p><strong>Key advantages</strong></p>



<ul class="wp-block-list">
<li><strong>Scalability: </strong>Neptune easily tracks tens of thousands of data points, and the UI allows users to compare more than 100,000 runs with millions of data points.<br></li>



<li><strong>Pricing: </strong>Neptune’s <a href="/pricing" target="_blank" rel="noreferrer noopener">pricing model</a> is based on the number of users, allowing them to collaborate on as many projects as they like.<br></li>



<li><strong>Self-hosting: </strong>Neptune is <a href="https://neptune.ai/product/deployment-options" target="_blank" rel="noreferrer noopener">available for self-hosting</a>, which is a first-class offering in the Enterprise tier. Designed to be hosted in a private cloud environment, Neptune integrates with common authentication solutions like SAML or LDAP, allowing seamless integration while keeping sensitive data protected.<br></li>



<li><strong>Support and documentation: </strong>All <a previewlistener="true" href="https://neptune.ai/pricing" target="_blank" rel="noreferrer noopener">plans</a> (including the Free tier) provide access to chat and email support, with SLAs reserved for the Enterprise plan. Neptune’s documentation is comprehensive and includes many examples.</li>



<li>One standout feature of Neptune is the ability to fork experiment runs from any intermediate step. This is particularly important for large-scale deep learning experiments – such as training foundation models – where training failures due to hardware or network issues are unavoidable. It’s also common to try different parameters and training configurations over the course of a month-long training process.</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-mlflow"><a href="https://mlflow.org/" target="_blank" rel="noreferrer noopener nofollow">MLflow</a></h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Model-registry-mlflow.png?ssl=1" alt="Model registry MLflow" class="wp-image-41777"/><figcaption class="wp-element-caption"><a href="https://www.oreilly.com/content/wp-content/uploads/sites/2/2019/06/image2-6f4305fe136de120e9b586762eab77b8.gif" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>MLflow is an open-source platform for managing the whole <strong>machine learning lifecycle (MLOps)</strong>. Experimentation, reproducibility, deployment, central model registry, it does it all. MLflow is suitable for individuals and for teams of any size.&nbsp;</p>



<p>The tool is library-agnostic. You can use it with any machine learning library, and any programming language.</p>



<p>Launched in 2018, MLflow quickly became the industry standard because of its easy integration with major ML frameworks, tools, and libraries such as Tensorflow, Pytorch, Scikit-learn, Kubernetes and Sagemaker, just to name a few. It has a big community of users and contributors.</p>



<p>MLflow has four main functions that help track and organize experiments and models:</p>



<ul class="wp-block-list">
<li><strong>MLflow Tracking</strong> – an API and UI for logging parameters, code versions, metrics, and artifacts when running machine learning code, and for later visualizing and comparing the results;</li>



<li><strong>MLflow Projects </strong>– packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production;</li>



<li><strong>MLflow Models</strong> – managing and deploying models from different ML libraries to a variety of model serving and inference platforms;</li>



<li><strong>MLflow Model Registry</strong> – central model store to collaboratively manage the full lifecycle of an MLflow model, including model versioning, stage transitions, and annotations.</li>
</ul>



<p>MLflow is not only available as an open-source tool you can host yourself, it is also available in a managed format within MLOps platforms:</p>



<ul class="wp-block-list">
<li>Since June 2024, <a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noreferrer noopener nofollow">Amazon SageMaker</a> no longer maintains its own dedicated experiment tracking SDK. Instead, it offers a managed MLflow capability: users log experiments through MLflow APIs, gaining flexibility and interoperability with external tools in the MLflow ecosystem while still benefiting from SageMaker’s managed infrastructure and autoscaling capabilities.</li>
</ul>



<ul class="wp-block-list">
<li>MLflow can also be used within <a href="https://azure.microsoft.com/en-us/products/machine-learning" target="_blank" rel="noreferrer noopener nofollow">Azure Machine Learning</a>, which supports experiment tracking via the MLflow client. You can configure any MLflow code to log runs to an Azure ML workspace. While the backend is proprietary and some MLflow features are limited, the integration enables smooth interoperability with the MLflow API. Additionally, Azure ML benefits from deep integration with the Azure ecosystem, including access to <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/" target="_blank" rel="noreferrer noopener nofollow">Azure OpenAI</a> models like GPT-4 and DALL-E, as well as <a href="https://learn.microsoft.com/en-us/azure/machine-learning/concept-enterprise-security?view=azureml-api-2" target="_blank" rel="noreferrer noopener nofollow">enterprise-grade security</a> through Microsoft Entra and Azure’s RBAC model.</li>
</ul>



<p>These integrations are useful for teams already working in cloud environments but provide <strong>partial MLflow compatibility</strong> rather than a full replacement of the open-source experience.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-key-advantages">Key advantages</h3>



<ul class="wp-block-list">
<li>Robust experiment tracking with support for logging parameters, metrics, artifacts, etc.</li>



<li>Easily integrates with other tools and libraries (e.g., PyTorch, TensorFlow, Scikit-learn)</li>



<li>Intuitive UI to visualize and compare runs</li>



<li>Large and active community offering support</li>



<li>Free managed service option (MLflow Community edition) with preconfigured ML environments that include PyTorch, TensorFlow/Keras, and other libraries, ideal for individuals.</li>



<li>Paid managed service option through cloud providers is ideal for teams. It comes with pre-configured compute and SQL storage servers, billed per second:
<ul class="wp-block-list">
<li><a href="https://www.databricks.com/product/aws-pricing" target="_blank" rel="noreferrer noopener nofollow">Amazon Web Services</a> (AWS) &#8211; via Amazon SageMaker</li>



<li><a href="https://www.databricks.com/product/azure-pricing" target="_blank" rel="noreferrer noopener nofollow">Azure</a> &#8211; via Azure Machine Learning</li>



<li><a href="https://www.databricks.com/product/gcp-pricing" target="_blank" rel="noreferrer noopener nofollow">Google Cloud</a> &#8211; via integrated MLflow-compatible services</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-conclusion">Conclusion</h2>



<p>Machine Learning Model Management is a fundamental part of the MLOps workflow. It lets us take a model from the development phase to production, making every experiment and/or model version reproducible.&nbsp;</p>



<p>Finally, to recap, there are 4 levels of ML model management:</p>



<ul class="wp-block-list">
<li>Level-0, ad-hoc research model management</li>



<li>Level-1, partial model management</li>



<li>Level-2, semi-complete model management</li>



<li>Level-3, complete (end-to-end) model management</li>
</ul>



<p>At each level, you will be faced with different challenges. The best practices of ML model management are centered around 3 components:</p>



<ul class="wp-block-list">
<li>Model&nbsp;</li>



<li>Code&nbsp;</li>



<li>Deployment</li>
</ul>



<p>As far as tools go, we have a plethora to choose from, but in this article, I described a few popular ones:</p>



<ul class="wp-block-list">
<li>neptune.ai</li>



<li>MLflow</li>
</ul>



<p>I hope this helps you choose the right tool.</p>



<p>With that, thank you for reading this article, and stay tuned for more!</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4077</post-id>	</item>
		<item>
		<title>How to Deal With Imbalanced Classification and Regression Data</title>
		<link>https://neptune.ai/blog/how-to-deal-with-imbalanced-classification-and-regression-data</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Fri, 22 Jul 2022 06:42:24 +0000</pubDate>
				<category><![CDATA[ML Model Development]]></category>
		<guid isPermaLink="false">https://neptune.test/how-to-deal-with-imbalanced-classification-and-regression-data/</guid>

					<description><![CDATA[Data imbalance is predominant and inherent in the real world. Data often demonstrates skewed distributions with a long tail. However, most of the machine learning algorithms currently in use were designed around the assumption of a uniform distribution over each target category (classification).&#160; On the other hand, we must not forget that many tasks involve&#8230;]]></description>
										<content:encoded><![CDATA[
<p><strong>Data imbalance</strong> is predominant and inherent in the real world. Data often demonstrates skewed distributions with a long tail. However, most of the machine learning algorithms currently in use were designed around the assumption of a uniform distribution over each target category (classification).&nbsp;</p>



<p>On the other hand, we must not forget that many tasks involve continuous targets and even infinite values (regression), where hard boundaries between classes do not exist (i.e. age prediction, depth estimation, and so on).</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_29-3372336075-1643136560469.jpg?resize=411%2C519&#038;ssl=1" alt="Data imbalance" class="wp-image-61203" width="411" height="519"/><figcaption class="wp-element-caption"><em>Data imbalance | Source: Author</em></figcaption></figure>
</div>


<p>In this article, I’m going to walk you through how to deal with imbalanced data in classification and regression tasks as well as talk about the performance measures you can use for each task in such a setting.</p>



<p>There are 3 main approaches to learning from imbalanced data:</p>



<div id="case-study-numbered-list-block_fc99cd6c4c779c8dc47f955d2484b8f9"
         class="block-case-study-numbered-list ">

    
    <h2 id="h-"></h2>

    <ul class="c-list">
                    <li class="c-list__item">
                <span class="c-list__counter">1</span>
                Data approach            </li>
                    <li class="c-list__item">
                <span class="c-list__counter">2</span>
                Algorithm approach             </li>
                    <li class="c-list__item">
                <span class="c-list__counter">3</span>
                Hybrid (ensemble) approach            </li>
            </ul>
</div>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-imbalanced-classification-data">Imbalanced classification data</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_22.png?resize=402%2C402&#038;ssl=1" alt="SMOTE for regression" class="wp-image-61210" width="402" height="402"/><figcaption class="wp-element-caption"><em>SMOTE for regression | <a href="https://makeameme.org/meme/class-imbalance-i" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Imbalanced classification is a well-explored and well-understood topic.</p>



<p>In real-life applications, we face many challenges where we only have uneven data representations in which the <strong>minority class</strong> is usually the more important one and hence we require methods to improve its recognition rates. This issue poses a serious challenge to predictive modeling because learning algorithms will be biased towards the <strong>majority class</strong>.&nbsp;</p>



<p>Important day-to-day tasks such as preventing malicious attacks, detecting life-threatening diseases, or handling rare cases in monitoring systems face extreme class imbalance, with ratios ranging from 1:1000 up to 1:5000, and one must design intelligent systems that can adjust to and overcome such extreme bias.</p>



<h3 class="wp-block-heading" id="how-to-handle-an-imbalanced-dataset-data-approach">How to handle an imbalanced dataset &#8211; data approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_26.png?resize=388%2C543&#038;ssl=1" alt="How would you handle an imbalanced dataset?" class="wp-image-61206" width="388" height="543"/><figcaption class="wp-element-caption"><em>How would you handle an imbalanced dataset? | <a href="https://medium.com/sfu-cspmp/winning-against-imbalanced-datasets-14809437aa62" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>The data approach concentrates on modifying the training set to make it suitable for a standard learning algorithm. This is done by balancing the distributions of the dataset, which can be categorized in two ways:</p>



<ul class="wp-block-list">
<li>Oversampling&nbsp;</li>



<li>Undersampling&nbsp;</li>
</ul>



<h4 class="wp-block-heading" id="1-oversampling">1. Oversampling</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_24.png?resize=763%2C403&#038;ssl=1" alt="Oversampling" class="wp-image-61208" width="763" height="403"/><figcaption class="wp-element-caption"><em>Oversampling | <a href="https://dataaspirant.com/10-oversampling/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>In this approach, we synthesize new examples from the minority class.&nbsp;</p>



<p>There are several methods available to oversample a dataset used in a typical classification problem. But the most common data augmentation technique is known as <strong>Synthetic Minority Oversampling Technique</strong> or <strong>SMOTE</strong> for short.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_11.png?resize=755%2C303&#038;ssl=1" alt="Scatter plot of the class distribution before and after SMOTE" class="wp-image-61221" width="755" height="303"/><figcaption class="wp-element-caption"><em>Scatter plot of the class distribution before and after SMOTE | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>As the name suggests, SMOTE creates “synthetic” examples rather than over-sampling with replacement. Specifically, SMOTE works the following way: it starts by randomly selecting a minority class example and finding its <em>k</em> nearest minority class neighbors. A synthetic example is then created at a randomly selected point on the line segment that connects the example to one of its randomly chosen neighbors in feature space.</p>
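<p>In practice you would typically reach for <code>imblearn.over_sampling.SMOTE</code>; the NumPy sketch below (the function name and parameters are my own, not from any library) just illustrates the mechanics described above: pick a random minority example, pick one of its k nearest minority neighbors, and interpolate between them.</p>

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Illustrative SMOTE sketch: X_min holds only minority-class samples."""
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_new):
        # randomly select a minority seed example
        i = rng.integers(len(X_min))
        x = X_min[i]
        # its k nearest minority neighbors (index 0 is x itself, so skip it)
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        neighbor = X_min[rng.choice(nn)]
        # synthetic point at a random position on the segment x -> neighbor
        lam = rng.random()
        synth.append(x + lam * (neighbor - x))
    return np.array(synth)
```

Every synthetic point is a convex combination of two real minority samples, so it always lies inside the minority region of feature space.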


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_18.png?resize=738%2C275&#038;ssl=1" alt="SMOTE" class="wp-image-61214" width="738" height="275"/><figcaption class="wp-element-caption"><em>SMOTE | <a href="https://iq.opengenus.org/smote-for-imbalanced-dataset/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>When the synthetic examples SMOTE creates for the minority class are added to the training set, they balance the class distributions and cause the classifier to create larger and less specific decision regions, rather than the smaller and more specific regions that would make the model overfit to the majority class. This helps the classifier generalize better and mitigates overfitting.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_8.png?resize=748%2C561&#038;ssl=1" alt="Decision boundaries" class="wp-image-61224" width="748" height="561"/><figcaption class="wp-element-caption"><em>Decision boundaries | <a href="https://slideplayer.com/slide/14454418/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>This approach is inspired by data augmentation techniques that proved successful in handwritten character recognition where operations like rotation and skew were natural ways to perturb the training data.</p>



<p>Now, let&#8217;s take a look at the performance of SMOTE.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_28.png?resize=751%2C246&#038;ssl=1" alt="Confusion matrix of classifiers trained on data synthetic examples and tested on the imbalanced test set" class="wp-image-61204" width="751" height="246"/><figcaption class="wp-element-caption"><em>Confusion matrix of classifiers trained on data synthetic examples and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>The classifiers trained on synthetic examples generalize well.</li>



<li>The classifiers identify the minority class well (True Negatives).</li>



<li>They have fewer False Positives compared to undersampling.</li>
</ul>



<h5 class="wp-block-heading" id="advantages">Advantages&nbsp;</h5>



<ul class="wp-block-list">
<li>It reduces the overfitting caused by random oversampling, as synthetic examples are generated rather than copies of existing examples.</li>



<li>No loss of information.</li>



<li>It&#8217;s simple.</li>
</ul>



<h5 class="wp-block-heading" id="disadvantages">Disadvantages&nbsp;</h5>



<ul class="wp-block-list">
<li>While generating synthetic examples, SMOTE does not take into consideration neighboring examples that can be from other classes. This can increase the overlapping of classes and can introduce additional noise.</li>



<li>SMOTE is not very practical for high-dimensional data.</li>
</ul>



<h4 class="wp-block-heading" id="2-undersampling">2. Undersampling</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_7.png?resize=775%2C402&#038;ssl=1" alt="Undersampling" class="wp-image-61225" width="775" height="402"/><figcaption class="wp-element-caption"><em>Undersampling | <a href="https://dataaspirant.com/10-oversampling/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>In this approach, we reduce the number of samples from the <strong>majority class</strong> to match the number of samples in the <strong>minority class</strong>.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_15.png?resize=770%2C278&#038;ssl=1" alt="Scatter plot of the class distribution before and after applying NearMiss-2" class="wp-image-61217" width="770" height="278"/><figcaption class="wp-element-caption"><em>Scatter plot of the class distribution before and after applying NearMiss-2 | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener">S</a><a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">ource</a></em></figcaption></figure>
</div>


<p>This can be done in a couple of ways:</p>



<ol class="wp-block-list">
<li><strong>Random sampler</strong>: It is the easiest and fastest way to balance the data by randomly selecting a few samples from the majority class.</li>



<li><strong>NearMiss</strong>: Adds some common sense rules to the selected samples by implementing <a href="https://imbalanced-learn.org/stable/under_sampling.html#mathematical-formulation" target="_blank" rel="noreferrer noopener nofollow">3 different heuristics</a>, but in this article, we will only focus on one.
<ul class="wp-block-list">
<li><strong>NearMiss-2</strong>: selects majority class examples with the minimum average distance to the three furthest minority class examples.</li>
</ul>
</li>
</ol>
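<p>Imbalanced-learn ships this heuristic as <code>NearMiss(version=2)</code>; the short NumPy sketch below (function name and parameters are my own) shows the NearMiss-2 rule itself: keep the majority samples whose average distance to their k farthest minority samples is smallest.</p>

```python
import numpy as np

def nearmiss2(X_maj, X_min, n_keep, k=3):
    """Illustrative NearMiss-2 sketch: undersample the majority class."""
    # pairwise distances: one row per majority sample, one column per minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # average distance to the k *farthest* minority samples, per majority row
    far_k = np.sort(d, axis=1)[:, -k:]
    avg_far = far_k.mean(axis=1)
    # keep the n_keep majority samples with the smallest such average distance
    keep = np.argsort(avg_far)[:n_keep]
    return X_maj[keep]
```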


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_31.png?resize=742%2C246&#038;ssl=1" alt="Confusion matrix of classifiers trained on undersampled examples and tested on the imbalanced test set" class="wp-image-61201" width="742" height="246"/><figcaption class="wp-element-caption"><em>Confusion matrix of classifiers trained on undersampled examples and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>Undersampling performs poorly compared to oversampling when it comes to identifying the majority class (True Positives). Besides that, however, it identifies the minority class better than oversampling and has fewer False Negatives.</li>
</ul>



<h5 class="wp-block-heading" id="advantages">Advantages</h5>



<ul class="wp-block-list">
<li>Data scientists can balance the dataset and reduce the risk of their analysis or machine learning algorithm skewing toward the majority. Without resampling, they might run into the so-called accuracy paradox: a classification model reports 90% accuracy, but on closer inspection, the results fall heavily within the majority class.&nbsp;</li>



<li>Fewer storage requirements and better run times for analyses. Less data means you or your business needs less storage and time to gain valuable insights.&nbsp;</li>
</ul>



<h5 class="wp-block-heading" id="disadvantages">Disadvantages</h5>



<ul class="wp-block-list">
<li>Removing enough majority examples to make the majority class the same or similar size to the minority class results in a significant loss of data.</li>



<li>The sample of the majority class chosen could be biased, meaning it might not accurately represent the real world, and the result of the analysis may be inaccurate. Therefore, it can cause the classifier to perform poorly on real unseen data.</li>
</ul>



<p>Because of these disadvantages, some scientists might prefer oversampling. It doesn’t lead to any loss of information, and in some cases, may perform better than undersampling. But oversampling isn’t perfect either. Because oversampling often involves replicating minority events, it can lead to overfitting.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>“The combination of SMOTE and under-sampling performs better than plain under-sampling.” </em></p>



<p></p>
<cite><p><a href="https://arxiv.org/abs/1106.1813" target="_blank" rel="noreferrer noopener nofollow">SMOTE: Synthetic Minority Over-sampling Technique</a>, 2011</p></cite></blockquote>



<p>To balance these issues, certain scenarios might require a combination of both over and undersampling to obtain the most lifelike dataset and accurate results.&nbsp;</p>



<h3 class="wp-block-heading" id="how-to-handle-imbalanced-data-algorithm-approach">How to handle imbalanced data &#8211; algorithm approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_25.png?resize=396%2C559&#038;ssl=1" alt="Algorithm approach – best models for imbalanced classification" class="wp-image-61207" width="396" height="559"/><figcaption class="wp-element-caption"><em>Algorithm approach – best models for imbalanced classification | <a href="https://medium.com/analytics-vidhya/what-precision-recall-f1-score-and-accuracy-can-tell-you-fe1eab1ada5a" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>This approach concentrates on modifying existing models to alleviate their bias towards the majority groups. This requires good insight into the modified learning algorithm and precise identification of reasons for its failure in learning the representations of skewed distributions.&nbsp;</p>



<p>The most popular techniques are cost-sensitive approaches (<strong>weighted learners</strong>). Here, the given model is modified to incorporate varying penalties for each considered group of examples. In other words, <strong>we assign a higher weight to the minority class in our cost function, which penalizes the model for misclassifying the minority class while reducing the weight of the majority class, causing the model to pay more attention to the underrepresented class</strong>, thus boosting its importance during the learning process. Focal loss, which dynamically down-weights easy examples, is a popular instance of this idea.</p>
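<p>In scikit-learn, the simplest form of cost-sensitive learning is the <code>class_weight="balanced"</code> option, which scales each class's loss term by the inverse of its frequency. A minimal sketch on toy data (the data and numbers here are my own illustration, not from the article):</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Imbalanced toy data: 950 majority (0) vs. 50 minority (1) samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 950 + [1] * 50)

# class_weight="balanced" re-weights the loss by inverse class frequency,
# so minority-class mistakes are penalized more heavily
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

recall_plain = recall_score(y, plain.predict(X))        # minority recall, unweighted
recall_weighted = recall_score(y, weighted.predict(X))  # typically much higher
```

On data like this, the weighted model trades some majority-class accuracy for a substantially better minority-class recall.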



<p>Another interesting algorithm-level solution is to apply <strong>one-class learning, or one-class classification (OCC for short)</strong>, which focuses on the target group, creating a data description. This way we eliminate bias towards any group, as we concentrate only on a single set of objects.</p>



<p>OCC can be useful in imbalanced classification problems because it provides techniques for outlier and anomaly detection. It does this by fitting the model on the majority class data (also known as positive examples) and predicting whether new data belongs to the majority class or to the minority class (also known as negative examples), i.e., whether it is an outlier/anomaly.&nbsp;</p>



<p>OCC problems are usually practical classification tasks where majority class data is easily available but minority class data is hard, expensive, or even impossible to gather, e.g., monitoring the operation of an engine, fraudulent transactions, or intrusion detection for a computer system.</p>
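<p>Scikit-learn's <code>OneClassSVM</code> is one standard OCC estimator: fit it on majority-class data only, and it flags anything unlike that data as an anomaly. A minimal sketch (the toy data and <code>nu</code> value are my own choices):</p>

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fit only on the easy-to-collect majority ("normal") data
rng = np.random.default_rng(0)
X_majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu roughly controls the expected fraction of outliers among the training data
occ = OneClassSVM(kernel="rbf", nu=0.05)
occ.fit(X_majority)

# predict() returns +1 for "looks like the majority class" and -1 for outliers
preds = occ.predict(np.array([[0.1, -0.2], [8.0, 8.0]]))
```

A point near the center of the training cloud comes back as +1, while a point far outside it comes back as -1, i.e., a likely minority case or anomaly.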



<h3 class="wp-block-heading" id="hybrid-approach">How to deal with imbalanced data &#8211; hybrid approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_27.png?resize=466%2C350&#038;ssl=1" alt="Hybrid approach" class="wp-image-61205" width="466" height="350"/><figcaption class="wp-element-caption"><em>Hybrid approach | <a href="https://i.pinimg.com/originals/0a/69/6e/0a696ef01b163532d6de95d04ab6385c.jpg">Source</a></em></figcaption></figure>
</div>


<p>Hybridization is an approach that exploits the strengths of individual components. For imbalanced classification data, some works have proposed hybridizing sampling and cost-sensitive learning; in other words, combining both the <strong>data</strong>- and <strong>algorithm</strong>-level approaches. This idea of <strong>two-stage training</strong>, which merges data-level solutions with algorithm-level solutions (i.e., a classifier ensemble) to produce robust and efficient learners, is highly popular.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_20.png?resize=580%2C606&#038;ssl=1" alt="Example scheme of the hybrid approach" class="wp-image-61212" width="580" height="606"/><figcaption class="wp-element-caption"><em>Example scheme of the hybrid approach | <a href="https://link.springer.com/article/10.1007/s42979-020-0119-4" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>It works by applying a data-level approach first. As you remember, the data-level approach modifies the training set to balance the class distribution between the majority class and the minority class using either oversampling or undersampling.&nbsp;</p>



<p>Then the pre-processed data with balanced class distribution is used to train a classifier ensemble, in other words, a collection of multiple classifiers from which a new classifier is derived which performs better than any constituent classifier. Thus, creating a robust and efficient learner that inherits the strong points of both data and algorithm level approaches while reducing their weaknesses at the same time.&nbsp;</p>
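<p>A minimal two-stage sketch of this idea (toy data, and naive random oversampling standing in for a full SMOTE step, so treat it as an illustration rather than the exact scheme from the figure):</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

# Imbalanced toy data: 900 majority (0) vs. 100 minority (1) samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (900, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 900 + [1] * 100)

# Stage 1 (data level): naive random oversampling of the minority class
X_maj, X_min = X[y == 0], X[y == 1]
X_min_up = resample(X_min, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_maj))

# Stage 2 (algorithm level): train a classifier ensemble on the balanced set
ensemble = RandomForestClassifier(n_estimators=100, random_state=0)
ensemble.fit(X_bal, y_bal)
```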


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_10.png?resize=691%2C265&#038;ssl=1" alt="Confusion matrix of hybrid classifiers trained and tested on the imbalanced test set" class="wp-image-61222" width="691" height="265"/><figcaption class="wp-element-caption"><em>Confusion matrix of hybrid classifiers trained and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>The hybrid classifiers perform better than undersampling when it comes to identifying the majority class.</li>



<li>They are almost as good as both undersampling and oversampling when it comes to identifying the minority class.</li>
</ul>



<p>Basically, it takes the best of both worlds!</p>



<h3 class="wp-block-heading" id="performance-measures-for-imbalanced-classification">Performance measures for imbalanced classification</h3>



<p>In this section, we review the common performance measures used and their effectiveness when addressing imbalanced classification data.</p>



<ul class="wp-block-list">
<li>Confusion matrix</li>



<li>ROC and AUC</li>



<li>Precision and recall</li>



<li>F-score</li>
</ul>



<section id="blog-intext-cta-block_18e57bf88b6cfe590a57d13627f6b357" class="block-blog-intext-cta  c-box c-box--default c-box--dark c-box--no-hover c-box--standard ">

            <h3 class="block-blog-intext-cta__header" id="h-may-interest-you">May interest you </h3>
    
            <p><a href="/blog/f1-score-accuracy-roc-auc-pr-auc" target="_blank" rel="noopener">F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?</a></p>
    
    </section>



<h4 class="wp-block-heading" id="1-confusion-matrix">1. Confusion matrix</h4>



<p>For binary classification problems, the <strong>confusion matrix</strong> defines the base for performance measures. Most of the performance metrics are derived from the confusion matrix, i.e., accuracy, misclassification rate, precision, and recall.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_21.png?resize=605%2C340&#038;ssl=1" alt="Confusion matrix" class="wp-image-61211" width="605" height="340"/><figcaption class="wp-element-caption"><em>Confusion matrix | <a href="https://glassboxmedicine.files.wordpress.com/2019/02/confusion-matrix.png?w=816" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>However, <strong>accuracy is not appropriate when the data is imbalanced</strong>: the model can achieve high accuracy just by predicting the majority class accurately while performing poorly on the minority class, which in most cases is the class we care about the most.</p>



<h4 class="wp-block-heading" id="2-roc-and-auc-imbalanced-data">2. ROC and AUC imbalanced data</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_30.png?resize=680%2C382&#038;ssl=1" alt="ROC and AUC imbalanced data" class="wp-image-61202" width="680" height="382"/><figcaption class="wp-element-caption"><em>ROC and AUC imbalanced data | <a href="https://i.ytimg.com/vi/afQ_DyKMxUo/maxresdefault.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a>&nbsp;</em></figcaption></figure>
</div>


<p>To accommodate the minority class, the Receiver Operating Characteristic (ROC) curve is proposed as a measure over a range of tradeoffs between the True Positive (TP) rate and the False Positive (FP) rate. Another important performance measure, the Area Under the Curve (AUC), is commonly used for summarizing the ROC curve in a single score. Moreover, AUC is not biased towards the model&#8217;s performance on either the majority or minority class, which makes it more appropriate when dealing with imbalanced data.</p>



<h4 class="wp-block-heading" id="3-precision-and-recall">3. Precision and recall</h4>



<p>From the confusion matrix, we can also derive<strong> precision and recall</strong> performance metrics.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_17.png?resize=831%2C303&#038;ssl=1" alt="Precision and recall" class="wp-image-61215" width="831" height="303"/><figcaption class="wp-element-caption"><em>Precision and recall | <a href="https://towardsdatascience.com/whats-the-deal-with-accuracy-precision-recall-and-f1-f5d8b4db1021" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Precision is well suited to class imbalance because it does not include the number of True Negatives in its calculation and is therefore not affected by the imbalance.</p>



<p>One drawback of precision and recall is that, like accuracy, there can be a tension between the two: when we try to improve the TP count for the minority class, the number of FPs can also increase.&nbsp;</p>



<h4 class="wp-block-heading" id="4-f-score">4. F-score</h4>



<p>To balance recall and precision, i.e., improving recall while keeping precision high, the <strong>F-score</strong> is proposed as the harmonic mean of precision and recall.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_2.png?resize=614%2C115&#038;ssl=1" alt="F-score" class="wp-image-61230" width="614" height="115"/><figcaption class="wp-element-caption"><em>F-score | <a href="https://arxiv.org/pdf/2104.02240.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Since the F-score weights precision and recall equally and balances both concerns, it is less likely to be biased to the majority or minority class. <a href="https://docs.google.com/document/d/1jlcYg_zmBwEJJOP79bV4PfcftpclGzEEvS20TjgOcBY/edit#heading=h.ubjlh9z2r0v" target="_blank" rel="noreferrer noopener nofollow">[2]</a></p>
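<p>All four measures reviewed above are one <code>sklearn.metrics</code> call each. A tiny worked example on an 8:2 imbalanced label vector (the toy scores are my own illustration):</p>

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Tiny imbalanced example: 8 majority (0) vs. 2 minority (1) samples
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.2, 0.3, 0.1, 0.4, 0.6, 0.2, 0.7, 0.4])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the scores at 0.5

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted class
auc = roc_auc_score(y_true, y_prob)    # uses the raw scores, not the labels
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Note that ROC AUC is computed from the scores before thresholding, while the confusion matrix, precision, recall, and F-score all depend on the chosen threshold.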



<p>Check out <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1?usp=sharing" target="_blank" rel="noreferrer noopener nofollow">this experiment with code examples for the 3 imbalanced classification approaches</a> in the Colab notebook I prepared for you.</p>



<h2 class="wp-block-heading" id="imbalanced-regression-data">Imbalanced regression data</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_13.png?resize=479%2C359&#038;ssl=1" alt="Imbalanced regression data" class="wp-image-61219" width="479" height="359"/><figcaption class="wp-element-caption"><em>Imbalanced regression data | <a href="https://i.imgflip.com/sy501.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p><strong>Regression over imbalanced data</strong> is not well explored. Yet many important real-life applications, such as economics, crisis management, fault diagnosis, or meteorology, require us to apply <a href="https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca" target="_blank" rel="noreferrer noopener nofollow"><strong>regression over imbalanced data</strong></a>, which means predicting rare and extreme continuous target values from input data.</p>



<p>Because dealing with imbalanced data has been studied mostly in the context of classification tasks, few mature or suitable strategies exist to address it in the context of regression.</p>



<p>Let’s first look at the typical approaches adopted from Imbalanced Classification then we will look into some of the best Imbalanced Regression techniques currently being used.</p>



<h3 class="wp-block-heading" id="approachas-adopted-from-imbalanced-classification">Approaches adopted from imbalanced classification</h3>



<h4 class="wp-block-heading" id="data-approach">Data approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_23.jpg?resize=401%2C535&#038;ssl=1" alt="Adopted from Imbalanced classification" class="wp-image-61209" width="401" height="535"/><figcaption class="wp-element-caption"><em>Adopted from Imbalanced classification | Author:  <a href="mailto:prince.canuma@neptune.ai" target="_blank" rel="noreferrer noopener nofollow">Prince Canuma</a></em></figcaption></figure>
</div>


<p>When it comes to data approaches for imbalanced regression, we have two techniques that were heavily inspired by imbalanced classification:</p>



<ul class="wp-block-list">
<li>SMOTER</li>



<li>SMOGN</li>
</ul>



<h5 class="wp-block-heading" id="1-smoter">1. SMOTER</h5>



<p>SMOTER is an adaptation for regression of the well-known SMOTE algorithm.</p>



<p>It works by defining frequent (majority) and rare (minority) regions using the original label density, and then applying random undersampling to the majority region and oversampling to the minority region. The user has to pre-determine the percentage of over- and undersampling to be carried out by the SMOTER algorithm.</p>



<p>When oversampling the minority regions, it not only generates new synthetic examples but also applies an <a href="https://www.investopedia.com/terms/i/interpolation.asp#:~:text=Interpolation%20is%20achieved,haven%27t%20been%20calculated" target="_blank" rel="noreferrer noopener nofollow">interpolation</a> strategy that combines the inputs and targets of different examples. Precisely, this interpolation is carried out using two rare cases, where one is a seed case and the other is randomly selected from the k-nearest neighbors of the seed. The features of the two cases are interpolated, and the new target variable is determined as a weighted average of the target variables of the two rare cases used.</p>



<p>Why do we have to average the target variables, you might ask? In the original SMOTE algorithm this question was trivial, because all rare cases share the same label (the target minority class). In regression, the answer is not so trivial: when a pair of examples is used to generate a new synthetic case, they will not have the same target variable value.</p>
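<p>Generating one synthetic pair can be sketched as follows. The helper name is hypothetical, and the inverse-distance weighting of the targets is one common choice for SMOTER's weighted average, not the only one:</p>

```python
import numpy as np

def smoter_pair(x_seed, y_seed, x_neigh, y_neigh, rng=None):
    """Generate one synthetic (x, y) pair, SMOTER-style (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    # feature interpolation, exactly as in classification SMOTE
    lam = rng.random()
    x_new = x_seed + lam * (x_neigh - x_seed)
    # target = weighted average of the two parents' targets,
    # weighted by the inverse distance of x_new to each parent
    d1 = np.linalg.norm(x_new - x_seed) + 1e-12
    d2 = np.linalg.norm(x_new - x_neigh) + 1e-12
    w1, w2 = 1.0 / d1, 1.0 / d2
    y_new = (w1 * y_seed + w2 * y_neigh) / (w1 + w2)
    return x_new, y_new
```

The synthetic target always lands between the two parent targets, closer to whichever parent the synthetic features ended up nearer to.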



<h5 class="wp-block-heading" id="2-smogn">2. SMOGN</h5>



<p><a href="http://proceedings.mlr.press/v74/branco17a/branco17a.pdf" target="_blank" rel="noreferrer noopener nofollow">SMOGN</a> takes after SMOTER but additionally introduces Gaussian Noise as a second strategy in the oversampling phase, alongside the interpolation SMOTER already uses.</p>



<p>The key idea of the SMOGN algorithm is to combine the SMOTER and Gaussian Noise strategies for generating synthetic examples, limiting the risks SMOTER can incur, such as a lack of diverse examples (since SMOTER will not use the most distant examples in the interpolation process), by falling back to the more conservative strategy of introducing Gaussian Noise. It generates new synthetic examples with SMOTER only when the seed example and the selected k-nearest neighbor are close enough, and uses Gaussian Noise when the two examples are more distant.</p>
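<p>That close/distant branching is the whole trick, and can be sketched in a few lines (helper name, <code>dist_thresh</code>, and <code>noise_scale</code> are illustrative choices, not the exact parameterization from the paper):</p>

```python
import numpy as np

def smogn_sample(x_seed, y_seed, x_neigh, y_neigh,
                 dist_thresh, noise_scale=0.05, rng=None):
    """One SMOGN-style synthetic example (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    if np.linalg.norm(x_neigh - x_seed) <= dist_thresh:
        # neighbor is close enough: SMOTER-style interpolation
        lam = rng.random()
        x_new = x_seed + lam * (x_neigh - x_seed)
        y_new = (1 - lam) * y_seed + lam * y_neigh
    else:
        # neighbor is too distant: perturb the seed with small Gaussian noise
        x_new = x_seed + rng.normal(0.0, noise_scale, size=x_seed.shape)
        y_new = y_seed + rng.normal(0.0, noise_scale)
    return x_new, y_new
```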



<h4 class="wp-block-heading" id="algorithm-approach">Algorithm approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_5.png?resize=426%2C436&#038;ssl=1" alt="Algorithm approach" class="wp-image-61227" width="426" height="436"/><figcaption class="wp-element-caption"><em>Algorithm approach | Source: Author</em></figcaption></figure>
</div>


<p>Like in imbalanced classification, this approach includes adjusting the loss function to compensate for region imbalance (re-weighting), as well as other relevant learning paradigms such as transfer learning, metric learning, two-stage training, and meta-learning <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">[4]</a>. But we will focus on two of them:</p>



<ul class="wp-block-list">
<li>Error-aware loss</li>



<li>Cost-sensitive re-weighting&nbsp;</li>
</ul>



<h5 class="wp-block-heading" id="1-error-aware-loss">1. Error-aware loss</h5>



<p>Focal-R is the regression version of the Focal loss used in classification. Focal loss is a dynamically weighted cross-entropy loss, where the modulating factor decays to zero as confidence in the correct class increases.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_14.png?resize=646%2C413&#038;ssl=1" alt="The focal loss down weights easy examples with a weighting factor of  - (1-  pt)^γ" class="wp-image-61218" width="646" height="413"/><figcaption class="wp-element-caption"><em>The focal loss down weights easy examples with a weighting factor of&nbsp; &#8211; (1-&nbsp; pt)^γ | <a href="https://arxiv.org/ftp/arxiv/papers/2006/2006.01413.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Focal-R replaces the weighting factor with a continuous function that maps the absolute error (L1 distance) into values in the range of 0 to 1.</p>



<p>Precisely, Focal-R loss based on L1 distance can be written as:</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_32.png?resize=347%2C79&#038;ssl=1" alt="Focal-R loss based on L1 distance" class="wp-image-61249" width="347" height="79"/><figcaption class="wp-element-caption"><em>Focal-R loss based on L1 distance | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Where e<sub>i</sub> is the L1 error of the i-th sample, σ(·) is the Sigmoid function, and β, γ are hyper-parameters.</p>
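<p>The Focal-R L1 formula above can be sketched in a few lines of NumPy; <code>beta</code> and <code>gamma</code> correspond to the β and γ hyper-parameters:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_r_l1(y_pred, y_true, beta=0.2, gamma=1.0):
    """Focal-R loss based on L1 distance: the sigmoid of the scaled
    absolute error acts as a continuous weighting factor."""
    e = np.abs(y_pred - y_true)              # L1 error per sample
    return np.mean(sigmoid(beta * e) ** gamma * e)
```

Larger absolute errors receive larger weights, so hard (often rare-region) samples dominate the loss.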



<h5 class="wp-block-heading" id="2-cost-sensitive-re-weighting">2. Cost-sensitive re-weighting</h5>



<p>Since the target space can be divided into a finite number of bins, classic re-weighting schemes can be directly plugged in, such as inverse-frequency weighting (INV) and its square-root weighting variant (SQINV), both of which are based on the label distribution.</p>
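<p>A small NumPy sketch of both schemes, assuming an equal-width binning of the target (the bin count and normalization are illustrative choices):</p>

```python
import numpy as np

def bin_weights(y, bins=10, scheme="inv"):
    """Per-sample weights from binned label frequencies:
    'inv' = inverse frequency, 'sqinv' = square-root inverse frequency."""
    counts, edges = np.histogram(y, bins=bins)
    # map each sample to its bin (interior edges only, clipped for safety)
    idx = np.clip(np.digitize(y, edges[1:-1]), 0, bins - 1)
    freq = counts[idx].astype(float)
    w = 1.0 / np.sqrt(freq) if scheme == "sqinv" else 1.0 / freq
    return w * len(y) / w.sum()              # normalize to mean weight 1
```

The weights can then be passed to a loss function or to an estimator's <code>sample_weight</code> argument.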



<h4 class="wp-block-heading" id="hybrid-approach">Hybrid approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_9.png?ssl=1" alt="Hybrid approach" class="wp-image-61223"/><figcaption class="wp-element-caption"><em>Hybrid approach | <a href="https://comicsandmemes.com/wp-content/uploads/hybrid-animal-006-shorse.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Like the hybrid approach for imbalanced classification, the imbalanced regression hybrid approach combines data-level and algorithm-level approaches in order to produce robust and efficient learners.</p>



<p>An example of this approach is the Bagging-based ensemble.</p>



<h5 class="wp-block-heading" id="bagging-based-ensemble">Bagging-based ensemble</h5>



<p>This algorithm incorporates data pre-processing strategies for addressing imbalanced domains in regression tasks.</p>



<p>Precisely, a paper entitled “REBAGG: REsampled BAGGing for Imbalanced Regression” proposes an algorithm that obtains diversity on the generated models while simultaneously biasing them towards the least represented and more important cases.</p>



<p>It has two main steps:</p>



<ol class="wp-block-list">
<li>Build a number of models using pre-processed samples of the training set.</li>



<li>Use the trained models to obtain predictions on unseen data by applying an averaging strategy (basically averaging models’ predictions to obtain the final predictions).</li>
</ol>



<p>Regarding the first step, the authors developed four main types of resampling methods to apply to the original training set:<strong> balance</strong>,<strong> balance.SMT</strong>,<strong> variation</strong>,<strong> </strong>and<strong> variation.SMT</strong>. The key distinguishing features of these methods are:&nbsp;</p>



<p>i) the ratio between the number of minority and majority examples used in the new sample; and,</p>



<p>ii) how new minority examples are obtained.</p>



<p>For resampling methods labeled with the prefix "balance", the new modified training set has the same number of minority and majority examples. For resampling methods with the prefix "variation", the ratio of minority to majority examples in the new training set varies.</p>



<p>When the resampling method has no suffix appended, the new synthetic examples for the minority region are exact copies of randomly selected minority examples. When the suffix "SMT" is appended, the new synthetic examples for the minority region are generated with the SMOTER algorithm.</p>
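<p>The two REBAGG steps can be sketched as follows. This is a toy version of the "balance" variant (exact copies of minority examples), assuming scikit-learn is available; the base learner, ensemble size, and 50/50 split are illustrative choices, not the paper's exact settings.</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rebagg_sketch(X, y, rare_mask, n_models=10, rng=None):
    """Toy REBAGG-style ensemble: each bootstrap draws equal numbers of
    rare and normal examples, trains a base learner, and predictions
    are averaged."""
    rng = rng or np.random.default_rng(0)
    rare, normal = np.where(rare_mask)[0], np.where(~rare_mask)[0]
    n_half = len(X) // 2
    models = []
    for _ in range(n_models):
        # step 1: build models on resampled (rare-biased) training sets
        idx = np.concatenate([rng.choice(rare, n_half),
                              rng.choice(normal, n_half)])
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    # step 2: average the models' predictions on unseen data
    return lambda Xq: np.mean([m.predict(Xq) for m in models], axis=0)
```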



<h3 class="wp-block-heading" id="deep-imbalanced-regression-dir">Deep Imbalanced Regression (DIR)</h3>



<p>The methods adapted from imbalanced classification do work; however, there are several drawbacks to using them alone.</p>



<p>Allow me to make a case!</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_16.jpg?resize=671%2C464&#038;ssl=1" alt="Figure 1. Comparison on the test error distribution (bottom) using the same training label distribution (top) on two different datasets" class="wp-image-61216" width="671" height="464"/><figcaption class="wp-element-caption"><em>Figure 1. Comparison on the test error distribution (bottom) using the same training label distribution (top) on two different datasets | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The above datasets have intrinsically different label spaces: (a) CIFAR-100 exhibits a categorical label space, where the target is a class index, while (b) IMDB-WIKI exhibits a continuous label space, where the target is age.</p>



<p>As you can see, the label density distribution is the same for both, but the error distributions are very different. The error distribution for IMDB-WIKI is much smoother and does not correlate well with the label density distribution. This matters because imbalanced learning methods, directly or indirectly, operate by compensating for the imbalance in the <em>empirical</em> label density distribution. That approach works well for imbalanced classification but not for continuous labels. Instead, you have to find a way to smooth the label distribution.</p>



<h4 class="wp-block-heading" id="label-distribution-smoothing-lds-for-imbalanced-data-density-estimation">Label distribution smoothing (LDS) for imbalanced data density estimation</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_12.jpg?resize=629%2C321&#038;ssl=1" alt="Figure 2. Label distribution smoothing(LDS) convolves a symmetric kernel with the empirical label density to estimate the effective label density distribution that accounts for the continuity of labels" class="wp-image-61220" width="629" height="321"/><figcaption class="wp-element-caption"><em>Figure 2. Label distribution smoothing (LDS) convolves a symmetric kernel with the empirical label density to estimate the effective label density distribution that accounts for the continuity of labels | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>From figure 2 above, we can see that in the continuous space the empirical label distribution does not match the real label density distribution. Why? Because of the dependence between data samples at nearby labels; in this case, images of people of similar age.</p>



<p>LDS uses kernel density estimation to learn the effective imbalance in datasets that corresponds to continuous targets. Precisely, LDS convolves a symmetric kernel with the empirical density distribution to extract a kernel-smoothed version that accounts for the overlap in the information of data samples of nearby labels.</p>
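<p>The convolution step is simple to sketch in NumPy; the bin count, kernel size, and kernel bandwidth below are illustrative hyper-parameters:</p>

```python
import numpy as np

def lds_effective_density(y, bins=50, kernel_size=5, sigma=2.0):
    """Label distribution smoothing: convolve the empirical label
    histogram with a symmetric Gaussian kernel."""
    counts, edges = np.histogram(y, bins=bins)
    half = kernel_size // 2
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()                    # normalize kernel weights
    smoothed = np.convolve(counts.astype(float), kernel, mode="same")
    return smoothed, edges
```

The smoothed density can then replace the raw bin counts when computing, say, inverse-frequency weights.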



<p><em><strong>Note</strong>: Gaussian or a Laplacian kernel is a symmetric kernel.</em></p>



<p>The symmetric kernel characterizes the similarity between target values y’ and y w.r.t their distance in the target space.&nbsp;</p>



<p>Figure 2 at the beginning of this section shows that LDS captures the real imbalance that affects regression. By applying LDS, we get a label density distribution that correlates well with the error distribution (-0.83).</p>



<p>Once you have the effective label density, you can then use the adapted techniques for addressing imbalanced classification that we talked about earlier (i.e. cost-sensitive re-weighting method).</p>



<h4 class="wp-block-heading" id="feature-distribution-smoothing-fds">Feature distribution smoothing (FDS)</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_19.jpg?resize=629%2C416&#038;ssl=1" alt="Feature distribution smoothing (FDS)" class="wp-image-61213" width="629" height="416"/><figcaption class="wp-element-caption"><strong><em>Top</em></strong><em>: Cosine similarity of the feature means at a particular age w.r.t its value at the anchor age. </em><strong><em>Bottom</em></strong><em>: Cosine similarity of the feature variance at a particular age w.r.t its value at the anchor age. The color of the background refers to data density in a particular target range | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The above figure displays the feature statistics similarity for age 30 (the anchor). You can see right away that the bins surrounding the anchor are highly similar to it, especially the closest ones. Examining the figure further, you will notice a problem in regions with very few data samples (i.e., ages 0&#8211;6): due to data imbalance, their mean and variance show an unjustifiably high similarity to age 30.&nbsp;</p>



<p>Inspired by these observations, the <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">creators</a> of the Feature distribution smoothing (FDS) algorithm proposed performing distribution smoothing on the feature space, or in other words, transferring feature statistics between nearby target bins. This calibrates the potentially biased estimates of the feature distribution, especially for target values underrepresented in the training data.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_1.jpg?resize=651%2C350&#038;ssl=1" alt="Feature distribution smoothing (FDS)" class="wp-image-61231" width="651" height="350"/><figcaption class="wp-element-caption"><em>Feature distribution smoothing (FDS) | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>One great practical property of FDS is that you can integrate it into deep neural networks by inserting a feature calibration layer after the final feature map.</p>
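<p>A rough NumPy sketch of the statistic-smoothing step: per-bin feature means and variances are smoothed across nearby target bins with a symmetric kernel, then each feature vector is re-standardized with the smoothed statistics. The binning, kernel width, and handling of empty bins are simplifications of the paper's running-statistics formulation.</p>

```python
import numpy as np

def fds_calibrate(feats, bin_idx, n_bins, sigma=2.0, eps=1e-6):
    """Sketch of feature distribution smoothing (FDS)."""
    d = feats.shape[1]
    mu = np.zeros((n_bins, d))
    var = np.ones((n_bins, d))
    for b in range(n_bins):                      # per-bin feature statistics
        sel = feats[bin_idx == b]
        if len(sel):
            mu[b], var[b] = sel.mean(0), sel.var(0) + eps
    # symmetric Gaussian kernel over the bin axis
    x = np.arange(-2, 3)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    mu_s = np.stack([np.convolve(mu[:, j], k, mode="same") for j in range(d)], 1)
    var_s = np.stack([np.convolve(var[:, j], k, mode="same") for j in range(d)], 1)
    # re-standardize each sample with its bin's smoothed statistics
    z = (feats - mu[bin_idx]) / np.sqrt(var[bin_idx])
    return z * np.sqrt(var_s[bin_idx]) + mu_s[bin_idx]
```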



<h4 class="wp-block-heading" id="benchmarking">Benchmarking&nbsp;</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_4.png?resize=564%2C546&#038;ssl=1" alt="Benchmarking results on STS-B-DIR" class="wp-image-61228" width="564" height="546"/><figcaption class="wp-element-caption"><em>Benchmarking results on STS-B-DIR | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Results reported on the Semantic Textual Similarity Benchmark (STS-B-DIR) dataset using various algorithms.</p>



<p>The authors show that coupling LDS and FDS with existing methods for regression over imbalanced data significantly improves performance <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">[4]</a>.</p>



<h3 class="wp-block-heading" id="performance-measures-for-imbalance-regression">Performance measures for imbalance regression</h3>



<p>When it comes to evaluation metrics for this kind of problem, you can use common regression metrics such as MAE, MSE, Pearson correlation, and the Geometric Mean (GM) alongside the techniques we explored in this section.</p>
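<p>A small helper computing all four, where GM is taken here as the geometric mean of the absolute errors (a small constant is added before the log for numerical stability, an implementation choice):</p>

```python
import numpy as np

def regression_report(y_true, y_pred):
    """MAE, MSE, Pearson correlation, and geometric mean of errors."""
    err = np.abs(y_true - y_pred)
    return {
        "MAE": err.mean(),
        "MSE": ((y_true - y_pred) ** 2).mean(),
        "Pearson": np.corrcoef(y_true, y_pred)[0, 1],
        "GM": np.exp(np.log(err + 1e-10).mean()),  # geometric mean of abs errors
    }
```

In practice, these are often reported separately for many-shot, medium-shot, and few-shot target regions to expose performance on rare labels.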



<h3 class="wp-block-heading" id="crucial-open-issues-to-address-when-developing-novel-methods-for-imbalanced-regression">Crucial open issues to address when developing novel methods for Imbalanced regression</h3>



<ul class="wp-block-list">
<li>Development of cost-sensitive regression solutions that can adapt the cost to the degree of importance assigned to rare observations. To allow more flexibility in predicting rare events of differing importance, it would be interesting to investigate adapting the cost not only to the minority group but to each individual observation.</li>



<li>Methods that can distinguish between minority samples and noisy samples must be proposed.</li>



<li>Development of better ensemble learning methods as in classification may offer a significant improvement in both robustness to skewed distributions and predictive power.</li>
</ul>



<h2 class="wp-block-heading" id="h-conclusion">Conclusion</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_3.png?resize=447%2C472&#038;ssl=1" alt="How to Deal With Imbalanced Classification and Regression Data" class="wp-image-61229" width="447" height="472"/><figcaption class="wp-element-caption"><a href="https://www.meme-arsenal.com/en/create/meme/2299880" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>Canonical ML algorithms assume that the number of objects in the considered classes is roughly similar. However, in many real-life problems the distribution of examples is skewed: the events we care most about and want to predict happen rarely, while most of the data points we collect represent the normal state and the majority group. This poses a difficulty for learning algorithms, as they will be biased towards the majority group.</p>



<p>But in this article, you learned about the different approaches to learning from imbalanced classification and regression data.&nbsp;</p>



<p>Thank you for reading! And as always I have a well-researched reference section that you can use to dive deeper into what you read below as well as a <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">colab notebook</a>.</p>



<h3 class="wp-block-heading" id="references">References</h3>



<ol class="wp-block-list">
<li><a href="https://link.springer.com/content/pdf/10.1007/s13748-016-0094-0.pdf" target="_blank" rel="noreferrer noopener nofollow">https://link.springer.com/content/pdf/10.1007/s13748-016-0094-0.pdf</a></li>



<li><a href="https://arxiv.org/abs/2104.02240" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2104.02240</a></li>



<li><a href="https://arxiv.org/pdf/1106.1813.pdf" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/pdf/1106.1813.pdf</a></li>



<li>Deep Imbalanced regression
<ol class="wp-block-list">
<li><a href="https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca</a></li>



<li><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/pdf/2102.09554.pdf</a></li>
</ol>
</li>



<li><a href="https://imbalanced-learn.org" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org</a></li>



<li><a href="https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/" target="_blank" rel="noreferrer noopener nofollow">https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/</a></li>



<li><a href="https://machinelearningmastery.com/one-class-classification-algorithms/" target="_blank" rel="noreferrer noopener nofollow">https://machinelearningmastery.com/one-class-classification-algorithms/</a></li>



<li><a href="https://dataaspirant.com/handle-imbalanced-data-machine-learning/" target="_blank" rel="noreferrer noopener nofollow">https://dataaspirant.com/handle-imbalanced-data-machine-learning/</a></li>



<li><a href="https://imbalanced-learn.org/stable/auto_examples/ensemble/plot_comparison_ensemble_classifier.html" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/auto_examples/ensemble/plot_comparison_ensemble_classifier.html</a></li>



<li><a href="https://imbalanced-learn.org/stable/ensemble.html" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/ensemble.html</a></li>



<li><a href="https://imbalanced-learn.org/stable/over_sampling.html#from-random-over-sampling-to-smote-and-adasyn" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/over_sampling.html#from-random-over-sampling-to-smote-and-adasyn</a></li>



<li><a href="https://www.fromthegenesis.com/smote-synthetic-minority-oversampling-technique/" target="_blank" rel="noreferrer noopener nofollow">https://www.fromthegenesis.com/smote-synthetic-minority-oversampling-technique/</a></li>



<li><a href="https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data" target="_blank" rel="noreferrer noopener nofollow">https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data</a></li>



<li><a href="https://www.reddit.com/r/datascience/comments/92az1l/how_to_handle_imbalanced_classification_problem/e34e64k" target="_blank" rel="noreferrer noopener nofollow">https://www.reddit.com/r/datascience/comments/92az1l/how_to_handle_imbalanced_classification_problem/e34e64k</a></li>



<li><a href="https://www.sciencedirect.com/topics/engineering/decisions-region" target="_blank" rel="noreferrer noopener nofollow">https://www.sciencedirect.com/topics/engineering/decisions-region</a></li>



<li><a href="https://www.amazon.com/dp/1118074629/ref=as_li_ss_tl?&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=615e87a9105582e292ad2b7e2c7ea339&amp;language=en_US" target="_blank" rel="noreferrer noopener nofollow">https://www.amazon.com/dp/1118074629/ref=as_li_ss_tl?&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=615e87a9105582e292ad2b7e2c7ea339&amp;language=en_US</a></li>



<li>Hybrid Classifiers—Methods of Data, Knowledge, and Classifier Combination. In: Studies in Computational Intelligence, vol. 519. Springer, Berlin (2014)</li>



<li><a href="https://www.coursera.org/learn/ml-regression" target="_blank" rel="noreferrer noopener nofollow">https://www.coursera.org/learn/ml-regression</a></li>



<li><a href="https://researchcommons.waikato.ac.nz/bitstream/handle/10289/8518/smoteR.pdf" target="_blank" rel="noreferrer noopener nofollow">https://researchcommons.waikato.ac.nz/bitstream/handle/10289/8518/smoteR.pdf</a></li>



<li><a href="https://arxiv.org/abs/1708.02002v2" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/1708.02002v2</a></li>



<li><a href="https://www.mastersindatascience.org/learning/statistics-data-science/undersampling/" target="_blank" rel="noreferrer noopener nofollow">https://www.mastersindatascience.org/learning/statistics-data-science/undersampling/</a></li>
</ol>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">6482</post-id>	</item>
		<item>
		<title>MLflow vs Kubeflow vs neptune.ai: What Are the Differences?</title>
		<link>https://neptune.ai/blog/mlflow-vs-kubeflow-vs-neptune-differences</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Thu, 21 Jul 2022 13:52:46 +0000</pubDate>
				<category><![CDATA[ML Tools]]></category>
		<guid isPermaLink="false">https://neptune.test/mlflow-vs-kubeflow-vs-neptune-differences/</guid>

					<description><![CDATA[As a Data Scientist, ML/DL Researcher, or Engineer you might have come across or heard about MLflow, Kubeflow, and neptune.ai. Due to the large adoption of ML and DL, many questions arose around deployment, scalability, and reproducibility. Thus MLOps was born as a hybrid of Data engineering, DevOps, and Machine Learning. We had to come&#8230;]]></description>
										<content:encoded><![CDATA[
<p>As a Data Scientist, ML/DL Researcher, or Engineer, you might have come across or heard about MLflow, Kubeflow, and neptune.ai. Due to the wide adoption of ML and DL, many questions arose around deployment, scalability, and reproducibility. Thus MLOps was born as a hybrid of Data Engineering, DevOps, and Machine Learning.</p>



<p>We had to come up with this new way of doing things for ML because ML development is complex.</p>



<p>The natural question is why?</p>



<p>Naturally, you might think it&#8217;s because of the math, algorithms, resources needed (GPUs, TPUs, CPUs&#8230;), data, APIs, libraries, and frameworks. Some of that is true, but not entirely, because nowadays most of it is abstracted away for us. If we take Hugging Face or fast.ai, for example, you just call an instance of a particular class and the framework/library does all the heavy lifting for you. Furthermore, with the development of <strong><a href="/blog/transfer-learning-guide-examples-for-images-and-text-in-keras" target="_blank" rel="noreferrer noopener">transfer learning</a></strong>, we no longer need vast amounts of data to train a model.</p>



<p>Then where does the complexity come from?</p>



<p>The complexity comes from a few things:</p>



<ol class="wp-block-list">
<li>ML is experimental in nature</li>



<li>It has more parts to account for, such as data (gathering, labelling, versioning), models (training, evaluation, versioning, and deployment), and configuration (hyperparameters and so on).</li>



<li>The <a href="/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams" target="_blank" rel="noreferrer noopener nofollow">paradigm</a> of how we do traditional software development (DevOps) is different from how we do ML (MLOps).</li>
</ol>



<p>As <a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">MLOps</a> matures, many tools have been and are being created to address different parts of the workflow. Among them, these 3 tools play key roles in reducing complexity and solving problems, which we are going to talk about in later sections.&nbsp;</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-Kubeflow-Neptune-comparison.png?ssl=1" alt="MLflow Kubeflow Neptune comparison" class="wp-image-44103" style="width:512px;height:288px"/><figcaption class="wp-element-caption"><a href="https://static0.srcdn.com/wordpress/wp-content/uploads/2018/02/Iron-Man-and-Black-Panther.jpg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>Now, what exactly do they do and how do they compare against each other?</p>



<p>In this article, we are going to answer those questions and more. The following are the points we are addressing:</p>



<ul class="wp-block-list">
<li>Tools
<ul class="wp-block-list">
<li>MLflow&nbsp;</li>



<li>Kubeflow&nbsp;</li>



<li>neptune.ai</li>
</ul>
</li>



<li>Which one should you use and when?</li>



<li>High-level feature comparison table</li>
</ul>



<p>Let&#8217;s dive right in!</p>



<h2 class="wp-block-heading" id="h-mlflow">MLflow</h2>



<div class="wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>It is an <strong>open-source</strong> MLOps platform born from the practices of Big Tech, with a focus on transferable knowledge, ease of use, modularity, and compatibility with popular ML libraries and frameworks. It was designed to work for anything from a 1-person to a 1000+ person organisation.&nbsp;</p>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-logo.png?ssl=1" alt="" class="wp-image-29862" style="width:233px;height:132px"/></figure>
</div></div>
</div>



<p>MLflow allows you to develop, track (and compare) experiments, and package and deploy models locally or remotely. It handles everything from data versioning, model management, and <a href="/blog/ml-experiment-tracking" target="_blank" rel="noreferrer noopener">experiment tracking</a> through deployment, except data sourcing, labeling, and pipelining.</p>



<p>It is pretty much the jack of all trades, or Swiss Army knife, of the MLOps workflow.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-gif.gif?ssl=1" alt="MLflow gif" class="wp-image-44105"/><figcaption class="wp-element-caption"><a href="https://media3.giphy.com/media/J4rXAANmGN0cxvDV4I/giphy.gif" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>This platform is made up of 4 components:</p>



<ul class="wp-block-list">
<li>MLflow Tracking</li>



<li>MLflow Projects</li>



<li>MLflow Models</li>



<li>MLflow Model Registry</li>
</ul>



<p>Let&#8217;s go deeper and see the importance of every single one of these components and how they work.</p>



<h3 class="wp-block-heading" id="h-mlflow-tracking">MLflow Tracking</h3>



<p>The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing and comparing the results. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.</p>



<p>As mentioned before, MLflow allows for local or remote development; therefore, both the entity and artifact stores are customisable, meaning you can save locally or in the cloud (AWS S3, GCP, and so on).</p>



<p><strong>Key concepts in Tracking</strong></p>



<ul class="wp-block-list">
<li>Parameters: key-value inputs to your code</li>



<li>Metrics: numeric values (can be updated over time)&nbsp;</li>



<li>Tags &amp; Notes: information about the run</li>



<li>Artifacts: Files, Data &amp; Models&nbsp;</li>



<li>Source: what code ran?</li>



<li>Version: what version of the code ran?</li>



<li>Run: an instance of code run by MLflow, where metrics and parameters are logged</li>
</ul>



<p><strong>Tracking APIs</strong></p>



<ul class="wp-block-list">
<li>Fluent MLFlow APIs (High-level)</li>



<li>MLFlow client (Low-level)</li>
</ul>


    <a
        href="/blog/best-mlflow-alternatives"
        id="cta-box-related-link-block_696127d1f801ca7395119b00e94d77e9"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-the-best-mlflow-alternatives">                The Best MLflow Alternatives            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="h-mlflow-projects">MLflow Projects&nbsp;</h3>



<p>An MLflow Project is a self-contained unit of execution that bundles the following:</p>



<ul class="wp-block-list">
<li>Code</li>



<li>Config</li>



<li>Dependencies</li>



<li>Data</li>
</ul>



<p>These are bundled together so the project can be run either locally or on a remote server.</p>



<p>This format helps with reproducibility and allows for multi-step workflows, with separate projects (or entry points within the same project) as the individual steps.&nbsp;&nbsp;</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-projects.png?ssl=1" alt="MLflow projects" class="wp-image-44107" style="width:840px;height:320px"/><figcaption class="wp-element-caption"><a href="https://databricks.com/wp-content/uploads/2018/10/tutorial-multistep-workflow.png" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>In other words, MLflow Projects are just a convention for organizing and describing your code so that other data scientists (or automated tools) can run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding an MLproject file, which is basically a YAML-formatted text file.</p>
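<p>To make the multi-step idea concrete, here is a minimal, hypothetical sketch in plain Python (not the actual MLflow API; all names are made up) of a project whose entry points are chained into a workflow, with each step’s output feeding the next:</p>

```python
# A hypothetical project defined as named entry points, each a small
# function that takes parameters and returns outputs for the next step.
def download_data(url):
    # Pretend to fetch a dataset; return a path-like identifier.
    return f"data/raw-{url.split('/')[-1]}"

def preprocess(raw_path):
    return raw_path.replace("raw", "clean")

def train(clean_path, lr):
    return {"model": "sgd", "trained_on": clean_path, "lr": lr}

# The rough equivalent of an MLproject file: named entry points.
ENTRY_POINTS = {
    "download": download_data,
    "preprocess": preprocess,
    "train": train,
}

def run(entry_point, **params):
    """Run a single step, the way a project runner would by entry point."""
    return ENTRY_POINTS[entry_point](**params)

# A multi-step workflow: each step's output feeds the next step.
raw = run("download", url="https://example.com/iris.csv")
clean = run("preprocess", raw_path=raw)
model = run("train", clean_path=clean, lr=0.01)
```

<p>The point of the convention is exactly this wiring: once steps are declared as entry points with explicit parameters, they can be run individually or composed into a larger workflow.</p>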



<h3 class="wp-block-heading" id="h-mlflow-models">MLflow Models</h3>



<p>An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-models.png?ssl=1" alt="MLflow models" class="wp-image-44108"/><figcaption class="wp-element-caption"><a href="https://res.infoq.com/presentations/mlflow-databricks/en/slides/sl21-1566324281761.jpg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p><strong>Flavors </strong>are the key concept that makes MLflow Models powerful: they are a convention that deployment tools can use to understand the model. Basically, we abstract the model by creating an intermediate format that packages the model you want to deploy into a variety of environments &#8212; much like a Dockerfile for models, or a lambda function that you can deploy to a desired environment and just invoke through its scoring function, called predict.</p>
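<p>The flavor mechanism can be illustrated with a small, hypothetical sketch (plain Python, not MLflow’s real model format): one saved artifact, several flavors, and a deployment tool that only needs to know one flavor to score the model:</p>

```python
# A hypothetical "saved model": one artifact, several flavors, each flavor
# describing how a downstream tool should load and call the model.
saved_model = {
    "artifact": {"coef": 2.0, "bias": 1.0},  # the serialized weights
    "flavors": {
        # a generic flavor every tool understands (the python_function idea)
        "python_function": lambda art, x: art["coef"] * x + art["bias"],
        # a framework-specific flavor a specialized server might prefer
        "sklearn": lambda art, x: art["coef"] * x + art["bias"],
    },
}

def load_and_predict(model, x, flavor="python_function"):
    """A deployment tool only needs one flavor it understands to score."""
    predict = model["flavors"][flavor]
    return predict(model["artifact"], x)

print(load_and_predict(saved_model, 3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```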



<h3 class="wp-block-heading" id="h-model-registry">Model Registry</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-model-registry.png?ssl=1" alt="MLflow model registry" class="wp-image-34579" style="width:903px;height:262px"/><figcaption class="wp-element-caption"><a href="https://mlflow.org/docs/latest/_images/oss_registry_3_overview.png" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The MLflow Model Registry component is a centralized model store, a set of APIs, and a UI for collaboratively managing the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example, from staging to production), and annotations.</p>
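<p>The following toy sketch (plain Python, not the actual Model Registry API; all names are made up) shows the core ideas of versioning, lineage, and stage transitions:</p>

```python
class ModelRegistry:
    """A toy central registry: versioned models with stage transitions."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, run_id, annotations=""):
        versions = self._models.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "run_id": run_id,  # lineage: which run produced this model
            "stage": "None",
            "annotations": annotations,
        })
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        assert stage in self.STAGES, f"unknown stage: {stage}"
        self._models[name][version - 1]["stage"] = stage

    def get(self, name, stage="Production"):
        return [v for v in self._models[name] if v["stage"] == stage]

registry = ModelRegistry()
v1 = registry.register("churn-model", run_id="run-42")
registry.transition("churn-model", v1, "Staging")
registry.transition("churn-model", v1, "Production")
```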


    <a
        href="/vs/mlflow"
        id="cta-box-related-link-block_6c13aae54aad52f75da07ec0501aa319"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
        <div class="block-cta-box-related-link__image-wrapper">
            <figure class="c-image__wrapper">

                
                <img
                    src="https://i0.wp.com/neptune.ai/wp-content/uploads/2021/12/blog_feature_image_045427_7_5_4_4.jpg?fit=200%2C105&amp;ssl=1"
                    loading="lazy"
                    decoding="async"
                    width="200"
                    height="105"
                    class="c-image"
                    alt="">
            </figure>
        </div>

    
    <div class="block-cta-box-related-link__description-wrapper">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--resource.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Recommended                </div>
            </div>

        
                    <h3 class="c-header" id="h-feature-by-feature-comparison-between-mlflow-and-neptune-ai">                Feature-By-Feature Comparison Between MLflow and neptune.ai            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Learn more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-kubeflow">Kubeflow</h2>



<div class="wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Kubeflow is an <strong>open-source</strong> project that leverages Kubernetes to build scalable MLOps pipelines and orchestrate complicated workflows. You can view it as a machine learning (ML) toolkit for Kubernetes.</p>



<p><strong><em>Note</em></strong><em>: Kubernetes (or K8s for short) is a container orchestration tool.</em></p>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/kubeflow-2.png?ssl=1" alt="" class="wp-image-15355"/><figcaption class="wp-element-caption"><a href="https://venturebeat.com/wp-content/uploads/2020/03/8a06547d-965d-4981-806a-6c11d559b893-1-e1583173705409.png?w=1200&amp;strip=all" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div></div>
</div>



<p>Now, two questions arise:&nbsp;</p>



<ol class="wp-block-list">
<li>Why containerize your ML applications?</li>



<li>Why ML on K8s?</li>
</ol>



<h3 class="wp-block-heading" id="h-why-containerize-you-ml-applications">Why containerize your ML applications?</h3>



<p>In a team setting, environments usually differ from person to person, and these differences can include:</p>



<ul class="wp-block-list">
<li>Dependencies (libraries, frameworks, and versions)</li>

<li>Code (helper functions, training and evaluation scripts)</li>

<li>Configurations (data transformations, network architecture, batch size, and so on)</li>

<li>Software and hardware</li>
</ul>



<p>This can cause various problems when two or more members need to collaborate, or when someone takes over another person’s work and tries to improve on it.</p>



<p>With containers, one can simply share a Docker image, and as long as the other person has Docker installed locally or in their cloud environment, they can easily recreate the same environment, experiments, and results.</p>



<h4 class="wp-block-heading">Benefits of containers</h4>



<ul class="wp-block-list">
<li>Packages:
<ul class="wp-block-list">
<li>Code</li>



<li>Dependencies</li>



<li>Configurations</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li>Helps create ML envs that are:
<ul class="wp-block-list">
<li>Lightweight</li>



<li>Portable</li>



<li>Scalable</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="h-why-ml-on-k8s">Why ML on K8s?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Orchestra-gif.gif?ssl=1" alt="" class="wp-image-44112"/><figcaption class="wp-element-caption"><em><a href="http://33.media.tumblr.com/bb17f80fb2df629397fe26eedd673059/tumblr_nccngf43vV1tk04foo4_500.gif" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>As I mentioned before, K8s is a container orchestration tool. It automates the deployment, scaling, and management of containerized applications. The trouble is in managing K8s itself, which can be hectic. Nowadays, however, there are several providers of managed K8s as a service, such as AWS EKS, Google GKE, and Azure AKS.</p>



<p>Using managed K8s as a service allows ML practitioners to take full advantage of the benefits that K8s brings, such as:</p>



<ul class="wp-block-list">
<li>Composability</li>



<li>Portability</li>



<li>Scalability</li>



<li>It may already be part of the company’s or team’s workflow</li>
</ul>



<p>Now that we got that out of the way, let’s take a more detailed look at Kubeflow.</p>



<h3 class="wp-block-heading" id="h-kubeflow-components">Kubeflow components</h3>



<p><strong>Kubeflow</strong> is composed of various projects/tools, but here we are going to focus on the four major ones:</p>



<ul class="wp-block-list">
<li>Notebooks</li>



<li>Pipelines&nbsp;</li>



<li>Training</li>



<li>Serving&nbsp;</li>
</ul>



<p><strong>Notebooks&nbsp;</strong></p>



<p>Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you&#8217;re ready.</p>



<p><strong>Pipelines</strong></p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Kubeflow-pipelines.png?ssl=1" alt="Kubeflow pipelines" class="wp-image-44113"/><figcaption class="wp-element-caption"><a href="https://miro.medium.com/max/1326/1*Swuq1YLAMdbSzYVmzWeBSA.jpeg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>This is perhaps the most famous project and the reason a lot of teams opt for Kubeflow. In a nutshell, Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. It is available as a Kubeflow component or as a standalone installation.</p>



<p>At the heart of this project lie two components:</p>



<ul class="wp-block-list">
<li><strong>Pipeline</strong> &#8211; a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run it, plus the inputs and outputs of each pipeline component.</li>

<li><strong>Pipeline component</strong> &#8211; a self-contained set of user code, packaged as a Docker image, that performs one step in the pipeline. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.</li>
</ul>
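<p>The relationship between a pipeline and its components can be sketched as follows (a toy, in-process Python example; real Kubeflow components run as Docker containers, and all names here are illustrative):</p>

```python
# Each "component" is a self-contained step; the pipeline wires their
# inputs and outputs together as a graph and runs them in order.
def preprocess(raw):
    return [x / max(raw) for x in raw]  # scale to [0, 1]

def train(features, epochs):
    return {"weights": sum(features), "epochs": epochs}

def evaluate(model):
    return {"score": model["weights"] / model["epochs"]}

# Pipeline definition: step name -> (component, input names drawn from
# pipeline parameters or from previous steps' outputs).
PIPELINE = [
    ("preprocess", preprocess, ["raw"]),
    ("train", train, ["preprocess", "epochs"]),
    ("evaluate", evaluate, ["train"]),
]

def run_pipeline(params):
    outputs = dict(params)  # pipeline parameters are visible to all steps
    for name, component, inputs in PIPELINE:
        outputs[name] = component(*[outputs[i] for i in inputs])
    return outputs

result = run_pipeline({"raw": [1.0, 2.0, 4.0], "epochs": 2})
```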



<p><strong>Pipeline features</strong></p>



<ul class="wp-block-list">
<li>A user interface (UI) for managing and tracking experiments, jobs, and runs.</li>



<li>An engine for scheduling multi-step ML workflows.</li>



<li>An SDK for defining and manipulating pipelines and components.</li>



<li>Notebooks for interacting with the system using the SDK.</li>



<li>Reusability: enabling you to re-use components and pipelines without having to rebuild each time.</li>
</ul>



<p><strong>Training</strong></p>



<p>This project offers you different frameworks for training ML models such as:</p>



<ul class="wp-block-list">
<li>Chainer Training</li>



<li>MPI Training</li>



<li>MXNet Training</li>



<li>PyTorch Training</li>



<li>Job Scheduling</li>



<li>TensorFlow Training (TFJob)</li>
</ul>



<p>Here you can execute training jobs, monitor the training, and much more. One of the cool features is being able to easily define and take advantage of Kubernetes replicas, which let you spin up multiple identical versions of a container image. Therefore, if one or more replicas fail during a training job, your progress is not completely lost because you have another version running in parallel.</p>



<p><strong>Serving</strong></p>



<p>When it comes to serving models, Kubeflow offers great support.</p>



<p>Kubeflow has a component called KFServing that enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.</p>



<p>KFServing can be used to do the following:</p>



<ul class="wp-block-list">
<li>Provide a Kubernetes Custom Resource Definition for serving ML models on arbitrary frameworks.</li>



<li>Encapsulate the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your ML deployments.</li>



<li>Enable a simple, pluggable, and complete story for your production ML inference server by providing prediction, pre-processing, post-processing and explainability out of the box.</li>
</ul>
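<p>The prediction, pre-processing, and post-processing story can be sketched with a toy, framework-agnostic service (plain Python, not the actual KFServing API; the class and method names are illustrative):</p>

```python
# A toy inference service: a request flows through pre-processing,
# prediction, and post-processing, with an optional explanation hook,
# regardless of which framework trained the underlying model.
class InferenceService:
    def __init__(self, model):
        self.model = model  # any callable taking a single feature value

    def preprocess(self, request):
        # e.g. parse raw strings from the request payload into floats
        return [float(x) for x in request["instances"]]

    def predict(self, request):
        features = self.preprocess(request)
        raw = [self.model(x) for x in features]
        return self.postprocess(raw)

    def postprocess(self, raw):
        return {"predictions": raw}

    def explain(self, request):
        # placeholder: a real service would delegate to an explainer
        return {"explanations": ["not implemented in this sketch"]}

svc = InferenceService(model=lambda x: 1.0 if x > 0.5 else 0.0)
out = svc.predict({"instances": ["0.2", "0.9"]})
```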



<p>Furthermore, besides KFServing, Kubeflow supports TensorFlow Serving containers to export trained TensorFlow models to Kubernetes. It is also integrated with Seldon Core, an open-source platform for deploying machine learning models on Kubernetes, and with NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale. Finally, it also supports <a href="https://www.bentoml.com/" target="_blank" rel="noreferrer noopener nofollow">BentoML</a>, an open-source platform for high-performance ML model serving. It makes building production API endpoints for your ML models easy and supports all major machine learning training frameworks, including TensorFlow, Keras, PyTorch, XGBoost, and scikit-learn.</p>



<p>It doesn&#8217;t end there: on top of everything, you can run Kubeflow on Kubernetes Engine on AWS, GCP, or Azure. Take AWS, for example: Kubeflow has an integration with AWS SageMaker that allows you to take full advantage of the scale that comes with such a managed service.</p>



<p>In my opinion, end-to-end ML platforms are not the way to go. <em>For more details, you can later read this </em><a href="https://neptune.ai/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective"><em>article</em></a><em>, where I explain this in detail, once you finish this one.</em></p>



<p>I believe microservices give you more flexibility to plug any new service into your pipeline or to replace a broken service, component, or tool, but integrations such as those between Kubeflow and these different cloud providers can let you build more robust solutions.</p>


    <a
        href="/blog/the-best-kubeflow-alternatives"
        id="cta-box-related-link-block_5ad00f42127a0a63020688fc5b8928c7"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-the-best-kubeflow-alternatives">                The Best Kubeflow Alternatives            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h2>



<figure class="wp-block-image size-full"><a href="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?ssl=1" target="_blank" rel="noopener"><img data-recalc-dims="1" loading="lazy" decoding="async" width="1200" height="628" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=1200%2C628&#038;ssl=1" alt="ML Metadata Store" class="wp-image-15676" srcset="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?w=1200&amp;ssl=1 1200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=768%2C402&amp;ssl=1 768w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=200%2C105&amp;ssl=1 200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=220%2C115&amp;ssl=1 220w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=120%2C63&amp;ssl=1 120w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=160%2C84&amp;ssl=1 160w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=300%2C157&amp;ssl=1 300w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=480%2C251&amp;ssl=1 480w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=1020%2C534&amp;ssl=1 1020w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></a></figure>



<p>neptune.ai is a metadata store for MLOps, built for research and production teams that run a lot of experiments.&nbsp;</p>



<p>It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.&nbsp;</p>



<p>Thousands of ML engineers and researchers use Neptune for experiment tracking and model registry both as individuals and inside teams at large organizations.</p>



<p>Now, a question might arise: why a metadata store?</p>



<h3 class="wp-block-heading" id="h-why-a-metadata-store">Why a metadata store?</h3>



<p>Unlike notes, organization protocols, or open-source tools, a metadata store is, as I mentioned before, a centralized place. It is also lightweight, automatic, and maintained by an organization (in this case, Neptune) or a community, so that people can focus on actually doing ML rather than on metadata bookkeeping.</p>



<p>Furthermore, a metadata store is a tool that serves as a connector between different parts/phases/tools of the MLOps workflow.</p>



<h4 class="wp-block-heading">Benefits of a metadata store</h4>



<ul class="wp-block-list">
<li>Log and display all metadata types including Parameters, Images, HTML, Audio, Video</li>



<li>Organize and compare experiments in a dashboard</li>



<li>See model training live</li>



<li>Have it (metadata store) maintained and backed up by someone (not you)</li>



<li>Debug and compare experiments and models with no extra effort</li>



<li>Both database and dashboard scale with thousands of experiments&nbsp;&nbsp;</li>



<li>Help ease the transition from research to production</li>



<li>Easy to build custom libs/tools on top of it&nbsp;</li>
</ul>



<p>Now that we got that out of the way, let’s take a more detailed look at Neptune.</p>



<h3 class="wp-block-heading" id="h-neptune-components">Neptune components</h3>



<p><strong>Neptune</strong> is made of 3 major components:</p>



<ul class="wp-block-list">
<li>Data versioning</li>



<li>Experiment tracking</li>



<li>Model registry</li>
</ul>



<h4 class="wp-block-heading">Data versioning</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><a href="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/artifacts-compare-runs-on-dataset.png?ssl=1"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/artifacts-compare-runs-on-dataset.png?ssl=1" alt="artifacts-compare-runs-on-dataset" class="wp-image-51777"/></a><figcaption class="wp-element-caption"><em><a href="https://docs.neptune.ai/how-to-guides/data-versioning/compare-datasets" target="_blank" rel="noreferrer noopener">Comparing datasets in neptune</a></em>.ai</figcaption></figure>
</div>


<p>Version control systems help developers manage changes to source code. Data version control, in turn, is a set of tools and processes that adapts the version control process to the data world, to manage the changes of models in relation to datasets and vice versa. In other words, this feature helps track which dataset, or subset of a dataset, was used to train a particular version of the model, thus enabling and facilitating experiment reproducibility.</p>



<p>With the <a href="https://docs.neptune.ai/how-to-guides/data-versioning" target="_blank" rel="noreferrer noopener">data versioning functionality in Neptune</a>, you can: </p>



<ul class="wp-block-list">
<li>Keep track of a dataset version in your model training runs with artifacts</li>



<li>Query the dataset version from previous runs to make sure you are training on the same dataset version</li>



<li>Group your Neptune Runs by the dataset version they were trained on</li>
</ul>
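<p>One simple way to picture dataset version tracking is fingerprinting the data and recording the fingerprint with each run’s metadata (a stdlib sketch of the idea, not Neptune’s actual artifacts API):</p>

```python
import hashlib

def dataset_fingerprint(rows):
    """Hash a dataset's contents so a training run can record exactly
    which version of the data it saw."""
    digest = hashlib.md5()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

# Log the fingerprint alongside the run's other metadata...
run_metadata = {
    "run_id": "run-7",
    "dataset_version": dataset_fingerprint([(1, 2), (3, 4)]),
}

# ...and later check whether a new run trains on the same dataset version.
same = run_metadata["dataset_version"] == dataset_fingerprint([(1, 2), (3, 4)])
changed = run_metadata["dataset_version"] == dataset_fingerprint([(1, 2), (3, 5)])
```

<p>Grouping runs by such a version identifier is what makes it possible to compare models trained on the same, or different, data.</p>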



<h4 class="wp-block-heading">Experiment tracking</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Neptune-Experiment-tracking-1.png?ssl=1" alt="Neptune experiment tracking" class="wp-image-45580"/></figure>
</div>


<p>This feature of Neptune helps you to <a href="https://docs.neptune.ai/you-should-know/organizing-and-filtering-runs" target="_blank" rel="noreferrer noopener">organize your ML experimentation</a> in a single place by:&nbsp;</p>



<ul class="wp-block-list">
<li>Logging and displaying metrics, parameters, images, and other ML metadata</li>



<li>Searching, grouping, and comparing experiments with no extra effort</li>



<li>Visualizing and debugging experiments live as they are running</li>



<li>Sharing results by sending a persistent link</li>



<li>Querying experiment metadata programmatically&nbsp;</li>
</ul>
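<p>Conceptually, structured experiment tracking boils down to logging values under namespaced paths so they can be queried later, as in this toy sketch (plain Python, not the Neptune client API; the names are illustrative):</p>

```python
# A toy run logger: every value lives under a namespace path such as
# "train/loss", so runs can be searched, grouped, and compared later.
class Run:
    def __init__(self, run_id):
        self.run_id = run_id
        self.data = {}  # path -> list of logged values

    def log(self, path, value):
        self.data.setdefault(path, []).append(value)

    def last(self, path):
        return self.data[path][-1]

run = Run("exp-1")
run.log("parameters/lr", 0.001)
for loss in [0.9, 0.5, 0.3]:
    run.log("train/loss", loss)  # a metric logged once per epoch
```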



<h4 class="wp-block-heading">Model registry</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Neptune-model-registry.png?ssl=1" alt="Neptune model registry" class="wp-image-45582"/></figure>
</div>


<p>This feature allows you to have your model development under control by organizing your models in a <a href="https://docs.neptune.ai/how-to-guides/model-registry" target="_blank" rel="noreferrer noopener">central model registry</a>, making them repeatable and traceable.</p>



<p>This means you can version, store, organize, and query models from model development through to deployment. The metadata saved includes:</p>



<ul class="wp-block-list">
<li>Dataset, code, env config versions</li>



<li>Parameters and evaluation metrics</li>



<li>Model binaries, descriptions, and other details</li>



<li>Testset prediction previews and model explanations</li>
</ul>



<p>Furthermore, it also enables teams, whether geographically close or distant, to collaborate on experiments, because everything your team logs to Neptune is automatically accessible to every team member. Reproducibility is no longer a problem.</p>



<p>You can access model training run information like the code, parameters, model binary, or other objects via an API.</p>



<p>With Neptune, you can replace folder structures, spreadsheets, and naming conventions with a single source of truth where all your model building metadata is organized, easy to find, share, and query.&nbsp;&nbsp;</p>



<p>This tool gives you control over models and experiments by keeping a record of everything that happens during model development. </p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Product_logging-metadata.gif?ssl=1" alt="Neptune Logging metadata" class="wp-image-38014"/></figure>
</div>


<p>This means less time spent looking for configs and files, less context switching, fewer unproductive meetings, and more time for quality ML work. With Neptune, you don’t have to implement loggers, maintain databases or dashboards, or teach people how to use them.</p>



<p>You can get the most out of your computational resources by keeping track of all ideas you have already tried and how much resources you used. Monitor your ML runs live and react quickly when runs fail, or models stop converging.&nbsp;&nbsp;</p>



<p>Finally, Neptune allows you to build reproducible, compliant, and traceable models by versioning all your model training runs. It also lets you see, at any time, who built the production model, which dataset and parameters were used, and how the model performed.</p>



<h2 class="wp-block-heading" id="h-now-just-tell-me-which-one-and-when-to-use-it">Now, just tell me which one and when to use it</h2>



<h3 class="wp-block-heading" id="h-mlflow">MLflow</h3>



<p>If you want an MLOps platform powered by the <strong>open-source</strong> community that allows you to:</p>



<ul class="wp-block-list">
<li>Track, visualize, and compare experiment metadata</li>

<li>Use a UI to visualize and compare experiment results</li>

<li>Develop (package and deploy) models</li>

<li>Create a multi-step workflow (much like Kubeflow Pipelines, but without using containers)</li>
</ul>



<p>And if you also need a way to abstract the model, allowing you to easily deploy it into a variety of environments, then MLflow is the way to go.</p>



<h3 class="wp-block-heading" id="h-kubeflow">Kubeflow</h3>



<p>If you want an end-to-end <strong>open-source</strong> platform that allows you to:</p>



<ul class="wp-block-list">
<li>Manage and set resource quotas across different teams, as well as code, run, and track experiment metadata either locally or in the cloud</li>

<li>Build reproducible pipelines with components that span the entire ML lifecycle (from data gathering all the way to model building and deployment)</li>

<li>Use a UI to visualize your pipeline and experiment metadata, as well as compare experiment results</li>

<li>Use a built-in notebook server service</li>
</ul>

<p>then Kubeflow is the way to go.</p>



<p>Finally, your K8s environment might have limited resources, but both K8s and Kubeflow have an integration with AWS SageMaker that enables the use of fully managed SageMaker ML tools across the ML workflow, natively from Kubernetes or Kubeflow. This means you can take advantage of its capability to scale resources (e.g., GPU instances) and of its services (e.g., SageMaker Ground Truth, Model Monitor, etc.).</p>



<p>This eliminates the need for you to manually manage and optimize your Kubernetes-based ML infrastructure while still preserving control over orchestration and flexibility.</p>



<h3 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h3>



<p>If you want a centralized place:</p>



<ul class="wp-block-list">
<li>To store all your metadata (data versioning, experiment tracking, and model registry)</li>

<li>With an intuitive and customizable UI that lets you visualize and compare experiment results, as well as arrange the displayed data as you wish</li>

<li>With a project wiki that facilitates sharing reports, insights, and remarks about the project’s progress, runs, and data exploration notebooks</li>

<li>With notebook checkpointing (for Jupyter)</li>

<li>With easy and seamless integrations with most of the best tools and MLOps platforms in the industry
<ul class="wp-block-list">
<li>For example, Neptune has an integration with MLflow and many other libraries, tools, and ML/DL frameworks.</li>

<li>If an integration is not available, you can add Neptune to your notebook, .py project, or containerized ML project (in case you are using Kubernetes or Kubeflow) powered by your favorite libraries, tools, and frameworks, such as PyTorch, using the Python client.</li>
</ul>
</li>
</ul>



<p>Finally, whether you want a fully managed service or more control (there is also a server version), Neptune is the way to go.</p>



<h2 class="wp-block-heading" id="h-high-level-feature-comparison-table">High-level feature comparison table</h2>



<div id="separator-block_82af4eaf2617b17afa4edfdb227d6da9"
         class="block-separator block-separator--10">
</div>



<div id="medium-table-block_b2a969fe6153922ed73dcbbbecfb28aa"
     class="block-medium-table c-table__outer-wrapper  l-padding__top--0 l-padding__bottom--0 l-margin__top--unset l-margin__bottom--unset">

    <table class="c-table">
                    <thead class="c-table__head">
            <tr>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            &nbsp;                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            MLflow                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            Kubeflow                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            neptune.ai                        </div>
                    </td>
                            </tr>
            </thead>
        
        <tbody class="c-table__body">

                    
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Pricing</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Freemium</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Free Plan limitations</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No limits</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No limits</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free for individuals, non-profit and educational research<br />
<a href="/pricing">Paid for teams</a></p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Open-source</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Easy to use</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Easy</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>There is a learning curve</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Easy</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Composability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Portability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Scalability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Customizable</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Limited</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>On-prem version</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Managed service version</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

                    
        </tbody>
    </table>

</div>



<div id="separator-block_b8d6c42743140b9c156cea189f8b390e"
         class="block-separator block-separator--20">
</div>



<h2 class="wp-block-heading" id="h-conclusion"><strong>Conclusion</strong></h2>



<p>In the end, the choice is in your hands: it depends on your requirements and needs. But keep in mind that this is not an either-or situation. These tools are not mutually exclusive, and you can mix and match them to fit your workflow.</p>



<p>It could be Kubeflow with MLflow, Kubeflow with neptune.ai, or MLflow with neptune.ai.</p>



<p>For the Kubeflow combinations, Kubeflow might not have a direct integration, but you can add MLflow or Neptune to a pipeline component (i.e., the containerized app that runs a pipeline step).</p>



<p>Combining MLflow and Neptune is even easier, because Neptune has an integration with MLflow.</p>



<p>Thus, you are not stuck using only one tool.</p>



<p>With that, we have come full circle. Below is a ton of references for you to check out and devour. Have fun!</p>



<p>Thank you!</p>



<h2 class="wp-block-heading" id="h-references"><strong>References</strong></h2>



<ul class="wp-block-list">
<li><a href="https://aws.amazon.com/sagemaker/">https://aws.amazon.com/sagemaker/</a></li>



<li><a href="https://aws.amazon.com/sagemaker/kubernetes/">https://aws.amazon.com/sagemaker/kubernetes/</a></li>



<li><a href="https://medium.com/ai%C2%B3-theory-practice-business/how-do-data-science-workers-collaborate-c4158d8bd471" target="_blank" rel="noreferrer noopener nofollow">https://medium.com/ai%C2%B3-theory-practice-business/how-do-data-science-workers-collaborate-c4158d8bd471 </a></li>



<li><a href="https://stackoverflow.com/questions/59046257/what-are-the-differences-between-airflow-and-kubeflow-pipeline">https://stackoverflow.com/questions/59046257/what-are-the-differences-between-airflow-and-kubeflow-pipeline</a></li>



<li><a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective </a></li>



<li><a href="/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams</a></li>
</ul>



<h3 class="wp-block-heading" id="h-mlflow">MLflow</h3>



<ul class="wp-block-list">
<li><a href="https://mlflow.org/">https://mlflow.org/</a></li>



<li><a href="https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html" target="_blank" rel="noreferrer noopener nofollow">https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html</a></li>



<li><a href="https://databricks.com/product/managed-mlflow" target="_blank" rel="noreferrer noopener nofollow">https://databricks.com/product/managed-mlflow</a></li>
</ul>



<h3 class="wp-block-heading" id="h-kubeflow">Kubeflow</h3>



<ul class="wp-block-list">
<li><a href="https://www.youtube.com/watch?v=sRQECN7LsbI" target="_blank" rel="noreferrer noopener nofollow">https://www.youtube.com/watch?v=sRQECN7LsbI </a></li>



<li><a href="https://www.kubeflow.org/">https://www.kubeflow.org/</a></li>



<li><a href="https://www.kubeflow.org/docs/other-guides/integrations/">https://www.kubeflow.org/docs/other-guides/integrations/</a></li>



<li><a href="https://www.datarevenue.com/en-blog/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow#:~:text=Airflow%20is%20a%20generic%20task,Kubeflow%20runs%20tasks%20on%20Kubernetes">https://www.datarevenue.com/en-blog/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow#:~:text=Airflow%20is%20a%20generic%20task,Kubeflow%20runs%20tasks%20on%20Kubernetes</a></li>



<li><a href="https://aws.amazon.com/sagemaker/kubernetes/">https://aws.amazon.com/sagemaker/kubernetes/</a></li>
</ul>



<h3 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h3>



<ul class="wp-block-list">
<li><a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/mlops</a></li>



<li><a href="/pricing-2" target="_blank" rel="noreferrer noopener">https://neptune.ai/pricing</a></li>



<li><a href="https://docs.neptune.ai/integrations/index.html" target="_blank" rel="noreferrer noopener">https://docs.neptune.ai/integrations/index.html</a></li>
</ul>



<p></p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4603</post-id>	</item>
		<item>
		<title>MLOps: What It Is, Why It Matters, and How to Implement It</title>
		<link>https://neptune.ai/blog/mlops</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Thu, 21 Jul 2022 13:40:38 +0000</pubDate>
				<category><![CDATA[MLOps]]></category>
		<guid isPermaLink="false">https://neptune.test/mlops/</guid>

					<description><![CDATA[What is this MLOps thing?&#160; It was the question I had on my mind, but until recently (I&#8217;m writing it in the late 2020) , I had only heard about MLOps a few times at big AI conferences, I saw some mentions in papers I read over the years, but I didn&#8217;t know anything specific.&#160;&#8230;]]></description>
										<content:encoded><![CDATA[
<p>What is this <a href="https://ml-ops.org/">MLOps</a> thing?&nbsp;</p>



<p>It was the question I had on my mind, but until recently (I&#8217;m writing this in late 2020), I had only heard about MLOps a few times at big AI conferences and seen some mentions in papers I read over the years. I didn&#8217;t know anything specific.&nbsp;</p>



<p>Interestingly enough, around the same time, I had a conversation with a friend who works as a Data Mining Specialist in Mozambique, Africa. They had recently started building their in-house ML pipeline, and coincidentally I was starting to write this article while doing my own research into the mysterious area of MLOps, trying to put everything in one place.</p>



<p>In this conversation, I learned more about the many pain points that legacy companies (and many tech companies doing commercial ML) have regarding:</p>



<ul class="wp-block-list">
<li>Moving to the cloud;&nbsp;</li>



<li>Creating and managing ML pipelines;</li>



<li>Scaling;</li>



<li>Dealing with sensitive data at scale;</li>



<li>And about a million other problems.</li>
</ul>



<p>And so I made it my duty to dive in deep, conduct extensive research, and learn as much as I could, writing down my own notes and ideas along the way.</p>



<p>The result is this article.</p>



<p>But why research this topic now?</p>



<p>According to <a href="https://techjury.net/blog/how-much-data-is-created-every-day/#gref" target="_blank" rel="noreferrer noopener nofollow">techjury</a>, every person created at least 1.7 MB of data per second in 2020. For data scientists like you and me, that is like Christmas coming early: there are so many theories and ideas to explore and experiment with, so many discoveries to be made, and so many models to be developed.&nbsp;</p>



<p>But if we want to be serious and actually have those models touch real-life business problems and real people, we have to deal with the essentials like:</p>



<ul class="wp-block-list">
<li>acquiring &amp; cleaning large amounts of data;</li>



<li>setting up tracking and versioning for experiments and model training runs;</li>



<li>setting up the deployment and monitoring pipelines for the models that do get to production.&nbsp;</li>
</ul>



<p>And we need to find a way to scale our ML operations to the needs of the business and/or users of our ML models.</p>



<p>There were similar issues in the past when we needed to scale conventional software systems so that more people could use them. DevOps&#8217; solution was a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.</p>



<p>That brings us to <strong>MLOps</strong>. It was born at the intersection of <strong>DevOps</strong>, <strong>Data Engineering</strong>, and <strong>Machine Learning</strong>, and it&#8217;s a similar concept to DevOps, but the execution is different. ML systems are experimental in nature and have more components that are significantly more complex to build and operate.</p>



<p>Let&#8217;s dig in!</p>



<h2 class="wp-block-heading" id="h-what-is-mlops">What is MLOps?</h2>



<p><strong>MLOps</strong> (Machine Learning Operations) is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases the quality, simplifies the management process, and automates the deployment of Machine Learning and Deep Learning models in large-scale production environments. It’s easier to align models with business needs, as well as regulatory requirements.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps_cycle.jpg?ssl=1" alt="MLOps cycle" class="wp-image-40190"/></figure>
</div>


<p>MLOps is slowly evolving into an independent approach to ML lifecycle management. It applies to the entire lifecycle &#8211; data gathering, model creation (software development lifecycle, continuous integration/continuous delivery), orchestration, deployment, health, diagnostics, governance, and business metrics.</p>



<p>The key phases of MLOps are:</p>



<ul class="wp-block-list">
<li>Data gathering</li>



<li>Data analysis</li>



<li>Data transformation/preparation</li>



<li>Model training &amp; development&nbsp;</li>



<li>Model validation&nbsp;</li>



<li>Model serving&nbsp;</li>



<li>Model monitoring&nbsp;</li>



<li>Model re-training.</li>
</ul>
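<p>To make the flow concrete, here is a toy, framework-agnostic sketch of these phases as plain Python functions passing artifacts along. All names and the trivial &#8220;model&#8221; are invented for illustration; a real pipeline would delegate each phase to an orchestrator such as Kubeflow:</p>

```python
# Illustrative sketch of the MLOps phases as a chain of steps.
# Function names and the "model" are hypothetical, for illustration only.

def gather_data():
    # Data gathering: pull raw records from a source system.
    return [{"x": i, "y": 2 * i} for i in range(10)]

def analyze_and_prepare(raw):
    # Data analysis + transformation: drop bad rows, split features/targets.
    clean = [r for r in raw if r["x"] is not None]
    return [r["x"] for r in clean], [r["y"] for r in clean]

def train(X, y):
    # Model training: fit a trivial "model" (average slope of y over x).
    pairs = [(a, b) for a, b in zip(X, y) if a != 0]
    return {"slope": sum(b / a for a, b in pairs) / len(pairs)}

def validate(model, X, y):
    # Model validation: check predictions against known targets.
    return max(abs(model["slope"] * a - b) for a, b in zip(X, y)) < 1e-9

def serve(model, x):
    # Model serving: answer a single prediction request.
    return model["slope"] * x

raw = gather_data()
X, y = analyze_and_prepare(raw)
model = train(X, y)
assert validate(model, X, y)
print(serve(model, 21))  # -> 42.0
```

<p>Monitoring and re-training close the loop: the serving step would feed predictions and fresh data back into <code>gather_data</code> for the next cycle.</p>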


    <a
        href="/blog/how-to-learn-mlops"
        id="cta-box-related-link-block_baa2ad238e1b4fa620a2dcc2b24f6c17"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
<h3 class="c-header" id="h-how-to-learn-mlops-in-2024-courses-books-and-other-resources">How to Learn MLOps in 2024 [Courses, Books, and Other Resources]</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="devops-vs-mlops">DevOps vs MLOps</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps-DevOps.png?ssl=1" alt="MLOps DevOps" class="wp-image-34038"/><figcaption class="wp-element-caption"><em>Source: <a href="https://nealanalytics.com/expertise/mlops/" target="_blank" rel="noreferrer noopener nofollow">NealAnalytics</a></em></figcaption></figure>
</div>


<p>DevOps and MLOps have fundamental similarities because <a href="https://neptune.ai/blog/mlops-principles" target="_blank" rel="noreferrer noopener">MLOps principles</a> were derived from DevOps principles. But they’re quite different in execution:</p>



<ol class="wp-block-list">
<li>Unlike DevOps, <strong>MLOps is much more experimental in nature</strong>. Data scientists and ML/DL engineers have to tweak many knobs &#8211; hyperparameters, parameters, and models &#8211; while also keeping track of and managing the data and the code base to get reproducible results. <em>Despite all the efforts and tools, the ML/DL industry still struggles with the reproducibility of experiments. This topic is out of the scope of this article; for more information, check the reproducibility subsection in the references at the end.</em></li>
</ol>



<ol start="2" class="wp-block-list">
<li><strong>Hybrid team composition:</strong> the team needed to build and deploy models in production won’t be composed of software engineers only. In an ML project, the team usually includes data scientists or ML researchers, who focus on exploratory data analysis, model development, and experimentation. They might not be experienced software engineers who can build production-class services.</li>
</ol>



<ol start="3" class="wp-block-list">
<li><strong>Testing: </strong>testing an ML system involves <a href="https://link.medium.com/GxMQJqdQvbb" target="_blank" rel="noreferrer noopener nofollow">model validation</a>, model training, and so on &#8211; in addition to the conventional code tests, such as unit testing and integration testing.&nbsp;</li>
</ol>



<ol start="4" class="wp-block-list">
<li><strong>Automated Deployment</strong>: you can’t just deploy an offline-trained ML model as a prediction service. You’ll need a multi-step pipeline to automatically retrain and deploy a model. This pipeline adds complexity because you need to automate the steps that data scientists do manually before deployment to train and validate new models.</li>
</ol>



<ol start="5" class="wp-block-list">
<li><strong>Production performance degradation of the system due to evolving data profiles or simply Training-Serving Skew</strong>: ML models in production can have reduced performance not only due to suboptimal coding but also due to constantly <strong>evolving data profiles</strong>. Models can decay in more ways than conventional software systems, and you need to plan for it. This can be caused by:</li>
</ol>



<ul class="wp-block-list">
<li>A discrepancy between how you handle data in the training and serving pipelines.</li>



<li>A change in the data between when you train and when you serve.</li>



<li>Feedback loop &#8211; when you choose the wrong hypothesis (i.e. objective) to optimize, which makes you collect biased data for training your model. Then, without knowing, you collect newer data points using this flawed hypothesis, it’s fed back in to retrain/fine-tune future versions of the model, making the model even more biased, and the snowball keeps growing. For more information read Fastbook’s section on <a href="https://github.com/fastai/fastbook/blob/master/01_intro.ipynb" target="_blank" rel="noreferrer noopener nofollow">Limitations Inherent To Machine Learning</a>.&nbsp;</li>
</ul>
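<p>The first cause is easy to reproduce in miniature. In this hedged sketch (the function names and preprocessing rules are made up), the training and serving pipelines normalize the same input differently, so the model is served features it never saw during training:</p>

```python
# Training-serving skew in miniature: two pipelines handle the "same"
# input differently, producing different features at serve time.

def preprocess_train(text):
    # Training pipeline: lowercase AND strip punctuation.
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace())

def preprocess_serve(text):
    # Serving pipeline: only lowercases -- a subtle discrepancy.
    return text.lower()

sample = "Great product!!!"
train_features = preprocess_train(sample)  # "great product"
serve_features = preprocess_serve(sample)  # "great product!!!"

# The model was trained on punctuation-free inputs but is served
# inputs with punctuation -- predictions silently degrade.
assert train_features != serve_features
```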



<ol start="6" class="wp-block-list">
<li><strong>Monitoring</strong>: models in production need to be monitored. Similarly, the summary statistics of the data that built the model need to be monitored so that you can refresh the model when needed. These statistics can and will change over time, so you need notifications or a roll-back process when values deviate from your expectations.</li>
</ol>
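<p>As a minimal illustration of monitoring summary statistics (the feature, values, and threshold here are invented), you can record the training-time mean and standard deviation of a feature and raise an alert when a serving batch drifts away from them:</p>

```python
import statistics

def drift_alert(training_values, serving_values, max_shift=0.5):
    """Return True if the serving mean drifts more than max_shift
    training standard deviations away from the training mean."""
    mu = statistics.mean(training_values)
    sigma = statistics.stdev(training_values)
    shift = abs(statistics.mean(serving_values) - mu) / sigma
    return shift > max_shift

train_ages = [30, 35, 40, 45, 50]    # distribution at training time
fresh_ages = [31, 36, 41, 44, 49]    # similar population -> no alert
shifted_ages = [60, 65, 70, 75, 80]  # population changed -> alert

print(drift_alert(train_ages, fresh_ages))    # False
print(drift_alert(train_ages, shifted_ages))  # True
```

<p>A real monitoring setup would track many statistics per feature and wire the alert into a notification or roll-back process, but the core check is this simple.</p>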



<p>MLOps and DevOps are similar when it comes to continuous integration of source control, unit testing, integration testing, and continuous delivery of the software module or the package.&nbsp;</p>


    <a
        href="/blog/mlops-is-extension-of-devops"
        id="cta-box-related-link-block_78abd809c04470af90a6c431ce98655e"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
<h3 class="c-header" id="h-mlops-is-part-of-devops-not-a-fork-my-thoughts-on-the-mlops-paper-as-an-mlops-startup-ceo">MLOps is part of DevOps. Not a fork — my thoughts on THE MLOps paper as an MLOps startup CEO</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<p>However, in ML there are a few notable differences:</p>



<ul class="wp-block-list">
<li><strong>Continuous Integration </strong>(CI) is no longer only about testing and validating code and components, but also testing and validating data, data schemas, and models.</li>



<li><strong>Continuous Deployment</strong> (CD) is no longer about a single software package or service, but a system (an ML training pipeline) that should automatically deploy another service (model prediction service) or roll back changes from a model.</li>



<li><strong>Continuous Training</strong> (CT) is a new property, unique to ML systems, that&#8217;s concerned with automatically retraining and serving the models.</li>
</ul>
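<p>For the first point, a CI step that validates data can be as simple as asserting each incoming record against an expected schema. A stdlib-only sketch, with an invented schema and field names:</p>

```python
# Minimal data-schema check of the kind a CI step might run before
# training: every record must have the expected fields and types.

EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}

def validate_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of schema violations for one record (empty = valid)."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems

good = {"user_id": 1, "amount": 9.99, "country": "MZ"}
bad = {"user_id": "1", "amount": 9.99}

print(validate_record(good))  # []
print(validate_record(bad))   # two violations: wrong type, missing field
```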


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/ML-process.png?ssl=1" alt="end-to-end machine learning platform" class="wp-image-34040"/><figcaption class="wp-element-caption">End-to-end machine learning platform | <a href="https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<h3 class="wp-block-heading" id="mlops-vs-experiment-tracking-vs-ml-model-management">MLOps vs experiment tracking vs ML model management</h3>



<p>We’ve defined what MLOps is, but what about experiment tracking and ML model management?</p>



<h4 class="wp-block-heading" id="experiment-tracking">Experiment tracking</h4>



<p><a href="/experiment-tracking" target="_blank" rel="noreferrer noopener">Experiment tracking</a> is a part (or process) of MLOps focused on collecting, organizing, and tracking model training information across multiple runs with different configurations (hyperparameters, model size, data splits, parameters, and so on).&nbsp;</p>



<p>As mentioned earlier, ML/DL is highly experimental in nature, so we use experiment tracking tools to benchmark different models created by different companies, teams, or team members.</p>
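<p>As a toy illustration, here is the kind of information an experiment tracker records per run. The <code>Run</code> class and field names below are made up for illustration; real tools like neptune.ai persist this data to a server and give you a UI to compare runs:</p>

```python
import time
import uuid

class Run:
    """A stand-in for one tracked training run."""
    def __init__(self, config):
        self.id = uuid.uuid4().hex[:8]
        self.config = config              # hyperparameters, data split, etc.
        self.metrics = {}                 # metric name -> list of logged values
        self.created_at = time.time()

    def log(self, name, value):
        self.metrics.setdefault(name, []).append(value)

runs = []
for lr in (0.1, 0.01):
    run = Run({"learning_rate": lr, "batch_size": 64, "split": "v2"})
    for step in range(3):
        run.log("train/loss", 1.0 / (step + 1) * lr)   # stand-in for real training
    runs.append(run)

# Benchmarking across runs then becomes a query over the recorded metrics.
best = min(runs, key=lambda r: r.metrics["train/loss"][-1])
print(best.config["learning_rate"])
```

<p>The point is that configuration and metrics are captured per run, so comparing experiments is a lookup rather than an archaeology project.</p>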



<h4 class="wp-block-heading" id="model-management">Model management</h4>



<p>To ensure that ML models are consistent and all business requirements are met at scale, a logical, easy-to-follow policy for <a href="/blog/machine-learning-model-management" target="_blank" rel="noreferrer noopener">model management</a> is essential.&nbsp;</p>



<p>MLOps methodology includes a process for streamlining model training, packaging, validation, deployment, and monitoring. This way, you can run ML projects consistently from end to end.</p>



<p>By setting a clear, consistent methodology for model management, organizations can:</p>



<ul class="wp-block-list">
<li>Proactively address common business concerns (such as regulatory compliance);</li>



<li>Enable reproducible models by tracking data, models, code, and model versioning;</li>



<li>Package and deliver models in repeatable configurations to support reusability.</li>
</ul>
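<p>To make the reproducibility point concrete, here is a minimal sketch of what a model registry record might capture. The fields and functions are hypothetical, not any specific registry&#8217;s API, but they show the minimum you&#8217;d want tracked to reproduce and audit a model:</p>

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    name: str
    version: str
    code_commit: str        # git SHA of the training code that produced the model
    data_version: str       # snapshot or hash of the training data
    params: dict            # hyperparameters used for this version
    metrics: dict           # evaluation results that gated the release
    stage: str = "staging"  # staging -> production -> archived

registry = {}               # (name, version) -> ModelRecord

def register(record):
    registry[(record.name, record.version)] = record

def promote(name, version):
    registry[(name, version)].stage = "production"

register(ModelRecord("churn-clf", "1.2.0", "a1b2c3d", "data-2022-08",
                     {"max_depth": 6}, {"auc": 0.91}))
promote("churn-clf", "1.2.0")
```

<p>With data, code, and configuration versions pinned to each model version, any deployed model can be traced back to exactly what produced it.</p>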



<h2 class="wp-block-heading" id="h-why-does-mlops-matter">Why does MLOps matter?</h2>



<p>MLOps is fundamental. Machine learning helps individuals and businesses deploy solutions that unlock previously untapped sources of revenue, save time, and reduce cost by creating more efficient workflows, leveraging data analytics for decision-making, and improving customer experience.&nbsp;</p>



<p>These goals are hard to accomplish without a solid framework to follow. Automating model development and deployment with MLOps means faster go-to-market times and lower operational costs. It helps managers and developers be more agile and strategic in their decisions.</p>



<p>MLOps serves as the map that guides individuals, small teams, and even businesses to achieve their goals no matter their constraints, be it sensitive data, limited resources, a small budget, and so on.&nbsp;</p>



<p>You decide how big you want your map to be, because MLOps is a set of practices, not rules written in stone. You can experiment with different settings and only keep what works for you.</p>



<h2 class="wp-block-heading" id="h-mlops-best-practices">MLOps best practices</h2>



<p>At first, I wanted to just list 10 best practices, but after some research, I came to the conclusion that it would be best to cover the best practices for different components of an ML pipeline, namely: Team, Data, Objective, Model, Code, and Deployment.</p>



<p>The following list is distilled from various sources mentioned in the references:</p>



<h3 class="wp-block-heading" id="team">Team</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/05-collaborative_platform/" target="_blank" rel="noreferrer noopener nofollow">Use A Collaborative Development Platform</a></li>



<li><a href="https://se-ml.github.io/best_practices/05-use_backlog/" target="_blank" rel="noreferrer noopener nofollow">Work Against a Shared Backlog</a></li>



<li><a href="https://se-ml.github.io/best_practices/05-communication_collab/" target="_blank" rel="noreferrer noopener nofollow">Communicate, Align, and Collaborate With Others</a></li>
</ul>



<h3 class="wp-block-heading" id="data">Data</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/01-sanity_check/" target="_blank" rel="noreferrer noopener nofollow">Use Sanity Checks for All External Data Sources</a></li>



<li><a href="https://d1.awsstatic.com/whitepapers/mlops-continuous-delivery-machine-learning-on-aws.pdf">Track, identify, and account for changes in data sources.</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-reusable_data_clean/" target="_blank" rel="noreferrer noopener nofollow">Write Reusable Scripts for Data Cleaning and Merging</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_20_combine_and_modify_existing_features_to_create_new_features_in_human%C2%AD-understandable_ways" target="_blank" rel="noreferrer noopener nofollow">Combine and modify existing features to create new features in human­-understandable ways</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-data-label/" target="_blank" rel="noreferrer noopener nofollow">Ensure Data Labelling is Performed in a Strictly Controlled Process</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-data-share/" target="_blank" rel="noreferrer noopener nofollow">Make Data Sets Available on Shared Infrastructure (private or public)</a></li>
</ul>
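<p>The &#8220;reusable scripts for data cleaning&#8221; practice can be shown in miniature: keep each cleaning step a small, composable function behind a single entry point, so the same logic runs identically in notebooks, CI, and production jobs. The column names and thresholds below are made up for illustration:</p>

```python
def drop_missing(rows, required=("age", "income")):
    """Remove rows with missing values in required columns."""
    return [r for r in rows if all(r.get(c) is not None for c in required)]

def clip_outliers(rows, column="income", lo=0.0, hi=1_000_000.0):
    """Clamp extreme values into a plausible range."""
    return [{**r, column: min(max(r[column], lo), hi)} for r in rows]

def clean(rows):
    """One entry point the whole team reuses instead of ad-hoc snippets."""
    return clip_outliers(drop_missing(rows))

raw = [{"age": 30, "income": 2_500_000.0},
       {"age": None, "income": 40_000.0},
       {"age": 45, "income": 55_000.0}]
cleaned = clean(raw)   # row with missing age dropped, outlier income clipped
```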



<h3 class="wp-block-heading" id="objective-metrics-kpis">Objective (Metrics &amp; KPIs)</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_12_don%E2%80%99t_overthink_which_objective_you_choose_to_directly_optimize" target="_blank" rel="noreferrer noopener nofollow">Don’t overthink which objective you choose to directly optimize; track multiple metrics at first.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_20_combine_and_modify_existing_features_to_create_new_features_in_human%C2%AD-understandable_ways" target="_blank" rel="noreferrer noopener nofollow">Choose a simple, observable and attributable metric for your first objective</a></li>



<li><a href="https://se-ml.github.io/best_practices/06-code_conduct/" target="_blank" rel="noreferrer noopener nofollow">Set Governance Objectives</a></li>



<li><a href="https://se-ml.github.io/best_practices/06-responsible_ml_ai/" target="_blank" rel="noreferrer noopener nofollow">Enforce Fairness and Privacy</a></li>
</ul>



<h3 class="wp-block-heading" id="model">Model&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_4_keep_the_first_model_simple_and_get_the_infrastructure_right" target="_blank" rel="noreferrer noopener nofollow">Keep the first model simple and get the infrastructure right</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_14_starting_with_an_interpretable_model_makes_debugging_easier" target="_blank" rel="noreferrer noopener nofollow">Starting with an interpretable model makes debugging easier.</a></li>



<li><strong>Training</strong>
<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/02-train_metric/" target="_blank" rel="noreferrer noopener nofollow">Capture the Training Objective in a Metric that is Easy to Measure and Understand</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-archive_old_feature/" target="_blank" rel="noreferrer noopener nofollow">Actively Remove or Archive Features That are Not Used</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-peer_review_mdl/" target="_blank" rel="noreferrer noopener nofollow">Peer Review Training Scripts</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-parallel_training/" target="_blank" rel="noreferrer noopener nofollow">Enable Parallel Training Experiments</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-auto_hyperparams/" target="_blank" rel="noreferrer noopener nofollow">Automate Hyper-Parameter Optimisation</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-measure_mdl_quality/" target="_blank" rel="noreferrer noopener nofollow">Continuously Measure Model Quality and Performance</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-data_version/" target="_blank" rel="noreferrer noopener nofollow">Use Versioning for Data, Model, Configurations and Training Scripts</a></li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="code">Code</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/03-regr_test/" target="_blank" rel="noreferrer noopener nofollow">Run Automated Regression Tests</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-use_static_analysis/" target="_blank" rel="noreferrer noopener nofollow">Use Static Analysis to Check Code Quality</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-cont-int/" target="_blank" rel="noreferrer noopener nofollow">Use Continuous Integration</a></li>
</ul>



<h3 class="wp-block-heading" id="deployment">Deployment</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_16_plan_to_launch_and_iterate" target="_blank" rel="noreferrer noopener nofollow">Plan to launch and iterate.</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-auto_model_packaging/" target="_blank" rel="noreferrer noopener nofollow">Automate Model Deployment</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-monitor_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Continuously Monitor the Behaviour of Deployed Models</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-rollback_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Automatic Rollbacks for Production Models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_41_when_performance_plateaus_look_for_qualitatively_new_sources_of_information_to_add_rather_than_refining_existing_signals" target="_blank" rel="noreferrer noopener nofollow">When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-shadow_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Shadow Deployment</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_40_keep_ensembles_simple" target="_blank" rel="noreferrer noopener nofollow">Keep ensembles simple</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-log_production/" target="_blank" rel="noreferrer noopener nofollow">Log Production Predictions with the Model&#8217;s Version, Code Version and Input Data</a></li>



<li><strong>Human Analysis of the System &amp; Training-Serving Skew</strong>
<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_23_you_are_not_a_typical_end_user" target="_blank" rel="noreferrer noopener nofollow">You are not a typical end user.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_24_measure_the_delta_between_models" target="_blank" rel="noreferrer noopener nofollow">Measure the delta between models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_25_when_choosing_models_utilitarian_performance_trumps_predictive_power" target="_blank" rel="noreferrer noopener nofollow">When choosing models, utilitarian performance trumps predictive power.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_37_measure_trainingserving_skew" target="_blank" rel="noreferrer noopener nofollow">Perform evolving data profiles checks</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_33_if_you_produce_a_model_based_on_the_data_until_january_5th_test_the_model_on_the_data_from_january_6th_and_after" target="_blank" rel="noreferrer noopener nofollow">If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.</a></li>
</ul>
</li>
</ul>



<p>These best practices will serve as the foundation on which you build your MLOps solutions. With that said, we can now dive into the implementation details.</p>



<h2 class="wp-block-heading" id="h-how-to-implement-mlops">How to implement MLOps</h2>



<p><a href="https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning" target="_blank" rel="noreferrer noopener nofollow">According to Google</a>, there are three ways you can go about implementing MLOps:</p>



<ul class="wp-block-list">
<li>MLOps level 0 (Manual process)</li>



<li>MLOps level 1 (ML pipeline automation)</li>



<li>MLOps level 2 (CI/CD pipeline automation)</li>
</ul>



<h3 class="wp-block-heading" id="mlops-level-0">MLOps level 0</h3>



<p>This is typical for companies that are just starting out with ML. An entirely manual ML workflow and a data-scientist-driven process might be enough if your models are rarely changed or retrained.</p>



<p><strong>Characteristics</strong></p>



<ul class="wp-block-list">
<li><strong>Manual, script-driven, and interactive process: </strong>every step is manual, including data analysis, data preparation, model training, and validation. It requires manual execution of each step and manual transition from one step to another<strong>.</strong></li>



<li><strong>Disconnect between ML and operations: </strong>the process separates data scientists who create the model, and engineers who serve the model as a prediction service. The data scientists hand over a trained model as an artifact for the engineering team to deploy on their API infrastructure.</li>



<li><strong>Infrequent release iterations:</strong> the assumption is that your data science team manages a few models that don&#8217;t change frequently—either changing model implementation or retraining the model with new data. A new model version is deployed only a couple of times per year.</li>



<li><strong>No Continuous Integration (CI):</strong> because few implementation changes are assumed, you ignore CI. Usually, testing the code is part of the notebooks or script execution.</li>



<li><strong>No Continuous Deployment (CD):</strong> because there aren&#8217;t frequent model version deployments, CD isn&#8217;t considered.</li>



<li><strong>Deployment refers to the prediction service</strong> (e.g., a microservice with a REST API).</li>



<li><strong>Lack of active performance monitoring:</strong> the process doesn&#8217;t track or log model predictions and actions.</li>
</ul>



<p>The engineering team might have their own complex setup for API configuration, testing, and deployment, including security, regression, and load + canary testing.</p>



<p><strong>Challenges</strong>&nbsp;</p>



<p>In practice, models often break when they’re deployed in the real world. Models fail to adapt to changes in the dynamics of the environment or changes in the data that describes the environment. Forbes has a great article on this: <a href="https://www.forbes.com/sites/forbestechcouncil/2019/04/03/why-machine-learning-models-crash-and-burn-in-production/" target="_blank" rel="noreferrer noopener nofollow">Why Machine Learning Models Crash and Burn in Production.</a></p>



<p>To address the challenges of this manual process, it’s good to use MLOps practices for CI/CD and CT. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.</p>



<h3 class="wp-block-heading" id="mlops-level-1">MLOps level 1</h3>



<p>The goal of MLOps level 1 is to perform continuous training (CT) of the model by automating the ML pipeline. This way, you achieve continuous delivery of the model prediction service.&nbsp;</p>



<p>This scenario may be helpful for solutions that operate in a constantly changing environment and need to proactively address shifts in customer behavior, price rates, and other indicators.</p>



<p><strong>Characteristics</strong></p>



<ul class="wp-block-list">
<li><strong>Rapid experiment</strong>: ML experiment steps are orchestrated and done automatically.&nbsp;</li>



<li><strong>CT of the model in production</strong>: the model is automatically trained in production, using fresh data based on live pipeline triggers.</li>



<li><strong>Experimental-operational symmetry</strong>: the pipeline implementation that’s used in the development or experiment environment is used in the preproduction and production environment, which is a key aspect of MLOps practice for unifying DevOps.</li>



<li><strong>Modularized code for components and pipelines:</strong> to construct ML pipelines, components need to be reusable, composable, and potentially shareable across ML pipelines (i.e. using containers).</li>



<li><strong>Continuous delivery of models</strong>: the model deployment step, which serves the trained and validated model as a prediction service for online predictions, is automated.</li>



<li><strong>Pipeline deployment:</strong> in level 0, you deploy a trained model as a prediction service to production. For level 1, you deploy a whole training pipeline, which automatically and recurrently runs to serve the trained model as the prediction service.</li>
</ul>



<p><strong>Additional components</strong></p>



<ul class="wp-block-list">
<li><strong>Data and model validation: </strong>the pipeline expects new, live data to produce a new model version that’s trained on the new data. Therefore, automated data validation and model validation steps are required in the production pipeline.</li>



<li><strong>Feature store: </strong>a feature store is a centralized repository where you standardize the definition, storage, and access of features for training and serving.</li>



<li><strong>Metadata management: </strong>information about each execution of the ML pipeline is recorded to help with data and artifact lineage, reproducibility, and comparisons. It also helps you debug errors and anomalies.</li>



<li><strong>ML pipeline triggers</strong>: you can automate ML production pipelines to retrain models with new data, depending on your use case:
<ul class="wp-block-list">
<li>On-demand</li>



<li>On a schedule</li>



<li>On availability of new training data</li>



<li>On model performance degradation</li>



<li>On significant changes in the data distribution (evolving data profiles).</li>
</ul>
</li>
</ul>
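<p>These triggers can be combined into one decision function that the pipeline orchestrator evaluates. The sketch below is only illustrative; the signal names and thresholds are assumptions, not recommendations:</p>

```python
import time

def should_retrain(now, last_run, *,
                   schedule_s=24 * 3600,              # on a schedule (daily here)
                   new_data=False,                    # new training data available
                   live_auc=None, min_auc=0.85,       # model performance degradation
                   drift_score=None, max_drift=0.3):  # shift in the data distribution
    """Return True when any retraining trigger fires."""
    if now - last_run >= schedule_s:
        return True
    if new_data:
        return True
    if live_auc is not None and live_auc < min_auc:
        return True
    if drift_score is not None and drift_score > max_drift:
        return True
    return False

now = time.time()
assert should_retrain(now, last_run=now - 2 * 24 * 3600)   # schedule elapsed
assert should_retrain(now, last_run=now, live_auc=0.79)    # performance degraded
assert not should_retrain(now, last_run=now)               # nothing to do
```

<p>On-demand triggering is simply calling the pipeline directly, bypassing this check.</p>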



<p><strong>Challenges</strong>&nbsp;</p>



<p>This setup is suitable when you deploy new models based on new data, rather than based on new ML ideas.</p>



<p>However, you need to try new ML ideas and rapidly deploy new implementations of the ML components. If you manage many ML pipelines in production, you need a CI/CD setup to automate the build, test, and deployment of ML pipelines.</p>



<h3 class="wp-block-heading" id="mlops-level-2">MLOps level 2</h3>



<p>For a rapid and reliable update of pipelines in production, you need a robust automated CI/CD system. With this automated CI/CD system, your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters.&nbsp;</p>



<p>This level fits tech-driven companies that have to retrain their models daily, if not hourly, update them in minutes, and redeploy on thousands of servers simultaneously. Without an end-to-end MLOps cycle, such organizations just won’t survive.</p>



<p>This MLOps setup includes the following components:</p>



<ul class="wp-block-list">
<li>Source control</li>



<li>Test and build services</li>



<li>Deployment services</li>



<li>Model registry</li>



<li>Feature store</li>



<li>ML metadata store</li>



<li>ML pipeline orchestrator.</li>
</ul>



<p><strong>Characteristics</strong>&nbsp;</p>



<ul class="wp-block-list">
<li><strong>Development and experimentation:</strong> you iteratively try out new ML algorithms and new modeling where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which are then pushed to a source repository.</li>



<li><strong>Pipeline continuous integration</strong>: you build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.</li>



<li><strong>Pipeline continuous delivery:</strong> you deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.</li>



<li><strong>Automated triggering:</strong> the pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a newly trained model that is pushed to the model registry.</li>



<li><strong>Model continuous delivery:</strong> you serve the trained model as a prediction service for online predictions. The output of this stage is a deployed model prediction service.</li>



<li><strong>Monitoring</strong>: you collect statistics on model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.</li>
</ul>



<p>The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.</p>
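<p>To see how the six stages hand artifacts to each other, here is a deliberately simplified sketch in which each function stands in for a real CI/CD or pipeline service; everything in it is illustrative:</p>

```python
def continuous_integration(source):   # build + test the pipeline source code
    return {"package": f"pipeline-{source['commit']}"}

def continuous_delivery(package):     # deploy the pipeline to the target environment
    return {"pipeline": package["package"], "env": "production"}

def automated_run(pipeline):          # a triggered training run pushes to the registry
    return {"model": f"model@{pipeline['pipeline']}", "registry": True}

def model_delivery(model):            # serve the registered model for predictions
    return {"endpoint": f"/predict/{model['model']}"}

def monitor(endpoint):                # live stats feed the next trigger or experiment
    return {"trigger_retrain": False, "endpoint": endpoint["endpoint"]}

artifact = {"commit": "a1b2c3d"}      # output of the experimentation stage
for stage in (continuous_integration, continuous_delivery,
              automated_run, model_delivery, monitor):
    artifact = stage(artifact)
print(artifact["endpoint"])
```

<p>Each stage consumes the previous stage&#8217;s output and produces a new artifact, which is exactly the contract an orchestrator enforces between real services.</p>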


    <a
        href="/blog/mlops-principles"
        id="cta-box-related-link-block_eebaeb68dc7044274fecacea264b9c0c"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
        <div class="block-cta-box-related-link__image-wrapper">
            <figure class="c-image__wrapper">

                
                <img
                    src="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/05/blog_feature_image_051678_7_0_9_0.jpg?fit=200%2C105&amp;ssl=1"
                    loading="lazy"
                    decoding="async"
                    width="200"
                    height="105"
                    class="c-image"
                    alt="">
            </figure>
        </div>

    
    <div class="block-cta-box-related-link__description-wrapper">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-mlops-principles-and-how-to-implement-them">MLOps Principles and How to Implement Them</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-building-vs-buying-vs-hybrid-mlops-infrastructure">Building vs buying vs hybrid MLOps infrastructure</h2>



<p>Cloud computing companies have invested hundreds of billions of dollars in infrastructure and management.</p>



<p>To give you a bit of context, a <a href="https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019" target="_blank" rel="noreferrer noopener nofollow">Canalys</a> report states that public cloud infrastructure spending reached $77.8 billion in 2018, and it grew to $107 billion in 2019. According to another study by <a href="https://www.idc.com/getdoc.jsp?containerId=prUS45340719" target="_blank" rel="noreferrer noopener nofollow">IDC</a>, with a five-year compound annual growth rate (CAGR) of 22.3%, cloud infrastructure spending is estimated to grow to nearly $500 billion by 2023.</p>



<p>Spending on cloud infrastructure services reached a record $30 billion in the second quarter of 2020, with Amazon Web Services (AWS), Microsoft, and Google Cloud accounting for half of customer spend.&nbsp;</p>



<p>From a vendor perspective, AWS market share remained at a “long-standing mark” of around 33% during the second quarter of 2020, followed by Microsoft at 18%, and Google Cloud at 9%. Meanwhile, Chinese cloud providers now account for over 12% of the worldwide market, led by Alibaba, Tencent and Baidu.</p>



<p>These companies invest in research &amp; development of specialized hardware, software, and SaaS applications, but also MLOps software. Two great examples come to mind:&nbsp;</p>



<ul class="wp-block-list">
<li>AWS with its SageMaker, a fully managed end-to-end cloud ML platform that enables developers to create, train, and deploy machine learning models in the cloud, on embedded systems, and on edge devices.</li>



<li>Google with its recently announced AI Platform Pipelines for building and managing ML pipelines, leveraging TensorFlow Extended (TFX’s) pre-built components and templates that do a lot of model deployment work for you.</li>
</ul>



<p>Now, should you <strong>build or buy</strong> your infrastructure? Maybe you should go <strong>hybrid</strong>?</p>



<p>Tech companies that want to survive long-term usually have in-house teams and build custom solutions. If they have the skills, knowledge, and tools to tackle complex problems, there’s nothing wrong with that approach. But there are other factors that are worth taking into account, like:</p>



<ul class="wp-block-list">
<li>time and effort</li>



<li>human resources</li>



<li>time to profit</li>



<li>opportunity cost.</li>
</ul>


    <a
        href="/blog/first-mlops-system-with-andy-mcmahon"
        id="cta-box-related-link-block_9bdf850a97d2c128c5356063d8fdb82e"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-your-first-mlops-system-what-does-good-look-like-with-andy-mcmahon">Your First MLOps System: What Does Good Look Like? With Andy McMahon</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="time-and-effort">Time and effort&nbsp;</h3>



<p>According to a survey by <a href="https://cnvrg.io/build-vs-buy-data-science-platform" target="_blank" rel="noreferrer noopener nofollow">cnvrg.io</a>, data scientists often spend their time building solutions to add to their existing infrastructure in order to complete projects. 65% of their time was spent on engineering heavy, <strong>non-data science</strong> tasks such as tracking, monitoring, configuration, compute resource management, serving infrastructure, feature extraction, and model deployment.&nbsp;</p>



<p>This wasted time is often referred to as ‘hidden technical debt’, and is a common bottleneck for machine learning teams. Building an in-house solution, or maintaining an underperforming solution can take from 6 months to 1 year. Even once you’ve built a functioning infrastructure, just to maintain the infrastructure and keep it up-to-date with the latest technology requires lifecycle management and a dedicated team.</p>



<h3 class="wp-block-heading" id="human-resources">Human resources</h3>



<p>Operationalizing machine learning requires a lot of engineering. For a smooth machine learning workflow, each data science team must have an operations team that understands the unique requirements of deploying machine learning models.</p>



<p>By investing in an end-to-end MLOps platform, you can automate these processes completely, making it easier for operations teams to focus on optimizing their infrastructure.</p>



<h3 class="wp-block-heading" id="cost">Cost</h3>



<p>Having a dedicated operations team to manage models can be expensive on its own. If you want to scale your experiments and deployments, you’d need to hire more engineers to manage this process. It’s a major investment, and a slow process to find the right team.&nbsp;</p>



<p>An out-of-the-box MLOps solution is built with scalability in mind, at a fraction of the cost. After calculating all the different costs associated with hiring and onboarding an entire team of engineers, your return on investment drops, which brings us to our next factor.</p>



<h3 class="wp-block-heading" id="time-to-profit">Time to profit</h3>



<p>It can take over a year to build a functioning machine learning infrastructure. It can take even longer to build a data pipeline that can produce value for your organization.&nbsp;</p>



<p>Companies like Uber, Netflix, and Facebook have dedicated years and massive engineering efforts to scale and maintain their machine learning platforms to stay competitive.&nbsp;</p>



<p>For most companies, an investment like this is not possible, and also not necessary. The machine learning landscape has matured since Uber, Netflix and Facebook originally built their in-house solutions.&nbsp;</p>



<p>There are more pre-built solutions that offer all you need out-of-the-box, at a fraction of the cost. For example, cnvrg.io customers can deliver profitable models in less than 1 month. Instead of building all the infrastructure necessary to make their models operational, data scientists can focus on research and experimentation to deliver the best model for their business problem.</p>



<h3 class="wp-block-heading" id="opportunity-cost">Opportunity cost</h3>



<p>As mentioned above, one survey shows that 65% of a data scientist’s time is spent on <strong>non-data science</strong> tasks. Using an MLOps platform automates technical tasks and reduces DevOps bottlenecks.&nbsp;</p>



<p>Data scientists can spend their time doing more of what they were hired to do &#8211; deliver high-impact models &#8211; while the cloud provider takes care of the rest.&nbsp;</p>



<p>Adopting an end-to-end MLOps platform has a considerable competitive advantage that allows your machine learning development to scale massively.</p>



<h3 class="wp-block-heading" id="what-about-hybrid-mlops-infrastructure">What about Hybrid MLOps infrastructure?</h3>



<p>Some companies have been entrusted with private &amp; sensitive data that can’t leave their servers: even a small vulnerability could have a catastrophic ripple effect. This is where <strong>Hybrid</strong> cloud infrastructure for MLOps comes in.</p>



<p>At the moment, cloud infrastructure exists side-by-side with on-premise systems in most cases.</p>



<p>Hybrid cloud management is complex, but often necessary. According to the 2020 Cloud infrastructure report by <a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow">Cloudcheckr</a>, today’s infrastructure is a mix of cloud and on-prem.&nbsp;</p>



<p>Cloud infrastructure is increasingly popular, but it’s still rare to find a large company that has completely abandoned on-premise infrastructure (most of them for obvious reasons, like sensitive data).&nbsp;</p>



<p>Another study by <a href="https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf" target="_blank" rel="noreferrer noopener nofollow">RightScale</a> shows that Hybrid cloud adoption grew to 58% in 2019 from 51% in 2018. It’s understandable because there’s a wide range of reasons for continuing to keep infrastructure on-prem.</p>



<h3 class="wp-block-heading" id="why-does-your-company-keep-maintaining-on-prem-infrastructure">Why does your company keep maintaining on-prem infrastructure?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Onprem-infrastructure.jpg?ssl=1" alt="" class="wp-image-34043" style="width:768px;height:385px"/><figcaption class="wp-element-caption"><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<h3 class="wp-block-heading" id="managing-hybrid-infrastructure-is-challenging">Managing hybrid infrastructure is challenging</h3>



<p>It’s not a walk in the park to manage any type of enterprise technology infrastructure. There are always issues related to security, performance, availability, cost, and much more.&nbsp;</p>



<p>Hybrid cloud environments add an additional layer of complexity that makes managing IT even more challenging.</p>



<p>The vast majority of cloud stakeholders (96%) face challenges managing both on-prem and cloud infrastructure.&nbsp;</p>



<h3 class="wp-block-heading" id="what-challenges-does-your-company-face-in-managing-both-on-prem-and-cloud-infrastructure">What challenges does your company face in managing both on-prem and cloud infrastructure?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Onprem-cloud-challenges.jpg?ssl=1" alt="" class="wp-image-34046" style="width:768px;height:416px"/><figcaption class="wp-element-caption"><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>&#8220;Other&#8221; issues reported included the need for a completely different skill set, lack of access to specialized compute and storage, having to shift existing employees&#8217; roles to dedicate them to managing the on-prem systems, and dealing with the ongoing reliability issues of those systems (e.g., timeouts, missing data or compute resources, and software, database, hardware, and network failures).</p>



<p><strong>Building your own platform</strong> and infrastructure will take more and more of your focus and attention as demand increases. The time that could be spent on <strong>model R&amp;D</strong> and <strong>data collection</strong> will be taken by <strong>infrastructure management</strong>. This isn’t great unless it’s part of your core business (if you’re a cloud service provider, PaaS or IaaS).</p>



<p><strong>Buying a fully managed platform</strong> gives you great flexibility and scalability, but then you’re faced with compliance, regulations, and security issues.</p>



<p><strong>Hybrid cloud infrastructure for MLOps</strong> is the best of both worlds, but it poses unique challenges, so it’s up to you to decide if it fits your business model.</p>



<p><strong><em>Note</em></strong><em>: I have a few ideas on possible future directions for securing, streaming, and allowing statistical studies on sensitive data, but that&#8217;s perhaps a topic for a future article.&nbsp;</em></p>



<h2 class="wp-block-heading" id="h-conclusion">Conclusion</h2>



<p>Now that you have identified which level your company is at, you can go with one of two MLOps solutions:</p>



<ul class="wp-block-list">
<li>End-to-end</li>



<li>Custom-built MLOps solution (the ecosystem of tools)</li>
</ul>



<h3 class="wp-block-heading" id="end-to-end-mlops-solution">End-to-end MLOps solution&nbsp;</h3>



<p>These are fully managed services that provide developers and data scientists with the ability to build, train, and deploy ML models quickly. The top commercial solutions are:</p>



<ul class="wp-block-list">
<li><a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noreferrer noopener nofollow"><strong>Amazon Sagemaker</strong></a>, a suite of tools to build, train, deploy, and monitor machine learning models</li>



<li><strong>Microsoft Azure MLOps suite:</strong>
<ul class="wp-block-list">
<li><a href="https://azure.microsoft.com/en-us/services/machine-learning/" target="_blank" rel="noreferrer noopener nofollow">Azure Machine Learning</a> to build, train, and validate reproducible ML pipelines</li>



<li><a href="https://azure.microsoft.com/en-us/services/devops/pipelines/" target="_blank" rel="noreferrer noopener nofollow">Azure Pipelines</a> to automate ML deployments</li>



<li><a href="https://docs.microsoft.com/en-us/azure/azure-monitor/overview" target="_blank" rel="noreferrer noopener nofollow">Azure Monitor</a> to track and analyze metrics</li>



<li><a href="https://azure.microsoft.com/en-us/services/kubernetes-service/" target="_blank" rel="noreferrer noopener nofollow">Azure Kubernetes Services</a> and other additional tools.</li>
</ul>
</li>



<li><strong>Google Cloud MLOps suite:</strong>
<ul class="wp-block-list">
<li><a href="https://cloud.google.com/dataflow" target="_blank" rel="noreferrer noopener nofollow">Dataflow</a> to extract, validate, and transform data as well as to evaluate models</li>



<li><a href="https://cloud.google.com/ai-platform-notebooks" target="_blank" rel="noreferrer noopener nofollow">AI Platform Notebook</a> to develop and train models</li>



<li>Cloud Build to build and test machine learning pipelines</li>



<li><a href="https://www.tensorflow.org/tfx" target="_blank" rel="noreferrer noopener nofollow">TFX</a> to deploy ML pipelines</li>



<li><a href="https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/" target="_blank" rel="noreferrer noopener nofollow">Kubeflow Pipelines</a> to arrange ML deployments on top of <a href="https://cloud.google.com/kubernetes-engine" target="_blank" rel="noreferrer noopener nofollow">Google Kubernetes Engine</a> (GKE).</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="custom-built-mlops-solution-the-ecosystem-of-tools">Custom-built MLOps solution (the ecosystem of tools)</h3>



<p>End-to-end solutions are great, but you can also build your own with your favorite tools, by dividing your MLOps pipeline into multiple microservices.</p>



<p>This approach can help you avoid a <a href="https://en.wikipedia.org/wiki/Single_point_of_failure" target="_blank" rel="noreferrer noopener nofollow">single point of failure</a> (SPOF) and makes your pipeline more robust: easier to audit, easier to debug, and more customizable. If a microservice provider is having problems, you can easily plug in a new one.&nbsp;</p>



<p>The most recent example of a SPOF was the <a href="https://www.theverge.com/2020/11/25/21719396/amazon-web-services-aws-outage-down-internet" target="_blank" rel="noreferrer noopener nofollow">AWS outage</a>. It&#8217;s very rare, but it can happen. Even Goliath can fall.</p>



<p>Microservices keep each service interconnected through well-defined interfaces instead of embedded together in one monolith. For example, you can use separate tools for model management and for experiment tracking.</p>
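<p>The pluggable-provider idea can be sketched in a few lines of Python. Note that this is an illustrative sketch only &#8211; the <code>Tracker</code> interface and the in-memory backend below are hypothetical, not any real library&#8217;s API. The point is that when training code depends only on a small interface, a failing or replaced provider (say, swapping one hosted tracker for another) doesn&#8217;t force changes to the pipeline itself.</p>

```python
# Illustrative sketch: "Tracker" and "InMemoryTracker" are made-up names,
# standing in for an adapter around any real experiment-tracking service.
from typing import Protocol


class Tracker(Protocol):
    """Minimal experiment-tracking interface shared by all backends."""

    def log_metric(self, name: str, value: float) -> None: ...


class InMemoryTracker:
    """Stand-in backend; a real adapter would call a hosted service instead."""

    def __init__(self) -> None:
        self.metrics: dict[str, float] = {}

    def log_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value


def train(tracker: Tracker) -> float:
    """Toy training step that depends only on the Tracker interface."""
    accuracy = 0.92  # pretend result of a training run
    tracker.log_metric("accuracy", accuracy)
    return accuracy


tracker = InMemoryTracker()
train(tracker)
print(tracker.metrics["accuracy"])  # which backend logged it is a detail
```

<p>Because <code>train</code> never imports a concrete backend, replacing the tracking provider means writing one new adapter class, not rewriting the pipeline &#8211; which is exactly the SPOF insurance discussed above.</p>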



<p>Finally, there are many MLOps tools available, I’m just going to mention my top 7 picks with one honorable mention:</p>



<ul class="wp-block-list">
<li><a href="https://jupyter.org/" target="_blank" rel="noreferrer noopener nofollow">Project Jupyter</a>&nbsp;</li>



<li><a href="https://github.com/fastai/nbdev" target="_blank" rel="noreferrer noopener nofollow">Nbdev</a></li>



<li><a href="https://airflow.apache.org/" target="_blank" rel="noreferrer noopener nofollow">Airflow</a></li>



<li><a href="https://www.kubeflow.org/" target="_blank" rel="noreferrer noopener nofollow">Kubeflow</a></li>



<li><a href="https://mlflow.org/" target="_blank" rel="noreferrer noopener nofollow">MLflow</a></li>



<li><a href="https://optuna.org/" target="_blank" rel="noreferrer noopener nofollow">Optuna</a></li>



<li><a href="https://www.cortex.dev/" target="_blank" rel="noreferrer noopener nofollow">Cortex</a>&nbsp;</li>



<li>Honorable mention: <a previewlistener="true" href="/" target="_blank" rel="noreferrer noopener">neptune.ai</a> (for its easy and scalable experiment tracking and compatibility with a lot of tools like Sagemaker and MLflow; if there isn’t an integration guide or pre-built solution, you can use their Python client API to build a custom integration)</li>
</ul>



<p>By leveraging these and many other tools, you can build an end-to-end solution by joining various micro-services together.&nbsp;</p>



<p>For more detailed information on the best MLOps tools available, see <a href="/blog/best-mlops-tools" target="_blank" rel="noreferrer noopener">Best MLOps Tools</a> by Jakub Czakon.</p>



<p>MLOps is a young, rapidly developing field, with new tools and processes coming out all the time. If you get on the MLOps train now, you gain a huge competitive advantage.</p>



<p>To help you do so, below is a ton of references for you to check out and devour. Have fun!</p>



<h3 class="wp-block-heading" id="acknowledgments">Acknowledgments</h3>



<p>Special thanks to my dear friend Richaldo Elias, whom I mentioned in the introduction. He always brings up topics or problems that inspire my creativity, and this article wouldn&#8217;t have been the same without him sharing some of the issues he has had while building ML projects at scale.&nbsp;</p>



<h2 class="wp-block-heading" id="h-references">References&nbsp;</h2>



<ul class="wp-block-list">
<li><a href="https://techjury.net/blog/how-much-data-is-created-every-day/#gref" target="_blank" rel="noreferrer noopener nofollow">https://techjury.net/blog/how-much-data-is-created-every-day/#gref</a></li>



<li><a href="https://www.mckinsey.com/~/media/McKinsey/Featured%20Insights/Artificial%20Intelligence/Notes%20from%20the%20frontier%20Modeling%20the%20impact%20of%20AI%20on%20the%20world%20economy/MGI-Notes-from-the-AI-frontier-Modeling-the-impact-of-AI-on-the-world-economy-September-2018.ashx" target="_blank" rel="noreferrer noopener nofollow">NOTES FROM THE AI FRONTIER MODELING THE IMPACT OF AI ON THE WORLD ECONOMY</a></li>



<li><a href="https://link.medium.com/GxMQJqdQvbb" target="_blank" rel="noreferrer noopener nofollow">https://link.medium.com/GxMQJqdQvbb</a></li>



<li><a href="https://github.com/fastai/fastbook/blob/master/01_intro.ipynb" target="_blank" rel="noreferrer noopener nofollow">https://github.com/fastai/fastbook/blob/master/01_intro.ipynb</a></li>
</ul>



<h3 class="wp-block-heading" id="reproducibility">Reproducibility&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/abs/2006.14244" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2006.14244</a></li>



<li><a href="https://arxiv.org/abs/1408.2123" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/1408.2123</a></li>



<li><a href="https://arxiv.org/abs/2001.10820" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2001.10820</a></li>



<li><a href="https://arxiv.org/abs/2003.12206" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2003.12206</a></li>
</ul>



<h3 class="wp-block-heading" id="mlops-methods-and-tools">MLOps &#8211; methods and tools&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="/blog/best-open-source-mlops-tools" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/best-open-source-mlops-tools</a></li>



<li><a href="https://www.datasciencecentral.com/profiles/blogs/mlops-vs-devops-the-similarities-and-differences" target="_blank" rel="noreferrer noopener nofollow">https://www.datasciencecentral.com/profiles/blogs/mlops-vs-devops-the-similarities-and-differences</a></li>



<li><a href="https://www.contino.io/insights/mlops-and-the-machine-learning-lifecycle" target="_blank" rel="noreferrer noopener nofollow">https://www.contino.io/insights/mlops-and-the-machine-learning-lifecycle</a></li>



<li><a href="https://nealanalytics.com/expertise/mlops/" target="_blank" rel="noreferrer noopener nofollow">https://nealanalytics.com/expertise/mlops/</a></li>



<li><a href="/blog/best-mlops-tools" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/best-mlops-tools</a></li>



<li><a href="https://www.altexsoft.com/blog/mlops-methods-tools/" target="_blank" rel="noreferrer noopener nofollow">https://www.altexsoft.com/blog/mlops-methods-tools/</a></li>



<li><a href="https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31</a> (Recommended for DIY die-hards)</li>



<li><a href="https://towardsdatascience.com/building-a-devops-pipeline-for-machine-learning-and-ai-evaluating-sagemaker-cf7fdd3632e7" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/building-a-devops-pipeline-for-machine-learning-and-ai-evaluating-sagemaker-cf7fdd3632e7</a></li>



<li><a href="https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#devops_versus_mlops" target="_blank" rel="noreferrer noopener nofollow">https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#devops_versus_mlops</a></li>
</ul>



<h3 class="wp-block-heading" id="mlops-best-practices">MLOps best practices</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/practices/" target="_blank" rel="noreferrer noopener nofollow">Software for ML</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml" target="_blank" rel="noreferrer noopener nofollow">Google’s Rules of ML</a>&nbsp;</li>



<li><strong>Governance Objectives:</strong>
<ul class="wp-block-list">
<li><a href="https://ai.google/responsibilities/responsible-ai-practices" target="_blank" rel="noreferrer noopener nofollow">Google Responsible AI</a></li>



<li><a href="https://www.microsoft.com/en-us/ai/responsible-ai" target="_blank" rel="noreferrer noopener nofollow">Microsoft AI principles</a></li>



<li><a href="https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai" target="_blank" rel="noreferrer noopener nofollow">European Commission High-Level Expert Group &#8211; Ethical guidelines for trustworthy AI</a></li>
</ul>
</li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml" target="_blank" rel="noreferrer noopener nofollow">https://developers.google.com/machine-learning/guides/rules-of-ml</a></li>
</ul>



<h3 class="wp-block-heading" id="build-vs-buy-vs-hybrid">Build vs Buy vs Hybrid</h3>



<ul class="wp-block-list">
<li><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow">https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf</a></li>



<li><a href="https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf" target="_blank" rel="noreferrer noopener nofollow">https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf</a></li>



<li><a href="https://www.canalys.com/newsroom/canalys-battle-for-enterprise-cloud-customers-intensifies-as-spending-grows-42-in-q1-2019" target="_blank" rel="noreferrer noopener nofollow">https://www.canalys.com/newsroom/canalys-battle-for-enterprise-cloud-customers-intensifies-as-spending-grows-42-in-q1-2019</a></li>



<li><a href="https://www.idc.com/getdoc.jsp?containerId=prUS45340719" target="_blank" rel="noreferrer noopener nofollow">https://www.idc.com/getdoc.jsp?containerId=prUS45340719</a></li>



<li><a href="https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019" target="_blank" rel="noreferrer noopener nofollow">https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019</a></li>



<li><a href="https://hostingtribunal.com/blog/cloud-computing-statistics/#gref" target="_blank" rel="noreferrer noopener nofollow">https://hostingtribunal.com/blog/cloud-computing-statistics/#gref</a></li>



<li><a href="https://www.zdnet.com/article/record-sums-were-spent-on-cloud-infrastructure-this-year-and-the-bills-will-only-get-bigger/" target="_blank" rel="noreferrer noopener nofollow">https://www.zdnet.com/article/record-sums-were-spent-on-cloud-infrastructure-this-year-and-the-bills-will-only-get-bigger/</a></li>



<li><a href="https://www.cnbc.com/2020/04/20/alibaba-to-invest-28-billion-in-cloud-as-it-battles-amazon-microsoft.html" target="_blank" rel="noreferrer noopener nofollow">https://www.cnbc.com/2020/04/20/alibaba-to-invest-28-billion-in-cloud-as-it-battles-amazon-microsoft.html</a></li>



<li><a href="https://www.idc.com/getdoc.jsp?containerId=prUS46188120" target="_blank" rel="noreferrer noopener nofollow">https://www.idc.com/getdoc.jsp?containerId=prUS46188120</a></li>



<li><a href="https://venturebeat.com/2020/05/01/canalys-cloud-spending-hit-record-31-billion-in-q1-2020-but-growth-continues-to-slow/" target="_blank" rel="noreferrer noopener nofollow">https://venturebeat.com/2020/05/01/canalys-cloud-spending-hit-record-31-billion-in-q1-2020-but-growth-continues-to-slow/</a></li>



<li><a href="https://www.crn.com/news/data-center/why-data-center-spending-will-hit-200b-in-2021-gartner" target="_blank" rel="noreferrer noopener nofollow">https://www.crn.com/news/data-center/why-data-center-spending-will-hit-200b-in-2021-gartner</a></li>



<li><a href="https://cnvrg.io/build-vs-buy-data-science-platform/" target="_blank" rel="noreferrer noopener nofollow">https://cnvrg.io/build-vs-buy-data-science-platform</a></li>
</ul>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">3441</post-id>	</item>
	</channel>
</rss>
