<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Prince Canuma, Author at neptune.ai</title>
	<atom:link href="https://neptune.ai/blog/author/prince-canuma/feed" rel="self" type="application/rss+xml" />
	<link></link>
	<description>The experiment tracker for foundation model training.</description>
	<lastBuildDate>Tue, 29 Apr 2025 12:26:27 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://i0.wp.com/neptune.ai/wp-content/uploads/2022/11/cropped-Signet-1.png?fit=32%2C32&#038;ssl=1</url>
	<title>Prince Canuma, Author at neptune.ai</title>
	<link></link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">211928962</site>	<item>
		<title>Machine Learning Model Management: What It Is, Why You Should Care, and How to Implement It</title>
		<link>https://neptune.ai/blog/machine-learning-model-management</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Wed, 24 Aug 2022 10:11:58 +0000</pubDate>
				<category><![CDATA[ML Tools]]></category>
		<category><![CDATA[MLOps]]></category>
		<guid isPermaLink="false">https://neptune.test/machine-learning-model-management/</guid>

					<description><![CDATA[Machine learning is on the rise. With that, new issues keep popping up, and ML developers along with tech companies keep building new tools to take care of these issues.&#160; If we look at ML in a very basic way, we can say that ML is conceptually software with a bit of added intelligence but&#8230;]]></description>
										<content:encoded><![CDATA[
<p>Machine learning is on the rise. With that, new issues keep popping up, and ML developers, along with tech companies, keep building new tools to address them.</p>



<p>If we look at ML in a very basic way, we can say that ML is conceptually software with a bit of added <strong>intelligence</strong>, but unlike traditional software, ML is experimental in nature. Compared to traditional software development, it has some new components in the mix: robust data, model architecture, model code, hyperparameters, and features, to name a few. So, naturally, the tools and development cycles are different, too. Software has DevOps; machine learning has MLOps.</p>



<p>If it sounds unfamiliar, here’s a short overview of <a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">DevOps and MLOps</a>:</p>



<p><strong>DevOps</strong> is a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.</p>



<p><strong>MLOps</strong> is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases end-quality, simplifies the management process, and automates the deployment of machine learning and deep learning models in large-scale production environments. It makes it easier to align models with business needs and regulatory requirements.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img data-recalc-dims="1" fetchpriority="high" decoding="async" width="1200" height="628" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=1200%2C628&#038;ssl=1" alt="MLOps cycle" class="wp-image-40345" srcset="https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?w=1200&amp;ssl=1 1200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=768%2C402&amp;ssl=1 768w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=200%2C105&amp;ssl=1 200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=220%2C115&amp;ssl=1 220w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=120%2C63&amp;ssl=1 120w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=160%2C84&amp;ssl=1 160w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=300%2C157&amp;ssl=1 300w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=480%2C251&amp;ssl=1 480w, https://i0.wp.com/neptune.ai/wp-content/uploads/2024/08/MLOps_cycle.png?resize=1020%2C534&amp;ssl=1 1020w" sizes="(max-width: 1000px) 100vw, 1000px" /><figcaption class="wp-element-caption">MLOps cycle</figcaption></figure>
</div>


<p>The key phases of MLOps are:</p>



<ul class="wp-block-list">
<li>Data gathering</li>



<li>Data analysis</li>



<li>Data transformation/preparation</li>



<li>Model development</li>



<li>Model training</li>



<li>Model validation&nbsp;</li>



<li>Model serving&nbsp;</li>



<li>Model monitoring&nbsp;</li>



<li>Model re-training</li>
</ul>
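<p>The phases above can be sketched as a linear pipeline. The following is a minimal, illustrative sketch: every function name and the toy dataset are made up, and a real pipeline would run each phase as a separate, monitored job.</p>

```python
# Minimal sketch of the MLOps phases as a linear pipeline.
# All names and the toy dataset are illustrative, not a real framework API.

def gather_data():
    # Data gathering: in practice, pulled from databases, APIs, logs, etc.
    return [(x, 2 * x + 1) for x in range(100)]

def prepare(data):
    # Data analysis + transformation/preparation: train/validation split.
    split = int(0.8 * len(data))
    return data[:split], data[split:]

def train(train_set):
    # Model development + training: fit (slope, intercept) from the data,
    # standing in for a real training loop.
    pairs = list(zip(train_set, train_set[1:]))
    slope = sum(y2 - y1 for (_, y1), (_, y2) in pairs) / len(pairs)
    intercept = train_set[0][1]  # y at x = 0 in this toy dataset
    return slope, intercept

def validate(model, val_set):
    # Model validation: mean absolute error on held-out data.
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in val_set) / len(val_set)

train_set, val_set = prepare(gather_data())
model = train(train_set)
mae = validate(model, val_set)
# Serving, monitoring, and re-training would follow in production.
```

<p>Here the held-out error comes out to zero because the toy data is exactly linear; on real data, the validation and monitoring phases are where problems surface.</p>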



<p>We’re going to do a deep dive into this process, so grab a cup of your favorite drink and let’s go!</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-what-is-machine-learning-model-management">What is Machine Learning Model Management?</h2>



<p>Model management is a part of MLOps. ML models should be consistent and meet all business requirements at scale. To make this happen, a logical, easy-to-follow policy for model management is essential. ML model management is responsible for the <strong>development, training, versioning,</strong> and <strong>deployment</strong> of ML models.</p>



<p><strong><em>Note</em></strong><em>: Versioning also includes </em><strong><em>data</em></strong><em>, so we can track which dataset, or subset of the dataset, we used to train a particular version of the model.</em></p>
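<p>To make that note concrete: a lightweight way to tie a model version to the exact data it was trained on is to store a content hash of the dataset alongside the model. This is a hedged sketch, not the API of any particular data-versioning tool, and the version tag is invented:</p>

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """Deterministic hash of a dataset (or subset) used for training.

    Storing this alongside the model version lets you answer
    "which data trained this model version?" later.
    """
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Record the model version together with its data fingerprint.
training_rows = [[0, 1], [1, 3], [2, 5]]
model_record = {
    "model_version": "v1.3",  # illustrative version tag
    "data_hash": dataset_fingerprint(training_rows),
}
```

<p>Any change to the dataset produces a different fingerprint, so a model record can always be traced back to the exact data that produced it.</p>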



<p>When researchers work on novel ML models, or apply them to a new domain, they run countless experiments (model training &amp; testing) with different model architectures, optimizers, loss functions, parameters, hyperparameters, and data. They use these experiments to arrive at the configuration that generalizes best, or that offers the best performance-to-accuracy compromise on the dataset.</p>



<p>But without a way to track model performance and configurations across experiments, all hell can (and will) break loose, because we won’t be able to compare runs and choose the best solution. Even for a single researcher experimenting independently, keeping track of all experiments and results is hard.</p>



<p>That’s why we do model management. It lets us, our teams and our organizations:</p>



<ul class="wp-block-list">
<li>Proactively address common business concerns (such as regulatory compliance);</li>



<li>Enable reproducible experiments by tracking metrics, losses, code, data and model versioning;</li>



<li>Package and deliver models in repeatable configurations to support reusability.</li>
</ul>
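<p>As a toy illustration of the reproducibility point above: if the RNG seed is logged together with the configuration, the same (seed, config) pair regenerates identical results. All names here are hypothetical:</p>

```python
import random

def reproducible_run(seed, config):
    """Sketch of a reproducible experiment: fix the RNG seed and log it
    with the config, so the run can be regenerated exactly."""
    random.seed(seed)
    score = round(random.random(), 6)  # stands in for a training result
    return {"seed": seed, "config": config, "score": score}

run_a = reproducible_run(42, {"lr": 0.01})
run_b = reproducible_run(42, {"lr": 0.01})
assert run_a == run_b  # same seed + config -> identical record
```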



<h2 class="wp-block-heading" class="wp-block-heading" id="h-why-does-machine-learning-model-management-matter">Why does Machine Learning Model Management matter?</h2>



<p>As I mentioned previously, Model Management is a fundamental part of any ML pipeline (MLOps). It makes it easier to manage the ML life-cycle, from creation, configuration, and experimentation, through tracking the different experiments, all the way to model deployment.</p>



<p>Now, let’s go a little deeper by making a clear distinction between the different parts of ML Model Management. It is important to note that within ML Model Management we manage two things:</p>



<ul class="wp-block-list">
<li><strong>Models</strong>: Here we take care of model packaging, model lineage, model deployment &amp; deployment strategies (A/B testing, etc.), monitoring, and model retraining (which happens when the deployed model’s performance drops below a set threshold).</li>



<li><strong>Experiments:</strong> Here we take care of logging training metrics, loss, images, text, or any other metadata you might have, as well as code, data &amp; pipeline versioning.</li>
</ul>



<p>Without model management, data science teams would have a very hard time creating, tracking, comparing, recreating, and deploying models.&nbsp;</p>



<p>The alternative to model management is a set of ad-hoc practices, which lead researchers to create ML projects that are not repeatable, sustainable, scalable, or organized.</p>



<p>Besides that, research conducted by Amy X. Zhang (MIT) et al. on <a href="https://arxiv.org/abs/2001.06684v2" target="_blank" rel="noreferrer noopener nofollow">how DS workers collaborate</a> shows that data science workers collaborate extensively when leveraging ML to extract insights from data, rather than working alone. To collaborate effectively, they employ collaborative best practices (i.e., documentation, code versioning, and so on) and tools, with the former highly dependent on the latter.</p>



<p>MLOps facilitates collaboration, but most of today’s understanding of data science collaboration focuses only on the data scientist’s perspective and on tools, such as version control of code, that support globally dispersed and asynchronous collaboration among data scientists. The technical collaborations afforded by such tools only scratch the surface of the many ways collaboration can happen within a data science team, such as:</p>



<ul class="wp-block-list">
<li>When stakeholders discuss the framing of an initial problem before any code is written or data collected</li>



<li>Commenting on experiments</li>



<li>Taking over someone else’s notebook or code as a baseline to build upon</li>



<li>Researchers and data scientists train, evaluate, and tag models so that an ML engineer knows a model should be reviewed (i.e., A/B tested) and promoted to production (model deployment)</li>



<li>Having a shared repository where business stakeholders can review production models.</li>
</ul>






<p><strong>What is the extent of collaboration on data science teams?</strong></p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Data-science-team.png?ssl=1" alt="" class="wp-image-41760"/><figcaption class="wp-element-caption"><em><a href="https://arxiv.org/pdf/2001.06684v2.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p><strong>Rates of Collaboration</strong>: Among the five data science roles in the figure above, three reported collaboration at rates of 95% or higher. As you can see, these are the core roles in an ML team.</p>



<p>The study also shows that researchers, data scientists, and ML engineers collaborate extensively and play a key role throughout the <strong>development, training, evaluation (i.e., accuracy, performance, bias), versioning,</strong> and <strong>deployment</strong> of ML models (ML Model Management).</p>



<p>Not convinced yet? Here are six more reasons why model management matters:</p>



<ul class="wp-block-list">
<li>Allows for a single source of truth;</li>



<li>Allows for versioning of the code, data, and model artifacts for benchmarking and reproducibility;</li>



<li>It’s easier to debug/mitigate problems ( i.e. overfitting, underfitting, performance and/or bias) &#8212; thus making the ML solution easily traceable and compliant with the regulations;</li>



<li>You can do faster, better research and development;</li>



<li>Teams become efficient and have a clear sense of direction.</li>



<li>ML Model management can facilitate intra team and/or inter team collaboration around code, data and documentation through the use of various best practices and tools (JupyterLab, Colab, neptune.ai, MLflow,  etc);</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-ml-model-management-components">ML Model Management components</h2>



<p>Before we continue, here is a glossary of the common components of an ML Model Management workflow:</p>



<ul class="wp-block-list">
<li><a href="/blog/best-7-data-version-control-tools-that-improve-your-workflow-with-machine-learning-projects" target="_blank" rel="noreferrer noopener"><strong>Data Versioning</strong></a>: Version control systems help developers manage changes to source code. Data version control adapts that process to the data world: a set of tools and processes for managing changes to models in relation to datasets, and vice versa.</li>



<li><strong>Code Versioning/Notebook checkpointing</strong>: It is used to manage changes to the model’s source code.</li>



<li><a href="/blog/best-ml-experiment-tracking-tools"><strong>Experiment Tracker</strong></a>: It is used for collecting, organizing, and tracking model training/validation information/performance across multiple runs with different configurations (lr, epochs, optimizers, loss, batch size and so on) and datasets (train/val splits and transforms).</li>



<li><a href="/blog/ml-model-registry-best-tools"><strong>Model Registry</strong></a><strong>:</strong> A centralized tracking system for trained, staged, and deployed ML models.</li>



<li><a href="/blog/ml-model-monitoring-best-tools"><strong>Model Monitoring</strong></a><strong>:</strong> It is used to track the model’s inference performance and identify signs of serving skew, which occurs when data changes cause the deployed model’s performance to degrade below the score/accuracy it displayed in the training environment.</li>
</ul>
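<p>As a toy illustration of the last component, model monitoring can be as simple as comparing a rolling production accuracy against the score the model achieved in training and flagging the gap. The class, threshold, and window size below are assumptions made for this sketch, not a reference implementation:</p>

```python
from collections import deque

class SkewMonitor:
    """Flags serving skew: production accuracy drifting below the
    accuracy the model showed in the training environment."""

    def __init__(self, training_accuracy, tolerance=0.05, window=100):
        self.training_accuracy = training_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)

    def skewed(self):
        if not self.outcomes:
            return False
        prod_accuracy = sum(self.outcomes) / len(self.outcomes)
        return prod_accuracy < self.training_accuracy - self.tolerance

monitor = SkewMonitor(training_accuracy=0.92)
for correct in [True] * 90 + [False] * 10:  # 90% production accuracy
    monitor.record(correct)
# 0.90 is still within tolerance of 0.92; more errors would trip the alarm.
```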



<p>Now that we know the different components of model management&nbsp;and what they do, let&#8217;s look into some of the best practices.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-best-practices-for-machine-learning-model-management">Best practices for Machine Learning Model Management</h2>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Model-management-best-practices.png?ssl=1" alt="Model management best practices" class="wp-image-41761" style="width:563px;height:450px"/><figcaption class="wp-element-caption"><em><a href="https://www.sqlsoldier.com/wp/wp-content/uploads/2012/02/BestPractices.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>The following is a list of <strong>ML model management best practices</strong>:</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-model">Model&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_4_keep_the_first_model_simple_and_get_the_infrastructure_right" target="_blank" rel="noreferrer noopener nofollow">Keep the first model simple and get the infrastructure right</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_14_starting_with_an_interpretable_model_makes_debugging_easier" target="_blank" rel="noreferrer noopener nofollow">Starting with an interpretable model makes debugging easier</a></li>



<li><strong>Training</strong>
<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/02-train_metric/" target="_blank" rel="noreferrer noopener nofollow">Capture the Training Objective in a Metric that is Easy to Measure and Understand</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-archive_old_feature/" target="_blank" rel="noreferrer noopener nofollow">Actively Remove or Archive Features That are Not Used</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-peer_review_mdl/" target="_blank" rel="noreferrer noopener nofollow">Peer Review Training Scripts</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-parallel_training/" target="_blank" rel="noreferrer noopener nofollow">Enable Parallel Training Experiments</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-auto_hyperparams/" target="_blank" rel="noreferrer noopener nofollow">Automate Hyper-Parameter Optimisation</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-measure_mdl_quality/" target="_blank" rel="noreferrer noopener nofollow">Continuously Measure Model Quality and Performance</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-data_version/" target="_blank" rel="noreferrer noopener nofollow">Use Versioning for Data, Model, Configurations and Training Scripts</a></li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-code">Code</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/blog/2020/regr_test/" target="_blank" rel="noreferrer noopener nofollow">Run Automated Regression Tests</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-use_static_analysis/" target="_blank" rel="noreferrer noopener nofollow">Use Static Analysis to Check Code Quality</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-cont-int/" target="_blank" rel="noreferrer noopener nofollow">Use Continuous Integration</a></li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-deployment">Deployment</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_16_plan_to_launch_and_iterate" target="_blank" rel="noreferrer noopener nofollow">Plan to launch and iterate</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-auto_model_packaging/" target="_blank" rel="noreferrer noopener nofollow">Automate Model Deployment</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-monitor_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Continuously Monitor the Behaviour of Deployed Models</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-rollback_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Automatic Rollbacks for Production Models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_41_when_performance_plateaus_look_for_qualitatively_new_sources_of_information_to_add_rather_than_refining_existing_signals" target="_blank" rel="noreferrer noopener nofollow">When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-shadow_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Shadow Deployment</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_40_keep_ensembles_simple" target="_blank" rel="noreferrer noopener nofollow">Keep ensembles simple</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-log_production/" target="_blank" rel="noreferrer noopener nofollow">Log Production Predictions with the Model&#8217;s Version, Code Version and Input Data</a></li>



<li><strong>Human Analysis of the System &amp; Training-Serving Skew</strong>
<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_23_you_are_not_a_typical_end_user" target="_blank" rel="noreferrer noopener nofollow">You are not a typical end user</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_24_measure_the_delta_between_models" target="_blank" rel="noreferrer noopener nofollow">Measure the delta between models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_25_when_choosing_models_utilitarian_performance_trumps_predictive_power" target="_blank" rel="noreferrer noopener nofollow">When choosing models, utilitarian performance trumps predictive power.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_37_measure_trainingserving_skew" target="_blank" rel="noreferrer noopener nofollow">Perform evolving data profiles checks</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_33_if_you_produce_a_model_based_on_the_data_until_january_5th_test_the_model_on_the_data_from_january_6th_and_after" target="_blank" rel="noreferrer noopener nofollow">If you produce a model based on the data until January 5th, test the model on the data from January 6th and after</a></li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-ml-model-management-vs-experiment-tracking">ML Model Management vs Experiment Tracking</h2>



<p><a href="/experiment-tracking" target="_blank" rel="noreferrer noopener">Experiment tracking</a> is a part of model management, so it’s also a part of the larger MLOps approach. Experiment tracking is about collecting, organizing, and tracking model training/validation information across multiple runs with different configurations (hyperparameters, model size, data splits, parameters, etc).&nbsp;</p>



<p>As mentioned earlier, ML/DL is experimental in nature, and we use experiment tracking tools for benchmarking different models.</p>



<p>Experiment tracking tools have 3 main features:</p>



<ul class="wp-block-list">
<li><strong>Logging</strong>: log experiment metadata (metrics, loss, configurations, images and so on);</li>



<li><strong>Version Control: </strong>track both data and model versions, which is very useful in a production environment and can help with debugging and future improvements;</li>



<li><strong>Dashboard</strong>: visualize all logged and versioned data, use visual components (graphs) to compare performance and rank different experiments.</li>
</ul>
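<p>The three features map onto a tiny in-memory sketch. Real trackers (neptune.ai, MLflow, and others) do far more; every name below is invented for illustration:</p>

```python
class MiniTracker:
    """Toy experiment tracker covering the three core features:
    logging, version-control hooks, and a (text-only) dashboard."""

    def __init__(self):
        self.runs = {}

    def log(self, run_id, **metadata):
        # Logging: metrics, loss, config, data/model versions, etc.
        self.runs.setdefault(run_id, {}).update(metadata)

    def compare(self, metric):
        # "Dashboard": rank runs by a logged metric, best first.
        scored = [(r, m[metric]) for r, m in self.runs.items() if metric in m]
        return sorted(scored, key=lambda item: item[1], reverse=True)

tracker = MiniTracker()
tracker.log("run-1", lr=0.01, data_version="sha256:ab12", val_acc=0.88)
tracker.log("run-2", lr=0.001, data_version="sha256:ab12", val_acc=0.91)
best_run, best_acc = tracker.compare("val_acc")[0]
```

<p>Because the data version is logged with each run, ranking the runs also tells you exactly which dataset and configuration produced the winner.</p>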





<h2 class="wp-block-heading" class="wp-block-heading" id="h-how-to-implement-ml-model-management">How to implement ML Model Management</h2>



<p>Before we move on, let me tell you a short story.</p>



<p>Last year I had a lot of problems with some of my customers because I didn’t track my experiments:</p>



<ul class="wp-block-list">
<li>I couldn’t compare different experiments effectively, and since I did everything from memory, projects got delayed.</li>



<li>I relied heavily on ensembling to try to patch the flaws of the individual models which only partially worked but mainly led nowhere.</li>



<li>Not logging the results of experiments also created problems long-term: I couldn’t recall the performance of previous versions of the model.</li>



<li>Deploying the right model was tricky because it was never clear which one was best, or which code, transformations, and data were used.</li>



<li>Reproducibility was impossible.</li>



<li>CI/CD and CT (continuous training) were impossible to implement with such artisanal model management.</li>
</ul>



<p>I did some research, found out about ML model management, and decided to try an actual experiment tracking tool to speed up my process. Now, I don’t even start a project without my favorite experiment tracking tool, <a previewlistener="true" href="/" target="_blank" rel="noreferrer noopener"><strong>neptune.ai</strong></a>.&nbsp;</p>



<p>I keep using it both in production and research, such as custom ML model projects I develop for my customers, and in my final year CSE degree project.</p>



<p>There are many other tools out there, some of which are full-blown platforms for managing the whole ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. We’ll discuss these tools in a bit.</p>



<p>So, after using an experiment tracking tool coupled with a model lifecycle platform (in my case, MLflow) on projects of different scales and needs, I found 4 ways of implementing ML model management:</p>



<ul class="wp-block-list">
<li><strong>Level-0</strong><strong>&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging</li>
</ul>
</li>



<li><strong>Level-1</strong>
<ul class="wp-block-list">
<li>Logging + Model and Data version control</li>
</ul>
</li>



<li><strong>Level-2&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging + Code, Model and Data version control</li>
</ul>
</li>



<li><strong>Level-3&nbsp;</strong>
<ul class="wp-block-list">
<li>Logging + Code, Model and Data version control +&nbsp;Model deployment and monitoring</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-0">Level-0</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>ad-hoc research model management</em>. At this level, you’re just using an experiment tracking tool for <strong>logging</strong>. Great for beginners starting out with ML, or advanced researchers doing rapid prototyping to prove if a hypothesis is worth pursuing.</p>



<p>This level allows individuals, teams, and organizations to record and query their experiments:</p>



<ul class="wp-block-list">
<li>Metrics (accuracy, IoU, Bleu score and so on)</li>



<li>Loss (MSE, BCE, CE and so on)</li>



<li>Config (parameters, hyperparameters)</li>



<li>Model performance results from training and testing</li>
</ul>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Ad-hoc data science</li>



<li>Research- and rapid prototype-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No data versioning</li>



<li>No model versioning</li>



<li>No notebook checkpointing&nbsp;</li>



<li>No CI/CD Pipeline</li>



<li>Lack of Reproducibility</li>
</ul>



<h4 class="wp-block-heading">Challenges</h4>



<p>We data scientists usually enjoy running multiple experiments to test different ideas, code and model configurations, and datasets. At this level, this is quite challenging.</p>



<ul class="wp-block-list">
<li>First, you don’t follow any DS Project Management methodology that will give you a clear direction. Therefore, without <a href="https://www.kdnuggets.com/2019/02/data-science-agile-cycles-method-managing-projects-hi-tech-industry.html" target="_blank" rel="noreferrer noopener nofollow">standardised methodologies for managing data science projects,</a> you will often rely on ad hoc practices that are not repeatable, not sustainable, and unorganized.</li>



<li>Second, datasets are constantly being updated, so even though you log the metrics, loss and configuration, you don’t know which version of the dataset was used to train a specific model.</li>



<li>Third, the code might also change with each experiment run, so despite saving all the model configuration, you might not know which code was used in which experiment.</li>



<li>Fourth, even if you save the model’s weights, you might not know which model was trained using a specific configuration and dataset.</li>
</ul>



<p>All of these challenges make it impossible to reproduce the results of any particular experiment. In order to address the challenges of this level, a good start is to add versioning to our models and data &#8211; some experiment tools do this out-of-the-box. This way we can make partial reproducibility possible.</p>
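<p>To make the versioning idea concrete, here is a minimal sketch (using only the Python standard library) of logging a content hash of the dataset next to each run&#8217;s metrics. The <code>fingerprint</code> helper and the record fields are hypothetical, not part of any particular experiment tracking tool:</p>

```python
import hashlib
import json

def fingerprint(obj) -> str:
    """Return a short, deterministic content hash for a dataset or config.

    Serializing to JSON with sorted keys first means two logically
    identical objects always hash to the same value.
    """
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

# Log the data version alongside the usual metrics, so every run
# records exactly which dataset snapshot it was trained on.
train_split = [{"x": 0.1, "y": 0}, {"x": 0.9, "y": 1}]
run_record = {
    "metrics": {"accuracy": 0.91},
    "config": {"lr": 0.01, "epochs": 10},
    "data_version": fingerprint(train_split),
}
```

<p>Two runs trained on byte-identical data now share the same <code>data_version</code>, so a changed hash immediately tells you that the dataset, not the code, is what differs between experiments.</p>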



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-1">Level-1</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>partial model management</em>. Generally used for well-structured teams doing rapid prototyping. At this level, besides experiment tracking, you’re also storing the model and its metadata (configuration), as well as the dataset or data split used to train it, in a central repository that will be used as a single source of truth.&nbsp;</p>
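<p>As an illustration of such a single source of truth, the sketch below writes a per-run manifest, tying together the model artifact, the training configuration, and the dataset version, into a shared directory. The function name, fields, and layout are assumptions for this example, not a real tool&#8217;s API:</p>

```python
import json
import time
from pathlib import Path

def register_run(repo: Path, run_id: str, model_path: str,
                 config: dict, data_version: str) -> Path:
    """Write a single run's manifest into a central repository directory.

    The manifest ties the model artifact, the training configuration,
    and the dataset version together, so any past run can be looked up
    from one place.
    """
    manifest = {
        "run_id": run_id,
        "model_path": model_path,
        "config": config,
        "data_version": data_version,
        "created_at": time.time(),
    }
    repo.mkdir(parents=True, exist_ok=True)
    out = repo / f"{run_id}.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out
```

<p>With one manifest per run in a shared location, anyone on the team can answer &#8220;which config and data produced this model?&#8221; without asking the person who trained it.</p>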



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Experiments are partially reproducible</li>



<li>Ad-hoc data science</li>



<li>Research and rapid prototype-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CI/CD pipeline</li>



<li>Lack of reproducibility</li>



<li>No notebook checkpointing</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>



<p>This level is good for testing ideas quickly without fully committing to any of them. It might work great in a research setting, where the goal is just to try out interesting ideas and compare experiments across different individuals, teams, or companies. We’re not yet thinking about shipping them to production.</p>



<p>Although we can reproduce the experiment from the model metadata and dataset used to train it, at this level we still haven’t fully solved reproducibility. We just partially solved it. In order to go full circle, we need one more component &#8211; notebook checkpointing, so that we can track code changes.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-2">Level-2</h3>



<h4 class="wp-block-heading">Characteristics</h4>



<p>I call this <em>semi-complete model management</em>. It’s great for individuals, teams and companies who want to not only quickly test their hypothesis, but also deploy their models to a production environment.</p>



<p>This level allows individuals and organizations to keep a full history of experiments by storing and versioning their notebooks/code, data and model, besides just logging metadata. This takes us full circle, making reproducibility a reality and easy to achieve regardless of the ML/DL frameworks or toolset used. At this level, you usually also apply standardised methodologies for managing data science projects.</p>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Has notebook checkpointing</li>



<li>Experiments are fully reproducible</li>



<li>Coupled with a DS project management approach</li>



<li>Production-driven</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CI/CD pipeline</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>



<p>You have automated everything at this level, except one thing: model deployment. This creates stress. Every time you have a new trained model ready for deployment, you have to manually deploy it. In order to complete the ML model management pipeline, you need to integrate CI/CD.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-level-3">Level-3</h3>



<h4 class="wp-block-heading">Characteristics&nbsp;</h4>



<p>I call this <em>end-to-end model management</em>. At this level, you have a completely automated pipeline, from model development, versioning, to deployment. This level offers a production-grade setup, and is great for individuals, teams and organizations looking for a complete, automated workflow. Once you set it up, you don’t have to do ops work anymore. You can focus on tweaking and improving the model and data sources.&nbsp;&nbsp;</p>



<h4 class="wp-block-heading">Pros</h4>



<ul class="wp-block-list">
<li>Has data versioning</li>



<li>Has model versioning</li>



<li>Has notebook checkpointing</li>



<li>Experiments are fully reproducible</li>



<li>Coupled with a DS project management approach</li>



<li>Production-driven</li>



<li>CI/CD pipeline</li>
</ul>



<h4 class="wp-block-heading">Cons</h4>



<ul class="wp-block-list">
<li>No CT pipeline</li>
</ul>



<h4 class="wp-block-heading">Challenges&nbsp;</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/ML-lifecycle-model-management.png?ssl=1" alt="ML lifecycle model management" class="wp-image-41775"/><figcaption class="wp-element-caption"><em><a href="https://towardsdatascience.com/model-management-in-productive-ml-software-110d2d2cb456" target="_blank" rel="noreferrer noopener nofollow">ML lifecycle</a></em></figcaption></figure>
</div>


<p>There is only one thing missing at this point &#8211; a way to continuously monitor deployed models. Also known as a CT (continuous training) pipeline, it’s used to monitor a deployed model and automatically retrain and serve a new one if the currently deployed model’s performance drops below a set threshold. Let’s take a computer vision model, like ResNet, in a production environment. In order to add CT, it would be as simple as monitoring and logging the following:</p>



<ul class="wp-block-list">
<li>Data sent to the server (image, video, mp3, text, and so on)&nbsp;</li>



<li>The model’s prediction</li>



<li>Confidence score</li>



<li><a href="https://heartbeat.fritz.ai/class-activation-maps-visualizing-neural-network-decision-making-92efa5af9a33" target="_blank" rel="noreferrer noopener nofollow">Class activation maps (CAM)</a>, or the improved <a href="https://www.pyimagesearch.com/2020/03/09/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning/" target="_blank" rel="noreferrer noopener nofollow">Grad-CAM</a> for better explainability as to why it predicted a certain label and where it focused</li>
</ul>
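<p>Assuming confidence scores are already being logged at inference time, the retraining trigger itself can be sketched in a few lines. The class name, threshold, and window size below are illustrative choices, not part of any monitoring product:</p>

```python
from collections import deque

class DriftMonitor:
    """Flag when a deployed model should be retrained, based on the
    confidence scores logged at inference time."""

    def __init__(self, threshold: float = 0.7, window: int = 100):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def log(self, confidence: float) -> None:
        self.scores.append(confidence)

    def should_retrain(self) -> bool:
        # Wait for a full window to avoid noisy alarms on a few requests.
        if len(self.scores) < self.scores.maxlen:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold
```

<p>In a real pipeline, a <code>True</code> result would kick off the automated training job and promote the newly trained model once it passes validation.</p>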



<p>To add this functionality to the mix, you can re-use the same code from Level-0 or Level-1 for logging metadata during training, and use it for inference.&nbsp;</p>



<p>Tools like Neptune and MLflow let you install their software locally, so you can add this capability to your deployment server. Neptune is more robust here and offers a second option: a lightweight web version of their software for both individuals and teams, so there’s no need to install or configure anything. Just create a new project on their dashboard, add a few lines to your deployment code, and it’s done.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-building-vs-using-existing-ml-model-registry-tools">Building vs using existing ML Model Registry tools&nbsp;</h2>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Databrick-model-registry.png?ssl=1" alt="Databrick model registry" class="wp-image-41776"/><figcaption class="wp-element-caption"><a href="https://databricks.com/wp-content/uploads/2019/10/model-registry-dash.png" target="_blank" rel="noreferrer noopener nofollow">Source</a></figcaption></figure>
</div>


<h3 class="wp-block-heading" class="wp-block-heading" id="h-what-is-machine-learning-ml-model-registry">What is Machine Learning (ML) model registry?</h3>



<p>An ML model registry is simply a centralized tracking system for trained, staged and deployed ML models. It also tracks who created the model, as well as the data used to train it. It does this by using a database to store model lineage, versioning, metadata and configuration.</p>



<p>It’s relatively easy to build your own simple model registry. You can do it by combining a few native or cloud services, such as an AWS S3 bucket and an RDBMS (PostgreSQL, Mongo&#8230;), and writing a simple Python API that makes it easy to update the database records whenever something changes.</p>
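<p>To show just how &#8220;relatively easy&#8221; the simple version is, here is a toy registry backed by SQLite (standing in for the RDBMS), with artifact URIs that would point at object storage such as S3. The schema and function names are made up for illustration and gloss over authentication, concurrency, and stage transitions:</p>

```python
import sqlite3

def create_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Create the registry database (SQLite stands in for the RDBMS)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS models (
               name TEXT,
               version INTEGER,
               stage TEXT,             -- e.g. staging / production
               artifact_uri TEXT,      -- e.g. an s3:// path
               data_version TEXT,
               PRIMARY KEY (name, version)
           )"""
    )
    return conn

def register(conn: sqlite3.Connection, name: str,
             artifact_uri: str, data_version: str) -> int:
    """Insert a new model version, auto-incremented per model name."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM models WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO models VALUES (?, ?, ?, ?, ?)",
        (name, version, "staging", artifact_uri, data_version),
    )
    conn.commit()
    return version
```

<p>Everything beyond this toy &#8211; access control, a UI, stage promotion, scalability &#8211; is exactly the part that gets expensive, which is the point of the build-vs-buy discussion below.</p>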



<p>But although a model registry is relatively easy to build, does that mean you should build one? Is it really worth your time, money, and resources?</p>



<p>To answer these questions, let’s first look at the reasons why you might want to build your own ML model registry:</p>



<ul class="wp-block-list">
<li><strong>Privacy:</strong> Your data can’t leave your premises.</li>



<li><strong>Curiosity:</strong> Like me, you enjoy building things.</li>



<li><strong>Business:</strong> You run or work for a company that builds ML tools, and you want to add it to an existing product, or as a new service for customers.</li>



<li><strong>Cost:</strong> Existing tools are too expensive for your budget.</li>



<li><strong>Performance: </strong>Existing tools don’t meet your performance requirements.</li>
</ul>



<p>All valid reasons, except maybe cost, because most existing tools are open-source or freemium.&nbsp;</p>



<p>If your concern is performance, some tools deliver it through dedicated cloud server instances, with very little setup on your part.</p>



<p>Now, if your concern is privacy, most tools also offer an on-prem version of their software, which you can download and install on your organisation&#8217;s servers to get full control over the data coming in and out. This way you can comply with laws and regulations, and keep your data safe.</p>



<p>In my honest opinion, there is a common misconception when it comes to build vs. buy &#8211; something more mature teams and developers usually understand right off the bat, but which the ML community at large still doesn&#8217;t really get.&nbsp;</p>



<p>The cost of hosting, maintaining, documenting, fixing, updating and adjusting the open-source software is usually orders of magnitude larger than the cost of vendor tools.&nbsp;</p>



<p>The thing is, it is usually relatively easy to build a simple, non-scalable, undocumented system for yourself&#8230;&nbsp;</p>



<p>&#8230;but going from this to a system your entire team can work on very quickly becomes awfully expensive.&nbsp;</p>



<p>Also, when you decide to build it yourself (not even open-source it), someone will end up needing to build and maintain it, and the salaries of ML engineers and DevOps folks are not cheap.&nbsp;</p>



<p>Generally, there is a good rule of thumb: if the system (like an ML model registry) is not your core business, and it usually isn&#8217;t, then you should focus on your core business (for example, building models for autonomous cars) and hire or buy a solution for the parts you don&#8217;t build your competitive advantage on.&nbsp;</p>



<p>Think of it this way: would you go and build Gmail just because you can?</p>



<p>Or a mail-sending system like Mailchimp?</p>



<p>Or a CMS like WordPress?</p>



<p>Some companies do, even though it is not their core business. It is usually a big mistake, as you end up building shovels rather than digging for gold :). </p>



<p>Companies have invested billions of dollars to create great, free and/or premium tools. Most of these tools you can easily extend to fit your own use case, saving your time, money, resources and headaches.</p>



<p>Now, let’s take a detailed look at some of the most popular tools.</p>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-tools-for-machine-learning-model-management">Tools for Machine Learning Model Management</h2>



<p>Keep in mind, I have my personal preference when it comes to the tools described below, but I tried to be as objective as possible.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-neptune-ai"><a href="/" target="_blank" rel="noreferrer noopener">neptune.ai</a></h3>



<div id="app-screenshot-block_dcb969b7e87a931ba6730a24ffaf27de"
	class="block-app-screenshot js-block-with-image-full-screen-modal "
	data-video-url=""
	data-show-controls="false"
	data-unmute="false"
	data-button-icon="https://neptune.ai/wp-content/themes/neptune/img/icon-close.svg"
	data-image-full-screen-modal="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=1020%2C700&#038;ssl=1"
>

			<div class="block-app-screenshot__image-wrapper">
			<div class="block-app-screenshot__bar">
				<figure class="block-app-screenshot__bar-buttons-wrapper">
					<img
						src="https://neptune.ai/wp-content/themes/neptune/img/blocks/app-screenshot/bar-buttons.svg"
						width="34"
						height="9"
						class="block-app-screenshot__bar-buttons"
						alt="">
				</figure>
			</div>

			
				<img
					srcset="
					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=480%2C329&#038;ssl=1 480w,					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=768%2C527&#038;ssl=1 768w,					https://i0.wp.com/neptune.ai/wp-content/uploads/2022/08/Neptune-metrics-charts.jpg?fit=1020%2C700&#038;ssl=1 1020w"
					alt=""
					style=""
					width="1020"
					height="700"
					class="block-app-screenshot__image"
				>

			
			<div class="block-app-screenshot__overlay">

				
					<a
						href="https://demo.neptune.ai/o/neptune/org/LLM-training-example/runs/compare?viewId=9c57c497-1131-4644-827f-0fcff4f28ad2&#038;detailsTab=metadata&#038;dash=charts&#038;type=run&#038;experimentsOnly=true&#038;experimentOnly=true&#038;runsLineage=FULL&#038;compare=IwGl7SOrYhmOSYwCwCYSIJyQKy4zyHKmiaipnUqx54gVhV5VXlO1ewDsNpTTA2Co%2B3WpiaJQ0gY2ToS4-stUAGYSq2qk2bIjwR0DKgDZE6ABxhrEXGLCZ7YeKbOVLQA"
						class="c-button c-button--primary c-button--small c-button--cta">
						<img
							decoding="async"
							loading="lazy"
							src="https://neptune.ai/wp-content/themes/neptune/img/icon-button--test-tube.svg"
							width="16"
							height="19"
														class="c-button__icon"
							alt=""
						/>

													<span class="c-button__text">
								See in the app							</span>
						
					</a>

				
														<button
						class="js-c-image-full-screen-modal c-button c-button--tertiary c-button--small">
						<img
							decoding="async"
							loading="lazy"
							src="https://neptune.ai/wp-content/themes/neptune/img/icon-zoom.svg"
							width="16"
							height="17"
							class="c-button__icon"
							alt="zoom"
						/>

						<span class="c-button__text">
							Full screen preview						</span>
						
					</button>
									
			</div>

		</div>

			
</div>



<div id="separator-block_bdc223b7ef1904938b62bedbd1ddbaa5"
         class="block-separator block-separator--10">
</div>



<p>Neptune is the experiment tracker for teams that train foundation models with a <a href="https://neptune.ai/product/team-collaboration" target="_blank" rel="noreferrer noopener">strong focus on collaboration</a> and scalability. The tool is known for its user-friendly interface and flexibility, enabling teams to adopt it into their existing workflows with minimal disruption. Neptune gives users a lot of freedom when defining data structures and tracking metadata. </p>



<p><span style="margin: 0px; padding: 0px;">With Neptune, ML/AI researchers and engineers can monitor, visualize, compa</span>re, and query&nbsp;all their model-building metadata&nbsp;in a single place. It handles data such as model metrics and parameters, model checkpoints, images, videos, audio files, dataset versions, and visualizations.&nbsp; Furthermore, Neptune makes sharing results with team members, outside collaborators, and stakeholders easy.</p>



<p><strong>Key advantages</strong></p>



<ul class="wp-block-list">
<li><strong>Scalability: </strong>Neptune easily tracks tens of thousands of data points, and the UI allows users to compare more than 100,000 runs with millions of data points.<br></li>



<li><strong>Pricing: </strong>Neptune’s <a href="/pricing" target="_blank" rel="noreferrer noopener">pricing model</a> is based on the number of users, allowing them to collaborate on as many projects as they like.<br></li>



<li><strong>Self-hosting: </strong>Neptune is <a href="https://neptune.ai/product/deployment-options" target="_blank" rel="noreferrer noopener">available for self-hosting</a>, which is a first-class offering in the Enterprise tier. Designed to be hosted in a private cloud environment, Neptune integrates with common authentication solutions like SAML or LDAP, allowing seamless integration while keeping sensitive data protected.<br></li>



<li><strong>Support and documentation: </strong>All <a previewlistener="true" href="https://neptune.ai/pricing" target="_blank" rel="noreferrer noopener">plans</a> (including the Free tier) provide access to chat and email support, with SLAs reserved for the Enterprise plan. Neptune’s documentation is comprehensive and includes many examples.</li>



<li>One standout feature of Neptune is the ability to fork experiment runs from any intermediate step. This is particularly important for large-scale deep learning experiments – such as training foundation models – where training failures due to hardware or network issues are unavoidable. It’s also common to try different parameters and training configurations over the course of a month-long training process.</li>
</ul>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-mlflow"><a href="https://mlflow.org/" target="_blank" rel="noreferrer noopener nofollow">MLflow</a></h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Model-registry-mlflow.png?ssl=1" alt="Model registry MLflow" class="wp-image-41777"/><figcaption class="wp-element-caption"><a href="https://www.oreilly.com/content/wp-content/uploads/sites/2/2019/06/image2-6f4305fe136de120e9b586762eab77b8.gif" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>MLflow is an open-source platform for managing the whole <strong>machine learning lifecycle (MLOps)</strong>. Experimentation, reproducibility, deployment, central model registry, it does it all. MLflow is suitable for individuals and for teams of any size.&nbsp;</p>



<p>The tool is library-agnostic. You can use it with any machine learning library, and any programming language.</p>



<p>Launched in 2018, MLflow quickly became the industry standard because of its easy integration with major ML frameworks, tools, and libraries such as Tensorflow, Pytorch, Scikit-learn, Kubernetes and Sagemaker, just to name a few. It has a big community of users and contributors.</p>



<p>MLflow has four main functions that help track and organize experiments and models:</p>



<ul class="wp-block-list">
<li><strong>MLflow Tracking</strong> – an API and UI for logging parameters, code versions, metrics, and artifacts when running machine learning code, and for later visualizing and comparing the results;</li>



<li><strong>MLflow Projects </strong>– packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production;</li>



<li><strong>MLflow Models</strong> – managing and deploying models from different ML libraries to a variety of model serving and inference platforms;</li>



<li><strong>MLflow Model Registry</strong> – central model store to collaboratively manage the full lifecycle of an MLflow model, including model versioning, stage transitions, and annotations.</li>
</ul>



<p>MLflow is not only available as an open-source tool you can host yourself, it is also available in a managed format within MLOps platforms:</p>



<ul class="wp-block-list">
<li>Since June 2024, <a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noreferrer noopener nofollow">Amazon SageMaker</a> no longer maintains its own dedicated experiment tracking SDK. Instead, it offers a managed MLflow capability: users log experiments through MLflow APIs, gaining flexibility and interoperability with external tools in the MLflow ecosystem while still benefiting from SageMaker’s managed infrastructure and autoscaling capabilities.</li>
</ul>



<ul class="wp-block-list">
<li>MLflow can also be used within <a href="https://azure.microsoft.com/en-us/products/machine-learning" target="_blank" rel="noreferrer noopener nofollow">Azure Machine Learning</a>, which supports experiment tracking via the MLflow client. You can configure any MLflow code to log runs to an Azure ML workspace. While the backend is proprietary and some MLflow features are limited, the integration enables smooth interoperability with the MLflow API. Additionally, Azure ML benefits from deep integration with the Azure ecosystem, including access to <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/" target="_blank" rel="noreferrer noopener nofollow">Azure OpenAI</a> models like GPT-4 and DALL-E, as well as <a href="https://learn.microsoft.com/en-us/azure/machine-learning/concept-enterprise-security?view=azureml-api-2" target="_blank" rel="noreferrer noopener nofollow">enterprise-grade security</a> through Microsoft Entra and Azure’s RBAC model.</li>
</ul>



<p>These integrations are useful for teams already working in cloud environments but provide <strong>partial MLflow compatibility</strong> rather than a full replacement of the open-source experience.</p>



<h3 class="wp-block-heading" class="wp-block-heading" id="h-key-advantages">Key advantages</h3>



<ul class="wp-block-list">
<li>Robust experiment tracking with support for logging parameters, metrics, artifacts, etc.</li>



<li>Easily integrates with other tools and libraries (e.g., PyTorch, TensorFlow, Scikit-learn)</li>



<li>Intuitive UI to visualize and compare runs</li>



<li>Large and active community offering support</li>



<li>Free managed service option (MLflow Community edition) with preconfigured ML environments that include PyTorch, TensorFlow/Keras, and other libraries, ideal for individuals.</li>



<li>Paid managed service option through cloud providers is ideal for teams. It comes with pre-configured compute and SQL storage servers, billed per second:
<ul class="wp-block-list">
<li><a href="https://www.databricks.com/product/aws-pricing" target="_blank" rel="noreferrer noopener nofollow">Amazon Web Services</a> (AWS) &#8211; via Amazon SageMaker</li>



<li><a href="https://www.databricks.com/product/azure-pricing" target="_blank" rel="noreferrer noopener nofollow">Azure</a> &#8211; via Azure Machine Learning</li>



<li><a href="https://www.databricks.com/product/gcp-pricing" target="_blank" rel="noreferrer noopener nofollow">Google Cloud</a> &#8211; via integrated MLflow-compatible services</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-conclusion">Conclusion</h2>



<p>Machine Learning Model Management is a fundamental part of the MLOps workflow. It lets us take a model from the development phase to production, making every experiment and/or model version reproducible.&nbsp;</p>



<p>Finally, to recap, there are 4 levels of ML model management:</p>



<ul class="wp-block-list">
<li>Level-0, ad-hoc research model management</li>



<li>Level-1, partial model management</li>



<li>Level-2, semi-complete model management</li>



<li>Level-3, complete (end-to-end) model management</li>
</ul>



<p>At each level, you will be faced with different challenges. The best practices of ML model management are centered around 3 components:</p>



<ul class="wp-block-list">
<li>Model&nbsp;</li>



<li>Code&nbsp;</li>



<li>Deployment</li>
</ul>



<p>As far as tools go, we have a plethora to choose from, but in this article, I described a few popular ones:</p>



<ul class="wp-block-list">
<li>neptune.ai</li>



<li>MLflow</li>
</ul>



<p>I hope this helps you choose the right tool.</p>



<p>With that, thank you for reading this article, and stay tuned for more!</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4077</post-id>	</item>
		<item>
		<title>How to Deal With Imbalanced Classification and Regression Data</title>
		<link>https://neptune.ai/blog/how-to-deal-with-imbalanced-classification-and-regression-data</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Fri, 22 Jul 2022 06:42:24 +0000</pubDate>
				<category><![CDATA[ML Model Development]]></category>
		<guid isPermaLink="false">https://neptune.test/how-to-deal-with-imbalanced-classification-and-regression-data/</guid>

					<description><![CDATA[Data imbalance is predominant and inherent in the real world. Data often demonstrates skewed distributions with a long tail. However, most of the machine learning algorithms currently in use were designed around the assumption of a uniform distribution over each target category (classification).&#160; On the other hand, we must not forget that many tasks involve&#8230;]]></description>
										<content:encoded><![CDATA[
<p><strong>Data imbalance</strong> is predominant and inherent in the real world. Data often demonstrates skewed distributions with a long tail. However, most of the machine learning algorithms currently in use were designed around the assumption of a uniform distribution over each target category (classification).&nbsp;</p>



<p>On the other hand, we must not forget that many tasks involve continuous targets and even infinite values (regression), where hard boundaries between classes do not exist (i.e. age prediction, depth estimation, and so on).</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_29-3372336075-1643136560469.jpg?resize=411%2C519&#038;ssl=1" alt="Data imbalance" class="wp-image-61203" width="411" height="519"/><figcaption class="wp-element-caption"><em>Data imbalance | Source: Author</em></figcaption></figure>
</div>


<p>In this article, I’m going to walk you through how to deal with imbalanced data in classification and regression tasks as well as talk about the performance measures you can use for each task in such a setting.</p>



<p>There are 3 main approaches to learning from imbalanced data:</p>



<div id="case-study-numbered-list-block_fc99cd6c4c779c8dc47f955d2484b8f9"
         class="block-case-study-numbered-list ">

    
    <h2 id="h-"></h2>

    <ul class="c-list">
                    <li class="c-list__item">
                <span class="c-list__counter">1</span>
                Data approach            </li>
                    <li class="c-list__item">
                <span class="c-list__counter">2</span>
                Algorithm approach             </li>
                    <li class="c-list__item">
                <span class="c-list__counter">3</span>
                Hybrid (ensemble) approach            </li>
            </ul>
</div>



<h2 class="wp-block-heading" class="wp-block-heading" id="h-imbalanced-classification-data">Imbalanced classification data</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_22.png?resize=402%2C402&#038;ssl=1" alt="SMOTE for regression" class="wp-image-61210" width="402" height="402"/><figcaption class="wp-element-caption"><em>SMOTE for regression | <a href="https://makeameme.org/meme/class-imbalance-i" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Imbalanced classification is a well-explored and well-understood topic.</p>



<p>In real-life applications, we face many challenges where we only have uneven data representations in which the <strong>minority class</strong> is usually the more important one and hence we require methods to improve its recognition rates. This issue poses a serious challenge to predictive modeling because learning algorithms will be biased towards the <strong>majority class</strong>.&nbsp;</p>



<p>Important day-to-day tasks such as preventing malicious attacks, detecting life-threatening diseases, or handling rare cases in monitoring systems face extreme class imbalance, with ratios ranging from 1:1000 up to 1:5000, and one must design intelligent systems that can adjust to and overcome such extreme bias.</p>



<h3 class="wp-block-heading" id="how-to-handle-an-imbalanced-dataset-data-approach">How to handle an imbalanced dataset &#8211; data approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_26.png?resize=388%2C543&#038;ssl=1" alt="How would you handle an imbalanced dataset?" class="wp-image-61206" width="388" height="543"/><figcaption class="wp-element-caption"><em>How would you handle an imbalanced dataset? | <a href="https://medium.com/sfu-cspmp/winning-against-imbalanced-datasets-14809437aa62" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>The data approach concentrates on modifying the training set to make it suitable for a standard learning algorithm. This is done by balancing the distributions of the dataset, which can be categorized in two ways:</p>



<ul class="wp-block-list">
<li>Oversampling&nbsp;</li>



<li>Undersampling&nbsp;</li>
</ul>



<h4 class="wp-block-heading" id="1-oversampling">1. Oversampling</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_24.png?resize=763%2C403&#038;ssl=1" alt="Oversampling" class="wp-image-61208" width="763" height="403"/><figcaption class="wp-element-caption"><em>Oversampling | <a href="https://dataaspirant.com/10-oversampling/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>In this approach, we synthesize new examples from the minority class.&nbsp;</p>



<p>There are several methods available to oversample a dataset used in a typical classification problem. But the most common data augmentation technique is known as <strong>Synthetic Minority Oversampling Technique</strong> or <strong>SMOTE</strong> for short.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_11.png?resize=755%2C303&#038;ssl=1" alt="Scatter plot of the class distribution before and after SMOTE" class="wp-image-61221" width="755" height="303"/><figcaption class="wp-element-caption"><em>Scatter plot of the class distribution before and after SMOTE | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>As the name suggests, SMOTE creates “synthetic” examples rather than over-sampling with replacement. Specifically, SMOTE works the following way: it starts by randomly selecting a minority class example and finding its <em>k</em> nearest minority class neighbors. A synthetic example is then created at a randomly selected point on the line segment that connects the example to one of its randomly chosen neighbors in feature space.</p>
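<p>In practice you would typically reach for <code>imblearn.over_sampling.SMOTE</code>; the NumPy sketch below (the function name and parameters are my own, not from any library) just illustrates the mechanics described above: pick a random minority example, pick one of its k nearest minority neighbors, and interpolate between them.</p>

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Illustrative SMOTE sketch: X_min holds only minority-class samples."""
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_new):
        # randomly select a minority seed example
        i = rng.integers(len(X_min))
        x = X_min[i]
        # its k nearest minority neighbors (index 0 is x itself, so skip it)
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        neighbor = X_min[rng.choice(nn)]
        # synthetic point at a random position on the segment x -> neighbor
        lam = rng.random()
        synth.append(x + lam * (neighbor - x))
    return np.array(synth)
```

Every synthetic point is a convex combination of two real minority samples, so it always lies inside the minority region of feature space.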


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_18.png?resize=738%2C275&#038;ssl=1" alt="SMOTE" class="wp-image-61214" width="738" height="275"/><figcaption class="wp-element-caption"><em>SMOTE | <a href="https://iq.opengenus.org/smote-for-imbalanced-dataset/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>When the synthetic examples SMOTE creates for the minority class are added to the training set, they balance the class distributions and cause the classifier to create larger and less specific decision regions, rather than the smaller and more specific regions that would make the model overfit to the majority class. This helps the classifier generalize better and mitigates overfitting.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_8.png?resize=748%2C561&#038;ssl=1" alt="Decision boundaries" class="wp-image-61224" width="748" height="561"/><figcaption class="wp-element-caption"><em>Decision boundaries | <a href="https://slideplayer.com/slide/14454418/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>This approach is inspired by data augmentation techniques that proved successful in handwritten character recognition where operations like rotation and skew were natural ways to perturb the training data.</p>



<p>Now, let&#8217;s take a look at the performance of SMOTE.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_28.png?resize=751%2C246&#038;ssl=1" alt="Confusion matrix of classifiers trained on data synthetic examples and tested on the imbalanced test set" class="wp-image-61204" width="751" height="246"/><figcaption class="wp-element-caption"><em>Confusion matrix of classifiers trained on data synthetic examples and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>The classifiers trained on synthetic examples generalize well.</li>



<li>The classifiers identify the minority class well (True Negatives).</li>



<li>They have fewer False Positives compared to undersampling.</li>
</ul>



<h5 class="wp-block-heading" id="advantages">Advantages&nbsp;</h5>



<ul class="wp-block-list">
<li>It reduces the overfitting caused by random oversampling, as synthetic examples are generated rather than copies of existing examples.</li>



<li>No loss of information.</li>



<li>It&#8217;s simple.</li>
</ul>



<h5 class="wp-block-heading" id="disadvantages">Disadvantages&nbsp;</h5>



<ul class="wp-block-list">
<li>While generating synthetic examples, SMOTE does not take into consideration neighboring examples that can be from other classes. This can increase the overlapping of classes and can introduce additional noise.</li>



<li>SMOTE is not very practical for high-dimensional data.</li>
</ul>



<h4 class="wp-block-heading" id="2-undersampling">2. Undersampling</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_7.png?resize=775%2C402&#038;ssl=1" alt="Undersampling" class="wp-image-61225" width="775" height="402"/><figcaption class="wp-element-caption"><em>Undersampling | <a href="https://dataaspirant.com/10-oversampling/" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>In this approach, we reduce the number of samples from the <strong>majority class</strong> to match the number of samples in the <strong>minority class</strong>.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_15.png?resize=770%2C278&#038;ssl=1" alt="Scatter plot of the class distribution before and after applying NearMiss-2" class="wp-image-61217" width="770" height="278"/><figcaption class="wp-element-caption"><em>Scatter plot of the class distribution before and after applying NearMiss-2 | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener">S</a><a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">ource</a></em></figcaption></figure>
</div>


<p>This can be done in a couple of ways:</p>



<ol class="wp-block-list">
<li><strong>Random sampler</strong>: It is the easiest and fastest way to balance the data by randomly selecting a few samples from the majority class.</li>



<li><strong>NearMiss</strong>: Adds some common sense rules to the selected samples by implementing <a href="https://imbalanced-learn.org/stable/under_sampling.html#mathematical-formulation" target="_blank" rel="noreferrer noopener nofollow">3 different heuristics</a>, but in this article, we will only focus on one.
<ul class="wp-block-list">
<li><strong>NearMiss-2</strong>: selects majority class examples with the minimum average distance to the three furthest minority class examples.</li>
</ul>
</li>
</ol>
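<p>Imbalanced-learn ships this heuristic as <code>NearMiss(version=2)</code>; the short NumPy sketch below (function name and parameters are my own) shows the NearMiss-2 rule itself: keep the majority samples whose average distance to their k farthest minority samples is smallest.</p>

```python
import numpy as np

def nearmiss2(X_maj, X_min, n_keep, k=3):
    """Illustrative NearMiss-2 sketch: undersample the majority class."""
    # pairwise distances: one row per majority sample, one column per minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # average distance to the k *farthest* minority samples, per majority row
    far_k = np.sort(d, axis=1)[:, -k:]
    avg_far = far_k.mean(axis=1)
    # keep the n_keep majority samples with the smallest such average distance
    keep = np.argsort(avg_far)[:n_keep]
    return X_maj[keep]
```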


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_31.png?resize=742%2C246&#038;ssl=1" alt="Confusion matrix of classifiers trained on undersampled examples and tested on the imbalanced test set" class="wp-image-61201" width="742" height="246"/><figcaption class="wp-element-caption"><em>Confusion matrix of classifiers trained on undersampled examples and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>Undersampling performs poorly compared to oversampling when it comes to identifying the majority class (True Positives). Besides that, however, it identifies the minority class better than oversampling and has fewer False Negatives.</li>
</ul>



<h5 class="wp-block-heading" id="advantages">Advantages</h5>



<ul class="wp-block-list">
<li>Data scientists can balance the dataset and reduce the risk of their analysis or machine learning algorithm skewing toward the majority. Without resampling, they might run into the so-called accuracy paradox: a classification model reports 90% accuracy, but on closer inspection, the results fall heavily within the majority class.&nbsp;</li>



<li>Fewer storage requirements and better run times for analyses. Less data means you or your business needs less storage and time to gain valuable insights.&nbsp;</li>
</ul>



<h5 class="wp-block-heading" id="disadvantages">Disadvantages</h5>



<ul class="wp-block-list">
<li>Removing enough majority examples to make the majority class the same or similar size to the minority class results in a significant loss of data.</li>



<li>The sample of the majority class chosen could be biased, meaning it might not accurately represent the real world, and the result of the analysis may be inaccurate. Therefore, it can cause the classifier to perform poorly on real unseen data.</li>
</ul>



<p>Because of these disadvantages, some scientists might prefer oversampling. It doesn’t lead to any loss of information, and in some cases, may perform better than undersampling. But oversampling isn’t perfect either. Because oversampling often involves replicating minority events, it can lead to overfitting.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>“The combination of SMOTE and under-sampling performs better than plain under-sampling.” </em></p>



<p></p>
<cite><p><a href="https://arxiv.org/abs/1106.1813" target="_blank" rel="noreferrer noopener nofollow">SMOTE: Synthetic Minority Over-sampling Technique</a>, 2011</p></cite></blockquote>



<p>To balance these issues, certain scenarios might require a combination of both over and undersampling to obtain the most lifelike dataset and accurate results.&nbsp;</p>



<h3 class="wp-block-heading" id="how-to-handle-imbalanced-data-algorithm-approach">How to handle imbalanced data &#8211; algorithm approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_25.png?resize=396%2C559&#038;ssl=1" alt="Algorithm approach – best models for imbalanced classification" class="wp-image-61207" width="396" height="559"/><figcaption class="wp-element-caption"><em>Algorithm approach – best models for imbalanced classification | <a href="https://medium.com/analytics-vidhya/what-precision-recall-f1-score-and-accuracy-can-tell-you-fe1eab1ada5a" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>This approach concentrates on modifying existing models to alleviate their bias towards the majority groups. This requires good insight into the modified learning algorithm and precise identification of reasons for its failure in learning the representations of skewed distributions.&nbsp;</p>



<p>The most popular techniques are cost-sensitive approaches (<strong>weighted learners</strong>). Here, the given model is modified to incorporate varying penalties for each considered group of examples. In other words, <strong>we assign a higher weight to the minority class in our cost function, which penalizes the model for misclassifying the minority class while reducing the weight of the majority class, causing the model to pay more attention to the underrepresented class</strong>, thus boosting its importance during the learning process. Focal loss, which dynamically down-weights easy examples, is a popular instance of this idea.</p>
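<p>In scikit-learn, the simplest form of cost-sensitive learning is the <code>class_weight="balanced"</code> option, which scales each class's loss term by the inverse of its frequency. A minimal sketch on toy data (the data and numbers here are my own illustration, not from the article):</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Imbalanced toy data: 950 majority (0) vs. 50 minority (1) samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 950 + [1] * 50)

# class_weight="balanced" re-weights the loss by inverse class frequency,
# so minority-class mistakes are penalized more heavily
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

recall_plain = recall_score(y, plain.predict(X))        # minority recall, unweighted
recall_weighted = recall_score(y, weighted.predict(X))  # typically much higher
```

On data like this, the weighted model trades some majority-class accuracy for a substantially better minority-class recall.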



<p>Another interesting algorithm-level solution is to apply <strong>one-class learning, or one-class classification (OCC for short)</strong>, which focuses on the target group, creating a data description. This way we eliminate bias towards any group, as we concentrate only on a single set of objects.</p>



<p>OCC can be useful in imbalanced classification problems because it provides techniques for outlier and anomaly detection. It does this by fitting the model on the majority class data (also known as positive examples) and predicting whether new data belongs to the majority class or to the minority class (also known as negative examples), i.e., whether it is an outlier/anomaly.&nbsp;</p>



<p>OCC problems are usually practical classification tasks where majority class data is easily available but minority class data is hard, expensive, or even impossible to gather, e.g., monitoring the operation of an engine, fraudulent transactions, or intrusion detection for a computer system.</p>
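<p>Scikit-learn's <code>OneClassSVM</code> is one standard OCC estimator: fit it on majority-class data only, and it flags anything unlike that data as an anomaly. A minimal sketch (the toy data and <code>nu</code> value are my own choices):</p>

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fit only on the easy-to-collect majority ("normal") data
rng = np.random.default_rng(0)
X_majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu roughly controls the expected fraction of outliers among the training data
occ = OneClassSVM(kernel="rbf", nu=0.05)
occ.fit(X_majority)

# predict() returns +1 for "looks like the majority class" and -1 for outliers
preds = occ.predict(np.array([[0.1, -0.2], [8.0, 8.0]]))
```

A point near the center of the training cloud comes back as +1, while a point far outside it comes back as -1, i.e., a likely minority case or anomaly.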



<h3 class="wp-block-heading" id="hybrid-approach">How to deal with imbalanced data &#8211; hybrid approach</h3>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_27.png?resize=466%2C350&#038;ssl=1" alt="Hybrid approach" class="wp-image-61205" width="466" height="350"/><figcaption class="wp-element-caption"><em>Hybrid approach | <a href="https://i.pinimg.com/originals/0a/69/6e/0a696ef01b163532d6de95d04ab6385c.jpg">Source</a></em></figcaption></figure>
</div>


<p>Hybridization is an approach that exploits the strengths of individual components. For imbalanced classification data, some works have proposed hybridizing sampling and cost-sensitive learning; in other words, combining both the <strong>data</strong>- and <strong>algorithm</strong>-level approaches. This idea of <strong>two-stage training</strong>, which merges data-level solutions with algorithm-level solutions (i.e., a classifier ensemble) to produce robust and efficient learners, is highly popular.&nbsp;</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_20.png?resize=580%2C606&#038;ssl=1" alt="Example scheme of the hybrid approach" class="wp-image-61212" width="580" height="606"/><figcaption class="wp-element-caption"><em>Example scheme of the hybrid approach | <a href="https://link.springer.com/article/10.1007/s42979-020-0119-4" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>It works by applying a data-level approach first. As you remember, the data-level approach modifies the training set to balance the class distribution between the majority class and the minority class using either oversampling or undersampling.&nbsp;</p>



<p>Then the pre-processed data with balanced class distribution is used to train a classifier ensemble, in other words, a collection of multiple classifiers from which a new classifier is derived which performs better than any constituent classifier. Thus, creating a robust and efficient learner that inherits the strong points of both data and algorithm level approaches while reducing their weaknesses at the same time.&nbsp;</p>
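<p>A minimal two-stage sketch of this idea (toy data, and naive random oversampling standing in for a full SMOTE step, so treat it as an illustration rather than the exact scheme from the figure):</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

# Imbalanced toy data: 900 majority (0) vs. 100 minority (1) samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (900, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 900 + [1] * 100)

# Stage 1 (data level): naive random oversampling of the minority class
X_maj, X_min = X[y == 0], X[y == 1]
X_min_up = resample(X_min, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_maj))

# Stage 2 (algorithm level): train a classifier ensemble on the balanced set
ensemble = RandomForestClassifier(n_estimators=100, random_state=0)
ensemble.fit(X_bal, y_bal)
```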


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_10.png?resize=691%2C265&#038;ssl=1" alt="Confusion matrix of hybrid classifiers trained and tested on the imbalanced test set" class="wp-image-61222" width="691" height="265"/><figcaption class="wp-element-caption"><em>Confusion matrix of hybrid classifiers trained and tested on the imbalanced test set | <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>From the confusion matrix we can notice a few things:</p>



<ul class="wp-block-list">
<li>The hybrid classifiers perform better than undersampling when it comes to identifying the majority class.</li>



<li>They are almost as good as both undersampling and oversampling when it comes to identifying the minority class.</li>
</ul>



<p>Basically, it takes the best of both worlds!</p>



<h3 class="wp-block-heading" id="performance-measures-for-imbalanced-classification">Performance measures for imbalanced classification</h3>



<p>In this section, we review the common performance measures used and their effectiveness when addressing imbalanced classification data.</p>



<ul class="wp-block-list">
<li>Confusion matrix</li>



<li>ROC and AUC</li>



<li>Precision and recall</li>



<li>F-score</li>
</ul>



<section id="blog-intext-cta-block_18e57bf88b6cfe590a57d13627f6b357" class="block-blog-intext-cta  c-box c-box--default c-box--dark c-box--no-hover c-box--standard ">

            <h3 class="block-blog-intext-cta__header" id="h-may-interest-you">May interest you </h3>
    
            <p><a href="/blog/f1-score-accuracy-roc-auc-pr-auc" target="_blank" rel="noopener">F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?</a></p>
    
    </section>



<h4 class="wp-block-heading" id="1-confusion-matrix">1. Confusion matrix</h4>



<p>For binary classification problems, the <strong>confusion matrix</strong> defines the base for performance measures. Most of the performance metrics are derived from the confusion matrix, i.e., accuracy, misclassification rate, precision, and recall.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_21.png?resize=605%2C340&#038;ssl=1" alt="Confusion matrix" class="wp-image-61211" width="605" height="340"/><figcaption class="wp-element-caption"><em>Confusion matrix | <a href="https://glassboxmedicine.files.wordpress.com/2019/02/confusion-matrix.png?w=816" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>However, <strong>accuracy is not appropriate when the data is imbalanced</strong>: the model can achieve high accuracy just by predicting the majority class accurately while performing poorly on the minority class, which in most cases is the class we care about the most.</p>



<h4 class="wp-block-heading" id="2-roc-and-auc-imbalanced-data">2. ROC and AUC imbalanced data</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_30.png?resize=680%2C382&#038;ssl=1" alt="ROC and AUC imbalanced data" class="wp-image-61202" width="680" height="382"/><figcaption class="wp-element-caption"><em>ROC and AUC imbalanced data | <a href="https://i.ytimg.com/vi/afQ_DyKMxUo/maxresdefault.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a>&nbsp;</em></figcaption></figure>
</div>


<p>To accommodate the minority class, the Receiver Operating Characteristic (ROC) curve is proposed as a measure over a range of tradeoffs between the True Positive (TP) rate and the False Positive (FP) rate. Another important performance measure, the Area Under the Curve (AUC), is commonly used for summarizing the ROC curve in a single score. Moreover, AUC is not biased towards the model&#8217;s performance on either the majority or minority class, which makes it more appropriate when dealing with imbalanced data.</p>



<h4 class="wp-block-heading" id="3-precision-and-recall">3. Precision and recall</h4>



<p>From the confusion matrix, we can also derive<strong> precision and recall</strong> performance metrics.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_17.png?resize=831%2C303&#038;ssl=1" alt="Precision and recall" class="wp-image-61215" width="831" height="303"/><figcaption class="wp-element-caption"><em>Precision and recall | <a href="https://towardsdatascience.com/whats-the-deal-with-accuracy-precision-recall-and-f1-f5d8b4db1021" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Precision is well suited to class imbalance because it does not include the number of True Negatives in its calculation and is therefore not affected by the imbalance.</p>



<p>One drawback of precision and recall is that, like accuracy, there can be a tension between the two: when we try to improve the TP count for the minority class, the number of FPs can also increase.&nbsp;</p>



<h4 class="wp-block-heading" id="4-f-score">4. F-score</h4>



<p>To balance recall and precision, i.e., improving recall while keeping precision high, the <strong>F-score</strong> is proposed as the harmonic mean of precision and recall.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_2.png?resize=614%2C115&#038;ssl=1" alt="F-score" class="wp-image-61230" width="614" height="115"/><figcaption class="wp-element-caption"><em>F-score | <a href="https://arxiv.org/pdf/2104.02240.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Since the F-score weights precision and recall equally and balances both concerns, it is less likely to be biased to the majority or minority class. <a href="https://docs.google.com/document/d/1jlcYg_zmBwEJJOP79bV4PfcftpclGzEEvS20TjgOcBY/edit#heading=h.ubjlh9z2r0v" target="_blank" rel="noreferrer noopener nofollow">[2]</a></p>
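<p>All four measures reviewed above are one <code>sklearn.metrics</code> call each. A tiny worked example on an 8:2 imbalanced label vector (the toy scores are my own illustration):</p>

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Tiny imbalanced example: 8 majority (0) vs. 2 minority (1) samples
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.2, 0.3, 0.1, 0.4, 0.6, 0.2, 0.7, 0.4])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the scores at 0.5

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted class
auc = roc_auc_score(y_true, y_prob)    # uses the raw scores, not the labels
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Note that ROC AUC is computed from the scores before thresholding, while the confusion matrix, precision, recall, and F-score all depend on the chosen threshold.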



<p>Check out <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1?usp=sharing" target="_blank" rel="noreferrer noopener nofollow">this experiment with code examples for the 3 imbalanced classification approaches</a> in the Colab notebook I prepared for you.</p>



<h2 class="wp-block-heading" id="imbalanced-regression-data">Imbalanced regression data</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_13.png?resize=479%2C359&#038;ssl=1" alt="Imbalanced regression data" class="wp-image-61219" width="479" height="359"/><figcaption class="wp-element-caption"><em>Imbalanced regression data | <a href="https://i.imgflip.com/sy501.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p><strong>Regression over imbalanced data</strong> is not well explored. Yet many important real-life applications, such as economics, crisis management, fault diagnosis, or meteorology, require us to apply <a href="https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca" target="_blank" rel="noreferrer noopener nofollow"><strong>regression over imbalanced data</strong></a>, which means predicting rare and extreme continuous target values from input data.</p>



<p>Because dealing with imbalanced data has been studied mostly in the context of classification tasks, few mature or suitable strategies exist to address it in the context of regression.</p>



<p>Let’s first look at the typical approaches adopted from Imbalanced Classification then we will look into some of the best Imbalanced Regression techniques currently being used.</p>



<h3 class="wp-block-heading" id="approachas-adopted-from-imbalanced-classification">Approaches adopted from imbalanced classification</h3>



<h4 class="wp-block-heading" id="data-approach">Data approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_23.jpg?resize=401%2C535&#038;ssl=1" alt="Adopted from Imbalanced classification" class="wp-image-61209" width="401" height="535"/><figcaption class="wp-element-caption"><em>Adopted from Imbalanced classification | Author:  <a href="mailto:prince.canuma@neptune.ai" target="_blank" rel="noreferrer noopener nofollow">Prince Canuma</a></em></figcaption></figure>
</div>


<p>When it comes to data approaches for imbalanced regression, we have two techniques that were heavily inspired by imbalanced classification:</p>



<ul class="wp-block-list">
<li>SMOTER</li>



<li>SMOGN</li>
</ul>



<h5 class="wp-block-heading" id="1-smoter">1. SMOTER</h5>



<p>SMOTER is an adaptation for regression of the well-known SMOTE algorithm.</p>



<p>It works by defining frequent (majority) and rare (minority) regions using the original label density, and then applying random undersampling to the majority region and oversampling to the minority region. The user has to pre-determine the percentage of over- and undersampling to be carried out by the SMOTER algorithm.</p>



<p>When oversampling the minority regions, it not only generates new synthetic examples but also applies an <a href="https://www.investopedia.com/terms/i/interpolation.asp#:~:text=Interpolation%20is%20achieved,haven%27t%20been%20calculated" target="_blank" rel="noreferrer noopener nofollow">interpolation</a> strategy that combines the inputs and targets of different examples. Precisely, this interpolation is carried out using two rare cases, where one is a seed case and the other is randomly selected from the k-nearest neighbors of the seed. The features of the two cases are interpolated, and the new target variable is determined as a weighted average of the target variables of the two rare cases used.</p>



<p>Why do we have to average the target variables, you might ask? In the original SMOTE algorithm this question was trivial, because all rare cases share the same label (the target minority class). In regression, the answer is not so trivial: when a pair of examples is used to generate a new synthetic case, they will not have the same target variable value.</p>
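<p>Generating one synthetic pair can be sketched as follows. The helper name is hypothetical, and the inverse-distance weighting of the targets is one common choice for SMOTER's weighted average, not the only one:</p>

```python
import numpy as np

def smoter_pair(x_seed, y_seed, x_neigh, y_neigh, rng=None):
    """Generate one synthetic (x, y) pair, SMOTER-style (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    # feature interpolation, exactly as in classification SMOTE
    lam = rng.random()
    x_new = x_seed + lam * (x_neigh - x_seed)
    # target = weighted average of the two parents' targets,
    # weighted by the inverse distance of x_new to each parent
    d1 = np.linalg.norm(x_new - x_seed) + 1e-12
    d2 = np.linalg.norm(x_new - x_neigh) + 1e-12
    w1, w2 = 1.0 / d1, 1.0 / d2
    y_new = (w1 * y_seed + w2 * y_neigh) / (w1 + w2)
    return x_new, y_new
```

The synthetic target always lands between the two parent targets, closer to whichever parent the synthetic features ended up nearer to.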



<h5 class="wp-block-heading" id="2-smogn">2. SMOGN</h5>



<p><a href="http://proceedings.mlr.press/v74/branco17a/branco17a.pdf" target="_blank" rel="noreferrer noopener nofollow">SMOGN</a> takes after SMOTER but additionally introduces Gaussian Noise as a second strategy in the oversampling phase, alongside the interpolation SMOTER already uses.</p>



<p>The key idea of the SMOGN algorithm is to combine the SMOTER and Gaussian Noise strategies for generating synthetic examples, limiting the risks SMOTER can incur, such as a lack of diverse examples (since SMOTER will not use the most distant examples in the interpolation process), by falling back to the more conservative strategy of introducing Gaussian Noise. It generates new synthetic examples with SMOTER only when the seed example and the selected k-nearest neighbor are close enough, and uses Gaussian Noise when the two examples are more distant.</p>
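<p>That close/distant branching is the whole trick, and can be sketched in a few lines (helper name, <code>dist_thresh</code>, and <code>noise_scale</code> are illustrative choices, not the exact parameterization from the paper):</p>

```python
import numpy as np

def smogn_sample(x_seed, y_seed, x_neigh, y_neigh,
                 dist_thresh, noise_scale=0.05, rng=None):
    """One SMOGN-style synthetic example (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    if np.linalg.norm(x_neigh - x_seed) <= dist_thresh:
        # neighbor is close enough: SMOTER-style interpolation
        lam = rng.random()
        x_new = x_seed + lam * (x_neigh - x_seed)
        y_new = (1 - lam) * y_seed + lam * y_neigh
    else:
        # neighbor is too distant: perturb the seed with small Gaussian noise
        x_new = x_seed + rng.normal(0.0, noise_scale, size=x_seed.shape)
        y_new = y_seed + rng.normal(0.0, noise_scale)
    return x_new, y_new
```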



<h4 class="wp-block-heading" id="algorithm-approach">Algorithm approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_5.png?resize=426%2C436&#038;ssl=1" alt="Algorithm approach" class="wp-image-61227" width="426" height="436"/><figcaption class="wp-element-caption"><em>Algorithm approach | Source: Author</em></figcaption></figure>
</div>


<p>Like in imbalanced classification, this approach includes adjusting the loss function to compensate for region imbalance (re-weighting), as well as other relevant learning paradigms such as transfer learning, metric learning, two-stage training, and meta-learning <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">[4]</a>. But we will focus on two of them:</p>



<ul class="wp-block-list">
<li>Error-aware loss</li>



<li>Cost-sensitive re-weighting&nbsp;</li>
</ul>



<h5 class="wp-block-heading" id="1-error-aware-loss">1. Error-aware loss</h5>



<p>Focal-R is the regression version of the Focal loss used in classification. Focal loss is a dynamically weighted cross-entropy loss, where the modulating factor decays to zero as confidence in the correct class increases.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_14.png?resize=646%2C413&#038;ssl=1" alt="The focal loss down weights easy examples with a weighting factor of  - (1-  pt)^γ" class="wp-image-61218" width="646" height="413"/><figcaption class="wp-element-caption"><em>The focal loss down weights easy examples with a weighting factor of&nbsp; &#8211; (1-&nbsp; pt)^γ | <a href="https://arxiv.org/ftp/arxiv/papers/2006/2006.01413.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Focal-R replaces the weighting factor with a continuous function that maps the absolute error (L1 distance) into values in the range of 0 to 1.</p>



<p>Precisely, Focal-R loss based on L1 distance can be written as:</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_32.png?resize=347%2C79&#038;ssl=1" alt="Focal-R loss based on L1 distance" class="wp-image-61249" width="347" height="79"/><figcaption class="wp-element-caption"><em>Focal-R loss based on L1 distance | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Where e<sub>i</sub> is the L1 error of the i-th sample, σ(·) is the Sigmoid function, and β, γ are hyper-parameters.</p>
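<p>The Focal-R L1 formula above can be sketched in a few lines of NumPy; <code>beta</code> and <code>gamma</code> correspond to the β and γ hyper-parameters:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_r_l1(y_pred, y_true, beta=0.2, gamma=1.0):
    """Focal-R loss based on L1 distance: the sigmoid of the scaled
    absolute error acts as a continuous weighting factor."""
    e = np.abs(y_pred - y_true)              # L1 error per sample
    return np.mean(sigmoid(beta * e) ** gamma * e)
```

Larger absolute errors receive larger weights, so hard (often rare-region) samples dominate the loss.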



<h5 class="wp-block-heading" id="2-cost-sensitive-re-weighting">2. Cost-sensitive re-weighting</h5>



<p>Since the target space can be divided into a finite number of bins, classic re-weighting schemes can be directly plugged in, such as inverse-frequency weighting (INV) and its square-root weighting variant (SQINV), both of which are based on the label distribution.</p>
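<p>A small NumPy sketch of both schemes, assuming an equal-width binning of the target (the bin count and normalization are illustrative choices):</p>

```python
import numpy as np

def bin_weights(y, bins=10, scheme="inv"):
    """Per-sample weights from binned label frequencies:
    'inv' = inverse frequency, 'sqinv' = square-root inverse frequency."""
    counts, edges = np.histogram(y, bins=bins)
    # map each sample to its bin (interior edges only, clipped for safety)
    idx = np.clip(np.digitize(y, edges[1:-1]), 0, bins - 1)
    freq = counts[idx].astype(float)
    w = 1.0 / np.sqrt(freq) if scheme == "sqinv" else 1.0 / freq
    return w * len(y) / w.sum()              # normalize to mean weight 1
```

The weights can then be passed to a loss function or to an estimator's <code>sample_weight</code> argument.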



<h4 class="wp-block-heading" id="hybrid-approach">Hybrid approach</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_9.png?ssl=1" alt="Hybrid approach" class="wp-image-61223"/><figcaption class="wp-element-caption"><em>Hybrid approach | <a href="https://comicsandmemes.com/wp-content/uploads/hybrid-animal-006-shorse.jpg" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Like the hybrid approach for imbalanced classification, the imbalanced regression hybrid approach combines data-level and algorithm-level approaches in order to produce robust and efficient learners.</p>



<p>An example of this approach is the Bagging-based ensemble.</p>



<h5 class="wp-block-heading" id="bagging-based-ensemble">Bagging-based ensemble</h5>



<p>This algorithm incorporates data pre-processing strategies for addressing imbalanced domains in regression tasks.</p>



<p>Precisely, a paper entitled “REBAGG: REsampled BAGGing for Imbalanced Regression” proposes an algorithm that obtains diversity on the generated models while simultaneously biasing them towards the least represented and more important cases.</p>



<p>It has two main steps:</p>



<ol class="wp-block-list">
<li>Build a number of models using pre-processed samples of the training set.</li>



<li>Use the trained models to obtain predictions on unseen data by applying an averaging strategy (basically averaging models’ predictions to obtain the final predictions).</li>
</ol>



<p>Regarding the first step, the authors developed four main types of resampling methods to apply to the original training set:<strong> balance</strong>,<strong> balance.SMT</strong>,<strong> variation</strong>,<strong> </strong>and<strong> variation.SMT</strong>. The key distinguishing features of these methods are:&nbsp;</p>



<p>i) the ratio between the number of minority and majority examples used in the new sample; and,</p>



<p>ii) how new minority examples are obtained.</p>



<p>For resampling methods labeled with the prefix "balance", the new modified training set has the same number of minority and majority examples. For resampling methods with the prefix "variation", the ratio of minority to majority examples in the new training set varies.</p>



<p>When the resampling method has no suffix appended, the new synthetic examples for the minority region are exact copies of randomly selected minority examples. When the suffix "SMT" is appended, the new synthetic examples for the minority region are generated with the SMOTER algorithm.</p>
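<p>The two REBAGG steps can be sketched as follows. This is a toy version of the "balance" variant (exact copies of minority examples), assuming scikit-learn is available; the base learner, ensemble size, and 50/50 split are illustrative choices, not the paper's exact settings.</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rebagg_sketch(X, y, rare_mask, n_models=10, rng=None):
    """Toy REBAGG-style ensemble: each bootstrap draws equal numbers of
    rare and normal examples, trains a base learner, and predictions
    are averaged."""
    rng = rng or np.random.default_rng(0)
    rare, normal = np.where(rare_mask)[0], np.where(~rare_mask)[0]
    n_half = len(X) // 2
    models = []
    for _ in range(n_models):
        # step 1: build models on resampled (rare-biased) training sets
        idx = np.concatenate([rng.choice(rare, n_half),
                              rng.choice(normal, n_half)])
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    # step 2: average the models' predictions on unseen data
    return lambda Xq: np.mean([m.predict(Xq) for m in models], axis=0)
```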



<h3 class="wp-block-heading" id="deep-imbalanced-regression-dir">Deep Imbalanced Regression (DIR)</h3>



<p>The methods adapted from imbalanced classification do work; however, there are several drawbacks to using them alone.</p>



<p>Allow me to make a case!</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_16.jpg?resize=671%2C464&#038;ssl=1" alt="Figure 1. Comparison on the test error distribution (bottom) using the same training label distribution (top) on two different datasets" class="wp-image-61216" width="671" height="464"/><figcaption class="wp-element-caption"><em>Figure 1. Comparison on the test error distribution (bottom) using the same training label distribution (top) on two different datasets | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The above datasets have intrinsically different label spaces: (a) CIFAR-100 exhibits a categorical label space, where the target is a class index, while (b) IMDB-WIKI exhibits a continuous label space, where the target is age.</p>



<p>As you can see, the label density distribution is the same for both, but the error distributions are very different. The error distribution for IMDB-WIKI is much smoother and does not correlate well with the label density distribution. This matters because imbalanced learning methods, directly or indirectly, operate by compensating for the imbalance in the <em>empirical</em> label density distribution. That approach works well for imbalanced classification but not for continuous labels. Instead, you have to find a way to smooth the label distribution.</p>



<h4 class="wp-block-heading" id="label-distribution-smoothing-lds-for-imbalanced-data-density-estimation">Label distribution smoothing (LDS) for imbalanced data density estimation</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_12.jpg?resize=629%2C321&#038;ssl=1" alt="Figure 2. Label distribution smoothing(LDS) convolves a symmetric kernel with the empirical label density to estimate the effective label density distribution that accounts for the continuity of labels" class="wp-image-61220" width="629" height="321"/><figcaption class="wp-element-caption"><em>Figure 2. Label distribution smoothing (LDS) convolves a symmetric kernel with the empirical label density to estimate the effective label density distribution that accounts for the continuity of labels | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>From figure 2 above, we can see that in the continuous space the empirical label distribution does not match the real label density distribution. Why? Because of the dependence between data samples at nearby labels; in this case, images of people of similar age.</p>



<p>LDS uses kernel density estimation to learn the effective imbalance in datasets that corresponds to continuous targets. Precisely, LDS convolves a symmetric kernel with the empirical density distribution to extract a kernel-smoothed version that accounts for the overlap in the information of data samples of nearby labels.</p>
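<p>The convolution step is simple to sketch in NumPy; the bin count, kernel size, and kernel bandwidth below are illustrative hyper-parameters:</p>

```python
import numpy as np

def lds_effective_density(y, bins=50, kernel_size=5, sigma=2.0):
    """Label distribution smoothing: convolve the empirical label
    histogram with a symmetric Gaussian kernel."""
    counts, edges = np.histogram(y, bins=bins)
    half = kernel_size // 2
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()                    # normalize kernel weights
    smoothed = np.convolve(counts.astype(float), kernel, mode="same")
    return smoothed, edges
```

The smoothed density can then replace the raw bin counts when computing, say, inverse-frequency weights.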



<p><em><strong>Note</strong>: Gaussian or a Laplacian kernel is a symmetric kernel.</em></p>



<p>The symmetric kernel characterizes the similarity between target values y’ and y w.r.t their distance in the target space.&nbsp;</p>



<p>Figure 2 at the beginning of this section shows that LDS captures the real imbalance that affects regression. By applying LDS, we get a label density distribution that correlates well with the error distribution (-0.83).</p>



<p>Once you have the effective label density, you can then use the adapted techniques for addressing imbalanced classification that we talked about earlier (i.e. cost-sensitive re-weighting method).</p>



<h4 class="wp-block-heading" id="feature-distribution-smoothing-fds">Feature distribution smoothing (FDS)</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_19.jpg?resize=629%2C416&#038;ssl=1" alt="Feature distribution smoothing (FDS)" class="wp-image-61213" width="629" height="416"/><figcaption class="wp-element-caption"><strong><em>Top</em></strong><em>: Cosine similarity of the feature means at a particular age w.r.t its value at the anchor age. </em><strong><em>Bottom</em></strong><em>: Cosine similarity of the feature variance at a particular age w.r.t its value at the anchor age. The color of the background refers to data density in a particular target range | </em><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The above figure displays the feature statistics similarity for age 30 (the anchor). You can see right away that the bins surrounding the anchor are highly similar to it, especially the closest ones. Examining the figure further, you will notice a problem in regions with very few data samples (i.e., ages 0&#8211;6): due to data imbalance, their mean and variance show an unjustifiably high similarity to age 30.&nbsp;</p>



<p>Inspired by these observations, the <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">creators</a> of the Feature distribution smoothing (FDS) algorithm proposed performing distribution smoothing on the feature space, or in other words, transferring feature statistics between nearby target bins. This calibrates the potentially biased estimates of the feature distribution, especially for target values underrepresented in the training data.</p>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_1.jpg?resize=651%2C350&#038;ssl=1" alt="Feature distribution smoothing (FDS)" class="wp-image-61231" width="651" height="350"/><figcaption class="wp-element-caption"><em>Feature distribution smoothing (FDS) | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>One great practical property of FDS is that you can integrate it into deep neural networks by inserting a feature calibration layer after the final feature map.</p>
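<p>A rough NumPy sketch of the statistic-smoothing step: per-bin feature means and variances are smoothed across nearby target bins with a symmetric kernel, then each feature vector is re-standardized with the smoothed statistics. The binning, kernel width, and handling of empty bins are simplifications of the paper's running-statistics formulation.</p>

```python
import numpy as np

def fds_calibrate(feats, bin_idx, n_bins, sigma=2.0, eps=1e-6):
    """Sketch of feature distribution smoothing (FDS)."""
    d = feats.shape[1]
    mu = np.zeros((n_bins, d))
    var = np.ones((n_bins, d))
    for b in range(n_bins):                      # per-bin feature statistics
        sel = feats[bin_idx == b]
        if len(sel):
            mu[b], var[b] = sel.mean(0), sel.var(0) + eps
    # symmetric Gaussian kernel over the bin axis
    x = np.arange(-2, 3)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    mu_s = np.stack([np.convolve(mu[:, j], k, mode="same") for j in range(d)], 1)
    var_s = np.stack([np.convolve(var[:, j], k, mode="same") for j in range(d)], 1)
    # re-standardize each sample with its bin's smoothed statistics
    z = (feats - mu[bin_idx]) / np.sqrt(var[bin_idx])
    return z * np.sqrt(var_s[bin_idx]) + mu_s[bin_idx]
```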



<h4 class="wp-block-heading" id="benchmarking">Benchmarking&nbsp;</h4>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-full is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_4.png?resize=564%2C546&#038;ssl=1" alt="Benchmarking results on STS-B-DIR" class="wp-image-61228" width="564" height="546"/><figcaption class="wp-element-caption"><em>Benchmarking results on STS-B-DIR | <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>Results reported on the Semantic Textual Similarity Benchmark (STS-B-DIR) dataset using various algorithms.</p>



<p>The authors show that coupling LDS and FDS with existing methods for regression over imbalanced data significantly improves performance <a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">[4]</a>.</p>



<h3 class="wp-block-heading" id="performance-measures-for-imbalance-regression">Performance measures for imbalance regression</h3>



<p>When it comes to evaluation metrics for this kind of problem, you can use common regression metrics such as MAE, MSE, Pearson correlation, and the Geometric Mean (GM) alongside the techniques we explored in this section.</p>
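<p>A small helper computing all four, where GM is taken here as the geometric mean of the absolute errors (a small constant is added before the log for numerical stability, an implementation choice):</p>

```python
import numpy as np

def regression_report(y_true, y_pred):
    """MAE, MSE, Pearson correlation, and geometric mean of errors."""
    err = np.abs(y_true - y_pred)
    return {
        "MAE": err.mean(),
        "MSE": ((y_true - y_pred) ** 2).mean(),
        "Pearson": np.corrcoef(y_true, y_pred)[0, 1],
        "GM": np.exp(np.log(err + 1e-10).mean()),  # geometric mean of abs errors
    }
```

In practice, these are often reported separately for many-shot, medium-shot, and few-shot target regions to expose performance on rare labels.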



<h3 class="wp-block-heading" id="crucial-open-issues-to-address-when-developing-novel-methods-for-imbalanced-regression">Crucial open issues to address when developing novel methods for Imbalanced regression</h3>



<ul class="wp-block-list">
<li>Development of cost-sensitive regression solutions that can adapt the cost to the degree of importance assigned to rare observations. To allow more flexibility in predicting rare events of differing importance, it would be interesting to investigate adapting the cost not only to the minority group but to each individual observation.</li>



<li>Methods that can distinguish between minority samples and noisy samples must be proposed.</li>



<li>Development of better ensemble learning methods as in classification may offer a significant improvement in both robustness to skewed distributions and predictive power.</li>
</ul>



<h2 class="wp-block-heading" id="h-conclusion">Conclusion</h2>


<div class="wp-block-image is-style-default">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" loading="lazy" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/How-to-Deal-With-Imbalanced-Classification-and-Regression-Data_3.png?resize=447%2C472&#038;ssl=1" alt="How to Deal With Imbalanced Classification and Regression Data" class="wp-image-61229" width="447" height="472"/><figcaption class="wp-element-caption"><a href="https://www.meme-arsenal.com/en/create/meme/2299880" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>Canonical ML algorithms assume that the number of objects in the considered classes is roughly similar. However, in many real-life problems the distribution of examples is skewed: the events we care most about and want to predict happen rarely, while most of the data points we collect represent the normal state and the majority group. This poses a difficulty for learning algorithms, as they will be biased towards the majority group.</p>



<p>But in this article, you learned about the different approaches to learning from imbalanced classification and regression data.&nbsp;</p>



<p>Thank you for reading! And as always I have a well-researched reference section that you can use to dive deeper into what you read below as well as a <a href="https://colab.research.google.com/drive/10gViloq5Wet40P1fod2MxYYCo8ou4Yg1#scrollTo=mp8WOh3Zj9wS" target="_blank" rel="noreferrer noopener nofollow">colab notebook</a>.</p>



<h3 class="wp-block-heading" id="references">References</h3>



<ol class="wp-block-list">
<li><a href="https://link.springer.com/content/pdf/10.1007/s13748-016-0094-0.pdf" target="_blank" rel="noreferrer noopener nofollow">https://link.springer.com/content/pdf/10.1007/s13748-016-0094-0.pdf</a></li>



<li><a href="https://arxiv.org/abs/2104.02240" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2104.02240</a></li>



<li><a href="https://arxiv.org/pdf/1106.1813.pdf" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/pdf/1106.1813.pdf</a></li>



<li>Deep Imbalanced regression
<ol class="wp-block-list">
<li><a href="https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca</a></li>



<li><a href="https://arxiv.org/pdf/2102.09554.pdf" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/pdf/2102.09554.pdf</a></li>
</ol>
</li>



<li><a href="https://imbalanced-learn.org" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org</a></li>



<li><a href="https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/" target="_blank" rel="noreferrer noopener nofollow">https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/</a></li>



<li><a href="https://machinelearningmastery.com/one-class-classification-algorithms/" target="_blank" rel="noreferrer noopener nofollow">https://machinelearningmastery.com/one-class-classification-algorithms/</a></li>



<li><a href="https://dataaspirant.com/handle-imbalanced-data-machine-learning/" target="_blank" rel="noreferrer noopener nofollow">https://dataaspirant.com/handle-imbalanced-data-machine-learning/</a></li>



<li><a href="https://imbalanced-learn.org/stable/auto_examples/ensemble/plot_comparison_ensemble_classifier.html" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/auto_examples/ensemble/plot_comparison_ensemble_classifier.html</a></li>



<li><a href="https://imbalanced-learn.org/stable/ensemble.html" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/ensemble.html</a></li>



<li><a href="https://imbalanced-learn.org/stable/over_sampling.html#from-random-over-sampling-to-smote-and-adasyn" target="_blank" rel="noreferrer noopener nofollow">https://imbalanced-learn.org/stable/over_sampling.html#from-random-over-sampling-to-smote-and-adasyn</a></li>



<li><a href="https://www.fromthegenesis.com/smote-synthetic-minority-oversampling-technique/" target="_blank" rel="noreferrer noopener nofollow">https://www.fromthegenesis.com/smote-synthetic-minority-oversampling-technique/</a></li>



<li><a href="https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data" target="_blank" rel="noreferrer noopener nofollow">https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data</a></li>



<li><a href="https://www.reddit.com/r/datascience/comments/92az1l/how_to_handle_imbalanced_classification_problem/e34e64k" target="_blank" rel="noreferrer noopener nofollow">https://www.reddit.com/r/datascience/comments/92az1l/how_to_handle_imbalanced_classification_problem/e34e64k</a></li>



<li><a href="https://www.sciencedirect.com/topics/engineering/decisions-region" target="_blank" rel="noreferrer noopener nofollow">https://www.sciencedirect.com/topics/engineering/decisions-region</a></li>



<li><a href="https://www.amazon.com/dp/1118074629/ref=as_li_ss_tl?&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=615e87a9105582e292ad2b7e2c7ea339&amp;language=en_US" target="_blank" rel="noreferrer noopener nofollow">https://www.amazon.com/dp/1118074629/ref=as_li_ss_tl?&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=615e87a9105582e292ad2b7e2c7ea339&amp;language=en_US</a></li>



<li>Hybrid Classifiers—Methods of Data, Knowledge, and Classifier Combination. In: Studies in Computational Intelligence, vol. 519. Springer, Berlin (2014)</li>



<li><a href="https://www.coursera.org/learn/ml-regression" target="_blank" rel="noreferrer noopener nofollow">https://www.coursera.org/learn/ml-regression</a></li>



<li><a href="https://researchcommons.waikato.ac.nz/bitstream/handle/10289/8518/smoteR.pdf" target="_blank" rel="noreferrer noopener nofollow">https://researchcommons.waikato.ac.nz/bitstream/handle/10289/8518/smoteR.pdf</a></li>



<li><a href="https://arxiv.org/abs/1708.02002v2" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/1708.02002v2</a></li>



<li><a href="https://www.mastersindatascience.org/learning/statistics-data-science/undersampling/" target="_blank" rel="noreferrer noopener nofollow">https://www.mastersindatascience.org/learning/statistics-data-science/undersampling/</a></li>
</ol>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">6482</post-id>	</item>
		<item>
		<title>MLflow vs Kubeflow vs neptune.ai: What Are the Differences?</title>
		<link>https://neptune.ai/blog/mlflow-vs-kubeflow-vs-neptune-differences</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Thu, 21 Jul 2022 13:52:46 +0000</pubDate>
				<category><![CDATA[ML Tools]]></category>
		<guid isPermaLink="false">https://neptune.test/mlflow-vs-kubeflow-vs-neptune-differences/</guid>

					<description><![CDATA[As a Data Scientist, ML/DL Researcher, or Engineer you might have come across or heard about MLflow, Kubeflow, and neptune.ai. Due to the large adoption of ML and DL, many questions arose around deployment, scalability, and reproducibility. Thus MLOps was born as a hybrid of Data engineering, DevOps, and Machine Learning. We had to come&#8230;]]></description>
										<content:encoded><![CDATA[
<p>As a Data Scientist, ML/DL Researcher, or Engineer, you might have come across or heard about MLflow, Kubeflow, and neptune.ai. Due to the wide adoption of ML and DL, many questions arose around deployment, scalability, and reproducibility. Thus MLOps was born as a hybrid of Data Engineering, DevOps, and Machine Learning.</p>



<p>We had to come up with this new way of doing things for ML because ML development is complex.</p>



<p>The natural question is why?</p>



<p>Naturally, you might think it&#8217;s because of the math, algorithms, resources needed (GPUs, TPUs, CPUs&#8230;), data, APIs, libraries, and frameworks. Some of that is true, but not entirely, because nowadays most of it is abstracted away for us. If we take Hugging Face or fast.ai, for example, you just call an instance of a particular class and the framework/library does all the heavy lifting for you. Furthermore, with the development of <strong><a href="/blog/transfer-learning-guide-examples-for-images-and-text-in-keras" target="_blank" rel="noreferrer noopener">transfer learning</a></strong>, we no longer need vast amounts of data to train a model.</p>



<p>Then where does the complexity come from?</p>



<p>The complexity comes from a few things:</p>



<ol class="wp-block-list">
<li>ML is experimental in nature</li>



<li>It has more parts to account for, such as data (gathering, labelling, versioning), models (training, evaluation, versioning, and deployment), and configuration (hyperparameters and so on).</li>



<li>The <a href="/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams" target="_blank" rel="noreferrer noopener nofollow">paradigm</a> of how we do traditional software development (DevOps) is different from how we do ML (MLOps).</li>
</ol>



<p>As <a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">MLOps</a> matures, many tools have been and are being created to address different parts of the workflow. Among them, these 3 tools play key roles in reducing complexity and solving problems, which we are going to talk about in later sections.&nbsp;</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-Kubeflow-Neptune-comparison.png?ssl=1" alt="MLflow Kubeflow Neptune comparison" class="wp-image-44103" style="width:512px;height:288px"/><figcaption class="wp-element-caption"><a href="https://static0.srcdn.com/wordpress/wp-content/uploads/2018/02/Iron-Man-and-Black-Panther.jpg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>Now, what exactly do they do and how do they compare against each other?</p>



<p>In this article, we are going to answer those questions and more. The following are the points we are addressing:</p>



<ul class="wp-block-list">
<li>Tools
<ul class="wp-block-list">
<li>MLflow&nbsp;</li>



<li>Kubeflow&nbsp;</li>



<li>neptune.ai</li>
</ul>
</li>



<li>Which one should you use and when?</li>



<li>High-level feature comparison table</li>
</ul>



<p>Let&#8217;s dive right in!</p>



<h2 class="wp-block-heading" id="h-mlflow">MLflow</h2>



<div class="wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>It is an <strong>open-source</strong> MLOps platform born from the practices of Big Tech, with a focus on transferable knowledge, ease of use, modularity, and compatibility with popular ML libraries and frameworks. It was designed to work for anything from a 1-person to a 1000+ person organisation.&nbsp;</p>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-logo.png?ssl=1" alt="" class="wp-image-29862" style="width:233px;height:132px"/></figure>
</div></div>
</div>



<p>MLflow allows you to develop, track (and compare) experiments, and package and deploy models locally or remotely. It handles everything from data versioning, model management, and <a href="/blog/ml-experiment-tracking" target="_blank" rel="noreferrer noopener">experiment tracking</a> through deployment, except data sourcing, labeling, and pipelining.</p>



<p>It is pretty much the jack of all trades, or Swiss Army knife, of the MLOps workflow.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-gif.gif?ssl=1" alt="MLflow gif" class="wp-image-44105"/><figcaption class="wp-element-caption"><a href="https://media3.giphy.com/media/J4rXAANmGN0cxvDV4I/giphy.gif" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>This platform is made up of 4 components:</p>



<ul class="wp-block-list">
<li>MLflow Tracking</li>



<li>MLflow Projects</li>



<li>MLflow Models</li>



<li>MLflow Model Registry</li>
</ul>



<p>Let&#8217;s go deeper and see the importance of every single one of these components and how they work.</p>



<h3 class="wp-block-heading" id="h-mlflow-tracking">MLflow Tracking</h3>



<p>The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing and comparing the results. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.</p>



<p>As mentioned before, MLflow allows for local or remote development; therefore, both the entity and artifact stores are customisable, meaning you can save locally or in the cloud (AWS S3, GCP, and so on).</p>



<p><strong>Key concepts in Tracking</strong></p>



<ul class="wp-block-list">
<li>Parameters: key-value inputs to your code</li>



<li>Metrics: numeric values (can be updated over time)&nbsp;</li>



<li>Tags &amp; Notes: information about the run</li>



<li>Artifacts: Files, Data &amp; Models&nbsp;</li>



<li>Source: what code ran?</li>



<li>Version: what version of the code ran?</li>



<li>Run: an instance of code run by MLflow, where metrics and parameters are logged</li>
</ul>



<p><strong>Tracking APIs</strong></p>



<ul class="wp-block-list">
<li>Fluent MLFlow APIs (High-level)</li>



<li>MLFlow client (Low-level)</li>
</ul>


    <a
        href="/blog/best-mlflow-alternatives"
        id="cta-box-related-link-block_696127d1f801ca7395119b00e94d77e9"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-the-best-mlflow-alternatives">                The Best MLflow Alternatives            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="h-mlflow-projects">MLflow Projects&nbsp;</h3>



<p>An MLflow Project is a self-contained unit of execution that bundles the following:</p>



<ul class="wp-block-list">
<li>Code</li>



<li>Config</li>



<li>Dependencies</li>



<li>Data</li>
</ul>



<p>These are bundled together so the project can be run either locally or on a remote server.</p>



<p>This format helps with reproducibility and allows for multi-step workflows, with separate projects (or entry points within the same project) as the individual steps.&nbsp;&nbsp;</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-projects.png?ssl=1" alt="MLflow projects" class="wp-image-44107" style="width:840px;height:320px"/><figcaption class="wp-element-caption"><a href="https://databricks.com/wp-content/uploads/2018/10/tutorial-multistep-workflow.png" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>In other words, MLflow Projects are just a convention for organizing and describing your code so that other data scientists (or automated tools) can run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding an MLproject file, which is basically a YAML-formatted text file.</p>
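<p>To make the multi-step idea concrete, here is a minimal, hypothetical sketch in plain Python (not the actual MLflow API; all names are made up) of a project whose entry points are chained into a workflow, with each step’s output feeding the next:</p>

```python
# A hypothetical project defined as named entry points, each a small
# function that takes parameters and returns outputs for the next step.
def download_data(url):
    # Pretend to fetch a dataset; return a path-like identifier.
    return f"data/raw-{url.split('/')[-1]}"

def preprocess(raw_path):
    return raw_path.replace("raw", "clean")

def train(clean_path, lr):
    return {"model": "sgd", "trained_on": clean_path, "lr": lr}

# The rough equivalent of an MLproject file: named entry points.
ENTRY_POINTS = {
    "download": download_data,
    "preprocess": preprocess,
    "train": train,
}

def run(entry_point, **params):
    """Run a single step, the way a project runner would by entry point."""
    return ENTRY_POINTS[entry_point](**params)

# A multi-step workflow: each step's output feeds the next step.
raw = run("download", url="https://example.com/iris.csv")
clean = run("preprocess", raw_path=raw)
model = run("train", clean_path=clean, lr=0.01)
```

<p>The point of the convention is exactly this wiring: once steps are declared as entry points with explicit parameters, they can be run individually or composed into a larger workflow.</p>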



<h3 class="wp-block-heading" id="h-mlflow-models">MLflow Models</h3>



<p>An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-models.png?ssl=1" alt="MLflow models" class="wp-image-44108"/><figcaption class="wp-element-caption"><a href="https://res.infoq.com/presentations/mlflow-databricks/en/slides/sl21-1566324281761.jpg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p><strong>Flavors </strong>are the key concept that makes MLflow Models powerful: they are a convention that deployment tools can use to understand the model. Basically, we abstract the model by creating an intermediate format that packages the model you want to deploy into a variety of environments &#8212; much like a Dockerfile for models, or a lambda function that you can deploy to a desired environment and just invoke through its scoring function, called predict.</p>
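<p>The flavor mechanism can be illustrated with a small, hypothetical sketch (plain Python, not MLflow’s real model format): one saved artifact, several flavors, and a deployment tool that only needs to know one flavor to score the model:</p>

```python
# A hypothetical "saved model": one artifact, several flavors, each flavor
# describing how a downstream tool should load and call the model.
saved_model = {
    "artifact": {"coef": 2.0, "bias": 1.0},  # the serialized weights
    "flavors": {
        # a generic flavor every tool understands (the python_function idea)
        "python_function": lambda art, x: art["coef"] * x + art["bias"],
        # a framework-specific flavor a specialized server might prefer
        "sklearn": lambda art, x: art["coef"] * x + art["bias"],
    },
}

def load_and_predict(model, x, flavor="python_function"):
    """A deployment tool only needs one flavor it understands to score."""
    predict = model["flavors"][flavor]
    return predict(model["artifact"], x)

print(load_and_predict(saved_model, 3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```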



<h3 class="wp-block-heading" id="h-model-registry">Model Registry</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLflow-model-registry.png?ssl=1" alt="MLflow model registry" class="wp-image-34579" style="width:903px;height:262px"/><figcaption class="wp-element-caption"><a href="https://mlflow.org/docs/latest/_images/oss_registry_3_overview.png" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>The MLflow Model Registry component is a centralized model store, a set of APIs, and a UI for collaboratively managing the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example, from staging to production), and annotations.</p>
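<p>The following toy sketch (plain Python, not the actual Model Registry API; all names are made up) shows the core ideas of versioning, lineage, and stage transitions:</p>

```python
class ModelRegistry:
    """A toy central registry: versioned models with stage transitions."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, run_id, annotations=""):
        versions = self._models.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "run_id": run_id,  # lineage: which run produced this model
            "stage": "None",
            "annotations": annotations,
        })
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        assert stage in self.STAGES, f"unknown stage: {stage}"
        self._models[name][version - 1]["stage"] = stage

    def get(self, name, stage="Production"):
        return [v for v in self._models[name] if v["stage"] == stage]

registry = ModelRegistry()
v1 = registry.register("churn-model", run_id="run-42")
registry.transition("churn-model", v1, "Staging")
registry.transition("churn-model", v1, "Production")
```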


    <a
        href="/vs/mlflow"
        id="cta-box-related-link-block_6c13aae54aad52f75da07ec0501aa319"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
        <div class="block-cta-box-related-link__image-wrapper">
            <figure class="c-image__wrapper">

                
                <img
                    src="https://i0.wp.com/neptune.ai/wp-content/uploads/2021/12/blog_feature_image_045427_7_5_4_4.jpg?fit=200%2C105&amp;ssl=1"
                    loading="lazy"
                    decoding="async"
                    width="200"
                    height="105"
                    class="c-image"
                    alt="">
            </figure>
        </div>

    
    <div class="block-cta-box-related-link__description-wrapper">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--resource.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Recommended                </div>
            </div>

        
                    <h3 class="c-header" id="h-feature-by-feature-comparison-between-mlflow-and-neptune-ai">                Feature-By-Feature Comparison Between MLflow and neptune.ai            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Learn more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-kubeflow">Kubeflow</h2>



<div class="wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Kubeflow is an <strong>open-source</strong> project that leverages Kubernetes to build scalable MLOps pipelines and orchestrate complicated workflows. You can view it as a machine learning (ML) toolkit for Kubernetes.</p>



<p><strong><em>Note</em></strong><em>: Kubernetes (or K8s for short) is a container orchestration tool.</em></p>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/kubeflow-2.png?ssl=1" alt="" class="wp-image-15355"/><figcaption class="wp-element-caption"><a href="https://venturebeat.com/wp-content/uploads/2020/03/8a06547d-965d-4981-806a-6c11d559b893-1-e1583173705409.png?w=1200&amp;strip=all" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div></div>
</div>



<p>Now, two questions arise:&nbsp;</p>



<ol class="wp-block-list">
<li>Why containerize your ML applications?</li>



<li>Why ML on K8s?</li>
</ol>



<h3 class="wp-block-heading" id="h-why-containerize-you-ml-applications">Why containerize your ML applications?</h3>



<p>In a team setting, environments usually differ from person to person, and these differences can include:</p>



<ul class="wp-block-list">
<li>Dependencies (libraries, frameworks, and versions)</li>

<li>Code (helper functions, training and evaluation scripts)</li>

<li>Configurations (data transformations, network architecture, batch size, and so on)</li>

<li>Software and hardware</li>
</ul>



<p>This can cause various problems when two or more members need to collaborate, or when someone takes over another person’s work and tries to improve on it.</p>



<p>With containers, one can simply share a Docker image, and as long as the other person has Docker installed locally or in their cloud environment, they can easily recreate the same environment, experiments, and results.</p>



<h4 class="wp-block-heading">Benefits of containers</h4>



<ul class="wp-block-list">
<li>Packages:
<ul class="wp-block-list">
<li>Code</li>



<li>Dependencies</li>



<li>Configurations</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li>Helps create ML envs that are:
<ul class="wp-block-list">
<li>Lightweight</li>



<li>Portable</li>



<li>Scalable</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="h-why-ml-on-k8s">Why ML on K8s?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Orchestra-gif.gif?ssl=1" alt="" class="wp-image-44112"/><figcaption class="wp-element-caption"><em><a href="http://33.media.tumblr.com/bb17f80fb2df629397fe26eedd673059/tumblr_nccngf43vV1tk04foo4_500.gif" target="_blank" rel="noreferrer noopener nofollow">Source</a></em></figcaption></figure>
</div>


<p>As I mentioned before, K8s is a container orchestration tool. It automates the deployment, scaling, and management of containerized applications. The trouble is in managing K8s itself, which can be hectic. Nowadays, however, there are several providers of managed K8s as a service, such as AWS EKS, Google GKE, and Azure AKS.</p>



<p>Using managed K8s as a service allows ML practitioners to take full advantage of the benefits that K8s brings, such as:</p>



<ul class="wp-block-list">
<li>Composability</li>



<li>Portability</li>



<li>Scalability</li>



<li>It may already be part of the company’s or team’s workflow</li>
</ul>



<p>Now that we got that out of the way, let’s take a more detailed look at Kubeflow.</p>



<h3 class="wp-block-heading" id="h-kubeflow-components">Kubeflow components</h3>



<p><strong>Kubeflow</strong> is composed of various projects/tools, but here we are going to focus on the four major ones:</p>



<ul class="wp-block-list">
<li>Notebooks</li>



<li>Pipelines&nbsp;</li>



<li>Training</li>



<li>Serving&nbsp;</li>
</ul>



<p><strong>Notebooks&nbsp;</strong></p>



<p>Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you&#8217;re ready.</p>



<p><strong>Pipelines</strong></p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Kubeflow-pipelines.png?ssl=1" alt="Kubeflow pipelines" class="wp-image-44113"/><figcaption class="wp-element-caption"><a href="https://miro.medium.com/max/1326/1*Swuq1YLAMdbSzYVmzWeBSA.jpeg" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>This is perhaps the most famous project and the reason a lot of teams opt for Kubeflow. In a nutshell, Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. It is available as a Kubeflow component or as a standalone installation.</p>



<p>At the heart of this project lie two components:</p>



<ul class="wp-block-list">
<li><strong>Pipeline</strong> &#8211; a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run it, plus the inputs and outputs of each pipeline component.</li>

<li><strong>Pipeline component</strong> &#8211; a self-contained set of user code, packaged as a Docker image, that performs one step in the pipeline. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.</li>
</ul>
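<p>The relationship between a pipeline and its components can be sketched as follows (a toy, in-process Python example; real Kubeflow components run as Docker containers, and all names here are illustrative):</p>

```python
# Each "component" is a self-contained step; the pipeline wires their
# inputs and outputs together as a graph and runs them in order.
def preprocess(raw):
    return [x / max(raw) for x in raw]  # scale to [0, 1]

def train(features, epochs):
    return {"weights": sum(features), "epochs": epochs}

def evaluate(model):
    return {"score": model["weights"] / model["epochs"]}

# Pipeline definition: step name -> (component, input names drawn from
# pipeline parameters or from previous steps' outputs).
PIPELINE = [
    ("preprocess", preprocess, ["raw"]),
    ("train", train, ["preprocess", "epochs"]),
    ("evaluate", evaluate, ["train"]),
]

def run_pipeline(params):
    outputs = dict(params)  # pipeline parameters are visible to all steps
    for name, component, inputs in PIPELINE:
        outputs[name] = component(*[outputs[i] for i in inputs])
    return outputs

result = run_pipeline({"raw": [1.0, 2.0, 4.0], "epochs": 2})
```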



<p><strong>Pipeline features</strong></p>



<ul class="wp-block-list">
<li>A user interface (UI) for managing and tracking experiments, jobs, and runs.</li>



<li>An engine for scheduling multi-step ML workflows.</li>



<li>An SDK for defining and manipulating pipelines and components.</li>



<li>Notebooks for interacting with the system using the SDK.</li>



<li>Reusability: enabling you to re-use components and pipelines without having to rebuild each time.</li>
</ul>



<p><strong>Training</strong></p>



<p>This project offers you different frameworks for training ML models such as:</p>



<ul class="wp-block-list">
<li>Chainer Training</li>



<li>MPI Training</li>



<li>MXNet Training</li>



<li>PyTorch Training</li>



<li>Job Scheduling</li>



<li>TensorFlow Training (TFJob)</li>
</ul>



<p>Here you can execute training jobs, monitor the training, and much more. One of the cool features is being able to easily define and take advantage of Kubernetes replicas, which let you spin up multiple identical versions of a container image. Therefore, if one or more replicas fail during a training job, your progress is not completely lost because you have another version running in parallel.</p>



<p><strong>Serving</strong></p>



<p>When it comes to serving models, Kubeflow offers great support.</p>



<p>Kubeflow has a component called KFServing that enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.</p>



<p>KFServing can be used to do the following:</p>



<ul class="wp-block-list">
<li>Provide a Kubernetes Custom Resource Definition for serving ML models on arbitrary frameworks.</li>



<li>Encapsulate the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your ML deployments.</li>



<li>Enable a simple, pluggable, and complete story for your production ML inference server by providing prediction, pre-processing, post-processing and explainability out of the box.</li>
</ul>
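<p>The prediction, pre-processing, and post-processing story can be sketched with a toy, framework-agnostic service (plain Python, not the actual KFServing API; the class and method names are illustrative):</p>

```python
# A toy inference service: a request flows through pre-processing,
# prediction, and post-processing, with an optional explanation hook,
# regardless of which framework trained the underlying model.
class InferenceService:
    def __init__(self, model):
        self.model = model  # any callable taking a single feature value

    def preprocess(self, request):
        # e.g. parse raw strings from the request payload into floats
        return [float(x) for x in request["instances"]]

    def predict(self, request):
        features = self.preprocess(request)
        raw = [self.model(x) for x in features]
        return self.postprocess(raw)

    def postprocess(self, raw):
        return {"predictions": raw}

    def explain(self, request):
        # placeholder: a real service would delegate to an explainer
        return {"explanations": ["not implemented in this sketch"]}

svc = InferenceService(model=lambda x: 1.0 if x > 0.5 else 0.0)
out = svc.predict({"instances": ["0.2", "0.9"]})
```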



<p>Furthermore, besides KFServing, Kubeflow supports TensorFlow Serving containers to export trained TensorFlow models to Kubernetes. It is also integrated with Seldon Core, an open-source platform for deploying machine learning models on Kubernetes, and with NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale. Finally, it also supports <a href="https://www.bentoml.com/" target="_blank" rel="noreferrer noopener nofollow">BentoML</a>, an open-source platform for high-performance ML model serving. It makes building production API endpoints for your ML models easy and supports all major machine learning training frameworks, including TensorFlow, Keras, PyTorch, XGBoost, and scikit-learn.</p>



<p>It doesn&#8217;t end there: on top of everything, you can run Kubeflow on Kubernetes Engine on AWS, GCP, or Azure. Take AWS, for example: Kubeflow has an integration with AWS SageMaker that allows you to take full advantage of the scale that comes with such a managed service.</p>



<p>In my opinion, end-to-end ML platforms are not the way to go. <em>For more details, you can later read this </em><a href="https://neptune.ai/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective"><em>article</em></a><em>, where I explain this in detail, once you finish this one.</em></p>



<p>I believe microservices give you more flexibility to plug any new service into your pipeline or to replace a broken service, component, or tool, but integrations such as those between Kubeflow and these different cloud providers can let you build more robust solutions.</p>


    <a
        href="/blog/the-best-kubeflow-alternatives"
        id="cta-box-related-link-block_5ad00f42127a0a63020688fc5b8928c7"
        class="block-cta-box-related-link  l-margin__top--standard l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-the-best-kubeflow-alternatives">                The Best Kubeflow Alternatives            </h3>        
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h2>



<figure class="wp-block-image size-full"><a href="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?ssl=1" target="_blank" rel="noopener"><img data-recalc-dims="1" loading="lazy" decoding="async" width="1200" height="628" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=1200%2C628&#038;ssl=1" alt="ML Metadata Store" class="wp-image-15676" srcset="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?w=1200&amp;ssl=1 1200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=768%2C402&amp;ssl=1 768w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=200%2C105&amp;ssl=1 200w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=220%2C115&amp;ssl=1 220w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=120%2C63&amp;ssl=1 120w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=160%2C84&amp;ssl=1 160w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=300%2C157&amp;ssl=1 300w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=480%2C251&amp;ssl=1 480w, https://i0.wp.com/neptune.ai/wp-content/uploads/2023/01/Metadata-store.png?resize=1020%2C534&amp;ssl=1 1020w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></a></figure>



<p>neptune.ai is a metadata store for MLOps, built for research and production teams that run a lot of experiments.&nbsp;</p>



<p>It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.&nbsp;</p>



<p>Thousands of ML engineers and researchers use Neptune for experiment tracking and model registry both as individuals and inside teams at large organizations.</p>



<p>Now, a question might arise: why a metadata store?</p>



<h3 class="wp-block-heading" id="h-why-a-metadata-store">Why a metadata store?</h3>



<p>Unlike notes, organization protocols, or open-source tools, a metadata store is, as I mentioned before, a centralized place. It is also lightweight, automatic, and maintained by an organization (in this case, Neptune) or a community, so that people can focus on actually doing ML rather than on metadata bookkeeping.</p>



<p>Furthermore, a metadata store is a tool that serves as a connector between different parts/phases/tools of the MLOps workflow.</p>



<h4 class="wp-block-heading">Benefits of a metadata store</h4>



<ul class="wp-block-list">
<li>Log and display all metadata types including Parameters, Images, HTML, Audio, Video</li>



<li>Organize and compare experiments in a dashboard</li>



<li>See model training live</li>



<li>Have it (metadata store) maintained and backed up by someone (not you)</li>



<li>Debug and compare experiments and models with no extra effort</li>



<li>Both database and dashboard scale with thousands of experiments&nbsp;&nbsp;</li>



<li>Help ease the transition from research to production</li>



<li>Easy to build custom libs/tools on top of it&nbsp;</li>
</ul>



<p>Now that we got that out of the way, let’s take a more detailed look at Neptune.</p>



<h3 class="wp-block-heading" id="h-neptune-components">Neptune components</h3>



<p><strong>Neptune</strong> is made of 3 major components:</p>



<ul class="wp-block-list">
<li>Data versioning</li>



<li>Experiment tracking</li>



<li>Model registry</li>
</ul>



<h4 class="wp-block-heading">Data versioning</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><a href="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/artifacts-compare-runs-on-dataset.png?ssl=1"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/artifacts-compare-runs-on-dataset.png?ssl=1" alt="artifacts-compare-runs-on-dataset" class="wp-image-51777"/></a><figcaption class="wp-element-caption"><em><a href="https://docs.neptune.ai/how-to-guides/data-versioning/compare-datasets" target="_blank" rel="noreferrer noopener">Comparing datasets in neptune</a></em>.ai</figcaption></figure>
</div>


<p>Version control systems help developers manage changes to source code. Data version control, in turn, is a set of tools and processes that adapts the version control process to the data world, to manage the changes of models in relation to datasets and vice versa. In other words, this feature helps track which dataset, or subset of a dataset, was used to train a particular version of the model, thus enabling and facilitating experiment reproducibility.</p>



<p>With the <a href="https://docs.neptune.ai/how-to-guides/data-versioning" target="_blank" rel="noreferrer noopener">data versioning functionality in Neptune</a>, you can: </p>



<ul class="wp-block-list">
<li>Keep track of a dataset version in your model training runs with artifacts</li>



<li>Query the dataset version from previous runs to make sure you are training on the same dataset version</li>



<li>Group your Neptune Runs by the dataset version they were trained on</li>
</ul>
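<p>One simple way to picture dataset version tracking is fingerprinting the data and recording the fingerprint with each run’s metadata (a stdlib sketch of the idea, not Neptune’s actual artifacts API):</p>

```python
import hashlib

def dataset_fingerprint(rows):
    """Hash a dataset's contents so a training run can record exactly
    which version of the data it saw."""
    digest = hashlib.md5()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

# Log the fingerprint alongside the run's other metadata...
run_metadata = {
    "run_id": "run-7",
    "dataset_version": dataset_fingerprint([(1, 2), (3, 4)]),
}

# ...and later check whether a new run trains on the same dataset version.
same = run_metadata["dataset_version"] == dataset_fingerprint([(1, 2), (3, 4)])
changed = run_metadata["dataset_version"] == dataset_fingerprint([(1, 2), (3, 5)])
```

<p>Grouping runs by such a version identifier is what makes it possible to compare models trained on the same, or different, data.</p>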



<h4 class="wp-block-heading">Experiment tracking</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Neptune-Experiment-tracking-1.png?ssl=1" alt="Neptune experiment tracking" class="wp-image-45580"/></figure>
</div>


<p>This feature of Neptune helps you to <a href="https://docs.neptune.ai/you-should-know/organizing-and-filtering-runs" target="_blank" rel="noreferrer noopener">organize your ML experimentation</a> in a single place by:&nbsp;</p>



<ul class="wp-block-list">
<li>Logging and displaying metrics, parameters, images, and other ML metadata</li>



<li>Searching, grouping, and comparing experiments with no extra effort</li>



<li>Visualizing and debugging experiments live as they are running</li>



<li>Sharing results by sending a persistent link</li>



<li>Querying experiment metadata programmatically&nbsp;</li>
</ul>
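<p>Conceptually, structured experiment tracking boils down to logging values under namespaced paths so they can be queried later, as in this toy sketch (plain Python, not the Neptune client API; the names are illustrative):</p>

```python
# A toy run logger: every value lives under a namespace path such as
# "train/loss", so runs can be searched, grouped, and compared later.
class Run:
    def __init__(self, run_id):
        self.run_id = run_id
        self.data = {}  # path -> list of logged values

    def log(self, path, value):
        self.data.setdefault(path, []).append(value)

    def last(self, path):
        return self.data[path][-1]

run = Run("exp-1")
run.log("parameters/lr", 0.001)
for loss in [0.9, 0.5, 0.3]:
    run.log("train/loss", loss)  # a metric logged once per epoch
```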



<h4 class="wp-block-heading">Model registry</h4>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Neptune-model-registry.png?ssl=1" alt="Neptune model registry" class="wp-image-45582"/></figure>
</div>


<p>This feature allows you to have your model development under control by organizing your models in a <a href="https://docs.neptune.ai/how-to-guides/model-registry" target="_blank" rel="noreferrer noopener">central model registry</a>, making them repeatable and traceable.</p>



<p>This means you can version, store, organize, and query models from model development through to deployment. The metadata saved includes:</p>



<ul class="wp-block-list">
<li>Dataset, code, env config versions</li>



<li>Parameters and evaluation metrics</li>



<li>Model binaries, descriptions, and other details</li>



<li>Testset prediction previews and model explanations</li>
</ul>



<p>Furthermore, it also enables teams, whether geographically close or distant, to collaborate on experiments, because everything your team logs to Neptune is automatically accessible to every team member. Reproducibility is no longer a problem.</p>



<p>You can access model training run information like the code, parameters, model binary, or other objects via an API.</p>



<p>With Neptune, you can replace folder structures, spreadsheets, and naming conventions with a single source of truth where all your model building metadata is organized, easy to find, share, and query.&nbsp;&nbsp;</p>



<p>This tool gives you control over models and experiments by keeping a record of everything that happens during model development. </p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Product_logging-metadata.gif?ssl=1" alt="Neptune Logging metadata" class="wp-image-38014"/></figure>
</div>


<p>This means less time spent looking for configs and files, less context switching, fewer unproductive meetings, and more time for quality ML work. With Neptune, you don’t have to implement loggers, maintain databases or dashboards, or teach people how to use them.</p>



<p>You can get the most out of your computational resources by keeping track of all ideas you have already tried and how much resources you used. Monitor your ML runs live and react quickly when runs fail, or models stop converging.&nbsp;&nbsp;</p>



<p>Finally, Neptune allows you to build reproducible, compliant, and traceable models by versioning all your model training runs. It also lets you see, at any time, who built the production model, which dataset and parameters were used, and how the model performed.</p>



<h2 class="wp-block-heading" id="h-now-just-tell-me-which-one-and-when-to-use-it">Now, just tell me which one and when to use it</h2>



<h3 class="wp-block-heading" id="h-mlflow">MLflow</h3>



<p>If you want an MLOps platform powered by the <strong>open-source</strong> community that allows you to:</p>



<ul class="wp-block-list">
<li>Track, visualize, and compare experiment metadata</li>

<li>Use a UI to visualize and compare experiment results</li>

<li>Develop (package and deploy) models</li>

<li>Create a multi-step workflow (much like Kubeflow Pipelines, but without using containers)</li>
</ul>



<p>And if you also need a way to abstract the model, allowing you to easily deploy it into a variety of environments, then MLflow is the way to go.</p>



<h3 class="wp-block-heading" id="h-kubeflow">Kubeflow</h3>



<p>If you want an end-to-end <strong>open-source</strong> platform that allows you to:</p>



<ul class="wp-block-list">
<li>Manage and set resource quotas across different teams, as well as code, run, and track experiment metadata either locally or in the cloud</li>

<li>Build reproducible pipelines with components that span the entire ML lifecycle (from data gathering all the way to model building and deployment)</li>

<li>Use a UI to visualize your pipeline and experiment metadata, as well as compare experiment results</li>

<li>Use a built-in notebook server service</li>
</ul>

<p>then Kubeflow is the way to go.</p>



<p>Finally, your K8s environment might have limited resources, but both K8s and Kubeflow have an integration with AWS SageMaker that enables the use of fully managed SageMaker ML tools across the ML workflow, natively from Kubernetes or Kubeflow. This means you can take advantage of its capability to scale resources (e.g., GPU instances) and of its services (e.g., SageMaker Ground Truth, Model Monitor, etc.).</p>



<p>This eliminates the need for you to manually manage and optimize your Kubernetes-based ML infrastructure while still preserving control over orchestration and flexibility.</p>



<h3 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h3>



<p>If you want a centralized place:</p>



<ul class="wp-block-list">
<li>To store all your metadata (data versioning, experiment tracking, and model registry)</li>

<li>With an intuitive and customizable UI that lets you visualize and compare experiment results, as well as arrange the displayed data as you wish</li>

<li>With a project wiki that facilitates sharing reports, insights, and remarks about the project’s progress, runs, and data exploration notebooks</li>

<li>With notebook checkpointing (for Jupyter)</li>

<li>With easy and seamless integrations with most of the best tools and MLOps platforms in the industry
<ul class="wp-block-list">
<li>For example, Neptune has an integration with MLflow and many other libraries, tools, and ML/DL frameworks.</li>

<li>If an integration is not available, you can add Neptune to your notebook, .py project, or containerized ML project (in case you are using Kubernetes or Kubeflow) powered by your favorite libraries, tools, and frameworks, such as PyTorch, using the Python client.</li>
</ul>
</li>
</ul>



<p>Finally, whether you want a fully managed service or more control (there is also a server version), Neptune is the way to go.</p>



<h2 class="wp-block-heading" id="h-high-level-feature-comparison-table">High-level feature comparison table</h2>



<div id="separator-block_82af4eaf2617b17afa4edfdb227d6da9"
         class="block-separator block-separator--10">
</div>



<div id="medium-table-block_b2a969fe6153922ed73dcbbbecfb28aa"
     class="block-medium-table c-table__outer-wrapper  l-padding__top--0 l-padding__bottom--0 l-margin__top--unset l-margin__bottom--unset">

    <table class="c-table">
                    <thead class="c-table__head">
            <tr>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            &nbsp;                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            MLflow                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            Kubeflow                        </div>
                    </td>
                                    <td class="c-item"
                        style="">
                        <div class="c-item__inner">
                            neptune.ai                        </div>
                    </td>
                            </tr>
            </thead>
        
        <tbody class="c-table__body">

                    
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Pricing</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Freemium</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Free Plan limitations</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No limits</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No limits</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Free for individuals, non-profit and educational research<br />
<a href="/pricing">Paid for teams</a></p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Open-source</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Easy to use</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Easy</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>There is a learning curve</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Easy</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Composability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Portability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Scalability</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Customizable</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Limited</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>On-prem version</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

            
                <tr class="c-row">

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p><strong>Managed service version</strong></p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>No</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                        <td class="c-ceil">
                            <div class="c-ceil__inner">
                                                                    <p>Yes</p>
                                                            </div>
                        </td>

                    
                </tr>

                    
        </tbody>
    </table>

</div>



<div id="separator-block_b8d6c42743140b9c156cea189f8b390e"
         class="block-separator block-separator--20">
</div>



<h2 class="wp-block-heading" id="h-conclusion"><strong>Conclusion</strong></h2>



<p>In the end, the choice is in your hands: it depends on your requirements and needs. But keep in mind that this is not an either-or situation. These tools are not mutually exclusive, and you can mix and match them to fit your workflow.</p>



<p>It could be Kubeflow with MLflow, Kubeflow with neptune.ai, or MLflow with neptune.ai.</p>



<p>For the Kubeflow combinations, Kubeflow might not have a direct integration, but you can add MLflow or Neptune to a pipeline component (i.e., the containerized app that runs a pipeline step).</p>



<p>Combining MLflow and Neptune is even easier, because Neptune has an integration with MLflow.</p>



<p>Thus, you are not stuck using only one tool.</p>



<p>With that, we have come full circle. Below is a ton of references for you to check out and devour. Have fun!</p>



<p>Thank you!</p>



<h2 class="wp-block-heading" id="h-references"><strong>References</strong></h2>



<ul class="wp-block-list">
<li><a href="https://aws.amazon.com/sagemaker/">https://aws.amazon.com/sagemaker/</a></li>



<li><a href="https://aws.amazon.com/sagemaker/kubernetes/">https://aws.amazon.com/sagemaker/kubernetes/</a></li>



<li><a href="https://medium.com/ai%C2%B3-theory-practice-business/how-do-data-science-workers-collaborate-c4158d8bd471" target="_blank" rel="noreferrer noopener nofollow">https://medium.com/ai%C2%B3-theory-practice-business/how-do-data-science-workers-collaborate-c4158d8bd471 </a></li>



<li><a href="https://stackoverflow.com/questions/59046257/what-are-the-differences-between-airflow-and-kubeflow-pipeline">https://stackoverflow.com/questions/59046257/what-are-the-differences-between-airflow-and-kubeflow-pipeline</a></li>



<li><a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective </a></li>



<li><a href="/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/data-science-project-management-in-2021-the-new-guide-for-ml-teams</a></li>
</ul>



<h3 class="wp-block-heading" id="h-mlflow">MLflow</h3>



<ul class="wp-block-list">
<li><a href="https://mlflow.org/">https://mlflow.org/</a></li>



<li><a href="https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html" target="_blank" rel="noreferrer noopener nofollow">https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html</a></li>



<li><a href="https://databricks.com/product/managed-mlflow" target="_blank" rel="noreferrer noopener nofollow">https://databricks.com/product/managed-mlflow</a></li>
</ul>



<h3 class="wp-block-heading" id="h-kubeflow">Kubeflow</h3>



<ul class="wp-block-list">
<li><a href="https://www.youtube.com/watch?v=sRQECN7LsbI" target="_blank" rel="noreferrer noopener nofollow">https://www.youtube.com/watch?v=sRQECN7LsbI </a></li>



<li><a href="https://www.kubeflow.org/">https://www.kubeflow.org/</a></li>



<li><a href="https://www.kubeflow.org/docs/other-guides/integrations/">https://www.kubeflow.org/docs/other-guides/integrations/</a></li>



<li><a href="https://www.datarevenue.com/en-blog/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow#:~:text=Airflow%20is%20a%20generic%20task,Kubeflow%20runs%20tasks%20on%20Kubernetes">https://www.datarevenue.com/en-blog/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow#:~:text=Airflow%20is%20a%20generic%20task,Kubeflow%20runs%20tasks%20on%20Kubernetes</a></li>



<li><a href="https://aws.amazon.com/sagemaker/kubernetes/">https://aws.amazon.com/sagemaker/kubernetes/</a></li>
</ul>



<h3 class="wp-block-heading" id="h-neptune-ai">neptune.ai</h3>



<ul class="wp-block-list">
<li><a href="/blog/mlops-what-it-is-why-it-matters-and-how-to-implement-it-from-a-data-scientist-perspective" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/mlops</a></li>



<li><a href="/pricing-2" target="_blank" rel="noreferrer noopener">https://neptune.ai/pricing</a></li>



<li><a href="https://docs.neptune.ai/integrations/index.html" target="_blank" rel="noreferrer noopener">https://docs.neptune.ai/integrations/index.html</a></li>
</ul>



<p></p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4603</post-id>	</item>
		<item>
		<title>MLOps: What It Is, Why It Matters, and How to Implement It</title>
		<link>https://neptune.ai/blog/mlops</link>
		
		<dc:creator><![CDATA[Prince Canuma]]></dc:creator>
		<pubDate>Thu, 21 Jul 2022 13:40:38 +0000</pubDate>
				<category><![CDATA[MLOps]]></category>
		<guid isPermaLink="false">https://neptune.test/mlops/</guid>

					<description><![CDATA[What is this MLOps thing?&#160; It was the question I had on my mind, but until recently (I&#8217;m writing it in the late 2020) , I had only heard about MLOps a few times at big AI conferences, I saw some mentions in papers I read over the years, but I didn&#8217;t know anything specific.&#160;&#8230;]]></description>
										<content:encoded><![CDATA[
<p>What is this <a href="https://ml-ops.org/">MLOps</a> thing?&nbsp;</p>



<p>It was the question I had on my mind, but until recently (I&#8217;m writing this in late 2020), I had only heard about MLOps a few times at big AI conferences and seen some mentions in papers I read over the years. I didn&#8217;t know anything specific.&nbsp;</p>



<p>Interestingly enough, around the same time, I had a conversation with a friend who works as a Data Mining Specialist in Mozambique, Africa. They had recently started building their in-house ML pipeline, and coincidentally I was starting to write this article while doing my own research into the mysterious area of MLOps, trying to put everything in one place.</p>



<p>In this conversation, I learned more about the many pain points that legacy companies (and many tech companies doing commercial ML) have regarding:</p>



<ul class="wp-block-list">
<li>Moving to the cloud;&nbsp;</li>



<li>Creating and managing ML pipelines;</li>



<li>Scaling;</li>



<li>Dealing with sensitive data at scale;</li>



<li>And about a million other problems.</li>
</ul>



<p>And so I made it my duty to dive in deep, conduct extensive research, and learn as much as I could, writing down my own notes and ideas along the way.</p>



<p>The result is this article.</p>



<p>But why research this topic now?</p>



<p>According to <a href="https://techjury.net/blog/how-much-data-is-created-every-day/#gref" target="_blank" rel="noreferrer noopener nofollow">techjury</a>, every person created at least 1.7 MB of data per second in 2020. For data scientists like you and me, that is like Christmas coming early: there are so many theories and ideas to explore and experiment with, so many discoveries to be made, and so many models to be developed.&nbsp;</p>



<p>But if we want to be serious and actually have those models touch real-life business problems and real people, we have to deal with the essentials like:</p>



<ul class="wp-block-list">
<li>acquiring &amp; cleaning large amounts of data;</li>



<li>setting up tracking and versioning for experiments and model training runs;</li>



<li>setting up the deployment and monitoring pipelines for the models that do get to production.&nbsp;</li>
</ul>



<p>And we need to find a way to scale our ML operations to the needs of the business and/or users of our ML models.</p>



<p>There were similar issues in the past when we needed to scale conventional software systems so that more people could use them. DevOps&#8217; solution was a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.</p>



<p>That brings us to <strong>MLOps</strong>. It was born at the intersection of <strong>DevOps</strong>, <strong>Data Engineering</strong>, and <strong>Machine Learning</strong>, and it&#8217;s a similar concept to DevOps, but the execution is different. ML systems are experimental in nature and have more components that are significantly more complex to build and operate.</p>



<p>Let&#8217;s dig in!</p>



<h2 class="wp-block-heading" id="h-what-is-mlops">What is MLOps?</h2>



<p><strong>MLOps</strong> (Machine Learning Operations) is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases the quality, simplifies the management process, and automates the deployment of Machine Learning and Deep Learning models in large-scale production environments. It’s easier to align models with business needs, as well as regulatory requirements.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps_cycle.jpg?ssl=1" alt="MLOps cycle" class="wp-image-40190"/></figure>
</div>


<p>MLOps is slowly evolving into an independent approach to ML lifecycle management. It applies to the entire lifecycle &#8211; data gathering, model creation (software development lifecycle, continuous integration/continuous delivery), orchestration, deployment, health, diagnostics, governance, and business metrics.</p>



<p>The key phases of MLOps are:</p>



<ul class="wp-block-list">
<li>Data gathering</li>



<li>Data analysis</li>



<li>Data transformation/preparation</li>



<li>Model training &amp; development&nbsp;</li>



<li>Model validation&nbsp;</li>



<li>Model serving&nbsp;</li>



<li>Model monitoring&nbsp;</li>



<li>Model re-training.</li>
</ul>
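<p>To make the flow concrete, here is a toy, framework-agnostic sketch of these phases as plain Python functions passing artifacts along. All names and the trivial &#8220;model&#8221; are invented for illustration; a real pipeline would delegate each phase to an orchestrator such as Kubeflow:</p>

```python
# Illustrative sketch of the MLOps phases as a chain of steps.
# Function names and the "model" are hypothetical, for illustration only.

def gather_data():
    # Data gathering: pull raw records from a source system.
    return [{"x": i, "y": 2 * i} for i in range(10)]

def analyze_and_prepare(raw):
    # Data analysis + transformation: drop bad rows, split features/targets.
    clean = [r for r in raw if r["x"] is not None]
    return [r["x"] for r in clean], [r["y"] for r in clean]

def train(X, y):
    # Model training: fit a trivial "model" (average slope of y over x).
    pairs = [(a, b) for a, b in zip(X, y) if a != 0]
    return {"slope": sum(b / a for a, b in pairs) / len(pairs)}

def validate(model, X, y):
    # Model validation: check predictions against known targets.
    return max(abs(model["slope"] * a - b) for a, b in zip(X, y)) < 1e-9

def serve(model, x):
    # Model serving: answer a single prediction request.
    return model["slope"] * x

raw = gather_data()
X, y = analyze_and_prepare(raw)
model = train(X, y)
assert validate(model, X, y)
print(serve(model, 21))  # -> 42.0
```

<p>Monitoring and re-training close the loop: the serving step would feed predictions and fresh data back into <code>gather_data</code> for the next cycle.</p>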


    <a
        href="/blog/how-to-learn-mlops"
        id="cta-box-related-link-block_baa2ad238e1b4fa620a2dcc2b24f6c17"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
<h3 class="c-header" id="h-how-to-learn-mlops-in-2024-courses-books-and-other-resources">How to Learn MLOps in 2024 [Courses, Books, and Other Resources]</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="devops-vs-mlops">DevOps vs MLOps</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps-DevOps.png?ssl=1" alt="MLOps DevOps" class="wp-image-34038"/><figcaption class="wp-element-caption"><em>Source: <a href="https://nealanalytics.com/expertise/mlops/" target="_blank" rel="noreferrer noopener nofollow">NealAnalytics</a></em></figcaption></figure>
</div>


<p>DevOps and MLOps have fundamental similarities because <a href="https://neptune.ai/blog/mlops-principles" target="_blank" rel="noreferrer noopener">MLOps principles</a> were derived from DevOps principles. But they’re quite different in execution:</p>



<ol class="wp-block-list">
<li>Unlike DevOps, <strong>MLOps is much more experimental in nature</strong>. Data scientists and ML/DL engineers have to tweak many knobs &#8211; hyperparameters, parameters, and models &#8211; while also keeping track of and managing the data and the code base to get reproducible results. <em>Despite all the efforts and tools, the ML/DL industry still struggles with the reproducibility of experiments. This topic is out of the scope of this article; for more information, check the reproducibility subsection in the references at the end.</em></li>
</ol>



<ol start="2" class="wp-block-list">
<li><strong>Hybrid team composition:</strong> the team needed to build and deploy models in production won’t be composed of software engineers only. In an ML project, the team usually includes data scientists or ML researchers, who focus on exploratory data analysis, model development, and experimentation. They might not be experienced software engineers who can build production-class services.</li>
</ol>



<ol start="3" class="wp-block-list">
<li><strong>Testing: </strong>testing an ML system involves <a href="https://link.medium.com/GxMQJqdQvbb" target="_blank" rel="noreferrer noopener nofollow">model validation</a>, model training, and so on &#8211; in addition to the conventional code tests, such as unit testing and integration testing.&nbsp;</li>
</ol>



<ol start="4" class="wp-block-list">
<li><strong>Automated Deployment</strong>: you can’t just deploy an offline-trained ML model as a prediction service. You’ll need a multi-step pipeline to automatically retrain and deploy a model. This pipeline adds complexity because you need to automate the steps that data scientists do manually before deployment to train and validate new models.</li>
</ol>



<ol start="5" class="wp-block-list">
<li><strong>Production performance degradation of the system due to evolving data profiles or simply Training-Serving Skew</strong>: ML models in production can have reduced performance not only due to suboptimal coding but also due to constantly <strong>evolving data profiles</strong>. Models can decay in more ways than conventional software systems, and you need to plan for it. This can be caused by:</li>
</ol>



<ul class="wp-block-list">
<li>A discrepancy between how you handle data in the training and serving pipelines.</li>



<li>A change in the data between when you train and when you serve.</li>



<li>Feedback loop &#8211; when you choose the wrong hypothesis (i.e. objective) to optimize, which makes you collect biased data for training your model. Then, without knowing, you collect newer data points using this flawed hypothesis, it’s fed back in to retrain/fine-tune future versions of the model, making the model even more biased, and the snowball keeps growing. For more information read Fastbook’s section on <a href="https://github.com/fastai/fastbook/blob/master/01_intro.ipynb" target="_blank" rel="noreferrer noopener nofollow">Limitations Inherent To Machine Learning</a>.&nbsp;</li>
</ul>
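<p>The first cause is easy to reproduce in miniature. In this hedged sketch (the function names and preprocessing rules are made up), the training and serving pipelines normalize the same input differently, so the model is served features it never saw during training:</p>

```python
# Training-serving skew in miniature: two pipelines handle the "same"
# input differently, producing different features at serve time.

def preprocess_train(text):
    # Training pipeline: lowercase AND strip punctuation.
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace())

def preprocess_serve(text):
    # Serving pipeline: only lowercases -- a subtle discrepancy.
    return text.lower()

sample = "Great product!!!"
train_features = preprocess_train(sample)  # "great product"
serve_features = preprocess_serve(sample)  # "great product!!!"

# The model was trained on punctuation-free inputs but is served
# inputs with punctuation -- predictions silently degrade.
assert train_features != serve_features
```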



<ol start="6" class="wp-block-list">
<li><strong>Monitoring</strong>: models in production need to be monitored. Similarly, the summary statistics of the data that built the model need to be monitored so that you can refresh the model when needed. These statistics can and will change over time, so you need notifications or a roll-back process when values deviate from your expectations.</li>
</ol>
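<p>As a minimal illustration of monitoring summary statistics (the feature, values, and threshold here are invented), you can record the training-time mean and standard deviation of a feature and raise an alert when a serving batch drifts away from them:</p>

```python
import statistics

def drift_alert(training_values, serving_values, max_shift=0.5):
    """Return True if the serving mean drifts more than max_shift
    training standard deviations away from the training mean."""
    mu = statistics.mean(training_values)
    sigma = statistics.stdev(training_values)
    shift = abs(statistics.mean(serving_values) - mu) / sigma
    return shift > max_shift

train_ages = [30, 35, 40, 45, 50]    # distribution at training time
fresh_ages = [31, 36, 41, 44, 49]    # similar population -> no alert
shifted_ages = [60, 65, 70, 75, 80]  # population changed -> alert

print(drift_alert(train_ages, fresh_ages))    # False
print(drift_alert(train_ages, shifted_ages))  # True
```

<p>A real monitoring setup would track many statistics per feature and wire the alert into a notification or roll-back process, but the core check is this simple.</p>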



<p>MLOps and DevOps are similar when it comes to continuous integration of source control, unit testing, integration testing, and continuous delivery of the software module or the package.&nbsp;</p>


    <a
        href="/blog/mlops-is-extension-of-devops"
        id="cta-box-related-link-block_78abd809c04470af90a6c431ce98655e"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
<h3 class="c-header" id="h-mlops-is-part-of-devops-not-a-fork-my-thoughts-on-the-mlops-paper-as-an-mlops-startup-ceo">MLOps is part of DevOps. Not a fork — my thoughts on THE MLOps paper as an MLOps startup CEO</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<p>However, in ML there are a few notable differences:</p>



<ul class="wp-block-list">
<li><strong>Continuous Integration </strong>(CI) is no longer only about testing and validating code and components, but also testing and validating data, data schemas, and models.</li>



<li><strong>Continuous Deployment</strong> (CD) is no longer about a single software package or service, but a system (an ML training pipeline) that should automatically deploy another service (model prediction service) or roll back changes from a model.</li>



<li><strong>Continuous Training</strong> (CT) is a new property, unique to ML systems, that&#8217;s concerned with automatically retraining and serving the models.</li>
</ul>
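<p>For the first point, a CI step that validates data can be as simple as asserting each incoming record against an expected schema. A stdlib-only sketch, with an invented schema and field names:</p>

```python
# Minimal data-schema check of the kind a CI step might run before
# training: every record must have the expected fields and types.

EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}

def validate_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of schema violations for one record (empty = valid)."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems

good = {"user_id": 1, "amount": 9.99, "country": "MZ"}
bad = {"user_id": "1", "amount": 9.99}

print(validate_record(good))  # []
print(validate_record(bad))   # two violations: wrong type, missing field
```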


<div class="wp-block-image">
<figure class="aligncenter size-large"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/ML-process.png?ssl=1" alt="end-to-end machine learning platform" class="wp-image-34040"/><figcaption class="wp-element-caption">End-to-end machine learning platform | <a href="https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<h3 class="wp-block-heading" id="mlops-vs-experiment-tracking-vs-ml-model-management">MLOps vs experiment tracking vs ML model management</h3>



<p>We’ve defined what MLOps is, but what about experiment tracking and ML model management?</p>



<h4 class="wp-block-heading" id="experiment-tracking">Experiment tracking</h4>



<p><a href="/experiment-tracking" target="_blank" rel="noreferrer noopener">Experiment tracking</a> is a part (or process) of MLOps focused on collecting, organizing, and tracking model training information across multiple runs with different configurations (hyperparameters, model size, data splits, parameters, and so on).&nbsp;</p>



<p>As mentioned earlier, ML/DL is highly experimental in nature, so we use experiment tracking tools to benchmark different models created by different companies, teams, or team members.</p>
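<p>As a toy illustration, here is the kind of information an experiment tracker records per run. The <code>Run</code> class and field names below are made up for illustration; real tools like neptune.ai persist this data to a server and give you a UI to compare runs:</p>

```python
import time
import uuid

class Run:
    """A stand-in for one tracked training run."""
    def __init__(self, config):
        self.id = uuid.uuid4().hex[:8]
        self.config = config              # hyperparameters, data split, etc.
        self.metrics = {}                 # metric name -> list of logged values
        self.created_at = time.time()

    def log(self, name, value):
        self.metrics.setdefault(name, []).append(value)

runs = []
for lr in (0.1, 0.01):
    run = Run({"learning_rate": lr, "batch_size": 64, "split": "v2"})
    for step in range(3):
        run.log("train/loss", 1.0 / (step + 1) * lr)   # stand-in for real training
    runs.append(run)

# Benchmarking across runs then becomes a query over the recorded metrics.
best = min(runs, key=lambda r: r.metrics["train/loss"][-1])
print(best.config["learning_rate"])
```

<p>The point is that configuration and metrics are captured per run, so comparing experiments is a lookup rather than an archaeology project.</p>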



<h4 class="wp-block-heading" id="model-management">Model management</h4>



<p>To ensure that ML models are consistent and all business requirements are met at scale, a logical, easy-to-follow policy for <a href="/blog/machine-learning-model-management" target="_blank" rel="noreferrer noopener">model management</a> is essential.&nbsp;</p>



<p>MLOps methodology includes a process for streamlining model training, packaging, validation, deployment, and monitoring. This way, you can run ML projects consistently from end to end.</p>



<p>By setting a clear, consistent methodology for model management, organizations can:</p>



<ul class="wp-block-list">
<li>Proactively address common business concerns (such as regulatory compliance);</li>



<li>Enable reproducible models by tracking data, models, code, and model versioning;</li>



<li>Package and deliver models in repeatable configurations to support reusability.</li>
</ul>
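<p>To make the reproducibility point concrete, here is a minimal sketch of what a model registry record might capture. The fields and functions are hypothetical, not any specific registry&#8217;s API, but they show the minimum you&#8217;d want tracked to reproduce and audit a model:</p>

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    name: str
    version: str
    code_commit: str        # git SHA of the training code that produced the model
    data_version: str       # snapshot or hash of the training data
    params: dict            # hyperparameters used for this version
    metrics: dict           # evaluation results that gated the release
    stage: str = "staging"  # staging -> production -> archived

registry = {}               # (name, version) -> ModelRecord

def register(record):
    registry[(record.name, record.version)] = record

def promote(name, version):
    registry[(name, version)].stage = "production"

register(ModelRecord("churn-clf", "1.2.0", "a1b2c3d", "data-2022-08",
                     {"max_depth": 6}, {"auc": 0.91}))
promote("churn-clf", "1.2.0")
```

<p>With data, code, and configuration versions pinned to each model version, any deployed model can be traced back to exactly what produced it.</p>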



<h2 class="wp-block-heading" id="h-why-does-mlops-matter">Why does MLOps matter?</h2>



<p>MLOps is fundamental. Machine learning helps individuals and businesses deploy solutions that unlock previously untapped sources of revenue, save time, and reduce cost by creating more efficient workflows, leveraging data analytics for decision-making, and improving customer experience.&nbsp;</p>



<p>These goals are hard to accomplish without a solid framework to follow. Automating model development and deployment with MLOps means faster go-to-market times and lower operational costs. It helps managers and developers be more agile and strategic in their decisions.</p>



<p>MLOps serves as the map that guides individuals, small teams, and even businesses to achieve their goals no matter their constraints, be it sensitive data, limited resources, a small budget, and so on.&nbsp;</p>



<p>You decide how big you want your map to be, because MLOps is a set of practices, not rules written in stone. You can experiment with different settings and only keep what works for you.</p>



<h2 class="wp-block-heading" id="h-mlops-best-practices">MLOps best practices</h2>



<p>At first, I wanted to just list 10 best practices, but after some research, I came to the conclusion that it would be best to cover the best practices for different components of an ML pipeline, namely: Team, Data, Objective, Model, Code, and Deployment.</p>



<p>The following list is distilled from various sources mentioned in the references:</p>



<h3 class="wp-block-heading" id="team">Team</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/05-collaborative_platform/" target="_blank" rel="noreferrer noopener nofollow">Use A Collaborative Development Platform</a></li>



<li><a href="https://se-ml.github.io/best_practices/05-use_backlog/" target="_blank" rel="noreferrer noopener nofollow">Work Against a Shared Backlog</a></li>



<li><a href="https://se-ml.github.io/best_practices/05-communication_collab/" target="_blank" rel="noreferrer noopener nofollow">Communicate, Align, and Collaborate With Others</a></li>
</ul>



<h3 class="wp-block-heading" id="data">Data</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/01-sanity_check/" target="_blank" rel="noreferrer noopener nofollow">Use Sanity Checks for All External Data Sources</a></li>



<li><a href="https://d1.awsstatic.com/whitepapers/mlops-continuous-delivery-machine-learning-on-aws.pdf">Track, identify, and account for changes in data sources.</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-reusable_data_clean/" target="_blank" rel="noreferrer noopener nofollow">Write Reusable Scripts for Data Cleaning and Merging</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_20_combine_and_modify_existing_features_to_create_new_features_in_human%C2%AD-understandable_ways" target="_blank" rel="noreferrer noopener nofollow">Combine and modify existing features to create new features in human­-understandable ways</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-data-label/" target="_blank" rel="noreferrer noopener nofollow">Ensure Data Labelling is Performed in a Strictly Controlled Process</a></li>



<li><a href="https://se-ml.github.io/best_practices/01-data-share/" target="_blank" rel="noreferrer noopener nofollow">Make Data Sets Available on Shared Infrastructure (private or public)</a></li>
</ul>
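<p>The &#8220;reusable scripts for data cleaning&#8221; practice can be shown in miniature: keep each cleaning step a small, composable function behind a single entry point, so the same logic runs identically in notebooks, CI, and production jobs. The column names and thresholds below are made up for illustration:</p>

```python
def drop_missing(rows, required=("age", "income")):
    """Remove rows with missing values in required columns."""
    return [r for r in rows if all(r.get(c) is not None for c in required)]

def clip_outliers(rows, column="income", lo=0.0, hi=1_000_000.0):
    """Clamp extreme values into a plausible range."""
    return [{**r, column: min(max(r[column], lo), hi)} for r in rows]

def clean(rows):
    """One entry point the whole team reuses instead of ad-hoc snippets."""
    return clip_outliers(drop_missing(rows))

raw = [{"age": 30, "income": 2_500_000.0},
       {"age": None, "income": 40_000.0},
       {"age": 45, "income": 55_000.0}]
cleaned = clean(raw)   # row with missing age dropped, outlier income clipped
```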



<h3 class="wp-block-heading" id="objective-metrics-kpis">Objective (Metrics &amp; KPIs)</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_12_don%E2%80%99t_overthink_which_objective_you_choose_to_directly_optimize" target="_blank" rel="noreferrer noopener nofollow">Don’t overthink which objective you choose to directly optimize; track multiple metrics at first.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_20_combine_and_modify_existing_features_to_create_new_features_in_human%C2%AD-understandable_ways" target="_blank" rel="noreferrer noopener nofollow">Choose a simple, observable and attributable metric for your first objective</a></li>



<li><a href="https://se-ml.github.io/best_practices/06-code_conduct/" target="_blank" rel="noreferrer noopener nofollow">Set Governance Objectives</a></li>



<li><a href="https://se-ml.github.io/best_practices/06-responsible_ml_ai/" target="_blank" rel="noreferrer noopener nofollow">Enforce Fairness and Privacy</a></li>
</ul>



<h3 class="wp-block-heading" id="model">Model&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_4_keep_the_first_model_simple_and_get_the_infrastructure_right" target="_blank" rel="noreferrer noopener nofollow">Keep the first model simple and get the infrastructure right</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_14_starting_with_an_interpretable_model_makes_debugging_easier" target="_blank" rel="noreferrer noopener nofollow">Starting with an interpretable model makes debugging easier.</a></li>



<li><strong>Training</strong>
<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/02-train_metric/" target="_blank" rel="noreferrer noopener nofollow">Capture the Training Objective in a Metric that is Easy to Measure and Understand</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-archive_old_feature/" target="_blank" rel="noreferrer noopener nofollow">Actively Remove or Archive Features That are Not Used</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-peer_review_mdl/" target="_blank" rel="noreferrer noopener nofollow">Peer Review Training Scripts</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-parallel_training/" target="_blank" rel="noreferrer noopener nofollow">Enable Parallel Training Experiments</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-auto_hyperparams/" target="_blank" rel="noreferrer noopener nofollow">Automate Hyper-Parameter Optimisation</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-measure_mdl_quality/" target="_blank" rel="noreferrer noopener nofollow">Continuously Measure Model Quality and Performance</a></li>



<li><a href="https://se-ml.github.io/best_practices/02-data_version/" target="_blank" rel="noreferrer noopener nofollow">Use Versioning for Data, Model, Configurations and Training Scripts</a></li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="code">Code</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/best_practices/03-regr_test/" target="_blank" rel="noreferrer noopener nofollow">Run Automated Regression Tests</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-use_static_analysis/" target="_blank" rel="noreferrer noopener nofollow">Use Static Analysis to Check Code Quality</a></li>



<li><a href="https://se-ml.github.io/best_practices/03-cont-int/" target="_blank" rel="noreferrer noopener nofollow">Use Continuous Integration</a></li>
</ul>



<h3 class="wp-block-heading" id="deployment">Deployment</h3>



<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_16_plan_to_launch_and_iterate" target="_blank" rel="noreferrer noopener nofollow">Plan to launch and iterate.</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-auto_model_packaging/" target="_blank" rel="noreferrer noopener nofollow">Automate Model Deployment</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-monitor_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Continuously Monitor the Behaviour of Deployed Models</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-rollback_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Automatic Rollbacks for Production Models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_41_when_performance_plateaus_look_for_qualitatively_new_sources_of_information_to_add_rather_than_refining_existing_signals" target="_blank" rel="noreferrer noopener nofollow">When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-shadow_models_prod/" target="_blank" rel="noreferrer noopener nofollow">Enable Shadow Deployment</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_40_keep_ensembles_simple" target="_blank" rel="noreferrer noopener nofollow">Keep ensembles simple</a></li>



<li><a href="https://se-ml.github.io/best_practices/04-log_production/" target="_blank" rel="noreferrer noopener nofollow">Log Production Predictions with the Model&#8217;s Version, Code Version and Input Data</a></li>



<li><strong>Human Analysis of the System &amp; Training-Serving Skew</strong>
<ul class="wp-block-list">
<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_23_you_are_not_a_typical_end_user" target="_blank" rel="noreferrer noopener nofollow">You are not a typical end user.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_24_measure_the_delta_between_models" target="_blank" rel="noreferrer noopener nofollow">Measure the delta between models</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_25_when_choosing_models_utilitarian_performance_trumps_predictive_power" target="_blank" rel="noreferrer noopener nofollow">When choosing models, utilitarian performance trumps predictive power.</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_37_measure_trainingserving_skew" target="_blank" rel="noreferrer noopener nofollow">Perform evolving data profiles checks</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_33_if_you_produce_a_model_based_on_the_data_until_january_5th_test_the_model_on_the_data_from_january_6th_and_after" target="_blank" rel="noreferrer noopener nofollow">If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.</a></li>
</ul>
</li>
</ul>



<p>These best practices will serve as the foundation on which you build your MLOps solutions. With that said, we can now dive into the implementation details.</p>



<h2 class="wp-block-heading" id="h-how-to-implement-mlops">How to implement MLOps</h2>



<p><a href="https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning" target="_blank" rel="noreferrer noopener nofollow">According to Google</a>, there are three ways you can go about implementing MLOps:</p>



<ul class="wp-block-list">
<li>MLOps level 0 (Manual process)</li>



<li>MLOps level 1 (ML pipeline automation)</li>



<li>MLOps level 2 (CI/CD pipeline automation)</li>
</ul>



<h3 class="wp-block-heading" id="mlops-level-0">MLOps level 0</h3>



<p>This is typical for companies that are just starting out with ML. An entirely manual ML workflow and a data-scientist-driven process might be enough if your models are rarely changed or retrained.</p>



<p><strong>Characteristics</strong></p>



<ul class="wp-block-list">
<li><strong>Manual, script-driven, and interactive process: </strong>every step is manual, including data analysis, data preparation, model training, and validation. It requires manual execution of each step and manual transition from one step to another<strong>.</strong></li>



<li><strong>Disconnect between ML and operations: </strong>the process separates data scientists who create the model, and engineers who serve the model as a prediction service. The data scientists hand over a trained model as an artifact for the engineering team to deploy on their API infrastructure.</li>



<li><strong>Infrequent release iterations:</strong> the assumption is that your data science team manages a few models that don&#8217;t change frequently—either changing model implementation or retraining the model with new data. A new model version is deployed only a couple of times per year.</li>



<li><strong>No Continuous Integration (CI):</strong> because few implementation changes are assumed, you ignore CI. Usually, testing the code is part of the notebooks or script execution.</li>



<li><strong>No Continuous Deployment (CD):</strong> because there aren&#8217;t frequent model version deployments, CD isn&#8217;t considered.</li>



<li><strong>Deployment refers to the prediction service</strong> (e.g., a microservice with a REST API).</li>



<li><strong>Lack of active performance monitoring:</strong> the process doesn&#8217;t track or log model predictions and actions.</li>
</ul>



<p>The engineering team might have their own complex setup for API configuration, testing, and deployment, including security, regression, and load + canary testing.</p>



<p><strong>Challenges</strong>&nbsp;</p>



<p>In practice, models often break when they’re deployed in the real world. Models fail to adapt to changes in the dynamics of the environment or changes in the data that describes the environment. Forbes has a great article on this: <a href="https://www.forbes.com/sites/forbestechcouncil/2019/04/03/why-machine-learning-models-crash-and-burn-in-production/" target="_blank" rel="noreferrer noopener nofollow">Why Machine Learning Models Crash and Burn in Production.</a></p>



<p>To address the challenges of this manual process, it’s good to use MLOps practices for CI/CD and CT. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.</p>



<h3 class="wp-block-heading" id="mlops-level-1">MLOps level 1</h3>



<p>The goal of MLOps level 1 is to perform continuous training (CT) of the model by automating the ML pipeline. This way, you achieve continuous delivery of the model prediction service.&nbsp;</p>



<p>This scenario may be helpful for solutions that operate in a constantly changing environment and need to proactively address shifts in customer behavior, price rates, and other indicators.</p>



<p><strong>Characteristics</strong></p>



<ul class="wp-block-list">
<li><strong>Rapid experiment</strong>: ML experiment steps are orchestrated and done automatically.&nbsp;</li>



<li><strong>CT of the model in production</strong>: the model is automatically trained in production, using fresh data based on live pipeline triggers.</li>



<li><strong>Experimental-operational symmetry</strong>: the pipeline implementation that’s used in the development or experiment environment is used in the preproduction and production environment, which is a key aspect of MLOps practice for unifying DevOps.</li>



<li><strong>Modularized code for components and pipelines:</strong> to construct ML pipelines, components need to be reusable, composable, and potentially shareable across ML pipelines (i.e. using containers).</li>



<li><strong>Continuous delivery of models</strong>: the model deployment step, which serves the trained and validated model as a prediction service for online predictions, is automated.</li>



<li><strong>Pipeline deployment:</strong> in level 0, you deploy a trained model as a prediction service to production. For level 1, you deploy a whole training pipeline, which automatically and recurrently runs to serve the trained model as the prediction service.</li>
</ul>



<p><strong>Additional components</strong></p>



<ul class="wp-block-list">
<li><strong>Data and model validation: </strong>the pipeline expects new, live data to produce a new model version that’s trained on the new data. Therefore, automated data validation and model validation steps are required in the production pipeline.</li>



<li><strong>Feature store: </strong>a feature store is a centralized repository where you standardize the definition, storage, and access of features for training and serving.</li>



<li><strong>Metadata management: </strong>information about each execution of the ML pipeline is recorded to help with data and artifact lineage, reproducibility, and comparisons. It also helps you debug errors and anomalies.</li>



<li><strong>ML pipeline triggers</strong>: you can automate ML production pipelines to retrain models with new data, depending on your use case:
<ul class="wp-block-list">
<li>On-demand</li>



<li>On a schedule</li>



<li>On availability of new training data</li>



<li>On model performance degradation</li>



<li>On significant changes in the data distribution (evolving data profiles).</li>
</ul>
</li>
</ul>
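<p>These triggers can be combined into one decision function that the pipeline orchestrator evaluates. The sketch below is only illustrative; the signal names and thresholds are assumptions, not recommendations:</p>

```python
import time

def should_retrain(now, last_run, *,
                   schedule_s=24 * 3600,              # on a schedule (daily here)
                   new_data=False,                    # new training data available
                   live_auc=None, min_auc=0.85,       # model performance degradation
                   drift_score=None, max_drift=0.3):  # shift in the data distribution
    """Return True when any retraining trigger fires."""
    if now - last_run >= schedule_s:
        return True
    if new_data:
        return True
    if live_auc is not None and live_auc < min_auc:
        return True
    if drift_score is not None and drift_score > max_drift:
        return True
    return False

now = time.time()
assert should_retrain(now, last_run=now - 2 * 24 * 3600)   # schedule elapsed
assert should_retrain(now, last_run=now, live_auc=0.79)    # performance degraded
assert not should_retrain(now, last_run=now)               # nothing to do
```

<p>On-demand triggering is simply calling the pipeline directly, bypassing this check.</p>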



<p><strong>Challenges</strong>&nbsp;</p>



<p>This setup is suitable when you deploy new models based on new data, rather than based on new ML ideas.</p>



<p>However, you need to try new ML ideas and rapidly deploy new implementations of the ML components. If you manage many ML pipelines in production, you need a CI/CD setup to automate the build, test, and deployment of ML pipelines.</p>



<h3 class="wp-block-heading" id="mlops-level-2">MLOps level 2</h3>



<p>For a rapid and reliable update of pipelines in production, you need a robust automated CI/CD system. With this automated CI/CD system, your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters.&nbsp;</p>



<p>This level fits tech-driven companies that have to retrain their models daily, if not hourly, update them in minutes, and redeploy on thousands of servers simultaneously. Without an end-to-end MLOps cycle, such organizations just won’t survive.</p>



<p>This MLOps setup includes the following components:</p>



<ul class="wp-block-list">
<li>Source control</li>



<li>Test and build services</li>



<li>Deployment services</li>



<li>Model registry</li>



<li>Feature store</li>



<li>ML metadata store</li>



<li>ML pipeline orchestrator.</li>
</ul>



<p><strong>Characteristics</strong>&nbsp;</p>



<ul class="wp-block-list">
<li><strong>Development and experimentation:</strong> you iteratively try out new ML algorithms and new modeling where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which are then pushed to a source repository.</li>



<li><strong>Pipeline continuous integration</strong>: you build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.</li>



<li><strong>Pipeline continuous delivery:</strong> you deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.</li>



<li><strong>Automated triggering:</strong> the pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a newly trained model that is pushed to the model registry.</li>



<li><strong>Model continuous delivery:</strong> you serve the trained model as a prediction service for online predictions. The output of this stage is a deployed model prediction service.</li>



<li><strong>Monitoring</strong>: you collect statistics on model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.</li>
</ul>



<p>The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.</p>
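<p>To see how the six stages hand artifacts to each other, here is a deliberately simplified sketch in which each function stands in for a real CI/CD or pipeline service; everything in it is illustrative:</p>

```python
def continuous_integration(source):   # build + test the pipeline source code
    return {"package": f"pipeline-{source['commit']}"}

def continuous_delivery(package):     # deploy the pipeline to the target environment
    return {"pipeline": package["package"], "env": "production"}

def automated_run(pipeline):          # a triggered training run pushes to the registry
    return {"model": f"model@{pipeline['pipeline']}", "registry": True}

def model_delivery(model):            # serve the registered model for predictions
    return {"endpoint": f"/predict/{model['model']}"}

def monitor(endpoint):                # live stats feed the next trigger or experiment
    return {"trigger_retrain": False, "endpoint": endpoint["endpoint"]}

artifact = {"commit": "a1b2c3d"}      # output of the experimentation stage
for stage in (continuous_integration, continuous_delivery,
              automated_run, model_delivery, monitor):
    artifact = stage(artifact)
print(artifact["endpoint"])
```

<p>Each stage consumes the previous stage&#8217;s output and produces a new artifact, which is exactly the contract an orchestrator enforces between real services.</p>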


    <a
        href="/blog/mlops-principles"
        id="cta-box-related-link-block_eebaeb68dc7044274fecacea264b9c0c"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
        <div class="block-cta-box-related-link__image-wrapper">
            <figure class="c-image__wrapper">

                
                <img
                    src="https://i0.wp.com/neptune.ai/wp-content/uploads/2023/05/blog_feature_image_051678_7_0_9_0.jpg?fit=200%2C105&amp;ssl=1"
                    loading="lazy"
                    decoding="async"
                    width="200"
                    height="105"
                    class="c-image"
                    alt="">
            </figure>
        </div>

    
    <div class="block-cta-box-related-link__description-wrapper">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-mlops-principles-and-how-to-implement-them">MLOps Principles and How to Implement Them</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h2 class="wp-block-heading" id="h-building-vs-buying-vs-hybrid-mlops-infrastructure">Building vs buying vs hybrid MLOps infrastructure</h2>



<p>Cloud computing companies have invested hundreds of billions of dollars in infrastructure and management.</p>



<p>To give you a bit of context, a <a href="https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019" target="_blank" rel="noreferrer noopener nofollow">Canalys</a> report states that public cloud infrastructure spending reached $77.8 billion in 2018, and it grew to $107 billion in 2019. According to another study by <a href="https://www.idc.com/getdoc.jsp?containerId=prUS45340719" target="_blank" rel="noreferrer noopener nofollow">IDC</a>, with a five-year compound annual growth rate (CAGR) of 22.3%, cloud infrastructure spending is estimated to grow to nearly $500 billion by 2023.</p>



<p>Spending on cloud infrastructure services reached a record $30 billion in the second quarter of 2020, with Amazon Web Services (AWS), Microsoft, and Google Cloud accounting for half of customer spend.&nbsp;</p>



<p>From a vendor perspective, AWS market share remained at a “long-standing mark” of around 33% during the second quarter of 2020, followed by Microsoft at 18%, and Google Cloud at 9%. Meanwhile, Chinese cloud providers now account for over 12% of the worldwide market, led by Alibaba, Tencent and Baidu.</p>



<p>These companies invest in research &amp; development of specialized hardware, software, and SaaS applications, but also MLOps software. Two great examples come to mind:&nbsp;</p>



<ul class="wp-block-list">
<li>AWS with its SageMaker, a fully managed end-to-end cloud ML platform that enables developers to create, train, and deploy machine learning models in the cloud, on embedded systems, and on edge devices.</li>



<li>Google with its recently announced AI Platform Pipelines for building and managing ML pipelines, leveraging TensorFlow Extended (TFX’s) pre-built components and templates that do a lot of model deployment work for you.</li>
</ul>



<p>Now, should you <strong>build or buy</strong> your infrastructure? Maybe you should go <strong>hybrid</strong>?</p>



<p>Tech companies that want to survive long-term usually have in-house teams and build custom solutions. If they have the skills, knowledge, and tools to tackle complex problems, there’s nothing wrong with that approach. But there are other factors that are worth taking into account, like:</p>



<ul class="wp-block-list">
<li>time and effort</li>



<li>human resources</li>



<li>time to profit</li>



<li>opportunity cost.</li>
</ul>


    <a
        href="/blog/first-mlops-system-with-andy-mcmahon"
        id="cta-box-related-link-block_9bdf850a97d2c128c5356063d8fdb82e"
        class="block-cta-box-related-link  l-margin__top--0 l-margin__bottom--standard"
        target="_blank" rel="nofollow noopener noreferrer"    >

    
    <div class="block-cta-box-related-link__description-wrapper block-cta-box-related-link__description-wrapper--full">

        
            <div class="c-eyebrow">

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-related--article.svg"
                    loading="lazy"
                    decoding="async"
                    width="16"
                    height="16"
                    alt=""
                    class="c-eyebrow__icon">

                <div class="c-eyebrow__text">
                    Related post                </div>
            </div>

        
                    <h3 class="c-header" id="h-your-first-mlops-system-what-does-good-look-like-with-andy-mcmahon">Your First MLOps System: What Does Good Look Like? With Andy McMahon</h3>
                    <div class="c-button c-button--tertiary c-button--small">

                <span class="c-button__text">
                    Read more                </span>

                <img
                    src="https://neptune.ai/wp-content/themes/neptune/img/icon-button-arrow-right.svg"
                    loading="lazy"
                    decoding="async"
                    width="12"
                    height="12"
                    alt=""
                    class="c-button__arrow">

            </div>
            </div>

    </a>



<h3 class="wp-block-heading" id="time-and-effort">Time and effort&nbsp;</h3>



<p>According to a survey by <a href="https://cnvrg.io/build-vs-buy-data-science-platform" target="_blank" rel="noreferrer noopener nofollow">cnvrg.io</a>, data scientists often spend their time building solutions to add to their existing infrastructure in order to complete projects. 65% of their time was spent on engineering heavy, <strong>non-data science</strong> tasks such as tracking, monitoring, configuration, compute resource management, serving infrastructure, feature extraction, and model deployment.&nbsp;</p>



<p>This wasted time is often referred to as ‘hidden technical debt’, and is a common bottleneck for machine learning teams. Building an in-house solution, or maintaining an underperforming solution can take from 6 months to 1 year. Even once you’ve built a functioning infrastructure, just to maintain the infrastructure and keep it up-to-date with the latest technology requires lifecycle management and a dedicated team.</p>



<h3 class="wp-block-heading" id="human-resources">Human resources</h3>



<p>Operationalizing machine learning requires a lot of engineering. For a smooth machine learning workflow, each data science team must have an operations team that understands the unique requirements of deploying machine learning models.</p>



<p>By investing in an end-to-end MLOps platform, you can automate these processes completely, making it easier for operations teams to focus on optimizing their infrastructure.</p>



<h3 class="wp-block-heading" id="cost">Cost</h3>



<p>Having a dedicated operations team to manage models can be expensive on its own. If you want to scale your experiments and deployments, you’d need to hire more engineers to manage this process. It’s a major investment, and a slow process to find the right team.&nbsp;</p>



<p>An out-of-the-box MLOps solution is built with scalability in mind, at a fraction of the cost. After calculating all the different costs associated with hiring and onboarding an entire team of engineers, your return on investment drops, which brings us to our next factor.</p>



<h3 class="wp-block-heading" id="time-to-profit">Time to profit</h3>



<p>It can take over a year to build a functioning machine learning infrastructure. It can take even longer to build a data pipeline that can produce value for your organization.&nbsp;</p>



<p>Companies like Uber, Netflix, and Facebook have dedicated years and massive engineering efforts to scale and maintain their machine learning platforms to stay competitive.&nbsp;</p>



<p>For most companies, an investment like this is not possible, and also not necessary. The machine learning landscape has matured since Uber, Netflix and Facebook originally built their in-house solutions.&nbsp;</p>



<p>There are more pre-built solutions that offer all you need out-of-the-box, at a fraction of the cost. For example, cnvrg.io customers can deliver profitable models in less than 1 month. Instead of building all the infrastructure necessary to make their models operational, data scientists can focus on research and experimentation to deliver the best model for their business problem.</p>



<h3 class="wp-block-heading" id="opportunity-cost">Opportunity cost</h3>



<p>As mentioned above, one survey shows that 65% of a data scientist’s time is spent on <strong>non-data science</strong> tasks. Using an MLOps platform automates technical tasks and reduces DevOps bottlenecks.&nbsp;</p>



<p>Data scientists can spend their time doing more of what they were hired to do &#8211; deliver high-impact models &#8211; while the cloud provider takes care of the rest.&nbsp;</p>



<p>Adopting an end-to-end MLOps platform has a considerable competitive advantage that allows your machine learning development to scale massively.</p>



<h3 class="wp-block-heading" id="what-about-hybrid-mlops-infrastructure">What about Hybrid MLOps infrastructure?</h3>



<p>Some companies have been entrusted with private &amp; sensitive data that can’t leave their servers: even a small vulnerability could have a catastrophic ripple effect. This is where <strong>Hybrid</strong> cloud infrastructure for MLOps comes in.</p>



<p>At the moment, cloud infrastructure exists side-by-side with on-premise systems in most cases.</p>



<p>Hybrid cloud management is complex, but often necessary. According to the 2020 Cloud infrastructure report by <a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow">Cloudcheckr</a>, today’s infrastructure is a mix of cloud and on-prem.&nbsp;</p>



<p>Cloud infrastructure is increasingly popular, but it’s still rare to find a large company that has completely abandoned on-premise infrastructure (most of them for obvious reasons, like sensitive data).&nbsp;</p>



<p>Another study by <a href="https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf" target="_blank" rel="noreferrer noopener nofollow">RightScale</a> shows that Hybrid cloud adoption grew to 58% in 2019 from 51% in 2018. It’s understandable because there’s a wide range of reasons for continuing to keep infrastructure on-prem.</p>



<h3 class="wp-block-heading" id="why-does-your-company-keep-maintaining-on-prem-infrastructure">Why does your company keep maintaining on-prem infrastructure?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Onprem-infrastructure.jpg?ssl=1" alt="" class="wp-image-34043" style="width:768px;height:385px"/><figcaption class="wp-element-caption"><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<h3 class="wp-block-heading" id="managing-hybrid-infrastructure-is-challenging">Managing hybrid infrastructure is challenging</h3>



<p>It’s not a walk in the park to manage any type of enterprise technology infrastructure. There are always issues related to security, performance, availability, cost, and much more.&nbsp;</p>



<p>Hybrid cloud environments add an additional layer of complexity that makes managing IT even more challenging.</p>



<p>The vast majority of cloud stakeholders (96%) face challenges managing both on-prem and cloud infrastructure.&nbsp;</p>



<h3 class="wp-block-heading" id="what-challenges-does-your-company-face-in-managing-both-on-prem-and-cloud-infrastructure">What challenges does your company face in managing both on-prem and cloud infrastructure?</h3>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Onprem-cloud-challenges.jpg?ssl=1" alt="" class="wp-image-34046" style="width:768px;height:416px"/><figcaption class="wp-element-caption"><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow"><em>Source</em></a></figcaption></figure>
</div>


<p>&#8220;Other&#8221; issues reported included the need for a completely different skill set, lack of access to specialized compute and storage, having to shift existing employees&#8217; roles to dedicate them to managing the on-prem systems, and dealing with the ongoing reliability issues of those systems (e.g., timeouts, missing data or compute resources, and software, database, hardware, and network failures).</p>



<p><strong>Building your own platform</strong> and infrastructure will take more and more of your focus and attention as demand increases. The time that could be spent on <strong>model R&amp;D</strong> and <strong>data collection</strong> will be taken by <strong>infrastructure management</strong>. This isn’t great unless it’s part of your core business (if you’re a cloud service provider, PaaS or IaaS).</p>



<p><strong>Buying a fully managed platform</strong> gives you great flexibility and scalability, but then you’re faced with compliance, regulations, and security issues.</p>



<p><strong>Hybrid cloud infrastructure for MLOps</strong> is the best of both worlds, but it poses unique challenges, so it’s up to you to decide if it fits your business model.</p>



<p><strong><em>Note</em></strong><em>: I have a few ideas on possible future directions for securing, streaming, and allowing statistical studies on sensitive data, but that&#8217;s perhaps a topic for a future article.&nbsp;</em></p>



<h2 class="wp-block-heading" id="h-conclusion">Conclusion</h2>



<p>Now that you have identified which level your company is at, you can go with one of two MLOps solutions:</p>



<ul class="wp-block-list">
<li>End-to-end</li>



<li>Custom-built MLOps solution (the ecosystem of tools)</li>
</ul>



<h3 class="wp-block-heading" id="end-to-end-mlops-solution">End-to-end MLOps solution&nbsp;</h3>



<p>These are fully managed services that provide developers and data scientists with the ability to build, train, and deploy ML models quickly. The top commercial solutions are:</p>



<ul class="wp-block-list">
<li><a href="https://aws.amazon.com/sagemaker/" target="_blank" rel="noreferrer noopener nofollow"><strong>Amazon Sagemaker</strong></a>, a suite of tools to build, train, deploy, and monitor machine learning models</li>



<li><strong>Microsoft Azure MLOps suite:</strong>
<ul class="wp-block-list">
<li><a href="https://azure.microsoft.com/en-us/services/machine-learning/" target="_blank" rel="noreferrer noopener nofollow">Azure Machine Learning</a> to build, train, and validate reproducible ML pipelines</li>



<li><a href="https://azure.microsoft.com/en-us/services/devops/pipelines/" target="_blank" rel="noreferrer noopener nofollow">Azure Pipelines</a> to automate ML deployments</li>



<li><a href="https://docs.microsoft.com/en-us/azure/azure-monitor/overview" target="_blank" rel="noreferrer noopener nofollow">Azure Monitor</a> to track and analyze metrics</li>



<li><a href="https://azure.microsoft.com/en-us/services/kubernetes-service/" target="_blank" rel="noreferrer noopener nofollow">Azure Kubernetes Services</a> and other additional tools.</li>
</ul>
</li>



<li><strong>Google Cloud MLOps suite:</strong>
<ul class="wp-block-list">
<li><a href="https://cloud.google.com/dataflow" target="_blank" rel="noreferrer noopener nofollow">Dataflow</a> to extract, validate, and transform data as well as to evaluate models</li>



<li><a href="https://cloud.google.com/ai-platform-notebooks" target="_blank" rel="noreferrer noopener nofollow">AI Platform Notebook</a> to develop and train models</li>



<li>Cloud Build to build and test machine learning pipelines</li>



<li><a href="https://www.tensorflow.org/tfx" target="_blank" rel="noreferrer noopener nofollow">TFX</a> to deploy ML pipelines</li>



<li><a href="https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/" target="_blank" rel="noreferrer noopener nofollow">Kubeflow Pipelines</a> to arrange ML deployments on top of <a href="https://cloud.google.com/kubernetes-engine" target="_blank" rel="noreferrer noopener nofollow">Google Kubernetes Engine</a> (GKE).</li>
</ul>
</li>
</ul>



<h3 class="wp-block-heading" id="custom-built-mlops-solution-the-ecosystem-of-tools">Custom-built MLOps solution (the ecosystem of tools)</h3>



<p>End-to-end solutions are great, but you can also build your own with your favorite tools, by dividing your MLOps pipeline into multiple microservices.</p>



<p>This approach can help you avoid a <a href="https://en.wikipedia.org/wiki/Single_point_of_failure" target="_blank" rel="noreferrer noopener nofollow">single point of failure</a> (SPOF) and makes your pipeline more robust: easier to audit, easier to debug, and more customizable. If a microservice provider is having problems, you can easily plug in a new one.&nbsp;</p>



<p>The most recent example of a SPOF was the <a href="https://www.theverge.com/2020/11/25/21719396/amazon-web-services-aws-outage-down-internet" target="_blank" rel="noreferrer noopener nofollow">AWS outage</a>. It&#8217;s very rare, but it can happen. Even Goliath can fall.</p>



<p>Microservices keep each service interconnected through well-defined interfaces instead of embedded together in one monolith. For example, you can use separate tools for model management and for experiment tracking.</p>
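<p>The pluggable-provider idea can be sketched in a few lines of Python. Note that this is an illustrative sketch only &#8211; the <code>Tracker</code> interface and the in-memory backend below are hypothetical, not any real library&#8217;s API. The point is that when training code depends only on a small interface, a failing or replaced provider (say, swapping one hosted tracker for another) doesn&#8217;t force changes to the pipeline itself.</p>

```python
# Illustrative sketch: "Tracker" and "InMemoryTracker" are made-up names,
# standing in for an adapter around any real experiment-tracking service.
from typing import Protocol


class Tracker(Protocol):
    """Minimal experiment-tracking interface shared by all backends."""

    def log_metric(self, name: str, value: float) -> None: ...


class InMemoryTracker:
    """Stand-in backend; a real adapter would call a hosted service instead."""

    def __init__(self) -> None:
        self.metrics: dict[str, float] = {}

    def log_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value


def train(tracker: Tracker) -> float:
    """Toy training step that depends only on the Tracker interface."""
    accuracy = 0.92  # pretend result of a training run
    tracker.log_metric("accuracy", accuracy)
    return accuracy


tracker = InMemoryTracker()
train(tracker)
print(tracker.metrics["accuracy"])  # which backend logged it is a detail
```

<p>Because <code>train</code> never imports a concrete backend, replacing the tracking provider means writing one new adapter class, not rewriting the pipeline &#8211; which is exactly the SPOF insurance discussed above.</p>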



<p>Finally, there are many MLOps tools available, I’m just going to mention my top 7 picks with one honorable mention:</p>



<ul class="wp-block-list">
<li><a href="https://jupyter.org/" target="_blank" rel="noreferrer noopener nofollow">Project Jupyter</a>&nbsp;</li>



<li><a href="https://github.com/fastai/nbdev" target="_blank" rel="noreferrer noopener nofollow">Nbdev</a></li>



<li><a href="https://airflow.apache.org/" target="_blank" rel="noreferrer noopener nofollow">Airflow</a></li>



<li><a href="https://www.kubeflow.org/" target="_blank" rel="noreferrer noopener nofollow">Kubeflow</a></li>



<li><a href="https://mlflow.org/" target="_blank" rel="noreferrer noopener nofollow">MLflow</a></li>



<li><a href="https://optuna.org/" target="_blank" rel="noreferrer noopener nofollow">Optuna</a></li>



<li><a href="https://www.cortex.dev/" target="_blank" rel="noreferrer noopener nofollow">Cortex</a>&nbsp;</li>



<li>Honorable mention: <a previewlistener="true" href="/" target="_blank" rel="noreferrer noopener">neptune.ai</a> (for its easy and scalable experiment tracking and compatibility with a lot of tools like Sagemaker and MLflow; if there isn’t an integration guide or pre-built solution, you can use their Python client API to build a custom integration)</li>
</ul>



<p>By leveraging these and many other tools, you can build an end-to-end solution by joining various micro-services together.&nbsp;</p>



<p>For more detailed information on the best MLOps tools available, see <a href="/blog/best-mlops-tools" target="_blank" rel="noreferrer noopener">Best MLOps Tools</a> by Jakub Czakon.</p>



<p>MLOps is a young, rapidly developing field, with new tools and processes coming out all the time. If you get on the MLOps train now, you gain a huge competitive advantage.</p>



<p>To help you do so, below is a ton of references for you to check out and devour. Have fun!</p>



<h3 class="wp-block-heading" id="acknowledgments">Acknowledgments</h3>



<p>Special thanks to my dear friend Richaldo Elias, whom I mentioned in the introduction. He always brings up topics or problems that inspire my creativity, and this article wouldn&#8217;t have been the same without him sharing some of the issues he has had while building ML projects at scale.&nbsp;</p>



<h2 class="wp-block-heading" id="h-references">References&nbsp;</h2>



<ul class="wp-block-list">
<li><a href="https://techjury.net/blog/how-much-data-is-created-every-day/#gref" target="_blank" rel="noreferrer noopener nofollow">https://techjury.net/blog/how-much-data-is-created-every-day/#gref</a></li>



<li><a href="https://www.mckinsey.com/~/media/McKinsey/Featured%20Insights/Artificial%20Intelligence/Notes%20from%20the%20frontier%20Modeling%20the%20impact%20of%20AI%20on%20the%20world%20economy/MGI-Notes-from-the-AI-frontier-Modeling-the-impact-of-AI-on-the-world-economy-September-2018.ashx" target="_blank" rel="noreferrer noopener nofollow">NOTES FROM THE AI FRONTIER MODELING THE IMPACT OF AI ON THE WORLD ECONOMY</a></li>



<li><a href="https://link.medium.com/GxMQJqdQvbb" target="_blank" rel="noreferrer noopener nofollow">https://link.medium.com/GxMQJqdQvbb</a></li>



<li><a href="https://github.com/fastai/fastbook/blob/master/01_intro.ipynb" target="_blank" rel="noreferrer noopener nofollow">https://github.com/fastai/fastbook/blob/master/01_intro.ipynb</a></li>
</ul>



<h3 class="wp-block-heading" id="reproducibility">Reproducibility&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/abs/2006.14244" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2006.14244</a></li>



<li><a href="https://arxiv.org/abs/1408.2123" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/1408.2123</a></li>



<li><a href="https://arxiv.org/abs/2001.10820" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2001.10820</a></li>



<li><a href="https://arxiv.org/abs/2003.12206" target="_blank" rel="noreferrer noopener nofollow">https://arxiv.org/abs/2003.12206</a></li>
</ul>



<h3 class="wp-block-heading" id="mlops-methods-and-tools">MLOps &#8211; methods and tools&nbsp;</h3>



<ul class="wp-block-list">
<li><a href="/blog/best-open-source-mlops-tools" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/best-open-source-mlops-tools</a></li>



<li><a href="https://www.datasciencecentral.com/profiles/blogs/mlops-vs-devops-the-similarities-and-differences" target="_blank" rel="noreferrer noopener nofollow">https://www.datasciencecentral.com/profiles/blogs/mlops-vs-devops-the-similarities-and-differences</a></li>



<li><a href="https://www.contino.io/insights/mlops-and-the-machine-learning-lifecycle" target="_blank" rel="noreferrer noopener nofollow">https://www.contino.io/insights/mlops-and-the-machine-learning-lifecycle</a></li>



<li><a href="https://nealanalytics.com/expertise/mlops/" target="_blank" rel="noreferrer noopener nofollow">https://nealanalytics.com/expertise/mlops/</a></li>



<li><a href="/blog/best-mlops-tools" target="_blank" rel="noreferrer noopener">https://neptune.ai/blog/best-mlops-tools</a></li>



<li><a href="https://www.altexsoft.com/blog/mlops-methods-tools/" target="_blank" rel="noreferrer noopener nofollow">https://www.altexsoft.com/blog/mlops-methods-tools/</a></li>



<li><a href="https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31</a> (Recommended for DIY die-hards)</li>



<li><a href="https://towardsdatascience.com/building-a-devops-pipeline-for-machine-learning-and-ai-evaluating-sagemaker-cf7fdd3632e7" target="_blank" rel="noreferrer noopener nofollow">https://towardsdatascience.com/building-a-devops-pipeline-for-machine-learning-and-ai-evaluating-sagemaker-cf7fdd3632e7</a></li>



<li><a href="https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#devops_versus_mlops" target="_blank" rel="noreferrer noopener nofollow">https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#devops_versus_mlops</a></li>
</ul>



<h3 class="wp-block-heading" id="mlops-best-practices">MLOps best practices</h3>



<ul class="wp-block-list">
<li><a href="https://se-ml.github.io/practices/" target="_blank" rel="noreferrer noopener nofollow">Software for ML</a></li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml" target="_blank" rel="noreferrer noopener nofollow">Google’s Rules of ML</a>&nbsp;</li>



<li><strong>Governance Objectives:</strong>
<ul class="wp-block-list">
<li><a href="https://ai.google/responsibilities/responsible-ai-practices" target="_blank" rel="noreferrer noopener nofollow">Google Responsible AI</a></li>



<li><a href="https://www.microsoft.com/en-us/ai/responsible-ai" target="_blank" rel="noreferrer noopener nofollow">Microsoft AI principles</a></li>



<li><a href="https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai" target="_blank" rel="noreferrer noopener nofollow">European Commission High-Level Expert Group &#8211; Ethical guidelines for trustworthy AI</a></li>
</ul>
</li>



<li><a href="https://developers.google.com/machine-learning/guides/rules-of-ml" target="_blank" rel="noreferrer noopener nofollow">https://developers.google.com/machine-learning/guides/rules-of-ml</a></li>
</ul>



<h3 class="wp-block-heading" id="build-vs-buy-vs-hybrid">Build vs Buy vs Hybrid</h3>



<ul class="wp-block-list">
<li><a href="https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf" target="_blank" rel="noreferrer noopener nofollow">https://click.cloudcheckr.com/rs/222-ENM-584/images/CloudCheckr-White-Paper-The-Cloud-Infrastructure-Report-2020.pdf</a></li>



<li><a href="https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf" target="_blank" rel="noreferrer noopener nofollow">https://resources.flexera.com/web/media/documents/rightscale-2019-state-of-the-cloud-report-from-flexera.pdf</a></li>



<li><a href="https://www.canalys.com/newsroom/canalys-battle-for-enterprise-cloud-customers-intensifies-as-spending-grows-42-in-q1-2019" target="_blank" rel="noreferrer noopener nofollow">https://www.canalys.com/newsroom/canalys-battle-for-enterprise-cloud-customers-intensifies-as-spending-grows-42-in-q1-2019</a></li>



<li><a href="https://www.idc.com/getdoc.jsp?containerId=prUS45340719" target="_blank" rel="noreferrer noopener nofollow">https://www.idc.com/getdoc.jsp?containerId=prUS45340719</a></li>



<li><a href="https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019" target="_blank" rel="noreferrer noopener nofollow">https://www.canalys.com/newsroom/canalys-worldwide-cloud-infrastructure-Q4-2019-and-full-year-2019</a></li>



<li><a href="https://hostingtribunal.com/blog/cloud-computing-statistics/#gref" target="_blank" rel="noreferrer noopener nofollow">https://hostingtribunal.com/blog/cloud-computing-statistics/#gref</a></li>



<li><a href="https://www.zdnet.com/article/record-sums-were-spent-on-cloud-infrastructure-this-year-and-the-bills-will-only-get-bigger/" target="_blank" rel="noreferrer noopener nofollow">https://www.zdnet.com/article/record-sums-were-spent-on-cloud-infrastructure-this-year-and-the-bills-will-only-get-bigger/</a></li>



<li><a href="https://www.cnbc.com/2020/04/20/alibaba-to-invest-28-billion-in-cloud-as-it-battles-amazon-microsoft.html" target="_blank" rel="noreferrer noopener nofollow">https://www.cnbc.com/2020/04/20/alibaba-to-invest-28-billion-in-cloud-as-it-battles-amazon-microsoft.html</a></li>



<li><a href="https://www.idc.com/getdoc.jsp?containerId=prUS46188120" target="_blank" rel="noreferrer noopener nofollow">https://www.idc.com/getdoc.jsp?containerId=prUS46188120</a></li>



<li><a href="https://venturebeat.com/2020/05/01/canalys-cloud-spending-hit-record-31-billion-in-q1-2020-but-growth-continues-to-slow/" target="_blank" rel="noreferrer noopener nofollow">https://venturebeat.com/2020/05/01/canalys-cloud-spending-hit-record-31-billion-in-q1-2020-but-growth-continues-to-slow/</a></li>



<li><a href="https://www.crn.com/news/data-center/why-data-center-spending-will-hit-200b-in-2021-gartner" target="_blank" rel="noreferrer noopener nofollow">https://www.crn.com/news/data-center/why-data-center-spending-will-hit-200b-in-2021-gartner</a></li>



<li><a href="https://cnvrg.io/build-vs-buy-data-science-platform/" target="_blank" rel="noreferrer noopener nofollow">https://cnvrg.io/build-vs-buy-data-science-platform</a></li>
</ul>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">3441</post-id>	</item>
	</channel>
</rss>
