Image analysis in ML pipelines for biology: a maths-and-plots guide

Biological image analysis is where measurement meets models. A microscope image is not “just pixels”: it’s a noisy sampling of an underlying biological scene (cells, nuclei, organelles), transformed by optics, staining, sensor noise, and experimental variability.

This post is a practical, end-to-end guide to building machine learning (ML) image analysis pipelines for biological data. The rule throughout is:

  • If we make a conceptual claim, we’ll back it up with an equation or a plot.

All figures in this post are generated from synthetic “microscopy-like” images so you can follow the ideas without needing a private dataset.


1. What is an image (mathematically)?

For most ML pipelines, an image is a function on a discrete grid:

\[I:\{1,\dots,H\}\times \{1,\dots,W\}\to \mathbb{R}^C,\]

where $H$ and $W$ are height/width and $C$ is the number of channels (e.g. $C=1$ for grayscale, $C=3$ for RGB, or $C>3$ for multiplexed fluorescence).

At the physics level (useful intuition for microscopy), you can think of an observed image as:

\[I = (S * h) + \epsilon,\]

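where $S$ is the true scene, $h$ is the point-spread function (PSF), $*$ is convolution, and $\epsilon$ lumps together everything the blur does not explain: shot noise, read noise, and background. The additive form is a simplification, because shot noise is Poisson-distributed and grows with the signal, so bright regions are noisier than dark ones.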
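As a minimal sketch of this forward model, the snippet below blurs a toy scene with a Gaussian stand-in for the PSF and then adds Poisson shot noise and Gaussian read noise. The function names, the Gaussian PSF, and all parameter values (`photons`, `read_sigma`, `background`) are illustrative assumptions, not calibrated microscope settings.

```python
import numpy as np

def gaussian_psf(size=9, sigma=1.5):
    """Normalized 2D Gaussian kernel, used here as a stand-in for the PSF h."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    h = np.outer(g, g)
    return h / h.sum()

def simulate_image(scene, psf, photons=200.0, read_sigma=2.0, background=5.0, seed=0):
    """Toy forward model: blur the scene, then add shot noise and read noise."""
    rng = np.random.default_rng(seed)
    # Circular convolution via FFT (fine for a toy example; edges wrap around).
    k = psf.shape[0]
    padded = np.zeros_like(scene, dtype=float)
    padded[:k, :k] = psf
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))  # centre PSF at (0, 0)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(padded)))
    blurred = np.clip(blurred, 0.0, None)  # remove tiny negative round-off
    expected = photons * blurred + background
    shot = rng.poisson(expected).astype(float)            # signal-dependent noise
    return shot + rng.normal(0.0, read_sigma, size=scene.shape)  # additive read noise

# Ground-truth scene S: two bright disks ("cells") on a dark field.
yy, xx = np.mgrid[:64, :64]
scene = ((yy - 20) ** 2 + (xx - 20) ** 2 < 36).astype(float)
scene += ((yy - 44) ** 2 + (xx - 40) ** 2 < 64).astype(float)
image = simulate_image(scene, gaussian_psf())
```

Because the ground-truth `scene` is kept alongside `image`, the same pair can serve as input and mask for the later segmentation examples.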

Figure: synthetic “cells” with blur + noise (a toy microscopy model), alongside the ground-truth object masks.

Synthetic microscopy-like image and ground-truth masks


2. The pipeline: from pixels to biology

Most biological image ML workflows are variations on:

  • Acquire: imaging settings, channels, controls, metadata
  • Preprocess: denoise, normalize, correct illumination, register channels
  • Label (if supervised): segmentation masks, classes, bounding boxes, weak labels
  • Model: classical features or deep nets
  • Evaluate: metrics that match the biology and failure modes
  • Deploy: QC, drift monitoring, uncertainty, batch effects, reproducibility

We’ll go stage-by-stage with maths and visuals.


3. Preprocessing as “making a measurement”

Preprocessing isn’t cosmetic. It changes what your model can learn.

3.1 Illumination correction (flat-field)

A common microscopy artifact is spatially varying illumination. A simple multiplicative model is:

\[I(x,y) = L(x,y)\,S(x,y) + B(x,y) + \epsilon(x,y),\]

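where $L$ is illumination, $B$ is background, and $S$ is signal. Given estimates $\hat{B}$ and $\hat{L}$ (for example from a dark frame and a flat-field reference image), a basic correction is: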

\[\tilde{I}(x,y) = \frac{I(x,y) - \hat{B}(x,y)}{\hat{L}(x,y)}.\]
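The correction is a one-liner once both fields are in hand. The sketch below assumes the estimates are already available as arrays; `flat_field_correct` and the `eps` guard are hypothetical names, and rescaling the illumination to mean 1 is a design choice that keeps corrected intensities on the original scale.

```python
import numpy as np

def flat_field_correct(image, background, illumination, eps=1e-6):
    """Apply (I - B_hat) / L_hat, with L_hat rescaled to have mean 1."""
    illum = illumination / illumination.mean()
    return (image - background) / np.maximum(illum, eps)  # eps avoids division by ~0

# Toy example: a constant signal seen through a left-to-right gradient.
h, w = 32, 48
signal = np.full((h, w), 100.0)
L = np.tile(np.linspace(0.5, 1.5, w), (h, 1))  # illumination field
B = np.full((h, w), 10.0)                       # additive background
observed = L * signal + B
corrected = flat_field_correct(observed, B, L)  # recovers the flat signal
```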

Plot: a synthetic image with a left-to-right illumination gradient, and the same image after correction.

Flat-field illumination correction on a synthetic microscopy image

3.2 Normalization (why “scale” matters)

Many models are sensitive to intensity scale. A standard choice is per-image z-scoring:

\[I' = \frac{I - \mu}{\sigma},\]

where $\mu$ and $\sigma$ are the image mean and standard deviation (sometimes per-channel).
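A minimal sketch of per-image z-scoring, assuming images are stored as `(H, W)` or `(H, W, C)` arrays. The `eps` term is an added safeguard for blank fields where $\sigma$ is close to zero; it is not part of the formula above.

```python
import numpy as np

def zscore(image, per_channel=True, eps=1e-8):
    """Z-score an (H, W) or (H, W, C) image; eps guards against sigma = 0."""
    image = image.astype(float)
    # Per-channel statistics reduce over the spatial axes only.
    axes = (0, 1) if (per_channel and image.ndim == 3) else None
    mu = image.mean(axis=axes, keepdims=True)
    sigma = image.std(axis=axes, keepdims=True)
    return (image - mu) / (sigma + eps)

# Three channels with very different intensity scales.
rng = np.random.default_rng(0)
raw = rng.normal([10.0, 100.0, 1000.0], [1.0, 5.0, 50.0], size=(16, 16, 3))
normalized = zscore(raw)
```

Per-channel statistics are usually the safer default for fluorescence data, where channels can differ in brightness by orders of magnitude.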

Plot: intensity histograms before/after normalization.

Intensity distributions before and after normalization


4. Features: classical image analysis (still useful)

Before deep learning, bioimage analysis often meant:

  • segment objects,
  • compute morphology / texture features,
  • run a classifier or clustering step.

This is still valuable when datasets are small, interpretability is crucial, or you want strong baselines.

4.1 Convolution and filtering

A 2D convolution (single-channel) is:

\[(I * K)(x,y) = \sum_{i=-a}^{a}\sum_{j=-b}^{b} I(x-i,y-j)\,K(i,j),\]

where $K$ is a kernel (e.g. blur, edge detector).
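The sum above can be written directly in NumPy. This is a sketch of a "valid" convolution (no padding, so the output shrinks by the kernel size minus one); the function name and the two example kernels are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel):
    """'Valid' 2D convolution of a single-channel image with an odd-sized kernel."""
    flipped = kernel[::-1, ::-1]  # the flip is what separates convolution from correlation
    windows = sliding_window_view(image, kernel.shape)  # (H', W', kh, kw)
    return np.einsum("xyij,ij->xy", windows, flipped)

box_blur = np.full((3, 3), 1.0 / 9.0)                        # local average
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])                       # horizontal-gradient (edge-like) kernel

image = np.zeros((8, 8))
image[:, 4:] = 1.0                     # a vertical step edge
blurred = conv2d(image, box_blur)
edges = conv2d(image, sobel_x)         # nonzero only near the step
```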

Plot: the same synthetic image after blur and after an edge-like filter, showing how kernels turn “biology” into measurable patterns.

Convolution filters: blur vs edge-like response

4.2 Thresholding (a segmentation baseline)

The simplest segmentation is a threshold:

\[\hat{M}(x,y) = \mathbb{I}\big(I(x,y) \ge t\big),\]

where $t$ is a threshold and $\mathbb{I}$ is the indicator function.
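The formula leaves open how to pick $t$. One common choice, swapped in here purely as an example, is Otsu's method, which selects the cut that maximizes the between-class variance of the intensity histogram. The sketch below is a plain NumPy version written under that assumption; the bin count is arbitrary.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Choose t by maximizing between-class variance of the histogram (Otsu's method)."""
    counts, edges = np.histogram(image, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = counts / counts.sum()
    w0 = np.cumsum(p)               # fraction of pixels at or below each candidate cut
    m = np.cumsum(p * centers)      # cumulative (unnormalized) mean
    denom = w0 * (1.0 - w0)
    valid = denom > 1e-12           # skip cuts that leave one class empty
    between = np.zeros_like(denom)
    between[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / denom[valid]
    return edges[np.argmax(between) + 1]  # right edge of the best "low" bin

# Toy image: dim background with one bright square "cell".
rng = np.random.default_rng(0)
image = rng.normal(10.0, 1.0, size=(64, 64))
image[20:40, 20:40] = rng.normal(50.0, 2.0, size=(20, 20))
t = otsu_threshold(image)
mask = image >= t                   # the indicator from the equation above
```

A global threshold like this assumes roughly uniform illumination, which is one reason the flat-field step comes first in the pipeline.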

Plot: intensity histogram with a threshold $t$, and resulting binary mask.

Thresholding: histogram, threshold, and resulting mask


5. Supervised learning setup (datasets + labels)

Let $\{(X_i, Y_i)\}_{i=1}^n$ be a dataset of images $X_i$ and labels $Y_i$.

In bioimage pipelines, common label types are:

  • Classification: $Y_i \in \{1,\dots,K\}$ (e.g. phenotype class)
  • Detection: $Y_i$ is a set of boxes/points (e.g. foci counting)
  • Segmentation: $Y_i\in\{0,1\}^{H\times W}$ (binary mask) or multi-class masks
  • Regression: $Y_i\in\mathbb{R}$ (e.g. viability score)

Plot: the same raw image supports multiple tasks (classification, counting, segmentation). The “task” is part of the pipeline design.

One image, multiple tasks: classification, counting, segmentation


6. Deep learning for images (the core maths)

6.1 The convolutional layer

For an input with channels $C_\text{in}$ and output channels $C_\text{out}$, a convolutional layer is:

\[z_{k}(x,y) = b_k + \sum_{c=1}^{C_\text{in}} (X_c * W_{k,c})(x,y),\]

followed by a nonlinearity, e.g. ReLU:

\[\mathrm{ReLU}(u)=\max(0,u).\]
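To make the indices concrete, here is a sketch of the forward pass of one convolutional layer followed by ReLU, written in NumPy with no padding or stride. The array layouts (`(C_in, H, W)` for the input, `(C_out, C_in, kh, kw)` for the weights) are assumptions chosen for this example. The kernel is flipped to match the convolution in the equation; most deep learning frameworks skip the flip and compute cross-correlation, which makes no practical difference because the weights are learned.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(x, weights, bias):
    """Forward pass. x: (C_in, H, W); weights: (C_out, C_in, kh, kw); bias: (C_out,)."""
    kh, kw = weights.shape[-2:]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C_in, H', W', kh, kw)
    flipped = weights[..., ::-1, ::-1]                        # true convolution, as in the equation
    # Sum over input channels c and kernel offsets (i, j) for every output channel k.
    z = np.einsum("cxyij,kcij->kxy", windows, flipped) + bias[:, None, None]
    return np.maximum(z, 0.0)                                 # ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6, 6))           # 2 input channels
W = rng.normal(size=(3, 2, 3, 3))        # 3 output channels, 3x3 kernels
b = rng.normal(size=3)
features = conv_layer(x, W, b)           # shape (3, 4, 4)
```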

Plot: example feature maps from simple, learned-like kernels applied to synthetic images, showing how edge, blob, and texture responses emerge.

Toy feature maps illustrating convolutional representations

6.2 Loss functions that match the task

Classification (cross-entropy) with logits $s\in\mathbb{R}^K$ and true class $y$:

\[\mathcal{L}_\text{CE}(s,y) = -\log\left(\frac{e^{s_y}}{\sum_{k=1}^{K} e^{s_k}}\right).\]
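Computed naively, the exponentials in this formula overflow for large logits. A minimal sketch of the usual fix, subtracting the row maximum before exponentiating (the log-sum-exp trick), is shown below; the batch layout `(N, K)` is an assumption for the example.

```python
import numpy as np

def cross_entropy(logits, y):
    """Per-sample cross-entropy for logits of shape (N, K) and integer labels y of shape (N,)."""
    logits = np.asarray(logits, dtype=float)
    y = np.asarray(y)
    shifted = logits - logits.max(axis=1, keepdims=True)   # does not change the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y]

# A model with no preference among K = 4 classes pays log(4) per sample.
uniform_loss = cross_entropy(np.zeros((1, 4)), [2])
# Very large logits stay finite thanks to the shift.
confident_loss = cross_entropy(np.array([[1000.0, 0.0]]), [0])
```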

Plot: how cross-entropy changes as the model becomes more/less confident in the true class.

Cross-entropy vs predicted probability of the true class

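Segmentation often uses overlap-based losses because pixel imbalance is severe: when foreground is a small fraction of the image, a model that predicts all background still scores high pixel accuracy. Two common overlap metrics, each usable as a loss in the form $1 - \text{metric}$ on soft (probabilistic) masks: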

  • Intersection-over-Union (IoU, a.k.a. Jaccard): \(\mathrm{IoU} = \frac{|M \cap \hat{M}|}{|M \cup \hat{M}|}.\)
  • Dice coefficient: \(\mathrm{Dice} = \frac{2|M \cap \hat{M}|}{|M| + |\hat{M}|}.\)
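Both quantities reduce to counting pixels. The sketch below computes them for binary masks; returning 1.0 when both masks are empty is a convention chosen for this example, not something the formulas dictate.

```python
import numpy as np

def iou_and_dice(mask_true, mask_pred):
    """IoU and Dice for two binary masks of the same shape."""
    m, p = np.asarray(mask_true, dtype=bool), np.asarray(mask_pred, dtype=bool)
    inter = np.logical_and(m, p).sum()
    union = np.logical_or(m, p).sum()
    total = m.sum() + p.sum()
    iou = inter / union if union else 1.0       # both empty: treat as perfect agreement
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

# Two 4x4 squares that overlap on half their area.
truth = np.zeros((10, 10), dtype=bool)
pred = np.zeros((10, 10), dtype=bool)
truth[2:6, 2:6] = True
pred[4:8, 2:6] = True
iou, dice = iou_and_dice(truth, pred)   # 8 / 24 and 16 / 32
```

The two are monotonically related through $\mathrm{Dice} = 2\,\mathrm{IoU}/(1+\mathrm{IoU})$, so they rank predictions on a single image identically; Dice is always the larger number except at 0 and 1.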

Plot: the same predicted mask compared to truth, annotated with Dice and IoU.

Visualizing Dice and IoU between a ground-truth and predicted mask


7. Data augmentation (maths as invariances)

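Augmentation bakes in the idea that certain transformations should not change the label. Write an augmentation as a transformation $T$ sampled from a distribution $\mathcal{T}$. (For segmentation or detection, geometric transformations must also be applied to the label, so the target becomes $T(Y)$; the classification form is shown here.) Training minimizes expected loss: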

\[\min_\theta \;\mathbb{E}_{(X,Y)}\;\mathbb{E}_{T\sim \mathcal{T}}\big[\mathcal{L}(f_\theta(T(X)), Y)\big].\]
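In code, sampling $T\sim\mathcal{T}$ amounts to drawing random parameters and returning a function. The sketch below assumes a small, hypothetical family of transformations (90-degree rotations, a horizontal flip, and an intensity gain and offset) with arbitrary ranges. Geometric changes are applied to the image and its mask together, while intensity changes touch the image only.

```python
import numpy as np

def sample_augmentation(rng):
    """Draw one random transformation T and return it as a function of (image, mask)."""
    k = int(rng.integers(4))            # number of 90-degree rotations
    flip = bool(rng.integers(2))        # horizontal flip or not
    gain = rng.uniform(0.8, 1.2)        # multiplicative intensity jitter
    offset = rng.uniform(-0.1, 0.1)     # additive intensity jitter

    def geometric(a):
        a = np.rot90(a, k)
        return np.fliplr(a) if flip else a

    def T(image, mask):
        # Geometry applies to both; intensity jitter applies to the image only.
        return gain * geometric(image) + offset, geometric(mask)

    return T

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))
mask = image > 0.5
T = sample_augmentation(rng)            # a fresh T is drawn for every training step
aug_image, aug_mask = T(image, mask)
```

Whether a given transformation is label-preserving is a biological question: arbitrary rotations are usually safe for cultured cells, but not for tissue with a meaningful orientation.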

Plot: a synthetic cell image with rotations, flips, intensity jitter, and blur—augmentations that often make sense in microscopy.

Example augmentations for microscopy images


8. Evaluation: metrics that reflect biology

8.1 Classification: ROC and PR curves

For a binary classifier that outputs a score $s(x)$, a threshold $\tau$ induces predictions $\hat{y} = \mathbb{I}\big(s(x) \ge \tau\big)$. Varying $\tau$ traces:

  • ROC: true positive rate (TPR) vs false positive rate (FPR)
  • PR: Precision vs Recall (often more meaningful for rare events)
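Both curves come from one sorted pass over the scores. The sketch below treats every score as a candidate threshold and accumulates true and false positives; it assumes binary labels in {0, 1} and does not merge tied scores, which a production implementation would. The helper names are hypothetical.

```python
import numpy as np

def roc_pr_points(scores, labels):
    """Return (fpr, tpr, precision, recall), one point per threshold, highest score first."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    tp = np.cumsum(labels == 1)         # true positives if we cut after each item
    fp = np.cumsum(labels == 0)         # false positives at the same cut
    tpr = tp / max(tp[-1], 1)
    fpr = fp / max(fp[-1], 1)
    precision = tp / (tp + fp)
    return fpr, tpr, precision, tpr     # recall is the same quantity as TPR

def auc_trapezoid(x, y):
    """Area under a curve that starts at the origin, by the trapezoid rule."""
    x = np.concatenate([[0.0], x])
    y = np.concatenate([[0.0], y])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 0, 0]                # perfectly separated by score
fpr, tpr, precision, recall = roc_pr_points(scores, labels)
roc_auc = auc_trapezoid(fpr, tpr)
```

At the lowest threshold everything is called positive, so precision falls to the class prevalence. That floor is why PR curves expose problems on rare classes that ROC curves can hide.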

Plot: ROC and PR on a synthetic imbalanced dataset.

ROC and PR curves for an imbalanced classification problem

8.2 Segmentation: object-level vs pixel-level

Pixel metrics (Dice/IoU) can look great while biology is wrong (e.g. merged cells). For object-level counts, you often care about:

  • splits (one cell → many masks)
  • merges (many cells → one mask)
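Counting these errors requires instance label maps rather than binary masks. The sketch below assumes integer label images with 0 as background and uses a hypothetical overlap rule: an object "contains" another if it covers at least `min_frac` of that object's pixels. Both the rule and the 0.5 default are illustrative choices, not a standard definition.

```python
import numpy as np

def _containers(outer, inner, min_frac):
    """Count objects in `outer` that contain >= min_frac of two or more `inner` objects."""
    n = 0
    for o in np.unique(outer[outer > 0]):
        inside = inner[outer == o]
        ids, counts = np.unique(inside[inside > 0], return_counts=True)
        sizes = np.array([(inner == i).sum() for i in ids])
        if ids.size and np.sum(counts / sizes >= min_frac) >= 2:
            n += 1
    return n

def count_merges_and_splits(gt, pred, min_frac=0.5):
    """Return (merges, splits) for integer instance maps gt and pred."""
    merges = _containers(pred, gt, min_frac)   # many cells -> one mask
    splits = _containers(gt, pred, min_frac)   # one cell -> many masks
    return merges, splits

# Two neighbouring cells, predicted as a single blob.
gt = np.zeros((5, 9), dtype=int)
gt[1:4, 1:4] = 1
gt[1:4, 5:8] = 2
pred = np.zeros((5, 9), dtype=int)
pred[1:4, 1:8] = 1
merges, splits = count_merges_and_splits(gt, pred)
```

In this toy case the binary foreground masks overlap well (18 of 21 pixels), yet the cell count is off by a factor of two, which is exactly the failure pixel metrics miss.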

Plot: two predictions with similar pixel IoU but different biological correctness (merge vs correct separation).

Segmentation failure modes: merge vs correct instance separation


9. Representation learning for phenotyping (embeddings)

Often the goal isn’t a single label—it’s a phenotypic map of cells. A model can embed an image/crop into a vector $z \in \mathbb{R}^d$. Similar phenotypes should be close in embedding space.
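For a first look at an embedding space, a linear projection is often enough. The sketch below projects embeddings onto their top two principal components using an SVD; the embeddings themselves are synthetic stand-ins (two Gaussian clusters in 16 dimensions), not the output of a trained model.

```python
import numpy as np

def pca_project(Z, n_components=2):
    """Project embeddings Z of shape (n, d) onto their top principal components."""
    Zc = Z - Z.mean(axis=0)                              # centre each dimension
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)    # rows of Vt are principal axes
    explained = s**2 / np.sum(s**2)                      # fraction of variance per axis
    return Zc @ Vt[:n_components].T, explained[:n_components]

# Two synthetic "phenotypes": same noise, shifted means.
rng = np.random.default_rng(0)
phenotype_a = rng.normal(0.0, 1.0, size=(50, 16))
phenotype_b = rng.normal(0.0, 1.0, size=(50, 16)) + 5.0
Z = np.vstack([phenotype_a, phenotype_b])
coords, explained = pca_project(Z)      # coords[:, 0] separates the two groups
```

PCA preserves global distances, which makes it a reasonable default before reaching for nonlinear methods whose layouts are harder to interpret.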

Plot: synthetic “cell phenotypes” mapped into 2D embeddings (PCA-like) showing clustering structure.

Embedding space visualization for phenotypes


10. Uncertainty and QC (because labs drift)

If a classifier outputs probabilities $p_\theta(y\mid x)$, a simple uncertainty proxy is entropy:

\[H(p) = -\sum_{k=1}^{K} p_k \log p_k.\]

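High entropy can flag out-of-distribution images, focus issues, staining failures, or new phenotypes. It is a proxy rather than a guarantee: a model can also be confidently wrong on inputs far from its training data, so low entropy alone does not certify a prediction.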
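A minimal sketch of entropy-based flagging, assuming the classifier's outputs are available as an `(N, K)` array of probabilities. The review threshold of half the maximum entropy is a hypothetical placeholder; in practice it would be tuned on held-out data.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy in nats of each row of an (N, K) array of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    safe = np.clip(probs, eps, 1.0)          # avoids log(0); 0 * log(eps) still contributes 0
    return -np.sum(probs * np.log(safe), axis=1)

probs = np.array([
    [1.00, 0.00, 0.00, 0.00],   # fully confident
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain
    [0.70, 0.10, 0.10, 0.10],   # in between
])
entropy = predictive_entropy(probs)
max_entropy = np.log(probs.shape[1])         # log K, reached by the uniform distribution
needs_review = entropy > 0.5 * max_entropy   # hypothetical QC rule
```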

Plot: examples of confident vs uncertain predictions, plus a histogram of predictive entropy.

Predictive entropy as an uncertainty/QC signal


11. A minimal “starter pipeline” checklist

To build a real biological image analysis pipeline, make sure you can answer (with evidence):

  • What is the task (classification vs segmentation vs counting), and is it aligned with the biology?
  • What are the nuisance variables (batch, plate, microscope, operator, stain intensity)?
  • What is your ground truth—and what are its failure modes (label noise, ambiguity)?
  • What are the metrics that reflect your scientific goal?
  • What is your plan for QC and drift after deployment?

