Training a segmentation model

deepcell-tf uses Jupyter notebooks to train models. Example notebooks are available for most model architectures in the notebooks folder. Most notebooks share a similar structure, so this notebook serves as a core reference for the deepcell approach to model training.

[ ]:
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np
from skimage.feature import peak_local_max
import tensorflow as tf

from deepcell.applications import NuclearSegmentation
from deepcell.image_generators import CroppingDataGenerator
from deepcell.losses import weighted_categorical_crossentropy
from deepcell.model_zoo.panopticnet import PanopticNet
from deepcell.utils.train_utils import count_gpus, rate_scheduler
from deepcell_toolbox.deep_watershed import deep_watershed
from deepcell_toolbox.metrics import Metrics
from deepcell_toolbox.processing import histogram_normalization

File paths

[ ]:
data_dir = '/notebooks/data'
model_path = 'NuclearSegmentation'
metrics_path = 'metrics.yaml'
train_log = 'train_log.csv'

Load the data

The DynamicNuclearNet segmentation dataset can be downloaded from https://datasets.deepcell.org/

[ ]:
with np.load(os.path.join(data_dir, 'train.npz')) as data:
    X_train = data['X']
    y_train = data['y']

with np.load(os.path.join(data_dir, 'val.npz')) as data:
    X_val = data['X']
    y_val = data['y']

with np.load(os.path.join(data_dir, 'test.npz')) as data:
    X_test = data['X']
    y_test = data['y']

Training parameters

The majority of DeepCell models support a variety of backbone choices, specified in the “backbone” parameter. Backbones are provided through keras_applications and can be instantiated with weights pretrained on ImageNet.
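For illustration, a backbone can also be fetched directly through deepcell.utils.backbone_utils.get_backbone (documented later in this reference). The backbone name, input shape, and variable name below are only an example and are not part of this notebook.

from deepcell.utils.backbone_utils import get_backbone

# Fetch a named backbone with ImageNet-pretrained weights (ImageNet weights expect 3 input channels)
backbone_dict = get_backbone(
    "resnet50",
    input_shape=(256, 256, 3),
    use_imagenet=True,
    return_dict=True,  # returns a dict of feature layers, e.g. {'C1': ..., 'C5': ...}
)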

[ ]:
# Model architecture
backbone = "efficientnetv2bl"
location = True
pyramid_levels = ["P1","P2","P3","P4","P5","P6","P7"]
[ ]:
# Augmentation and transform parameters
seed = 0
min_objects = 1
zoom_min = 0.75
crop_size = 256
outer_erosion_width = 1
inner_distance_alpha = "auto"
inner_distance_beta = 1
inner_erosion_width = 0
[ ]:
# Post processing parameters
radius = 10  # used by deep_watershed below; value assumed here (matches the deep_watershed default)
maxima_threshold = 0.1
interior_threshold = 0.01
exclude_border = False
small_objects_threshold = 0
min_distance = 10
[ ]:
# Training configuration
epochs = 16
batch_size = 16
lr = 1e-4

Create data generators

[ ]:
# data augmentation parameters
zoom_max = 1 / zoom_min

# Preprocess the data
X_train = histogram_normalization(X_train)
X_val = histogram_normalization(X_val)

# use augmentation for training but not validation
datagen = CroppingDataGenerator(
    rotation_range=180,
    zoom_range=(zoom_min, zoom_max),
    horizontal_flip=True,
    vertical_flip=True,
    crop_size=(crop_size, crop_size),
)

datagen_val = CroppingDataGenerator(
    crop_size=(crop_size, crop_size)
)
[ ]:
transforms = ["inner-distance", "outer-distance", "fgbg"]

transforms_kwargs = {
    "outer-distance": {"erosion_width": outer_erosion_width},
    "inner-distance": {
        "alpha": inner_distance_alpha,
        "beta": inner_distance_beta,
        "erosion_width": inner_erosion_width,
    },
}

train_data = datagen.flow(
    {'X': X_train, 'y': y_train},
    seed=seed,
    min_objects=min_objects,
    transforms=transforms,
    transforms_kwargs=transforms_kwargs,
    batch_size=batch_size,
)

print("Created training data generator.")

val_data = datagen_val.flow(
    {'X': X_val, 'y': y_val},
    seed=seed,
    min_objects=min_objects,
    transforms=transforms,
    transforms_kwargs=transforms_kwargs,
    batch_size=batch_size,
)

print("Created validation data generator.")

Visualize the data generator output.

[ ]:
inputs, outputs = train_data.next()

img = inputs[0]
inner_distance = outputs[0]
outer_distance = outputs[1]
fgbg = outputs[2]

fig, axes = plt.subplots(1, 4, figsize=(15, 15))

axes[0].imshow(img[..., 0])
axes[0].set_title('Source Image')

axes[1].imshow(inner_distance[0, ..., 0])
axes[1].set_title('Inner Distance')

axes[2].imshow(outer_distance[0, ..., 0])
axes[2].set_title('Outer Distance')

axes[3].imshow(fgbg[0, ..., 0])
axes[3].set_title('Foreground/Background')

plt.show()

Create the PanopticNet Model

Here we instantiate a PanopticNet model from deepcell.model_zoo with 3 semantic heads: inner distance (1 class), outer distance (1 class), and foreground/background (2 classes).

[ ]:
input_shape = (crop_size, crop_size, 1)

model = PanopticNet(
    backbone=backbone,
    input_shape=input_shape,
    norm_method=None,
    num_semantic_classes=[1, 1, 2],  # inner distance, outer distance, fgbg
    location=location,
    include_top=True,
    backbone_levels=["C1", "C2", "C3", "C4", "C5"],
    pyramid_levels=pyramid_levels,
)

Create a loss function for each semantic head

Each semantic head is trained with its own loss function. Mean squared error is used for the regression-based heads, whereas weighted_categorical_crossentropy is used for classification heads.

The losses are saved as a dictionary and passed to model.compile.

[ ]:
def semantic_loss(n_classes):
    def _semantic_loss(y_pred, y_true):
        if n_classes > 1:
            return 0.01 * weighted_categorical_crossentropy(
                y_pred, y_true, n_classes=n_classes
            )
        return tf.keras.losses.MSE(y_pred, y_true)

    return _semantic_loss

loss = {}

# Give losses for all of the semantic heads
for layer in model.layers:
    if layer.name.startswith("semantic_"):
        n_classes = layer.output_shape[-1]
        loss[layer.name] = semantic_loss(n_classes)

optimizer = tf.keras.optimizers.Adam(lr=lr, clipnorm=0.001)

model.compile(loss=loss, optimizer=optimizer)

Train the model

Call fit on the compiled model, along with a default set of callbacks.

[ ]:
# Clear clutter from previous TensorFlow graphs.
tf.keras.backend.clear_session()

monitor = "val_loss"

csv_logger = tf.keras.callbacks.CSVLogger(train_log)

# Create callbacks for early stopping and pruning.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        model_path,
        monitor=monitor,
        save_best_only=True,
        verbose=1,
        save_weights_only=False,
    ),
    tf.keras.callbacks.LearningRateScheduler(rate_scheduler(lr=lr, decay=0.99)),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor=monitor,
        factor=0.1,
        patience=5,
        verbose=1,
        mode="auto",
        min_delta=0.0001,
        cooldown=0,
        min_lr=0,
    ),
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    csv_logger,
]

print(f"Training on {count_gpus()} GPUs.")

# Train model.
history = model.fit(
    train_data,
    steps_per_epoch=train_data.y.shape[0] // batch_size,
    epochs=epochs,
    validation_data=val_data,
    validation_steps=val_data.y.shape[0] // batch_size,
    callbacks=callbacks,
)

print("Final", monitor, ":", history.history[monitor][-1])

Save prediction model

We can now create a new prediction model without the foreground/background semantic head. While this head is very useful during training, its output is unused during prediction. By using model.load_weights(path, by_name=True), the extra semantic head can be removed.

[ ]:
with tempfile.TemporaryDirectory() as tmpdirname:
    weights_path = os.path.join(str(tmpdirname), "model_weights.h5")
    model.save_weights(weights_path, save_format="h5")
    prediction_model = PanopticNet(
        backbone=backbone,
        input_shape=input_shape,
        norm_method=None,
        num_semantic_heads=2,
        num_semantic_classes=[1, 1],  # inner distance, outer distance
        location=location,  # should always be true
        include_top=True,
        backbone_levels=["C1", "C2", "C3", "C4", "C5"],
        pyramid_levels=pyramid_levels,
    )
    prediction_model.load_weights(weights_path, by_name=True)

Predict on test data

[ ]:
X_test = histogram_normalization(X_test)

test_images = prediction_model.predict(X_test)
[ ]:
index = np.random.choice(X_test.shape[0])
print(index)

fig, axes = plt.subplots(1, 4, figsize=(20, 20))

masks = deep_watershed(
    test_images,
    radius=radius,
    maxima_threshold=maxima_threshold,
    interior_threshold=interior_threshold,
    exclude_border=exclude_border,
    small_objects_threshold=small_objects_threshold,
    min_distance=min_distance
)

# calculated in the postprocessing above, but useful for visualizing
inner_distance = test_images[0]
outer_distance = test_images[1]

coords = peak_local_max(
    inner_distance[index],
    min_distance=min_distance
)

# raw image with centroid
axes[0].imshow(X_test[index, ..., 0])
axes[0].scatter(coords[..., 1], coords[..., 0],
                color='r', marker='.', s=10)

axes[1].imshow(inner_distance[index, ..., 0], cmap='jet')
axes[2].imshow(outer_distance[index, ..., 0], cmap='jet')
axes[3].imshow(masks[index, ...], cmap='jet')

plt.show()

Evaluate results

The Metrics class from deepcell_toolbox.metrics is used to measure advanced metrics for instance segmentation predictions.

[ ]:
outputs = model.predict(X_test)

y_pred = []

for i in range(outputs[0].shape[0]):

    mask = deep_watershed(
        [t[[i]] for t in outputs],
        radius=radius,
        maxima_threshold=maxima_threshold,
        interior_threshold=interior_threshold,
        exclude_border=exclude_border,
        small_objects_threshold=small_objects_threshold,
        min_distance=min_distance)

    y_pred.append(mask[0])

y_pred = np.stack(y_pred, axis=0)
y_pred = np.expand_dims(y_pred, axis=-1)
y_true = y_test.copy()

m = Metrics('DeepWatershed', seg=False)
m.calc_object_stats(y_true, y_pred)

This notebook is part of the deepcell-tf documentation: https://deepcell.readthedocs.io/.

Training a cell tracking model

[ ]:
import os

import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import CSVLogger
from tensorflow_addons.optimizers import RectifiedAdam
import yaml

import deepcell
from deepcell.data.tracking import Track, random_rotate, random_translate, temporal_slice
from deepcell.losses import weighted_categorical_crossentropy
from deepcell.model_zoo.tracking import GNNTrackingModel
from deepcell.utils.tfrecord_utils import get_tracking_dataset, write_tracking_dataset_to_tfr
from deepcell.utils.train_utils import count_gpus, rate_scheduler
from deepcell_toolbox.metrics import Metrics
from deepcell_tracking import CellTracker
from deepcell_tracking.metrics import benchmark_tracking_performance, calculate_summary_stats
from deepcell_tracking.trk_io import load_trks
from deepcell_tracking.utils import get_max_cells, is_valid_lineage

The DynamicNuclearNet tracking dataset can be downloaded from https://datasets.deepcell.org/

[ ]:
# Please change these file paths to match your file system.
data_dir = '/notebooks/data'

inf_model_path = "NuclearTrackingInf"
ne_model_path = "NuclearTrackingNE"
metrics_path = "train-metrics.yaml"
train_log_path = "train_log.csv"

prediction_dir = 'output'
# Check that prediction directory exists and make if needed
if not os.path.exists(prediction_dir):
    os.makedirs(prediction_dir)

Prepare the data for training

Tracked data are stored as .trks files. These files include images and lineage data in np.arrays. To manipulate .trks files, use deepcell_tracking.trk_io.load_trks and deepcell_tracking.trk_io.save_trks.

To facilitate training, we transform each movie’s image and lineage data into a Track object. Tracks help to encapsulate all of the feature creation from the movie, including:

  • Appearances: (num_frames, num_objects, 32, 32, 1)

  • Morphologies: (num_frames, num_objects, 32, 32, 3)

  • Centroids: (num_frames, num_objects, 2)

  • Normalized Adjacency Matrix: (num_frames, num_objects, num_objects, 3)

  • Temporal Adjacency Matrix (comparing across frames): (num_frames - 1, num_objects, num_objects, 3)

Each Track is then saved as a tfrecord file in order to load data from disk during training and reduce the total memory footprint.

[ ]:
appearance_dim = 32
distance_threshold = 64
crop_mode = "resize"
[ ]:
# This cell may take ~20 minutes to run
train_trks = load_trks(os.path.join(data_dir, "train.trks"))
val_trks = load_trks(os.path.join(data_dir, "val.trks"))

max_cells = max([get_max_cells(train_trks["y"]), get_max_cells(val_trks["y"])])

for split, trks in zip(["train", "val"], [train_trks, val_trks]):
    print(f"Preparing {split} as tf record")

    with tf.device("/cpu:0"):
        tracks = Track(
            tracked_data=trks,
            appearance_dim=appearance_dim,
            distance_threshold=distance_threshold,
            crop_mode=crop_mode,
        )

        write_tracking_dataset_to_tfr(
            tracks, target_max_cells=max_cells, filename=split
        )

Training

Define training parameters

[ ]:
# Model architecture
n_layers = 1  # Number of graph convolution layers
n_filters = 64
encoder_dim = 64
embedding_dim = 64
graph_layer = "gat"
norm_layer = "batch"
[ ]:
# Data and augmentation
seed = 0
track_length = 8  # Number of frames per track object
rotation_range = 180
translation_range = 512
buffer_size = 128
[ ]:
# Training configuration
batch_size = 8
epochs = 50
steps_per_epoch = 1000
validation_steps = 200
lr = 1e-3

Load TFRecord Data

[ ]:
# Augmentation functions
def sample(X, y):
    return temporal_slice(X, y, track_length=track_length)

def rotate(X, y):
    return random_rotate(X, y, rotation_range=rotation_range)

def translate(X, y):
    return random_translate(X, y, range=translation_range)

with tf.device("/cpu:0"):
    train_data = get_tracking_dataset("train")
    train_data = train_data.shuffle(buffer_size, seed=seed).repeat()
    train_data = train_data.map(sample, num_parallel_calls=tf.data.AUTOTUNE)
    train_data = train_data.map(rotate, num_parallel_calls=tf.data.AUTOTUNE)
    train_data = train_data.map(translate, num_parallel_calls=tf.data.AUTOTUNE)
    train_data = train_data.batch(batch_size).prefetch(tf.data.AUTOTUNE)

    val_data = get_tracking_dataset("val")
    val_data = val_data.shuffle(buffer_size, seed=seed).repeat()
    val_data = val_data.map(sample, num_parallel_calls=tf.data.AUTOTUNE)
    val_data = val_data.batch(batch_size).prefetch(tf.data.AUTOTUNE)

max_cells = list(train_data.take(1))[0][0]["appearances"].shape[2]

Initialize the model

[ ]:
def filter_and_flatten(y_true, y_pred):
    n_classes = tf.shape(y_true)[-1]
    new_shape = [-1, n_classes]
    y_true = tf.reshape(y_true, new_shape)
    y_pred = tf.reshape(y_pred, new_shape)

    # Mask out the padded cells
    y_true_reduced = tf.reduce_sum(y_true, axis=-1)
    good_loc = tf.where(y_true_reduced == 1)[:, 0]

    y_true = tf.gather(y_true, good_loc, axis=0)
    y_pred = tf.gather(y_pred, good_loc, axis=0)
    return y_true, y_pred


class Recall(tf.keras.metrics.Recall):
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true, y_pred = filter_and_flatten(y_true, y_pred)
        super().update_state(y_true, y_pred, sample_weight)


class Precision(tf.keras.metrics.Precision):
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true, y_pred = filter_and_flatten(y_true, y_pred)
        super().update_state(y_true, y_pred, sample_weight)


def loss_function(y_true, y_pred):
    y_true, y_pred = filter_and_flatten(y_true, y_pred)
    return weighted_categorical_crossentropy(
        y_true, y_pred, n_classes=tf.shape(y_true)[-1], axis=-1
    )
[ ]:
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")

with strategy.scope():
    model = GNNTrackingModel(
        max_cells=max_cells,
        graph_layer=graph_layer,
        track_length=track_length,
        n_filters=n_filters,
        embedding_dim=embedding_dim,
        encoder_dim=encoder_dim,
        n_layers=n_layers,
        norm_layer=norm_layer,
    )

    loss = {"temporal_adj_matrices": loss_function}

    optimizer = RectifiedAdam(learning_rate=lr, clipnorm=0.001)

    training_metrics = [
        Recall(class_id=0, name="same_recall"),
        Recall(class_id=1, name="different_recall"),
        Recall(class_id=2, name="daughter_recall"),
        Precision(class_id=0, name="same_precision"),
        Precision(class_id=1, name="different_precision"),
        Precision(class_id=2, name="daughter_precision"),
    ]

    model.training_model.compile(
        loss=loss, optimizer=optimizer, metrics=training_metrics
    )

Train the model

[ ]:
# Clear clutter from previous TensorFlow graphs.
tf.keras.backend.clear_session()

monitor = "val_loss"

csv_logger = CSVLogger(train_log_path)

# Create callbacks for early stopping and pruning.
callbacks = [
    tf.keras.callbacks.LearningRateScheduler(rate_scheduler(lr=lr, decay=0.99)),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor=monitor,
        factor=0.1,
        patience=5,
        verbose=1,
        mode="auto",
        min_delta=0.0001,
        cooldown=0,
        min_lr=0,
    ),
    csv_logger,
]

print(f"Training on {count_gpus()} GPUs.")

# Train model.
history = model.training_model.fit(
    train_data,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=val_data,
    validation_steps=validation_steps,
    callbacks=callbacks,
)

print("Final", monitor, ":", history.history[monitor][-1])
[ ]:
# Save models
model.inference_model.save(inf_model_path, include_optimizer=False, overwrite=True)
model.neighborhood_encoder.save(
    ne_model_path, include_optimizer=False, overwrite=True
)
[ ]:
# Record training metrics
all_metrics = {
    "metrics": {"training": {k: float(v[-1]) for k, v in history.history.items()}}
}

# save a metadata.yaml file in the saved model directory
with open(metrics_path, "w") as f:
    yaml.dump(all_metrics, f)

Evaluate model performance

Set tracking parameters and CellTracker

[ ]:
death = 0.99
birth = 0.99
division = 0.01
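The prediction loop below also uses an inference model (inf_model), a neighborhood encoder (ne_model), an IoU threshold (iou_thresh), and a helper find_frames_with_objects, none of which are defined above. A minimal sketch of these pieces, assuming the models are simply reloaded from the paths they were saved to earlier; the threshold value and the helper are illustrative assumptions.

iou_thresh = 1  # assumed IoU threshold passed to benchmark_tracking_performance below

# Reload the saved tracking models (assumes the save paths defined earlier in this notebook)
ne_model = tf.keras.models.load_model(ne_model_path, compile=False)
inf_model = tf.keras.models.load_model(inf_model_path, compile=False)

def find_frames_with_objects(y):
    # Assumed helper: indices of frames that contain at least one labeled object
    return [frame for frame in range(y.shape[0]) if len(np.unique(y[frame])) > 1]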

Load test data

[ ]:
test_data = load_trks(os.path.join(data_dir, "test.trks"))
X_test = test_data["X"]
y_test = test_data["y"]
lineages_test = test_data["lineages"]

# Load metadata array
with np.load(os.path.join(data_dir, "data-source.npz"), allow_pickle=True) as data:
    meta = data["test"]

Predict and benchmark

[ ]:
metrics = {}
exp_metrics = {}
bad_batches = []
for b in range(len(X_test)):
    # currently NOT saving any recall/precision information
    gt_path = os.path.join(prediction_dir, f"{b}-gt.trk")
    res_path = os.path.join(prediction_dir, f"{b}-res.trk")

    # Check that lineage is valid before proceeding
    if not is_valid_lineage(y_test[b], lineages_test[b]):
        bad_batches.append(b)
        continue

    frames = find_frames_with_objects(y_test[b])

    tracker = CellTracker(
        movie=X_test[b][frames],
        annotation=y_test[b][frames],
        track_length=track_length,
        neighborhood_encoder=ne_model,
        tracking_model=inf_model,
        death=death,
        birth=birth,
        division=division,
    )

    try:
        tracker.track_cells()
    except Exception as err:
        print(
            "Failed to track batch {} due to {}: {}".format(
                b, err.__class__.__name__, err
            )
        )
        bad_batches.append(b)
        continue

    tracker.dump(res_path)

    gt = {
        "X": X_test[b][frames],
        "y_tracked": y_test[b][frames],
        "tracks": lineages_test[b],
    }

    tracker.dump(filename=gt_path, track_review_dict=gt)

    results = benchmark_tracking_performance(
        gt_path, res_path, threshold=iou_thresh
    )

    exp = meta[b, 1]  # Grab the experiment column from metadata
    tmp_exp = exp_metrics.get(exp, {})

    for k in results:
        if k in metrics:
            metrics[k] += results[k]
        else:
            metrics[k] = results[k]

        if k in tmp_exp:
            tmp_exp[k] += results[k]
        else:
            tmp_exp[k] = results[k]

    exp_metrics[exp] = tmp_exp
[ ]:
# Calculate summary stats for each set of metrics
tmp_metrics = metrics.copy()
del tmp_metrics["mismatch_division"]
summary = calculate_summary_stats(**tmp_metrics, n_digits=3)
metrics = {**metrics, **summary}

for exp, m in exp_metrics.items():
    tmp_m = m.copy()
    del tmp_m["mismatch_division"]
    summary = calculate_summary_stats(**tmp_m, n_digits=3)
    exp_metrics[exp] = {**m, **summary}

# save a metadata.yaml file in the saved model directory
with open(metrics_path, "w") as f:
    yaml.dump(all_metrics, f)

deepcell API

deepcell.applications

Application

class deepcell.applications.application.Application(model, model_image_shape=(128, 128, 1), model_mpp=0.65, preprocessing_fn=None, postprocessing_fn=None, format_model_output_fn=None, dataset_metadata=None, model_metadata=None)[source]

Bases: object

Application object that takes a model with weights and manages predictions

Parameters:
  • model (tensorflow.keras.Model) – tf.keras.Model with loaded weights.

  • model_image_shape (tuple) – Shape of input expected by model.

  • dataset_metadata (str or dict) – Metadata for the data that model was trained on.

  • model_metadata (str or dict) – Training metadata for model.

  • model_mpp (float) – Microns per pixel resolution of the training data used for model.

  • preprocessing_fn (function) – Pre-processing function to apply to data prior to prediction.

  • postprocessing_fn (function) – Post-processing function to apply to data after prediction. Must accept an input of a list of arrays and then return a single array.

  • format_model_output_fn (function) – Convert model output from a list of matrices to a dictionary with keys for each semantic head.

Raises:
  • ValueError – preprocessing_fn must be a callable function

  • ValueError – postprocessing_fn must be a callable function

  • ValueError – model_output_fn must be a callable function

_batch_predict(tiles, batch_size)[source]

Batch process tiles to generate model predictions.

The built-in keras.predict function has support for batching, but loads the entire image stack into GPU memory, which is prohibitive for large images. This function uses similar code to the underlying model.predict function without soaking up GPU memory.

Parameters:
  • tiles (numpy.array) – Tiled data which will be fed to model

  • batch_size (int) – Number of images to predict on per batch

Returns:

Model outputs

Return type:

list
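A minimal sketch of the batching idea described above, not the library's implementation; the function name and arguments are illustrative.

import numpy as np

def batch_predict_sketch(model, tiles, batch_size):
    # Run the model on successive slices so only one batch is resident on the GPU at a time
    outputs = None
    for i in range(0, tiles.shape[0], batch_size):
        batch_outputs = model(tiles[i:i + batch_size], training=False)
        if not isinstance(batch_outputs, (list, tuple)):
            batch_outputs = [batch_outputs]
        if outputs is None:
            outputs = [[] for _ in batch_outputs]
        for collected, batch_out in zip(outputs, batch_outputs):
            collected.append(np.asarray(batch_out))
    # Stitch the per-batch results back into full-length output arrays
    return [np.concatenate(collected, axis=0) for collected in outputs]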

_format_model_output(output_images)[source]

Applies the formatting function to the output from the model if one was provided. Otherwise, returns the unmodified model output.

Parameters:

output_images – stack of untiled images to be reformatted

Returns:

reformatted images stored as a dict, or input images stored as list if no formatting function is specified.

Return type:

dict or list

_postprocess(image, **kwargs)[source]

Applies postprocessing function to image if one has been defined. Otherwise returns unmodified image.

Parameters:

image (numpy.array or list) – Input to the postprocessing function, either a numpy.array or a list of numpy.arrays.

Returns:

labeled image

Return type:

numpy.array

_predict_segmentation(image, batch_size=4, image_mpp=None, pad_mode='constant', preprocess_kwargs={}, postprocess_kwargs={})[source]

Generates a labeled image of the input running prediction with appropriate pre and post processing functions.

Input images are required to have 4 dimensions [batch, x, y, channel]. Additional empty dimensions can be added using np.expand_dims.

Parameters:
  • image (numpy.array) – Input image with shape [batch, x, y, channel].

  • batch_size (int) – Number of images to predict on per batch.

  • image_mpp (float) – Microns per pixel for image.

  • pad_mode (str) – The padding mode, one of “constant” or “reflect”.

  • preprocess_kwargs (dict) – Keyword arguments to pass to the pre-processing function.

  • postprocess_kwargs (dict) – Keyword arguments to pass to the post-processing function.

Raises:
  • ValueError – Input data must match required rank, calculated as one dimension more (batch dimension) than expected by the model.

  • ValueError – Input data must match required number of channels.

Returns:

Labeled image

Return type:

numpy.array

_preprocess(image, **kwargs)[source]

Preprocess image if preprocessing_fn is defined. Otherwise return image unmodified.

Parameters:
  • image (numpy.array) – 4D stack of images

  • kwargs (dict) – Keyword arguments for preprocessing_fn.

Returns:

The pre-processed image.

Return type:

numpy.array

_resize_input(image, image_mpp)[source]

Checks if there is a difference between image and model resolution and resizes if they are different. Otherwise returns the unmodified image.

Parameters:
  • image (numpy.array) – Input image to resize.

  • image_mpp (float) – Microns per pixel for the image.

Returns:

Input image resized if necessary to match model_mpp

Return type:

numpy.array
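A sketch of the resolution-matching idea, assuming a 4D channels_last image; this is not the library's implementation and the helper name is illustrative.

from skimage.transform import rescale

def resize_input_sketch(image, image_mpp, model_mpp):
    # Rescale x and y so the image's microns-per-pixel matches the model's training resolution
    if image_mpp is None or image_mpp == model_mpp:
        return image
    scale_factor = image_mpp / model_mpp
    return rescale(image, (1, scale_factor, scale_factor, 1), preserve_range=True)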

_resize_output(image, original_shape)[source]

Rescales input if the shape does not match the original shape excluding the batch and channel dimensions.

Parameters:
  • image (numpy.array) – Image to be rescaled to original shape

  • original_shape (tuple) – Shape of the original input image

Returns:

Rescaled image

Return type:

numpy.array

_run_model(image, batch_size=4, pad_mode='constant', preprocess_kwargs={})[source]

Run the model to generate output probabilities on the data.

Parameters:
  • image (numpy.array) – Image with shape [batch, x, y, channel]

  • batch_size (int) – Number of images to predict on per batch.

  • pad_mode (str) – The padding mode, one of “constant” or “reflect”.

  • preprocess_kwargs (dict) – Keyword arguments to pass to the preprocessing function.

Returns:

Model outputs

Return type:

numpy.array

_tile_input(image, pad_mode='constant')[source]

Tile the input image to match shape expected by model using the deepcell_toolbox function.

Only supports 4D images.

Parameters:
  • image (numpy.array) – Input image to tile

  • pad_mode (str) – The padding mode, one of “constant” or “reflect”.

Raises:

ValueError – Input images must have only 4 dimensions

Returns:

Tuple of tiled image and dict of tiling information.

Return type:

(numpy.array, dict)

_untile_output(output_tiles, tiles_info)[source]

Untiles either a single array or a list of arrays according to a dictionary of tiling specs

Parameters:
  • output_tiles (numpy.array or list) – Array or list of arrays.

  • tiles_info (dict) – Tiling specs output by the tiling function.

Returns:

Array or list according to input with untiled images

Return type:

numpy.array or list

predict(x)[source]

CytoplasmSegmentation

class deepcell.applications.cytoplasm_segmentation.CytoplasmSegmentation(model=None, preprocessing_fn=deepcell_toolbox.processing.histogram_normalization, postprocessing_fn=deepcell_toolbox.deep_watershed.deep_watershed)[source]

Bases: Application

Loads a deepcell.model_zoo.panopticnet.PanopticNet model for cytoplasm segmentation with pretrained weights.

The predict method handles prep and post processing steps to return a labeled image.

Example:

import numpy as np
from skimage.io import imread
from deepcell.applications import CytoplasmSegmentation

# Load the image
im = imread('HeLa_cytoplasm.png')

# Expand image dimensions to rank 4
im = np.expand_dims(im, axis=-1)
im = np.expand_dims(im, axis=0)

# Create the application
app = CytoplasmSegmentation()

# Create the labeled image
labeled_image = app.predict(im)
Parameters:

model (tf.keras.Model) – The model to load. If None, a pre-trained model will be downloaded.

dataset_metadata = {'name': 'general_cyto', 'other': 'Pooled phase and fluorescent cytoplasm data - computationally curated'}

Metadata for the dataset used to train the model

model_metadata = {'batch_size': 16, 'lr': 0.0001, 'lr_decay': 0.9, 'n_epochs': 8, 'training_seed': 0, 'training_steps_per_epoch': 3949, 'validation_steps_per_epoch': 986}

Metadata for the model and training process

predict(image, batch_size=4, image_mpp=None, pad_mode='reflect', preprocess_kwargs=None, postprocess_kwargs=None)[source]

Generates a labeled image of the input running prediction with appropriate pre and post processing functions.

Input images are required to have 4 dimensions [batch, x, y, channel].

Additional empty dimensions can be added using np.expand_dims.

Parameters:
  • image (numpy.array) – Input image with shape [batch, x, y, channel].

  • batch_size (int) – Number of images to predict on per batch.

  • image_mpp (float) – Microns per pixel for image.

  • pad_mode (str) – The padding mode, one of “constant” or “reflect”.

  • preprocess_kwargs (dict) – Keyword arguments to pass to the pre-processing function.

  • postprocess_kwargs (dict) – Keyword arguments to pass to the post-processing function.

Raises:
  • ValueError – Input data must match required rank of the application, calculated as one dimension more (batch dimension) than expected by the model.

  • ValueError – Input data must match required number of channels.

Returns:

Labeled image

Return type:

numpy.array

NuclearSegmentation

class deepcell.applications.nuclear_segmentation.NuclearSegmentation(model=None, preprocessing_fn=deepcell_toolbox.processing.histogram_normalization, postprocessing_fn=deepcell_toolbox.deep_watershed.deep_watershed)[source]

Bases: Application

Loads a deepcell.model_zoo.panopticnet.PanopticNet model for nuclear segmentation with pretrained weights.

The predict method handles prep and post processing steps to return a labeled image.

Example:

import numpy as np
from skimage.io import imread
from deepcell.applications import NuclearSegmentation

# Load the image
im = imread('HeLa_nuclear.png')

# Expand image dimensions to rank 4
im = np.expand_dims(im, axis=-1)
im = np.expand_dims(im, axis=0)

# Create the application
app = NuclearSegmentation()

# Create the labeled image
labeled_image = app.predict(im)
Parameters:

model (tf.keras.Model) – The model to load. If None, a pre-trained model will be downloaded.

dataset_metadata = {'name': 'general_nuclear_train_large', 'other': 'Pooled nuclear data from HEK293, HeLa-S3, NIH-3T3, and RAW264.7 cells.'}

Metadata for the dataset used to train the model

model_metadata = {'backbone': 'efficientnetv2bl', 'batch_size': 16, 'crop_size': 256, 'epochs': 16, 'location': True, 'lr': 0.0001, 'min_objects': 1, 'pyramid_levels': 'P1-P2-P3-P4-P5-P6-P7', 'zoom_min': 0.75}

Metadata for the model and training process

predict(image, batch_size=4, image_mpp=None, pad_mode='reflect', preprocess_kwargs=None, postprocess_kwargs=None)[source]

Generates a labeled image of the input running prediction with appropriate pre and post processing functions.

Input images are required to have 4 dimensions [batch, x, y, channel].

Additional empty dimensions can be added using np.expand_dims.

Parameters:
  • image (numpy.array) – Input image with shape [batch, x, y, channel].

  • batch_size (int) – Number of images to predict on per batch.

  • image_mpp (float) – Microns per pixel for image.

  • pad_mode (str) – The padding mode, one of “constant” or “reflect”.

  • preprocess_kwargs (dict) – Keyword arguments to pass to the pre-processing function.

  • postprocess_kwargs (dict) – Keyword arguments to pass to the post-processing function.

Raises:
  • ValueError – Input data must match required rank of the application, calculated as one dimension more (batch dimension) than expected by the model.

  • ValueError – Input data must match required number of channels.

Returns:

Labeled image

Return type:

numpy.array

Mesmer

class deepcell.applications.mesmer.Mesmer(model=None)[source]

Bases: Application

Loads a deepcell.model_zoo.panopticnet.PanopticNet model for tissue segmentation with pretrained weights.

The predict method handles prep and post processing steps to return a labeled image.

Example:

import numpy as np
from skimage.io import imread
from deepcell.applications import Mesmer

# Load the images
im1 = imread('TNBC_DNA.tiff')
im2 = imread('TNBC_Membrane.tiff')

# Combine the channels and expand to 4D
im = np.stack((im1, im2), axis=-1)
im = np.expand_dims(im, 0)

# Create the application
app = Mesmer()

# Create the labeled image
labeled_image = app.predict(im)
Parameters:

model (tf.keras.Model) – The model to load. If None, a pre-trained model will be downloaded.

dataset_metadata = {'name': '20200315_IF_Training_6.npz', 'other': 'Pooled whole-cell data across tissue types'}

Metadata for the dataset used to train the model

model_metadata = {'batch_size': 1, 'lr': 1e-05, 'lr_decay': 0.99, 'n_epochs': 30, 'training_seed': 0, 'training_steps_per_epoch': 1739, 'validation_steps_per_epoch': 193}

Metadata for the model and training process

predict(image, batch_size=4, image_mpp=None, preprocess_kwargs={}, compartment='whole-cell', pad_mode='constant', postprocess_kwargs_whole_cell={}, postprocess_kwargs_nuclear={})[source]

Generates a labeled image of the input running prediction with appropriate pre and post processing functions.

Input images are required to have 4 dimensions [batch, x, y, channel]. Additional empty dimensions can be added using np.expand_dims.

Parameters:
  • image (numpy.array) – Input image with shape [batch, x, y, channel].

  • batch_size (int) – Number of images to predict on per batch.

  • image_mpp (float) – Microns per pixel for image.

  • compartment (str) – Specify type of segmentation to predict. Must be one of "whole-cell", "nuclear", "both".

  • preprocess_kwargs (dict) – Keyword arguments to pass to the pre-processing function.

  • postprocess_kwargs (dict) – Keyword arguments to pass to the post-processing function.

Raises:
  • ValueError – Input data must match required rank of the application, calculated as one dimension more (batch dimension) than expected by the model.

  • ValueError – Input data must match required number of channels.

Returns:

Instance segmentation mask.

Return type:

numpy.array

CellTracking

class deepcell.applications.cell_tracking.CellTracking(model=None, neighborhood_encoder=None, distance_threshold=64, appearance_dim=32, birth=0.99, death=0.99, division=0.01, track_length=8, embedding_axis=0, crop_mode='resize', norm=True)[source]

Bases: Application

Loads a deepcell.model_zoo.tracking.GNNTrackingModel model for object tracking with pretrained weights using a simple predict interface.

Parameters:
  • model (tf.keras.model) – Tracking inference model, defaults to latest published model

  • neighborhood_encoder (tf.keras.model) – Tracking neighborhood encoder, defaults to latest published model

  • distance_threshold (int) – Maximum distance between two cells to be considered adjacent

  • appearance_dim (int) – Length of appearance dimension

  • birth (float) – Cost of new cell in linear assignment matrix.

  • death (float) – Cost of cell death in linear assignment matrix.

  • division (float) – Cost of cell division in linear assignment matrix.

  • track_length (int) – Number of frames per track

  • crop_mode (str) – Type of cropping around each cell

  • norm (str) – Type of normalization layer

dataset_metadata = {'name': 'tracked_nuclear_train_large', 'other': 'Pooled tracked nuclear data from HEK293, HeLa-S3, NIH-3T3, and RAW264.7 cells.'}

Metadata for the dataset used to train the model

model_metadata = {'appearance_dim': 32, 'batch_size': 8, 'buffer_size': 128, 'crop_mode': 'resize', 'data_fraction': 1, 'distance_threshold': 64, 'embedding_dim': 64, 'encoder_dim': 64, 'epochs': 50, 'graph_layer': 'gat', 'lr': 0.001, 'n_filters': 64, 'n_layers': 1, 'norm_layer': 'batch', 'rotation_range': 180, 'steps_per_epoch': 1000, 'translation_range': 512, 'validation_steps': 200}

Metadata for the model and training process

predict(image, labels, **kwargs)[source]

Using both raw image data and segmentation masks, track objects across all frames.

Parameters:
  • image (numpy.array) – Raw image data.

  • labels (numpy.array) – Labels for image, integer masks.

Returns:

Tracked labels and lineage information.

Return type:

dict

track(image, labels, **kwargs)[source]

Wrapper around predict() for convenience.
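A usage sketch analogous to the segmentation application examples above; the file names and the source of the label masks are illustrative, and the raw data and labels are assumed to have shape (frames, x, y) before a channel axis is added.

import numpy as np
from skimage.io import imread
from deepcell.applications.cell_tracking import CellTracking

# Load a movie and matching integer label masks (hypothetical files)
im = imread('nuclear_movie.tif')
labels = imread('nuclear_movie_labels.tif')  # e.g. per-frame NuclearSegmentation output

# Add a channel axis to get (frames, x, y, channel)
im = np.expand_dims(im, axis=-1)
labels = np.expand_dims(labels, axis=-1)

# Create the application and track objects across frames
app = CellTracking()
tracked = app.track(im, labels)  # dict of tracked labels and lineage information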

LabelDetectionModel

deepcell.applications.label_detection.LabelDetectionModel(input_shape=(None, None, 1), inputs=None, backbone='mobilenetv2', num_classes=3)[source]

Classify a microscopy image as Nuclear, Cytoplasm, or Phase.

This can be helpful in determining the type of data (nuclear, cytoplasm, etc.) so that it can be forwarded to the correct segmentation model.

Based on a standard backbone with an initial ImageNormalization2D and final AveragePooling2D, TensorProduct, and Softmax layers.

Parameters:
  • input_shape (tuple) – a 3-length tuple of the input data shape.

  • inputs (tensorflow.keras.Layer) – Optional input layer of the model. If not provided, creates a Layer based on input_shape.

  • backbone (str) – name of the backbone to use for the model.

  • num_classes (int) – The number of labels to detect.
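A brief sketch of building and calling the classifier; the input data here is random and only illustrates shapes, and no pretrained weights are loaded.

import numpy as np
from deepcell.applications.label_detection import LabelDetectionModel

# Build the classifier (untrained unless weights are loaded separately)
model = LabelDetectionModel(input_shape=(None, None, 1))

# Batch of single-channel images with shape (batch, x, y, channel)
images = np.random.rand(4, 128, 128, 1)
scores = model.predict(images)  # scores corresponding to the 3 label classes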

ScaleDetectionModel

deepcell.applications.scale_detection.ScaleDetectionModel(input_shape=(None, None, 1), inputs=None, backbone='mobilenetv2')[source]

Create a ScaleDetectionModel for detecting scales of input data.

This enables data to be scaled appropriately for other segmentation models which may not be resolution tolerant.

Based on a standard backbone with an initial ImageNormalization2D and final AveragePooling2D and TensorProduct layers.

Parameters:
  • input_shape (tuple) – a 3-length tuple of the input data shape.

  • inputs (tensorflow.keras.Layer) – Optional input layer of the model. If not provided, creates a Layer based on input_shape.

  • backbone (str) – name of the backbone to use for the model.

deepcell.datasets

deepcell.datasets.dynamic_nuclear_net

deepcell.datasets.tissue_net

deepcell.datasets.spot_net

Module contents

deepcell.image_generators

fully_convolutional

sample

scale

tracking

deepcell.layers

Custom Layers

location

Layers to encode location data

class deepcell.layers.location.Location2D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Location Layer for 2D cartesian coordinate locations.

Parameters:

data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]
class deepcell.layers.location.Location3D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Location Layer for 3D cartesian coordinate locations.

Parameters:

data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]

normalization

Layers to normalize input images for 2D and 3D data

class deepcell.layers.normalization.ImageNormalization2D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Image Normalization layer for 2D data.

Parameters:
  • norm_method (str) – Normalization method to use, one of: “std”, “max”, “whole_image”, None.

  • filter_size (int) – The length of the convolution window.

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

  • activation (function) – Activation function to use. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).

  • use_bias (bool) – Whether the layer uses a bias.

  • kernel_initializer (function) – Initializer for the kernel weights matrix, used for the linear transformation of the inputs.

  • bias_initializer (function) – Initializer for the bias vector. If None, the default initializer will be used.

  • kernel_regularizer (function) – Regularizer function applied to the kernel weights matrix.

  • bias_regularizer (function) – Regularizer function applied to the bias vector.

  • activity_regularizer (function) – Regularizer function applied to the output of the layer (its “activation”).

  • kernel_constraint (function) – Constraint function applied to the kernel weights matrix.

  • bias_constraint (function) – Constraint function applied to the bias vector.

build(input_shape)[source]
call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]
class deepcell.layers.normalization.ImageNormalization3D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Image Normalization layer for 3D data.

Parameters:
  • norm_method (str) – Normalization method to use, one of: “std”, “max”, “whole_image”, None.

  • filter_size (int) – The length of the convolution window.

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

  • activation (function) – Activation function to use. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).

  • use_bias (bool) – Whether the layer uses a bias.

  • kernel_initializer (function) – Initializer for the kernel weights matrix, used for the linear transformation of the inputs.

  • bias_initializer (function) – Initializer for the bias vector. If None, the default initializer will be used.

  • kernel_regularizer (function) – Regularizer function applied to the kernel weights matrix.

  • bias_regularizer (function) – Regularizer function applied to the bias vector.

  • activity_regularizer (function) – Regularizer function applied to the output of the layer (its “activation”).

  • kernel_constraint (function) – Constraint function applied to the kernel weights matrix.

  • bias_constraint (function) – Constraint function applied to the bias vector.

build(input_shape)[source]
call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]

padding

Layers for padding for 2D and 3D images

class deepcell.layers.padding.ReflectionPadding2D(*args: Any, **kwargs: Any)[source]

Bases: ZeroPadding2D

Reflection-padding layer for 2D input (e.g. picture).

This layer can add rows and columns of reflected values at the top, bottom, left and right side of an image tensor.

Parameters:
  • padding (int, tuple) – If int, the same symmetric padding is applied to height and width. If tuple of 2 ints, interpreted as two different symmetric padding values for height and width: (symmetric_height_pad, symmetric_width_pad). If tuple of 2 tuples of 2 ints, interpreted as ((top_pad, bottom_pad), (left_pad, right_pad)).

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]
class deepcell.layers.padding.ReflectionPadding3D(*args: Any, **kwargs: Any)[source]

Bases: ZeroPadding3D

Reflection-padding layer for 3D data (spatial or spatio-temporal).

Parameters:
  • padding (int, tuple) – The pad-width to add in each dimension. If an int, the same symmetric padding is applied to all three dimensions. If a tuple of 3 ints, interpreted as three different symmetric padding values, one per dimension: (symmetric_dim1_pad, symmetric_dim2_pad, symmetric_dim3_pad). If tuple of 3 tuples of 2 ints, interpreted as ((left_dim1_pad, right_dim1_pad), (left_dim2_pad, right_dim2_pad), (left_dim3_pad, right_dim3_pad))

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]

pooling

Layers for dilated max pooling of 2D and 3D data

class deepcell.layers.pooling.DilatedMaxPool2D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Dilated max pooling layer for 2D inputs (e.g. images).

Parameters:
  • pool_size (int) – An integer or tuple/list of 2 integers: (pool_height, pool_width) specifying the size of the pooling window. Can be a single integer to specify the same value for all spatial dimensions.

  • strides (int) – An integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to specify the same value for all spatial dimensions.

  • dilation_rate (int) – An integer or tuple/list of 2 integers, specifying the dilation rate for the pooling.

  • padding (str) – The padding method, either "valid" or "same" (case-insensitive).

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]
class deepcell.layers.pooling.DilatedMaxPool3D(*args: Any, **kwargs: Any)[source]

Bases: Layer

Dilated max pooling layer for 3D inputs.

Parameters:
  • pool_size (int) – An integer or tuple/list of 3 integers specifying the size of the pooling window. Can be a single integer to specify the same value for all spatial dimensions.

  • strides (int) – An integer or tuple/list of 3 integers, specifying the strides of the pooling operation. Can be a single integer to specify the same value for all spatial dimensions.

  • dilation_rate (int) – An integer or tuple/list of 3 integers, specifying the dilation rate for the pooling.

  • padding (str) – The padding method, either "valid" or "same" (case-insensitive).

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]

tensor_product

Layers to generate tensor products for 2D and 3D data

class deepcell.layers.tensor_product.TensorProduct(*args: Any, **kwargs: Any)[source]

Bases: Layer

Just your regular densely-connected NN layer.

Dense implements the operation:

output = activation(dot(input, kernel) + bias)

where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).

Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

Parameters:
  • output_dim (int) – Positive integer, dimensionality of the output space.

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

  • activation (function) – Activation function to use. If you don’t specify anything, no activation is applied (ie. “linear” activation: a(x) = x).

  • use_bias (bool) – Whether the layer uses a bias.

  • kernel_initializer (function) – Initializer for the kernel weights matrix, used for the linear transformation of the inputs.

  • bias_initializer (function) – Initializer for the bias vector. If None, the default initializer will be used.

  • kernel_regularizer (function) – Regularizer function applied to the kernel weights matrix.

  • bias_regularizer (function) – Regularizer function applied to the bias vector.

  • activity_regularizer (function) – Regularizer function applied to the output of the layer (its “activation”).

  • kernel_constraint (function) – Constraint function applied to the kernel weights matrix.

  • bias_constraint (function) – Constraint function applied to the bias vector.

Input shape:

nD tensor with shape: (batch_size, …, input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape:

nD tensor with shape: (batch_size, …, output_dim). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, output_dim).

build(input_shape)[source]
call(inputs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]
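A small illustration of the Dense-like behavior on a channels_last tensor; the shapes are arbitrary.

import numpy as np
import tensorflow as tf
from deepcell.layers.tensor_product import TensorProduct

# (batch, height, width, channels) input; the channel axis is mapped to output_dim
x = tf.constant(np.random.rand(1, 8, 8, 16), dtype=tf.float32)
layer = TensorProduct(4, activation='relu')
y = layer(x)
print(y.shape)  # expected (1, 8, 8, 4) per the output-shape note above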

upsample

Upsampling layers

class deepcell.layers.upsample.UpsampleLike(*args: Any, **kwargs: Any)[source]

Bases: Layer

Layer for upsampling a Tensor to be the same shape as another Tensor.

Adapted from https://github.com/fizyr/keras-retinanet.

Parameters:

data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

call(inputs, **kwargs)[source]
compute_output_shape(input_shape)[source]
get_config()[source]
resize_volumes(volume, size)[source]

deepcell.losses

deepcell.metrics

deepcell.model_zoo

FeatureNet

PanOpticNet

FPN

Tracking

deepcell.running

deepcell.tracking

deepcell.training

deepcell.utils

Deepcell Utilities Module

backbone_utils

Functions for creating model backbones

deepcell.utils.backbone_utils.featurenet_3D_backbone(input_tensor=None, input_shape=None, n_filters=32, **kwargs)[source]

Construct the deepcell backbone with five convolutional units

Parameters:
  • input_tensor (tensor) – Input tensor to specify input size

  • n_filters (int) – Number of filters for convolutional layers

Returns:

List of backbone layers, list of backbone names

Return type:

tuple

deepcell.utils.backbone_utils.featurenet_3D_block(x, n_filters)[source]

Add a set of layers that make up one unit of the featurenet 3D backbone

Parameters:
  • x (tensorflow.keras.Layer) – Keras layer object to pass to backbone unit

  • n_filters (int) – Number of filters to use for convolutional layers

Returns:

Keras layer object

Return type:

tensorflow.keras.Layer

deepcell.utils.backbone_utils.featurenet_backbone(input_tensor=None, input_shape=None, n_filters=32, **kwargs)[source]

Construct the deepcell backbone with five convolutional units

Parameters:
  • input_tensor (tensor) – Input tensor to specify input size

  • n_filters (int) – Number of filters for convolutional layers

Returns:

List of backbone layers, list of backbone names

Return type:

tuple

deepcell.utils.backbone_utils.featurenet_block(x, n_filters)[source]

Add a set of layers that make up one unit of the featurenet backbone

Parameters:
  • x (tensorflow.keras.Layer) – Keras layer object to pass to backbone unit

  • n_filters (int) – Number of filters to use for convolutional layers

Returns:

Keras layer object

Return type:

tensorflow.keras.Layer

deepcell.utils.backbone_utils.get_backbone(backbone, input_tensor=None, input_shape=None, use_imagenet=False, return_dict=True, frames_per_batch=1, **kwargs)[source]

Retrieve backbones for the construction of feature pyramid networks.

Parameters:
  • backbone (str) – Name of the backbone to be retrieved.

  • input_tensor (tensor) – The input tensor for the backbone. Should have channel dimension of size 3

  • use_imagenet (bool) – Load pre-trained weights for the backbone

  • return_dict (bool) – Whether to return a dictionary of backbone layers, e.g. {'C1': C1, 'C2': C2, 'C3': C3, 'C4': C4, 'C5': C5}. If false, the whole model is returned instead

  • kwargs (dict) – Keyword dictionary for backbone constructions. Relevant keys include 'include_top', 'weights' (should be None), 'input_shape', and 'pooling'.

Returns:

An instantiated backbone

Return type:

tensorflow.keras.Model

data_utils

Functions for making training data

deepcell.utils.data_utils.get_data(file_name, mode='sample', test_size=0.2, seed=0)[source]

Load data from NPZ file and split into train and test sets

Parameters:
  • file_name (str) – path to NPZ file to load

  • mode (str) – if ‘siamese_daughters’, returns lineage information from the .trk file; otherwise, returns the same data that was loaded.

  • test_size (float) – percent of data to leave as testing holdout

  • seed (int) – seed number for random train/test split repeatability

Returns:

dict of training data, and a dict of testing data

Return type:

(dict, dict)

deepcell.utils.data_utils.get_max_sample_num_list(y, edge_feature, output_mode='sample', padding='valid', window_size_x=30, window_size_y=30)[source]

For each set of images and each feature, find the maximum number of samples to be used. This will be used to balance class sampling.

Parameters:
  • y (numpy.array) – mask to indicate which pixels belong to which class

  • edge_feature (list) – [1, 0, 0], the 1 indicates the feature is the cell edge

  • output_mode (str) – ‘sample’ or ‘conv’

  • padding (str) – ‘valid’ or ‘same’

Returns:

list of maximum sample size for all classes

Return type:

list

deepcell.utils.data_utils.relabel_movie(y)[source]

Relabels unique instance IDs to be from 1 to N

Parameters:

y (numpy.array) – tensor of integer labels

Returns:

relabeled tensor with sequential labels

Return type:

numpy.array

deepcell.utils.data_utils.reshape_matrix(X, y, reshape_size=256)[source]

Reshape matrix of dimension 4 to have x and y of size reshape_size. Adds overlapping slices to batches. E.g. reshape_size of 256 yields (1, 1024, 1024, 1) -> (16, 256, 256, 1) The input image is divided into subimages of side length reshape_size, with the last row and column of subimages overlapping the one before the last if the original image side lengths are not divisible by reshape_size.

Parameters:
  • X (numpy.array) – raw 4D image tensor

  • y (numpy.array) – label mask of 4D image data

  • reshape_size (int, list) – size of the output tensor If input is int, output images are square with side length equal reshape_size. If it is a list of 2 ints, then the output images size is reshape_size[0] x reshape_size[1]

Returns:

reshaped X and y 4D tensors in shape[1:3] = (reshape_size, reshape_size), if reshape_size is an int, and shape[1:3] = reshape_size, if reshape_size is a list of length 2

Return type:

numpy.array
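A short usage sketch matching the example sizes quoted above; the arrays here are random placeholders.

import numpy as np
from deepcell.utils.data_utils import reshape_matrix

X = np.random.rand(1, 1024, 1024, 1)
y = np.random.randint(0, 5, size=(1, 1024, 1024, 1))
new_X, new_y = reshape_matrix(X, y, reshape_size=256)
print(new_X.shape)  # (16, 256, 256, 1)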

deepcell.utils.data_utils.reshape_movie(X, y, reshape_size=256)[source]

Reshape tensor of dimension 5 to have x and y of size reshape_size. Adds overlapping slices to batches. E.g. reshape_size of 256 yields (1, 5, 1024, 1024, 1) -> (16, 5, 256, 256, 1)

Parameters:
  • X (numpy.array) – raw 5D image tensor

  • y (numpy.array) – label mask of 5D image tensor

  • reshape_size (int) – size of the square output tensor

Returns:

reshaped X and y tensors in shape (reshape_size, reshape_size)

Return type:

numpy.array

deepcell.utils.data_utils.sample_label_matrix(y, window_size=(30, 30), padding='valid', max_training_examples=10000000.0, data_format=None)[source]

Sample a 4D Tensor, creating many small images of shape window_size.

Parameters:
  • y (numpy.array) – label masks with the same shape as X data

  • window_size (tuple) – size of window around each pixel to sample

  • padding (str) – padding type ‘valid’ or ‘same’

  • max_training_examples (int) – max number of samples per class

  • data_format (str) – ‘channels_first’ or ‘channels_last’

Returns:

4 arrays of coordinates of each sampled pixel

Return type:

tuple

deepcell.utils.data_utils.sample_label_movie(y, window_size=(30, 30, 5), padding='valid', max_training_examples=10000000.0, data_format=None)[source]

Sample a 5D Tensor, creating many small voxels of shape window_size.

Parameters:
  • y (numpy.array) – label masks with the same shape as X data

  • window_size (tuple) – size of window around each pixel to sample

  • padding (str) – padding type ‘valid’ or ‘same’

  • max_training_examples (int) – max number of samples per class

  • data_format (str) – ‘channels_first’ or ‘channels_last’

Returns:

5 arrays of coordinates of each sampled pixel

Return type:

tuple

deepcell.utils.data_utils.trim_padding(nparr, win_x, win_y, win_z=None)[source]

Trim the boundaries of the numpy array to allow for a sliding window of size (win_x, win_y) to not slide over regions without pixel data

Parameters:
  • nparr (numpy.array) – numpy array to trim

  • win_x (int) – number of row pixels to ignore on either side

  • win_y (int) – number of column pixels to ignore on either side

  • win_z (int) – number of z-dimension pixels to ignore on either side

Returns:

trimmed numpy array of size x - 2 * win_x - 1, y - 2 * win_y - 1

Return type:

numpy.array

Raises:

ValueError – nparr.ndim is not 4 or 5

export_utils

Save Keras models as a SavedModel for TensorFlow Serving

deepcell.utils.export_utils.export_model(keras_model, export_path, model_version=0, weights_path=None, include_optimizer=True, overwrite=True, save_format='tf')[source]

Export a model for use with TensorFlow Serving.

DEPRECATED: tf.keras.models.save_model is preferred.

Parameters:
  • keras_model (tensorflow.keras.Model) – Instantiated Keras model.

  • export_path (str) – Destination to save the exported model files.

  • model_version (int) – Integer version of the model.

  • weights_path (str) – Path to a .h5 or .tf weights file.

  • include_optimizer (bool) – Whether to export the optimizer.

  • overwrite (bool) – Whether to overwrite any existing files in export_path.

  • save_format (str) – Saved model format, one of 'tf' or 'h5'.
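
Since export_model is deprecated, the sketch below shows the preferred tf.keras.models.save_model route; the model and the versioned export directory are placeholders:

import tensorflow as tf

# Placeholder model; substitute your trained deepcell model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Save in SavedModel format under a versioned directory, which
# TensorFlow Serving can load directly.
tf.keras.models.save_model(model, 'exported_model/1', save_format='tf')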

deepcell.utils.export_utils.export_model_to_tflite(model_file, export_path, calibration_images, norm=True, location=True, file_name='model.tflite')[source]

Export a saved keras model to tensorflow-lite with int8 precision.

Deprecated since version 0.12.4: The export_model_to_tflite function is deprecated and will be removed in 0.13. Use tf.keras.models.save_model instead.

This export function has only been tested with PanopticNet models. For the export to be successful, the PanopticNet model must have norm_method set to None, location set to False, and the upsampling layers must use bilinear interpolation.

Parameters:
  • model_file (str) – Path to saved model file

  • export_path (str) – Directory to save the exported tflite model

  • calibration_images (numpy.array) – Array of images used for calibration during model quantization

  • norm (bool) – Whether to normalize calibration images.

  • location (bool) – Whether to append a location image to calibration images.

  • file_name (str) – File name for the exported model. Defaults to ‘model.tflite’

io_utils

Utilities for reading/writing files

deepcell.utils.io_utils.get_image(file_name)[source]

DEPRECATED. Use skimage.io.imread instead.

Read image from file and returns it as a tensor.

Parameters:

file_name (str) – path to image file

Returns:

numpy array of image data

Return type:

numpy.array

deepcell.utils.io_utils.save_model_output(output, output_dir, feature_name='', channel=None, data_format=None)[source]

Save model output as tiff images in the provided directory

Parameters:
  • output (numpy.array) – Output of a model. Expects channel to have its own axis.

  • output_dir (str) – Directory to save the model output images.

  • feature_name (str) – Optional description to start each output image filename.

  • channel (int) – If given, only saves this channel.

misc_utils

Miscellaneous utility functions

deepcell.utils.misc_utils.get_sorted_keys(dict_to_sort)[source]

Gets the keys from a dict and sorts them in ascending order. Assumes keys are of the form Ni, where N is a letter and i is an integer.

Parameters:

dict_to_sort (dict) – dict whose keys need sorting

Returns:

list of sorted keys from dict_to_sort

Return type:

list

deepcell.utils.misc_utils.sorted_nicely(ll)[source]

Sort a list of strings by the numerical order of all substrings

Parameters:

ll (list) – List of strings to sort

Returns:

a sorted list

Return type:

list
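
For example, with hypothetical file names, the expected ordering follows the embedded integers rather than lexicographic order:

from deepcell.utils.misc_utils import sorted_nicely

files = ['img_10.tif', 'img_2.tif', 'img_1.tif']
print(sorted_nicely(files))  # expected: ['img_1.tif', 'img_2.tif', 'img_10.tif']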

plot_utils

Utilities plotting data

deepcell.utils.plot_utils.cf(x_coord, y_coord, sample_image)[source]

Format x and y coordinates for printing

Parameters:
  • x_coord (int) – X coordinate

  • y_coord (int) – Y coordinate

  • sample_image (numpy.array) – Sample image used to read the pixel value at the given coordinates

Returns:

formatted coordinates (x, y, z).

Return type:

str

deepcell.utils.plot_utils.create_rgb_image(input_data, channel_colors)[source]

Takes a stack of 1- or 2-channel data and converts it to an RGB image

Parameters:
  • input_data – 4D stack of images to be converted to RGB

  • channel_colors – list specifying the color for each channel

Returns:

transformed version of input data into RGB version

Return type:

numpy.array

Raises:
  • ValueError – if len(channel_colors) is not equal to number of channels

  • ValueError – if invalid channel_colors provided

  • ValueError – if input_data is not 4D, with 1 or 2 channels
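
A minimal sketch on random toy data, assuming a two-channel (e.g. nuclear and membrane) stack:

import numpy as np
from deepcell.utils.plot_utils import create_rgb_image

# Toy 4D stack: 1 image, 64x64 pixels, 2 channels.
input_data = np.random.rand(1, 64, 64, 2)

rgb_images = create_rgb_image(input_data, channel_colors=['green', 'blue'])
print(rgb_images.shape)  # expected: (1, 64, 64, 3)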

deepcell.utils.plot_utils.get_js_video(images, batch=0, channel=0, cmap='jet', vmin=0, vmax=0, interval=200, repeat_delay=1000)[source]

Create a JavaScript video as HTML for visualizing 3D data as a movie.

Parameters:
  • images (numpy.array) – images to display as video

  • batch (int) – batch number of images to plot

  • channel (int) – channel index to plot

  • vmin (int) – lower end of data range covered by colormap

  • vmax (int) – upper end of data range covered by colormap

Returns:

JS HTML to display video

Return type:

str

deepcell.utils.plot_utils.make_outline_overlay(rgb_data, predictions)[source]

Overlay a segmentation mask with image data for easy visualization

Parameters:
  • rgb_data – 3 channel array of images, output of create_rgb_image

  • predictions – segmentation predictions to be visualized

Returns:

overlay image of input data and predictions

Return type:

numpy.array

Raises:
  • ValueError – If predictions are not 4D

  • ValueError – If there is not matching RGB data for each prediction

deepcell.utils.plot_utils.plot_error(loss_hist_file, saved_direc, plot_name)[source]

Plot the training and validation error from the npz file

Parameters:
  • loss_hist_file (str) – full path to .npz loss history file

  • saved_direc (str) – full path to directory where the plot is saved

  • plot_name (str) – the name of plot

deepcell.utils.plot_utils.plot_training_data_2d(X, y, max_plotted=5)[source]
deepcell.utils.plot_utils.plot_training_data_3d(X, y, num_image_stacks, frames_to_display=5)[source]

Plot 3D training data

Parameters:
  • X (numpy.array) – Raw 3D data

  • y (numpy.array) – Labels for 3D data

  • num_image_stacks (int) – number of independent 3D examples to plot

  • frames_to_display (int) – number of frames of X and y to display

tracking_utils

Utilities for tracking cells

train_utils

Utilities for training neural nets

deepcell.utils.train_utils.count_gpus()[source]

Get the number of available GPUs.

Returns:

count of GPUs as integer

Return type:

int

deepcell.utils.train_utils.get_callbacks(model_path, save_weights_only=False, lr_sched=None, tensorboard_log_dir=None, reduce_lr_on_plateau=False, monitor='val_loss', verbose=1)[source]

Returns a list of callbacks used for training

Parameters:
  • model_path (str) – path for the h5 model file.

  • save_weights_only (bool) – if True, then only the model’s weights will be saved.

  • lr_sched (function) – learning rate scheduler per epoch. from rate_scheduler.

  • tensorboard_log_dir (str) – log directory for tensorboard.

  • monitor (str) – quantity to monitor.

  • verbose (int) – verbosity mode, 0 or 1.

Returns:

a list of callbacks to be passed to model.fit()

Return type:

list

deepcell.utils.train_utils.rate_scheduler(lr=0.001, decay=0.95)[source]

Schedule the learning rate based on the epoch.

Parameters:
  • lr (float) – initial learning rate

  • decay (float) – rate of decay of the learning rate

Returns:

A function that takes in the epoch and returns a learning rate.

Return type:

function
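
A brief sketch of combining rate_scheduler with get_callbacks from the same module; the file paths below are placeholders:

from deepcell.utils.train_utils import get_callbacks, rate_scheduler

# Learning rate schedule: the initial rate is decayed each epoch.
lr_sched = rate_scheduler(lr=1e-4, decay=0.99)

# Assemble the callback list for model.fit().
callbacks = get_callbacks(
    'NuclearSegmentation.h5',      # model checkpoint path (placeholder)
    lr_sched=lr_sched,
    tensorboard_log_dir='logs',    # placeholder log directory
    monitor='val_loss',
    verbose=1,
)
# model.fit(..., callbacks=callbacks)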

transform_utils

Utilities for data transformations

deepcell.utils.transform_utils.inner_distance_transform_2d(mask, bins=None, erosion_width=None, alpha=0.1, beta=1)[source]

Transform a label mask with an inner distance transform.

inner_distance = 1 / (1 + beta * alpha * distance_to_center)

Parameters:
  • mask (numpy.array) – A label mask (y data).

  • bins (int) – The number of transformed distance classes.

  • erosion_width (int) – number of pixels to erode edges of each label

  • alpha (float, str) – coefficient to reduce the magnitude of the distance value. If “auto”, determines alpha for each cell based on the cell area.

  • beta (float) – scale parameter that is used when alpha is “auto”.

Returns:

a mask of same shape as input mask, with each label being a distance class from 1 to bins.

Return type:

numpy.array

Raises:

ValueError – alpha is a string but not set to “auto”.
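
A minimal sketch on a toy mask; with the default bins=None the continuous transform is returned (shapes and values are illustrative only):

import numpy as np
from deepcell.utils.transform_utils import inner_distance_transform_2d

# Toy 2D label mask with two square cells; 0 is background.
mask = np.zeros((32, 32), dtype=int)
mask[4:12, 4:12] = 1
mask[18:28, 18:28] = 2

inner = inner_distance_transform_2d(mask, alpha=0.1, beta=1)
print(inner.shape)  # same spatial shape as the input mask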

deepcell.utils.transform_utils.inner_distance_transform_3d(mask, bins=None, erosion_width=None, alpha=0.1, beta=1, sampling=[0.5, 0.217, 0.217])[source]

Transform a label mask for a z-stack with an inner distance transform.

inner_distance = 1 / (1 + beta * alpha * distance_to_center)

Parameters:
  • mask (numpy.array) – A label mask (y data).

  • bins (int) – The number of transformed distance classes.

  • erosion_width (int) – Number of pixels to erode edges of each label

  • alpha (float, str) – Coefficient to reduce the magnitude of the distance value. If 'auto', determines alpha for each cell based on the cell area.

  • beta (float) – Scale parameter that is used when alpha is “auto”.

  • sampling (list) – Spacing of pixels along each dimension.

Returns:

A mask of same shape as input mask, with each label being a distance class from 1 to bins.

Return type:

numpy.array

Raises:

ValueError – alpha is a string but not set to “auto”.

deepcell.utils.transform_utils.inner_distance_transform_movie(mask, bins=None, erosion_width=None, alpha=0.1, beta=1)[source]

Transform a label mask with an inner distance transform. Applies the 2D transform to each frame.

Parameters:
  • mask (numpy.array) – A label mask (y data).

  • bins (int) – The number of transformed distance classes.

  • erosion_width (int) – Number of pixels to erode edges of each label.

  • alpha (float, str) – Coefficient to reduce the magnitude of the distance value. If “auto”, determines alpha for each cell based on the cell area.

  • beta (float) – Scale parameter that is used when alpha is “auto”.

Returns:

A mask of same shape as input mask, with each label being a distance class from 1 to bins.

Return type:

numpy.array

Raises:

ValueError – alpha is a string but not set to “auto”.

deepcell.utils.transform_utils.outer_distance_transform_2d(mask, bins=None, erosion_width=None, normalize=True)[source]

Transform a label mask with an outer distance transform.

Parameters:
  • mask (numpy.array) – A label mask (y data).

  • bins (int) – The number of transformed distance classes. If None, returns the continuous outer transform.

  • erosion_width (int) – Number of pixels to erode edges of each label

  • normalize (bool) – Normalize the transform of each cell by that cell’s largest distance.

Returns:

A mask of same shape as input mask, with each label being a distance class from 1 to bins.

Return type:

numpy.array

deepcell.utils.transform_utils.outer_distance_transform_3d(mask, bins=None, erosion_width=None, normalize=True, sampling=[0.5, 0.217, 0.217])[source]

Transforms a label mask for a z stack with an outer distance transform. Uses scipy’s distance_transform_edt

Parameters:
  • mask (numpy.array) – A z-stack of label masks (y data).

  • bins (int) – The number of transformed distance classes.

  • erosion_width (int) – Number of pixels to erode edges of each label.

  • normalize (bool) – Normalize the transform of each cell by that cell’s largest distance.

  • sampling (list) – Spacing of pixels along each dimension.

Returns:

3D Euclidean Distance Transform

Return type:

numpy.array

deepcell.utils.transform_utils.outer_distance_transform_movie(mask, bins=None, erosion_width=None, normalize=True)[source]

Transform a label mask for a movie with an outer distance transform. Applies the 2D transform to each frame.

Parameters:
  • mask (numpy.array) – A label mask (y data).

  • bins (int) – The number of transformed distance classes.

  • erosion_width (int) – number of pixels to erode edges of each label.

  • normalize (bool) – Normalize the transform of each cell by that cell’s largest distance.

Returns:

a mask of same shape as input mask, with each label being a distance class from 1 to bins

Return type:

numpy.array

deepcell.utils.transform_utils.pixelwise_transform(mask, dilation_radius=None, data_format=None, separate_edge_classes=False)[source]

Transforms a label mask for a z stack into edge, interior, and background classes

Parameters:
  • mask (numpy.array) – tensor of labels

  • dilation_radius (int) – width to enlarge the edge feature of each instance

  • data_format (str) – A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

  • separate_edge_classes (bool) – Whether to separate the cell edge class into 2 distinct cell-cell edge and cell-background edge classes.

Returns:

An array with the same shape as mask, except the channel axis will be a one-hot encoded semantic segmentation for 3 main features: [cell_edge, cell_interior, background]. If separate_edge_classes is True, the cell_edge feature is split into 2 features and the resulting channels are: [bg_cell_edge, cell_cell_edge, cell_interior, background].

Return type:

numpy.array

DeepCell API Key

DeepCell models and training datasets are licensed under a modified Apache license for non-commercial academic use only. An API key for accessing datasets and models can be obtained at https://users.deepcell.org/login/.

For more information about datasets published through DeepCell, please see Deepcell Datasets.

API Key Usage

The token that is issued by users.deepcell.org should be added as an environment variable through one of the following methods:

  1. Save the token in your shell config script (e.g. .bashrc, .zshrc, .bash_profile, etc.)

export DEEPCELL_ACCESS_TOKEN=<token-from-users.deepcell.org>
  2. Save the token as an environment variable during a Python session. Please be careful to avoid committing your token to any public repositories.

import os

os.environ.update({"DEEPCELL_ACCESS_TOKEN": "<token-from-users.deepcell.org>"})

Deepcell Datasets

SpotNet

TissueNet

DynamicNuclearNet

SpotNet

_images/spots.png

SpotNet is a training dataset for a deep learning model for spot detection published in Laubscher et al. 2023.

This dataset is licensed under a modified Apache license for non-commercial academic use only.

The dataset can be accessed using deepcell.datasets with a DeepCell API key.

For more information about using a DeepCell API key, please see DeepCell API Key.

Each batch of the dataset contains two components:

  • X: raw images of fluorescent spots

  • y: coordinate annotations for spot locations

from deepcell.datasets import SpotNet

spotnet = SpotNet(version='1.0')
X_val, y_val = spotnet.load_data(split='val')


TissueNet

_images/multiplex_overlay.png

TissueNet is a training dataset for nuclear and whole cell segmentation in tissues published in Greenwald, Miller et al. 2022.

The TissueNet dataset is composed of a train, val, and test split.

  • The train split is composed of approximately 2600 images, each of which is 512x512 pixels. During training, we select random crops of size 256x256 from each image as a form of data augmentation.

  • The val split is composed of approximately 300 images, each of which is originally of size 512x512. However, because we do not perform any augmentation on the validation dataset during training, we reshape these 512x512 images into 256x256 images so that no cropping is needed in order to pass them through the model. Finally, we make two copies of the val set at different image resolutions and concatenate them all together, resulting in a total of approximately 3000 images of size 256x256.

  • The test split is composed of approximately 300 images, each of which is originally of size 512x512. However, because the model was trained on images that are size 256x256, we reshape these 512x512 images into 256x256 images, resulting in approximately 1200 images (see the sketch after this list).
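
As a hypothetical illustration of this kind of reshaping (not necessarily the exact code used to build the published splits), deepcell.utils.data_utils.reshape_matrix can tile 512x512 images into 256x256 images:

import numpy as np
from deepcell.utils.data_utils import reshape_matrix

# Placeholder arrays standing in for a handful of 512x512 TissueNet images.
X = np.random.rand(4, 512, 512, 2)
y = np.random.randint(0, 10, size=(4, 512, 512, 1))

X_256, y_256 = reshape_matrix(X, y, reshape_size=256)
print(X_256.shape)  # (16, 256, 256, 2): each 512x512 image yields four tiles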

Change Log

  • TissueNet 1.0 (July 2021): The original dataset used for all experiments in Greenwald, Miller et al.

  • TissueNet 1.1 (April 2022): This version of TissueNet has gone through an additional round of manual QC to ensure consistency in labeling across the entire dataset.

This dataset is licensed under a modified Apache license for non-commercial academic use only.

The dataset can be accessed using deepcell.datasets with a DeepCell API key.

For more information about using a DeepCell API key, please see DeepCell API Key.

from deepcell.datasets import TissueNet

tissuenet = TissueNet(version='1.1')
X_val, y_val, meta_val = tissuenet.load_data(split='val')


DynamicNuclearNet

_images/tracked.gif

DynamicNuclearNet is a training dataset for nuclear segmentation and tracking published in Schwartz et al. 2023. The dataset is made up of two subsets, one for tracking and one for segmentation.

This dataset is licensed under a modified Apache license for non-commercial academic use only.

The dataset can be accessed using deepcell.datasets with a DeepCell API key.

For more information about using a DeepCell API key, please see DeepCell API Key.

Tracking

Each batch of the dataset contains three components:

  • X: raw fluorescent nuclear data

  • y: nuclear segmentation masks

  • lineages: lineage records including the cell id, frames present and division links from parent to daughter cells

from deepcell.datasets import DynamicNuclearNetTracking

dnn_trk = DynamicNuclearNetTracking(version='1.0')
X_val, y_val, lineage_val = dnn_trk.load_data(split='val')
data_source = dnn_trk.load_source_metadata()

Segmentation

Each batch of the dataset includes three components:

  • X: raw fluorescent nuclear data

  • y: nuclear segmentation masks

  • metadata: description of the source of each batch

from deepcell.datasets import DynamicNuclearNetSegmentation

dnn_seg = DynamicNuclearNetSegmentation(version='1.0')
X_val, y_val, meta_val = dnn_seg.load_data(split='val')


Deepcell Applications

Caliban: Nuclear Segmentation and Tracking

Mesmer: Tissue Segmentation

Caliban: Nuclear Segmentation and Tracking

Caliban is a pipeline for nuclear segmentation and tracking in live cell imaging datasets.

The models associated with Caliban can be accessed using deepcell.applications with a DeepCell API key.

For more information about using a DeepCell API key, please see DeepCell API Key.

import copy

import imageio
import matplotlib as mpl
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np

from deepcell.applications import NuclearSegmentation, CellTracking
from deepcell.datasets import DynamicNuclearNetSample
def shuffle_colors(ymax, cmap):
    """Utility function to generate a colormap for a labeled image"""
    cmap = mpl.colormaps[cmap].resampled(ymax)
    nmap = cmap(range(ymax))
    np.random.shuffle(nmap)
    cmap = ListedColormap(nmap)
    cmap.set_bad('black')
    return cmap

Prepare nuclear data

x, y, _ = DynamicNuclearNetSample().load_data()
def plot(im):
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.imshow(im, 'Greys_r', vmax=3000)
    plt.axis('off')
    plt.title('Raw Image Data')

    fig.canvas.draw()  # draw the canvas, cache the renderer
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))

    plt.close(fig)

    return image

imageio.mimsave('caliban-raw.gif', [plot(x[i, ..., 0]) for i in range(x.shape[0])])

View .GIF of raw cells

_images/caliban-raw.gif

Nuclear Segmentation

Initialize nuclear model

The application will download pretrained weights for nuclear segmentation. For more information about application objects, please see our documentation.

app = NuclearSegmentation()

Use the application to generate labeled images

Typically, neural networks perform best on test data that is similar to the training data. In the realm of biological imaging, the most common difference between datasets is the resolution of the data measured in microns per pixel. The training resolution of the model can be identified using app.model_mpp.

print('Training Resolution:', app.model_mpp, 'microns per pixel')

The resolution of the input data can be specified in app.predict using the image_mpp option. The Application will rescale the input data to match the training resolution and then rescale to the original size before returning the labeled image.

y_pred = app.predict(x, image_mpp=0.65)

print(y_pred.shape)

Save labeled images as a gif to visualize

ymax = np.max(y_pred)
cmap = shuffle_colors(ymax, 'tab20')

def plot(x, y):
    yy = copy.deepcopy(y)
    yy = np.ma.masked_equal(yy, 0)

    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].imshow(x, cmap='Greys_r', vmax=3000)
    ax[0].axis('off')
    ax[0].set_title('Raw')
    ax[1].imshow(yy, cmap=cmap, vmax=ymax)
    ax[1].set_title('Segmented')
    ax[1].axis('off')

    fig.canvas.draw()  # draw the canvas, cache the renderer
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    plt.close(fig)

    return image

imageio.mimsave(
    './caliban-labeled.gif',
    [plot(x[i,...,0], y_pred[i,...,0])
     for i in range(y_pred.shape[0])]
)

View .GIF of segmented cells

The NuclearSegmentation application was able to create a label mask for every cell in every frame!

_images/caliban-labeled.gif

Cell Tracking

The NuclearSegmentation application worked well, but the labels assigned to the same cell are not preserved across frames. To resolve this problem, we can use the CellTracking application! This object uses a tracking model to compare all cells and determine which detections are the same cell across frames, as well as whether a cell split into daughter cells.

Initalize CellTracking application

Create an instance of deepcell.applications.CellTracking.

tracker = CellTracking()

Track the cells

tracked_data = tracker.track(x, y_pred)
y_tracked = tracked_data['y_tracked']

Visualize tracking results

ymax = np.max(y_tracked)
cmap = shuffle_colors(ymax, 'tab20')

def plot(x, y):
    yy = copy.deepcopy(y)
    yy = np.ma.masked_equal(yy, 0)

    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    ax[0].imshow(x, cmap='Greys_r', vmax=3000)
    ax[0].axis('off')
    ax[0].set_title('Raw')
    ax[1].imshow(yy, cmap=cmap, vmax=ymax)
    ax[1].set_title('Tracked')
    ax[1].axis('off')

    fig.canvas.draw()  # draw the canvas, cache the renderer
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    plt.close(fig)

    return image

imageio.mimsave(
    './caliban-tracks.gif',
    [plot(x[i,...,0], y_tracked[i,...,0])
     for i in range(y_tracked.shape[0])]
)

View .GIF of tracked cells

Now that we’ve finished tracking, not only do the annotations preserve labels across frames, but the lineage information has also been saved in the tracked_data dictionary.

_images/caliban-tracks.gif


Mesmer: Tissue Segmentation

Mesmer can be accessed using deepcell.applications with a DeepCell API key.

For more information about using a DeepCell API key, please see DeepCell API Key.

from matplotlib import pyplot as plt

from deepcell.datasets import TissueNetSample
from deepcell.utils.plot_utils import create_rgb_image, make_outline_overlay
# Download multiplex data
X, y, _ = TissueNetSample().load_data()

Create an RGB overlay of the image data for visualization:

rgb_images = create_rgb_image(X, channel_colors=['green', 'blue'])

Plot the data:

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(X[0, ..., 0], cmap='Greys_r')
ax[1].imshow(X[0, ..., 1], cmap='Greys_r')
ax[2].imshow(rgb_images[0, ...])

ax[0].set_title('Nuclear channel')
ax[1].set_title('Membrane channel')
ax[2].set_title('Overlay')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-input.png')
_images/mesmer-input.png

The application will download pretrained weights for tissue segmentation. For more information about application objects, please see our documentation.

from deepcell.applications import Mesmer
app = Mesmer()

Whole Cell Segmentation

Typically, neural networks perform best on test data that is similar to the training data. In the realm of biological imaging, the most common difference between datasets is the resolution of the data measured in microns per pixel. The training resolution of the model can be identified using app.model_mpp.

print('Training Resolution:', app.model_mpp, 'microns per pixel')

The resolution of the input data can be specified in app.predict using the image_mpp option. The Application will rescale the input data to match the training resolution and then rescale to the original size before returning the labeled image.

segmentation_predictions = app.predict(X, image_mpp=0.5)

Create an overlay of the predictions:

overlay_data = make_outline_overlay(rgb_data=rgb_images, predictions=segmentation_predictions)

Select an index for display:

idx = 0

# plot the data
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(rgb_images[idx, ...])
ax[1].imshow(overlay_data[idx, ...])

ax[0].set_title('Raw data')
ax[1].set_title('Predictions')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-wc.png')
_images/mesmer-wc.png

Nuclear Segmentation

In addition to predicting whole-cell segmentation, Mesmer can also be used for nuclear predictions.

segmentation_predictions_nuc = app.predict(X, image_mpp=0.5, compartment='nuclear')
overlay_data_nuc = make_outline_overlay(
    rgb_data=rgb_images,
    predictions=segmentation_predictions_nuc)

Select an index for display:

idx = 0

# plot the data
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(rgb_images[idx, ...])
ax[1].imshow(overlay_data_nuc[idx, ...])

ax[0].set_title('Raw data')
ax[1].set_title('Nuclear Predictions')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-nuc.png')
_images/mesmer-nuc.png

Fine-tuning the model output

In most cases, we find that the default settings for the model work quite well across a range of tissues. However, if you notice specific, consistent errors in your data, there are a few things you can change.

The first is the interior_threshold parameter. This controls how conservative the model is in estimating what is a cell vs what is background. Lower values of interior_threshold will result in larger cells, whereas higher values will result in smaller cells.

The second is the maxima_threshold parameter. This controls what the model considers a unique cell. Lower values will result in more separate cells being predicted, whereas higher values will result in fewer cells.

To demonstrate the effect of interior_threshold, we’ll compare the default with a much more stringent setting

segmentation_predictions_interior = app.predict(
    X,
    image_mpp=0.5,
    postprocess_kwargs_whole_cell={'interior_threshold': 0.5})
overlay_data_interior = make_outline_overlay(
    rgb_data=rgb_images,
    predictions=segmentation_predictions_interior)

Select an index for display:

idx = 0

# plot the data
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(overlay_data[idx, ...])
ax[1].imshow(overlay_data_interior[idx, ...])

ax[0].set_title('Default settings')
ax[1].set_title('More restrictive interior threshold')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-interior-threshold.png')
_images/mesmer-interior-threshold.png

To demonstrate the effect of maxima_threshold, we’ll compare the default with a much more stringent setting

segmentation_predictions_maxima = app.predict(
    X,
    image_mpp=0.5,
    postprocess_kwargs_whole_cell={'maxima_threshold': 0.8})
overlay_data_maxima = make_outline_overlay(
    rgb_data=rgb_images,
    predictions=segmentation_predictions_maxima)

Select an index for display:

idx = 0

# plot the data
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(overlay_data[idx, ...])
ax[1].imshow(overlay_data_maxima[idx, ...])

ax[0].set_title('Default settings')
ax[1].set_title('More stringent maxima threshold')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-maxima-threshold.png')
_images/mesmer-maxima-threshold.png

Finally, if your data doesn’t include a strong membrane marker, the model will default to just predicting the nuclear segmentation, even in whole-cell mode. If you’d like to add a manual pixel expansion after segmentation, you can do that using the pixel_expansion argument. This will universally apply an expansion after segmentation to each cell.

To demonstrate the effect of pixel_expansion, we’ll compare the nuclear output with expanded output

segmentation_predictions_expansion = app.predict(
    X,
    image_mpp=0.5,
    compartment='nuclear',
    postprocess_kwargs_nuclear={'pixel_expansion': 5}
)
overlay_data_expansion = make_outline_overlay(
    rgb_data=rgb_images,
    predictions=segmentation_predictions_expansion
)

Select an index for display:

idx = 0

# plot the data
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(overlay_data_nuc[idx, ...])
ax[1].imshow(overlay_data_expansion[idx, ...])

ax[0].set_title('Default nuclear segmentation')
ax[1].set_title('Nuclear segmentation with an expansion')

for a in ax:
    a.axis('off')

plt.show()
fig.savefig('mesmer-nuc-expansion.png')
_images/mesmer-nuc-expansion.png

There’s a separate dictionary passed to the model that controls the post-processing for whole-cell and nuclear predictions. You can modify them independently to fine-tune the output. The current defaults used by the model can be found in the Mesmer application source code.
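
For example, a brief sketch of tuning each compartment's dictionary independently, assuming app and X from the code above (the values are illustrative, not recommended settings):

# Whole-cell predictions with both post-processing knobs adjusted at once.
segmentation_predictions_tuned = app.predict(
    X,
    image_mpp=0.5,
    postprocess_kwargs_whole_cell={
        'interior_threshold': 0.3,
        'maxima_threshold': 0.2,
    },
)

# Nuclear predictions with their own, independent post-processing settings.
segmentation_predictions_nuc_tuned = app.predict(
    X,
    image_mpp=0.5,
    compartment='nuclear',
    postprocess_kwargs_nuclear={'pixel_expansion': 3},
)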



deepcell-tf is a deep learning library for single-cell analysis of biological images. It is written in Python and built using TensorFlow 2.

This library allows users to apply pre-existing models to imaging data as well as to develop new deep learning models for single-cell analysis. This library specializes in models for cell segmentation (whole-cell and nuclear) in 2D and 3D images as well as cell tracking in 2D time-lapse datasets. These models are applicable to data ranging from multiplexed images of tissues to dynamic live-cell imaging movies.

deepcell-tf is one of several resources created by the Van Valen lab to facilitate the development and application of new deep learning methods to biology. Other projects within our DeepCell ecosystem include the DeepCell Toolbox for pre and post-processing the outputs of deep learning models, DeepCell Tracking for creating cell lineages with deep-learning-based tracking models, and the DeepCell Kiosk for deploying workflows on large datasets in the cloud. Additionally, we have developed DeepCell Label for annotating high-dimensional biological images to use as training data.

Read the documentation at deepcell.readthedocs.io.

For more information on deploying models in the cloud refer to the Kiosk documentation.

Examples

Raw image and tracked image examples (animated GIFs).

Getting Started

Install with pip

The fastest way to get started with deepcell-tf is to install the package with pip:

pip install deepcell

Install with Docker

There are also docker containers with GPU support available on DockerHub. To run the library locally on a GPU, make sure you have CUDA and Docker v19.03 or later installed. For prior Docker versions, use nvidia-docker. Alternatively, Google Cloud Platform (GCP) offers prebuilt virtual machines preinstalled with CUDA, Docker, and the NVIDIA Container Toolkit.

Once docker is installed, run the following command:

# Start a GPU-enabled container on one GPU
docker run --gpus '"device=0"' -it --rm \
    -p 8888:8888 \
    -v $PWD/notebooks:/notebooks \
    -v $PWD/data:/data \
    vanvalenlab/deepcell-tf:latest-gpu

This will start a Docker container with deepcell-tf installed and start a Jupyter session using the default port 8888. This command also mounts a data folder ($PWD/data) and a notebooks folder ($PWD/notebooks) to the docker container so it can access data and Jupyter notebooks stored on the host workstation. Data and models must be saved in these mounted directories to persist them outside of the running container. The default port can be changed to any non-reserved port by updating -p 8888:8888 to, e.g., -p 8080:8888. If you run across any errors getting started, you should either refer to the deepcell-tf for developers section or raise an issue on GitHub.

For examples of how to train models with the deepcell-tf library, check out the example training notebooks.

DeepCell Applications and DeepCell Datasets

deepcell-tf contains two modules that greatly simplify the development and usage of deep learning models for single cell analysis. The first is deepcell.datasets, a collection of biological images that have single-cell annotations. These data include live-cell imaging movies of fluorescent nuclei (approximately 10,000 single-cell trajectories over 30 frames), as well as static images of whole cells (both phase and fluorescence images - approximately 75,000 single cell annotations). The second is deepcell.applications, which contains pre-trained models (fluorescent nuclear and phase/fluorescent whole cell) for single-cell analysis. Provided data is scaled so that the physical size of each pixel matches that in the training dataset, these models can be used out of the box on live-cell imaging data. We are currently working to expand these modules to include data and models for tissue images. Please note that they may be spun off into their own GitHub repositories in the near future.

DeepCell-tf for Developers

deepcell-tf uses docker and tensorflow to enable GPU processing. If using GCP, there are pre-built images which come with CUDA and Docker pre-installed. Otherwise, you will need to install docker and CUDA separately.

Build a local docker container, specifying the tensorflow version with TF_VERSION

git clone https://github.com/vanvalenlab/deepcell-tf.git
cd deepcell-tf
docker build --build-arg TF_VERSION=2.8.0-gpu -t $USER/deepcell-tf .

Run the new docker image

# '"device=0"' refers to the specific GPU(s) to run DeepCell-tf on, and is not required
docker run --gpus '"device=0"' -it \
-p 8888:8888 \
$USER/deepcell-tf:latest-gpu

It can also be helpful to mount the local copy of the repository and the notebooks to speed up local development. However, if you are going to mount a local version of the repository, you must first run the docker image without the local repository mounted so that the C extensions can be compiled and then copied over to your local version.

# First run the docker image without mounting externally
docker run --gpus '"device=0"' -it \
-p 8888:8888 \
$USER/deepcell-tf:latest-gpu

# Use ctrl-p, ctrl-q (or ctrl+p+q) to exit the running docker image without shutting it down

# Then, get the container_id corresponding to the running image of DeepCell-tf
container_id=$(docker ps -q --filter ancestor="$USER/deepcell-tf")

# Copy the compiled c extensions into your local version of the codebase:
docker cp "$container_id:/usr/local/lib/python3.6/dist-packages/deepcell/utils/compute_overlap.cpython-36m-x86_64-linux-gnu.so" deepcell/utils/compute_overlap.cpython-36m-x86_64-linux-gnu.so

# close the running docker
docker kill $container_id

# you can now start the docker image with the code mounted for easy editing
docker run --gpus '"device=0"' -it \
    -p 8888:8888 \
    -v $PWD/deepcell:/usr/local/lib/python3.6/dist-packages/deepcell/ \
    -v $PWD/notebooks:/notebooks \
    -v $PWD:/data \
    $USER/deepcell-tf:latest-gpu

How to Cite

License

This software is licensed under a modified APACHE2. See LICENSE for full details.

Trademarks

All other trademarks referenced herein are the property of their respective owners.

Credits

Van Valen Lab, Caltech

For more information on deploying DeepCell in the cloud, refer to the DeepCell Kiosk documentation.