An N-Point Statistics Framework for Predicting Tissue Traits in Biomedical Images 16.07


Project Start Date: Jul 1, 2016
Research Areas: Analytics, Analytics - Machine Learning
Funding: Member Funded

Project Summary

One of the many factors considered by pathologists when assessing tissue samples during cancer diagnosis is the “architecture” of the cells in the tumor.  This architecture is defined by the spatial distribution and structure of the cells in the tissue. For example, cells can be tightly packed in orderly disk-like clumps, or they may be more chaotic and dispersed in finger-like formations, or they may be gathered in circular structures with hollow centers.  These formations contribute to a pathologist’s evaluation of the severity, subtype and grade of the tumor, an evaluation that is used to determine the most appropriate and effective treatment for the patient.

In our work, we develop computational tools that can automate and assist in the assessment of cancerous tumors by analyzing images of immunohistochemically stained (IHC) tissues extracted from the tumors. This project has focused specifically on exploring and developing new techniques for capturing and characterizing the spatial distribution of cells in tissues. Our approach uses N-point statistics to gather spatial information. For each integer k, we calculate the average distance from every cell nucleus to its kth nearest neighboring nucleus. For any analyzed spatial distribution, the calculation produces a discrete, sampled function, which we call a k-nearest-neighbor (k-NN) curve, that contains the mean distances to the closest, second-closest, …, and kth-closest points as a function of the integer k. A low-dimensional feature space that captures certain types of spatial structures can then be defined by applying Principal Component Analysis (PCA) to the k-NN curves derived from a set of representative training data. To keep the analysis independent of the images being analyzed, we do not compute the feature space from actual sample images; instead, we generate numerous synthetic datasets that mimic the formation of cell clumps and rings in tissues.
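
As an illustration of the k-NN curve described above, the following is a minimal sketch in pure Python. It is not the project's implementation: the function name `knn_curve`, the parameter `k_max`, and the brute-force distance computation are assumptions made for clarity. It returns, for k = 1, …, `k_max`, the mean over all points of the distance to the kth nearest neighbor.

```python
import math


def knn_curve(points, k_max):
    """Return [d_1, ..., d_k_max], where d_k is the mean distance from
    each point to its k-th nearest neighbor (hypothetical helper)."""
    n = len(points)
    if not 1 <= k_max < n:
        raise ValueError("k_max must satisfy 1 <= k_max < number of points")
    sums = [0.0] * k_max
    for i, p in enumerate(points):
        # Distances from point i to every other point, sorted ascending,
        # so dists[k - 1] is the distance to the k-th nearest neighbor.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for k in range(k_max):
            sums[k] += dists[k]
    # Average each k-th neighbor distance over all points.
    return [s / n for s in sums]


# Toy point set: the four corners of a unit square. Every corner has two
# neighbors at distance 1 and one at distance sqrt(2), so the curve is
# approximately [1.0, 1.0, 1.414].
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
curve = knn_curve(square, 3)
print(curve)
```

The brute-force loop costs O(n² log n) time for n points. For images with many nuclei, a spatial index such as a k-d tree would likely be a better choice, but that is outside the scope of this sketch.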

To produce a 2D feature that characterizes the spatial structure of cell nuclei in a test IHC image, the image is first segmented to locate the centers of the cell nuclei. The k-NN curve is then computed for this point set, and the curve is projected onto the first and second principal components derived from the synthetic datasets. The figure demonstrates that these 2D features can discriminate between different types of spatial structures extracted from real IHC images: structures with different levels of density, compactness and circularity map into different regions of the feature space.
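
The pipeline described above can be sketched end to end with NumPy. This is a hedged illustration, not the project's code. The synthetic generators (`synthetic_clump`, `synthetic_ring`), their parameters, the value `K_MAX = 20`, the training-set size, and the SVD-based PCA are all assumptions. The "test" point set here is itself synthetic, standing in for nuclei centers that would come from segmenting a real IHC image.

```python
import numpy as np


def knn_curve(points, k_max):
    """Mean distance to the 1st..k_max-th nearest neighbor (length-k_max array)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return dist[:, 1:k_max + 1].mean(axis=0)


def synthetic_clump(rng, n, radius):
    """Hypothetical generator: n points uniformly filling a disk."""
    r = radius * np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def synthetic_ring(rng, n, radius, width):
    """Hypothetical generator: n points in an annulus with a hollow center."""
    r = radius + width * rng.uniform(-0.5, 0.5, size=n)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))


def fit_pca(curves, n_components=2):
    """PCA via SVD of the mean-centered curves; returns (mean, components)."""
    X = np.asarray(curves, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(curve, mean, components):
    """Map one k-NN curve onto the leading principal components."""
    return components @ (np.asarray(curve, dtype=float) - mean)


rng = np.random.default_rng(0)
K_MAX = 20  # assumed curve length

# Build the feature space from synthetic clumps and rings only.
train = []
for _ in range(30):
    train.append(knn_curve(synthetic_clump(rng, 100, rng.uniform(5, 20)), K_MAX))
    train.append(knn_curve(synthetic_ring(rng, 100, rng.uniform(5, 20), 2.0), K_MAX))
mean, components = fit_pca(train)

# Stand-in for nuclei centers obtained by segmenting a test image.
test_points = synthetic_ring(rng, 100, 10.0, 2.0)
feature = project(knn_curve(test_points, K_MAX), mean, components)
print(feature)  # 2D feature: coordinates along the first two components
```

Because the principal components are fitted only on synthetic curves, the projection step never depends on the images being analyzed, which mirrors the independence goal stated in the summary.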