Connected Components (Connected Components)

Statistical 3D Segmentation With Greedy Connected Component Labeling

Authors: Jiuxiang Hu, Gerald Farin, Matthew Holecko II, Stephen P. Massia, Gregory Nielson, Anshuman Razdan

   URL:
     http://prism.asu.edu/research/data/publications/paper02_s3dswgcclr.pdf#search=%223D%20connected%20components%20greedy%22

Problem: Segmenting protein and nuclei and electrode implant structures to reveal important biological information.

Data Structure: 3D data is represented as 2D plane slices. Typical size: Example A: 512 x 512 with 58 slices. Example B: 512 x 512 with 78 slices. An equivalence table is used to store the information about the equivalence of labels from the same component.

Efficiency: classical CCL pays more attention to the cost of memory then time complexity compared to GCCL (greedy connected component labeling).

Data	Data Size	Number of Components	Largest Component	Time(s)
GFAP & Nuclei	29.9MB	2514	87536	4
GFAP & Implant	63.9MB	3115	234976	8

-Memory: size of the equivalence table is bounded to O(N) for a N x N 2D image. In 3D the equivalence table is O(the width of the connected component)

-Time: O(N) where N is the number of voxels in the image. I believe the greedy algorithm computes between 15 to 20 million voxels less then 10 seconds.

Implementation: Semi-automatic using 2 phases.

Phase 1: Initial Segmentation of image

-Label the voxels - simplest way to label voxels is as the target or background

-uses maximum likelihood estimation to find Weibull parameters.

Weibull Distribution: is most commonly used in life data analysis because of its applications to extreme value theory. The Weibull distribution is often used in place of the normal distribution. It is also a very popular statistical model in reliability engineering and failure analysis

Phase 2: Refined Image Segmentation using Greedy Connected Component Labeling

Why do we need greedy connected component labeling? After phase 1 the segmentation results in many small isolated regions, therefore we use greedy connected component algorithm to isolate the significant components.

Advantages:

Wiebull parameters allow local and global information are taken into account for segmentation

Disadvantages:

Main Problem: GCCL may repeat the labeling of a connected component
Semi-automatic: requires user input and expertise

Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines

Authors: Steven S Lumetta, Arvind Krishnamurthy, and David E. Culler

URL: http://www.eecs.berkeley.edu/Research/Projects/CS/parallel/castle/split-c-apps/connect/

Propose: Presents a fast, portable, general purpose algorithm for finding connected components on a distributed memory machine.

Implementation: We want to minimize the high cost of remote access which is typically hundreds of time higher then local access, but still utilize the performance advantage of using parallel processes. Therefore a parallel hybrid algorithm that blends a sequential algorithm optimized for local access with an efficient algorithm optimized for the global(remote) phases

Local: variation of depth-first search Global: powerful efficient variant of Shiloach-Vishkin

Programming Language: Split-C: Split-C is a parallel extension of the C programming language that supports efficient access to a global address space on distributed memory multiprocessors.

Algorithm:

Local Phase: Perform a local DFS on each processor's portion of the graph, collapsing each local connected component into a representative node. Mark each component of this global graph with a unique value for the global phase. Execution time is O(n), where n is the number of nodes
Global Initialization: Complete the preparation of the global graph by pointing each remote edge at one of the local representative nodes selected in the local phase.
Global: Beginning with the reduced global graph of representative nodes and remote edges, repeat the following:

Termination Check: If no components remain to be processed on any processor, quit.
Hooking Step: Attempt to merge each component into another component by collapsing an edge. Conditions on merging guarantee that representative nodes remain unique.
Star Formation Step: Collapse newly formed components into trees of height one (stars) to ensure that representative nodes in a component have a single, consistent value.
Self-Loop Removal Step: For each component, remove edges that point to nodes within the component, i.e., nodes with the same value.
Edge-List Concatenation Step: For all leaf of components, move remaining edges to the root.

4. Local Cleanup: For each node not selected as a representative node in the global graph, update the value of the node from its representative.

Efficiency: Algorithm is used on three large-scale parallel machines: Cray T3D, the Meiko CS-2 and the Thinking Machines CM-5

	T3D	CS-2	CM-5
Clock (MHz)	150 (1.00)	90 (0.60)	33 (0.22)
Comm. (Mreads/s)	1.00 (1.00)	0.05 (0.05)	0.08 (0.08)
DFS (Mn/s)	0.78 (1.00)	0.50 (0.64)	0.22 (0.29)

Advantages:

Hybrid solution has best performance to date on connected component problem (or so they claim). The paper does not provide any information on the size of the datasets in the number of voxels.

Disadvantages:

Requires high communication performance to obtain good speed ups in performance
Although sequential solution can be simple, parallel solution can be very challenging in practice

Tracking and Matching Connected Components from 3D Video

Authors: David da Silva Pires, Roberto M. Cesar-Jr.

Purpose: Presents new real-time solution for tracking geometrical connected components and texture information provided by the 3D video stream.

Implementation:

1. Identifying 3D Connected Components in each frame.

2. Track the Connected Components between frames. Utilizing the fact that subsequent frames have high intersection between

corresponding connected components.

3. Identify corresponding salient points.

-Texture Alignment - Alignment is important for compression and scene reconstruction.

-Identification of salient pointing in the geometry data (frame t)

-Identification of corresponding salient point in the geometry data (frame t–1)

4. Estimate the geometrical transformation for 3D registration of the connected components.

-The steps also allow noise filtering

-Method also deals with different events, such as when connected components appear, disappear or split.

Efficiency: Real-time corresponding to a frame rate of 30 fps using high resolution video. Therefore identification and tracking of connected components happens at video rate time.

Applications: Tracking Methods are important in computer vision for feature detection and extraction, matching, image alignment and stitching panorama generation.

Coronary Vessel Cores from 3D Imagery: a Topological Approach

Authors: Andrzej Szymczak and Allen Tannenbaum and Konstantin Mischaikow

URL: http://www-static.cc.gatech.edu/~andrzej/papers/spie.pdf

The Algorithm: Two steps and is a variant of the union-find algorithm

1) Compute Robust Maxima

-Purpose: Generate points near the centerlines of blood vessels at each 2D slice of data.

-This step runs in O(nlogn) where n is the number of all the voxels in the input volume. High cost is a result of sorting

2) Vessel Reconstruction from the Robust Maxima

-Purpose: Reconstruct the blood vessel trees from the set A obtained in step 1 by connecting the neighboring points on the same blood vessel trees.

-First create a basic representation by creating a forest F by connecting input points with short edges. Then use geometric filters to improve the image.

Advantages:

Medical datasets (including those from CT scans) are noisy. A topological method is good at finding and describing persistent features that aren’t strongly influenced by local noise.
Modeling of an object that has a one-dimensional structure (blood vessels)

Disadvantages:

Algorithm exhibits far from optimal run time performance.

Efficiency:

Datasets are between 10 and 15 million voxels
Run-Time is about 6 minutes on a Pentium III-850MHz workstation

Current Work

To add more interesting 3D connected component implementations to the literature survey
Provide all references in the literature survey in Bibtex form
To brainstorm a creative idea(s) for a research project. Some possible ideas below
Real-time product testing: I thought of this idea after taking with a chemical engineer working for Gillette. Gillette is marketing a new kind of razor that has a hardened soup-like solution attached to the razor so the customer does not require shaving cream. Currently they manually inspect each razor to see if there are cracks in the solution. If there are cracks, the product does not meet testing requirements and isn't used. Could we automate this process or a process similar to this? Possibly using X-rays to provide the datasets. What about detecting cracks in glass?
Another possibility is to find an application when we need to know whether two voxels are in the same component. We know if two voxels are in the same component if both their parents point to the same root. This can be used to determine if there is a path from one end of a structure to another. This might have applications in determining how water penetrates concrete.
Video Tracking: We could create a video from a bunch of still frames running at about 30fps. We might be able to track components as they move in real time. This is similar to the 3D video tracking and matching work done above, but with a different implementation based off Connect.