
A novel parallel tool for large-scale image enhancement/reconstruction and postprocessing of radar/SAR sensor systems is addressed. The proposed parallel tool performs the following intelligent processing steps: image formation, which applies the system-level image degradation effects of a particular remote sensing (RS) system and simulates random noise; enhancement/reconstruction, which employs nonparametric robust high-resolution techniques; and image postprocessing using the fuzzy anisotropic diffusion technique, which provides better edge-preserving noise removal and a faster diffusion process. This innovative tool allows the processing of high-resolution images provided by different radar/SAR sensor systems, as required by RS end users for environmental monitoring, risk prevention, and resource management. To verify the performance of the proposed parallel framework, the processing steps are implemented and tested on graphics processing units (GPUs), achieving considerable speedups compared with the serial version of the same techniques implemented in the C language.

The amount of data acquired by imaging satellites has been growing steadily in recent years. Many parallel computing and distributed system techniques are used by such imaging systems in novel remote sensing (RS) applications, which require timely responses for swift decisions. Relevant examples include the monitoring of natural disasters such as earthquakes and floods, military applications, and the tracking of man-induced hazards such as forest fires, oil spills, and other types of biological agents. In addition, the acquisition of large-scale RS images with radar/SAR systems produces huge amounts of data [

In this paper, a new parallel tool for processing remotely sensed images is addressed. The proposed parallel tool performs the following processing steps: image enhancement/reconstruction and postprocessing. First, the image formation technique applies different system-level effects in order to degrade the images (i.e., along the range and azimuth directions) and adds random noise. Second, the high-resolution enhancement/reconstruction of the power spatial spectrum pattern (SSP) of the wave field scattered from the extended remotely sensed scene is performed via the descriptive regularization approach with the Robust Space Filter (RSF) and the Robust Adaptive Space Filter (RASF) algorithms [

The set of steps described above provides the end user with a tool for improving the quality of RS images acquired from radar/SAR systems; however, its high computational cost is an important drawback. Therefore, the proposed parallel tool is developed using GPUs. Nowadays, these specialized hardware devices have evolved into highly parallel, multithreaded, many-core processors with tremendous computational speed and very high memory bandwidth [

The rest of the paper is organized as follows. In Section

In this section, we present a brief overview of the general processing chain employed in this study. The addressed methodology for real-time formation/enhancement/reconstruction/post-processing of the RS imagery acquired with radar/SAR systems is described. Figure

RS problem model block diagram.

In this section, we present the summary of the RS imaging problem that was previously developed in [

In this stage, we estimate

In this regard, the solution to the optimization problem derived in previous studies [

In this section, the aggregation of the fuzzy anisotropic diffusion and regularization-based methods for reconstruction/post-processing is described. The authors consider that aggregating these methods in the proposed parallel toolbox increases the flexibility and robustness of edge definition in the reconstructed image. Noise and edges are both high-frequency image components, and conventional image enhancement/reconstruction techniques oriented toward large-scale remote sensing imaging do not work well for edge-preserving smoothing of images corrupted with additive and speckle noise (i.e., a tradeoff between sharpening and blurring must be selected). In this study, the gradient as the edge factor in the anisotropic diffusion is replaced by a rule-based fuzzy anisotropic diffusion. Also, a fuzzy inference system replaces the edge-stopping function, providing better control of the diffusion process.

The aggregation of the reconstruction/post-processing framework makes it possible to preserve high spatial resolution via anisotropic diffusion image post-processing, which implies anisotropic regularizing windowing (WO) over the reconstructed solution in the image space. The following equation represents the celebrated Perona-Malik anisotropic diffusion method [

Now, the discrete version of the anisotropic diffusion equation of (
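The discrete scheme can be sketched in a few lines of NumPy. This is a minimal rendering of the classic four-neighbor Perona-Malik update; the exponential conduction function and the constants `kappa` (edge threshold) and `lam` (step size) are typical textbook choices, not values taken from the paper:

```python
import numpy as np

def perona_malik_step(I, kappa=30.0, lam=0.2):
    """One discrete Perona-Malik diffusion step (four-neighbor scheme).

    kappa and lam are illustrative defaults; lam <= 0.25 keeps the
    explicit scheme stable.
    """
    # Nearest-neighbor differences toward North, South, East, West
    dN = np.roll(I, 1, axis=0) - I
    dS = np.roll(I, -1, axis=0) - I
    dE = np.roll(I, -1, axis=1) - I
    dW = np.roll(I, 1, axis=1) - I

    def g(d):
        # Edge-stopping (conduction) function: small across strong edges
        return np.exp(-(d / kappa) ** 2)

    return I + lam * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)

img = np.random.rand(64, 64)
out = perona_malik_step(img)
```

Iterating this step smooths homogeneous regions, while the conduction term throttles diffusion across strong gradients.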

In this study, we propose to replace each of the nearest-neighbor differences

A fuzzy logic system (FLS) is a rule-based system in which an input is first fuzzified (converted from a crisp number to a fuzzy set) and subsequently processed by an inference engine that retrieves knowledge in the form of fuzzy rules contained in a rule base. The fuzzy sets computed by the fuzzy inference as the output of each rule are then composed and defuzzified (converted from a fuzzy set to a crisp number). The fuzzy rules are defined in Table

Fuzzy rules to compute

Direction | Fuzzy rule |
---|---|
North | IF |
South | IF |
East | IF |
West | IF |
North east | IF |
North west | IF |
South east | IF |
South west | IF |

The proposed fuzzy anisotropic diffusion approach is next described in detail as follows:
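As a concrete illustration of replacing the gradient-based edge-stopping function with fuzzy rules, the following is a hypothetical two-rule simplification (the paper's own rule base uses eight directional rules, one per neighbor, which are not reproduced here):

```python
import numpy as np

def fuzzy_diffusion_coeff(grad, kappa=30.0):
    """Rule-based stand-in for the Perona-Malik edge-stopping function.

    Hypothetical two-rule Mamdani-style sketch:
      R1: IF |grad| is SMALL THEN diffusion is HIGH
      R2: IF |grad| is LARGE THEN diffusion is LOW
    kappa is an illustrative scale, not a value from the paper.
    """
    a = np.abs(grad)
    # Fuzzification: complementary triangular memberships on [0, kappa]
    mu_small = np.clip(1.0 - a / kappa, 0.0, 1.0)
    mu_large = 1.0 - mu_small
    # Defuzzification: weighted average of singleton consequents 1.0 and 0.0
    return (mu_small * 1.0 + mu_large * 0.0) / (mu_small + mu_large)
```

The defuzzified coefficient falls from 1 in flat regions to 0 at strong edges, which is the qualitative behavior the directional rules encode per neighbor.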

Now, we are ready to describe the GPU-based implementation of the intelligent parallel processing toolbox for remotely sensed imagery.

In order to speed up the processing design flow methodology described in the previous section, a parallel framework is used. This parallel tool was developed using the graphic user interface (GUI) of MATLAB and the algorithmic implementation (i.e., image formation/reconstruction/post-processing) with GPU. The proposed approach also allows other functionalities, such as gray-scale image representation, random noising effects, image enhancement/reconstruction, fuzzy edge detection representation, and loading/storing of results for different radar/SAR systems. The developed parallel tool is shown in Figure

Graphic user interface.

In addition, the algorithmic implementation of the processing chain framework uses several processors working with independent data partitions [

The basic unit of work on the GPU is a thread. Every thread acts as if it has its own processor with separate registers and identity that happens to run in a shared memory environment [

The GPU is divided into Streaming Multiprocessors (SMs), and each SM contains cores that either execute an identical instruction or sleep; up to 32 threads may be scheduled at a time, called a warp, but at most 24 warps are active in one SM. Threads in the same multiprocessor can share “Shared Memory” by synchronizing their execution to coordinate accesses to memory. The SM divides registers among threads, and threads access the register memory as local; they have access to local cache memories in the multiprocessor, while the multiprocessors have access to the global GPU (device) memory.

The GPU blocks represent the number of parallel blocks that we launch in our grid. The hardware is free to assign blocks to any processor at any time, and at launch time the execution configuration tells the GPU how many threads to launch per block [
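The familiar mapping from block and thread coordinates to per-pixel global coordinates can be sketched as follows; this is a plain-Python rendering of the CUDA index arithmetic, with hypothetical helper names:

```python
def global_index(block_idx, block_dim, thread_idx):
    """CUDA-style mapping: the 2D global pixel coordinates of one thread
    (blockIdx * blockDim + threadIdx, per axis)."""
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y

def grid_dim(width, height, block_dim):
    """Blocks needed to cover a width x height image (ceiling division)."""
    return (-(-width // block_dim[0]), -(-height // block_dim[1]))
```

With 16 × 16-thread blocks, a 1 K × 1 K image needs a 64 × 64 grid; threads whose coordinates fall outside the image simply return without writing.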

A 2D hierarchy of blocks and threads in an SM.

One of the first decisions to make before developing the GPU implementation concerns the memory mapping of the data, that is, how the GPU memory will be used to improve performance. To that end, we assume that the image fits into GPU memory; handling larger images will be addressed in future work. However, instead of just copying and using the value of each pixel, we use textures. Textures are bound to global memory and can provide both caching and some limited, 9-bit processing capabilities [

On the other hand, the fuzzy rules are common to all threads and do not change over time, so for them we use another kind of memory known as constant memory. Constant memory is used for data that will not change over the course of a kernel execution. In some situations, using constant memory rather than global memory reduces the required memory bandwidth [

Considering the aggregation of parallel techniques with GPU computing, we next describe the efficient GPU-based implementation of each processing stage of the proposed parallel tool: GPU implementation of the (i) image formation; (ii) image enhancement/reconstruction; (iii) image post-processing.

Now, we describe the specific functions and CUDA kernels used for efficient implementation of image formation/reconstruction/post-processing on the GPU.

First, a kernel is implemented to perform the image formation processing, applying the conventional Matched Space Filter (MSF) algorithm [

Second, the computational procedures for the implementation of the RSF/RASF reconstructive algorithms are described. We start with the memory configuration management using the “mmap” library of GNU C. With this library, the image can be accessed from the CPU with a simple pointer, without the need to read each pixel value individually. Next, the matrix-matrix multiplication operations are computed. A specific CUDA kernel is implemented using one grid of
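The tiled structure of such a matrix-matrix multiplication kernel can be mimicked on the CPU. In the NumPy sketch below (not the actual CUDA kernel, and the tile size of 16 is only illustrative), each tile of the product accumulates partial products of one tile of each input per iteration, which is exactly the access pattern a shared-memory CUDA kernel exploits:

```python
import numpy as np

def blocked_matmul(A, B, tile=16):
    """Tile-by-tile matrix product.

    Each (i, j) tile of C accumulates partial products of one tile of A
    and one tile of B per step along the shared dimension p -- mirroring
    how a shared-memory CUDA kernel stages tiles before multiplying.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must agree"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):  # walk along the shared dimension
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile]
                )
    return C

A = np.random.rand(64, 48)
B = np.random.rand(48, 32)
C = blocked_matmul(A, B)
```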

Next, the extern function Perona-Malik is called from the host; it binds the texture to the array, calls the “fuzzy” kernel for image enhancement, and normalizes the resulting array using the optimized NVIDIA Performance Primitives [

After that, the implication method (min function) is applied, followed by an aggregation method (max function); finally, the values are defuzzified by applying the anisotropic diffusion.

The implication method computes Imp_{Y} as the minimum between the resulting Imp_{White} and Imp_{Black}. Finally, the anisotropic diffusion was applied using (
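A generic min-implication/max-aggregation (Mamdani-style) step with centroid defuzzification can be sketched as follows; the paper's specific Imp_{White}/Imp_{Black} membership functions are not reproduced, so the output sets below are placeholders:

```python
import numpy as np

def mamdani_min_max(rule_strengths, consequents):
    """Min implication then max aggregation over a shared output universe.

    rule_strengths: firing strength of each rule (scalar in [0, 1])
    consequents: membership values of each rule's output set on the grid
    """
    clipped = [np.minimum(w, mu) for w, mu in zip(rule_strengths, consequents)]
    return np.maximum.reduce(clipped)  # aggregated output fuzzy set

def centroid_defuzzify(y, mu):
    """Crisp output as the centroid of the aggregated fuzzy set."""
    return float(np.sum(y * mu) / np.sum(mu))

# Placeholder "black"/"white" output sets on a [0, 1] universe
y = np.linspace(0.0, 1.0, 101)
agg = mamdani_min_max([0.5, 0.5], [1.0 - y, y])
crisp = centroid_defuzzify(y, agg)
```

Here both rules fire equally, so by symmetry the crisp output is 0.5.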

The resulting pixel values are independent of one another; hence, the fuzzy calculations are well suited to a GPU implementation. Each thread is responsible for computing the fuzzy anisotropic diffusion of a part of the image. This image part is selected as shown in Figure

The resulting pixel values are floating-point numbers; hence, to ease visualization, they are normalized to unsigned char values for each channel. This task is performed before displaying the resulting image but is optional in the case of post-processing. We take advantage of optimized functions provided by the NVIDIA Performance Primitives [
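The normalization amounts to a min-max rescaling to the unsigned-char range. A minimal sketch (the actual implementation uses the optimized NPP routines):

```python
import numpy as np

def to_uint8(img):
    """Min-max rescale a float image to the [0, 255] unsigned-char range."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255.0).astype(np.uint8)
```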

In this section, we carry out the experimental validation of the proposed parallel toolbox system. Also, the time performance analysis is performed in order to demonstrate the time processing improvement achieved with the GPU-based implementation. The experimental case studies are next described for a high-resolution RS image acquired from an SAR system.

In these experiments, the validation was performed with large scales (1

In the reconstruction and post-processing stage, we evaluate the GPU-based implementation of the well-known reconstructive and enhancement post-processing algorithms. The validation compares the original scene frame, the degraded RS image, and the results of the reconstructive RASF algorithm, the Perona-Malik (PM) technique, and the fuzzy anisotropic diffusion technique.

In analogy to the image reconstruction, for quantitative evaluation of the RS reconstruction performances, the quality metric defined as the improvement in the output signal-to-noise ratio (IOSNR) is employed. This metric is defined as follows:
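In its commonly used form (assumed here, since the paper's equation is not reproduced in this excerpt), the IOSNR compares the error energy of the degraded (MSF-formed) image with that of the reconstructed image, both measured against the original scene:

```python
import numpy as np

def iosnr(original, degraded, reconstructed):
    """IOSNR in dB: error energy of the degraded image relative to that
    of the reconstructed image, both against the original scene.
    Higher values indicate better reconstruction/post-processing."""
    e_deg = np.sum((degraded - original) ** 2)
    e_rec = np.sum((reconstructed - original) ** 2)
    return 10.0 * np.log10(e_deg / e_rec)
```

For instance, halving the pixel-wise error everywhere yields an IOSNR of 10·log10(4) ≈ 6.02 dB.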

IOSNR of the aggregated RASF-PM and RASF-fuzzy anisotropic diffusion algorithms evaluated for different SNRs.

SNR (dB) | RASF-PM method (dB) | RASF-fuzzy method (dB) |
---|---|---|
5 | 7.13 | 8.15 |
10 | 7.92 | 10.37 |
15 | 9.75 | 11.74 |
20 | 10.86 | 13.65 |

According to this quality metric, the higher the IOSNR (the average over 100 realizations), the better the improvement of the image reconstruction/post-processing with the particular employed algorithm.

Next, the qualitative results are presented in Figures

Experimental results: (SNR = 20 dB): (a) original test scene of lower Manhattan; (b) degraded scene image formed applying the MSF method; (c) image reconstructed applying the regularized RASF algorithm; (d) image enhancement applying the RASF-PM post-processing algorithm; (e) image applying the fuzzy edge detection; (f) image enhancement using the RASF-fuzzy post-processing algorithm.

Experimental results: (SNR = 20 dB): (a) original test scene which corresponds to the Pentagon region; (b) degraded scene image formed applying the MSF method; (c) image reconstructed applying the regularized RASF algorithm; (d) image enhancement applying the RASF-PM post-processing algorithm; (e) image applying the fuzzy edge detection; (f) image enhancement using the RASF-fuzzy post-processing algorithm.

From the analysis of the qualitative and quantitative simulation results reported in Figures

Next, we compared the required processing time for two different implementation schemes as reported in Tables

Comparative feature analysis of the employed GPU.

GPU features | GTS 450 | Tesla C2075 |
---|---|---|
Peak single precision floating point performance | 601 Gflops | 1030 Gflops |
Memory bandwidth (ECC off) | 57.7 GB/sec | 148 GB/sec |
Memory size (GDDR5) | 1 GB | 6 GB |
CUDA compute capability | 2.1 | 2.0 |
CUDA cores | 192 | 448 |

Comparative analysis of processing times for model-free fuzzy anisotropic diffusion method.

Image size | MATLAB total | GTS 450 (Desktop GPU) total | GTS 450 kernel | Tesla C2075 (HPC server) total | Tesla C2075 kernel |
---|---|---|---|---|---|
1 K × 1 K | 9626 ms | 897 ms | 357 ms | 98 ms | 39 ms |
3 K × 3 K | 79670 ms | 4814 ms | 1916 ms | 503 ms | 200 ms |

Next, considering the implementation on a Dell PowerEdge server equipped with an NVIDIA Tesla C2075 GPU, we achieved kernel-processing times as low as 39 ms, more than 10 times faster than desktop GPU processing with an NVIDIA GeForce GTS 450.

The overall processing time for each stage of the aggregation of the reconstructive model-based RASF method and the model-free fuzzy anisotropic diffusion method is 98 ms on the Tesla and 897 ms on the GTS; considering only the fuzzy anisotropic kernel, the times are 39 ms on the Tesla and 357 ms on the GTS. The speedup of each stage is over 200 times compared with the MATLAB implementation and 10 times compared with the desktop GPU implementation, as presented in the comparative processing-time analysis of Table

In this paper, a parallel tool for large-scale remote sensing images was presented. This tool performs image formation and the reconstruction of images using two well-known techniques: the Robust Space Filter (RSF) and the Robust Adaptive Space Filter (RASF). It also applies a post-processing stage based on the Perona-Malik algorithm jointly with a fuzzy logic system (FLS), named in this paper the fuzzy anisotropic diffusion technique. The toolbox was implemented using a MATLAB graphic user interface (GUI), and the mathematical operations were computed using graphics processing units (GPUs). Under this paradigm, it was possible to save significant processing time by adapting the algorithms for parallel implementation. For the presented case studies, it was shown that the processing time was reduced by more than 200 times in comparison with a PC-based implementation. Thus, this approach represents a powerful parallel tool for processing huge amounts of large-scale RS images.

This study was supported by Consejo Nacional de Ciencia y Tecnologia (CONACYT), Mexico, under Grant CB-2010- 01-158136.