Title: An Accelerated Watershed Algorithm and Its Hardware Design Based on Pixel Reduction Techniques (original title: 一個基於像素減量的快速分水嶺演算法及其硬體設計)
Authors: Tung-Yang Pan
Contributors: NTOU:Department of Electrical Engineering
Keywords: Image Segmentation; Watershed Algorithm; Smoothing; Gradient Thresholding
Date: 2005
Issue Date: 2011-07-04
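Abstract: Image segmentation is a very important step in image processing and plays a key role in applications such as image recognition and image compression. The watershed algorithm is a commonly used but computationally expensive segmentation method. Many recent mobile phones and PDAs have added digital image processing and transmission functions, but because of low-power requirements, the CPUs in these products usually struggle with high-complexity computation, and image segmentation naturally burdens them as well. Based on the co-processor concept, this work designs and implements a circuit architecture dedicated to watershed computation, so that the CPU can hand the complex watershed computation to this circuit and reduce its own load. We modify both the pre-processing and the watershed computation flow to lower the overall amount of computation. The smoothing step in pre-processing is replaced by a pixel reduction method, which reduces the number of pixels to be processed, shortens every processing step, and lowers the number of operations in all subsequent stages. The complex gradient computation is also modified to greatly reduce the amount of computation without affecting its result. After the watershed computation is complete, the watershed lines are restored to the original pixels, and behavioral simulation in MATLAB 7.1 confirms that the segmented regions are as expected. The circuit design is divided into six modules: a vertical pixel reduction module, a horizontal pixel reduction module, a gradient module, a sorting module, a watershed computation module, and a watershed restoration module. Each module was simulated and verified with ModelSim to confirm correct function and correct results, then synthesized, placed, and routed with Xilinx ISE 7.1. Finally, the six modules were integrated, ModelSim verification confirmed that the generated watershed lines lie at the expected positions, and the complete circuit was synthesized onto a single FPGA chip. The circuit runs at up to 79.890 MHz and uses a total of 4173 4-input LUTs. The six modules can operate simultaneously, and each can also operate independently.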
In this thesis we propose a series of changes to the well-known watershed image segmentation algorithm that make it more suitable for hardware implementation with simple digital circuits. The first change replaces the traditional smoothing pre-processing step with a pixel reduction process, which not only greatly reduces the amount of computation but also simplifies all the steps that follow. The gradient thresholding step that comes next is likewise modified to simplify the computation without loss of effectiveness. We complete the process with a technique that reconstructs watershed lines from the isolated points obtained from the pixel-reduced image. The algorithm has been verified with MATLAB simulations. Once its results were shown to be acceptable, we designed a digital circuit that performs image segmentation based on our algorithm, including the pre-processing steps. The synthesizable RTL models are described in Verilog, simulated with ModelSim, and synthesized on a Xilinx platform. Simulation results show that the clock rate reaches 79.89 MHz, which is well above the processing speed needed when the circuit serves as a special-purpose image co-processor for mobile devices, most of which carry low-power CPUs that lack the computing power for real-time image functions. The FPGA implementation of the circuit consumes 4173 LUTs, not counting memories.
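As a rough illustration of the pre-processing described in the abstract, the following Python sketch chains a vertical pixel reduction, a horizontal pixel reduction, and a gradient thresholding step. The abstract does not state which reduction or gradient operators the thesis uses, so everything here is an assumption made for illustration: the function names, the pairwise averaging used for reduction, the maximum absolute neighbour difference used as the gradient, and the threshold value are all hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as a list of rows


def reduce_vertical(img: Image) -> Image:
    """Halve the height by averaging adjacent row pairs (assumed operator).

    A trailing unpaired row is dropped when the height is odd.
    """
    return [
        [(a + b) // 2 for a, b in zip(img[r], img[r + 1])]
        for r in range(0, len(img) - 1, 2)
    ]


def reduce_horizontal(img: Image) -> Image:
    """Halve the width by averaging adjacent column pairs (assumed operator).

    A trailing unpaired column is dropped when the width is odd.
    """
    return [
        [(row[c] + row[c + 1]) // 2 for c in range(0, len(row) - 1, 2)]
        for row in img
    ]


def gradient_threshold(img: Image, threshold: int) -> Image:
    """Simple gradient with thresholding (assumed operator).

    The gradient at a pixel is the larger of the absolute differences to its
    right and lower neighbours; border pixels reuse themselves as neighbour.
    Gradients below `threshold` are set to 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = abs(img[r][min(c + 1, w - 1)] - img[r][c])
            dy = abs(img[min(r + 1, h - 1)][c] - img[r][c])
            g = max(dx, dy)
            out[r][c] = g if g >= threshold else 0
    return out


if __name__ == "__main__":
    # 4x4 test image: dark left half, bright right half.
    img = [[10, 10, 200, 200] for _ in range(4)]
    small = reduce_horizontal(reduce_vertical(img))
    print(small)                          # [[10, 200], [10, 200]]
    print(gradient_threshold(small, 50))  # [[190, 0], [190, 0]]
```

In this sketch the 4x4 input shrinks to 2x2 before the gradient is computed, so every later stage touches a quarter of the original pixels, which is the kind of saving the abstract attributes to pixel reduction. The sorting, watershed computation, and watershed line restoration stages are not sketched because the abstract gives no detail on how they work.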
