Thank you, farhada and AdvaRes, for the responses. The target algorithm is based on stereo matching: two cameras are placed side by side like a pair of human eyes, depth information is estimated from the image pair, and a plausible view is then interpolated as if it had been taken from an intermediate position. The depth estimation consists of many for-loops with few data dependencies between iterations, so I think it has the potential to be highly parallelized. The algorithm is already implemented on a PC, but it currently processes only one image pair per second (Pentium IV, 3.0 GHz). A GPU does much better, but we plan to move to embedded platforms and eventually to an ASIC.
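To illustrate the "few data dependencies" point, here is a minimal sketch. It assumes plain SAD (sum of absolute differences) block matching with winner-take-all disparity selection on rectified grayscale images. This is a common textbook baseline, not necessarily the algorithm described above, and the function name `sad_disparity` and its parameters are hypothetical. Each output pixel reads only the two input images and writes only its own result, so the two outer loops could be unrolled into parallel hardware lanes or spread across cores.

```python
def sad_disparity(left, right, max_disp, radius):
    """Hypothetical example: winner-take-all disparity via SAD block matching.

    left, right: rectified grayscale images as 2-D lists of equal shape.
    max_disp:    largest disparity (horizontal shift in pixels) to test.
    radius:      half-width of the square matching window.
    Returns a 2-D list of integer disparities; border pixels stay 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    # Each (y, x) iteration reads only the inputs and writes only disp[y][x],
    # so there are no loop-carried dependencies across pixels.
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = None, 0
            # The candidate match in the right image sits at column x - d;
            # limit d so the window never leaves the image on the left side.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                # Winner-take-all: keep the lowest-cost disparity.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Usage would look like `disp = sad_disparity(left, right, max_disp=16, radius=2)`. The cost of this brute-force search grows with image size × disparity range × window area, which is why a PC implementation can be slow and why the highly regular inner loops map well onto FPGA fabric.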
I will first evaluate some promising platforms and build prototypes. For the best real-time performance, implementing the algorithm directly in the FPGA fabric seems to be the only option. However, an MPSoC solution also came to mind, and some of our professors and PhD researchers are pioneers in this field (as well as in NoC architectures). After prototyping, they may also want to move to an ASIC, or rather an ASSP, implementation.
I am quite new to these topics, so I would like to learn about them from both academia and industry. AdvaRes, do you know of any successful SoC products (ASICs) that use a NoC as the interconnect?