Commits
Jingning Han committed 1bb11781e2b
Rework idct8x8_10 SSE2 implementation This commit optimizes the SSE2 implmentation of idct8x8_10. It exploits the fact that only top-left 4x4 block contains non-zero coefficients, and hence reduces the instructions needed. The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles, estimated by averaging over 100000 runs. For pedestrian_area_1080p 300 frames coded at 4000kbps, the average decoding speed goes up from 79.3 fps to 79.7 fps. Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180