Research

  • Thinker-IM

    This work presents Thinker-IM, a 65 nm CMOS speech recognition processor that employs 16 SRAM computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) a novel digital-CIM mixed architecture running an output-weight dual stationary (OWDS) dataflow, which reduces memory accesses by 85.7%; 2) multi-bit XNOR SRAM-CIM macros with corresponding CIM-aware weight adaptation, which reduce energy consumption by 9.9% on average; 3) predictive early batch-normalization (BN) and binarization units (PBUs) that reduce RNN computations by up to 28.3% (see the sketch below). Measured results show a processing speed of 127.3 µs/inference and over 90.2% accuracy, with a neural energy efficiency of 5.1 pJ/neuron, 2.8× better than the state of the art.
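
    The XNOR-popcount arithmetic and the early-termination idea behind the PBUs can be illustrated with a short sketch. The Python below is a minimal behavioral model, not the chip's dataflow; `pbu_neuron`, the chunk size, and the folded `bn_threshold` are illustrative assumptions.

    ```python
    import numpy as np

    def xnor_popcount(x_bits, w_bits):
        """Binary dot product via XNOR + popcount.

        x_bits, w_bits: uint8 arrays of 0/1 encoding {-1, +1} values.
        Returns the signed dot product sum(x_i * w_i) over {-1, +1}.
        """
        n = x_bits.size
        matches = int(np.sum(~(x_bits ^ w_bits) & 1))  # XNOR: count agreeing bits
        return 2 * matches - n                         # popcount -> signed sum

    def pbu_neuron(x_bits, w_bits, bn_threshold, chunk=16):
        """Predictive early BN + binarization: stop once the sign is decided.

        Accumulates the dot product chunk by chunk; each remaining bit can move
        the sum by at most +/-1, so once |acc - bn_threshold| exceeds the number
        of bits left, further accumulation cannot change the binarized output.
        `bn_threshold` stands in for batch normalization folded into one comparison.
        """
        n = x_bits.size
        acc = 0
        for start in range(0, n, chunk):
            acc += xnor_popcount(x_bits[start:start + chunk],
                                 w_bits[start:start + chunk])
            remaining = n - min(start + chunk, n)
            if acc - remaining > bn_threshold:   # output fixed at +1, skip the rest
                return 1
            if acc + remaining < bn_threshold:   # output fixed at -1, skip the rest
                return -1
        return 1 if acc >= bn_threshold else -1
    ```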

  • Thinker-S

    Thinker-S is an ultra-low-power speech recognition processor consisting mainly of an energy-based voice activity detection (VAD) unit, a reconfigurable datapath for feature extraction and on-chip self-learning, a binarized convolutional neural network (BCNN) unit, a smoothing unit, a main controller, and shared memory. Fabricated in 28 nm CMOS, the processor supports real-time speech recognition with a power consumption of 141 µW and an energy efficiency of 2.46 pJ/neuron, while achieving up to 98.6% recognition accuracy. Thinker-S has three main features. First, a configurable architecture optimized for BCNNs is proposed. Frame-level reuse, a spatial-temporal two-dimensional BCNN computation optimization, maximizes data reuse and eliminates redundancy in convolutional computation (see the sketch below). To optimize memory access and storage, we tailor the memory mapping of the BCNN and propose bit-level regularization to compress BCNN weight matrices, together with a hybrid-bank memory. Second, approximate circuits are designed to balance accuracy against energy consumption, exploiting the characteristics of BCNN operations; these include a partially accurate adder for BCNN inference and an approximate softmax function for back-propagation. Third, we implement runtime self-learning on chip: a novel framework selects training samples and generates labels at runtime, and updating weights according to a specific user's input speech reduces the recognition word error rate (WER) by a relative 14.37%–51.4%. Compared with state-of-the-art designs, this processor reduces power consumption by 42.9%–91.2%, and achieves 2.53× lower energy per neuron and 7.96× lower energy per speech frame.
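
    Frame-level reuse can be pictured as streaming computation in which per-frame results are cached across overlapping windows. The Python below is a simplified single-channel model under assumed shapes (`w_bin` and `n_cols` are hypothetical parameters); the chip's actual 2-D spatial-temporal dataflow is more involved.

    ```python
    import numpy as np
    from collections import deque

    class StreamingBinaryConv:
        """Minimal model of frame-level reuse for a temporal binary convolution.

        Naive sliding-window inference recomputes every output column of the
        layer each time the input window advances by one speech frame. Because
        consecutive windows share all but one frame, only the newest output
        column actually changes; here each column is computed exactly once,
        cached, and reused by later windows.
        """
        def __init__(self, w_bin, n_cols):
            self.w = w_bin                              # (k, F) kernel in {-1, +1}
            self.frames = deque(maxlen=w_bin.shape[0])  # last k input frames
            self.out_cols = deque(maxlen=n_cols)        # cached output columns

        def push_frame(self, frame_bin):                # frame_bin: (F,) in {-1, +1}
            self.frames.append(frame_bin)
            if len(self.frames) == self.frames.maxlen:
                # compute only the output column involving the new frame
                new_col = int(np.sum(self.w * np.stack(self.frames)))
                self.out_cols.append(new_col)
            return list(self.out_cols)                  # current window's outputs
    ```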

  • Thinker-II

    Thinker-II is a highly energy-efficient reconfigurable processor, fabricated in 28 nm CMOS, for accelerating deep neural networks with binary/ternary weights. It occupies a total area of 4.8 mm² and runs at a nominal 200 MHz and 0.9 V. It contains 32 PEs operating in parallel and 224 KB of on-chip memory for activations, weights, and intermediate values. Four techniques are proposed and implemented to improve its energy efficiency. First, a feature-integral-based convolution method decreases the arithmetic complexity of standard binary/ternary convolution (see the sketch below). Second, a kernel-transformation-feature-reconstruction convolution method removes redundant operations in standard binary/ternary convolution. Third, a hierarchical load-balancing mechanism eliminates zero-value computations and improves resource utilization. Finally, a joint optimization approach selects the optimal convolution calculation pattern for each convolutional layer to minimize power consumption and computation time. With all optimizations enabled, the processor achieves an energy efficiency of 19.9 TOPS/W on AlexNet with 1-bit weights and 16-bit activations, making Thinker-II well suited to accelerating binary/ternary-weight neural networks on mobile and IoT devices.
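
    One plausible reading of the feature-integral idea, for weights in {-1, +1}: the dot product equals 2*P - S, where S is the full window sum, computed once via an integral image and shared across all kernels, and P sums the input only over a kernel's +1 positions, roughly halving the per-kernel additions. The sketch below illustrates this assumed formulation; it is not claimed to match Thinker-II's exact method.

    ```python
    import numpy as np

    def integral_image(x):
        """2-D prefix sums with a zero border, so any window sum is O(1)."""
        ii = np.zeros((x.shape[0] + 1, x.shape[1] + 1), dtype=np.int64)
        ii[1:, 1:] = np.cumsum(np.cumsum(x, axis=0), axis=1)
        return ii

    def window_sum(ii, r, c, k):
        """Sum of the k x k window with top-left corner (r, c)."""
        return ii[r + k, c + k] - ii[r, c + k] - ii[r + k, c] + ii[r, c]

    def binary_conv_feature_integral(x, w_bin):
        """Feature-integral-style binary convolution (assumed formulation).

        For weights in {-1, +1}: sum(x * w) = 2 * P - S, where P sums x over
        the +1 weight positions and S is the full window sum. S comes from one
        shared integral image, so each kernel only adds its +1 positions.
        """
        k = w_bin.shape[0]
        ii = integral_image(x)
        pos = np.argwhere(w_bin == 1)             # +1 positions of this kernel
        H, W = x.shape[0] - k + 1, x.shape[1] - k + 1
        y = np.empty((H, W), dtype=np.int64)
        for r in range(H):
            for c in range(W):
                s = window_sum(ii, r, c, k)       # shared across all kernels
                p = sum(x[r + i, c + j] for i, j in pos)
                y[r, c] = 2 * p - s
        return y
    ```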