Publications

Selected Publications

(Updated on October 28th, 2021, from dblp)

  • 2019

    [j]     Shouyi Yin, Peng Ouyang, Jianxun Yang, Tianyi Lu, Xiudong Li, Leibo Liu, Shaojun Wei: An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width. J. Solid-State Circuits 54(4): 1120-1136 (2019)
    [j]     Yinglin Zhao, Peng Ouyang, Wang Kang, Shouyi Yin, Youguang Zhang, Shaojun Wei, Weisheng Zhao: An STT-MRAM Based in Memory Architecture for Low Power Integral Computing. IEEE Trans. Computers 68(4): 617-623 (2019)
    [j]     Hai Huang, Leibo Liu, Qihuan Huang, Yingjie Chen, Shouyi Yin, Shaojun Wei: Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack. IEEE Trans. on CAD of Integrated Circuits and Systems 38(2): 208-219 (2019)
    [j]     Shouyi Yin, Shibin Tang, Xinhan Lin, Peng Ouyang, Fengbin Tu, Leibo Liu, Shaojun Wei: A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA. IEEE Trans. on CAD of Integrated Circuits and Systems 38(4): 678-691 (2019)
    [j]     Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei: A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution. IEEE Trans. on CAD of Integrated Circuits and Systems 38(7): 1265-1277 (2019)
    [j]     Liang Wang, Ping Lv, Leibo Liu, Jie Han, Ho-fung Leung, Xiaohang Wang, Shouyi Yin, Shaojun Wei, Terrence S. T. Mak: A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems. IEEE Trans. on CAD of Integrated Circuits and Systems 38(9): 1771-1784 (2019)
    [j]     Dajiang Liu, Shouyi Yin, Guojie Luo, Jiaxing Shang, Leibo Liu, Shaojun Wei, Yong Feng, Shangbo Zhou: Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning. IEEE Trans. on CAD of Integrated Circuits and Systems 38(12): 2271-2283 (2019)
    [j]     Man Shi, Peng Ouyang, Shouyi Yin, Leibo Liu, Shaojun Wei: A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression. IEEE Trans. Circuits Syst. II Express Briefs 66-II(11): 1870-1874 (2019)
    [j]     Shixuan Zheng, Peng Ouyang, Dandan Song, Xiudong Li, Leibo Liu, Shaojun Wei, Shouyi Yin: An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning. IEEE Trans. Circuits Syst. I Regul. Pap. 66-I(12): 4648-4661 (2019)
    [j]     Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei: Reconfigurable Architecture for Neural Approximation in Multimedia Computing. IEEE Trans. Circuits Syst. Video Techn. 29(3): 892-906 (2019)
    [j]     Leibo Liu, Qiang Wang, Wenping Zhu, Huiyu Mo, Tianchen Wang, Shouyi Yin, Yiyu Shi, Shaojun Wei: A Face Alignment Accelerator Based on Optimized Coarse-to-Fine Shape Searching. IEEE Trans. Circuits Syst. Video Techn. 29(8): 2467-2481 (2019)
    [j]     Huiyu Mo, Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei: Face Alignment With Expression- and Pose-Based Adaptive Initialization. IEEE Trans. Multimedia 21(4): 943-956 (2019)
    [j]     Shouyi Yin, Shibin Tang, Xinhan Lin, Peng Ouyang, Fengbin Tu, Leibo Liu, Jishen Zhao, Cong Xu, Shuangchen Li, Yuan Xie, Shaojun Wei: Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory. IEEE Trans. Parallel Distrib. Syst. 30(1): 146-160 (2019)
    [c]    Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei: Small-Footprint Keyword Spotting with Graph Convolutional Network. ASRU 2019: 539-546
    [c]    Hui Yan, Zhaoshi Li, Leibo Liu, Shouyi Yin, Shaojun Wei: Constructing Concurrent Data Structures on FPGA with Channels. FPGA 2019: 172-177
    [c]    Yu Pan, Peng Ouyang, Yinglin Zhao, Shouyi Yin, Youguang Zhang, Shaojun Wei, Weisheng Zhao: A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural Convolutional Network. ACM Great Lakes Symposium on VLSI 2019: 271-274
    [c]    Leibo Liu, Ao Luo, Guanhua Li, Jianfeng Zhu, Yong Wang, Gang Shan, Jianfeng Pan, Shouyi Yin, Shaojun Wei: Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor. Hot Chips Symposium 2019: 1-25
    [c]    Kai Lu, Zhaoshi Li, Leibo Liu, Jiawei Wang, Shouyi Yin, Shaojun Wei: ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms. ICCAD 2019: 1-8
    [c]    Hang Yuan, Wei Guo, Chip-Hong Chang, Yuan Cao, Shaojun Wei, Shouyi Yin, Chenchen Deng, Leibo Liu, Wei Ge, Fan Zhang: A Reliable Physical Unclonable Function Based on Differential Charging Capacitors. ISCAS 2019: 1-5
    [c]    Jun Yang, Yuyao Kong, Zhen Wang, Yan Liu, Bo Wang, Shouyi Yin, Longxin Shi: Sandwich-RAM: An Energy-Efficient In-Memory BWN Architecture with Pulse-Width Modulation. ISSCC 2019: 394-396
    [c]    Feng Xiong, Fengbin Tu, Shouyi Yin, Shaojun Wei: Towards Efficient Compact Network Training on Edge-Devices. ISVLSI 2019: 61-67
    [c]    Zhaoshi Li, Leibo Liu, Yangdong Deng, Jiawei Wang, Zhiwei Liu, Shouyi Yin, Shaojun Wei: FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. MICRO 2019: 911-923
    [c]    Jianxun Yang, Leibo Liu, Jin Zhang, Shaojun Wei, Shouyi Yin: An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks. NANOARCH 2019: 1-6
    [c]    Weiwei Wu, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei: MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions. NEWCAS 2019: 1-4
    [c]    Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, Peng Ouyang, Win-San Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, Shouyi Yin: A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS. VLSI Circuits 2019: 120-
    [i]     Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei: Small-footprint Keyword Spotting with Graph Convolutional Network. CoRR abs/1912.05124 (2019)
    [i]     Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan Song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin: THUEE system description for NIST 2019 SRE CTS Challenge. CoRR abs/1912.11585 (2019)

  • 2018

    [j] Shouyi Yin, Tianyi Lu, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei: Multi-Bank Memory Aware Force Directed Scheduling for High-Level Synthesis. IEEE Access 6: 7526-7540 (2018)
    [j] Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, Shaojun Wei: Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution. IEEE Comput. Archit. Lett. 17(2): 147-150 (2018)
    [j] Shuang Liang, Shouyi Yin, Leibo Liu, Wayne Luk, Shaojun Wei: FP-BNN: Binarized neural network on FPGA. Neurocomputing 275: 1072-1086 (2018)
    [j] Ruofei Hu, Binren Tian, Shouyi Yin, Shaojun Wei: Optimization of Softmax Layer in Deep Neural Network Using Integral Stochastic Computation. J. Low Power Electron. 14(4): 475-480 (2018)
    [j] Shouyi Yin, Peng Ouyang, Shibin Tang, Fengbin Tu, Xiudong Li, Shixuan Zheng, Tianyi Lu, Jiangyuan Gu, Leibo Liu, Shaojun Wei: A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications. J. Solid-State Circuits 53(4): 968-982 (2018)
    [j] Shouyi Yin, Zhicong Xie, Chenyue Meng, Peng Ouyang, Leibo Liu, Shaojun Wei: Memory Partitioning for Parallel Multipattern Data Access in Multiple Data Arrays. IEEE Trans. on CAD of Integrated Circuits and Systems 37(2): 431-444 (2018)
    [j] Leibo Liu, Zhuoquan Zhou, Shaojun Wei, Min Zhu, Shouyi Yin, Shengyang Mao: DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures. IEEE Trans. on CAD of Integrated Circuits and Systems 37(4): 782-795 (2018)
    [j] Leibo Liu, Chen Yang, Shouyi Yin, Shaojun Wei: CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance. IEEE Trans. on CAD of Integrated Circuits and Systems 37(6): 1171-1184 (2018)
    [j] Jiale Yan, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei: GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration. IEEE Trans. on CAD of Integrated Circuits and Systems 37(11): 2519-2529 (2018)
    [j] Leibo Liu, Bo Wang, Chenchen Deng, Min Zhu, Shouyi Yin, Shaojun Wei: Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms. IEEE Trans. on CAD of Integrated Circuits and Systems 37(12): 3081-3094 (2018)
    [j] Leibo Liu, Zhaoshi Li, Chen Yang, Chenchen Deng, Shouyi Yin, Shaojun Wei: HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing. IEEE Trans. Circuits Syst. II Express Briefs 65-II(3): 381-385 (2018)
    [j] Guiqiang Peng, Leibo Liu, Sheng Zhou, Shouyi Yin, Shaojun Wei: A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS. IEEE Trans. Circuits Syst. I Regul. Pap. 65-I(5): 1717-1730 (2018)
    [j] Peng Ouyang, Shouyi Yin, Leibo Liu, Youguang Zhang, Weisheng Zhao, Shaojun Wei: A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT. IEEE Trans. Circuits Syst. I Regul. Pap. 65-I(10): 3362-3375 (2018)
    [j] Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei: Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration. IEEE Trans. Parallel Distrib. Syst. 29(9): 2105-2120 (2018)
    [j] Yanan Lu, Leibo Liu, Yangdong Deng, Jian Weng, Shouyi Yin, Yiyu Shi, Shaojun Wei: Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays. IEEE Trans. Parallel Distrib. Syst. 29(10): 2360-2372 (2018)
    [j] Guiqiang Peng, Leibo Liu, Sheng Zhou, Yang Xue, Shouyi Yin, Shaojun Wei: Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems. IEEE Trans. Signal Process. 66(7): 1860-1875 (2018)
    [j] Shouyi Yin, Tianyi Lu, Zhicong Xie, Leibo Liu, Shaojun Wei: Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM. IEEE Trans. Very Large Scale Integr. Syst. 26(11): 2345-2357 (2018)
    [c] Hang Wang, Hongbin Sun, Xuchong Zhang, Qiubo Chen, Pengju Ren, Xiaogang Wu, Shouyi Yin, Zhiqiang Jiang, Xiang Li, Daqiang Han, Shiquan Yu, Shaojun Wei, Nanning Zheng: A 4K×2K@60fps Multi-format Multi-function Display Processor for High Perceptual Quality. APCCAS 2018: 427-430
    [c] Guiqiang Peng, Leibo Liu, Qiushi Wei, Yao Wang, Shouyi Yin, Shaojun Wei: A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems. A-SSCC 2018: 191-194
    [c] Hang Yuan, Leibo Liu, Hui Li, Shouyi Yin, Shaojun Wei: A Full Multicast Reconfigurable Non-blocking Permutation Network. CyberC 2018
    [c] Xinhan Lin, Shouyi Yin, Fengbin Tu, Leibo Liu, Xiangyu Li, Shaojun Wei: LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA. DAC 2018: 16:1-16:6
    [c] Shixuan Zheng, Yonggang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei: An efficient kernel transformation architecture for binary- and ternary-weight neural network inference. DAC 2018: 137:1-137:6
    [c] Ruofei Hu, Binren Tian, Shouyi Yin, Shaojun Wei: Efficient Hardware Architecture of Softmax Layer in Deep Neural Network. DSP 2018: 1-5
    [c] Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei: RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM. ISCA 2018: 340-352
    [c] Jianxin Guo, Shouyi Yin, Peng Ouyang, Fengbin Tu, Shibin Tang, Leibo Liu, Shaojun Wei: Bit-width Adaptive Accelerator Design for Convolution Neural Network. ISCAS 2018: 1-5
    [c] Zhihui Wang, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei: An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing. ISCAS 2018: 1-5
    [c] Shouyi Yin, Peng Ouyang, Jianxun Yang, Tianyi Lu, Xiudong Li, Leibo Liu, Shaojun Wei: An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS. VLSI Circuits 2018: 37-38
    [c] Shouyi Yin, Peng Ouyang, Shixuan Zheng, Dandan Song, Xiudong Li, Leibo Liu, Shaojun Wei: A 141 µW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS. VLSI Circuits 2018: 139-140