Thinker-S is an ultra-low-power speech recognition processor consisting mainly of an energy-based VAD unit, a reconfigurable data path for feature extraction and on-chip self-learning, a BCNN unit, a smoothing unit, a main controller, and shared memory. Fabricated in 28 nm CMOS, the processor supports real-time speech recognition with a power consumption of 141 $\mu$W and an energy efficiency of 2.46 pJ/Neuron, while achieving up to 98.6\% recognition accuracy. Thinker-S has three main features. First, we propose a configurable architecture optimized for BCNN. Frame-level reuse, a spatial-temporal two-dimensional optimization of BCNN computation, maximizes data reuse and eliminates redundant convolutional computation. To optimize memory access and storage, we tailor the memory mapping of BCNN and propose Bit-level Regularization to compress the BCNN weight matrices, together with a hybrid-bank memory. Second, approximate circuits are designed to balance accuracy against energy consumption, taking the characteristics of BCNN operations into account. The approximated components include a partially accurate adder in BCNN inference and the softmax function in back-propagation. Third, we implement runtime self-learning on the chip. A novel framework is proposed to select training samples and generate labels at runtime. By updating the weights according to the input speech of specific users, the word error rate (WER) is reduced by a relative 14.37\%--51.4\%. Compared with state-of-the-art designs, this processor achieves 42.9\%--91.2\% lower power consumption, 2.53$\times$ lower energy consumption per neuron, and 7.96$\times$ lower energy per speech frame.
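
As an illustration of the energy-based VAD idea mentioned above, the following is a minimal software sketch and not the circuit implemented in Thinker-S. It marks a frame as speech when its mean squared amplitude exceeds a threshold. The function names, the 160-sample frame length, the 16 kHz sampling rate of the toy input, and the 0.01 threshold are all assumptions chosen for the example; none of them is taken from this work.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame.

    Trailing samples that do not fill a whole frame are ignored.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) or non-speech (False).

    frame_len and threshold are illustrative assumptions, not values
    reported for Thinker-S.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Toy input: one silent frame followed by one frame of a 440 Hz tone,
# assuming a 16 kHz sampling rate.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
decisions = energy_vad(silence + tone)
```

In a pipeline of this kind, only frames flagged as active would be passed on to feature extraction and the network, which is the usual motivation for placing a cheap detector in front of the more expensive stages.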