An intelligent non-optimality self-recovery method based on reinforcement learning with small data in big data era