Reinforcement Learning for QAS