Parallel applications are often highly irregular, and high performance computing (HPC) infrastructures are very complex. The HPC applications of interest herein are time-stepping scientific applications (TSSA). TSSA often involve the repeated execution of multiple parallel loops with thousands of iterations and irregular behavior. Dynamic loop scheduling (DLS) techniques have been developed over time and have proven effective in scheduling parallel loops to achieve load balancing in TSSA. Using a single DLS technique throughout the execution of a time-step, or over the entire application, does not guarantee optimal performance, owing to unpredictable variations in problem and algorithmic characteristics as well as in infrastructure capabilities. For that reason, autonomic selection of DLS techniques as a function of the parallel loop execution time has been shown to improve application performance. Recently, a robustness metric for DLS techniques, named "flexibility", has been proposed to estimate the capability of a DLS technique to withstand variations in loop iteration execution times. To improve the performance of TSSA, we propose an approach in which DLS techniques are selected autonomically as a function of their flexibility. The first major novelty of our approach lies in the use of state-of-the-art reinforcement learning (RL) algorithms as smart agents. The second novelty lies in the design of a modified flexibility metric. The third major novelty is the use of this modified flexibility metric as the reward for the smart agents. The fourth novelty is the evaluation of the proposed approach in a simulated environment, specifically using the SimGrid-SMPI interface to execute the DLS algorithms. We discuss the advantages and limitations of the modified flexibility metric as a reward.
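To make the selection loop concrete, the following is a minimal sketch of an agent that picks one DLS technique per time-step and updates its estimate from a scalar reward. It rests on several assumptions that are not taken from this work: the agent is a simple epsilon-greedy bandit rather than the state-of-the-art RL algorithms referred to above, the technique labels (`STATIC`, `GSS`, `FAC`) are an illustrative choice, and `toy_reward` is a hypothetical stand-in with made-up values, not the modified flexibility metric.

```python
import random

# Illustrative labels for DLS techniques; the set is an assumption.
TECHNIQUES = ["STATIC", "GSS", "FAC"]


class EpsilonGreedySelector:
    """Picks one technique per time-step from running reward averages."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def select(self):
        # Try every technique once before trusting the estimates.
        for action in self.actions:
            if self.counts[action] == 0:
                return action
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean of the rewards observed for this technique.
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n


def toy_reward(technique, rng):
    # Hypothetical stand-in for a robustness-based reward; the base
    # values are invented for illustration only.
    base = {"STATIC": 0.2, "GSS": 0.5, "FAC": 0.8}[technique]
    return base + rng.uniform(-0.05, 0.05)


def run(time_steps=200, seed=0):
    rng = random.Random(seed)
    agent = EpsilonGreedySelector(TECHNIQUES, epsilon=0.1, seed=seed)
    for _ in range(time_steps):
        technique = agent.select()
        agent.update(technique, toy_reward(technique, rng))
    return agent


if __name__ == "__main__":
    agent = run()
    print(agent.counts)
    print(agent.values)
```

In an actual study, the call to `toy_reward` would be replaced by running the loops of one time-step with the chosen technique and computing the reward from the measured behavior.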