The problem of radio resource scheduling subject to fairness satisfaction is very challenging even in future radio access networks. Standard fairness criteria aim to find the best trade-off between overall throughput maximization and user fairness satisfaction under various types of network conditions. However, at the Radio Resource Management (RRM) level, the existing schedulers are rather static being unable to react according to the momentary networking conditions so that the user fairness measure is maximized all time. This paper proposes a dynamic scheduler framework able to parameterize the proportional fair scheduling rule at each Transmission Time Interval (TTI) to improve the user fairness. To deal with the framework complexity, the parameterization decisions are approximated by using the neural networks as non-linear functions. The actor-critic Reinforcement Learning (RL) algorithm is used to learn the best set of non-linear functions that approximate the best fairness parameters to be applied in each momentary state. Simulations results reveal that the proposed framework outperforms the existing fairness adaptation techniques as well as other types of RL-based schedulers.