C Eisenach, U Ghai, D Madeka, K Torkkola, D Foster, S Kakade, Neural Coordination and Capacity Control for Inventory Management, 2024, arXiv preprint arXiv:2410.02817. (SCOT, Amazon)
This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by the questions of (1) what does it mean to backtest a capacity control mechanism, (2) can we devise and backtest a capacity control mechanism that is compatible with recent advances in deep reinforcement learning for inventory management? First, because we only have a single historic sample path of Amazon’s capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios. This novel approach allows for more robust and realistic testing of inventory management strategies. Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. (2022) to capacitated periodic review inventory control problems and show that certain capacitated control problems are no harder than supervised learning. Third, we introduce a ‘neural coordinator’, designed to produce forecasts of capacity prices, guiding the system to adhere to target constraints in place of a traditional model predictive controller. Finally, we apply a modified DirectBackprop algorithm for learning a deep RL buying policy and training the neural coordinator. Our methodology is evaluated through large-scale backtests, demonstrating that RL buying policies with a neural coordinator outperform classic baselines both in terms of cumulative discounted reward and capacity adherence (we see improvements of up to 50% in some cases).
Capacity Curve Sampling
Because we observe only one historical sample of Amazon’s capacity in any given marketplace, backtesting a capacity control mechanism against this single capacity curve would not give us any sort of generalization guarantee. For example, in a world where Amazon did not offer one day shipping, the network capacity would have looked different through time. In order for a capacity control mechanism to be useful in practice, we need to ensure that, across many scenarios, the coordination mechanism properly enforces the constraints. Intuitively, what we would like to backtest against is a large set of capacity curves that (in different worlds) Amazon might have had instead.
One salient property of real-world capacity curves is that they tend to have discontinuities as capacity comes online or goes offline. In order to sample paths from some space, we must first decide on a choice of function space and a basis on that space. Donoho (1993) showed that wavelet bases are optimal for representing functions that have arbitrary discontinuities. Figure 2 shows examples of generated constraint paths.
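To make the idea of sampling paths in a wavelet basis concrete, here is a minimal sketch in Python. It is not the paper's sampling procedure: it assumes the Haar wavelet (the simplest choice, whose basis functions are themselves step functions), and the function names and the parameters `levels`, `base`, `scale`, `decay`, and `sparsity` are hypothetical, chosen only for illustration. A random sparse set of wavelet coefficients is drawn and inverted into a piecewise-constant path with jumps.

```python
import numpy as np


def haar_synthesis(approx, details):
    """Invert an orthonormal Haar transform.

    approx:  coarsest approximation coefficients (length m)
    details: list of detail-coefficient arrays, coarsest first,
             with lengths m, 2m, 4m, ...
    """
    signal = np.asarray(approx, dtype=float)
    for d in details:
        up = np.empty(2 * signal.size)
        up[0::2] = (signal + d) / np.sqrt(2.0)  # left half of each interval
        up[1::2] = (signal - d) / np.sqrt(2.0)  # right half of each interval
        signal = up
    return signal


def sample_capacity_path(levels=6, base=100.0, scale=20.0,
                         decay=0.7, sparsity=0.3, seed=None):
    """Draw one hypothetical capacity path of length 2**levels.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = 2 ** levels
    # A lone approximation coefficient of base * sqrt(n) yields a flat path at `base`.
    approx = np.array([base * np.sqrt(n)])
    details = []
    for j in range(levels):
        size = 2 ** j
        # Keep only a random subset of coefficients so jumps are sparse.
        mask = rng.random(size) < sparsity
        # Jump amplitude in path units, shrinking at finer scales.
        amp = rng.normal(0.0, scale * decay ** j, size) * mask
        # Rescale so the coefficient produces a jump of size `amp` in the path.
        details.append(amp * 2.0 ** ((levels - j) / 2.0))
    path = haar_synthesis(approx, details)
    return np.maximum(path, 0.0)  # capacity cannot be negative


paths = [sample_capacity_path(seed=s) for s in range(5)]
```

Each call returns a different step-like curve around the base level. A backtest in the spirit described above would evaluate the control mechanism against many such draws rather than against the single observed history.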