Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides
Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan
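AI summary: For two-sided service platforms, the paper proposes a data-driven algorithm that dynamically optimizes the assortment when neither the customers' nor the sellers' choice parameters are known, and proves that its regret grows polylogarithmically over time and attains the optimal rate.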
Details
We study a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers in a discrete-time setting. In each period, a customer arrives seeking service, and the platform chooses an assortment of sellers to display. The customer then proposes a transaction to at most one seller in the assortment according to a multinomial logit choice model. After a fixed number of periods, sellers review the proposals they have received and each chooses at most one customer according to another multinomial logit choice model, after which the cycle repeats. A key challenge is that the platform does not know the choice-model parameters of either customers or sellers in advance. To our knowledge, this is the first study of a dynamic assortment problem in which both sides' choice parameters are unknown. We develop a data-driven algorithm that learns these parameters while optimizing the platform's objective over time. We evaluate performance using regret, which measures revenue loss relative to a clairvoyant benchmark that knows all parameters and customer arrivals in advance. We show that the algorithm's worst-case regret grows polylogarithmically over time, and we derive a matching lower bound, establishing its rate optimality.
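The two-stage interaction described above (customers propose under one multinomial logit model, then sellers choose under another) can be illustrated with a minimal sketch. It is not the paper's learning algorithm: it only simulates one cycle with known utilities. The names mnl_probabilities, simulate_cycle, customer_utils, and seller_utils are hypothetical, and the no-choice option is assumed to have utility 0, a common normalization that the abstract does not state.

```python
import math
import random


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities.

    Index 0 is the no-choice option, assumed here to have utility 0;
    index i >= 1 corresponds to utilities[i - 1].
    """
    weights = [1.0] + [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_cycle(assortments, customer_utils, seller_utils, rng):
    """Simulate one cycle of proposals followed by seller decisions.

    assortments[t]       : seller ids shown to the customer arriving in period t
    customer_utils[t][s] : customer t's utility for seller s (hypothetical input)
    seller_utils[s][t]   : seller s's utility for customer t (hypothetical input)
    Returns a dict mapping each matched seller to the customer it accepted.
    """
    # Stage 1: each customer proposes to at most one seller in the assortment.
    proposals = {}
    for t, assortment in enumerate(assortments):
        probs = mnl_probabilities([customer_utils[t][s] for s in assortment])
        pick = rng.choices(range(len(probs)), weights=probs)[0]
        if pick > 0:  # pick == 0 means the customer proposes to no one
            proposals.setdefault(assortment[pick - 1], []).append(t)

    # Stage 2: each seller accepts at most one of the proposals it received.
    matches = {}
    for s, customers in proposals.items():
        probs = mnl_probabilities([seller_utils[s][t] for t in customers])
        pick = rng.choices(range(len(probs)), weights=probs)[0]
        if pick > 0:  # pick == 0 means the seller rejects all proposals
            matches[s] = customers[pick - 1]
    return matches


if __name__ == "__main__":
    rng = random.Random(0)
    assortments = [[0, 1], [1, 2], [0, 2]]          # three periods per cycle
    customer_utils = [[1.0, 0.5, 0.2] for _ in range(3)]
    seller_utils = [[0.3, 0.8, 0.1] for _ in range(3)]
    print(simulate_cycle(assortments, customer_utils, seller_utils, rng))
```

In the setting of the abstract the platform does not observe these utilities; it would have to estimate them from the proposals and acceptances it sees while still choosing assortments, which is the learning problem the algorithm addresses.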