Michael Fleder and Devavrat D. Shah, I Know What You Bought At Chipotle for $9.81 by Solving A Linear Inverse Problem, Proceedings of the ACM on Measurement and Analysis of Computing Systems, November 2020, Article No. 47.
We consider the question of identifying which set of products are purchased and at what prices in a given transaction by observing only the total amount spent in the transaction, and nothing more. The ability to solve such an inverse problem can lead to refined information about consumer spending by simply observing anonymized credit card transactions data. Indeed, when considered in isolation, it is impossible to identify the products purchased and their prices from a given transaction just based on the transaction total. However, given a large number of transactions, there may be a hope.
Problem Statement
Formally, we have access to 𝑀 separate, but not necessarily distinct, transaction totals 𝑦 associated with a given company. We assume the company has a finite number of 𝑁 products with associated prices 𝑝. We treat both 𝑁 and 𝑝 as unknown: that is, an unknown number of products with unknown prices. Each transaction total corresponds to the summation of prices for the products purchased. In addition, we include an additive noise term 𝜂 to account for price variations due to discounts, promotions, taxes, etc. Therefore, we have 𝑦 = 𝐴𝑝 + 𝜂, where the unknown matrix 𝐴 represents the product decomposition for each transaction. The goal is to identify the number of products 𝑁, their associated prices 𝑝, and the decomposition of transactions 𝐴; all from observation of 𝑦 only.
This problem is a subset-sum problem that is known to be NP-hard. The authors propose a polynomial-time approximate algorithm and prove its performance bound.
Results:
We apply the algorithm to a large corpus of anonymized consumer credit card transactions in the period 2016-2019, with data obtained from a commercial data vendor. The transactions are associated with spending at Apple, Chipotle, Netflix, and Spotify. From just transactions data, our algorithm identifies (i) key price points (without access to the listed prices), (ii) products purchased within a transaction, (iii) product launches, and (iv) evidence of a new ‘secret’ product from Netflix - rumored to be in limited release.
沒有留言:
張貼留言