Statistical Efficiency

Practitioners today face an entire zoo of data-driven optimization methods, all of which are touted to work well in practice. This raises the question of which method, if any, should be preferred and under what statistical conditions. In my research I investigate which data-driven optimization methods are efficient, in the sense that they do more with the given data than other methods. Despite the folklore belief that an era of unlimited data is upon us, many real-world decisions in fact depend critically on relatively few relevant data points. A case in point is personalized health care, where medical records are hard to obtain and the fraction of data relevant to a specific medical issue is often very small. Data has famously been called the oil of the 21st century, and I believe that, just like oil, it ought to be treated as a scarce resource and consequently used efficiently.

Efficient Prescription

I was able to show that certain robust data-driven optimization formulations, as well as certain regularization techniques, are indeed statistically efficient under common statistical assumptions on the data. These results were all established for data-driven prediction and prescription problems in which all data is given up front.

Related publications

Efficient Bandits

Sometimes decisions have to be made repeatedly over time instead of only once. When trying to determine which drug to prescribe for the treatment of a novel disease, for instance, treatment efficacy data becomes available only after treatments have been prescribed. In such dynamic settings, a decision made now, together with its associated response, serves as an additional data point later on. When making repeated decisions, exploration (learning) and exploitation (decision-making) can no longer be carried out one after the other but must instead be carefully balanced against each other. Some methods make this trade-off better than others, in particular when structural information is present. For instance, a drug developer could assume that drugs based on chemically related active components are likely to have similar efficacy. Classical reinforcement learning methods such as Thompson sampling or UCB do not treat the collected data efficiently and may end up wasting most of it, even in the presence of such simple structural information. I demonstrate that certain data-driven optimization formulations can, again surprisingly, also balance exploration and exploitation optimally, even in the presence of generic convex structural information.
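
To make the exploration and exploitation trade-off concrete, here is a minimal sketch of the classical UCB1 index rule, one of the structure-agnostic baselines mentioned above. It is an illustration only and not the method developed in my work; the function name `ucb1`, the Bernoulli reward model and the example efficacy values are all assumptions made for this sketch. Note that each arm is scored from its own observations alone, so nothing learned about one drug is transferred to a chemically related one.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the pull count of each arm.

    `means` holds hypothetical success probabilities (e.g. drug efficacies)
    that are unknown to the algorithm and only used to simulate responses.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that all empirical means are defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    # Three hypothetical drugs with efficacies unknown to the learner.
    print(ucb1([0.2, 0.5, 0.8], horizon=2000))
```

The confidence bonus shrinks only as an arm itself is pulled more often, which is why such an index rule cannot exploit structural information linking the arms.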

Related publications
