7/13/2025

Statistical Modeling: The Two Cultures

Cynthia Rudin, “Leo Breiman, the Rashomon Effect, and the Occam Dilemma,” arXiv:2507.03884, 2025.

In the famous “Two Cultures” paper, Leo Breiman provided a visionary perspective on the cultures of “data models” (modeling with consideration of data generation) versus “algorithmic models” (vanilla machine learning models). I provide a modern perspective on these two approaches. One of Breiman’s key arguments against data models is what he called the “Rashomon Effect,” which is the existence of many different-but-equally-good models. The Rashomon Effect implies that data modelers would not be able to determine which model generated the data. Conversely, one of his core advantages in favor of data models is simplicity, as he claimed there exists an “Occam Dilemma,” i.e., an accuracy-simplicity tradeoff, where algorithmic models must be complex in order to be accurate. After 25 years of more powerful computers, it has become clear that this claim is not generally true, in that algorithmic models do not need to be complex to be accurate; however, there are nuances that help explain Breiman’s logic, specifically, that by “simple,” he appears to consider only linear models or unoptimized decision trees. Interestingly, the Rashomon Effect is a key tool in proving the nullification of the Occam Dilemma. To his credit though, Breiman did not have the benefit of modern computers, with which my observations are much easier to make.

7/11/2025

The Batch by A. Ng

A weekly summary of AI and more by A. Ng. Subscription is free; enjoy reading or listening to it, as I do.

  • Large-scale system: The system aggregates data generated by 240 million customers and 2 million store personnel, feeding applications that streamline operations among 100,000 suppliers, 150 distributors, and 10,000 retail venues in 19 countries.
Leiphone (雷鋒網), “Year-End Collection: Andrew Ng Reviews the Hottest AI Events of 2020,” December 31, 2020.

7/10/2025

What Is a “University Student”?

(25/6/30) Tonight was truly eye-opening. It let me experience another high point in my life.

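I can't sleep, so I am thinking over the method I discussed with a graduate student tonight. Chapter 16, Sampling Plans, is indeed what we need. He is a “university student.”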

(25/7/1) 2:30 am. Still not sleepy, so I am writing a blog post.