Posts labeled 醫療 (Medical)

7/13/2025

Statistical Modeling: The Two Cultures

Cynthia Rudin, "Leo Breiman, the Rashomon Effect, and the Occam Dilemma", arXiv:2507.03884, 2025.

In the famous “Two Cultures” paper, Leo Breiman provided a visionary perspective on the cultures of “data models” (modeling with consideration of data generation) versus “algorithmic models” (vanilla machine learning models). I provide a modern perspective on these two approaches. One of Breiman’s key arguments against data models is what he called the “Rashomon Effect,” which is the existence of many different-but-equally-good models. The Rashomon Effect implies that data modelers would not be able to determine which model generated the data. Conversely, one of his core advantages in favor of data models is simplicity, as he claimed there exists an “Occam Dilemma,” i.e., an accuracy-simplicity tradeoff, where algorithmic models must be complex in order to be accurate. After 25 years of more powerful computers, it has become clear that this claim is not generally true, in that algorithmic models do not need to be complex to be accurate; however, there are nuances that help explain Breiman’s logic, specifically, that by “simple,” he appears to consider only linear models or unoptimized decision trees. Interestingly, the Rashomon Effect is a key tool in proving the nullification of the Occam Dilemma. To his credit though, Breiman did not have the benefit of modern computers, with which my observations are much easier to make.

3/15/2023

2022 Franz Edelman Award

 2022 Edelman Competition (videos)

Leonardo J. Basso et al., Analytics Saves Lives During the COVID-19 Crisis in Chile, INFORMS Journal on Applied Analytics, 2023, 53(1):9-31. (2022 Franz Edelman Award) (statistical analysis, integer programming, regression)

During the COVID-19 crisis, the Chilean Ministry of Health and the Ministry of Sciences, Technology, Knowledge and Innovation partnered with the Instituto Sistemas Complejos de Ingeniería (ISCI) and the telecommunications company ENTEL, to develop innovative methodologies and tools that placed operations research (OR) and analytics at the forefront of the battle against the pandemic. These innovations have been used in key decision aspects that helped shape a comprehensive strategy against the virus, including tools that (1) provided data on the actual effects of lockdowns in different municipalities and over time; (2) helped allocate limited intensive care unit (ICU) capacity; (3) significantly increased the testing capacity and provided on-the-ground strategies for active screening of asymptomatic cases; and (4) implemented a nationwide serology surveillance program that significantly influenced Chile’s decisions regarding vaccine booster doses and that also provided information of global relevance. Significant challenges during the execution of the project included the coordination of large teams of engineers, data scientists, and healthcare professionals in the field; the effective communication of information to the population; and the handling and use of sensitive data. The initiatives generated significant press coverage and, by providing scientific evidence supporting the decision making behind the Chilean strategy to address the pandemic, they helped provide transparency and objectivity to decision makers and the general population. According to highly conservative estimates, the number of lives saved by all the initiatives combined is close to 3,000, equivalent to more than 5% of the total death toll in Chile associated with the pandemic until January 2022. The saved resources associated with testing, ICU beds, and working days amount to more than 300 million USD.

2/03/2023

Multimodal artificial intelligence

Jessica Leung, Omega Rho Keynote: Artificial Intelligence and the Future of Universities, ORMS Today, 2022.

Léonard Boussioux, Cynthia Zeng, Théo Guénais, and Dimitris Bertsimas, Hurricane Forecasting: A Novel Multimodal Machine Learning Framework, Weather and Forecasting, March 2022, 37(6), pp. 817–831.

Soenksen, L.R., Ma, Y., Zeng, C. et al. Integrated multimodal artificial intelligence framework for healthcare applications. npj Digital Medicine 5, 149 (2022). https://doi.org/10.1038/s41746-022-00689-4. (Data and Code)

1/26/2023

Bridging physics-based and data-driven modeling for COVID-19 forecasting

Rui Wang, Danielle Robinson, Christos Faloutsos, Yuyang Wang, and Rose Yu, AutoODE: Bridging physics-based and data-driven modeling for COVID-19 forecasting, NeurIPS 2020 Workshop on Machine Learning in Public Health. (best paper award at the NeurIPS Machine Learning in Public Health Workshop)

As COVID-19 continues to spread, accurately forecasting the number of newly infected, removed and death cases has become a crucial task in public health. While mechanistic compartmental models are widely used in epidemic modeling, data-driven models are emerging for disease forecasting. In this work, we investigate these two types of methods for COVID-19 forecasting. Through a comprehensive study, we find that data-driven models outperform physics-based models on the number of death cases prediction. Meanwhile, physics-based models have superior performances in predicting the number of infected and removed cases. In addition, we present a hybrid approach, AutoODE, that obtains a 57.4% reduction in mean absolute errors of the 7-day ahead COVID-19 trajectories prediction compared with the best deep learning competitor.
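To make the "physics-based" side of this comparison concrete, here is a minimal sketch of a compartmental model and of the mean-absolute-error metric the abstract reports. It is a textbook SIR model integrated with forward Euler, not the AutoODE model from the paper; the parameter values (`beta`, `gamma`, the population of 1,000) are illustrative assumptions, not fitted to any data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate a textbook SIR model with forward Euler.

    Returns a list of (S, I, R) tuples, one per day, starting with day 0.
    beta is the transmission rate and gamma the removal rate (both per day).
    """
    n = s0 + i0 + r0  # total population, constant in this model
    s, i, r = float(s0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        trajectory.append((s, i, r))
    return trajectory


def mean_absolute_error(predicted, observed):
    """Average absolute gap between two equally long sequences."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Illustrative (hypothetical) parameters: a 7-day-ahead trajectory.
forecast = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=7)
infected_forecast = [state[1] for state in forecast]
```

A data-driven forecaster would replace `simulate_sir` with a learned sequence model; both kinds of forecast can then be scored with the same `mean_absolute_error` against observed counts.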

1/15/2023

10 Breakthrough Technologies 2023

 David Rotman, 10 Breakthrough Technologies 2023, MIT Technology Review, January 9, 2023.

Our annual look at 10 Breakthrough Technologies—including CRISPR for high cholesterol, battery recycling, AI that makes images, and the James Webb Space Telescope—that will have a profound effect on our lives. Plus care robots, 3-D printing pioneers, and chasing bugs on the blockchain.

10/27/2022

Stroke risk is not linear

Orfanoudaki A, Chesley E, Cadisch C, Stein B, Nouh A, Alberts MJ, et al. (2020) Machine learning provides evidence that stroke risk is not linear: The non-linear Framingham stroke risk score. PLoS ONE 15(5): e0232414. https://doi.org/10.1371/journal.pone.0232414

Current stroke risk assessment tools presume the impact of risk factors is linear and cumulative. However, both novel risk factors and their interplay influencing stroke incidence are difficult to reveal using traditional additive models. The goal of this study was to improve upon the established Revised Framingham Stroke Risk Score and design an interactive Non-Linear Stroke Risk Score. Leveraging machine learning algorithms, our work aimed at increasing the accuracy of event prediction and uncovering new relationships in an interpretable fashion. A two-phase approach was used to create our stroke risk prediction score. First, clinical examinations of the Framingham offspring cohort were utilized as the training dataset for the predictive model. Optimal Classification Trees were used to develop a tree-based model to predict 10-year risk of stroke. Unlike classical methods, this algorithm adaptively changes the splits on the independent variables, introducing non-linear interactions among them. Second, the model was validated with a multi-ethnicity cohort from the Boston Medical Center. Our stroke risk score suggests a key dichotomy between patients with history of cardiovascular disease and the rest of the population. While it agrees with known findings, it also identified 23 unique stroke risk profiles and highlighted new non-linear relationships, such as the role of T-wave abnormality on electrocardiography and hematocrit levels in a patient’s risk profile. Our results suggested that the non-linear approach significantly improves upon the baseline in the c-statistic (training 87.43% (CI 0.85–0.90) vs. 73.74% (CI 0.70–0.76); validation 75.29% (CI 0.74–0.76) vs. 65.93% (CI 0.64–0.67)), even in multi-ethnicity populations. The clinical implications of the new risk score include prioritization of risk factor modification and personalized care at the patient level with improved targeting of interventions for stroke prevention.
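The c-statistic quoted above can be read as the probability that a randomly chosen patient who had the event receives a higher risk score than a randomly chosen patient who did not. The sketch below computes that quantity for binary outcomes; the scores and labels are made-up toy values, and the function ignores censoring, so it is only an approximation of what a survival-analysis c-statistic would do.

```python
def c_statistic(scores, labels):
    """Concordance for binary outcomes (equivalent to the ROC AUC).

    Counts, over all (event, non-event) pairs, how often the event case
    has the higher score; tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy example: two events, two non-events, one discordant pair out of four.
toy_scores = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 0, 1, 0]
print(c_statistic(toy_scores, toy_labels))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 75.29% versus 65.93% validation figures.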

5/17/2022

Integration of Face-to-Face Screening With Real-time Machine Learning to Predict Risk of Suicide Among Adults

Drew Wilimitis, Robert W. Turer, Michael Ripperger, et al., Integration of Face-to-Face Screening With Real-time Machine Learning to Predict Risk of Suicide Among Adults, JAMA Netw Open. 2022; 5(5):e2212095. doi:10.1001/jamanetworkopen.2022.12095.
In this cohort study of 120 398 adult patient encounters, an ensemble learning approach combined suicide risk predictions from the Columbia Suicide Severity Rating Scale and a real-time machine learning model. Combined models outperformed either model alone for risks of suicide attempt and suicidal ideation across a variety of time periods.

4/11/2022

The Clinician and Dataset Shift in Artificial Intelligence

Samuel G. Finlayson et al., The Clinician and Dataset Shift in Artificial Intelligence, New England Journal of Medicine, 2021; 385:283-286.

A major driver of AI system malfunction is known as “dataset shift.” Most clinical AI systems today use machine learning, algorithms that leverage statistical methods to learn key patterns from clinical data. Dataset shift occurs when a machine-learning system underperforms because of a mismatch between the data set with which it was developed and the data on which it is deployed. For example, the University of Michigan Hospital implemented the widely used sepsis-alerting model developed by Epic Systems; in April 2020, the model had to be deactivated because of spurious alerting owing to changes in patients’ demographic characteristics associated with the coronavirus disease 2019 pandemic. This was a case in which dataset shift fundamentally altered the relationship between fevers and bacterial sepsis, leading the hospital’s clinical AI governing committee (which one of the authors of this letter chairs) to decommission its use. This is an extreme example; many causes of dataset shift are more subtle. In Table 1, we present common causes of dataset shift, which we group into changes in technology (e.g., software vendors), changes in population and setting (e.g., new demographics), and changes in behavior (e.g., new reimbursement incentives); the list is not meant to be exhaustive.
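One simple way to watch for the population-type shifts described above is to compare the distribution of an input feature at development time with its distribution in deployment. The sketch below uses the Population Stability Index (PSI), a common monitoring heuristic that is not mentioned in the letter; the bin count, the toy temperature values, and any alert threshold a team might attach to it are assumptions.

```python
import math


def population_stability_index(reference, current, n_bins=5, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature.

    Bins are equal-width over the range of the reference sample; values
    outside that range are clipped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # degenerate reference sample: everything lands in bin 0

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Toy data: body temperatures at development time vs. a shifted population.
development = [36.5 + 0.1 * k for k in range(20)]
deployment = [t + 1.0 for t in development]
psi_same = population_stability_index(development, development)
psi_shifted = population_stability_index(development, deployment)
```

A check like this only catches changes in the inputs. The sepsis example in the letter involved a change in the relationship between fever and sepsis, which needs outcome data and ongoing performance monitoring to detect.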

Deb Raji, There’s more to data than distributions, Mar 31, 2022.

Jose G. Moreno-Torres et al., A unifying view on dataset shift in classification, Pattern Recognition, Volume 45, Issue 1, January 2012, Pages 521-530.

4/09/2022

Efficient and targeted COVID-19 border testing via reinforcement learning

Bastani, H., Drakopoulos, K., Gupta, V. et al. Efficient and targeted COVID-19 border testing via reinforcement learning. Nature 599, 108–113 (2021). https://doi.org/10.1038/s41586-021-04014-z (EVA Public Dataset, Off-Policy and Counterfactual Analysis, Open-Source code for Project Eva)

Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a variety of ad hoc border control protocols to allow for non-essential travel while safeguarding public health, from quarantining all travellers to restricting entry from select nations on the basis of population-level epidemiological metrics such as cases, deaths or testing positivity rates. Here we report the design and performance of a reinforcement learning system, nicknamed Eva. In the summer of 2020, Eva was deployed across all Greek borders to limit the influx of asymptomatic travellers infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and to inform border policies through real-time estimates of COVID-19 prevalence. In contrast to country-wide protocols, Eva allocated Greece’s limited testing resources on the basis of incoming travellers’ demographic information and testing results from previous travellers. By comparing Eva’s performance against modelled counterfactual scenarios, we show that Eva identified 1.85 times as many asymptomatic, infected travellers as random surveillance testing, with up to 2–4 times as many during peak travel, and 1.25–1.45 times as many asymptomatic, infected travellers as testing policies that utilize only epidemiological metrics. We demonstrate that this latter benefit arises, at least partially, because population-level epidemiological metrics had limited predictive value for the actual prevalence of SARS-CoV-2 among asymptomatic travellers and exhibited strong country-specific idiosyncrasies in the summer of 2020. Our results raise serious concerns on the effectiveness of country-agnostic internationally proposed border control policies that are based on population-level epidemiological metrics. Instead, our work represents a successful example of the potential of reinforcement learning and real-time data for safeguarding public health.
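The idea of allocating a limited testing budget using results from previous travellers can be illustrated with a small bandit sketch. The code below uses Thompson sampling with Beta posteriors, which is a stand-in chosen for brevity and not the algorithm published for Eva; the traveller types (`type_a`, `type_b`), counts, and budget are hypothetical.

```python
import random


def allocate_tests(history, arrivals, budget, rng):
    """Assign a testing budget across traveller types by Thompson sampling.

    history:  {type: (past_positives, past_negatives)}
    arrivals: {type: number of travellers arriving today}
    Returns {type: number of tests allocated}.
    """
    allocation = {t: 0 for t in arrivals}
    remaining = dict(arrivals)
    for _ in range(budget):
        candidates = [t for t, n in remaining.items() if n > 0]
        if not candidates:
            break  # fewer travellers than tests
        # Draw a plausible prevalence for each type from its
        # Beta(1 + positives, 1 + negatives) posterior.
        draws = {
            t: rng.betavariate(1 + history[t][0], 1 + history[t][1])
            for t in candidates
        }
        chosen = max(draws, key=draws.get)
        allocation[chosen] += 1
        remaining[chosen] -= 1
    return allocation


# Hypothetical inputs: type_a has tested positive far more often than type_b.
rng = random.Random(0)
past_results = {"type_a": (50, 50), "type_b": (0, 1000)}
todays_arrivals = {"type_a": 5, "type_b": 100}
plan = allocate_tests(past_results, todays_arrivals, budget=10, rng=rng)
```

Sampling from the posterior, rather than always testing the type with the highest observed positivity, keeps some tests flowing to types with little history, so their prevalence estimates can still be updated.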

3/10/2022

from lab bench to public office and back

Chen Chien-jen, Taiwan’s pandemic vice-president — from lab bench to public office and back, Nature 603, 203 (2022)

Two years on from the World Health Organization’s official declaration of the pandemic, I’ve been thinking about lessons I’ve learnt toggling between science and public service. I think all researchers — from bench scientists to physicists to computational social scientists — might find this exercise useful. Government advisers, too.

10/18/2021

Minimum-Distortion Embedding

Akshay Agrawal, Alnur Ali and Stephen Boyd (2021), "Minimum-Distortion Embedding", Foundations and Trends® in Machine Learning: Vol. 14: No. 3, pp 211-378. http://dx.doi.org/10.1561/2200000090. 

We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., have zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem.

This monograph is accompanied by an open-source Python package, PyMDE, for approximately solving MDE problems. Users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Because our algorithm is scalable, and because PyMDE can exploit GPUs, our software scales to data sets with millions of items and tens of millions of distortion functions. Additionally, PyMDE is competitive in runtime with specialized implementations of specific embedding methods. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.
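As a rough illustration of the objective described in this abstract, the sketch below evaluates the total distortion of a candidate embedding. It does not use PyMDE; the specific distortion functions (squared distance for similar pairs, negative distance for dissimilar pairs) and the three-item toy embeddings are assumptions chosen only to keep the example short.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def total_distortion(embedding, similar_pairs, dissimilar_pairs):
    """Sum of per-pair distortions for a candidate embedding.

    Similar pairs are penalized by squared distance (pulls them together);
    dissimilar pairs are rewarded for distance (pushes them apart).
    """
    attract = sum(
        euclidean(embedding[i], embedding[j]) ** 2 for i, j in similar_pairs
    )
    repel = sum(
        -euclidean(embedding[i], embedding[j]) for i, j in dissimilar_pairs
    )
    return attract + repel


# Items 0 and 1 are similar; items 0 and 2 are dissimilar.
similar = [(0, 1)]
dissimilar = [(0, 2)]
good = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0)]  # 0 near 1, far from 2
bad = [(0.0, 0.0), (3.0, 0.0), (0.1, 0.0)]   # 0 far from 1, near 2
assert total_distortion(good, similar, dissimilar) < total_distortion(bad, similar, dissimilar)
```

With these particular distortion functions the objective could be lowered indefinitely by pushing dissimilar items further apart, which is one reason a constraint such as the standardization mentioned in the abstract matters.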

9/25/2021

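Looking at Taiwan from Overseas During the Pandemic

孟買春秋, Looking at Taiwan from Overseas During the Pandemic (疫情中從海外看台灣), 思想坦克, September 24, 2021.

After two years, my husband and I finally returned to our home in Provence. When we left Taipei, I was uneasy about the pandemic world outside Taiwan; after all, for more than a year Taiwan had felt like a haven that hardly knew what the pandemic was. Yet some ten days after arriving in the south of France, that anxiety seems to have faded, replaced by an immense pride in being Taiwanese.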

9/11/2021

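Zuellig Pharma (裕利) Makes Vaccine Logistics Records Fully Traceable

余至浩, Using IT to Overcome the Major Challenges of Cold-Chain Transport: Zuellig Pharma (裕利) Makes Vaccine Logistics Records Fully Traceable, iThome, 2021-07-16

Temperature control

In distributing COVID-19 vaccines, 費而隱 points out, the transport leg is the biggest challenge. He explains that while the vaccines are still in the warehouse's refrigerated or frozen storage, monitoring the temperature is comparatively easy. Once they leave the logistics center, however, the vaccine temperature keeps changing and is hard to hold constant; for example, it fluctuates whenever a driver opens or closes the vehicle door or opens a cooler box. "So temperature control on the vehicle has to be handled very carefully," 費而隱 stresses.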

7/17/2021

Morris Chang's (張忠謀) Remarks at the APEC Informal Leaders' Retreat

李玟儀

This Informal Retreat has been called to discuss how Asia-Pacific can collaborate to move through the COVID health crisis, and to accelerate the post-COVID economic recovery. Chinese Taipei will address these two topics specifically.

7/14/2021

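From AI to Smart Healthcare

蔣榮先, From AI to Smart Healthcare (從AI到智慧醫療), 商周, 2020.

The book's author, Professor 蔣榮先, is recognized as an outstanding information technology talent in Taiwan. He teaches at the National Cheng Kung University College of Medicine and serves as Chief Information Officer of National Cheng Kung University Hospital. He conducts cutting-edge research in medical technology, is familiar with the latest industry developments, and understands well the need for cross-disciplinary collaboration.

With accessible explanations, many examples of everyday applications, and infographics, he depicts the coming new world of AI-enabled smart healthcare. The book is a worthwhile reference for scholars and experts in medicine, industry, pharmaceuticals, and medical devices, and is also suitable for general readers interested in new developments in health and medical technology.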