4/29/2020

A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy

Emma Beede, Elizabeth Baylor, Fred Hersch, Anna Iurchenko, Lauren Wilcox, Paisan Ruamviboonsuk, Laura M. Vardoulakis, A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy, CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, April 2020, Pages 1–12, https://doi.org/10.1145/3313831.3376718.
Referral Determinations
All images were initially assessed by a nurse and then sent to an ophthalmologist for review. The ability to assess fundus photos for DR varied from nurse to nurse. While most nurses told us they felt comfortable assessing for the presence of DR, they did not know how to determine its severity when present. P4 told us, “I know if it’s not normal, but I don’t know what to call it.” To make the ultimate decision of whether a patient needed to be referred to an ophthalmologist for an exam and potentially for treatment, the nurses turned to an ophthalmologist or retinal specialist, who was most often remote.
Images that appeared normal to the nurses were typically sent via email to the ophthalmologist in batches that included several weeks’ worth of images. The ophthalmologist then determined whether or not the patient needed to be referred for an exam, and typically sent the results back to the nurses within 1–2 weeks. For images that seemed abnormal, some nurses sent the image to the ophthalmologist via instant messaging in hopes of quickly getting a recommendation for the patient. These recommendations usually took days, but were sometimes returned within hours....
Potential benefits 
Nurses foresaw two potential benefits of having an AI-assisted screening process. The first was using the system as a learning opportunity—improving their ability to make accurate DR assessments themselves.... 
The second benefit was the potential to use the deep learning system’s results to prove their own readings to on-site doctors. Several nurses expressed frustration with their assessments being undervalued or dismissed by physicians, and they were excited about the potential to demonstrate their own expertise to more senior clinicians. As P7 explained, “They don’t believe us.” P11 stated, “It could confirm what we already know.”...
Through observation and interviews with nurses and technicians, conducted in parallel with a prospective study evaluating the deep learning system’s accuracy, we discovered several socio-environmental factors that impacted model performance, nursing workflows, and the patient experience. By conducting human-centered evaluative research prior to, and alongside, prospective evaluations of model accuracy, we were able to understand the contextual needs of clinicians and patients prior to widespread deployment, and to recommend system and environmental changes that would lead to an improved experience.
Will Douglas Heaven, “Google’s medical AI was super accurate in a lab. Real life was a different story,” MIT Technology Review, April 27, 2020.
When it worked well, the AI did speed things up. But it sometimes failed to give a result at all. Like most image recognition systems, the deep-learning model had been trained on high-quality scans; to ensure accuracy, it was designed to reject images that fell below a certain threshold of quality. With nurses scanning dozens of patients an hour and often taking the photos in poor lighting conditions, more than a fifth of the images were rejected. 
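The quality-gating behavior described above can be sketched in a few lines. This is an illustrative reconstruction only: the real system’s quality-scoring model and rejection threshold are not public, and the names `screen_fundus_image`, `grade_image`, and `QUALITY_THRESHOLD` are assumptions introduced here for the sketch.

```python
# Hypothetical sketch of the quality gate described in the article.
# The threshold value and function names are illustrative assumptions,
# not the deployed system's actual design.

QUALITY_THRESHOLD = 0.7  # assumed cutoff; images scoring below this get no DR grade


def grade_image(image):
    """Placeholder for the deep-learning grading model."""
    return "no referable DR"


def screen_fundus_image(image, quality_score):
    """Return a DR grade only when the image clears the quality gate."""
    if quality_score < QUALITY_THRESHOLD:
        # Mirrors the reported behavior: a low-quality photo is rejected
        # outright, so the patient must be screened in person instead.
        return {"result": None, "status": "rejected: insufficient image quality"}
    return {"result": grade_image(image), "status": "graded"}
```

Under this framing, a blurry photo taken in poor lighting yields no grade at all rather than a tentative one, which is why nurses ended up retaking images or referring patients whose scans looked healthy to them.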
Patients whose images were kicked out of the system were told they would have to visit a specialist at another clinic on another day. If they found it hard to take time off work or did not have a car, this was obviously inconvenient. Nurses felt frustrated, especially when they believed the rejected scans showed no signs of disease and the follow-up appointments were unnecessary. They sometimes wasted time trying to retake or edit an image that the AI had rejected. 
Because the system had to upload images to the cloud for processing, poor internet connections in several clinics also caused delays. “Patients like the instant results, but the internet is slow and patients then complain,” said one nurse. “They’ve been waiting here since 6 a.m., and for the first two hours we could only screen 10 patients.”
