Scientists from MIT and IBM Research have developed a tool called saliency cards to help users select the most appropriate saliency method for their specific task when deploying machine-learning models. Saliency methods aim to explain how these complex models make predictions, since even the scientists who design them may not fully understand their inner workings.

Saliency cards offer standardized documentation of how each method operates, including its strengths and weaknesses, along with explanations that help users interpret the model’s behavior correctly. The researchers believe that, armed with this information, users can make informed choices and gain a more accurate understanding of their model’s predictions. The cards also allow quick side-by-side comparisons of different saliency methods, serving machine-learning researchers and lay users alike.

The researchers hope that saliency cards will help users choose the right method for their model and task, leading to better interpretation of the model’s predictions. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.
Picking the right method
The evaluation of saliency methods in terms of faithfulness, which measures how accurately a method reflects a model’s decision-making process, is not straightforward, according to Boggust. A method may perform well on one aspect of faithfulness but fail on another. Consequently, users often rely on popular or recommended methods rather than evaluating them comprehensively.
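Evaluating faithfulness is itself an active research area, and there are many competing checks. As a rough illustration (a deletion-style test, not a procedure described in this work), one can progressively mask the pixels a method rates as most important and watch how sharply the model’s confidence drops; a method can look faithful under this test yet fail a different one. A minimal PyTorch sketch, assuming the model, image, saliency map, and target class are supplied by the user:

```python
import torch

def deletion_check(model, image, saliency_map, target, steps=10):
    """Rough deletion-style faithfulness check (illustrative only).

    Progressively zeroes out the most salient pixels and records how the
    model's confidence in `target` drops. A steep drop suggests the map
    reflects what the model relies on, but passing this single test says
    nothing about other faithfulness criteria.
    """
    # Assumes `image` and `saliency_map` have the same shape, e.g. (1, H, W).
    order = saliency_map.flatten().argsort(descending=True)
    masked = image.clone()
    chunk = max(1, order.numel() // steps)
    confidences = []
    for step in range(steps):
        # Remove the next block of most-salient pixels.
        idx = order[step * chunk:(step + 1) * chunk]
        masked.view(-1)[idx] = 0.0
        with torch.no_grad():
            probs = torch.softmax(model(masked.unsqueeze(0)), dim=-1)
        confidences.append(probs[0, target].item())
    return confidences  # a steeper decline suggests a more faithful map
```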
However, choosing the wrong method can have significant consequences. For example, the integrated gradients saliency method estimates feature importance by comparing an image to a baseline that is meant to carry no information. If that baseline is set to all 0s, which corresponds to black in images, the method will incorrectly treat black pixels as unimportant, which can be especially misleading for X-ray images, where dark regions may still carry diagnostically relevant information.
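To make the baseline issue concrete, here is a minimal sketch using the Captum library’s implementation of integrated gradients. The tiny stand-in model and random “X-ray” tensor are placeholders rather than anything from the researchers’ setup; the point is only that swapping the all-0s baseline for a mean-intensity one changes which pixels can receive credit.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients  # pip install captum

# Stand-in classifier and input: a tiny CNN and a random single-channel
# "X-ray". In practice these would be a trained model and a real scan.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
).eval()
xray = torch.rand(1, 1, 64, 64)
pred_class = model(xray).argmax(dim=1).item()

ig = IntegratedGradients(model)

# All-0s baseline (a black image): black pixels in the input end up with
# near-zero attribution by construction.
attr_black = ig.attribute(xray, baselines=torch.zeros_like(xray),
                          target=pred_class)

# Mean-intensity baseline instead: dark regions of the input can still be
# credited with importance.
attr_mean = ig.attribute(xray,
                         baselines=torch.full_like(xray, xray.mean().item()),
                         target=pred_class)
```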
To address these challenges, saliency cards offer a concise summary of a saliency method’s operation using 10 user-focused attributes. These attributes describe how saliency is calculated, the method’s relationship with the model, and how users perceive its outputs.
For instance, one attribute called “hyperparameter dependence” captures how sensitive a saliency method is to user-specified parameters. The saliency card for integrated gradients would describe its parameters and how they affect its performance. By consulting the card, users can quickly see that default parameters, such as the all-0s baseline, might yield misleading results when evaluating X-ray images.
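The published cards themselves are structured documents rather than code, but a hypothetical programmatic version helps show how standardized attributes enable quick side-by-side comparison. In this sketch the attribute names echo ones mentioned in this article (hyperparameter dependence, applicability to any model, computational efficiency), while the schema itself is purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class SaliencyCard:
    """Illustrative container for saliency-card attributes.

    The field names follow attributes discussed in the article; the schema
    itself is hypothetical, not the authors' published card format.
    """
    method: str
    hyperparameter_dependence: str   # sensitivity to user-specified parameters
    model_agnosticism: str           # which kinds of models the method applies to
    computational_efficiency: str    # rough cost of producing one saliency map
    notes: list[str] = field(default_factory=list)

integrated_gradients_card = SaliencyCard(
    method="Integrated Gradients",
    hyperparameter_dependence="High: results depend on the chosen baseline; "
                              "the default all-0s (black) baseline can hide dark regions.",
    model_agnosticism="Requires gradient access to the model.",
    computational_efficiency="Moderate: one backward pass per interpolation step.",
)
```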
Saliency cards not only benefit users but also contribute to scientific research by highlighting gaps in the field. The MIT researchers, for instance, were unable to identify a saliency method that was both computationally efficient and applicable to any machine-learning model. This discovery prompts further investigation to determine if such a method can be developed or if there is an inherent conflict between these two requirements.
Showing their cards
Following the creation of multiple saliency cards, the research team conducted a user study involving eight domain experts with diverse backgrounds, ranging from computer scientists to a radiologist who had limited familiarity with machine learning. The participants found the concise descriptions on the cards helpful for prioritizing attributes and comparing different saliency methods. Even the radiologist, despite being new to machine learning, could understand the cards and use them to take part in selecting a saliency method.
The interviews yielded some surprising insights. While researchers often assume that clinicians prefer sharp methods that focus on specific objects in medical images, the clinician in the study expressed a preference for some noise in the images to aid in reducing uncertainty.
Furthermore, it was revealed that each participant had unique priorities regarding the attributes of saliency methods, even if they held the same role or profession. This finding highlights the importance of considering individual preferences and needs when selecting saliency methods.
Moving forward, the researchers aim to explore the under-evaluated attributes of saliency methods and potentially develop task-specific methods. They also seek to enhance their understanding of how people perceive saliency method outputs, which could lead to improved visualizations. Additionally, they plan to host their work on a public repository to gather feedback that will guide future research and development.
The researchers envision the saliency cards as dynamic documents that evolve with the introduction of new saliency methods and evaluations. They believe that this work serves as a foundation for a broader discussion on the attributes of saliency methods and their implications for different tasks and user requirements.