As AI applications become more intertwined with our everyday lives, a public debate is taking shape about how these interactions affect us. This shift towards developing human-centered AI systems plays a vital role in our future. Incorporating human impact into the design of more reliable AI applications raises questions: What does this look like in practice? How do we know if our application behaves reliably? How do we involve different stakeholders and consider different contexts?
Over the past few months, I have been working on my graduation project for the Industrial Design master's at Eindhoven University of Technology, together with DEUS. Through a case study, I explored how we, as designers, can use prototypes to help create more responsible AI-based applications. I took a deep dive into the world of incentivized health insurance applications while taking a different approach to AI development. The approach uses generative prototypes to explore the impact of a system in context while keeping stakeholders in the loop. The results of the project showcase how prototyping is a valuable tool for defining what responsibility means to us, through ethical value finding in a situated manner. The personalized health insurance field is an ethically loaded context, which creates room to explore both the designer's and the stakeholders' values effectively. This post highlights insights and reflections from the iterative design process.
This project takes an iterative approach to designing AI-based applications. The value of iteration becomes clear when you consider the interactivity that AI brings to the table. AI systems have the potential to be highly interactive: they can adapt based on their interaction with humans, and humans, in turn, adapt based on their interaction with the system, creating a feedback loop of changing interactions. For responsible AI design, one problem is that this higher interactivity makes it challenging to define stable notions of reliability and intent, especially at the beginning of the design process. So, how can we deal with this interactivity in practice?
The approach taken throughout this project is based on the theoretical work of our responsible AI lead, Niya Stoimenova. Her adaptive solutions framework describes how, to create more reliable AI applications, we should use prototypes to research the impact of human-AI systems in practice. The vision is that the iterative prototyping process generates knowledge that helps us define the AI system's requirements and boundaries. The prototype embodies our knowledge at that moment and acts as a connecting factor between the user and the AI system. This allows us to reflect, in a situated manner, on how users behave and interact with a system and on how we want the system to act.
When starting to design an AI system, the adaptive solutions framework indicates we should begin by defining the purpose of our application: describing what the application is trying to achieve and determining what "behaves as intended" means to us. Before defining my purpose, I looked at existing personalized and incentive-based (health) insurance applications. Generally, in these adjacent applications, users are encouraged by incentives to improve their behavior and live healthier lives, and the applications often use challenges or scores to evaluate behavior. I chose a purpose in line with these applications: to encourage the insured to participate in mental and physical self-care through incentives.
The next step in the adaptive solutions framework is to define a frame: an assumption of what the system could do. This was my starting point for the first prototype. When I started defining the frame, it became clear that I knew very little about how users feel about sharing data with insurance companies. So, I started with a provocative prototype aiming to elicit this information. The frame became: if the system provided a provocative health warning, users would be prompted to reflect on their data-sharing and personalization needs. The first prototype embodies this frame.
The prototype was used and discussed in a user session with four participants, which produced multiple values and insights. The most interesting insight gained from deploying this prototype was the observation that what the participants voiced did not align with their actions. They said they wanted to incorporate their personal goals; however, when watching them choose a challenge, they mainly considered its feasibility and the points to be gained. This goes against our intended purpose. Reflecting on this behavior elicited the value, for me as a designer, that the system should be kind and supportive towards personal circumstances.
With this insight that the system must be supportive and kind, we continued to the second iteration. The frame of the second prototype shifted towards another use of AI. The implementation draws more inspiration from similar applications for car insurance, which provide users with a weekly score. In the second iteration, users received a weekly score that was used to personalize the challenges. Users were allowed to adapt the challenges, to see if they would feel more supported in improving their actual behavior. To achieve personalization, the prototype used the participant's smartwatch data.
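To make this concrete, here is a minimal sketch of how such a weekly score might be derived from smartwatch data. The metrics, targets, and equal weighting are my own illustrative assumptions, not the scoring logic of the actual prototype:

```typescript
// Hypothetical sketch: deriving a weekly health score from smartwatch data.
// The metrics, targets, and weights below are illustrative assumptions,
// not the scoring used in the actual prototype.

interface DaySample {
  steps: number;         // daily step count
  sleepHours: number;    // hours slept
  activeMinutes: number; // minutes of registered activity
}

// Score each day against simple targets, each capped at 100%.
function dayScore(d: DaySample): number {
  const steps = Math.min(d.steps / 8000, 1);
  const sleep = Math.min(d.sleepHours / 8, 1);
  const active = Math.min(d.activeMinutes / 30, 1);
  // Equal weighting is an arbitrary choice for this sketch.
  return (steps + sleep + active) / 3;
}

// A week's score is the average day score, scaled to 0-100 points.
function weeklyScore(week: DaySample[]): number {
  const total = week.reduce((sum, d) => sum + dayScore(d), 0);
  return Math.round((total / week.length) * 100);
}
```

A score like this could then drive which challenges are surfaced or how they are adapted for a given user.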
In one case, a participant's smartwatch data showed no sleep hours, likely because they did not wear their watch in bed. But this assumption was purposefully ignored to see how the participant would react to the recommendations of the fictional system. Interestingly, the participant remained heavily focused on following the recommendations. The participant responded to the system's suggestion: "[Sleep: try to go to bed seven hours earlier than your current average bedtime for four days in a row — 250 points] — well, that sounds pretty good." The recommendation was quite ridiculous, but the participant went along and seemed easily convinced. This elicited the value that users should be given more autonomy to decide what they personally value. In addition, it raised concerns about whether the short-term evaluation provided a realistic enough context to gather results. The following prototype aimed to move toward a more functional-feeling system.
The frame for the third prototype became: if the system uses AI to provide daily health points and personalized tips, users will feel more autonomy in changing their behavior. Moving away from the previous Figma mock-ups, this prototype was built to be used by participants for three days in a row. Participants uploaded screenshots of their smartwatch application to the prototype, and I evaluated these for points in a Wizard-of-Oz manner. A progressive web application (PWA) connected to a database allowed me to easily update their points and scores remotely while keeping the application realistic for the user.
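A minimal sketch of how such a Wizard-of-Oz loop can work: the PWA periodically fetches the participant's current record from a backend, while the researcher updates that record by hand on the other end. The endpoint and data shape here are hypothetical, not the project's actual backend:

```typescript
// Hypothetical sketch of the Wizard-of-Oz loop: the PWA polls a backend for
// the participant's current score, while the researcher updates the record
// manually. The endpoint and record shape are illustrative assumptions.

interface ScoreRecord {
  participantId: string;
  points: number;
  tip: string; // the personalized tip shown in the app
}

async function fetchScore(participantId: string): Promise<ScoreRecord> {
  const res = await fetch(`https://example.com/api/scores/${participantId}`);
  if (!res.ok) throw new Error(`Failed to fetch score: ${res.status}`);
  return res.json() as Promise<ScoreRecord>;
}

// Poll every minute so manual updates made by the researcher show up
// quickly in the participant's app, preserving the illusion of a live system.
function startPolling(participantId: string, render: (r: ScoreRecord) => void) {
  const tick = async () => {
    try {
      render(await fetchScore(participantId));
    } catch (err) {
      console.warn("Score fetch failed, retrying next tick", err);
    }
  };
  tick();
  return setInterval(tick, 60_000);
}
```

The point of this setup is that the participant experiences a functional-feeling application, while the "AI" behind it is still a human making judgment calls.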
Overall, users seemed to value the design shift in the system and sensed more autonomy in its use. Interestingly, though, participants wanted more guidance on how to improve, both in terms of personalization and timing. The higher fidelity itself was valuable for gathering new insights. For example, one participant mentioned missing out on points because his watch went into battery-saving mode. I evaluated this as unfair, so a new requirement emerged: the system should take its share of responsibility in collecting the correct data from the user. Applications like these could aid the user by notifying them when their behavior looks out of the ordinary or by providing timed tips, helping them improve their behavior and receive the incentives they deserve.
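One way such a notification could work is to compare a new reading against the user's recent baseline and flag large deviations, such as the sudden zero readings caused by battery-saving mode. The window size and threshold below are arbitrary assumptions for the sketch:

```typescript
// Hypothetical sketch: flag a reading that deviates strongly from the user's
// recent baseline, e.g. a sudden zero caused by battery-saving mode.
// The window size and threshold are illustrative assumptions.

function isOutOfOrdinary(history: number[], today: number): boolean {
  if (history.length < 7) return false; // not enough baseline data yet
  const recent = history.slice(-14); // two-week rolling window
  const mean = recent.reduce((a, b) => a + b, 0) / recent.length;
  const variance =
    recent.reduce((a, b) => a + (b - mean) ** 2, 0) / recent.length;
  const std = Math.sqrt(variance);
  // Flag readings more than three standard deviations from the baseline.
  return std > 0 && Math.abs(today - mean) > 3 * std;
}

// Example: if steps hovered around 8000/day all week and the watch suddenly
// reports 0, the app could ask "Did your watch run out of battery?"
// instead of silently withholding points.
```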
I went over some highlights to showcase the iterative conceptual design process. The prototypes aided in creating a list of requirements and showcased a shift in design throughout the iterations. The iterative and reflective design process provides insights into designing a more supportive AI system that analyzes behavior combined with incentives. The requirements were that the system should work in real time, act automatically, provide guidance, and share responsibility. The shift between the iterations can be summarized as an argument to move away from challenge-based applications toward more continuous, reward-based evaluations.
Reflecting on the process, it should be noted how the project also showcased the emotional side of designing reliable AI systems. For example, what do you do as a designer when you want to be kind, but being kind requires collecting more data? How do you deal with such value conflicts? We want to consider such questions in the further development of our AI-based applications.
I believe that in the future, the role of human feedback and values will become more central to the development of AI systems. I aimed to showcase how stakeholder involvement through prototyping can help elicit values and reveal impacts on humans. Within the DEUS reliable AI team, we are working on projects that apply this vision in practice, like the development of our new AI algorithm monitoring tool. Feel free to reach out if you want to know more about my graduation project or if you are curious about what our Reliable AI team can do for you!