Artificial Intelligence (AI) and Accessibility Research Symposium 2023

Introduction

Researchers, practitioners, and users with disabilities participated in an international online symposium exploring the positive and negative impacts of artificial intelligence (AI) in digital accessibility.

This online symposium took place on 10 and 11 January 2023 and brought together researchers, academics, industry, government, and people with disabilities to explore one of the most pressing emerging technologies, artificial intelligence. The symposium aimed to identify current challenges and opportunities raised by the increasing use of AI in digital accessibility, and to explore how ongoing research can both advance and hinder digital accessibility.

Opening Keynote: Jutta Treviranus

Jutta Treviranus is the Director of the Inclusive Design Research Centre (IDRC) and professor in the faculty of Design at OCAD University in Toronto. Jutta established the IDRC in 1993 as the nexus of a growing global community that proactively works to ensure that our digitally transformed and globally connected society is designed inclusively. Dr. Treviranus also founded an innovative graduate program in inclusive design at OCAD University. Jutta is credited with developing an inclusive design methodology that has been adopted by large enterprise companies such as Microsoft, as well as public sector organizations internationally. In 2022 Jutta was recognized for her work in AI by Women in AI with the AI for Good - DEI AI Leader of the Year award.

First, Do No Harm

In this symposium, Jutta Treviranus delivers an opening keynote that sheds light on the harms of AI specific to people with disabilities. She addresses ethical concerns surrounding AI, such as lack of representation, human bigotry, manipulative practices, unfair value extraction, exploitation, and disinformation. Jutta emphasizes the significance of considering the impact of AI on people with disabilities, as they are often at the margins of justice-deserving groups, making them more vulnerable to both existing and emerging harms. She discusses the increasing complexity of decision-making processes and the growing appeal and usefulness of AI decision tools. Jutta also highlights the challenges of data diversity, predictive accuracy, data privacy, and the need for transparency in data usage. The discussion expands to include topics such as ethics, bias, and the efforts being made globally to address AI ethics. Jutta concludes by exploring the potential of AI and the opportunity it presents to reassess what we want to automate, what we mean by concepts such as best, optimal, and fairness, and how we can include marginalized individuals in the development and use of AI technologies.

Transcript of Opening Keynote

CARLOS DUARTE: Now let’s move to the opening keynote, for which we’re delighted to welcome Jutta Treviranus. Jutta Treviranus is the director of the Inclusive Design Research Centre and a professor in the Faculty of Design at OCAD University in Toronto. The floor is yours.

JUTTA TREVIRANUS: Thank you, Carlos. It is a great pleasure to be able to talk to you about this important topic. I am going to just start my slides. I’m hoping that what you see is just the primary slide, correct?

CARLOS DUARTE: Correct.

JUTTA TREVIRANUS: Wonderful. Okay. Thank you, everyone. I will voice my slides and the information and the images. I have titled my talk First, Do No Harm. I’m usually a really optimistic person, and I’m hoping to provide an optimistic message.

To realize the benefits of AI, I believe we need to further recognize and take into account the harms. I’m going to limit my discussion to the harms that are specific to People with Disabilities. There is a great deal of work detailing the ethical concerns of currently deployed AI, from lack of representation, to human bigotry finding its way into algorithms, to manipulative practices, unfair value extraction and exploitation, and disinformation. I’ll focus on accessibility and disability, including the recognition that disability is at the margins of all other justice-deserving groups and therefore most vulnerable to the general and emerging harms, but also on the potential opportunities of AI. Carlos shared a number of questions and they’re all great questions. We agreed this is better covered through a conversation than a presentation. At the end of my talk, I’m going to invite Shari to join me, and we’ll talk more about this tomorrow after the book talk.

Our society is plagued by more and more difficulties. As the world becomes more and more complex and entangled, the choices increase in ambiguity, the risks associated with each decision become more consequential, and the factors to consider in each decision more numerous, convoluted, and confusing. Especially in times of crisis, like we have been experiencing these last few years, and in highly competitive situations where there is scarcity, AI decision tools become more and more attractive and useful. As an illustrative example, it is no wonder that over 90% of organizations use some form of AI hiring tool, according to the U.S. Equal Employment Opportunity Commission. As work becomes less formulaic and finding the right fit becomes more difficult, they are a highly seductive tool. As an employer, when choosing who to hire from a huge pool of applicants, what better way to sift through, find the gems, and eliminate the potential failed choices than to use an AI system? With an AI tool making the decisions we remove the risks of conflicts of interest and nepotism. What better way to determine who will be a successful candidate than to use all of the evidence we have gathered from our current successful employees? Especially when the jobs we’re trying to fill are not formulaic and there is not a valid test to devise for candidates to determine their suitability, AI can use predictive analytics to find the optimal candidates.

In this way, we’re applying solid, rigorous science to what would otherwise be an unscientific decision; we’re not relying on fallible human intuition. Tools are adding information beyond the application to rule out falsehoods in the applications; after all, you never know, there are so many ways to fake a work history or a cover letter, or to cheat in academia. The AI hiring tools can verify through gleaned social media data and information available on the web, or through networked employment data. After all, employees have agreed to share this as part of the conditions of employment, and other employers have agreed as a condition of using the tool. If that is not enough, AI administered and processed assessments can be integrated. The tools are going beyond the practical and qualitatively determinable capacity of candidates to finding the best fit culturally, to make sure that the chosen candidates don’t cause friction but integrate comfortably. The tools will even analyze data from interviews to gauge the socio-emotional fit of candidates. If that’s not satisfactory, the employer can tweak the system to add factors like a favored university or an ideal persona, and pick an ideal employee as a model, and the systems get better and more sophisticated at finding a match. The same system can then guide promotion and termination, ensuring consistency of employment policies.

So what’s wrong with this? Science, math, statistical reasoning, efficiency, accuracy, consistency, better and more accurate screening for the best fit of the scientifically determined optimal employee, accurate replication and scaling of a winning formula: it is a very seductive opportunity. What could be wrong? For the employing organization, we have a monoculture recreating and reinforcing the successful patterns of the past. With more data and more powerful analysis the intended target becomes more and more precise. The employer finds more and more perfect fits. What’s wrong with that? For the organization, what happens when the context changes, when the unexpected happens? A monoculture doesn’t offer much adaptation, flexibility, or alternative choices.

As a visual description, I have an image showing what happened to cloned potatoes in a blight that was survived by a diverse crop. Of course, we have diversity, equity and inclusion measures to compensate for discriminatory hiring and increase the number of employees from protected, underrepresented groups. Even there, there will be an even greater rift between the monoculture and the candidates hired through diversity and equity programs. What happens to the candidate with a disability who would otherwise be a great fit for doing the job when judged by these hiring systems? When AI is analyzing, sorting, and filtering data about a large group of people, what does disability look like? Where is disability in a complex, entangled, adaptive, multi-varying dataset? Self-identification is often disallowed and many people don’t self-identify; even if we had a way to identify it, the definition and boundaries of disability are highly contested. Disability statisticians are acutely aware of some of the challenges. In any normal distribution, someone with a disability is an outlier; the only common data characteristic of disability is difference from the average, the norm. People with Disabilities are also more diverse from each other than people without disabilities. Data points in the middle are close together, meaning that they are more alike; data points at the periphery are further apart, meaning that they’re more different from each other. Data regarding people living with disabilities are spread the furthest in what I call the starburst of human needs. As a result of this pattern, any statistically determined prediction is highly accurate for people that cluster in the middle, inaccurate moving away from the middle, and wrong as you get to the edge of a data plot.

Here I’m not talking about AI’s ability to recognize and translate things that are average or typical, like typical speech or text, or from one typical language to another, or to label typical objects in the environment, or to find the path that most people are taking from one place to another. Even there, in these miraculous tools we’re using, if you have a disability, if the speech is not average, if the environment you’re in is not typical, AI also fails. Disability is the Achilles’ heel of AI: applying statistical reasoning to disability, you have the combination of diversity, variability, the unexpected, complexity and entanglement, and the exception to every rule or determination. AI systems are used to find applicants that match predetermined optima with large datasets of successful employees and hires. The system is optimizing the successful patterns of the past; all data is from the past. The analytical power tool is honing in on and polishing the factors that worked before, and we know how much hiring of people with disabilities there was in the past. The tool is built to be biased against different disabilities, different ways of doing the job, different digital traces, different work and education history, different social media topics, and entangled profiles of many differences.

As AI gets better or more accurate in its identification of the optima, AI gets more discriminatory and better at eliminating applicants that don’t match the optima in some way. The assumptions the AI power tools are built on are that scaling and replicating past success will bring about future success, and that optimizing data characteristics associated with past successes increases future successes. The data characteristics that determine success need not be specified or known to the operators of the AI or the people who are subject to the decisions, and the AI cannot, at the moment, articulate the highly diffuse, possibly maladaptive reasons behind the choices. Current AI systems cannot really explain themselves or their choices, despite the emergence of explainable AI. How many of you have experienced tools from Microsoft and other similar companies that purport to help you be more efficient and productive by analyzing your work habits? These surveillance systems provide more and more granular data about employment, providing intelligence about the details of the average optimal employee. The result of the AI design is that the optima will not be a person with a disability. There are not enough successfully employed Persons with Disabilities, but it is more than data gaps; even if we had full representation of data from Persons with Disabilities there would not be enough consistent data regarding success to reach probability thresholds. Even if all data gaps are filled, each pattern will still be an outlier or minority, and will lack probabilistic power in the algorithm. The same pattern is happening in all life-altering difficult decisions. AI is being applied and offered to competitive academic admissions departments, so you won’t get admitted; to beleaguered health providers in the form of medical calculators and emergency triage tools, resulting in more death and illness if you’re different from your classification; to policing; to parole boards; to immigration and refugee adjudications; to tax auditors, meaning more taxpayers with disabilities are flagged; to loan and mortgage officers, meaning people with unusual asset patterns won’t get credit; and to security departments, meaning outliers become collateral damage.

At a community level we have evidence-based investment by governments, AI guiding political platforms, public health decisions, urban planning, emergency preparedness, and security programs. None will decide in favor of the marginalized outlier; the outliers will be marked as security risks. These are monumental, life-changing decisions, but even the smaller, seemingly inconsequential decisions can harm by a million cuts: what gets covered by the news, what products make it to the market, the recommended route provided by GPS, the priority given to supply chain processes, what design features make it to the market.

Statistical reasoning that’s inherently biased against difference from the average is not only used to apply the metrics, but to determine the optimum metrics. This harm predates AI. Statistical reasoning as the means of making decisions does harm. It does harm to anyone not like the statistical average or the statistically determined optima. Assuming that what we know about the majority applies to the minority does harm. Equating truth and valid evidence with singular statistically determined findings or majority truth does harm. AI amplifies, accelerates and automates this harm. It is used to exonerate us of responsibility for this harm.

We have even heard a great deal about the concern for privacy. Well, people with disabilities are most vulnerable to data abuse and misuse. De-identification does not work: if you’re highly unique, you will be re-identified. Differential privacy will remove the helpful data specifics that you need to make the AI work for you and your unique needs. Most People with Disabilities are actually forced to barter their privacy for essential services. We need to go beyond privacy, assume there will be breaches, and create systems to prevent data abuse and misuse. We need to ensure transparency regarding how data is used, by whom, and for what purpose. It is wonderful that the E.U. is organizing this talk, because the E.U. is taking some wonderful measures in this regard.

Wait, we’re talking about a great number of harms. Haven’t we developed some approaches, some solutions to this? Don’t we have auditing tools that detect and eliminate bias and discrimination of AI? Don’t we have some systems that certify whether an AI is ethical or not? Can’t we test tools for unwanted bias?

Unfortunately, AI auditing tools are misleading in that they don’t detect bias against outliers and small minorities, or anyone who doesn’t fit the bounded groupings. Most AI ethics auditing systems use cluster analysis, comparing the performance regarding a bounded justice-deserving group with the performance for the general population. There is no bounded cluster for disability; disability means a diffuse, highly diverse set of differences. Those AI ethics certification systems, and the industry that is growing around them, raise the expectation of ethical conduct, that the problem has been fixed, making it even more difficult for the individual to address harm. Many fall prey to cobra effects, the unintended consequences of over-simplistic solutions to complex problems, or to linear thinking, falling into the rut of monocausality where the causes are complex and entangled.

There is some helpful progress in regulatory guidance. One example is the U.S. Equal Employment Opportunity Commission, which has developed The Americans with Disabilities Act and the Use of Software, Algorithms, and Artificial Intelligence to Assess Job Applicants and Employees; it is a very long title. Much of the guidance focuses on fair assessments or tests and accommodation, not on the filtering out of applicants before they’re invited to take an assessment, or by employers who don’t use assessments. The data-related suggestion is to remove the disability-related data that is the basis of disability discrimination. What we found is that the data cannot be isolated. For example, an interrupted work history will have other data effects and markers, making it hard to match the optimal pattern even when that is removed.

For the ethical harms that are common to whole groups of marginalized individuals, there are numerous AI ethics efforts emerging globally. We have tried to capture the disability-relevant ones in the We Count project. This includes standards bodies creating a number of standards that act as guidance, government initiatives that are looking at the impact of decisions made using automated decision tools, academic research units that are looking at the effects and processes, and others. We have found that disability is often left out of the considerations or the ethics approaches. As the questions that were submitted indicated, we’re at an inflection point, and this current inflection point reminds me of the book The Axemaker’s Gift, by Burke and Ornstein. They wanted us to be aware of the Axemaker’s gifts: each time there was an offering of a new way to cut and control the world, to make us rich or safe or invincible or more knowledgeable, we accepted the gift and used it, and we changed the world, we changed our minds. Each gift redefined the way we thought, the values by which we lived, and the truths for which we died.

But to regain my optimism, even AI’s potential harm may be a double-edged sword. The most significant gift of AI is that it manifests the harms that have been dismissed as unscientific concerns. It gives us an opportunity to step back and reconsider what we want to automate or accelerate. It makes us consider what we mean by best, by optimal, by truth, democracy, planning, efficiency, fairness, progress, and the common good.

Some of the things we have done within my unit to provoke this rethinking include our inverted word cloud, a tiny little mechanism. A conventional word cloud increases the size and centrality of the most popular or statistically frequent words, while the less popular, outlying words decrease in size and disappear. We have simply inverted that behavior: the novel, the unique words go to the center and grow in size. We have been trying to provoke and to indicate with models like the lawnmower of justice, where we take the top off of the Gaussian, or the bell curve as it may be called, to remove the privilege of being the same as the majority; the model then needs to pay greater attention to the breadth of data. We’re exploring bottom-up, community-led data ecosystems where the members govern and share in the value of the data. This fills the gap left by things like impact investing, where social entrepreneurship efforts that are supposedly addressing these problems can’t scale a single impactful formula sufficiently to garner support. It also works well to grow knowledge of things like rare illnesses that won’t garner a market for treatments and therefore are not invested in.
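
To make the inversion concrete, a toy version of the idea might weight each word by how rare it is rather than how frequent, so novel words would be drawn larger and more centrally. The sketch below is purely illustrative and is not the IDRC implementation; the weighting formula is an assumption.

```python
# Illustrative sketch of the "inverted word cloud" idea: instead of sizing
# words by how frequent they are, give the rare, novel words the most weight.
from collections import Counter
import re

def word_weights(text: str, inverted: bool = True) -> dict[str, float]:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if inverted:
        # Rarer words get larger weights (drawn bigger and more central).
        return {w: 1.0 - (c / total) for w, c in counts.items()}
    # Conventional cloud: frequent words get larger weights.
    return {w: c / total for w, c in counts.items()}

if __name__ == "__main__":
    sample = "access access access design design outlier"
    for word, weight in sorted(word_weights(sample).items(), key=lambda kv: -kv[1]):
        print(f"{word}: {weight:.2f}")
```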

We’re creating tools to reduce harm by signaling when a model will be wrong or unreliable, because the evidence-based guidance is wrong for the person being decided about. Here we’re using a tool, the dataset nutrition label, that gives information about what data is used to train the model.
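
One simple way to picture such a signal is to check how far the person being decided about sits from anything in the model's training data, and warn when they are an outlier. The sketch below is a hypothetical illustration of that idea only; the distance measure and threshold are assumptions, not the actual tooling or the dataset nutrition label itself.

```python
# Toy sketch: flag when a model is likely to be unreliable for this person
# because their features are far from anything the model was trained on.
import numpy as np

def unreliability_signal(x: np.ndarray, training_data: np.ndarray,
                         threshold: float = 2.0) -> tuple[bool, float]:
    """Return (warn, distance): warn is True when x is far from all training rows."""
    distances = np.linalg.norm(training_data - x, axis=1)
    nearest = float(distances.min())
    return nearest > threshold, nearest

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, size=(500, 4))   # stands in for the training set
    typical = np.zeros(4)                          # near the statistical middle
    outlier = np.full(4, 5.0)                      # far from anything seen in training
    print(unreliability_signal(typical, train))    # (False, small distance)
    print(unreliability_signal(outlier, train))    # (True, large distance)
```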

Back to the Axemaker’s Gift and the opportunity to reconsider where we’re going: from a complexity perspective, we’re collectively stuck on a local optimum, unable to unlearn our fundamental assumptions and approaches to find the global optimum. I believe there is a global optimum. At the moment, as a society, we believe, or we act as if, to succeed we need to do what we have been doing more effectively, efficiently, accurately, consistently. We’re hill climbing, optimizing the patterns of the past, eroding the slope for anyone following us. We need to stop doing the same things more efficiently and potentially reverse course.

I have been considering the many local optima we keep hill climbing: not just statistical reasoning finding a single winning answer, not just winner-takes-all, zero-sum-game capitalism and growth at all costs, but majority rules and all-or-nothing decisions. And even in our community, the accessibility community, the notion of a single checklist of full accessibility for a group of hugely diverse people, many of whom are not represented when coming up with the list.

The people closest to the bottom are more diverse, closest to the path we need to follow to find the global optimum, and less invested in current conventions. We need to diversify and learn to use our complementary skills, and learn from people who are currently marginalized, even in this community focused on accessibility. If anyone knows, we know that it is at the margins, the outer edge of our human starburst, that we find the greatest innovation and the weak signals of crisis to come. This is where you feel the extremes of both the opportunities and the risks. One of the emerging uncertainties that holds both great opportunities and risks is generative AI.

What are the implications if you have a disability? What will it do for accessibility? I’m sure you have heard about tools like ChatGPT, Stable Diffusion, various versions of DALL-E, Midjourney and others; even today there are announcements of new tools. They don’t rely purely on statistical reasoning, they can transfer learning from context to context, and they use new architectures called transformers that can pivot to new applications. They can also create convincing, toxic lies. People with Disabilities tend to be most vulnerable to the misuse and abuse of toxic tools.

I’m going to invite Shari to help me address the emerging possibilities.

SHARI TREWIN: Hello, everybody. I’m Shari Trewin, from Google, a middle-aged white woman with a lot of smile lines on my face.

So there’s a lot to think about there! I wonder if we might start off where you ended, talking a little bit about generative AI models and language models. They’re trained on a large amount of data that may not reflect the moral values that we would like our models to incorporate. One question I think would be interesting for us to talk about is: can we teach these large language models or generative AI to apply these moral values, even though the very large datasets may not represent them?

JUTTA TREVIRANUS: That’s a great question. Thinking of how that might be done, one of the dilemmas is that we may need to find a way to quantify abstract qualitative values, and in that process, will that reduce these values? Deep learning lacks judgment, the human sort of value, the human judgment that isn’t quantitative. Perhaps one way to start is by recognizing human diversity and the diversity of context. There is a lot of talk about individualizing applications without making the cost exorbitant to the people that need them. The irony, of course, is that the people that need that type of individualization the most are most likely to be the people that can’t afford it. It is not yet known: can we do that? Of course, there have been surprising advances in all sorts of different areas with respect to AI and generative AI, but this issue of values and shared values, and articulating and making “mechanizable”, because, of course, we’re talking about a machine, values that we have difficulty even fully expressing, is quite a challenge. What do you think, Shari?

SHARI TREWIN: It is a good point: can we express, or can we measure, whether a model meets our values, or whether we think it is as free from bias as we can make it? Do we know how to evaluate that? I think that is an important question. Some of the steps that often get missed when creating a system that uses AI may help with that. It would be starting off from the beginning by thinking about who are the people who may be at risk, what are the issues that might be in the data, what historical biases that data may represent or include, and then actively working with members of those communities to understand how we are going to measure fairness here. How are we going to measure bias, what’s our goal, how will we test, how will we know when we have achieved our goal? I think there is some progress that could be made in the design process and in thinking about the larger system that we’re embedding AI in. Everything doesn’t have to be built into the one AI model; we can augment models, build systems around models, take into account their limitations, and create a better overall whole system.

JUTTA TREVIRANUS: Thinking about what the models are currently trained on and the masses of data used to build the models, the training data is rife with discrimination against difference, right? How do we, how do they, unlearn? It matches some of the training that I do within my program, in that students have been socialized with very similar things, and often the issue is not learning, the issue is unlearning. How do you remove those unconscious, habituated values that are so embedded in our learning systems? I agree, it is a huge opportunity, especially with more context-aware systems. Maybe what we need to pursue, even to address things like privacy and the need to swim against this massive amount of data that’s not applicable to you, is an on-device, individualized system that takes your data and creates a bottom-up picture of what’s needed. I say individualized rather than personalized, because personalized is a term that’s also sort of been hijacked to mean cushioning.

SHARI TREWIN: There are definitely interesting avenues to explore with transfer learning, to take a model that’s been trained on data and has learned some of the concepts of the task that we want, but maybe we would like it to unlearn some of the things that it has learned. Can we use techniques like transfer learning to layer on top and unteach the model, and direct the model more in the direction that we want? I think the hopeful thing about that is it needs magnitudes less data to train such a model. That makes it a little more achievable, a little less daunting for the community to take on.
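
As a rough illustration of that transfer-learning pattern, the sketch below freezes most of a model pretrained on large data and retrains only a small new head on a much smaller dataset. The choice of ResNet-18, the label count, and the fake batch are placeholder assumptions for illustration, not anything discussed in the session.

```python
# Minimal transfer-learning sketch: keep the pretrained features frozen and
# train only a small replacement head on a smaller, curated dataset.
import torch
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes: int) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False                                # freeze pretrained layers
    model.fc = nn.Linear(model.fc.in_features, num_classes)       # new, trainable head
    return model

model = build_finetune_model(num_classes=5)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a fake batch; a real loop would iterate
# over the small curated dataset for several epochs.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 5, (8,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```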

Do we think that current regulatory systems are up to the task of regulating current and emerging AI and preventing the kinds of harms you have been talking about?

JUTTA TREVIRANUS: No. (Laughter). No, for a simple answer, I don’t think so. There are so many issues. Laws and policies are developed at a much slower pace, and we’re dealing with an uncertain, very quickly moving, quickly adapting area. Laws, well, they need to be testable. In order to be testable, we have to create these static rules that can be tested, which means we have to be fairly specific as opposed to general and abstract. That tends to lead us towards one-size-fits-all criteria, which we know are not great if we’re trying to design for diversity or encourage diversity. I think one of the things we need to innovate is the regulatory instruments that we can use here. What’s your thinking about this?

SHARI TREWIN: Yeah. I think some of these regulatory instruments that we have do apply. If you’re a company that is using an AI system in screening job applicants, the disability discrimination laws still apply to you; somebody could still bring a lawsuit against you saying that your system discriminated against them, and you’re still liable to defend against that and to watch out for those kinds of issues. In some ways, there are important pieces in place that can be used to tackle problems introduced when AI systems are introduced. In other ways, there is a little more of a gray area when the technology is not making discriminatory decisions, but it still may make harmful mistakes or mislead the people who are relying on it. You know, if anybody here has a legal background, I would love to hear their take as well on how well the current consumer protections apply, for example, if you’re using any of these tools.

JUTTA TREVIRANUS: I have become aware of and worried about the people for whom the law isn’t adequate. The fact that we have a law, the fact that we supposedly have measures that prevent abuse or unethical practice, if you are still being treated unethically, makes it even harder for you. I think that the measures that we do have, the regulations that we do have, have to have some way of continuously being iterated upon so that we can catch the individuals that are not included. We have to recognize that our “supposed” solutions are actually not solutions, that this is never fixed, that it requires ongoing vigilance. Yeah. There is much more to say about that. Yes. It would be great to hear from anyone with a legal background.

SHARI TREWIN: Let’s talk a little bit more about generative AI; it was mentioned at the end there. It produces very convincing statements when asked a question, but also, very plausibly, it completely makes things up and isn’t always reliable. In fact, right now, it is not connected to any form of ground truth, or able to assess the accuracy of what it produces. One question that I think is interesting is: will this technology eventually reach a stage where it can support the kinds of decisions that we are using statistical reasoning for now? Obviously, right now, it is not there yet.

JUTTA TREVIRANUS: It is interesting because just recently there have been the announcements of the systems being used for medical guidance, using large language models to come up with answers to your medical questions which, of course, is quite… It will be interesting to see what happens.

SHARI TREWIN: Scary, I think.

JUTTA TREVIRANUS: Exactly, scary. And what about the medical advice given to someone for whom, within the dataset that’s provided, there isn’t a lot of advice? If you ask any of the LLMs, the chatbots, how confident they are in their answers, they’ll answer that they are confident, because there isn’t a sense of what the risk level, or the confidence level, of this particular response is; there is no self-awareness of what’s wrong, what is right, what is the context in front of me.

SHARI TREWIN: That’s a great opportunity there, to explore whether we can enable models to know better what they don’t know; to know when the case that they’re dealing with right now is not well represented in their models, or may be an outlier case that they should perhaps pass on to some other form of decision making, or at least convey less confidence about. You know, I think generative AI today gives us a glimpse of the future, the kind of interactions that are possible, the kind of ways we might interact with technology in the future. Clearly, there is a research priority to ground it better in truth, and it needs to be much more reliable, much more trustworthy, much more accurate. Today it can’t support those applications, and the idea of using it to get medical advice is just, that’s a very scary thing. Because it is so eloquent that it is immediately trustworthy, and it gets enough things right that we begin to trust it very quickly. In some ways, the advances that have been made are so good that they really highlight the dangers more effectively.

I think it is interesting to think about what human-AI interaction would look like in the future. Would we need to train it to identify bias and kind of work with a large language model to adapt responses? Think about how automatic image description has sort of evolved. At first, it would throw out words that might be in the picture; sometimes it was right, sometimes it was wrong. Now you see these generated alternative texts being phrased in a way that conveys the uncertainty: “Could be a tree”, or something like that. I think the large language models could do something similar to reduce the chances of misleading people. They might say things like “many people seem to think blah, blah, blah”, or get better at citing sources. I think there are a lot of ways that we can direct research to overcome some of the obvious failings that are there right now, and other limitations that we currently have.
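
The kind of hedged phrasing Shari describes ("Could be a tree") needs very little machinery once a caption and a confidence score are available. The sketch below is a toy illustration; the thresholds and wording are assumptions rather than any particular product's behavior.

```python
# Toy sketch of phrasing automatic alt text to convey uncertainty. The caption
# model and its confidence score are assumed to come from elsewhere; only the
# hedging logic is shown here.
def hedged_alt_text(caption: str, confidence: float) -> str:
    if confidence >= 0.9:
        return caption                                        # confident: state it plainly
    if confidence >= 0.6:
        return f"Could be {caption}"                          # moderate: hedge the phrasing
    return f"Image; may show {caption} (low confidence)"      # weak: flag it clearly

print(hedged_alt_text("a tree beside a house", 0.95))
print(hedged_alt_text("a tree beside a house", 0.70))
print(hedged_alt_text("a tree beside a house", 0.30))
```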

Mark has shared in the chat that I can see that, from the U.S. government regulatory side, much of the current laws or regulations relate to access to government services; they’re about the technical accessibility of the interfaces rather than the more AI-focused questions around system exclusion or mismatch. That comes back to our point about the regulatory instruments.

JUTTA TREVIRANUS: I just noticed that Mark says what a Debbie Downer my talk is. I think, by design, we decided between Shari and me that I would provide the warnings and Shari would provide the optimism.

SHARI TREWIN: I have the best job there.

JUTTA TREVIRANUS: I think there are quite a few questions in the question and answer panel. Maybe what we should do, there are so many things to explore with the emerging models and so many uncertainties, there are some great questions there as well.

SHARI TREWIN: Yeah. How about… they’re jumping around on me. New questions. I know this is not in the right order, but, as people are adding questions, they’re kind of jumping. (Chuckle).

So, Bruce Bailey is asking, he says fantastic keynote, please expound on personalization having been hijacked to mean cushioning. I can guess, but that term and perspective is new to me.

JUTTA TREVIRANUS: I can talk about that. A way that we recognize that we’re all diverse, and especially if you have a disability, that you are diverse from other People with Disabilities and that our needs are therefore diverse, is to look at how we personalize. Personalization has been used as a term for using recommender engines, various ways in which we’re offered only information and recommendations from people like us, which, of course, removes any dissonance, any diverse thinking, and any exposure to alternative views and perspectives. To some extent, it causes greater polarization, because we’re also offered a personalized view of the current stance that we’re taking, so that it gets confirmed again and again and again. I’m not talking about that type of personalization. I’m talking about the type of personalization where the interface makes it easier for us to participate and addresses our specific, very diverse requirements with respect to that participation. I moved away from the term personalization simply because I don’t want it to be mistaken for the type of personalization that cushions us away from diverse perspectives, because certainly we need to be exposed to that diversity of perspectives, and we need to consider the diverse stories that people have.

SHARI TREWIN: I think personalization is an essential part of accessibility in general, but you were talking about a particular kind of personalization. I’ll talk a bit more in the keynote at the end about an example of AI personalization, of personalized models that are permitting access to digital content, which I think is an area where personalization is currently underused.

Yeah, so Kave Noori from EDF says: thank you for this important keynote. I have seen different toolkits to test and mitigate bias in AI. What is your view on them and their usefulness?

JUTTA TREVIRANUS: We have been, actually, as part of a number of our projects, including ODD (Optimizing Diversity with Disability) and We Count, looking at a variety of AI ethics auditing tools, and we have also done a sort of secret-shopper testing of employment tools, seeing if we can detect the particular biases that come up, the unwanted biases; as we made clear, the tools are intended to be biased, so it is the unwanted bias that is the proviso. What we find is that they’re great at cluster analysis, and then they supplement the cluster analysis with a number of questions asked of the implementer of the system. The primary technical key to the tools is determining whether there is unfair treatment of one bounded group compared with another. That works well if you’re determining whether there is discrimination regarding gender, or discrimination regarding declared race or language, those sorts of things, which do cluster well. But none of the tools really detect whether there is discrimination based upon disability. Because the particular discriminating characteristics are so diffuse and different from person to person, we don’t see how it is possible, from a litigation perspective or a regulatory perspective, to prove that you have been discriminated against. It is going to be very, very difficult to come up with that proof, because the particular characteristics are themselves so entangled and diffuse. It may not be one particular characteristic associated with your disability that you would use to say, well, look here, I’m being discriminated against because of this characteristic that relates to my disability.

SHARI TREWIN: I think a lot of the metrics in the toolkits are group fairness metrics, like you say, and that’s an important thing to measure and to look at when we do have the ability to identify groups and to know for sure who’s in which group. The boundaries of the groups are not always clear; there’s a deeply embedded assumption that there are only two genders, for example, in the data and in many of the tools, and they have their problems, and disability emphasizes these same problems. There are also individual fairness metrics, and some of the toolkits include some of these kinds of measures. Instead of asking, is this group as a whole treated equivalently to this other group, they ask, are similar individuals treated similarly? You could imagine, with an approach like that, if I as an individual with my unique data wanted to make a case that I was discriminated against, I could create another person who was similar to me in the respects that are important for this job, see what kind of result they got compared to my result, and that would be a way to measure individual fairness and build up a case.
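
A minimal sketch of that similar-individuals comparison might look like the following, using a synthetic stand-in for a screening model. The features, the trained model, and the "interrupted work history" marker are all illustrative assumptions, not an audited system or a specific toolkit's metric.

```python
# Toy individual-fairness check: compare predicted outcomes for two people who
# should be treated alike on the job-relevant features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a stand-in screening model on synthetic data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))              # e.g. experience, skills score, gap in work history
y = (X[:, 0] + X[:, 1] - 0.8 * X[:, 2] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def individual_fairness_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Difference in predicted success for two individuals who should be treated alike."""
    return abs(model.predict_proba([a])[0, 1] - model.predict_proba([b])[0, 1])

applicant   = np.array([5.0, 2.0, 3.0])    # has an interrupted work history (third feature)
counterpart = np.array([5.0, 2.0, 0.0])    # similar in job-relevant respects, no gap
print(f"fairness gap: {individual_fairness_gap(applicant, counterpart):.2f}")
```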

JUTTA TREVIRANUS: Yeah. Yes. Unfortunately, there are not that many tools that currently do that, and the certification systems that currently exist are not implementing those. There is much to work on there.

SHARI TREWIN: Yeah. It is more of a case by case basis for this particular job. It is not so easy to make a blanket statement about it, but I think it is not impossible to assess. Do we have time for one more? How much longer do we have? Another 3 minutes?

CARLOS DUARTE: Well, you have almost 10 minutes more. You can definitely take one more.

SHARI TREWIN: Awesome. Great. Let’s see. So, Fabien Berger says: I feel that AI, but before it was KPIs or other measures, is sought out by managers to justify their decisions or to run away from the responsibility of their decisions. It fulfills a need for them, but with a wrong, incomplete answer. Do you agree?

JUTTA TREVIRANUS: Yes. I think the issue, and I was trying to make that point but possibly not well enough, is that AI is doing much of what we have done before, but it is amplifying, accelerating, and automating those things. Certainly, AI can be used for confirmation bias, to find the specific justification for whatever it is that we need to justify, whether it is something good or something bad. A lot of the harms of AI already existed, because of course AI is learning from our past practices and our data. I have often used the analogy of a power tool: before, it was a practice that we did manually, so there was an opportunity to make exceptions, to reconsider, you know, is this actually what we want to do, to do something different. But with the power tool, it becomes this much more impactful thing, and there is less opportunity to craft the approach that we take.

SHARI TREWIN: I think that’s why it is really important to try to design for outliers and to consider outliers. Again, I come back to this point of a system, the system as a whole, that includes AI. If we can’t guarantee that the AI itself is going to give us the characteristics we want, then we have to design around that and be mindful of that while we’re designing. There is also, of course, the opportunity to try to clean up our data in general: in situations where we can identify problems or imbalances in the data, we should certainly tackle them. That’s one other step, and I think there are many steps to fairness and to the ethical application of AI, and no one step is a magic solution to all of them. But if we stay aware of the risks and make sure that we’re talking to the right people and involving them, then I think we can at least mitigate problems and better know the limits of the technologies that we’re using.

JUTTA TREVIRANUS: I have been looking at some of the ethical questions that have come in. One of the discussions was about the Gaussian curve, or the Gaussian center. One point that I may not have made as clearly is that, in fact, there is a myth that we need to have a single answer at the very middle of the Gaussian curve, which, of course, matches our notion of majority rules as the way to decide amongst difficult decisions. An alternative to that is to address the very, very diverse edges initially and to prioritize those. Because what then happens is it gives us room to change, it helps us to address uncertainty, and it makes the whole design, or decision, or set of available options much more generous and, therefore, prepares us better for the vulnerabilities that we’re going to experience in the future. Of course, I’m an academic, and to say that statistical reasoning, evidence through scientific methods, is at fault is a fairly dangerous thing to say, especially during a time when truth is so much under attack. But I think what we need to do is not reduce truth to statistical reasoning, but to acknowledge that there are a variety of perspectives on truth and that we need to come up with one that addresses the people that we’re currently excluding in our notions of truth.

SHARI TREWIN: There are two minutes left, I think, now. Maybe we can squeeze in one more question here. Jan Beniamin Kwiek asks: do you think that AI, and the big companies driving research on it, can be problematic for societal issues that don’t necessarily give the highest revenue? If so, how can it be fixed?

JUTTA TREVIRANUS: Yeah. That’s a huge question. Government efforts are basing their decision making on profit, economic progress, and impact measures. I think one of the things we need to abandon is this idea that a solution needs to be formulated and then scaled by formulaic replication. We need to recognize that there is a different form of scaling, scaling by diversification, and that we need to apply things contextually. That’s one of the lessons of indigenous cultures; what gets labeled as colonialist is what many governments are in fact still implementing, even in things like social entrepreneurship. Yes, big companies, of course, are driven by profit. Is that the best approach to achieve the common good? That’s a huge question.

SHARI TREWIN: It is a huge question. It would be a great one to come back to tomorrow in the symposium. Let’s come back to that one. I see we’re out of time right now. Thank you very much, Jutta.

JUTTA TREVIRANUS: Thank you. We’ll have the positive tomorrow! (Laughter).

CARLOS DUARTE: Thank you so much, Jutta and Shari, a great keynote and a very interesting follow-up, a great discussion between you both. Also, there are still some open questions in the Q&A; if you feel like tackling them offline, feel free to do so.

Panel: Computer Vision for Media Accessibility

This session began with panelists addressing the quality of automated image description, specifically focusing on how to define quality and train AI models to identify aspects like identity, emotion, and appearance in personal images. Different viewpoints were shared, including the recognition of emotions and specific characteristics by current systems, the importance of considering the context and user preferences, and the need for diverse training data. Responsibility and agency were also discussed, highlighting the roles of content creators and users in generating and consuming media descriptions. The impact of AI tools on user agency, the challenge of maintaining diversity in automated descriptions, and the role of the Web Accessibility Initiative (WAI) were examined. The legal and ethical issues related to AI-generated descriptions, including copyright, liability, and fair use, were explored. The potential uses of AI beyond generating alternative descriptions were considered, such as identifying functional and complex images and allowing authors to focus on those that require more attention. The challenges of explainable AI and the potential for improving augmented content were addressed, emphasizing the importance of ethics, transparency, and user understanding. Finally, the panelists discussed the value of richer alternative descriptions, the risks of errors with more detailed descriptions, and the need for a balance between concise and explanatory information.

Transcript of Computer Vision for Media Accessibility

CARLOS DUARTE: Let’s move on to our first panel. The topic for this panel will be computer vision for media accessibility. Here we aim to foster a discussion on the current state of computer vision techniques, focusing on image recognition, identification, and recognition of elements and text in web images and media, and considering all of the different usage scenarios that emerge on the web. We’ll be looking at how we can improve quality, how we define quality, the quality and accuracy of current computer vision techniques, and what the opportunities and future directions are in this domain.

We’ll be joined by three panelists for this first panel: Amy Pavel, from the University of Texas; Shivam Singh, from mavQ; and Michael Cooper, from the W3C. Great. Everyone is online, sharing their videos. Thank you all for agreeing to join. I will ask you, before your first intervention, to give a brief introduction of yourself to let people know who you are and what you’re doing.

I would like to start on one of the issues with quality: how do we define quality here? I was looking at aspects such as how we can train AI models that are able to identify aspects in an image, such as identity, emotion, and appearance, which are particularly relevant for personal images. How can we get AI to do what we humans can do? I’ll start with you, Amy.

AMY PAVEL: Excellent. Thank you so much. My name is Amy Pavel. I’m an assistant professor at UT Austin in the computer science department. I’m super excited to be here because a big part of my research is exploring how to create better descriptions for online media. I have worked on everything from social media, like describing images on Twitter, to new forms of online media like GIFs and memes, and I have also worked on video, from educational videos, making the descriptions for lectures better, to entertainment videos, improving the accessibility of user-generated YouTube videos, for instance.

I think this question you bring up is really important, and I typically think about it in two ways: what does the computer understand about an image, and then how do we express what the computer understands about an image or other form of media. I think that we’re getting better and better at having computers that can understand more of the underlying image. For instance, if we think about something like emotion, we have gotten a lot better at determining exact landmarks on the face and how they move, or we may be able to describe something specific about a person. If you look at me in this image, I have brown hair tied back into a bun and a black turtleneck on. This is the type of thing we might be able to understand using automated systems.

However, the second question is how do we describe what we know about an image. If I gave you all of the information about my facial landmarks and what I’m wearing for every context, that may not be super useful. So a lot of what I think about is how we can best describe, or what people may want to know about, an image given its context and the background of the user. Just briefly on that point, I usually think about who is viewing this image and what they might want to get out of it. Also, who is creating it? What did they intend to communicate? These two questions give us interesting ideas about what data we could use to train systems to create better descriptions based on the context. For example, we might use descriptions that are actually given by people to describe their own images, or their identities, or aspects that they have shown in videos in the past. On the other hand, we may use a bunch of different methods and improve our ability to select a method based on the context of the image. For instance, when I worked on Twitter images, we would run things like captioning to describe the image, so an image of a note may just say “note”. We also ran OCR to automatically extract the text, and tried to pick the best strategy to give people what we thought might be the best amount of information given the image. That’s my initial answer; I’m sure more aspects of this will come up as we have a conversation. I just wanted to give that as the first part of my answer. Yes.
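
A much-simplified sketch of that strategy selection is shown below: run both a captioning model and OCR, then decide what to surface based on how much text the image contains. The run_captioning and run_ocr callables are hypothetical stand-ins supplied by the caller, not a specific library's API, and the word-count threshold is an assumption.

```python
# Toy strategy selection for describing an image: prefer extracted text when
# the image is text-heavy, otherwise fall back to the caption.
def describe_image(image, run_captioning, run_ocr) -> str:
    """Combine a caption and OCR output, surfacing whichever carries more content."""
    caption = run_captioning(image)
    text = run_ocr(image).strip()
    if len(text.split()) >= 10:
        # Text-heavy image (a note, a screenshot): the extracted text is the content.
        return f"{caption}. Text in image: {text}"
    if text:
        return f"{caption}, containing the text: {text}"
    return caption

# Usage with trivial stand-in models, just to show the control flow:
print(describe_image(
    "note.png",
    run_captioning=lambda img: "a handwritten note",
    run_ocr=lambda img: "Meeting moved to 3pm, room 204, bring the draft agenda please everyone",
))
```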

CARLOS DUARTE: Thank you so much. Shivam, you want to go next?

SHIVAM SINGH: Sure. Yeah. Hi, everyone. I’m Shivam Singh. I lead the document-based products at mavQ, India. It is a pleasure to be here with all of you. The question here is how we should train models dedicated to identifying aspects like identity, emotion, and personal appearance. That is a two-part answer.

I’m from more of a technical background, so I will go into a bit of technical detail here. Preparing diverse data, that’s the first point. Most training data comes from publicly available sources. We can carefully plan and prepare the data before creating our models to include weights for the peripheral data of the surrounding environment: in an image, there can be a subject, and there can be a lot of peripheral data. If we choose an algorithm that takes care of that peripheral data as well, that will be helpful in getting a better output. For example, you have a subject gesturing, its relation with the environment, and the linking of emotion to its external manifestation in the subject’s area. This gives a more inclusive output; if you have a user, a person, it captures identity, emotion, and appearance better. And there should be a […] where we could have a diverse dataset, but it is not totally dependent on the availability of data.

The second part of it would be fine-tuning the model based on personal preferences. Let’s say you have a bigger model; you use that as a general model, and then you can fine-tune it with small-scale training on smaller datasets to get a better result. Now, this fine-tuning is kind of a human-in-the-loop feature, where every time you get the data you can expect some feedback on it and then produce a better output. That’s something that involves some human intervention there. Yeah. That’s how I see how we can train models.
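
A very small sketch of that human-in-the-loop idea: generated captions are reviewed, corrections are collected into a small dataset, and that dataset feeds the next round of fine-tuning. The review function and the retraining trigger are illustrative assumptions, not mavQ's pipeline.

```python
# Toy human-in-the-loop loop: collect human corrections to generated captions
# and fold them into a small dataset used for the next fine-tuning round.
corrections = []   # grows into the small, curated fine-tuning set

def review_caption(image_id: str, generated: str, human_caption: str | None) -> None:
    """Record a human correction when the generated caption was not good enough."""
    if human_caption and human_caption != generated:
        corrections.append({"image": image_id, "caption": human_caption})

review_caption("img-001", "a note", "a handwritten shopping list on lined paper")
review_caption("img-002", "a dog on grass", None)   # reviewer accepted the caption as-is

if len(corrections) >= 1:   # in practice, batch many corrections before retraining
    print(f"fine-tune the general model on {len(corrections)} corrected examples")
```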

CARLOS DUARTE: Great. Thank you, Shivam. Michael.

MICHAEL COOPER: Hey. So my name is Michael Cooper, and I work with the Web Accessibility Initiative. I’m speaking specifically from my role there; I’m not a machine learning professional, so I’m not speaking about technology so much as some accessibility considerations that I’m aware of. In terms of improving the quality of descriptions, the other two speakers spoke about how we do it technically; I think we may be able to give advice on some of what needs to be done. For instance, machine learning output should be able to conform to the Media Accessibility User Requirements and the cognitive accessibility guidance, as sources of information about what will be useful to users.

I’m also thinking of machine learning more broadly, in terms of what tools might be used in different circumstances, and in particular in the context of potential assistive technology. The question for accessibility there is not just what is the description of this image, but what is the image description in this page for me, for the purpose I’m seeking. You know, tools can get context from HTML semantics and accessibility semantics like ARIA, and, as adaptive technology, they can also generate their own context from machine learning algorithms. I think there is going to be a need for a way to communicate user preferences to machine learning, whether that is added to the semantics or something else.

Let’s see, just a couple of closing notes on that. Users need to be involved in the design and training process; that’s something that needs to be repeated, and we have to pay attention to it as we look to improve these tools. I would also note that while this session is mainly focused on images and media, virtual and augmented reality have a lot of the same problems and solutions that we should be looking at.

CARLOS DUARTE: Okay. Thank you for starting that discussion. One thing, I guess, that was mentioned by all of you in different ways is the role of the end user, and in fact both users were mentioned: the one that is viewing or requiring the image or its description, and also the one that is creating or sharing the image. For that one, there is the responsibility of generating a description, and, of course, we know most people don’t do that, so that’s why we also need these AI-based systems to take on that role. But this leads me to another aspect: if we have an AI-based system that’s capable of assisting both the content creator and the consumer, how does this impact the agency of end users? Will end users feel this is no longer their responsibility because there is a tool that can do it for them? Or, looking at this from the content producer perspective, if we see this tool as something that helps someone generate a description, would this producer just start relying on the output from the AI? And, thinking about what Jutta had introduced earlier today, she mentioned an organizational monoculture; can we also think about a description monoculture, in which all descriptions would start conveying the same kind of information? What are your perspectives on the impact this has on the agency of end users? I will start with you.

SHIVAM SINGH: Awesome. It is a bit of a question. Let’s say we’re talking about the quality of our output based on the user, the end user. The quality of this description depends on how end users consume it. For example, most models currently provide high-level and grammatically correct captions in English, but that would not be true for captions generated in the other native languages of users; there may not be enough of a dataset to train the model. Now, the premise of training restricts the diversity of generated captions and the range of things the model can comprehend and then caption, which includes diverse text, like an email, a date, or correctly explaining graphs, which has been a big problem until now. Once translation with AI is employed, how well it becomes an input is […]; for example, you can have two different models, one precise and one general. The general output of one model can become an input for a specialized model, and then you can refine it. This is how we’re now achieving it.

The other thing is that caption generation by AI consumes very large amounts of data to curate content, and in many cases of live caption generation, AI should take into account earlier events or earlier inputs as well. This is true for conversational bots, but it can also apply to a talk where you have live caption generation. So you have to put some context there and then generate the captions. Now, we have mature engines like GPT-3, but this is more complex than simple image to text generation; speed, and handling of the peripherals, are very much necessary. We’re looking forward to a better solution where end users are really satisfied with what they’re getting.

CARLOS DUARTE: Thank you. Michael, what about the perspective of end users, the agency of end users, from your point of view? I guess more from the Web Accessibility Initiative role: how can we guide technology creators to ensure that end users retain autonomy when creating this kind of content?

MICHAEL COOPER: Yeah. So, first I would, you know, look at the ways in which machine learning generated descriptions and captions increase user agency, and there are ways they can decrease it as well. For instance, although we would prefer that authors provide these features, if they don’t, providing them via machine learning will help the user access the page and give them the agency that they’re looking for in the task. Descriptions don’t have to be perfect to provide that agency. That said, it is frustrating when they’re not good enough; they can often mislead users and, you know, cause them to not get what they’re looking for, spend time, et cetera. That’s a way this can be a risk for users and, as you mentioned, there is likely to be a tendency for content developers to say machine descriptions are there, so we don’t need to worry about it. You know, I think those are simply considerations that we have to pay attention to in our advocacy, in the education work in the field, and also in documenting the best practices for machine learning. For instance, W3C has a publication called Ethical Principles for Web Machine Learning that addresses accessibility considerations among others, and it is possible that the industry might want a documented set of ethical principles or a code of conduct that industry organizations sign on to, saying here are accessibility ethics in machine learning in addition to other ethics that we’re paying attention to. Those could be ways that we can support the growth of user agency in the end. Yeah.

CARLOS DUARTE: Thank you for that perspective and for raising awareness of the information that the WAI group is making available. I think that’s really important for everyone else to know. Amy, what’s your take on this, on the impact that these tools can have on the agency of end users?

AMY PAVEL: Yeah. So I might answer this briefly from the content creator side. Say you are out to make a description: how could we use AI to improve the quality of descriptions and the efficiency, rather than sacrificing one for the other? I have worked on tools a lot in this space, so I’ll start with what hasn’t worked in the past and then share some possibilities on things that work a little bit better. One thing that I worked on for quite a while has been creating user generated descriptions of videos. Video descriptions currently appear mostly in highly produced TV and film, and they’re quite difficult to produce yourself because they’re sort of an art form; you have to fit the descriptions within the dialogue. They’re really hard to make. So one thing we worked on is tools to make it easier for people to create video descriptions using AI. What didn’t work was automatically generating these descriptions: the descriptions were often uninteresting, and they didn’t provide quite the depth that the original content creator had included in the visual information of the scene. If it is simple, a house, a tree, it may get it; if it was something domain specific or had something extra to it that you may want to share, it was completely missing. One thing we looked at was how to identify areas where people could add description, like silences, or how to identify things that were not described in the narration, places where the narration of the video is talking about something completely unrelated to the visual content, so people may be missing out on that visual content.

Rather than trying to automatically generate descriptions, I think one promising approach can be to identify places where people could put in descriptions, or, if they write a description, identify parts of the image that the description doesn’t cover yet. I think there are some cool opportunities to use AI in unexpected ways to help people create better descriptions.

I’ll briefly address the end user part. You know, if the user, the person using the captions or the descriptions, is lacking information, that can decrease their ability to have agency in responding to that information, right? And if you give them all of the information in one big piece of alt text, you may not give people much agency over what they’re hearing, and it probably doesn’t match the cognitive accessibility guidelines that Michael had mentioned.

I have experimented with some ways to try to help people get agency over their descriptions. One thing we have played with a little bit is basically alerting people to the fact that there is a mismatch between the audio and visuals; for instance, in listening to a lecture, hey, the lecturer hasn’t talked about this piece of text that’s on the slide, would you like to hear more about it? Then people can optionally hear a little bit more about it. That’s something where OCR, automatically detecting text, works quite well. You don’t want to overwhelm people with information when they’re doing a task that’s not related, but there are some cool opportunities, I think, to give people control over when they get more information. Yeah.
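To make this slide-text mismatch idea concrete, here is a rough Python sketch; the tokenizer, the overlap threshold, and the example slide text are illustrative assumptions rather than details of Amy’s system.

```python
# A toy sketch of the mismatch alert Amy describes: compare text detected on a
# slide (e.g. via OCR) with what the lecturer has actually said, and flag slide
# text that was never mentioned. Tokenization and the "mentioned" test are
# deliberately naive; a real system would use fuzzier matching.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unmentioned_slide_text(slide_lines: list, narration: str, min_overlap: float = 0.5) -> list:
    spoken = tokens(narration)
    flagged = []
    for line in slide_lines:
        line_tokens = tokens(line)
        overlap = len(line_tokens & spoken) / max(len(line_tokens), 1)
        if overlap < min_overlap:
            flagged.append(line)  # candidate to offer as optional extra description
    return flagged

slide = ["Gradient descent update rule", "Learning rate schedules"]
speech = "Today we will focus on the gradient descent update rule and why it converges."
print(unmentioned_slide_text(slide, speech))  # ['Learning rate schedules']
```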

CARLOS DUARTE: Thank you, Amy. Before moving to the next question I have here, there is a follow up question by Matt Campbell on what you just mentioned, Michael. You mentioned descriptions not being good enough as a risk for user agency; what Matt is asking is how much this can be mitigated by just tagging the descriptions as automatically generated. Can you give a perspective on this? Also you, Amy, if you want to follow Michael.

MICHAEL COOPER: Yeah. I’ll try to give a quick answer. The ARIA technology, Accessible Rich Internet Applications, enhances HTML with the ability to point to a description elsewhere in the HTML document rather than providing a simple alt text, and that gives you that rich capability now. In terms of identifying that it is a machine generated description, we don’t have a semantic for that, but that’s the sort of thing that could get added to ARIA if the use case were emerging.
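As an illustration of the richer alternatives ARIA already enables, the following Python sketch (using the BeautifulSoup library and made-up markup) checks whether an image has an author-provided alternative before a tool would fall back to a machine-generated one; it is a sketch of the idea, not W3C guidance.

```python
# A minimal sketch of how a page-processing tool might check for rich,
# author-provided alternatives (alt or aria-describedby) before falling back
# to a machine-generated description.
from bs4 import BeautifulSoup

html = """
<img src="chart.png" alt="Quarterly sales chart" aria-describedby="chart-desc">
<div id="chart-desc">Sales grew from 1.2M in Q1 to 1.8M in Q4, with a dip in Q3.</div>
<img src="team.jpg">
"""

soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):
    alt = img.get("alt")
    described_by = img.get("aria-describedby")
    rich_description = soup.find(id=described_by) if described_by else None
    if rich_description is not None:
        print(img["src"], "-> rich description:", rich_description.get_text(strip=True))
    elif alt:
        print(img["src"], "-> short alt text:", alt)
    else:
        # Candidate for a machine-generated description; as Michael notes,
        # there is currently no ARIA semantic to flag it as machine-generated.
        print(img["src"], "-> no alternative found, flag for (machine) description")
```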

AMY PAVEL: Yeah. I’m happy to also answer this question; maybe I was looking at Matt’s other question, which is kind of related, I think: are there other alternatives that are richer than alt text alone? One thing we’ve looked at a little bit, I worked a little bit on the accessibility of complex scientific images. What you end up with is complex multipart diagrams that, if you try to describe them in one single alt text field, it performs quite badly. We’re starting to see: could we automatically break that big piece of alt text down into a hierarchy that matches the image, so that people can more flexibly explore it, basically an HTML version that captures the structure of the image? Kind of thinking about other ways to present all of the information that currently gets relegated sometimes to a single alt text into something that’s a little bit more rich.

SHIVAM SINGH: Carlos, you’re on mute.

CARLOS DUARTE: Thanks. What I was saying is, since we have always been coming around to the concept of quality, there is also one question by Mark, Mark Urban, I think, and it would be interesting to know your take on this. Is there a documented metric that measures the quality of an image description, and if there is, what would be the most important priorities for defining quality? Amy, you want to go first?

AMY PAVEL: This is a hard question for me. I think that the answer is no. It is a really good question and something that we constantly sort of battle with. In our work we have used a four point scale: literally nothing; there is something in the description field but it is in no way related; there is something related to the image but it is missing some key points; and this covers most of the key points in the image. We have been using this, and what the values mean depends a lot on the domain and what task the person is using the image for. But it’s been, you know, we’ve used this in a couple of papers and it’s just been a way for us to make progress on this problem. We have also tried, for each domain we’re working in, to inform it based on existing guidelines, including the existing W3C guidelines, and what users have told us specific to that domain. I don’t know of a good one. That’s something that we just sort of worked around. It would be great to have more efforts on that in the future.
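For concreteness, the informal four-point scale Amy describes could be captured as a simple lookup; the numeric codes and exact label wording below are assumptions for illustration, not a published metric.

```python
# A minimal sketch of the informal 0-3 description-quality scale described above.
DESCRIPTION_QUALITY = {
    0: "No description at all",
    1: "Description present but unrelated to the image",
    2: "Related to the image but missing key points",
    3: "Covers most of the key points in the image",
}

def rate(description_score: int) -> str:
    """Map a rater's 0-3 judgment to its label."""
    return DESCRIPTION_QUALITY[description_score]

print(rate(2))  # "Related to the image but missing key points"
```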

CARLOS DUARTE: Definitely something that’s more qualitative than quantitative. What you just described is a good way to start. So, Shivam, your take on the quality of image descriptions?

SHIVAM SINGH: Sure. So I guess when we come to industry settings, we have certain evaluation tools, we evaluate our models as well as the outputs, and there’s rigorous testing that goes on, but there is no set of metrics that we have. Certainly we have some rules: we have the W3C guidelines, and we have some other guidelines as well that are in place. They are not set rules, but, yeah, we have those as a yardstick and we can build tests based on them. There can be some work done with that, but certainly this is what we have currently.

CARLOS DUARTE: Okay. Michael, Amy just mentioned looking also at the definitions that the W3C provides; do you want to add something on how we can measure the quality of image descriptions?

MICHAEL COOPER: The only thing I would really add to what she said is that we produce resources like Understanding WCAG, understanding the Web Content Accessibility Guidelines, which goes into, when you’re writing image descriptions, what the considerations are and how you make a good one. A big challenge for machine learning, I think, in particular, is that the quality, the appropriate description for an image, will depend very much on the context. We describe several different contexts in the support materials and, yeah, the right description for one is the wrong one for another. Sorting that out, I think, is one of the big challenges beyond what others have said.

CARLOS DUARTE: Yeah. Definitely. I have to agree with you. Apparently we’re losing Shivam intermittently and he’s back!

I’m going to combine two questions that we have here in the Q&A, one from Jan Benjamin and the other from Wilco Fiers. They are more about qualifying images than about generating descriptions for the image. Jan asked: can AI differentiate between, for example, functional and decorative images, rather than just generating a description, just differentiating between an image that needs a description and one that doesn’t? And Wilco asks if it is viable to spot images where automated captions will likely be insufficient, so that content authors can focus on those and leave the AI to describe others that might be easier. Amy, want to go first?

AMY PAVEL: Sure. Yeah. I love both of these questions. To Jan’s question, when the question is can AI do this: we have tried this a little bit for slide presentations, and the answer is yes, to some extent. It will fail in some places. To give you an idea of how AI may help detect decorative versus more informative images, in the context of a slide presentation, informative images might be more complex, they might be more related to the content on the rest of the slide and in the narration, and they might be larger on the screen, whereas decorative images on slides might be, you know, little decorations on the sides, they may be logos, or emojis, or less related to the content on the screen. What we have found is that we can do a decent job at this, but it will always fail in some cases. Maybe an image is included, but there is no other information about it, and it is tricky. In doing this, you want to be overly inclusive of the images you identify as informative, so that you could help content authors make sure that they at least review most of the images.
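A rough sketch of the heuristic signals Amy mentions for separating decorative from informative slide images follows; the features and thresholds are illustrative assumptions, and a real system would learn them from data and still make mistakes.

```python
# Heuristic sketch: larger images that relate to the slide content lean
# informative; small logos, emojis, and side decorations lean decorative.
from dataclasses import dataclass

@dataclass
class SlideImage:
    width: int               # pixels
    height: int              # pixels
    slide_area: int          # total slide area in pixels
    content_overlap: float   # 0..1 similarity between image content and slide text/narration
    is_logo_or_emoji: bool

def likely_informative(img: SlideImage) -> bool:
    area_ratio = (img.width * img.height) / img.slide_area
    if img.is_logo_or_emoji:
        return False
    # Err on the inclusive side: treat borderline images as informative so
    # that authors review them rather than silently skipping them.
    return area_ratio > 0.05 or img.content_overlap > 0.2

print(likely_informative(SlideImage(800, 600, 1920 * 1080, 0.4, False)))  # True
```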

I would say to Wilco: yeah, that’s a great idea. We have tried it a little bit on Twitter. One time we ran a bunch of different AI methods to try to describe images on Twitter, so for each image we tried to run captioning and OCR, and we did URL tracing to see if we could find a caption elsewhere on the web, and basically if all of those had low confidence, or they didn’t return anything, then we automatically sent the image to get a human written description. Another thing we explored was letting users optionally request a description. It is possible. The subtleties that are there are difficult to catch automatically, but at least it was a way, given how many images were on Twitter without descriptions, to filter out the ones where we definitely need to get more information from a human. Yeah.
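The routing Amy describes can be sketched as follows; the method names, confidence values, and the 0.6 threshold are illustrative assumptions, not her actual pipeline.

```python
# A simplified sketch: run several automatic methods, and only if all of them
# are low-confidence send the image to a human describer.
from typing import Callable, List, Tuple

def request_human_description(image_path: str) -> str:
    # Placeholder for queueing the image for a human-written description.
    return f"[queued for human description: {image_path}]"

def route_image(image_path: str,
                methods: List[Callable[[str], Tuple[str, float]]],
                threshold: float = 0.6) -> str:
    best_text, best_conf = "", 0.0
    for method in methods:
        text, confidence = method(image_path)  # e.g. captioning, OCR, URL tracing
        if confidence > best_conf:
            best_text, best_conf = text, confidence
    if best_conf >= threshold and best_text:
        return best_text
    return request_human_description(image_path)  # fall back to a person

def toy_captioner(path: str) -> Tuple[str, float]:
    return "a dog on a beach", 0.42  # low confidence in this toy example

print(route_image("photo.jpg", [toy_captioner]))  # routed to a human describer
```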

CARLOS DUARTE: Great. Thank you for sharing those experiences. Shivam?

SHIVAM SINGH: I guess I have been in contact with this scenario, where I had to get descriptions of images that most likely will not get a sufficient machine description. So there are ways, there are tools that can do that for you; on websites, there are multiple plug-ins to use, where you can flag certain images and people can put human descriptions there. But marking them, spotting them in a scalable manner, sometimes does not work; that’s the whole issue. You can have a tool, but it may not be scalable for every user out there, every website out there. This can be done, but, again, there are instances where it can be used and where it can’t. The technology is there, that’s the answer; how to scale it, that’s the question.

CARLOS DUARTE: Great. Thank you. Michael, do you have any input on this?

MICHAEL COOPER: No. Not on this one.

CARLOS DUARTE: Okay. That takes me back to one question that I had here, so I’ll take this opportunity to go back there. I will start with you, Michael. It’s going in a different direction than what we have covered so far. How do you think we need to deal with legal, copyright, and responsibility issues when generating descriptions with AI based models? How do we tackle that?

MICHAEL COOPER: Yeah. Okay. You know, I’m not speaking as a legal professional, but these are issues that I know about in general. At least for accessibility, there is often a fair use right to transform content, and I’ll circle back to that, but that’s the first question. Then there are issues around accuracy: if a machine has generated a caption or description, how accurate is that description, and who knows how accurate it is? And publishing it, especially with potential inaccuracies, can bring on liability consequences even if the use is otherwise allowed.

Another challenge is meeting requirements. If accuracy is pretty high, but still not quite right, for a legal document it may not be sufficient, so depending on the accuracy of these kinds of descriptions is going to be a legal challenge from a bunch of different directions. Of course, there is the benefit, the reason to do it: this can still be better than nothing for many users, who get used to some of the inaccuracies, and it does provide scalability given how image and video focused our web has become. I would highlight one of the ethical principles from the ethical machine learning document: it should be clear that the content is machine generated, allowing the various actors to evaluate it.

Circling back to fair use, I think who is doing the generating or publishing of machine learning content will probably impact that. If it is a user agent or assistive technology, it is probably covered by fair use. If the content producer is doing it, they’re probably declaring fair use for themselves, but the responsibility for accuracy will be higher for them because they’re now the publisher. And there are third party agents of various sorts, accessibility remediation tools and others, where I assume it is a legal wild west.

CARLOS DUARTE: Definitely. To make it worse, I guess, there are many wild wests, because every country, every region might have different legal constraints. Shivam, any take on this?

SHIVAM SINGH: Yeah. So I have a holistic view of how this has been handled. This is an ongoing issue in a lot of countries now. You see that almost all publicly available datasets contain data that is associated in some form or other with copyright. In most parts of the world there is no framework that deals with the legality of generated captions, there is no written law anywhere, or it might be coming later, maybe in the U.S. first. These are some of the complexities. Then there is the ownership of the data: if it is machine generated data, who will own that data, the industry that built the model or the sources the dataset has been gathered from? This is a very complex challenge.

The other part of it is, how would you assign responsibility? That depends on the end user of the model: when you use it, in what context are you using it? For example, some of the models used in academia are just for research and development purposes, and there is no way you can assign responsibility to the academic work. It helps to be careful about how you source the data: either you track where the data is coming from, or you gather the data based on written agreements, so you have a mutual understanding between the data creator and you, and then you train on the data. That introduces a complexity where you have a small dataset but a large amount of input going into training. These are the complexities currently and, yeah, it all depends on where the model or output is being used. That’s where the fair use policy comes in.

CARLOS DUARTE: Context matters all the way, in all scenarios, right? Amy?

AMY PAVEL: I’m not as familiar with the legal and copyright side of this. I do often think about the responsibility aspects of the captions that we’re generating, especially when we’re generating descriptions of things like user generated media. This goes back to the potential harms brought up in the keynote. For instance, one thing I’m often thinking about is: when are errors not that big of a deal, and when are they a bigger deal? Kind of looking at the risks and trade offs in terms of who is getting identified by the tool and who is receiving the image. For instance, if you misidentify my shirt as dark blue rather than black, this error is unlikely to be harmful to me, but some people might experience an image classifier misgendering them as harmful. There are a few ways of dealing with this, not to say that any of them is good right now. One, a lot of tools back off to saying person, rather than saying woman or man. Another way you could imagine doing it is describing physical characteristics of the person that are less subjective, and a final way is considering people’s own identifications of how they would like to be described. Sometimes that varies in different contexts; that itself is a hard problem. Yeah. I don’t have much to say on the legal, copyright side, I just wanted to bring up that this is something that’s come up in my work before.

CARLOS DUARTE: Thank you so much. We’re almost at the end, we have less than 10 minutes, and questions keep coming. That’s great. You will have the opportunity, I guess, to try to answer some of them offline if you wish. I’ll still take another one, the last one we have here, from Antonio Gambabari. The question is: how do you envision the challenges of explainable AI initiatives in the context of image recognition? This relates to several of the aspects that we have dealt with, the uncertainty of images and how to convey that to users; would just labeling something as automatically generated be a way to convey that? Do you think that explainable AI initiatives have the potential to improve this kind of augmented context for the user, and where the description came from? This time, Shivam, I’ll start with you.

SHIVAM SINGH: I think, yes. It is a good point. The explainable AI initiative deals with how metadata can help the end user know the context of what’s being generated, with quantitative scores from the models, supported by data that goes beyond your training. There is a restriction, though: whatever you’re getting as output, there are multiple layers of training behind it, and the metadata about how it was produced by the AI gives a different level of detail, but not all of it. It could augment the user experience, but that won’t be the complete solution. That’s how I see it.

CARLOS DUARTE: Amy, any thoughts on this?

AMY PAVEL: Yeah. That’s a good question. I don’t know. One thing I would think about a little bit here, and have had to think about before, is the trade off between receiving information efficiently and explaining where you got all of that information from. I think both are important. What my experience has been is that users are used to certain types of errors and can recover from them quickly. For instance, when a user is reviewing their own content, say they took a picture or a video and they hear something described as a leash, I have had the experience of users being like, oh no, that’s my cane, it always calls my cane a leash. In some cases, people can get used to identifying the errors, the known unknowns: this is just a wrong identification, I’m used to it. I think it is harder to recover from errors that are unknown unknowns, where there is no other context about it and you don’t know what else it could be. Maybe in the cases where users haven’t encountered the error before, that confidence information is extra important. So, yeah, I’m not really sure what the answer is. I think that considering the balance between what is important and when to give more information about it will be a tricky design question, a question for how to develop the technology.

CARLOS DUARTE: Great. Thank you. Michael, any input on this one?

MICHAEL COOPER: No. I would just add to all that, you know, this again falls into the question of ethics. Transparency and explainability is one of the sections of the machine learning ethics document and addresses several aspects of knowing how the machine learning was built; it should be auditable for various issues. These ethics are probably less specific to some of the use cases that we’re discussing in the symposium, so there may be room for adding to this section of the document.

CARLOS DUARTE: Yeah. Yeah. I think that may be a good idea. I’ll take just a final one, going back to a topic from Matt, something that we have touched upon before. I’ll mention this to you, Michael; we have mentioned this already in the scope of ARIA. The question is about having richer alternatives for the image description than the standard alt text, which is usually short. What are your thoughts on the usefulness of having richer descriptions for image alternatives?

MICHAEL COOPER: As for the general usefulness of richer descriptions: for very simple images, in the way the web started out, images were largely providing small functional roles and short alternatives were sufficient for many cases. Images are used nowadays for a variety of purposes, and for some, an alt like "a photo of my dog" is not really providing the experience. You know, there is definitely a need for richer and longer alternatives: ones with structure so you can skim them, depending on the context, and ones where you can provide links to the necessary bits of alternative data. Regarding the question about images of charts, often the description for a chart is much more structured semantically than for other kinds of images, and you want to be able to take advantage of rich text markup. I believe that assistive technologies support rich text descriptions whenever available; it is a question of getting people to use them more. For machine learning generated descriptions, I would rather have richer than less rich output.

CARLOS DUARTE: Yeah. Following up on that, for Shivam and for Amy: by having richer and longer descriptions, are we increasing the chances that AI generated descriptions will mess up, or isn’t that a risk? Who wants to start? Amy?

AMY PAVEL: Sure. I definitely agree: oftentimes the more detailed you get, the more opportunities there are for errors. A way we have explored this a little bit, especially for very informative images that maybe a lot of people will see, is how to combine automated tools with human written descriptions to hopefully make some of the descriptions better. Maybe automated tools could help automatically extract the structure of the image, and humans go in to write more detail about the parts of the image that are really unlikely to be fully described by the computer. For now, the way I think about complex images is often: how are we going to help humans create descriptions more efficiently while still maintaining high quality, rather than how to do it fully automatically, based on the images I have looked at in the past. Yeah.

CARLOS DUARTE: Thank you. Shivam, any input?

SHIVAM SINGH: I think the inspiration behind this question is to give structure to the output. A structured output makes more sense than having a single fallback string. You can provide more information with the output, but the main output should be shorter and more explainable; the extra detail may even be grammatically imperfect and still make sense to the end user, who may have another option to get that explanation. It’s not like you just have one string generated out of an image, right? When a screen reader reads it out, it should read it concisely, short and brief, and for more description, additional data can be supplied alongside it. There are multiple ways we can do this, but the alternative description should remain concise and grammatically correct, so that screen readers can read it. That’s how I see it.

CARLOS DUARTE: Okay. Thank you so much. And I want to thank the three of you once more for agreeing to take part in this panel, and also for agreeing to take part in the next panel. As we can see, media accessibility is really a rich topic, and computer generated descriptions are definitely also linked with natural language processing, so that will be the topic for the next panel in just under 10 minutes. We’ll have a coffee break now, I hope everyone’s enjoying it, and we’ll be back at ten past the hour.

Panel: Natural Language Processing for Media Accessibility

During the second panel of the symposium, the focus shifted to media accessibility from a natural language processing perspective. Shaomei Wu discussed the importance of accuracy and richness in these descriptions, emphasizing the need to provide more details, particularly about people. The challenge lies in sharing personal and physical attributes accurately and conscientiously. Shivam highlighted the significance of data diversity and the quality of generated data, advocating for categorizing data carefully to ensure clearer descriptions. Amy emphasized the role of context in improving description quality and suggested using language understanding and question-answering approaches. The panelists also discussed the potential of Large Language Models (LLMs) in reducing bias and emphasized the need for inclusive workflows, careful handling of social identities, and considering the trade-off between providing comprehensive information and efficiency. They addressed biases in recognition, application, and the impact of disability bias. The future perspectives included NLP for personalization, rewriting descriptions, and using NLP in academic textbooks, context sharing, augmenting media descriptions, and supporting visually impaired individuals in media creation.

Transcript of Natural Language Processing for Media Accessibility

CARLOS DUARTE: Welcome to the second panel of the first day. This panel will aim to discuss the current status of natural language processing techniques which, here in the context of the web, we know can be used to generate textual descriptions for images and for other visual media presented on webpages. We’ll focus our discussion today on aspects such as providing understandable text to better meet web user needs and the different contexts of use, and also on future perspectives for natural language processing to support web accessibility. I’m glad to welcome back Michael, Shivam and Amy, there you are, Amy! And also to welcome Shaomei Wu from Almpower.org, who agreed to join us on this second panel of the day. I welcome you all back; welcome, Shaomei. For your first intervention, I ask you to briefly introduce yourself; your three co-panelists have already done that in the previous panel, so no need to reintroduce yourselves.

I will start by thinking once again about quality: we go back to the quality topic, now the quality of machine generated descriptions, no longer from the perspective of image processing but from the perspective of natural language generation. How do we improve the quality of machine generated descriptions, especially taking into account the personalized preferences of users? I will start with you, Shaomei.

SHAOMEI WU: Thank you all for having me here today. My name is Shaomei Wu, and right now I’m the founder and CEO of Almpower.org, a non profit that researches and co-creates empowering technology for marginalized users. First of all, I want to also share that I do have a stutter, so you may hear more pauses when I talk. Before Almpower.org, I was a research scientist at Facebook, leading a lot of research and product work on accessibility, inclusion, and equity. One of the products that I shipped was automatic alt text, a feature that provided short, machine generated descriptions of images on Facebook and Instagram to screen reader users in real time.

When it comes to the quality of automatic alt text and other similar systems, we saw two big areas of development that we wanted to pursue. The first one is accuracy, which I think we talked a lot about in the last panel, so I want to talk a bit more about the second one, which is the richness of the descriptions. To be honest, the alt text we generated was quite limited, and a lot of users say it is more of a teaser: oh, yeah, people smiling, pizza, indoor, but no more than that. What kind of environment is it? Is it a home? Is it a restaurant? So I think our users really wanted to get all the richness that someone who has eyesight can see and access.

One particular area that users want to know more about is people: who they are, what they look like, race, gender, even how attractive they are, because that is something that’s socially salient. That was a big challenge for us when we were designing our system, because how can we share those kinds of attributes in the most accurate and socially conscious way? We actually chose not to show the race and the gender of the people being photographed, which we got a lot of complaints about. But how to handle this in a socially respectful way is something we should really work on, and I can see a few ways that we can make it better. For example, considering the relationship between the people in the photo and the viewer: if they’re friends, then we can put in the name and other things about those people. Another thing is to give progressive details, so to have some kind of option that allows the consumer to request more details than our systems provide by default. I will stop here and let the other panelists talk.

CARLOS DUARTE: Thank you, Shaomei Wu. Shivam, your thoughts on how we can improve the quality of machine generated descriptions?

SHIVAM SINGH: This is a two part thing. When it comes to technically implementing models, how you have designed the model, how you have trained it, and who the stakeholders were in designing it all matter very much for getting quality machine generated descriptions. When we take into account users’ personalized preferences, there are two parts. Let’s take an example: I am a person who knows Spanish, and my model, a very famous model, gives descriptions in English. Now, however the model is consumed, let’s say you use an API to consume it, that should take into account the user’s language preference and produce the output based on that as well. This ability of a model to prepare output in multiple formats and languages is something that can be looked into; this is how the quality of machine generated descriptions increases. You do not need to train a complete separate model; what you can do is create post-processing scripts for the models, and that can help end users. Compared to retraining the model, it is not much effort, so it is a simple solution.
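A minimal sketch of this post-processing idea follows, assuming a placeholder translate() function rather than any specific translation service.

```python
# The captioning model stays as-is; a separate post-processing step adapts its
# English output to the user's stated preferences (language, length).
from dataclasses import dataclass

@dataclass
class UserPreferences:
    language: str = "es"      # e.g. Spanish
    max_length: int = 120     # keep descriptions short for screen reader users

def translate(text: str, target_language: str) -> str:
    # Placeholder: call whatever translation model or service is available.
    return f"[{target_language}] {text}"

def post_process(caption: str, prefs: UserPreferences) -> str:
    caption = caption.strip().rstrip(".") + "."
    if len(caption) > prefs.max_length:
        caption = caption[:prefs.max_length].rsplit(" ", 1)[0] + "…"
    if prefs.language != "en":
        caption = translate(caption, prefs.language)
    return caption

print(post_process("a group of people smiling around a table with pizza", UserPreferences()))
```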

The other thing is how you prepare quality data. You should carefully categorize it and structure the data if needed; say you have input data that includes blurred images and all sorts of things like that. You have to carefully prepare and train on the data, and based on that, the descriptions will be a bit clearer. Personalization is also affected when you look into how you can post-process the data for certain groups of people. That’s how I see it.

CARLOS DUARTE: Thank you. Amy, want to share your experiences?

AMY PAVEL: Sure. A couple of ways that I have seen that are promising to use NLP to improve quality. One thing I have seen recently is people starting to consider the context around the image that’s going to be described, to maybe create a description that’s more helpful. Imagine someone writes a post on Twitter, and they have coupled that post with an image. Considering the post and the image together may inform models on how to create something that’s more informative. For instance, if I posted a picture of myself snowboarding and I said I learned a new trick, then it may be important to tell me what trick it was, whereas if I said I just went on vacation, the exact trick may not matter as much. I think the idea of using language understanding to get more information about the context before making a prediction is promising.

Another way I have seen it used to maybe improve quality goes back to the other answers that were given: maybe you can use question answering about the image to gain more information when you need it. One thing I have also thought about is seeing if users could give examples of their preferences about descriptions in natural language: here is an example of a description, maybe we can copy the style of this description when we’re applying it to other descriptions. Maybe I like to hear about the costumes someone wears in a video, and I wish that future descriptions could include more information about that rather than summarizing them.

Finally, one other way I have used NLP to improve quality is based on summarization. There can be times when there is more to describe than time to describe it, especially in videos, where there is often a really small amount of time to describe without overlapping the other audio. One way you can use NLP to improve quality is by trying to summarize the descriptions so that they fit within the time that you have and don’t decrease the experience of people trying to watch the video and hear the audio at the same time. Yeah.
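A toy sketch of this timing constraint for video descriptions follows; the 150 words-per-minute speech rate is an assumed average, not a measured value.

```python
# Estimate how many words fit in a silent gap at a given speech rate, and flag
# descriptions that would need summarization to avoid overlapping other audio.
def max_words_for_gap(gap_seconds: float, words_per_minute: float = 150.0) -> int:
    return int(gap_seconds * words_per_minute / 60.0)

def fits_in_gap(description: str, gap_seconds: float) -> bool:
    return len(description.split()) <= max_words_for_gap(gap_seconds)

description = "The lecturer points to a bar chart showing rainfall rising sharply after 2010."
print(max_words_for_gap(4.0))          # about 10 words fit in a 4-second pause
print(fits_in_gap(description, 4.0))   # False -> this description needs summarizing
```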

CARLOS DUARTE: Yeah. That’s a good use for NLP. Michael, still on this topic, I would like to have your perspective on initiatives at WAI that may assist users in providing their preferences, so that eventually models can use those, or anything that may be ongoing in that regard.

MICHAEL COOPER: Yeah. First of all, to repeat this for anyone joining this session: I’m not a machine learning professional, I’m speaking from my perspective of the work on the Web Accessibility Initiative. I will talk briefly; the other panelists covered almost everything that I would have said. One thing, based on my knowledge of how machine learning works generally today: models tend to be focused on a particular ability, it is not universal, and in the future AI models will have more abilities combined. So there may be more than one model, one recognizing this is a human, here are the attributes, another one saying who this human is, and another one that can say this human plus that human equals this relationship. All of that information, I believe, is separate right now. The ability for models to share context is going to be part of the solution that we need.

What I can speak of from the Web Accessibility Initiative: we are only beginning to explore what AI and accessibility mean for each other, and this symposium is part of that process. We have a practice of doing research papers, sort of literature reviews, and then proposing accessibility user requirements. That would be something that we could work on to start gathering this information, and from there we decide what to do with it: does it go in guidelines, new technologies, whatever. I think most of the resources around AI would fit into new resources in those categories.

CARLOS DUARTE: Thanks. I would like to now move on to addressing something that was at the core of the keynote: discriminatory bias, or any other sort of bias. Here I’m looking at something that was entered in the Q&A for the previous panel, but I think it also fits well into this topic, and it brought up the use of large language models (LLMs), which are currently getting a lot of traction and a lot of spotlight. Do you think LLMs can open up new avenues, as Antonio Gambabari mentioned, for reducing the different types of bias that we see as a result of the way AI models are trained? Shivam, you want to go first this time?

SHIVAM SINGH: Yeah. Sure. This is quite a question, and one that has been close to my heart as well: how to address social bias in large language models. We have seen a lot of discussion on this. The bias is in the data the models have been trained on, in the social attitudes of the data represented within the model. Most of the available data used to train models, including older datasets, contains a certain degree of bias. Most of the data generated on the Internet comes from people who can access it, and that is not everybody; some don’t even know what the Internet is, and they cannot create data there. Most of the data available to train the model is based on that. That’s one way you see the bias.

The other instance, to give an example: you will see a lot of violence, homelessness, and other things over represented in the text, and you will find these kinds of representations in the LLM outputs. One way to address this is human in the loop feedback on existing models, where you provide feedback to the already existing model: this sort of output is not correct, this could be a correct version, this could be another version. Some human interface is needed for that. Now, this aspect of the data, the underlying data of the model, is the main source of the issue here. You need to correctly source the data and correctly structure the data so that you’re not over representing one section of it. For example, in a larger society there can be underprivileged, over privileged, and other sections of society; you cannot just take the data from one section of society, train the model, and say this is a picture of this particular area. There are many areas that are underrepresented, and that’s what was happening with all of the models at the start of LLMs, as you can see.

Now, what we can also do to mitigate this is to create an inclusive workflow when developing and designing the models: you give inclusive workflow training to the people involved, you make them aware of what’s happening and how to mitigate it. All of the people who are included in the generation, and there is a lot going on, a lot of data extraction, can be trained for inclusiveness. There are multiple tools that help us do that; if you’re creating a model, you can test it, and Google, for example, has tools for checking how models perform with respect to inclusive outputs. Also, you need to do thorough testing of the models to make sure that all of the outputs are properly aligned and properly represented, and that all of the sections of society the model is intended to serve are represented well. Testing should be there for any model that you’re creating. We have noted that AI and LLMs are quite mature right now and we’re seeing a lot of new technologies, so going forward, I guess this can be a solution.

CARLOS DUARTE: Thank you, Shivam. Shaomei, can I have your input on how we can address social bias or other types of bias?

SHAOMEI WU: Yeah. On this, I want to go back to what I talked about before, in particular the sensitive social identities of people in the photos. I don’t see a good way for current machine learning systems to accurately come up with those labels. The key issue here is that a lot of those systems really assume fixed social categorizations such as race and gender. I think maybe we should think beyond the machine learning systems and find a way to attribute people respectfully through the agency of those being photographed and described. For example, a lot of people have been specifying their pronouns in their social media bios; all of this information should be made use of when deciding how we describe the gender of somebody in a photo.

The other direction that we have been exploring is describing appearances instead of identities, for example describing skin tone, hairstyle, or outfit instead of assigning a race or gender label to somebody. I don’t think any of those solutions can really address the real cause of the problem, so I don’t have a very good answer on this. Maybe the alternative is to think of a way to convey and share who we are without relying so much on images like we do today. You know, how can we convey the information that we want to share online in a not so visual centric way? I think that’s a bigger question.

CARLOS DUARTE: Thank you, Shaomei Wu. Amy, next to you.

AMY PAVEL: I think that the prior answers mostly covered the things I was going to mention. I loved Shaomei Wu’s answer about describing ourselves in ways, or figuring out ways, that don’t rely on the visual information, and giving agency to people to add their own identities that they want to be shared. I will say that I think that depends on the context: you may want to share different parts of your identity if they are important to you, and even things that give end users agency may have a lot of subtlety in how they would be applied in different cases. I like the idea of describing aspects of appearance. I think one challenge with that is that you might be trading off between the aspects of appearance you describe and the efficiency with which someone can take them in; they may not get the information as quickly as a sighted person would perceiving that person, just because audio occurs over time. I think it is an extremely difficult challenge, and in some cases it can matter: I can imagine, seeing a photograph of the leadership of a company, you may want to know some quick details about the demographics of who is leading it, for instance.

One thing I have noticed that is sort of related to this: I sometimes have people describe videos, and there can be a lot of differences in which aspects of someone’s appearance they describe, and how, based on who is in front of them, which can also differ based on biases people have. So if people see a woman, they may describe her differently than they would describe a man; they may focus on different aspects of appearance. So anything that goes towards describing aspects of appearance will have to be very carefully designed, and it feels like a challenging problem. Yeah.

CARLOS DUARTE: Thank you so much, Amy. Michael, any thoughts on this? And I would add something here, especially for you: do you see any future role for accessibility guidelines in contributing to preventing bias in machine learning generated descriptions, or whatever results from these models?

MICHAEL COOPER: I have an answer for that question; it could be longer than my prepared answers, so let’s see where we go. I would like to add a couple of thoughts to what others have been saying. I want to first categorize bias. We have been talking so far about what might be labeled bias in recognition: are there biases in how machine learning recognizes objects, people, context, et cetera? One thing that magnifies this challenge in the accessibility context is that the sample size of People with Disabilities can be smaller in various training sets, and there is a risk that images of People with Disabilities, or contexts that are important for them, wheelchair ramps and the like, will be excluded as outliers or will be less well recognized by the AI than images of other people.

You know, that’s just another added dimension to the aspects that we need to look at. We also need to look at bias in the application of this. We have talked a few times during this session about the risk of relying on machine generated descriptions and captions as being good enough, whereas content that has a more mainstream audience may get captions and descriptions that are more curated, with what you might call quality assurance. That kind of bias could creep in, and it can magnify the impact of disability bias, because it can cause people to be excluded from the fora from which people are recruited to be part of training sets, et cetera. So again, the ethical principles for web machine learning speak to that, and I think we may be identifying some content that we need to add to them.

Moving on to what WAI can do about that: I do believe it is within the scope of the Web Accessibility Initiative, or the W3C, to provide guidance in some form about how AI and accessibility should work together, addressing many of these things. Typically, this sort of thing would be a Working Group Note, which is a formal document published by the W3C that has a certain level of review. There are even opportunities for versions that have had more review and sign off. I think that’s one thing we may need to do.

I will talk briefly about the work that we’re doing on the Web Content Accessibility Guidelines 3.0, sorry, the W3C Accessibility Guidelines, or WCAG 3. It is a substantial re-envisioning, and it has been clear since the beginning that we want to address equity in the guidelines, how to make sure they’re equitable for People with Disabilities. We have explored that in the Working Group, really unpacking it, understanding the relationship between equity, accessibility, bias, and other dimensions. We’re connecting that with other work W3C has been doing to make itself a more equitable organization, and this is to say that I believe WCAG 3 will have some structure built in, and supporting resources, addressing the issues of bias specifically. These are hopes, not promises, but that’s the direction coming out of activities like this.

CARLOS DUARTE: Thank you so much. Those are exciting avenues that we hope will come to fruition in the near future. I guess a final question for everyone: I would like to know a bit about your future perspectives on the use of natural language processing in the field of accessibility. I will start with you this time, Amy.

AMY PAVEL: I think this is an exciting area. One shift I have found recently among the people in NLP I talk to is that, as models are getting better at creating fluent text that looks reasonable, a lot of people are becoming more interested in what the actual applications of this are and how we can build tools that support those applications, rather than relying on automated metrics that may not capture people’s experiences. I wanted to note that that’s a direction I find exciting. A couple of things could be promising, and I mentioned them in other responses: as we gain the ability to describe more and more about the image, I think that NLP can provide a really good opportunity to personalize those descriptions based on the person and what they want, as well as the context. If you think about walking into a room, there is so much you could possibly describe; if we could make it easier for people to get the information they’re looking for quickly from their media, that would be a great improvement. Combining computer vision to recognize things in the underlying image and using something like NLP to summarize that description is, I think, promising and exciting.

Another way I’m excited about it is the opportunity to help people with their own description tasks. When we have humans working on descriptions, it is really hard: novices sometimes have a hard time remembering and applying the guidelines that exist. Maybe we could rewrite people’s descriptions of videos to be more in line with how an expert would write them, by making them more concise or changing the grammar a bit so that they fit what people expect from the guidelines, or we might alert people to aspects of their own descriptions that could be changed a little bit, to perhaps reduce something like bias in a description. There are really lots of exciting opportunities in terms of authoring descriptions as well as making the end descriptions a little bit better. Yeah.

CARLOS DUARTE: Great, thanks a lot. Shivam.

SHIVAM SINGH: I see a bit more of an opportunity now than earlier, because model engines are now more advanced. I can see a good context aware solution giving you faster processing of data, and it works on text, video, and audio; this could be a reality. A good use case I have been following is how to make academic textbooks and academic assignments accessible: they have multiple graphs and associated data, and if models could create a better understanding of those things, it would help a lot of people who have difficulties, especially in the absence of good quality descriptions of these charts. I see this happening in the next few years. As a closing comment, I would say there are different sets of consumers of media: some can read but not comprehend, some can comprehend easily but have difficulty consuming it visually. In that sense, the coming NLP technology will help designers produce contextual descriptions, and, to put it in simple terms, if you give me a simple, efficient output that is familiar and aesthetic, that would be the pinnacle of what I see for NLP. This goes for natural language processing, understanding as well as generation, for all technologies.

CARLOS DUARTE: Thank you. Exciting times ahead, definitely. Michael, you want to share your vision?

MICHAEL COOPER: Based on my knowledge of how present day machine learning works, the tools tend to be focused on specific abilities, which means that the context is isolated. I’m speaking more as a person working in the field, recognizing a need rather than a technological potential, but in the Internet of Things, devices use APIs to exchange data between different types of devices; if tools could similarly share some structure and context with each other and negotiate a better group description, I think that may be an opportunity for early evolution of this field. Long term, of course, tools will emerge with a greater sense of context built in, but that will probably be another tier. That’s my view on the near term future, based on my knowledge.

CARLOS DUARTE: Good suggestions to look at also. Shaomei.

SHAOMEI WU: Yeah. Looking into the future, I can see two areas that I think have a lot of potential. The first one is from the technology perspective, where I agree with my colleagues: I see a lot of gain in incorporating the context surrounding photos, taking advantage of the recent progress in deep learning models that handle multimodal representation spaces. We can embed both the image and the text surrounding it, and then also the metadata: the author, the time the photo was taken or posted. A lot of those can be joined and represented in a shared space that provides us a lot more than the visual information alone. I think that’s a big technology breakthrough that we can see in the near term future. The second thing, which is more important to me, is the use case perspective. Right now, when we think or talk about media accessibility, we are mostly thinking about the consumption case: how do we help somebody who cannot see to consume the photos that are posted by others, and mostly posted by sighted folks. I think it is equally important, but largely overlooked, to consider the media creation use cases: how can we support people with visual impairments to create and share photos and videos?

In my own work on these use cases, there is such a gap in what the current technology can do. For example, all of the modern AI models really failed when it came to processing photos taken by people with visual impairments, because they’re just not the same kind of photos the models are used to. You know, there is a huge gap between the current fundamentals of those models and what they need to do. Second, there is a need for a lot more personalized and aesthetic support. If I take 10 selfies, I want to find the one that I want to post to share who I am, and that is something we cannot do yet. We can tell you, okay, you have ten photos, and they all contain your face, but how can we have models that really represent somebody’s taste and somebody’s aesthetics? That’s another interesting future development that I want to see. That’s all.

CARLOS DUARTE: Thank you so much, Shaomei Wu. I think we only have 4 minutes more. I won't risk another question because we need to end at the top of the hour. I will take the opportunity to once again thank our panelists. I hope everyone enjoyed it as much as I did. It was really interesting, very optimistic perspectives, so we can see that it is not just the more risky or risk enabling outputs that AI can have. It is nice to have these perspectives. Thank you once again, Shivam Singh, Amy Pavel, Shaomei Wu, Michael Cooper, it was brilliant to have you here.

Thanks, everyone who attended. We’ll be back tomorrow starting at the same time, 3:00 p.m. Central European time. I thank you to those attending especially on the West Coast of the U.S. where it is really early and also India, I guess it is the other way around, right, where it is really late, Shivam, thank you all for joining. So as I was saying, tomorrow we’ll start at the same time, we’ll have another two panels, first panel on machine learning for web accessibility evaluation and in the second panel we will come back to the topic of natural language processing but now focusing on accessible communication and we’ll close with what I’m sure will be another really interesting keynote, from Shari, and looking forward to a similar discussion between Shari and Jutta Treviranus at the end of the keynote. Thank you again, Jutta Treviranus, for your informative, provoking keynote. I hope to see you all tomorrow. Good bye!

Panel: Machine Learning for Web Accessibility Evaluation

This panel featured researchers discussing web accessibility assessment and the challenges it presents. The panelists highlighted various obstacles, including the diversity and fast-changing nature of dynamic elements on web pages, the complexity of data collection and user requirements, and the subjectivity of some current evaluation rules. They emphasized the need for datasets specifically designed for accessibility and the incorporation of additional factors into the sampling process. The panelists also discussed how AI can support conformance assessment to accessibility guidelines, such as selecting representative samples and evaluating them using machine learning algorithms. AI was seen as a valuable tool for repairing accessibility issues, distinguishing complex web structures, and supporting developers in creating accessible products. The panelists envisioned future applications of machine learning techniques in web accessibility evaluation, including user interaction classification, simulating user interactions, and efficient page sampling. They emphasized the importance of scalability, automating evaluations, and generalizing approaches to integrate accessibility more easily. The development of automated testing and problem fixing, along with the availability and sharing of datasets, were seen as crucial for advancing research in AI and accessibility. Addressing bias in media accessibility was also highlighted as an important consideration for inclusive development and assessment.

Transcript of Machine Learning for Web Accessibility Evaluation

CARLOS DUARTE: So, we have Willian Massami Watanabe, from the Universidade Tecnologica Federal do Parana in Brazil. We have Yeliz Yesilada from the Middle East Technical University. We have Sheng Zhou from Zhejiang University in China. I hope I pronounced it correctly. And Fabio Paterno from CNR-ISTI, HIIS Laboratory in Italy. Okay. Thank you all for joining us. And for some of you, it is earlier in the morning. For others of you it is later, well, for some of you, I guess, it is really late in the evening, so, thank you all for your availability.

Let’s start this discussion on how, I would say, current machine learning algorithms and current machine learning applications can support or can improve methodologies for automatically assessing Web Accessibility. And, from your previous works, you have touched different aspects about how this can be done. So, machine learning has been used to support Web Accessibility Evaluation through different aspects, such as sampling, metrics, evaluation prediction, and handling dynamic pages. I understand not all of you have worked on all of those domains, but some of you have worked on specific domains, so, I would like you to focus on the ones that you have been working on more closely. For us to start, just let us know, what are the current challenges that prevent further development and prevent further use of machine learning or other AI techniques in these specific domains. Okay? I can start with you, Willian.

WILLIAN WATANABE: First of all, thank you so much for everything that is being organized. Just to give you some context, I am Willian, I’m a professor here in Brazil, where I work with web accessibility. My research focus is on web technology, the ARIA specification, to be more specific, and just in regard to everything that has been said by Carlos Duarte, my focus is on evaluation prediction, according to the ARIA specification. I believe I was invited to this panel considering my research on identification of web elements in web applications. The problem I address is identifying components in web applications. When we implement web applications, we use semi-structured languages such as HTML. My job is to find what these elements in the HTML structure represent in the web page. Like, they can represent some specific type of widget, there are some components, some landmarks that we need to find on the web page, and this is basically what I do.

So, what I have been doing for the last year, I have been using machine learning for identifying these elements. I use supervised learning and I use data provided by the DOM structure of the web application. I search for elements in the web page and I classify them as an element or a widget or anything else. The challenges in regard to that are kind of different from the challenges that were addressed yesterday. Yesterday’s applications of machine learning, I think, work with video and text, which are unstructured data, so they are more complicated, I would say. The main challenge that I address in my research is associated with data acquisition, data extraction, identifying what kind of features I should use to identify those components in web applications. Associated to that I should say, to summarize, my problems are associated with the diversity of web applications, there are different domains and this kind of biases any dataset that we use. It is difficult for me, for instance, to identify a number of websites that represent all the themes of websites that can be used in web applications, the variability in the implementation of HTML and JavaScript, and the use of automatic tools to extract this data, such as the WebDriver API, the DOM structure dynamics, mutation observers. There are a lot of specifications that are currently being developed that I must use, and I always must keep observing to see if I can use them to improve my research. And, lastly, there is always the problem of manual classification in AI for generating the datasets that I can use. That is it, Carlos. Thank you.
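As a rough sketch of the kind of supervised pipeline Willian describes, one could extract simple DOM features and train an off-the-shelf classifier. The feature names, widget labels, and toy data below are purely hypothetical, not his actual setup.

```python
# Sketch: classify DOM elements as widget types from hand-crafted DOM features.
# Feature names and labels are illustrative only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row: features extracted from one DOM element
# [tag_is_div, has_role_attr, num_children, has_onclick, depth_in_dom]
X = [
    [1, 1, 3, 1, 5],
    [0, 0, 1, 0, 2],
    [1, 0, 7, 1, 6],
    [0, 1, 2, 0, 3],
]
y = ["dropdown", "link", "tooltip", "tab"]  # widget classes (illustrative)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(clf.predict(X_test))
```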

CARLOS DUARTE: Thank you, Willian. Thank you for introducing yourself, because I forgot to ask all of you to do that. So, in your first intervention, please give us a brief introduction about yourself and the work you are doing. So, Yeliz, I will follow with you.

YELIZ YESILADA: Hi, everybody. Good afternoon. Good afternoon for me. Good afternoon, everybody. I’m Yeliz, I’m an associate professor at Middle East Technical University, Northern Cyprus Campus. I’ve been doing web accessibility research for more than 20 years now. Time goes really fast. Recently I have been exploring machine learning and AI, specifically, for Web Accessibility, supporting Web Accessibility from different dimensions.

Regarding the challenges, I think there are, of course, many challenges, but as Willian mentioned, I can actually say that kind of the biggest challenge for my work has been data collection. So, I can actually say that data, of course, is critical, as it was discussed yesterday in the other panels. Data is very critical for machine learning approaches. For us, collecting data, making sure that the data is representing our user groups, different user groups, and not biasing any user groups, and also, of course, preparing and labeling the data, certain machine learning algorithms, of course, supervised ones, require labeling. Labeling has also been a challenge for us, because sometimes certain tasks, it is not so straightforward to do the labeling. It is not black and white. So, it has been a challenge for us, I think, in that sense.

And the other two challenges I can mention, I think the second one is the complexity of the domain. When you think about Web Accessibility, sometimes people think, oh, it is quite straightforward, but it is actually a very complex domain. There are many different user groups, different user requirements, so, understanding those and making sure that you actually address different users and different requirements is quite challenging. And, since we also are working, this is the last one that I wanted to mention, since we are also working with web pages, they are complex. They are not always well designed or properly coded. As we always say, browsers are tolerating, but for developing algorithms, machine learning algorithms, they also have to deal with those complexities, which makes the task quite complex, I think. So, just to wrap up, I think, in my work, there are three major challenges. Data, or the lack of quality data. Complexity of the domain, different users and different user requirements. And the complexity of the resources we are using. So, web pages, the source code and the complexity of pages that are not conforming to standards. I think they are really posing a lot of challenges to the algorithms that we are developing. So, that is all I wanted to say.

CARLOS DUARTE: Thank you, Yeliz. A very good summary of major challenges facing everyone that works in this field. So, thank you for that. Sheng, I wanted to go with you next.

SHENG ZHOU: Thank you, Carlos. Hi, everyone. I am Sheng Zhou from Zhejiang University in China. From my point of view, I think three challenges occur currently. First, I totally agree that it is hard to prepare labels for model training. The success of machine learning heavily relies on a large amount of labeled data; however, acquiring these labeled data usually takes a lot of time, which is hard to realize, especially in the accessibility domain. I want to take a, I am sorry, I am a little bit nervous here. Sorry. I want to take the WCAG rule on images of text as an example, as we discussed in the panel yesterday. Most of the current image captioning or OCR methods are trained on natural image datasets, rather than the images, like logos, that are essential in text alternatives. The labels for a Web Accessibility solution should fully consider the experience of different populations. There are very few datasets that are specifically designed for accessibility evaluation tasks and satisfy these requirements. So, the machine learning models that are trained on traditional datasets cannot be easily generalized to accessibility evaluation. The second one, I think, is about web page sampling, since I have done a little bit of work on this. I think currently there are many factors that affect the sampling.

First, sampling has been a fundamental technique in Web Accessibility Evaluation when dealing with millions of pages. The previous page sampling methods are usually based on the features of each page, such as its elements or the DOM structure. Pages with similar features are assumed to be generated by the same development framework and to have similar accessibility problems. However, with the fast growth of web development frameworks, pages are developed with diverse tools; for example, pages that look very similar may be developed by totally different frameworks, and some pages that look totally different may be developed by the same framework. This poses great challenges for feature-based web accessibility evaluation. It is necessary to incorporate more factors into the sampling process, such as the connection topology among pages, visual similarity, and typesetting. So, how to combine the similarity between pages, considering multiple factors, into a unified sampling probability is critical for sampling. I think this is a problem that graph topology learning and metric learning could try to address, which is a comprehensive research problem.
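A minimal sketch of folding several similarity signals into one sampling decision, along the lines Sheng Zhou outlines, might look like this. The weights, feature vectors, and greedy selection are assumptions for illustration, not his method.

```python
# Sketch: combine DOM-structure similarity and visual similarity into a single
# novelty score per page, then greedily pick the most diverse pages to evaluate.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def novelty(page_vec, sampled_vecs, w_dom=0.5, w_visual=0.5):
    """Higher when the page is unlike anything already sampled."""
    if not sampled_vecs:
        return 1.0
    dom_sim = max(cosine(page_vec["dom"], s["dom"]) for s in sampled_vecs)
    vis_sim = max(cosine(page_vec["visual"], s["visual"]) for s in sampled_vecs)
    return 1.0 - (w_dom * dom_sim + w_visual * vis_sim)

# pages: page id -> {"dom": tag-histogram vector, "visual": screenshot embedding}
pages = {i: {"dom": np.random.rand(50), "visual": np.random.rand(128)} for i in range(200)}

sample = []
for _ in range(10):  # greedily pick 10 diverse pages
    scores = {pid: novelty(vec, [pages[s] for s in sample])
              for pid, vec in pages.items() if pid not in sample}
    sample.append(max(scores, key=scores.get))
print(sample)
```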

So, the third one, the third challenge, I think is the subjective evaluation rules. When we evaluate the Web Accessibility, there are both subjective and objective rules, right? So, for example, when evaluating the WCAG success criteria 1.4.5, image of text, the image is expected to be associated with accurate description texts which has been discussed in the panel yesterday. It is still challenging to verify the matching between the (speaking paused)

CARLOS DUARTE: I guess there are connection issues? Let’s see. Okay. He has dropped. We will let Sheng, okay, he is coming back, so, you are muted.

SHENG ZHOU: Sorry.

CARLOS DUARTE: It is okay. Can you continue?

SHENG ZHOU: Okay, okay. I am so sorry. I think there are three challenges under the first challenges, as same as Yeliz described, it is hard to…

CARLOS DUARTE: You dropped when you were starting to talk about the third challenge.

SHENG ZHOU: Okay.

CARLOS DUARTE: We got the first and second challenge. We heard that loud and clear, so now you can resume on the third challenge.

SHENG ZHOU: Okay, okay. So, the third challenge is the subjective evaluation rules. There are both subjective and objective rules. For example, when evaluating the WCAG success criterion 1.4.5, images of text, the image is expected to be associated with an accurate description text. As discussed in the panel yesterday, it is still challenging to verify the matching between image and text, since we do not have access to the ground truth text of the image. So, I think (video freezing)

CARLOS DUARTE: Apparently, we lost Sheng again. Let’s just give him 10 seconds and see if he reconnects, otherwise we will move on to Fabio. Okay. So, perhaps it is better to move on to Fabio and get the perspective of also someone who is making an automated accessibility evaluation tool available, so it is certainly going to be interesting, so, Fabio, can you take it from here?

FABIO PATERNO: Yes. I am Fabio Paterno. I’m a researcher in the Italian National Research Council, where I lead the Laboratory on Human Interfaces in Information Systems. We now have a project funded by the National Recovery and Resilience Plan which is about monitoring the accessibility of the public administration websites. In this project we have our tool, MAUVE, which is an open, freely available tool, and it already has more than 2000 registered users. Recently, we performed the accessibility evaluation of 10000 websites and considered a set of pages for each website, so obviously it was an effort. So, we were very interested in understanding how machine learning can be applied in this larger scale monitoring work. So, for this panel, I did a systematic literature review, and I went to the ACM digital library, I entered machine learning and accessibility evaluation to see what has been done so far. I got only 43 results, which is not too many, I would have expected more, and actually only 18 actually applied machine learning, because the other works were more about how machine learning could be interesting for future work and so on. That is to say, the specific research effort has been so far limited in this area. Another characteristic was that there are various attempts. There are people trying to predict website accessibility based on the accessibility of some web pages, others trying to check the rules about alternative descriptions, and others trying to make the user control the content areas. So, I would say the challenge is, well, machine learning can be, you know, used as a complementary support to the automatic tools that we already have. In theory there are many opportunities, but in practice there is not a lot of progress yet. The challenge, I think, is to find the relevant data with the accessibility features that are able to capture the type of aspect that we want to investigate.

And I would say the third and last main general challenge is that we really have to continuously keep up with the changes, not only in the web but also in how people implement and how people use applications; this continuously changes. So, there is also the risk that the dataset will become obsolete, not sufficiently updated to address all the methods for that.

CARLOS DUARTE: Okay, thank you for that perspective. Sheng, I want to give you now the opportunity to finish up your intervention.

SHENG ZHOU: Okay. Thank you, Carlos. Sorry for the lagging here. So, I will continue with my third challenge. In my opinion, the third challenge is the subjective evaluation rules. In relation to Web Accessibility, there are subjective and objective rules. For example, when evaluating the images of text rule, the image is expected to be associated with an accurate description text. And as discussed in the panel yesterday, it is still challenging to verify the matching between the image and the text, since there is no ground truth for what kind of text should describe the image. As a result, for the accessibility evaluation system, it is harder to judge whether the alternative text really matches the image. So, thanks.

CARLOS DUARTE: Okay, thank you. I will take it, from, I guess, most of you, well, all of you have in one way or another mentioned one aspect of Web Accessibility Evaluation, which is conformance to requirements, to guidelines. Several of you mentioned the web content Accessibility Guidelines in one way or another. Checking, what we do currently, so far, and following up on what Sheng just mentioned, are objective rules. That is what we can do so far, right? Then when we start thinking about, because the guidelines are themselves also subject to subjectivity, unfortunately. How can we try to make the evaluation of those more subjective guidelines, or more subjective rules, and how do you all think that Artificial Intelligence, algorithms, or machine learning-based approaches can help us to assess conformance to those technical requirements to Accessibility Guidelines? Okay? I will start with you, now, Yeliz.

YELIZ YESILADA: Thank you, Carlos. So, regarding the conformance testing, so, maybe we can actually think of this as two kinds of problems. One is the testing, the other one is confirming, basically repairing, or automatically fixing the problems. So, I see, actually, that machine learning and AI in general can, I think, help in both sides, in both parties. So, regarding the testing and auditing, if we take, for example, WCAG Evaluation Methodology as the most systematic methodology to evaluate for accessibility, it includes, for example, five stages, five steps. So, I think machine learning can actually help us in certain steps.

For example, it can help us to choose a representative sample, which is the third step in WCAG-EM. We are currently doing some work on that, for example, to explore how to use unsupervised learning algorithms to decide, for example, what is a representative sample. Fabio, for example, mentioned the problem of evaluating a large-scale website with millions of pages. So, how do you decide, for example, which ones to represent, I mean, which ones to evaluate. Do they really, for example, if you evaluate some of them, how much of the site you actually cover, for example. So, there, I think, machine learning and AI can help. As I said, we are currently doing some work on that, trying to explore machine learning algorithms for choosing representative samples, making sure that the pages that you are evaluating really represent the site, and reduces the workloads, because evaluating millions of pages is not an easy task, so maybe we can pick certain sample pages.

Once we evaluate them, we can transfer the knowledge from those pages to the other ones, because more or less the pages these days are developed with templates or automatically developed, so, maybe we can transfer the errors we identified, or the ways we are fixing to the others which are representative. Regarding the step four in WCAG-EM, that is about auditing the select sample, so how do you evaluate as test the sample, I think in that part, as we all know, and Sheng mentioned, there are a lot of subjective rules which require human testing. So, maybe there we need to explore more how people, I mean, how humans evaluate certain requirements, and how we can actually automate those processes. So, can we have machine learning algorithms that learn from how people evaluate and assess and implement those. But, of course, as we mentioned in the first part, data is critical, valid data, and quality of data is very critical for those parts.

Regarding the repairing, or automatically fixing certain problems, I also think that machine learning algorithms can help. For example, regarding the images Sheng mentioned, we can automatically test whether there is an Alt Text or not, but not the quality of the Alt Text, so maybe there we can explore more, do more about understanding whether it is a good Alt Text or not, and try to fix it automatically by learning from the context and other aspects of the site. Or, I have been doing, for example, research in complex structures like tables. They are also very difficult and challenging for accessibility, for testing and for repairing. We have been doing, for example, research in understanding whether we can differentiate, and learn to differentiate, a layout table from a data table, and if it is a complex table, can we actually, for example, learn how people are reading that and guide the repairing of those. We can, I guess, also do similar things with forms. We can learn how people are interacting with these forms and complex structures, with the forms, like rich and dynamic content like Willian is working on. Maybe we can, for example, do more work there to automatically fix, which can be encoded in, let’s say, authoring tools or authoring environments, that include AI, without the developers noticing that they are actually using AI to fix the problems. So, I know I need to wrap up. I think I would say, contributing two things, both testing and repairing can help.
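A toy sketch of the layout-versus-data-table distinction Yeliz mentions could rest on simple structural cues. The features and the heuristic decision below are illustrative only, standing in for the trained model she describes.

```python
# Sketch: distinguish layout tables from data tables using structural cues.
# Real systems would learn this from labeled examples; the rule below is
# only a stand-in for such a classifier.
from bs4 import BeautifulSoup

def table_features(table):
    rows = table.find_all("tr")
    return {
        "has_th": bool(table.find("th")),
        "num_rows": len(rows),
        "uniform_columns": len({len(r.find_all(["td", "th"])) for r in rows}) == 1,
        "nested_tables": bool(table.find("table")),
    }

def looks_like_data_table(features):
    # Heuristic stand-in for a trained classifier
    return features["has_th"] and features["uniform_columns"] and not features["nested_tables"]

html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>Ana</td><td>34</td></tr></table>"
feats = table_features(BeautifulSoup(html, "html.parser").find("table"))
print(feats, "-> data table" if looks_like_data_table(feats) else "-> layout table")
```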

CARLOS DUARTE: I agree. Some of the things you mentioned, they can really be first steps. We can assist a human expert, a human evaluator, and take away some of the load. That is also what I take from the intervention. So, Fabio, I would like your take on this, now.

FABIO PATERNO: I mean, I agree with what Yeliz said before. We have to be aware of the complexity of accessibility evaluation. Because just think about WCAG 2.1. It is composed of 78 success criteria, which are associated with hundreds of techniques, specific validation techniques, so, this is the current state, and it seems like the number of techniques is going to keep increasing. So, the automatic support is really fundamental.

And, secondly, when you use automatic support, the results of the check can be: this passes, this fails, or cannot tell. So, one possibility that I think would be interesting is to explore machine learning in the situations in which the automatic solution is not able to deterministically provide a pass or a fail; this could be an interesting opportunity to also explore in other European projects. Ideally, this would have a group of human accessibility experts provide the input, and then try to use this input to train an intelligent system. Otherwise it would not be possible to validate these solutions. For sure, it might be really easy for AI to detect whether an alternative description exists, but it is much more difficult to say whether it is meaningful.

So, in this case, for example, I have seen a lot of improvement of AI in recognizing images and their content, I have also seen some of (Muffled audio). You can think of a situation in which AI provides the descriptions and then there is some kind of similarity checking between these automatically generated descriptions and the ones provided by the developer, to see to what extent these are meaningful. This is something I think is possible; what I’m not sure of is how much we can find a general solution. I can see this kind of AI associated with some level of confidence, and then I think part of the solution is to let the user decide what level of confidence is acceptable when this automatic support is used to understand whether the description is meaningful. So that would be the direction where I would try, from the perspective of people working on tools for automatic evaluation, trying to introduce AI inside such an automatic framework. But another key point we have to be aware of is transparency. When we are talking about AI, we are talking about a black box, and there is a lot of discussion about explainable AI. Some people say AI is not able to explain why this data generated this result, or how we can change it to obtain different results (Muffled audio), so this is a question that people encounter when they happen to run an evaluation tool.
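The similarity check Fabio sketches, between an automatically generated description and the developer-provided alt text, could look roughly like this. The encoder name, the threshold, and the stubbed caption are assumptions, not part of his tool.

```python
# Sketch: flag alt text that diverges strongly from a machine-generated caption.
# The caption itself would come from an image captioning model; here it is a stub.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def alt_text_check(generated_caption, author_alt, threshold=0.5):
    """Return 'pass', 'fail', or 'cannot tell' with a similarity score.
    The acceptable confidence level is left to the user, as Fabio suggests."""
    sim = float(util.cos_sim(encoder.encode(generated_caption), encoder.encode(author_alt)))
    if sim >= threshold:
        return "pass", sim
    if sim < threshold / 2:
        return "fail", sim
    return "cannot tell", sim

print(alt_text_check("a dog running on a beach", "logo of the company"))
```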

And also, in addition to the topic of the transparency of tools, regarding the tools that are now available, a study was published in ACM Transactions on Accessible Computing noting that often these tools are a little bit of black boxes, they are not sufficiently transparent. For example, they say, we support these success criteria, but they do not say which techniques they actually apply, or how these techniques are implemented. So, users are often at a disadvantage because they use different tools and get different results, and they do not understand the reason for such differences. Let’s say this point of transparency is already an issue now, with validation tools that do not use AI. We have to be careful that if AI is added, it should be added in such a way that it is explainable, so we can help people to better understand what happened in the evaluation and not just give the results without any sufficient explanation.

CARLOS DUARTE: I think that is a very important point, because if I am a developer, and I am trying to solve accessibility issues, I need to understand why there is an error, and not just that there is an error. That is a very important part. Thank you, Fabio. So, Sheng, next, to you.

SHENG ZHOU: Thanks. Incorporating artificial intelligence, I would try to find some ways to help the developers. The first one is code generation for automatically fixing accessibility problems. As Yeliz just said, web accessibility evaluation has always been the target, but we also have to take the view of the developers. If the evaluation system only identifies or locates the accessibility problem, it may still be hard for developers to fix these problems, since some developers may lack experience with this. Recently, artificial intelligence based code generation has been well developed, and given some historical code for fixing accessibility problems, we have tried to train an artificial intelligence model to automatically detect the problem, take the code snippet, fix the problem code, and provide suggestions for the developers. We expect this function could help developers fix accessibility problems and improve their websites more efficiently.
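A very simple baseline for the repair-suggestion workflow Sheng describes might locate images with missing alt attributes and emit a suggested fix for the developer to review. This is a deterministic sketch with a placeholder description, not his trained code-generation model.

```python
# Sketch: detect a common accessibility problem and propose a code fix.
# A learned code-generation model would produce the replacement snippet;
# here a placeholder description stands in for that step.
from bs4 import BeautifulSoup

def suggest_alt_fixes(html):
    soup = BeautifulSoup(html, "html.parser")
    suggestions = []
    for img in soup.find_all("img"):
        if not img.get("alt"):
            fixed = str(img).replace("<img", '<img alt="TODO: describe image"', 1)
            suggestions.append({"problem": str(img), "suggested_fix": fixed})
    return suggestions

print(suggest_alt_fixes('<p><img src="chart.png"> Quarterly results</p>'))
```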

And the second way to help the developer is content generation. As discussed in the panel yesterday, there have been several attempts at generating text for images or videos with the help of computer vision and NLP techniques. It may not be very practical for image providers to generate alt text, since the state of the art methods require large models deployed on GPU servers, which is not convenient for frequently updated images. Recently we have been working on some knowledge distillation methods, which aim at distilling a lightweight model from a large model. We want to develop a lightweight model that can be deployed in a browser extension, or some lightweight software. We hope to reduce the time cost and computation cost for image providers and encourage them to conform to the accessibility techniques or requirements. Okay. Thank you.
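The knowledge-distillation idea Sheng mentions is usually implemented as a loss that pushes a small student model toward a large teacher's output distribution while still fitting the ground-truth labels. Below is a generic PyTorch sketch of that loss, not his specific captioning models; the temperature and mixing weight are conventional placeholder values.

```python
# Sketch: distillation loss combining soft teacher targets with hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.7):
    # Soft targets: match the teacher's softened distribution (KL divergence)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```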

CARLOS DUARTE: Thank you. That is another very relevant point. Make sure that whatever new techniques we develop are really accessible to those who need to use them. So the computational resources are also a very important aspect to take into account. So, Willian, your take on this, please.

WILLIAN WATANABE: First, I would like to take from what Yeliz said. It is nice to see everyone agreeing; before, we didn’t talk at all, so it is nice to see that everyone is having the same problems. And, about what Yeliz said, she divided the work on automatic evaluation into two steps. The first one is testing, and the second one is automatically repairing accessibility in websites. From my end, specifically, I don’t work with something, I would say, subjective, like image content generation. My work mostly focuses on identifying widgets, which is kind of objective, right? It is a dropdown, it is not a tooltip… I don’t need to worry about being sued over a bad classification, or something else. So, that is a different aspect of accessibility that I work on. Specifically, I work with supervised learning, as everyone does. I classify the elements as a specific interface component. I use features extracted from the DOM structure to, I think everyone mentioned this, Sheng mentioned it, as well, Yeliz mentioned the question about labels and everything else.

I am trying to use data from websites that I evaluate as accessible to enhance the accessibility of websites that don’t have these requirements. For instance, I see a website that implements rules, that implements the ARIA specification. So, I use it. I extract data from it to maybe apply it on a website that doesn’t. This is kind of the work that I am doing right now.

There is another thing. So, Fabio also mentioned the question about confidence. I think this is critical for us. In terms of machine learning, I think the word that we use usually is accuracy. What will guide us, as researchers, whether we work on test or automatically repair is basically the accuracy of our methodologies. If I have a lower accuracy problem, I will use a testing approach. Otherwise, I will try to automatically repair the web page. Of course, the best result we can get is an automatic repair. This is what will scale better for our users, ultimately offer more benefit in terms of scale. I think that is it. Everyone talked about everything I wanted to say, so this is mostly what I would say differently. This is nice.

CARLOS DUARTE: Okay. Let me just, a small provocation. You said that, in your work, everything that you work with widget identification is objective. I will disagree a little bit. I am sure we can find several examples of pages where you don’t know if that is a link or a button, so there can be subjectivity in there, also. So, yes. But just a small provocation, as I was saying.

So, we are fast approaching, the conversation is good. Time flies by. We are fast approaching the end. I would ask you to quickly comment on the final aspect, just one minute or two, so please try to stick to that so that we don’t go over time. You have already been in some ways approaching this, but just what do you expect, what would be one of the main contributions, what are your future perspectives about the use of machine learning techniques for web accessibility evaluation. I will start with you now, Fabio.

FABIO PATERNO: Okay. I can think about a couple of interesting, you know, possibilities opened up by machine learning. When we evaluate a user interface, generally speaking, we have two possibilities. One is to look at the code associated with the generated interface and see whether it is compliant with some rules. And another approach is to look at how people interact with the system. So, look at the logs of user interaction. In the past we did some work where we created a tool to identify various usability patterns, which means patterns of interaction that highlight that there is some usability problem. For example, we looked at mobile devices where there’s a lot of work on (?) machine, that means that probably the information is not well presented, or people access computers in different (?), it means the (?) are too close. So, it is possible to identify a sequence of interactions that highlights that there is some usability problem. So, one possibility is to use some kind of machine learning for classifying interactions with some Assistive Technology that highlight this kind of problem. So, it would allow us to learn from the data (?), yes, there are specific accessibility problems.

And the second one is about, as we mentioned before, the importance of providing an explanation about a problem, why it is a problem, and how to solve it. So, that would be, in theory, an ideal application for a conversational agent. Now there is a lot of discussion on this, about ChatGPT, but it is very difficult to actually design, in this case, a conversational agent that is able to take into account the relevant context, which in this case is the type of user that is actually asking for help. Because there are really many types of users; when people look at accessibility results, that can be a web commissioner, the person who decided to have a service but doesn’t know anything about its implementation, then the end user, the developer, the accessibility expert. Each of them requires a different language, different terms, a different type of explanation, because one may only ask “is this website accessible?”. They really have different criteria in order to understand the level of accessibility and how to operate on it in order to improve it. So, this is one dimension of the complexity.

The other dimension of the complexity is the actual implementation. It is really not, this (?) we are conducting in our laboratory (?). It is really amazing to see how different the implementation languages and technical components are that people use in order to implement a website. Even people that use the same JavaScript frameworks can use them in very different ways. So, when you want to provide an explanation, of course, there is a limit to just providing the standard description of the error and some standard examples of how to solve the problem, because often there are different situations that require some specific consideration for explaining how, or what, can be done. But such a conversational agent for accessibility would be a great result.

CARLOS DUARTE: Thank you, Sheng?

SHENG ZHOU: For the sake of time, I will talk about the future perspective on efficient page sampling. According to our data analysis, we found that web pages with similar connection structures to other pages usually have similar accessibility problems. So, we try to take this into account for the accessibility evaluation. And recently we used graph neural networks, which have been a hot research topic in the machine learning community. They combine both the network topology and the node attributes into a unified representation for each node. Each node (frozen video)
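A single message-passing step of the kind of graph representation Sheng refers to, mixing the link structure among pages with per-page features, might look like this. It is a generic sketch with random placeholder data, not his model.

```python
# Sketch: one graph-convolution-style step. Each page's representation becomes
# a mix of its own features and its linked neighbours' features, so pages with
# similar link structure and attributes end up with similar embeddings.
import numpy as np

A = np.array([[0, 1, 1, 0],      # adjacency: which pages link to which
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = np.random.rand(4, 8)          # per-page features (DOM stats, visual cues, ...)
W = np.random.rand(8, 16)         # learnable projection (random here)

A_hat = A + np.eye(4)                                        # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0)   # normalised propagation + ReLU
print(H.shape)  # (4 pages, 16-dim embeddings) -> could feed a diversity-aware sampler
```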

CARLOS DUARTE: Okay. I guess we lost Sheng again. In the interest of time, we will skip immediately to you, Willian.

WILLIAN WATANABE: Okay. My take on this, I think it will be pretty direct. I think Fabio talked about it, but we are all working with specific guidelines, a set of Accessibility Guidelines from WCAG. And I think the next step that we should address is associated with generalization, and incorporating it into relevant products, incorporating it into any automatic evaluation tool. So, in regard to all the problems that we mentioned, data acquisition, manual classification, we have to find a way to scale our experiments so that we can guarantee it will work on any website.

In regard to my work, specifically, I am trying to work on automatic generation of structure for websites, for instance, generating heading structures and other specific structures that users can use to navigate and that automatically enhance the accessibility of the web page. I think that is it. In regard to what you said, Carlos, just so that I can clarify myself, what I wanted to say is that, different from the panelists from yesterday, and different from Chao, for instance, I think I am working with a simpler machine learning approach. I don’t use deep learning, for instance, since I don’t see the use for it yet in my research, because for my research, I think it was mentioned that it might be used for labeling and other stuff, data generation. I haven’t reached that point yet. I think there are a lot of things we can do just with classification, for instance. That is it.

CARLOS DUARTE: Okay, thank you, Willian. Yeliz, do you want to conclude?

YELIZ YESILADA: Yes. I actually hope, at least, that we will see developments, again, in two things. I think the first one is automated testing. I think we are now at the stage that we have many tools and we know how to implement and automate, for example, certain guidelines, but there are a bunch of others that are very subjective; they require human evaluation. It is very costly and expensive, I think, from an evaluation perspective. So, I am hoping that there will be developments in machine learning and AI algorithms to support and have more automation in those ones that really now require a human to do the evaluation. And the other one is about the repairing. So, I am also hoping that we will see developments in automating the fixing of problems, learning from the good examples, and being able to develop solutions so that while the pages are developed, they are actually automatically fixed. And, sometimes, maybe seamlessly to the developers, so that they are not worried about, you know, certain issues. Of course, explainability is very important, to explain to developers what is going on. But I think automating certain things there would really help. Automating the repair. Of course, to do that I think we need datasets. Hopefully in the community we will have shared datasets that we can all work with and explore different algorithms. As we know, it is costly. So, exploring and doing research with existing data helps a lot.

So, I am hoping that in the community we will see public datasets. And, of course, technical skills are very important, so human-centered AI, I think, is needed here and is also very important. So hopefully we will see more people contributing to that and to the development. And, of course, we should always remember, as Jutta mentioned yesterday, that bias is critical. When we are talking about, for example, automatically testing, automating the testing of certain rules, we should make sure we are not biased against certain user groups, and we are really targeting everybody, different user groups, different needs and users. So, that is all I wanted to say.

CARLOS DUARTE: Thank you so much, Yeliz. And also, that note I think is a great way to finish this panel. So, thank you so much, the four of you. It is really interesting to see all those perspectives and what you are working on and what you are planning on doing in the next years, I guess.

Let me draw your attention. There are several interesting questions on the Q&A. If you do have a chance, try to answer them there. We, unfortunately, didn’t have time to get to those during our panel. But I think there are some that really have your names on them. (Chuckles) So, you are exactly the correct persons to answer those. So, once again, thank you so much for your participation. It was great.

We will now have a shorter break than the ten minutes. And we will be back in 5 minutes. So, 5 minutes past the hour.

Panel: Natural Language Processing for Accessible Communication

The last panel of the symposium focused on NLP for accessible communication. The panelists discussed the challenges hindering breakthroughs in this field. Chaohai Ding highlighted the lack of data availability for AAC systems, as they require large amounts of user data and AAC data. Another challenge is the lack of data interoperability in AAC symbol sets. Cultural differences and personalization are also important considerations. Lourdes Moreno emphasized the need to address bias, particularly disability bias, in language models. She also highlighted the scarcity of datasets related to accessibility. Vikas Ashok discussed the understandability of social media content for blind individuals and the challenges of bias in natural language models. The panelists explored the issues of disability bias, accountability, and personalization in NLP tools. They discussed the importance of considering the target audience’s knowledge and the need for data management. Future perspectives included the exploration of NLP metrics for accessibility, advancements in dialog systems, personalized communication, accessible modal communication, and AI assistant communication. Language simplification, data integration, and evolving apps were also mentioned as opportunities. The panelists addressed the challenge of collecting more data for accessible communication, suggesting approaches such as creating larger datasets, data repositories, and involving human experts in data generation.

Transcript of Natural Language Processing for Accessible Communication

CARLOS DUARTE: Hello, everyone. Welcome back to the second panel. I am now joined by Chaohai Ding from the University of Southampton, Lourdes Moreno of the Universidad Carlos III de Madrid in Spain, and Vikas Ashok from the Old Dominion University in the US. It is great to have you here. As I said before, let’s bring back the topic of natural language processing. We have addressed yesterday, but not from the perspective of how it can be used to enhance Web Accessibility on the web.

So, now, similarly to what I’ve done in the first panel, you have been working on different aspects of this large domain of accessible communication. You have pursued advances in machine translation, in Sign Language, AAC, so from your perspective and your focus on the work, what are the current challenges that you have been facing and that are preventing the next breakthrough, I guess. Also, I would like to ask you to, for your first intervention, also, to do a brief introduction to yourself and what you have been doing. Okay? I can start with you, Chaohai.

CHAOHAI DING: Hi. Thank you for having me today. I am a senior research fellow at the University of Southampton. My research interest is in AI and inclusion, which includes Data Science and AI techniques to enhance accessible learning, travelling and communication. So, yes, AI has been widely used in our research to support accessible communication. Currently we are working on several projects on AAC. For example, we applied a concept map, a symbol knowledge graph, to interlink AAC symbols from different symbol sets. This can be used for symbol-to-symbol translation. And we also adapted an NLP model to translate an AAC symbol sequence into a spoken text sequence.
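The symbol-sequence-to-sentence step Chaohai describes can be framed as sequence-to-sequence generation. Below is a sketch using a generic pretrained text-to-text model from the transformers library; the checkpoint name, the task prefix, and the symbol glosses are placeholders, and the project described would fine-tune its own model on AAC data rather than use this off the shelf.

```python
# Sketch: expand a telegraphic AAC symbol/gloss sequence into a full sentence
# with a generic seq2seq model. The checkpoint below is a placeholder only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # placeholder checkpoint, not the project's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

aac_glosses = ["I", "want", "drink", "water"]     # glosses linked to AAC symbols
prompt = "expand into a sentence: " + " ".join(aac_glosses)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```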

So, those are the two projects we are working on currently. We are also working on an accessible e-learning project that applies machine translation to provide transcripts from English to other languages for our international users. So, that is another scenario where we are working with machine translation for accessible communication. So, there are a few challenges we have identified in our research. The first one is always the data: data availability and data quality. So, as you know, NLP models require a large amount of data, especially for AAC.

Well, one of the biggest challenges is the lack of data, like user data, AAC data, and also data on how the user interacts with the AAC. Also, we have several different AAC symbol sets used by different individuals, which makes it very difficult to develop NLP models as well, because the AAC symbols are separated in each symbol set. And another challenge is the lack of data interoperability in AAC symbol sets. The third challenge we have identified is inclusion. Because we are working on AAC symbol sets for Arabic, English and Chinese, there are cultural and social differences in AAC symbols, so it is important to consider the needs of different user groups and the cultural and social factors, and to involve them in the development of NLP models for AAC.

The fourth one is data privacy and safety. This has been identified in our web application that goes from AAC symbols to spoken text. So, if we want a more accurate or more personalized application, we need the user’s information. So, the challenge is how do we store this personal information, how do we prevent data misuse and breaches, and how do we make the tradeoff between using the user’s information and the model performance.

The last one is always the accessible user interface, and how to make these AI-powered tools and NLP-powered tools accessible for end users. And there are also more generic issues in AI, like accountability and explainability. So I think that is the list of challenges we have identified in our current research. Thank you.

CARLOS DUARTE: Thank you. A great summary of definitely some of the major challenges that are spread across the entire domain. Definitely. Thank you so much. Lourdes, do you want to go next?

LOURDES MORENO: Thank you. Thank you for the invitation. Good afternoon, everyone. I am Lourdes Moreno, I work as an Associate Professor in the Computer Science Department at the Universidad Carlos III de Madrid in Spain. I am an accessibility expert. I have been working in the area of technology for disability for 20 years. I have previously worked on sensory disability, and currently I work on cognitive accessibility. In my research, I combine methods from the Human Computer Interaction and Natural Language Processing areas to obtain accessible solutions from the point of view of the readability and understandability of the language in the user interface.

So, currently, natural language research is being developed around large language models. In recent years there have been many advances due to the increase in resources, such as large datasets and cloud platforms that allow the training of large models. But the most crucial factor is the use of transformer technology and the use of transfer learning. These are methods based on deep learning to create language models based on neural networks. They are universal models that support different natural language processing tasks, such as question answering, translation, summarization, speech recognition, and more. The most extensively used models are GPT from OpenAI and BERT from Google. But new and bigger models continue to appear and outperform previous ones, because their performance continues to scale as more parameters are added to the models and more data are added.

However, despite these great advances, there are issues in the accessibility scope and challenges to address. One of them is bias. Language models have different types of bias, such as gender, race, and disability bias. Gender and race biases are highly analyzed; however, that isn’t the case with disability bias, which has been relatively under-explored. There are studies related to these models; for example, in one work on sentiment analysis of text, the terms related to disability have a negative value. Or, in another work, a model to moderate conversations classifies texts with mentions of disability as more toxic. That is, algorithms are trained to give results that can be offensive and cause disadvantage to individuals with disabilities. So, investigation is necessary to study these models to reduce biases. We cannot just take these language models and directly use their outcome.
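One common way to probe the bias Lourdes describes is to compare a model's scores on template sentences that differ only in a disability-related phrase. The sketch below uses a generic pretrained sentiment pipeline; the model, templates, and phrases are illustrative, not from her studies.

```python
# Sketch: probe a sentiment model for disability bias by swapping one phrase
# in otherwise identical sentences and comparing the predicted scores.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # generic pretrained classifier

templates = [
    "My neighbour, who is {}, made dinner for us.",
    "I met a person who is {} at the conference.",
]
groups = ["a great cook", "blind", "deaf", "a wheelchair user"]

for t in templates:
    for g in groups:
        result = sentiment(t.format(g))[0]
        print(f"{t.format(g):60s} -> {result['label']} ({result['score']:.2f})")
# Systematically more negative scores for the disability terms would indicate bias.
```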

Another problem with these models is that there aren’t many datasets related to the accessibility area. At this time there are few labeled corpora to be used in training simplification algorithms, lexical or syntactic simplification, in natural language processing. I work on cognitive accessibility in Spanish, to simplify text into plain language, easy-to-read language. To carry out this task we have created a corpus with experts in easy reading and with the participation of older people and people with disabilities, intellectual disabilities, because the current corpora have been created with non-experts in disability and non-experts in plain language, and they haven’t taken into account people with disabilities. Also, efforts devoted to solving the scarcity of resources are required in languages with few resources. English is the language with the most developed natural language processing resources, but others, such as Spanish, don’t have as many resources. We need systems trained for the English language and for Spanish as well. Finally, with the proliferation of GPT models and their applications, such as ChatGPT, another problem to address is the regulation and ethical aspects of Artificial Intelligence.

CARLOS DUARTE: Thank you so much, Lourdes. Definitely some very relevant challenges in there. Vikas, I will end this first talk with you.

VIKAS ASHOK: Thank you. I’m Vikas Ashok, from Old Dominion University, Virginia, in the United States. I have been working, researching in the area of accessible computing for ten years now. My specialty focus area is people with visual disabilities, so mostly concentrated on their accessibility, as well as usability needs, when it comes to computer applications.

So, with the topic at hand, which is accessible communication, one of the projects that I am currently looking at is the understandability of Social Media content for people who listen to content, such as, you know, people who are blind. So, listening to Social Media text is not the same as looking at it. So, even though the Social Media text is accessible, it is not necessarily understandable because of the presence of a lot of non-standard language content in Social Media, such as Twitter. People create their own words, they are very inventive there. They hardly follow any grammar. So, text-to-speech systems, such as those used in screen readers, cannot necessarily pronounce these out-of-vocabulary words in the right way. Because most of the words, even though they are in text form, are mostly intended for visual consumption, some type of exaggeration where the letters are duplicated just for additional effect. Sometimes emotions are attached to the text itself, without any emoticons or anything else. And sometimes, to phonetically match it, people use a different spelling of the word just for fun purposes.
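A tiny sketch of the kind of normalization Vikas is pointing at, collapsing exaggerated spellings and expanding out-of-vocabulary forms before the text reaches a screen reader's text-to-speech engine, might look like this. The rules and word list are illustrative, not his system.

```python
# Sketch: normalise non-standard social-media spellings so that a
# text-to-speech engine can pronounce them sensibly.
import re

SLANG = {"gr8": "great", "tmrw": "tomorrow", "omg": "oh my god"}  # illustrative lexicon

def normalize_for_tts(text):
    # Collapse runs of 3+ repeated characters ("sooooo" -> "soo", "!!!" -> "!!")
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Replace known out-of-vocabulary spellings, word by word
    return re.sub(r"[A-Za-z0-9]+",
                  lambda m: SLANG.get(m.group(0).lower(), m.group(0)),
                  text)

print(normalize_for_tts("OMG this is sooooo gr8, see you tmrw!!!"))
# -> "oh my god this is soo great, see you tomorrow!!"
```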

So, as communication increases, tremendously with social media, people are depending on social media to understand or getting news even, you know, some kind of disaster news or if something happens anywhere, some event, they first flock the social media to get it. So, people that listen to content should also be able to easily understand. I am focusing on that area, how to use NLP to make this possible. Even though this is not exactly a question of accessibility in a conventional sense, but it is more like accessibility in terms of being able to understand the already accessible content. So, it is one of the things.

The other thing I am looking at that is related to this panel is the disability bias of natural language models, especially those Large Language Models. So, unfortunately, these models are reflective of the data they are trained on, because most of the data associates words that are used to describe People with Disabilities with negative connotations; they end up being used in negative contexts. Nobody is telling the models to learn it that way, except that the documents of the text corpus that these models are looking at inherently put these words, which are many times not offensive, into the negative category.

So, I am looking at how we can counter this. The example is toxicity detection in discussion forum, online discussion forums are very popular. People go there, sometimes anonymously, post content, interact with each other. You know, some of the posts get flagged as, you know, toxic, or they get filtered out. So, even if they are not toxic, because of the use of certain words to describe disabilities or something. So, we want to avoid that. How can we use an NLP to not do that. These two projects are what are closely related to the panel specifically to this session.

CARLOS DUARTE: Thank you, Vikas. I will follow up with that, with what you mentioned and Lourdes has also previously highlighted, the disability bias. I am wondering if you have any ideas, suggestions on how can NLP tools address such issues, I’m thinking for instance, text summarization tools, but also other NLP tools. How can they help us address the issues of disability bias, also how can they explore other aspects like accountability or personalization, in the case of text summaries. How can I personalize a summary for specific audiences, or for the needs of specific people. I will start with you, Lourdes.

LOURDES MORENO: Text summarization is a natural language task and a great resource because it improves cognitive accessibility, in order to help people with disabilities to process long and tedious texts. Also, in the Web Content Accessibility Guidelines, following success criterion 3.1.5 Reading Level, a readable summary is strongly recommended. But these tasks have challenges, such as disability biases, and summaries that are generated but are not understandable for people with disabilities. Therefore, some aspects must be taken into account. It is necessary to approach these tasks with extractive summaries, where the extracted sentences can be modified with paraphrasing resources to help the understandability and readability of the text. To summarize, different inputs are required: not only knowledge about the sequences of words and sentences, but also about the target audience. Different types of users require different types of personalization of summaries.

Also, I think that it would be advisable to include a readability metric in the summary generation process to ensure that the resulting summary is minimally readable. For instance, if we are in the context of an assistant that provides summaries of public administration information for all people, it is necessary to take into account that the summary must be in plain language. Therefore, in addition to extracting relevant sentences and paraphrasing, it will be necessary to include knowledge about plain language guidelines to make the text easier to read.
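The readability gate Lourdes suggests could be as simple as scoring each candidate summary and rejecting those below a threshold. Below is a sketch using the Fernández-Huerta formula for Spanish; the syllable counter is a crude approximation and the acceptance threshold is an assumption, not a value from her work.

```python
# Sketch: score a Spanish summary with the Fernandez-Huerta readability index
# and only accept it if it is readable enough. Syllable counting is approximate.
import re

def count_syllables_es(word):
    # Rough heuristic: count vowel groups (good enough for an illustration)
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", word.lower())))

def fernandez_huerta(text):
    words = re.findall(r"[\wáéíóúüñ]+", text.lower())
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables_es(w) for w in words)
    P = 100 * syllables / len(words)      # syllables per 100 words
    F = 100 * sentences / len(words)      # sentences per 100 words
    return 206.84 - 0.60 * P - 1.02 * F   # higher = easier to read

summary = "El plazo termina el viernes. Debe entregar el formulario antes de esa fecha."
score = fernandez_huerta(summary)
print(score, "-> accept" if score >= 80 else "-> rewrite in plainer language")
```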

Finally, corpora used to train natural language processing assistants should be tested with users in order to obtain a useful solution. Only then will it be possible to obtain understandable summaries for all of society, including the elderly. Then, with respect to accountability, as with every Artificial Intelligence algorithm, it must be explainable. So, it is necessary to answer questions such as how the processing is actually performed, the limitations of the datasets used to train and test the algorithms, and the outcomes of the model. Therefore, good data management and machine learning model training practices should be promoted to ensure quality results. Nothing else.

CARLOS DUARTE: Thank you, Lourdes. Vikas, do you want to, even though from what I understand you don’t work directly with text summarization, but how does this aspect of disability bias accountability, and personalization impact what you are doing?

VIKAS ASHOK: I use a lot of text summarization, so I can add to it. To add to what Lourdes said, simplification is as important as summarization, because sometimes it is not just summarizing, or shortening the content to be consumed, but also making it understandable, like I said. It means that certain complex sentence structures and some more tricky words need to be replaced with equivalent, easier to understand, more frequently used words. There is some work that has been done in text simplification; it can be treated as a kind of translation within the same language, so the input is text in the same language as the output text, except that the output text is more readable, more understandable. So, that is extremely important.

The other thing about summarization is that most systems tend to rely on extractive summarization, where they just pick certain sentences from the original piece of text. That way they don't have to worry about grammatical correctness and proper sentence structure, because they rely on the humans who wrote the text in order to generate the summaries. I can also speak to how summarization needs to be personalized in a certain way for certain groups, especially for people with visual disabilities. What I have noticed in some of my studies is that, even though they can hear the text, they don't necessarily understand it, because the writing is visual, in other words it asks you to be visually imaginative. So, what is the non-visual alternative for that kind of text? How do you summarize text that includes a lot of visual elements? How do you convert it into equivalent non-visual explanations? This necessarily goes beyond extractive summarization. You cannot just pick and choose; you need to replace the wording in the sentences with wording that people can understand. Also, some of the text these days, especially news articles, doesn't come purely as text. It is multi-modal, in the sense that there are pictures, GIFs and so on, and the text refers to these pictures. So, this is another problem, because then it becomes highly visual. You have to take some of the visual elements of the picture, probably through computer vision techniques or something, and then inject that into the text in order to make it more self-sufficient and understandable for people who cannot see the images. So, that is my take on it.
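One hedged sketch of that last idea, injecting machine-generated image descriptions into an article so that downstream summarization works over self-sufficient text, is shown below. It assumes the Hugging Face transformers library; the captioning model name and the article structure are illustrative placeholders, and an author should still review the generated descriptions.

```python
from transformers import pipeline

# Publicly available captioning model used purely as an example.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def textualize_article(paragraphs, image_paths):
    """Append a textual description of each image so the article stands on its own."""
    pieces = list(paragraphs)
    for path in image_paths:
        caption = captioner(path)[0]["generated_text"]
        pieces.append(f"[Image description: {caption}]")
    return "\n".join(pieces)
```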

CARLOS DUARTE: Yes. That is a very good point about the multimedia information and how we summarize everything into text. That is a great point. Chaohai, your take on this?

CHAOHAI DING: Yes. We don't have much experience with text summarization. Most of our research is on AAC and interlinking AAC symbol sets. But we did have a project that involved some text summarization. We constructed a knowledge graph for an e-learning platform and then we needed to generate summaries from lecture notes to make them easier and more accessible for students with disabilities. Based on that project, what we learned is that text summarization is a very difficult task in NLP, because it is highly dependent on the text, the context and domain, the target audience, and even the goal of the summary. For example, in our scenario, we wanted a summary of each lecture's notes, but we had very long transcripts for each lecture. We used a few text summarization models to generate the summaries, but the outcomes were not good. As Vikas just said, some text summarization just picks some of the text and replaces some of the words, and that is it, and some of it doesn't make sense. So, that is one problem we identified in text summarization.

We also looked at personalization, because the project is related to adaptive learning for individual students, so we need personalization for each student. Text summarization could be customized and adapted to a user's needs. It can be improved with the user's personal preferences or feedback, and by allowing the user to set their own summary goal. Simplification is also very important, because some students may have cognitive disabilities, or other types of disabilities, and they need the summary simplified into plain language. Yes, I think that is mainly what we have on text summarization.

CARLOS DUARTE: Thank you so much. We started with the challenges, and now I would like to move on to the future perspectives. What are the breakthroughs that you see happening, promoted by the use of NLP for accessible communication? I will start with you, Vikas.

VIKAS ASHOK: So, my perspective is that there are plenty of NLP tools out there already that haven't been exploited to the fullest extent to address accessibility and usability issues. The growth in NLP techniques and methods has been extremely steep in recent years, and the rest of us in different fields are trying to catch up. There is still a lot to be explored as to how they can be used to address real world accessibility problems, and we are in the process of doing that, I would say. Text summarization is one thing we discussed already that can be explored in a lot of scenarios to improve the efficiency of computer interaction for People with Disabilities. But the main problem, as we discussed not only in this panel but also in other panels, is the data.

So, for some languages there is a large enough corpus and the translation is good, because translation quality depends on how much data you have trained on. But for some pairs of languages it may not be that easy, or even if the system produces something, it may not be that accurate, so that could be a problem. Then the biggest area, which I see as potentially very useful for solving many accessibility problems, is the improvement in dialogue systems. Natural language dialogue is a very intuitive interface for many users, including many People with Disabilities.

That includes those with physical impairments which prevent them from conveniently using the keyboard or the mouse, and those who are blind and have to use screen readers, which is known to be time consuming. So, dialogue systems are under-explored. People are still exploring them. You can see commercialization going on, like with smartphones, but still only for high-level interactions, like setting alarms, turning on lights and answering some types of questions. But what about using dialogue to interact with applications, in the context of an application? So, if I say, add a user comment to this particular piece of text, say in Word or Docs, can a spoken dialogue assistant understand that and automate it? That kind of automation, I feel, would address many of the issues that people face interacting with digital content. So, that is one of the things I would say we can use NLP for.

The other thing is the increased availability of Large Language Models, pre-trained models like the one Lourdes mentioned, GPT, which is essentially a transformer decoder, a generator-based model. Then there is also BERT, which is encoder-based. These help us in that we don't need large amounts of data to solve problems, because they are already pre-trained on a large amount of data. What we need are small datasets that are fine-tuned toward the problem we are addressing. So, for accessibility datasets, I think there needs to be a little more investment. They don't have to be that big, because the Large Language Models already take care of most of the language complexity; it is more like fine-tuning to the problem at hand. That is where I think some effort should go, and once we do that, obviously, we can fine-tune and solve the problems. There has also been tremendous advancement in transfer learning techniques, which we can explore as well, in order not to start from scratch but instead borrow from things that are already there for similar problems. There is a lot to be explored, but we haven't done that yet. So, there is plenty of opportunity for research using NLP expertise for problems in accessible communication, especially.
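As a hedged sketch of that workflow, fine-tuning a pre-trained model on a small, task-specific accessibility dataset rather than training from scratch, the snippet below fine-tunes BERT as a binary classifier (for example, "needs simplification" versus "already plain"). The label scheme, the tiny in-memory dataset, and the hyperparameters are illustrative placeholders, not anything the panelists described.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# A small, task-specific labelled dataset (a few hundred sentences in practice).
texts = ["The applicant must furnish corroborating documentation forthwith.",
         "Please send us your documents."]
labels = [1, 0]  # 1 = needs simplification, 0 = already plain (illustrative scheme)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

dataset = Dataset.from_dict({"text": texts, "label": labels})
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3, per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```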

CARLOS DUARTE: Yes. Definitely some exciting avenues there. So, Chaohai, can we have your take on this? Your breakthroughs?

CHAOHAI DING: Yes, I totally agree with Vikas' opinions. For my research, because I mainly work with AAC, I will take AAC as the example. The first future perspective for NLP and AAC, I think, would be personalized, adaptive communication for each individual, because each individual has their own way of communicating with others. NLP techniques can be used to make this communication more accessible, more personalized, and adapted based on personal preferences and feedback. This could be used for personalized AAC symbols. Currently, AAC users just use standard AAC symbol sets for their daily communication. So, how can we use NLP and generative AI models to create more customized, personalized AAC symbols, which could adapt to an individual's unique cultural and social needs? I think that is one potential contribution for AAC users.

The second one is accessible multi-modal communication. NLP techniques have the potential to enhance accessible communication by improving interoperability between training data and between verbal language, Sign Language and AAC. Data interoperability can provide higher-quality training data for these languages with limited datasets. Additionally, it can provide the ability to translate between different communication modes and make them more accessible and inclusive. In AAC, we can have multiple AAC symbol sets that can be linked, mapped and interlinked by NLP models, and this can contribute to translation between AAC and AAC, AAC and text, AAC and Sign Language, and vice versa. That is the second perspective I think about.

And the third one is AI-assisted communication, like the ChatGPT that Vikas just talked about. These large language models have been trained by big companies and have been spreading widely on social media. So, how can we incorporate these trained Large Language Models into other applications, and use them for more accessible communication to help People with Disabilities? That is another future we are looking toward.

The last one I am going to talk about concerns AAC specifically. AAC is quite expensive, so affordability is very important, and it can be improved with NLP and AI. As I mentioned, we are currently looking into how to turn images into symbols, and how to generate AAC symbols automatically by using generative image AI models, like Stable Diffusion. So, that is another future direction we are looking toward: how to reduce the cost of accessible communication. Thank you.

CARLOS DUARTE: Thank you, Chaohai. Definitely a relevant point, reducing the cost of getting data and all of that. That is important everywhere. So, Lourdes, what are you looking for in the near future? And you are muted.

LOURDES MORENO: Sorry. As we mentioned before, there are two trends: the appearance of newer and better language models than the previous ones, and working on these new models to reduce disability biases. I will also list some specific natural language processing tasks and applications that I will work on in the coming years. One is accessibility in specific domains, such as health. Health language is in high demand and much needed, but patients have problems understanding information about their health condition, diagnosis and treatment, and natural language processing methods could improve their understanding of health-related documents. Similar problems appear in legal and financial documents, and in the language of administration and government. Current natural language processing technology that simplifies and summarizes these texts could help on that roadmap.

Another line is speech-to-text. Speech-to-text will be a relevant area of research in the field of virtual meetings in order to facilitate accessible communication by generating summaries of meetings, as well as minutes in plain language.

Another topic is the integration of natural language processing methods into the design and development of multimodal user interfaces. It is necessary to approach accessible communication from a multidisciplinary perspective across different areas, such as human-computer interaction, software engineering and natural language processing.

Finally, another issue is advancing the application of natural language processing methods in smart assistants to support People with Disabilities and older people, assist them in their daily tasks, and promote active living.

CARLOS DUARTE: Thank you so much, Lourdes, and every one of you for those perspectives. I guess we still have five minutes more in this session, so I will risk another question. I will ask you to try to be brief on this one. The need for data was common across all your interventions, and if we go back to the previous panel, it was also brought up by all the panelists. So, yes, definitely, we need data. What are your thoughts on how we can make it easier to collect more data for the specific domain of accessible communication? Because we communicate a lot, right? Technology has opened up several channels through which we can communicate even when we are not co-located. Every one of us is in a different part of the planet and communicating right now. Technology has improved that possibility a lot. However, we always hear that we need more data, that we can't get data. So, how do you think we can get more data? And, of course, we need the data to train these models, but can't we also rely on these models to generate data? Let me just drop this on you now. Do any of you want to go first?

CHAOHAI DING: I can go first. Yes. We started working on open data years ago, in AI and Data Science. When I started my PhD we worked on open data, and there was an open data initiative in the UK; we wanted to open up government data and public transport data. That is how long I have been working on public transportation and accessibility needs, and there has been a lack of data the whole time. Years after the beginning of my PhD, we still lack accessibility information data. So, in the accessibility area, how can we get the data to train our models? What I used to do with public transport data was map the available data into a larger dataset. That incurred a lot of manual work, like cleaning, data integration, and all those methods to make the data available. That is the first approach.

Secondly, we could think about how to contribute to a data repository, something like an ImageNet or a WordNet, where we collaboratively work together to identify data related to accessibility research. I think that is a way, as a community, we can create a shared repository or some kind of data initiative to support accessibility research.

Then the third approach is that we can definitely generate data from small amounts of data. We can use generative AI models to generate more, but the question is: is that data reliable? Is the generated data good enough, or is it biased? That is my conclusion. Thank you.

CARLOS DUARTE: Yes. Thank you. I think the big question mark is whether synthetic data is reliable or not. Vikas or Lourdes, do you want to add something?

VIKAS ASHOK: Yes. I have used synthetic data before, based on a little bit of real data, and in some cases you can generate synthetic data. One of the things I had to do was extract user comments in documents. Most word processing applications allow you to post comments to the right, for your collaborators to look at and then address. To extract those automatically, I had to generate synthetic data, because obviously there were only a few documents with collaborative comments. The appearance there is predictable: comments will appear somewhere on the right side, in the right corner, with some text in them, a few sentences, so there are some clear characteristics. In those cases we were able to generate synthetic data and train the machine learning model, and it was pretty accurate on the real data. So, in some cases you can exploit the way the data will appear and then generate synthetic data. But in many cases it may not be possible, like the project I mentioned on social media, where the text contains a lot of non-standard words. Simply replacing the non-standard words with synonyms may not do the job, because then you take the fun aspect away from social media, right? It should be as fun and entertaining when you listen to social media text as it is when you look at it.
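Here is a hedged sketch of that kind of templated synthetic data generation: because we roughly know how a margin comment appears, we can compose labelled examples from filler text and invented comment boxes. The filler sentences, box coordinates and label scheme are all made up for illustration, not drawn from Vikas' actual project.

```python
import random

FILLER = ["The results were inconclusive.", "We revised the introduction.",
          "Please see the appendix for details."]
COMMENTS = ["Can you cite a source here?", "This paragraph is unclear.",
            "Consider shortening this section."]

def synth_example():
    # Body text composed from filler sentences; a comment box anchored near the right margin.
    body = " ".join(random.choices(FILLER, k=random.randint(3, 6)))
    comment_box = {"x": random.uniform(0.75, 0.9), "y": random.uniform(0.1, 0.9),
                   "width": 0.2, "text": random.choice(COMMENTS)}
    return {"body_text": body, "regions": [comment_box], "label": "comment"}

dataset = [synth_example() for _ in range(1000)]
```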

So, you have to do some kind of clever replacement, and for that you need a human expert going in and doing it. Crowdsourcing, I think, is one way to get data quickly, and it is pretty reliable. I have seen in the NLP community, in papers that appear at ACL, that they rely heavily on Amazon Mechanical Turk and other online incentivized data collection mechanisms. So, that I think is one option.

The other thing I do, you know, in my classes, especially, I get the students to help each other out to collect the data. It doesn’t have to be that intensive. Every day, if they just, even one student collects, like, ten data points, over the semester there can be enough data for a lot of things. So, you know, in each of their projects and in the end of the course, pretty much they will have a lot of data for research. So, everybody can contribute in a way. Students, especially, are much more reliable because they are familiar with the mechanisms, how to label, collect data and all that stuff. They can understand how things work, as well. So, it is like a win win.

CARLOS DUARTE: Yes. Thank you for that contribution. Good suggestion. And Lourdes, we are really running out of time, but if you still want to intervene, I can give you a couple of minutes.

LOURDES MORENO: Okay. I think we do need more data, but my view is also somewhat negative, because obtaining these datasets is expensive. In accessible communication, I work on simplification, and this data must be prepared by accessibility experts. It is important that the data is validated by people with disabilities and uses plain language resources. So, it is a real problem to obtain data of sufficient quality.

CARLOS DUARTE: Okay. Thank you so much, Lourdes. And thanks, a very big thank you to the three of you, Chaohai, Vikas and Lourdes. It was a really interesting panel. Thank you so much for your availability.

Closing Keynote: Shari Trewin

Dr Shari Trewin is an Engineering Manager at Google, leading a team that develops new assistive technologies and features. Her background is in research, with 21 patents and 70 peer-reviewed articles on topics including AI fairness, accessibility tools for designers and developers, web accessibility, access to virtual worlds, and self-adaptive input devices. Shari is a Distinguished Scientist of the Association for Computing Machinery (ACM), where she has chaired the ACM Special Interest Group on Accessible Computing (SIGACCESS), sat on ACM's Diversity and Inclusion Council, and helped develop ACM's accessibility guidance for authors and conference organizers. As Program Director of the IBM Accessibility team, she worked to elevate IBM's product accessibility through the open source Equal Access toolkit.

Where next for assistive AI?

The closing keynote of the symposium by Shari Trewin focused on the digital capabilities of AI and its potential for assistive AI. Shari discussed the transformative power of AI in improving digital accessibility for people with disabilities and provided examples of how AI can contribute to this goal. She also highlighted the limitations and challenges of AI, such as biases in training data and inaccuracies in predictions. Shari emphasized the importance of research and innovation in moving digital accessibility forward with AI. She explored the concept of AI at source, where AI tools can assist content creators in generating accurate descriptions and making written content more accessible. Shari also discussed the use of AI in text-to-speech applications and the benefits of applying AI at authoring time. She highlighted the potential of AI in generating accessible code but stressed the need for training AI models on accessible code to avoid propagating past accessibility issues. Shari concluded by emphasizing the need for AI integration with authoring tools and processes to improve accessibility standards. The keynote ended with a discussion on the role of big companies in AI research, the challenges of personalization, and the importance of democratizing access to AI tools.

Transcript of Closing Keynote

CARLOS DUARTE: Okay. Since we should have already started the closing keynote, I am going to move on to introducing Shari Trewin. She is an Engineering Manager at Google, leading a team that develops assistive technologies. So, I am really looking forward to your vision of what is next, what the future holds for us in assistive AI. As we had yesterday, at the end of the keynote Jutta will join us and we will have an even more interesting conversation between Shari and Jutta, making it really appetizing for the keynote. So, Shari, the floor is yours.

SHARI TREWIN: Okay, thank you very much. Can you hear me okay?

CARLOS DUARTE: Yes.

SHARI TREWIN: Okay. What a pleasure it is to participate in this symposium and hear from our opening keynote speaker, Jutta, and all our panelists over the last two days. Thank you so much for inviting me. It's my privilege to finish things up now. Yesterday Jutta grounded us all in the need to do no harm and talked about some of the ways we can think about detecting and avoiding harm. Today I will focus on digital accessibility applications of AI in general and ask what is next for assistive AI.

So, my name is Shari Trewin. I am an Engineering Manager in the Google Accessibility team. I'm also the past chair of the ACM's SIGACCESS, the Special Interest Group on Accessible Computing. My background is in computer science and AI, and I have been thinking about the ways that AI plays into accessibility for many years. Much of my work and thinking on AI and AI fairness was done when I worked at IBM as a Program Director for IBM Accessibility, so a shout out to any IBM friends in the audience. At Google, my team focuses on developing new assistive capabilities and, as we have been discussing for the last few days, AI has an important role to play.

There has been a lot of buzz in the news lately, both exciting and alarming, about generative AI, especially these Large Language Models. For example, the ChatGPT model from OpenAI has been in the news quite a bit. In case you haven’t played with it yet, here is an example. I asked ChatGPT how will AI enhance Digital Accessibility. Let’s try to get it to write my talk for me. It responded with a positive viewpoint. It said AI has the potential to significantly improve Digital Accessibility for People with Disabilities. Here are a few ways that AI can contribute to this goal. It went on to list four examples of transformative AI. All of these have been major topics at this symposium. For each one it gave a one or two sentence explanation of what it was, and who it is helpful for.

Finally, it concluded that AI has the potential to make digital content and devices more accessible to People with Disabilities, allowing them to fully participate in the digital world. It seems pretty convincing and well written. Perhaps I should just end here and let the AI have the last word. It is kind of mind blowing, although it was pretty terrible at jokes. But because it is not explicitly connected to any source of truth, it does sometimes get things flat out wrong, with the risk of bias in the training data being reflected in the predictions.

This limits the ways we can apply this technology today, but it also gives us a glimpse into the future. I am not going to take medical advice from a generative AI model yet, but as we get better at connecting this level of language fluency with knowledge, improving the accuracy, detecting and removing bias, this opens up so many new possibilities for interaction models, and ways to find and consume information in the future. So, I will come back to that later.

For today's talk, I am going to slice the topic a little bit differently. I want to focus on some of the general research directions that I see as being important for moving digital accessibility forward with AI. In our opening keynote, Jutta laid out some of the risks that can be associated with AI when it is not created and applied with equity and safety in mind. It is important to keep these considerations in mind as we move forward with AI. Where the benefits of AI do outweigh the risks in enabling digital access, we still have a way to go in making these benefits available to everyone, in fact, to make them accessible. So, I will start by talking about some current efforts in that direction, making assistive AI itself more inclusive. The second topic I want to cover is where we choose to apply AI, focusing on what I call AI at source. And finally, work in web accessibility emphasizes the need to shift left, that is, to bake accessibility in as early as possible in the development of a digital experience. So, I will discuss some of the places where AI can help with that shift left, and highlight both opportunities and important emerging challenges for web accessibility.

So, we know that AI has already changed the landscape of assistive technology. So, one research direction is how do we make these AI models more inclusive? And I want to start with a little story about captions. In 2020, I was accessibility chair for a very large virtual conference. We provided a human captioner, who was live transcribing the sessions in a separated live feed. I am showing an image of a slide from a presentation here with a transcription window to the right. I spoke with a Hard of Hearing attendee during the conference who used captions to supplement what he could hear. He told me, well, the live feed had quite a delay, so he was also using automated captions that were being streamed through the conference provider, let’s add them to this view, highlighted in green. This had a little less delay but had accuracy problems, especially for foreign speakers or people with atypical speech. And especially for people’s names or technical terms. You know, the important parts. So, he also turned on the automated captions in his browser which used a different speech detect engine. I added those on the screen, too. And he supplemented that with an app on his phone, using a third different speech recognition engine capturing the audio as it was played from his computer and transcribing it. So that is four sources of captions to read. None of them was perfect, but he combined them to triangulate interpretations where the transcriptions seemed to be wrong. So, we could say AI powered captions were helping him to access the conference, no doubt about it, but it wasn’t a very usable experience. He was empowered but he also had a huge burden in managing his own accessibility, and there were still gaps.

As Michael Cooper pointed out yesterday, imperfect captions and descriptions can provide agency, but can also mislead users and waste their time. I also want to point out that this particular user was in a really privileged position: he knows about all these services, he has devices powerful enough to stream all these channels, he has good internet access, he has a smartphone, and he has the cognitive ability to make sense of this incredible information overload. This really isn't equitable access, right? And the captions themselves were not providing an accurate representation of the conference speakers, so those with atypical speech were at a disadvantage in having their message communicated clearly. So there is an important gap to be filled. One of the current limitations of automated captions is poor transcription of people with atypical speech, especially when they are using technical or specialized language. For example, Dimitri Kanevsky is a Google researcher and inventor, an expert in optimization and algebraic geometry, among many other topics. He is Russian and deaf, both of which affect his English speech. I will play a short video clip of Dimitri.

(Pre Captioned Video)

So, Dimitri said, Google has very good general speech recognition, but if you do not sound like most people, it will not understand you. On the screen a speech engine translated that last part of his sentence as “but if you look at most of people, it will look and defended you”. So, People with Disabilities that impact speech such as Cerebral Palsy, stroke, Down Syndrome, Parkinson’s, ALS, are also impacted by lack of access to speech recognition, whether it is for controlling a digital assistant, communicating with others or creating accessible digital content. I want to go to the next slide.

So, Google's Project Euphonia set out to explore whether personalized speech recognition models can provide accurate speech recognition for people with atypical speech, like Dimitri. And this is a great example of the way research can move the state of the art forward. The first challenge, as many people have mentioned today, is the lack of suitable speech data. Project Euphonia collected over a million utterances from people with speech impairments, and the researchers built individual models for 432 people and compared them to state of the art general models. They found the personalized models could significantly reduce the word error rates: error rates went from something like 31% with the general models down to 4.6%. So, it is not just a significant improvement, it is enough of an improvement to make the technology practical and useful. In fact, they found these personalized models could sometimes perform better than human transcribers for people with more severely disordered speech. Here is an example of Dimitri using his personal speech recognition model.

(Captions on Smartphone demonstration in video)

So, the transcription this time is "make all voice interactive devices be able to understand any person speaking to them". It is not perfect, but it is much more useful. Project Euphonia started in English but it is now expanding to include Hindi, French, Spanish and Japanese. So, that project demonstrated how much better speech recognition technology could be, but the original data wasn't shareable outside of Google, and that limited the benefits of all that data gathering effort.

So, the Speech Accessibility Project at the University of Illinois is an example of what we might do about that problem. It is an initiative to create a dataset for broader research purposes. It was launched in 2022, and it is a coalition of technologists, academic researchers and community organizations. The goal is to collect a diverse speech dataset for training speech recognition models to do better at recognizing atypical speech. It builds on some of the lessons learned in Project Euphonia, paying attention to ethical data collection: individuals are paid for participating, and their samples are de-identified to protect privacy. The dataset is private; it is managed by UIUC and made available for research purposes, and this effort is backed by very broad cross-industry support from Amazon, Apple, Google, Meta, and Microsoft. It is going to enable both academic researchers and partners to make progress. Although the current work focuses on speech data, this is in general a model that could be used for other data that's needed to make models more inclusive. We could think of touch data, and there are already significant efforts going on to gather Sign Language video data for Sign Language translation.

And Project Relate is an example of the kind of app that can be developed with this kind of data. It is an Android app that provides individuals with the ability to build their own personalized speech models and use them for text to speech, for communication and for communicating with home assistants.

Personalized speech models look really promising, and potentially a similar approach could be taken to build personalized models for other things, like gesture recognition, touchscreen interactions, or interpreting inaccurate typing. I think there is a world of opportunity there that we haven't really begun to explore. So, now that we know we can build effective personal models from just a few hundred utterances, can we learn from this how to build more inclusive general models? That would be a very important goal.

Can we improve the performance even further by drawing on a person’s frequently used vocabulary? Can we prime models with vocabulary from the current context? And as Shivam Singh mentioned yesterday, we’re beginning to be able to combine text, image, and audio sources to provide a richer context for AI to use. So, there’s very fast progress happening in all of these areas. Just another example, the best student paper at the ASSETS 2022 conference was using vocabularies that were generated automatically from photographs to prime the word prediction component of a communication system for more efficient conversation around those photographs.
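To make the vocabulary-priming idea concrete, here is a hedged sketch in which words drawn from the current context, such as an automatically generated photo caption, get their scores boosted inside an otherwise generic word predictor. The stand-in base predictor, the example probabilities and the boost factor are all invented for illustration; this is not the ASSETS 2022 system.

```python
def base_predictions(prefix):
    # Stand-in for a real language model: returns word -> probability for the next word.
    return {"the": 0.20, "dog": 0.05, "meeting": 0.05, "beach": 0.01, "report": 0.04}

def primed_predictions(prefix, context_vocab, boost=5.0):
    scores = dict(base_predictions(prefix))
    for word in context_vocab:
        if word in scores:
            scores[word] *= boost          # favour words present in the photo caption
    total = sum(scores.values())
    return sorted(((w, s / total) for w, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)

photo_caption_vocab = {"dog", "beach"}     # e.g. produced by an image-captioning model
print(primed_predictions("we walked the", photo_caption_vocab)[:3])
```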

Finally, bring your own model. I really agree with Shaomei Wu when she said yesterday that use cases around media creation are under-investigated. We can apply personalized models in content creation. Think about plugging in your personal speech model to contribute captions for your live streamed audio for this meeting. The potential is huge, and web standards might need to evolve to support some of these kinds of use cases.

When we talk about assistive AI, we’re often talking about other technologies that are being applied at the point of consumption, helping an individual to overcome accessibility barriers in digital content or in the world. I want to focus this section on AI at source and why that is so important. Powerful AI tools in the hands of users don’t mean that authors can forget about accessibility. We have been talking about many examples of this through this symposium, but here are a few that appeal to me.

So, I am showing a figure from a paper. The figure is captioned "user response time by authentication condition", and the figure itself is a boxplot that shows response times from an experiment for six different experimental conditions. So, it is a pretty complex figure. And if I am going to publish this in my paper and make the paper accessible, I need to provide a description of this image, because there is so much information in there. When faced with this task, about 50% of academic authors resort to simply repeating the caption of the figure, and this is really no help at all to a blind scholar. They can already read the caption; that is in text. Usually the caption says what information you will find in the figure, but it does not give you the actual information that is in the figure.

Now, as we discussed in yesterday’s panel, the blind scholar reading my paper could use AI to get a description of the figure, but the AI doesn’t really have the context to generate a good description. Only the author knows what is important to convey. At the same time, most authors aren’t familiar with the guidelines for describing images like this. And writing a description can seem like a chore. That is why I really love the idea that Amy Pavel shared yesterday for ways that AI tools could help content creators with their own description task, perhaps by generating an overall structure or initial attempt that a person can edit.

There are existing guidelines for describing different kinds of charts. Why not teach AI how to identify different kinds of charts and sort of generate a beginning description. And Shivam Singh was talking yesterday as well about recent progress in this area. Ideally the AI could refine its text in an interactive dialogue with the author, and a resulting description would be provided in the paper and anyone could access it, whether or not they had their own AI. So, that is what I mean by applying AI at source. Where there is a person with the context to make sure the description is appropriate, and that can provide a better description. Of course, it can only provide one description. There is also an important role for image understanding that can support personalized exploration of images. So that a reader could clearly read information that wasn’t available in a short description, like what were the maximum and minimum response times for the gesture condition in this experiment. I am not saying that AI at source is the only solution, but it is important, and perhaps, an undeveloped piece.

Here is a second example. I love examples! As we were just talking about in the earlier panel, text transformations can make written content more accessible. So, for example, using literal language is preferable for cognitive accessibility. So, an idiom like “she was in for a penny, in for a pound,” can be hard to spot if you are not familiar with that particular idiom and can be very confusing if you try to interpret it literally. Content authors may use this kind of language without realizing. Language models could transform text to improve accessibility in many ways, and one is by replacing idioms with more literal phrasing. So, I asked the language model to rephrase this sentence without the idiom and it came up with a sensible, although complex literal replacement. “she decided to fully commit to the situation, no matter the cost.” Again, this can be applied as a user tool, and as a tool for authors to help them identify where their writing could be misinterpreted. So, one puts the onus on the consumer to bring their own solution, apply it and be alert for potential mistakes. The other fixes the potential access problems at source, where the author can verify accuracy.
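A minimal, hedged sketch of the author-facing side of that idea is shown below, using a tiny hand-made idiom table rather than a language model: detect a likely idiom, suggest a literal paraphrase, and let the author confirm. The idiom list and paraphrases are invented for illustration; an LLM-based version, as in Shari's example, would handle idioms that are not pre-listed.

```python
IDIOMS = {
    "in for a penny, in for a pound": "decided to fully commit, whatever the cost",
    "hit the ground running": "started quickly and effectively",
    "piece of cake": "very easy",
}

def flag_idioms(text):
    # Return (idiom, suggested literal paraphrase) pairs found in the text.
    lowered = text.lower()
    return [(idiom, literal) for idiom, literal in IDIOMS.items() if idiom in lowered]

for idiom, literal in flag_idioms("She was in for a penny, in for a pound."):
    print(f"Possible idiom: '{idiom}' -> consider: '{literal}'")
```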

As I mentioned earlier, because today's Large Language Models are not connected to ground truth, and they do have a tendency to hallucinate, applying them at source is one way to reach the benefit much more quickly without risking harm to vulnerable users. Once we connect language models to facts, or connect speech to the domain of discourse, we will really see a huge leap in performance, reliability and trustworthiness. So, in the previous two examples, AI could be applied at source. What about when the AI has to be on the consumer side, like when using text to speech to read out text on the web?

On the screen here is the start of the Google information side bar about Edinburgh, the capital city of Scotland. There is a heading, a subheading and a main paragraph. Text to speech is making huge advances, with more and more natural sounding voices becoming available and the capability of more expressive speech, which itself makes comprehension easier. Expressiveness can include things like adjusting the volume and delivery: when reading a heading, maybe I would naturally read it a little louder and pause afterwards. For a TTS service to do the best job reading out text on the web, it helps to have the semantics explicitly expressed, for example the use of heading markup on "Edinburgh" in this passage. It is also important that domain specific terms, people's names and place names are pronounced correctly. Many people not from the UK, on first sight, would mispronounce "Edinburgh". Web standards, if they are applied properly, can mark up semantics like headings and the pronunciation of specialized or unusual words, helping the downstream AI to perform better. AI can also be used to identify the intended structure and compare it against the markup, or to identify unusual words or acronyms where pronunciation information could be helpful. Then the passage can be read appropriately by your preferred text to speech voice, at your preferred speed and pitch.
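Here is a hedged sketch of how explicit semantics could drive such a TTS rendering: headings become emphasised speech followed by a pause, and unusual words get a pronunciation substitution, expressed as SSML generated from Python. The respelling of "Edinburgh" and the block structure are illustrative, and exact SSML support varies between TTS engines.

```python
from xml.sax.saxutils import escape

PRONUNCIATIONS = {"Edinburgh": "Edin-bruh"}   # illustrative respelling, not IPA

def to_ssml(blocks):
    # blocks: list of (role, text) pairs, e.g. ("heading", "Edinburgh")
    parts = ["<speak>"]
    for role, text in blocks:
        text = escape(text)
        for word, alias in PRONUNCIATIONS.items():
            text = text.replace(escape(word),
                                f'<sub alias="{escape(alias)}">{escape(word)}</sub>')
        if role == "heading":
            # Emphasise headings and pause after them.
            parts.append(f'<emphasis level="strong">{text}</emphasis><break time="600ms"/>')
        else:
            parts.append(f"<p>{text}</p>")
    parts.append("</speak>")
    return "".join(parts)

print(to_ssml([
    ("heading", "Edinburgh"),
    ("paragraph", "Edinburgh is the capital city of Scotland."),
]))
```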

That markup can also be used by a speech to text model to marry the vocabulary on the page with what you are saying as you interact with the page using voice controls. So, I am showing this example to illustrate that web accessibility standards work together with assistive AI techniques to enable the best outcome, and many uses of assistive technology can benefit from this information. Thinking about applying AI at source, there is an important role here for AI in making sure that the visual and structural DOM representations are aligned. So, I want to reiterate the powerful benefits of applying AI at authoring time that these examples illustrate.

So, first off, we are removing the burden from People with Disabilities to supply their own tools to bridge gaps. Secondly, it benefits more people, including those people who don’t have access to the AI tools. People with low end devices, poor internet connectivity, less technology literacy. Thirdly, a content creator can verify the accuracy and safety of suggestions, mitigating harms from bias or errors, because they have the context. And AI can also potentially mitigate harms in other ways. For example, flagging videos, images or animations that might trigger adverse health consequences for some people, like flashing lights.

So, AI inside is likely to reach more people than AI provided by end users. I think this is how we get the most benefit for the least harm. It is also a huge opportunity to make accessibility easier to achieve. AI can make it much quicker and easier to generate accessibility information, like captions or image descriptions, as we discussed, and lowering the barrier to entry with assistive tools is one way to encourage good accessibility practice. AI can proactively identify where accessibility work is needed, and evaluate designs before even a line of code has been written.

But perhaps the biggest opportunity and the greatest need for our attention is the use of AI to generate code, which brings us to the final section of this talk.

So, in the previous section we talked about ways that AI can be applied in content creation to help build accessibility in. But AI itself is also impacting the way websites are designed and developed, independent of accessibility. So, in this section, let’s think about how this change will impact our ability to bake accessibility in, and can we use AI to help us?

As accessibility advocates, we have long been pushing the need to shift left. By that, we mean paying attention to accessibility right from the start of a project: when you are understanding the market potential, when you are gathering the requirements, when you are understanding and evaluating risks, when you are developing designs, and when you are developing the code that implements those designs. In a reactive approach to accessibility, which is too often what happens, the first attention to accessibility comes when automated tools are run on an already implemented system. Even then they don't find all issues, and may not even find the most significant ones, which can lead teams to prioritize poorly. With that reactive approach, teams can be overwhelmed with hundreds or even thousands of issues late in their process and have difficulty tackling them, and it makes accessibility seem much harder than it needs to be.

In this morning's panel, we discussed ways AI can be used in testing to help find accessibility problems. AI is also already being used earlier in the process by designers and developers. In development, for example, GitHub Copilot is an AI model that makes code completion predictions. GitHub claims that in files where it is turned on, nearly 40% of code is being written by GitHub Copilot in popular coding languages. There are also systems that generate code from design wireframes, from high resolution mockups, or even from text prompts. So, it is incumbent on us to ask: what data are those systems trained on? In the case of Copilot, it is trained on GitHub open source project code. So, what is the probability that this existing code is accessible? We know that we still have a lot of work to do to make digital accessibility the norm on the web; today it is the exception. Many of you probably know WebAIM does an annual survey of the top million website home pages. It runs an automated tool and reports the issues that it found. Almost 97% of their million pages had accessibility issues, and that is only the automatically detectable ones. They found an average of 50 issues per page, and they also found that page complexity is growing significantly. Over 80% of the pages they looked at had low contrast text issues. More than half had missing alternative text for images. Almost half had missing form labels. So, even though these issues are easy to find with the automated tools we have today, they are still not being addressed. These are very basic accessibility issues and they are everywhere. So we know what this means for AI models learning from today's web.

Here is an example of how this might be playing out already. Code snippets are one of the most common things that developers search for. A Large Language Model can come up with pretty decent code snippets, and that is a game changer for developers; it is already happening. Let's say a developer is new to Flutter, Google's open source mobile app development platform. They want to create a button labeled with an icon, known as an icon button. On the slide is the code that ChatGPT produced when asked for Flutter code for an icon button. Along with the code snippet, it also provided some explanation and even linked to the documentation page, so it is pretty useful. The code it gave for an icon button includes a reference to what icon to use, and a function to execute when the button is pressed. There is really just one important difference between the example generated by ChatGPT and the example given in the Flutter documentation: ChatGPT didn't include a tooltip, which means there is no text label associated with this button. That is an accessibility problem. To give it credit, ChatGPT did mention that it is possible to add a tooltip, but developers look first at the code example. If it is not in the example, it is easily missed. In the training data here, it seems the tooltip was not present often enough for it to surface as an essential component of an icon button.

So, there is a lot of example code available online, but how much of that code demonstrates accessible coding practices? Given the state of web accessibility, the answer is likely not much. So, our AI models are not going to learn to generate accessible code. It is really just like the societal bias of the past being entrenched in the training sets of today: the past lack of accessibility could be propagated into the future. So, here we have an opportunity and a potential risk. AI can help to write accessible code, but it needs to be trained on accessible code, or augmented with tools that can correct accessibility issues. And I think it is important to point out, as well, that I deliberately used an example in a framework, rather than an HTML example, because that is what developers are writing in these days. They are not writing raw HTML. They are writing in frameworks, and there are many, many different frameworks, each with their own level of accessibility and their own ways to incorporate accessibility.
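As a hedged illustration of "augmenting the generator with tools that correct accessibility issues", here is a crude lint pass, written in Python, that scans generated Flutter source for IconButton widgets with no tooltip argument. A real tool would parse the widget tree rather than scan text, and the sample snippet below is invented; the point is only that such issues can be caught at the point of creation.

```python
def find_unlabeled_icon_buttons(source):
    issues = []
    start = 0
    while True:
        idx = source.find("IconButton(", start)
        if idx == -1:
            break
        # Walk forward to the matching closing parenthesis of IconButton( ... ).
        depth, pos = 0, idx + len("IconButton(") - 1
        while pos < len(source):
            if source[pos] == "(":
                depth += 1
            elif source[pos] == ")":
                depth -= 1
                if depth == 0:
                    break
            pos += 1
        args = source[idx:pos]
        if "tooltip:" not in args:
            line = source[:idx].count("\n") + 1
            issues.append((line, "IconButton has no tooltip, so it has no text label"))
        start = pos + 1
    return issues

generated_code = """
IconButton(
  icon: Icon(Icons.volume_up),
  onPressed: () { _play(); },
)
"""
for line, message in find_unlabeled_icon_buttons(generated_code):
    print(f"line {line}: {message}")
```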

So, the theme from this morning about data being really essential comes up here again. Do we have training data to train a code prediction model, perhaps with transfer learning, to generate more accessible code? Do we even have test sets with which we can test code generation for its ability to produce accessible code? When we are developing datasets for training or testing, we have to think in terms of the diversity of frameworks and methods that developers are actually working with, if we want to catch these issues at the point of creation. And again, where AI is generating code for a whole user interface based on a visual design, we need to be thinking about what semantics that design tool should capture to support the generation of code with the right structure and the right roles for each area, the basic fundamentals of accessibility.

So, um, a final call to action for the community here is to think about, what do we need to do here? Whether it is advocacy, awareness raising, research, data gathering, standards, or refining models to write accessible code. This technology is still really young. It has a lot of room for improvement. This is a perfect time for us to define how accessibility should be built in, and to experiment with different ways. And, you know, in my opinion, this, perhaps more than anything, is the trend we need to get in front of as an accessibility community, before the poorer practices of the past are entrenched in the automated code generators of the future. AI is already shifting left, we must make sure accessibility goes with it.

So, to summarize, we can broaden access to Assistive AI through personalization. To get the benefits of AI based empowerment to all users, we should make sure that AI integration with authoring tools and processes is applied where it can, to make it easier to meet accessibility standards and improve the overall standard. Born accessible is still our goal and AI can help us get there if we steer it right. As a community we have a lot of work to do, but I am really excited about the potential here.

So, thank you all for listening. Thanks to my Google colleagues and IBM Accessibility team, also, for the feedback and ideas and great conversations. Now I want to invite Jutta to join. Let’s have a conversation.

JUTTA TREVIRANUS: Thank you, Shari. I really, really appreciate your coverage of authoring and the prevention of barriers and the emphasis on timely proactive measures. There may be an opportunity actually to re-look at authoring environments, et cetera, within W3C.

SHARI TREWIN: Yes, just to respond to that really quickly. I do wonder, like, should we be focusing on evaluating frameworks more than evaluating individual pages? You know? I think we would get more bang for our buck if that was where we paid attention.

JUTTA TREVIRANUS: Yes. Exactly. The opportunity to, and especially as these tools are now also assisting authors, which was part of what the authoring standards were looking at prompting, providing the necessary supports, and making it possible for individuals with disabilities to also become authors of code and to produce code. So, the greater participation of the community, I think, will create some of that culture shift. So, thank you very much for covering this.

So, in terms of the questions that we were going to talk about, you had suggested that we might start with one of the thorny questions asked yesterday that we didn’t get time to respond to. So, the question was: Do you think that AI and big companies such as Google and Meta driving research in AI can be problematic with respect to social, societal issues, which don’t necessarily garner the highest revenue? And, if so, how do you think we can approach this?

SHARI TREWIN: Yes. Thank you, Jutta, and thank you to the person who asked that question, too. You know, it is true that company goals and society can pull in different directions. I do think there are benefits to having big companies working on these core models, because they often have better access to very large datasets, and that can bring breakthroughs that others can share in, a rising tide that lifts all boats. But advocacy and policy definitely have an important role to play in guiding the application of AI and the direction of AI research, the way it is applied. Also, I wanted to say one approach here could be through initiatives like the Speech Accessibility Project that I talked about. That is an example of big tech working together with advocacy groups and academia to create data that can be applied to many different research projects, and that is a model we can try to replicate.

JUTTA TREVIRANUS: You talked quite a bit about the opportunity for personalization. Of course, one of the biggest issues here is that large companies are looking for the largest population, the largest profit, which means the largest customer base, and that tends to push them toward not thinking about minorities, diversity, etc. But the training models and the personalization strategies that you have talked about are emerging possibilities within large language models. We have the opportunity to take what has already been done generally and apply more personalized, smaller datasets, etc. Do you think there is a role for the large companies to prepare the ground, and then for the remaining issues to piggyback on that with new training sets? Or do you think that even there we are going to have both cost and availability issues?

SHARI TREWIN: Well, yeah. I think that the model you described is already happening, in places like the Speech Accessibility Project. The ultimate goal would be to have one model that can handle more diverse datasets, and it takes a concerted effort to gather that data. But if the community gathered the data and it was possible to contribute that data, then that is another direction in which we can influence the larger models that depend on large data. But personalization, I think, will be very important for tackling some of that tail end. Personalization is not just an accessibility benefit. There are a lot of tail populations, small-N populations, that add up to a large N, a lot of people. I think the big companies benefit greatly by exploring these smaller populations and learning how to adapt models to different populations, and then, as I mentioned, the ultimate goal would be to learn how to pull that back into a larger model without it being lost in the process.

JUTTA TREVIRANUS: Yes. We have the dilemma that the further you are from the larger model, the more you actually need to work to shift it in your direction. So, that is something I think will need to be addressed however personalization happens: the people who need the personalization the most will have the greatest difficulty with the personalization. Do you think there are any strategies available to us to address that particular dilemma?

SHARI TREWIN: Yeah. Yes. You are touching my heart with that question, because that has been an ongoing problem in accessibility forever, not just in the context of AI. People who would benefit the most from personalization may be in a position that makes it hard to discover and activate even the personalizations that are already available. One approach that I think works in some contexts is dynamic adaptation, where, instead of a person needing to adapt to a system, the system can effectively adapt to the person using it. I think that works in situations where the person doesn't need to behave any differently to take advantage of that adaptation. It doesn't work so well where there is a specific input method you might want to use that would be beneficial, where you need to do something different. So, for language models, maybe we can imagine an uber language model that first recognizes, oh, this person's speech is closest to this sub-model that I have learned, and I am going to use that model for this person. And you can think of that in terms of…

JUTTA TREVIRANUS: Increasing the distance, yeah.

SHARI TREWIN: Yeah. So, that is one idea. What do you think?

JUTTA TREVIRANUS: Yes. I am wondering if there is an opportunity, or if an opportunity will ever be taken, to rethink just how we design, what design decisions we make, and how we develop and bring these systems to market, so that there is the opportunity for greater democratization of access to the tools, and so that we don't begin with the notion of designing first for the majority and only then thinking about everyone else. I mean, this is an inflection point. There is an opportunity for small datasets, zero-shot training, transfer learning, et cetera. Is this a time when we can have a strategic push to say, let's think about other ways of actually developing these tools and releasing these tools? Maybe that is a little too idealistic, I don't know what your thinking is there?

SHARI TREWIN: Yes. I think especially if you are in a domain where you have identified that there is, you know, real risk and strong risk of bias, it should be part of the design process to include people who would be outliers, people who are going to test the boundaries of what your solution can do, people that are going to help you understand the problems that it might introduce. So, it is what should happen, I think, in design, in any system. But especially if you are baking in AI, you need to think about the risks that you might be introducing, and you can’t really think about that without having the right people involved.

Somebody yesterday, I think, mentioned something about teaching designers and developers more about accessibility, and I think that is a really important point, too. Building diverse teams is really important. Getting more diversity into computer science is really important. But teaching the people who are already there, building things, is also important. I don’t meet very many people who say, oh, I don’t care about accessibility, it is not important. It is more that it is still too difficult to do. And that is one place where I think AI can really, really help, in some of the tools that people have talked about today. Examples where, if we can make it easy enough and lower that barrier, we can take the opportunity of these creation points to teach people about accessibility as well. So, not always to fix everything for them, but to fix things with them so that they can learn going forward and grow. I think that is a really exciting area.

JUTTA TREVIRANUS: And a great way to support born accessible, accessible by default, with respect to the tools used to create it. You contributed some questions that you would love to discuss. And one of the first ones is: Is AI’s role mostly considered as improving Assistive Technology and Digital Accessibility in general? Of course, this gets to the idea of not creating a segregated set of innovations that specifically address People with Disabilities, but also making sure that the innovations brought about by addressing the needs of people who face barriers can benefit the population at large. So, what do you think? What is the future direction?

SHARI TREWIN: Yeah. This was a question that came from an attendee, who put it into the registration process. I do think it is really important to view AI as a tool for Digital Accessibility in general, and not to just think about the end user applications, although those personal AI technologies are really important, they are life changing, and they can do things that aren’t achievable in any other way. But AI is already a part of the development process, and accessibility needs to be part of that, and we have so many challenges to solve there. I think it is an area that we need to pay more attention to. So, not just applying AI to detect accessibility problems, but engaging with those mainstream development tools to make sure that accessibility is considered.

JUTTA TREVIRANUS: One associated piece that came to mind, and I am going to take the privilege of being the person asking the questions: the focus of most AI innovation has been on replicating and potentially replacing human intelligence, as opposed to augmenting it, or thinking about other forms of intelligence. I wonder whether our experiences in Assistive Technology, and how technology can become an accompaniment or an augmentation rather than a replacement, might have some insights to give in this improvement of digital inclusion?

SHARI TREWIN: Yeah. I think you are absolutely right. It is human-AI cooperation and collaboration that is going to get us the best results. The language models that we have, with the promise they hold of more interactive, dialogue-like interactions, are heading in a direction that is going to support much more natural human-AI dialogue. And accessibility is such a complex topic, where it is not always obvious what I am trying to convey with this image, or how important this thing is. It is not necessarily easy to decide what exactly the correct alternative for something is, and there are plenty of other examples where you need the combination of an AI that has been trained on some of the general principles of good accessibility practice, and a person who may not be as familiar with those but really understands the domain and the context of this particular application. It is when you put those two things together that things are going to start to work, so the AI can support the person, not replace the person.

JUTTA TREVIRANUS: And, of course, the one thorny issue that we need to overcome with respect to AI is the challenge of addressing more qualitative, non-quantitative values and ideas, et cetera. So, it will be interesting to see what happens there.

SHARI TREWIN: Yes. Yes. Yeliz had a very good suggestion this morning: perhaps we should pay attention to how people are making these judgments. How do accessibility experts make these judgments? What are the principles, and can we articulate those better than we do now, and communicate them better to designers?

JUTTA TREVIRANUS: Right. This notion of thick data, which includes the context. Because frequently we isolate the data from the actual context. And many of these things are very contextually bound, so, do you see that there might be a reinvestigation of where the data came from, what the context of the data was, et cetera?

SHARI TREWIN: I think there may be a rise in methods that bring in more of the context, multimodal inputs. Even for speech recognition, which is doing what it does without even really knowing the domain that it is working in. And that is pretty mind blowing, really. But where it breaks down is when there are technical terms, when you are talking about a domain that is less frequently talked about, less represented. And bringing in that domain knowledge, I think, is going to be huge. Similarly, in terms of hoping to create text alternatives for things, the domain knowledge will help to get a better base suggestion from the AI. Perhaps with dialogue, we can prompt people with the right questions to help them decide: is this actually a decorative image, or is it important for me to describe what is in this image? That is not always a trivial question to answer, actually.

JUTTA TREVIRANUS: Right. That brings in the issue of classification and labeling, and the need to box or classify specific things. And many of these things are very fuzzy and contextual, and classifiers are also determined hierarchically, and maybe there is…

SHARI TREWIN: Yes. Maybe we don’t need a perfect classifier, but we need a good dialogue where the system knows what questions to ask to help the person decide.
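As an illustration of the question-asking dialogue Shari mentions, here is a small, hypothetical Python sketch that walks an author through deciding whether an image is decorative or needs a text alternative. The questions and the decision rules are assumptions made for illustration, not a published guideline algorithm or any tool discussed at the symposium.

```python
# Hypothetical sketch: a tiny question-driven helper that guides an author
# toward deciding how to treat an image. The questions and decision logic
# are illustrative assumptions only.

def suggest_alt_treatment(conveys_new_information: bool,
                          page_works_without_it: bool) -> str:
    """Suggest how to treat an image based on the author's answers."""
    if conveys_new_information:
        return "Write a text alternative describing what the image conveys."
    if page_works_without_it:
        return 'Mark the image as decorative (alt="") so assistive technology can skip it.'
    return "Unclear case: describe the image's purpose in context, or ask an expert."


if __name__ == "__main__":
    # Example dialogue: the system asks, the author answers.
    conveys = input("Does the image convey information not in the nearby text? (y/n) ") == "y"
    works = input("Would the page still make sense without the image? (y/n) ") == "y"
    print(suggest_alt_treatment(conveys, works))
```

The point is the one made in the discussion: the system does not need a perfect classifier, only questions that help the person reach a reasonable decision themselves.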

JUTTA TREVIRANUS: Right. And, oh, I just saw a message from Carlos saying we are only down to a few more minutes. Can we fit in one more question?

SHARI TREWIN: I actually have to stop at the top of the hour.

JUTTA TREVIRANUS: Oh, okay. We will have an opportunity to answer the questions that people have submitted in the question and answer dialogue, and we have access to those, so Shari will be able to respond to some of these additional questions that have been asked. Apologies that we went a little over time, Carlos. I will turn it back over to you.

CARLOS DUARTE: No. Thank you so much. And thank you, Shari, for the keynote presentation. Thank you, Shari and Jutta, I was loving this discussion. It is really unfortunate that we have to stop now. But thank you so much for your presentations. Thank you, also, to all the panelists yesterday and today for making this a great symposium. Lots of interesting and thought-provoking ideas.

And thank you all for attending. We are at the top of the hour, so we are going to have to close. Just a final ask from me: when you exit this Zoom meeting, you will receive a request to complete a survey, so if you can take a couple of minutes of your time to complete it, it will give us important information to make these kinds of events better in the future.

Okay. Thank you so much, and see you in the next opportunity.

Conclusions and Future Directions

The symposium on AI and digital accessibility highlighted the potential of AI to enhance digital accessibility and empower stakeholders. Embedding accessibility considerations early in technology development is more necessary than ever. However, data challenges, including collection, labeling, regulation, and protection, shape disability bias in AI systems. Diversifying datasets and reevaluating what we choose to automate and accelerate is crucial. Laws and policies must be developed promptly to ensure accountability and to address discriminatory decisions. Several opportunities arise with the emergence of Explainable AI, and user involvement has the potential to promote fairness and ethical practices. AI can improve media accessibility by guiding authors and by automating web accessibility evaluation. Challenges remain in accessible communication due to limited diverse data, while dialogue systems offer solutions across domains. Prioritizing user and data diversity and integrating accessibility during content authoring are essential. Ethical discussions and standards are needed to ensure fair and ethical use of AI in digital accessibility.

Organizing Committee

Symposium Chairs

Scientific Committee
