Alt Mate: A browser plugin that automatically generates alt text for images

Plain Language Introduction

Many websites have images without alt text. Screen readers cannot describe these images to screen reader users. Even if websites do include alt text, it is often written in complicated language. This makes it hard for screen reader users to understand what is on the page.

There are tools to generate alt text for images. But they are mostly made for web developers. These tools let developers upload images and generate alt text for them. However, these tools have three problems:

Developers must manually add the alt text to the website. If developers skip this step, then their website is still inaccessible to screen reader users.

These tools take in one image at a time. This is slow if developers want to create alt text for many images.

Alt text should match the context of the image. This means that an image may have different alt texts depending on how it is used on different pages. Most tools do not use context. Tools that do use this context often require developers to manually add descriptions.

Alt Mate solves these problems for developers and screen reader users by scanning web pages for images without alt text. Then, Alt Mate generates plain language alt text using the image and surrounding context. This context is the raw HTML code surrounding the image. For developers, all images and their alt text are also displayed in Alt Mate’s Chrome Extension. This helps them review all images and alt text in one place. Alt Mate also automatically updates the webpage’s code locally with the generated alt text. This allows screen readers to read out descriptions of the image.

Alt Mate makes websites more accessible even if developers didn’t originally focus on accessibility. For developers who want to improve accessibility, Alt Mate makes it faster for them to add alt text. This encourages more developers to make their websites more accessible.

Methodology and Results

For our project, Alt Mate, we designed and implemented a browser extension that automatically generates alt text for images that are missing it. Our goal was to make web content more accessible for screen reader users. We also hoped to encourage web developers to prioritize alt text, since the extension detects every image on a site that is missing alt text.

The first step is opening the Alt Mate Chrome Extension. When opened, it scrapes every image on the page and retrieves the following from each image element: the URL of the image, the alt attribute, and 300 characters of HTML source code above and below the image. We collect the surrounding HTML because the context of an image matters: what an image means depends on how it is used and on what is around it. For example, the Facebook logo at 8x8 next to other social media icons in the footer serves a different purpose from the Facebook logo at 100x30 in the navbar. After collecting this data, we check each image's alt attribute to see whether it is empty or missing. If so, we begin the alt text generation process for that image.
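The collection step can be sketched as follows. This is a minimal illustration, not Alt Mate's actual code: it works on a raw HTML string with regular expressions so that it runs outside a browser, whereas an extension would more likely walk the DOM. The names ScrapedImage, scrapeImages, readAttribute, and needsAltText are hypothetical, and we assume the 300 characters are taken on each side of the image tag, excluding the tag itself.

```typescript
// Hypothetical shape for the data collected from each image element.
interface ScrapedImage {
  index: number;       // position of the image on the page
  src: string;         // image URL ("" if the tag has no src)
  alt: string | null;  // null when the alt attribute is absent
  context: string;     // HTML source before and after the tag
}

// 300 characters on each side, as described in the text above.
const CONTEXT_CHARS = 300;

// Read a quoted attribute value from a tag's source text; null if absent.
function readAttribute(tag: string, name: string): string | null {
  const pattern = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i");
  const match = pattern.exec(tag);
  if (match === null) return null;
  return match[1] ?? match[2] ?? "";
}

// Find every <img> tag and record its URL, alt attribute, and surrounding HTML.
function scrapeImages(html: string, contextChars: number = CONTEXT_CHARS): ScrapedImage[] {
  const images: ScrapedImage[] = [];
  const imgTag = /<img\b[^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = imgTag.exec(html)) !== null) {
    const tag = match[0];
    const start = match.index;
    const end = start + tag.length;
    const before = html.slice(Math.max(0, start - contextChars), start);
    const after = html.slice(end, end + contextChars);
    images.push({
      index: images.length,
      src: readAttribute(tag, "src") ?? "",
      alt: readAttribute(tag, "alt"),
      context: before + after,
    });
  }
  return images;
}

// An image is flagged when its alt attribute is empty or does not exist.
function needsAltText(image: ScrapedImage): boolean {
  return image.alt === null || image.alt.trim() === "";
}
```

Calling scrapeImages(pageHtml).filter(needsAltText) would then give the list of images to send through the generation process.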

The alt text generation process runs once per image, for each image we flagged as not having alt text. First, we contact the OpenAI API and create a thread, which is similar to a conversation. We then send a message to that thread containing the image source URL and the context we collected. After the message is queued, we run the thread and wait for the run to complete, that is, for a response from the OpenAI assistant we configured with our instructions. Once the run completes, we retrieve the alt text and repeat the process for the next image.
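The create-thread, send-message, run, and wait sequence described above might look roughly like this. The AssistantClient interface is hypothetical: it stands in for whatever code talks to the OpenAI API and is not the OpenAI SDK's own API. The polling interval, the poll limit, and the status names are also assumptions made for the sketch.

```typescript
// Hypothetical run states; the real API may report others.
type RunStatus = "queued" | "in_progress" | "completed" | "failed";

// Hypothetical wrapper around the API calls the workflow needs.
interface AssistantClient {
  createThread(): Promise<string>;
  addMessage(threadId: string, text: string, imageUrl: string): Promise<void>;
  startRun(threadId: string): Promise<string>;
  getRunStatus(threadId: string, runId: string): Promise<RunStatus>;
  getLatestReply(threadId: string): Promise<string>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Generate alt text for one image: create a thread, send the image URL and
// context, start a run, then poll until the run completes.
async function generateAltText(
  client: AssistantClient,
  imageUrl: string,
  context: string,
  pollMs: number = 500,
  maxPolls: number = 60,
): Promise<string> {
  const threadId = await client.createThread();
  await client.addMessage(threadId, `Surrounding HTML:\n${context}`, imageUrl);
  const runId = await client.startRun(threadId);

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const status = await client.getRunStatus(threadId, runId);
    if (status === "completed") {
      return (await client.getLatestReply(threadId)).trim();
    }
    if (status === "failed") {
      throw new Error(`Run ${runId} failed for ${imageUrl}`);
    }
    await sleep(pollMs);
  }
  throw new Error(`Run ${runId} did not complete after ${maxPolls} polls`);
}
```

Bounding the number of polls means a stalled run surfaces as an error for that one image instead of blocking the images that come after it.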

Once we have generated alt text for all the images, we do two things. First, we inject the generated alt text back into the page. During the generation stage, every image without alt text is given the placeholder “Loading from Alt Mate!”, which is now replaced with the generated alt text. Each image is associated with a unique index, so we know which images to update. Second, we populate the Alt Mate popup. The popup replaces its “Processing” label with information about every image: the image itself, its alt text (marked as Alt Mate-generated or not), the image URL, and the surrounding context of the image. This way, users can see what was used to generate each image’s alt text.
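The index-based update can be illustrated with plain data, leaving out the DOM. In this sketch, which is an assumption about the bookkeeping rather than Alt Mate's actual code, TrackedImage, markPending, and applyGeneratedAlt are hypothetical names, and the generated results arrive as a map from image index to alt text.

```typescript
// Placeholder shown while generation is in progress (taken from the text above).
const PLACEHOLDER_ALT = "Loading from Alt Mate!";

// Hypothetical record kept for each image on the page.
interface TrackedImage {
  index: number;               // unique index used to match results to images
  src: string;
  alt: string;
  generatedByAltMate: boolean; // true when the alt text comes from Alt Mate
}

// Assign each image an index and give images without alt text the placeholder.
function markPending(images: Array<{ src: string; alt: string | null }>): TrackedImage[] {
  return images.map((image, index) => {
    const original = image.alt ?? "";
    const missing = original.trim() === "";
    return {
      index,
      src: image.src,
      alt: missing ? PLACEHOLDER_ALT : original,
      generatedByAltMate: missing,
    };
  });
}

// Replace the placeholder with generated text, matching on the image index.
// Images that already had alt text are never overwritten.
function applyGeneratedAlt(images: TrackedImage[], generated: Map<number, string>): TrackedImage[] {
  return images.map((image) => {
    const text = generated.get(image.index);
    if (!image.generatedByAltMate || text === undefined) return image;
    return { ...image, alt: text };
  });
}
```

The same TrackedImage records could feed the popup, since they already carry the alt text and whether Alt Mate generated it.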

For testing, we relied heavily on manual testing across many different web pages to find the strengths and weaknesses of Alt Mate. Examples include the websites of Five Guys, Emerald City Athletics, and the UW Department of Biology. We found two weaknesses. First, for images of calendars, Alt Mate read through the contents and generated markdown equivalents. Second, Alt Mate processed images at full size even when they were rendered at much smaller sizes and did not need to be processed at full scale. In response, we made small updates to the assistant’s instructions and to our post-processing to reduce these shortcomings.

Ultimately, Alt Mate addresses a critical accessibility gap by generating alt text for images with missing descriptions, making web content more accessible for screen reader users while also encouraging developers to prioritize alt text on their sites. Our design focuses on context-based alt text generation so that the alt text being generated is relevant to the site the user is on.

Learnings and Future Work

One of our biggest takeaways was that “good enough” descriptions aren’t always sufficient: users need a way to improve alt text that falls short. As a future improvement, we would like to let users regenerate the alt text for an image when they want a different description. This would also give users control when the generated alt text is incomplete. In addition, our current design only generates alt text for images that are completely missing it. We hope to extend this to images with insufficient alt text, such as “Image”, using a word count or another heuristic.
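One way the insufficient alt text check could work is sketched below. This is only an illustration of the idea: the list of generic terms, the filename pattern, and the default two-word minimum are our assumptions, and the threshold would need tuning because a single word can be a good description for some images.

```typescript
// Hypothetical list of words that say nothing about the image's content.
const GENERIC_ALT_TERMS = new Set(["image", "img", "picture", "photo", "graphic"]);

// Alt text that is just a file name (for example "IMG_1234.jpg") is unhelpful.
const FILENAME_PATTERN = /\.(png|jpe?g|gif|svg|webp)$/i;

// Decide whether existing alt text is too weak to keep.
// minWords is an assumed, tunable threshold.
function isInsufficientAlt(alt: string | null, minWords: number = 2): boolean {
  if (alt === null) return true;
  const text = alt.trim();
  if (text === "") return true;
  if (GENERIC_ALT_TERMS.has(text.toLowerCase())) return true;
  if (FILENAME_PATTERN.test(text)) return true;
  const wordCount = text.split(/\s+/).length;
  return wordCount < minWords;
}
```

Images flagged by a check like this could then go through the same generation process as images with no alt text at all.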

We also learned how essential context is when generating alt text. If we send only the image, without considering its parent and sibling elements, the description falls short of capturing the image’s full meaning. This is what drove us to send contextual information along with the image data to OpenAI’s API, so that the generated alt text is more accurate and useful.

OpenAI API calls are expensive, especially at scale. During testing, it was particularly frustrating to regenerate alt text over and over for sites we had already visited. In the future, we would like to add some form of caching. One option is to store each image’s URL and its generated alt text in a key-value store, so that future visitors to the same page do not trigger additional API calls and instead receive the previously generated alt text immediately. We expect this to significantly reduce API costs and to speed up the experience, since users would not need to wait for existing alt text to regenerate each time.
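A minimal sketch of such a cache is shown below, using an in-memory map. The AltTextCache class is hypothetical. We also assume the key combines the page URL with the image URL, because the same image can need different alt text on different pages; keying on the image URL alone would be simpler but would ignore that context.

```typescript
// Hypothetical in-memory key-value cache for generated alt text.
class AltTextCache {
  private readonly entries = new Map<string, string>();

  // Build one key from the page URL and image URL (an assumption; see above).
  private static key(pageUrl: string, imageUrl: string): string {
    return JSON.stringify([pageUrl, imageUrl]);
  }

  // Return the cached alt text, or undefined if this image was never processed.
  get(pageUrl: string, imageUrl: string): string | undefined {
    return this.entries.get(AltTextCache.key(pageUrl, imageUrl));
  }

  // Store alt text after a successful generation.
  set(pageUrl: string, imageUrl: string, altText: string): void {
    this.entries.set(AltTextCache.key(pageUrl, imageUrl), altText);
  }

  // Number of cached entries.
  get size(): number {
    return this.entries.size;
  }
}
```

Before calling the API for an image, the extension would check get and only generate on a miss. A map like this lasts only for the current session, so sharing results between visitors would need a persistent or shared store instead.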

Another area for development is keeping threads open so that users can continue to interact with the OpenAI API and ask about an image’s content. This would enable a Q&A format where users could ask questions about images on a webpage. For example, a user might ask, “What is the person in this image doing?” This would extend Alt Mate beyond simple alt text generation.

In the long term, we see Alt Mate as more than just a tool for end users. It could also function as an accessibility audit tool that helps web developers identify images with missing or unhelpful alt text on their sites. By making alt text visibility a core feature and goal of our plugin, we hope to drive better accessibility practices across the web for users and developers alike.