
Video Summarisation for Product Reviews: An Important Tool Toward Accessible VR Shopping for Visually Impaired Users

  • Writer: Nilotpal Biswas
  • May 25
  • 3 min read


When a product page is covered in images, star ratings and long comment threads, blind or low-vision shoppers have to sift through every word with a screen-reader. Product-review videos are an even greater hurdle: the visual demonstrations can last half an hour, yet most of the essential information, such as battery life, build quality and price, might be tucked into a few scattered sentences. In today's blog, we look at a research article that proposes a way to tackle this gap.

The authors pose a straightforward question: Can we distil the key take-aways from review videos so that a shopper who cannot see the screen still hears the highlights quickly? To explore the idea, they contribute two things:

  • PVS10, an open dataset of product-review videos assembled specifically for accessibility research. It contains 100 YouTube videos, ten top-ranked clips for each of ten consumer electronics items such as smartphones and headphones.

  • A baseline summarisation pipeline that slices every video into one-minute segments, converts the soundtrack to text and then scores each segment on keywords, word-cloud prominence, sentiment and visible hand activity. The system ranks the segments, removes redundancy with Jaccard similarity and stitches the top ten into a concise montage that preserves both factual content and tone (a rough sketch of this flow appears below).

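To make the pipeline concrete, here is a minimal Python sketch of the ranking-and-deduplication step. It is not the authors' code: it assumes the one-minute segments have already been transcribed, and the keyword list, sentiment word lists and weights are illustrative placeholders.

```python
# Minimal sketch of the segment ranking and redundancy-removal idea; not the
# authors' code. Assumes each one-minute segment is already transcribed; the
# keyword list, sentiment words and weights below are illustrative only.

def jaccard(a, b):
    """Word-level Jaccard similarity between two transcript strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def score_segment(text, keywords):
    """Toy relevance score: keyword hits plus a crude sentiment term."""
    words = text.lower().split()
    keyword_hits = sum(words.count(k) for k in keywords)
    opinion_hits = sum(words.count(w) for w in ("great", "good", "love", "bad", "poor", "issue"))
    return keyword_hits + 0.5 * opinion_hits  # opinionated segments matter for tone

def summarise(segments, keywords, top_k=10, max_overlap=0.5):
    """Rank segments, drop near-duplicates, keep the best top_k in time order."""
    ranked = sorted(segments, key=lambda s: score_segment(s["text"], keywords), reverse=True)
    chosen = []
    for seg in ranked:
        if all(jaccard(seg["text"], kept["text"]) < max_overlap for kept in chosen):
            chosen.append(seg)
        if len(chosen) == top_k:
            break
    return sorted(chosen, key=lambda s: s["start"])

# Tiny worked example with three fake one-minute segments from a phone review.
segments = [
    {"start": 0,   "text": "unboxing and first impressions of the phone"},
    {"start": 60,  "text": "battery life is great and lasts two days easily"},
    {"start": 120, "text": "camera is good in daylight but poor at night"},
]
print(summarise(segments, keywords=["battery", "camera", "price"], top_k=2))
```

The paper's other signals, word-cloud prominence and visible hand activity, would simply add further terms to the scoring function.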

How well does it work?

Using a question-answer F-measure that checks whether the summary still answers practical shopper queries (“How is the camera?”, “What are the drawbacks?”), the prototype answered roughly three-quarters of those questions correctly, scoring considerably higher than text-only or vision-only baselines evaluated on the same data. In one trial, the system trimmed more than an hour of iPhone 13 review footage down to a concise ten-minute highlight reel while still addressing about two-thirds of the prepared queries.
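For a feel of the metric, here is one hedged reading of a question-answer F-measure: treat the questions the full review answers as the reference set and those the summary still answers as the prediction set, then compute precision, recall and their harmonic mean. The paper's exact protocol may differ.

```python
# One hedged reading of a question-answer F-measure; the paper's exact
# evaluation protocol may differ. Reference = questions the full review
# answers; prediction = questions the summary still answers.

def qa_f_measure(answered_by_full, answered_by_summary):
    ref, pred = set(answered_by_full), set(answered_by_summary)
    if not ref or not pred:
        return 0.0
    hits = len(ref & pred)
    precision, recall = hits / len(pred), hits / len(ref)
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

full_video = {"How is the camera?", "How is the battery?",
              "What are the drawbacks?", "Is it worth the price?"}
summary    = {"How is the camera?", "How is the battery?", "What are the drawbacks?"}
print(round(qa_f_measure(full_video, summary), 2))  # 0.86: the summary covers 3 of 4 questions
```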


Why the dataset matters

PVS10 fills an important gap: most existing summarisation corpora focus on sports, news or generic web videos. None are curated around the patterns found in technology-review channels (hands-on demos, pros-and-cons lists, subjective impressions), nor are they labelled with accessibility in mind. The authors release both the videos and the accompanying transcripts, giving developers a ready-made sandbox for multimodal approaches that go beyond vision alone.


From concise reviews to immersive stores

So where does virtual reality fit into all of this? At present, accessible e-commerce largely relies on screen-readers that read product details aloud in a fixed, linear order. The next generation of VR shopping, by contrast, will invite blind and low-vision customers to wander digital aisles, feel tactile controller feedback and chat with voice assistants. Even in such immersive settings, though, shoppers will still want a quick sense of what other buyers think before they commit. That is where a video-summarisation engine like the one built on the PVS10 dataset becomes valuable. As a customer examines, say, a virtual camera, the system could automatically play a concise, context-aware audio digest of trusted review clips rather than forcing the user through an entire half-hour video. Because the summariser flags positive and negative points, designers could link upbeat opinions to subtle spatial sounds or gentle controller vibrations and associate criticisms with a contrasting cue, giving users an intuitive way to compare options without sight. And since the pipeline records which words and visual actions prompted each highlight, those explanations can be spoken on demand, fulfilling the broader call for assistive AI that is both helpful and transparent.
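As a purely illustrative sketch of that last idea (none of these cue names, thresholds or data layouts come from the paper or from any real VR engine), a summariser's per-highlight sentiment score could be mapped to non-visual cues like this:

```python
# Toy interaction sketch only; the cue names, thresholds and dictionary layout
# are placeholders, not taken from the paper or from any real VR engine API.

def cue_for_highlight(sentiment_score):
    """Map a highlight's sentiment (-1.0 .. 1.0) to a non-visual cue."""
    if sentiment_score > 0.2:
        return {"audio": "soft_chime_right", "haptic": "gentle_pulse"}
    if sentiment_score < -0.2:
        return {"audio": "low_tone_left", "haptic": "double_buzz"}
    return {"audio": "neutral_tick", "haptic": None}

highlights = [("battery lasts two days", 0.8), ("camera struggles at night", -0.6)]
for text, score in highlights:
    print(text, "->", cue_for_highlight(score))
```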


Looking ahead

The authors end with two clear take-aways: build new tools side-by-side with visually impaired users, and make sure those tools can explain how they work. Both points hit a real pain-point in VR accessibility. We can already render dazzling 3-D stores, but we still struggle to fill them with quick, reliable information that honours shoppers’ time and privacy.

The PVS10 dataset helps close that gap. It gives researchers and developers a ready-made benchmark and a working example to build on. The logical next step is to weave this kind of summarisation directly into VR shopping demos, so that blind or low-vision customers can stroll up to a virtual shelf, ask, “What do people really think about this product?” and get an instant, trustworthy answer, no marathon video-watching required.


Reference

Pal, R., Kar, S. and Sekh, A.A., 2024. Enhancing Accessibility in Online Shopping: A Dataset and Summarization Method for Visually Impaired Individuals. SN Computer Science, 5(8), p.1010.



