PRISM Demo Gallery

Cosmos-Reason2-2B zero-shot vs PRISM fine-tuned on real retail video clips

17 Examples | 20+ Task Types
1 Goal-Conditioned Reasoning
48 frames | 9.8s
Q Is the person still evaluating the product or ready to place it in the basket? What should be his action next?
Cosmos-Reason2-2B Zero-shot
The person is still evaluating the product. Next, he should place the guava in the basket.
PRISM Fine-tuned
<Think>The person is currently holding and inspecting the guava, indicating a decision-making process about its quality. The current sub-goal is 'Evaluate Product Quality', which is part of the larger goal 'Select Produce Item'. Since the inspection is ongoing, the immediate next step is to make a final judgment. If the guava is deemed acceptable, the next action would be to place it in the basket. If not, they would likely return it to the display. Given the prompt states 'Evaluate Product Quality', the most logical next step after inspection is to place it in the basket if it meets the criteria. </Think> The person is still evaluating the guava's quality. The next action would be to place the guava in the basket if it meets their standards, or return it to the display if it does not.
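Some PRISM outputs in this gallery put a reasoning trace inside <Think>...</Think> before the final answer. The following is a minimal sketch of separating the two for side-by-side comparison. It assumes the tag format matches what is shown above. The helper name split_reasoning is hypothetical and not part of any PRISM or Cosmos-Reason tooling.

```python
import re
from typing import Tuple

# Assumption: the trace is wrapped in <Think>...</Think>, as in the gallery
# outputs. Matching is case-insensitive in case the casing varies.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.IGNORECASE | re.DOTALL)


def split_reasoning(output: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no trace is present."""
    match = _THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tagged span is treated as the final answer.
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = (
        "<Think>The inspection is ongoing. </Think> "
        "The person is still evaluating the guava's quality."
    )
    reasoning, answer = split_reasoning(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Outputs without a trace, such as the zero-shot answers, pass through unchanged as the answer with an empty reasoning string.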
2 Activity Recognition
44 frames | 5.1s
Q What is he doing in the scene?
Cosmos-Reason2-2B Zero-shot
He is reaching into a fruit display case.
PRISM Fine-tuned
The person is inspecting a blackberry. They hold it in their hand and examine it closely.
3 Scene Understanding
40 frames | 3.3s
Q Describe the scene.
Cosmos-Reason2-2B Zero-shot
The shelves are stocked with a variety of products, including personal care items, food, and household goods.
PRISM Fine-tuned
The scene shows the person walking down a wide aisle between shelves. The shelves are filled with personal care products, including shampoos, conditioners, and other toiletries.
4 Scene Localization
40 frames | 2.2s
Q Where is the person located?
Cosmos-Reason2-2B Zero-shot
The person is located in the aisle of a store.
PRISM Fine-tuned
The person is in the Aisle (Personal Care/Flour).
5 Activity Recognition
40 frames | 3.3s
Q What activity is this person performing and why?
Cosmos-Reason2-2B Zero-shot
The person is carefully reading the label on the shampoo bottle, likely to ensure they are selecting the correct product based on their needs.
PRISM Fine-tuned
The person is holding the Vatika shampoo bottle and examining the back label. They are looking for information about the product, likely to check for details such as ingredients, usage instructions, or expiration date.
6 Object Counting & Tracking
40 frames | 5.7s
Q How many products did the person evaluate? Why?
Cosmos-Reason2-2B Zero-shot
The person evaluated two products, the blue box and the green box, because they were the only two products being held and examined at that moment.
PRISM Fine-tuned
<Think>The person is holding two different products, a green box and a blue box, simultaneously. This indicates a comparison is being made. The action of 'Comparing Products' directly supports the idea that the person is evaluating these items for a decision. The presence of both items in hand, rather than just one, is a clear sign of a comparison. </Think> The person is evaluating two products, a green box and a blue box, to compare them for a decision.
7 Hand Interaction
40 frames | 3.3s
Q Which hand did the person use to pick up the green product?
Cosmos-Reason2-2B Zero-shot
The person used their right hand to pick up the green product.
PRISM Fine-tuned
The person's left hand was holding the item, and their right hand was reaching and grasping it.
8 Object Counting & Tracking
40 frames | 1.9s
Q How many products did the person check? Respond with a number only.
Cosmos-Reason2-2B Zero-shot
1
PRISM Fine-tuned
2
9 Domain Knowledge
40 frames | 2.7s
Q Would you expect to find beef in this aisle? Answer short.
Cosmos-Reason2-2B Zero-shot
yes
PRISM Fine-tuned
No, the visible items are tea boxes, indicating this is a tea aisle, not a grocery aisle for beef.
10 Domain Knowledge
40 frames | 2.8s
Q What material is the grabbed item made of?
Cosmos-Reason2-2B Zero-shot
The grabbed item is made of plastic.
PRISM Fine-tuned
The grabbed item is a blue packet of Bru coffee. The packet is made of plastic.
11 Object Counting & Tracking
36 frames | 2.7s
Q How many products did the person put into the basket?
Cosmos-Reason2-2B Zero-shot
Two products.
PRISM Fine-tuned
The person put one product, a small red container, into the basket.
12 Object Counting & Tracking
36 frames | 2.3s
Q Count the products in the basket at the beginning and at the end.
Cosmos-Reason2-2B Zero-shot
There are 3 products in the basket at the beginning and 4 products at the end.
PRISM Fine-tuned
The basket contains 2 items at the start, and 3 items at the end.
13 Hand Interaction
36 frames | 2.2s
Q What actions are the hands performing?
Cosmos-Reason2-2B Zero-shot
The hands are holding a blue basket and placing a jar into it.
PRISM Fine-tuned
The left hand is holding the basket, and the right hand is grasping an item.
14 Object Counting & Tracking
36 frames | 3.5s
Q How many products are in the basket by the end? What was the last item?
Cosmos-Reason2-2B Zero-shot
There is one product in the basket by the end, which is a blue bottle.
PRISM Fine-tuned
The shopping basket contains 2 items. The last item added was the Hershey's Cocoa container.
15 Spatial Reasoning
30 frames | 5.9s
Q Can we open the refrigerator from where we are standing?
Cosmos-Reason2-2B Zero-shot
Yes, the refrigerator is on the left side of the aisle, and it is possible to open it from where you are standing.
PRISM Fine-tuned
<Think>The refrigerator unit in the foreground is a large, multi-door appliance. To open it, one would need to reach over the counter and pull or push the doors. The person's current position is at a distance from the refrigerator, and their hands are not visible, indicating they are not in a position to interact with the appliance directly. Therefore, it is not possible to open the refrigerator from this vantage point. </Think> No, the refrigerator is too far away to be opened from this position.
16 Spatial Reasoning
30 frames | 5.4s
Q How can we get to the refrigerator from where we are standing?
Cosmos-Reason2-2B Zero-shot
We can walk straight ahead to the refrigerator, which is located on the left side of the aisle.
PRISM Fine-tuned
<Think>The scene shows a wide aisle in a grocery store. The refrigerator units are positioned along the left side of this aisle, directly adjacent to the bulk item display. To reach the refrigerator, one would need to walk forward along the aisle, moving past the bulk items. The aisle itself appears to be wide enough to accommodate a person walking comfortably. </Think> Walk forward along the aisle, moving past the bulk item display, to reach the refrigerator on the left side.
17 Activity Recognition
44 frames | 3.9s
Q What is the person in the black-and-white shirt doing?
Cosmos-Reason2-2B Zero-shot
The person in the black-and-white shirt is pushing a shopping cart and walking towards the right side of the frame.
PRISM Fine-tuned
The person in the black-and-white shirt is pushing a shopping cart down the aisle. They appear to be looking at the items on the shelves as they move.