Google has launched Agentic Vision, a new AI feature for Gemini 3 Flash. The company announced it on January 28, 2026. The feature changes how the AI understands images.

Earlier AI models looked at images only once. If they missed small details, they guessed the answer.

Agentic Vision solves this problem. It allows the AI to review images repeatedly. The model can zoom, crop, and analyse images step by step.

Google says this makes image understanding more accurate and reliable.


Google Gemini AI: How Agentic Vision Works

Agentic Vision follows a simple loop. The AI thinks, acts, and observes. This helps it study images like a human would.

Agentic Vision Key Specifications

  • Uses a Think, Act, Observe process
  • Rechecks images multiple times
  • Runs Python code to analyse visuals
  • Crops, rotates, and annotates images
  • Reads small text and fine details
  • Improves vision accuracy by 5–10%
  • Works with Gemini 3 Flash

During the Think step, the AI plans what to check.
In the Act step, it runs code to zoom or crop images.
In the Observe step, it studies the new image details.
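The three steps above can be sketched as a simple loop. This is an illustration only, not Google's implementation: the 2D grid stands in for an image, and helpers like `crop` and `agentic_vision_loop` are made-up names for the kind of tools the model runs.

```python
# Illustrative think-act-observe loop (not Google's actual implementation).
# A plain 2D grid stands in for an image; the helpers are hypothetical
# stand-ins for the model's real tools.

def crop(image, top, left, height, width):
    """Act: return a zoomed-in sub-region of the image grid."""
    return [row[left:left + width] for row in image[top:top + height]]

def agentic_vision_loop(image, max_steps=3):
    """Repeatedly narrow the view instead of answering from one glance."""
    view = image
    for step in range(max_steps):
        # Think: decide which region to inspect next (here, always top-left).
        h, w = len(view), len(view[0])
        if h <= 1 or w <= 1:
            break  # nothing left to zoom into
        # Act: run code to crop the image to that region.
        view = crop(view, 0, 0, max(1, h // 2), max(1, w // 2))
        # Observe: study the new, more detailed view before deciding again.
        print(f"step {step}: viewing {len(view)}x{len(view[0])} region")
    return view

# Example: an 8x8 "image" of pixel values.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
final_view = agentic_vision_loop(image)
```

In a real agent, the Think step would be the model's own reasoning and the Act step would execute generated Python; the loop structure is the part the article describes.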

Google says code execution is the first major tool used by Agentic Vision. It helps the model avoid guessing.



Real Uses and Availability

Google shared real examples of how Agentic Vision helps businesses.

A building-plan-review platform used it to scan large design files. The AI zoomed into roof edges and layouts. Accuracy improved by 5%.

The feature also supports image marking. The AI can draw boxes and labels on objects. Google showed this with a finger-counting task in the Gemini app.
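Image marking of this kind can be pictured as overlaying a rectangle on a pixel grid. The sketch below is purely illustrative: a character grid stands in for a real image, `draw_box` is a hypothetical helper, and the detection of the object is assumed to have already happened.

```python
# Illustrative sketch of annotation: drawing a box around a detected object.
# A character grid stands in for an image; coordinates are assumed inputs.

def draw_box(grid, top, left, bottom, right, mark="#"):
    """Overlay a rectangular outline on the grid (modified in place)."""
    for col in range(left, right + 1):
        grid[top][col] = mark      # top edge
        grid[bottom][col] = mark   # bottom edge
    for row in range(top, bottom + 1):
        grid[row][left] = mark     # left edge
        grid[row][right] = mark    # right edge
    return grid

# A 6x10 blank canvas, then a box around rows 1-4, columns 2-7.
canvas = [["." for _ in range(10)] for _ in range(6)]
draw_box(canvas, top=1, left=2, bottom=4, right=7)
print("\n".join("".join(row) for row in canvas))
```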

For visual maths, the AI can read dense tables. It uses Python to calculate results instead of guessing.
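The point of using Python here is that once the table cells have been read from the image, arithmetic is exact rather than estimated. A small sketch with made-up sample numbers (the OCR step is assumed):

```python
# Illustrative sketch: after dense table cells are extracted from an image,
# the answer is computed with code instead of guessed. Sample data only.

table = {
    "Q1": [120, 340, 95],
    "Q2": [210, 150, 400],
    "Q3": [80, 220, 310],
}

# Sum each row, then find the row with the highest total.
totals = {quarter: sum(values) for quarter, values in table.items()}
best = max(totals, key=totals.get)
print(totals)  # per-row totals
print(best)    # row with the highest total
```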

Availability Details

  • Available via Gemini API
  • Works in Google AI Studio
  • Supported on Vertex AI
  • Rolling out in the Gemini app
  • Requires "Thinking" mode selection

Google says some actions still need user prompts. These include rotating images and complex visual maths. Future updates will make these steps automatic.

Google is also testing new tools like web search and reverse image search for Agentic Vision.

With this launch, Google strengthens Gemini AI as a powerful multimodal platform.
