Gemini -> Image Understanding

Overview

This action analyzes an image with the Gemini API. You provide an image file and a prompt (a question or instruction), and the selected Gemini model interprets, describes, or extracts information from the image according to that prompt.

Inputs

Name	Type	Required	Description
`gemini_api_key`	text (registry)	Yes	Your Gemini API key (from settings registry).
`model`	text	Yes	The Gemini model to use; it must accept image input (e.g., `gemini-1.5-flash`).
`prompt`	text	Yes	The question, instruction, or prompt about the image.
`image`	file resource	Yes	The image file you want Gemini to analyze.

Function Stack

Create file resource from image
- Reads the file payload from the provided image input.
Gemini API Request
- Sends a POST request to:

https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={gemini_api_key}

- The request body carries the image (base64-encoded, as the API requires for inline data) and the prompt, following the Gemini API's `contents`/`parts` structure.

Precondition
- Verifies that the response status is 200 to continue processing.
Response
- Returns the result of the API request: `gemini_api.response.result`.
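The request and precondition steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the action's actual implementation: the helper names `build_generate_content_request` and `send_request` are hypothetical, and the snake_case `inline_data`/`mime_type` field names follow the REST examples in Google's Gemini documentation.

```python
import base64
import json
import urllib.parse
import urllib.request

# Base URL of the Gemini REST endpoint used by this action (v1beta).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_content_request(api_key: str, model: str, prompt: str,
                                   image_bytes: bytes, mime_type: str):
    """Return (url, body) for a generateContent call with one image and one prompt.

    Hypothetical helper that mirrors the request step; not the action's own code.
    """
    # The API key travels as the `key` query parameter, as in the URL above.
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{BASE_URL}/{urllib.parse.quote(model)}:generateContent?{query}"
    body = {
        "contents": [{
            "parts": [
                # Image bytes are base64-encoded inside an inline_data part.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                # The prompt is sent as a plain text part.
                {"text": prompt},
            ]
        }]
    }
    return url, body


def send_request(url: str, body: dict) -> dict:
    """POST the body and return the parsed JSON; refuse anything but HTTP 200."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # urlopen already raises HTTPError for 4xx/5xx; this mirrors the
        # precondition by also rejecting other non-200 success codes.
        if resp.status != 200:
            raise RuntimeError(f"Gemini API returned HTTP {resp.status}")
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_request(*build_generate_content_request(...))` performs the live call. Keeping request construction separate from sending makes the body easy to inspect or test without network access.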

Example Usage

Request

```json
{
  "gemini_api_key": "AIzaSyD...",
  "model": "gemini-1.5-flash",
  "prompt": "Describe what is happening in this image.",
  "image": "(attach image file)"
}
```

Response

```json
{
  "result": "The image shows a group of people hiking on a mountain trail under a clear sky."
}
```
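The example above shows a flat `result` string. A raw `generateContent` response body instead nests the generated text under `candidates[0].content.parts`, so code that consumes the API response directly has to pull the text out. A minimal sketch, assuming the hypothetical helper name `extract_text` and an illustrative response body shaped like the example:

```python
def extract_text(response_body: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response.

    Hypothetical helper; whether the action flattens the response this way is not stated.
    """
    candidates = response_body.get("candidates") or []
    if not candidates:
        # No candidates can mean the prompt was blocked; inspect promptFeedback.
        raise ValueError("response contains no candidates")
    parts = candidates[0].get("content", {}).get("parts", [])
    # Concatenate every text part; non-text parts contribute nothing.
    return "".join(part.get("text", "") for part in parts)


# Illustrative raw body (hypothetical), shaped like a generateContent response.
raw = {
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [{"text": "The image shows a group of people hiking on a mountain trail under a clear sky."}],
        },
        "finishReason": "STOP",
    }]
}
print({"result": extract_text(raw)})
```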

Notes

The `model` input selects which Gemini model processes the request; choose a model that supports image input, such as `gemini-1.5-flash`.
The `image` input must be an actual image file in a format Gemini supports (PNG, JPEG, WEBP, HEIC, or HEIF).
Ensure your Gemini API key has access to the selected model and enough quota for image requests.
This action handles encoding and passing the image to Gemini as required by the API.
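Because the image must be in a supported format, a caller may want to check the file before sending it. The sketch below guesses the MIME type from the file's leading bytes; `detect_image_mime` is a hypothetical helper, and it only recognizes PNG, JPEG, and WEBP signatures, so HEIC/HEIF files would be rejected here even though Gemini accepts them.

```python
def detect_image_mime(data: bytes) -> str:
    """Guess an image MIME type from magic bytes (PNG, JPEG, WEBP only)."""
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # JPEG files start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WEBP is a RIFF container with "WEBP" at bytes 8-11.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported or unrecognized image format")


print(detect_image_mime(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```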

Troubleshooting

PERMISSION_DENIED or UNAUTHENTICATED: Check your Gemini API key and model permissions.
INVALID_ARGUMENT: Make sure the prompt is a string and the image is in a valid file format.
UNSUPPORTED_MEDIA_TYPE: Only supported image formats (such as JPEG and PNG) can be processed.
Other errors: Refer to the Gemini API documentation for detailed troubleshooting guidance.
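Errors from the Gemini API come back as a JSON body of the form `{"error": {"code": ..., "message": ..., "status": ...}}`, where `status` is one of Google's canonical error codes. A sketch of turning that body into a troubleshooting hint; `explain_error` and the `HINTS` table are hypothetical, and the `RESOURCE_EXHAUSTED` entry for quota problems is an assumption based on Google's standard error codes.

```python
import json

# Hypothetical hint table keyed by the "status" field of the error object.
HINTS = {
    "PERMISSION_DENIED": "Check your Gemini API key and model permissions.",
    "UNAUTHENTICATED": "Check your Gemini API key and model permissions.",
    "INVALID_ARGUMENT": "Make sure the prompt is a string and the image is in a valid file format.",
    "RESOURCE_EXHAUSTED": "Check the quota available to your API key.",  # assumption
}


def explain_error(body_text: str) -> str:
    """Map a Gemini error response body to 'STATUS: hint'."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return "UNKNOWN: error body is not JSON; see the Gemini API documentation."
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "UNKNOWN: no error object; see the Gemini API documentation."
    status = error.get("status", "UNKNOWN")
    # Fall back to the generic advice for statuses without a specific hint.
    return f"{status}: {HINTS.get(status, 'See the Gemini API documentation.')}"


print(explain_error('{"error": {"code": 403, "message": "denied", "status": "PERMISSION_DENIED"}}'))
```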

Version notes

2025-06-16 16:45:39

Current

2025-06-30T11:09:30.000+00:00