Gemini -> Image Understanding

Action summary

Xano / Google Gemini

Overview

This action analyzes images with the Gemini API. You provide an image file and a prompt (your question or instruction), and the selected Gemini model interprets, describes, or extracts insights from the image content.

Inputs

Name            Type             Required  Description
gemini_api_key  text (registry)  Yes       Your Gemini API key (from the settings registry).
model           text             Yes       The Gemini model to use (e.g., gemini-1.5-flash, gemini-pro-vision).
prompt          text             Yes       The question, instruction, or prompt about the image.
image           file resource    Yes       The image file you want Gemini to analyze.

Function Stack

  1. Create file resource from image
    • Reads the file payload from the provided image input.
  2. Gemini API Request
    • Sends a POST request to:
      https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={gemini_api_key}
    • The request body includes both the raw image data and the prompt, following the Gemini API’s content structure.
  3. Precondition
    • Verifies that the response status is 200 before continuing.
  4. Response
    • Returns the result from the response: gemini_api.response.result.
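
For readers who want to see roughly what the request step amounts to outside Xano, here is a minimal Python sketch. It assumes the public generateContent REST shape (a contents list whose parts hold the prompt text and a base64-encoded inline_data image); the helper names build_request and send_request are hypothetical, and the action's actual internals may differ.

```python
import base64
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model, api_key, prompt, image_bytes, mime_type="image/jpeg"):
    """Return (url, body) for a generateContent call with one prompt and one image."""
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Inline image bytes travel as base64 text in the JSON body.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return url, body


def send_request(url, body, timeout=60):
    """POST the JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses, which loosely
    # mirrors the precondition step that requires a 200 status.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes the body easy to inspect or log (with the key redacted) before anything is sent.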

Example Usage

Request

{
  "gemini_api_key": "AIzaSyD...",
  "model": "gemini-1.5-flash",
  "prompt": "Describe what is happening in this image.",
  "image": "(attach image file)"
}

Response

{
  "result": "The image shows a group of people hiking on a mountain trail under a clear sky."
}
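
The example above shows the result as plain text. If the action instead hands back the raw Gemini response body, the description sits inside the first candidate's text parts. The sketch below assumes the standard generateContent response layout (candidates, content, parts, text); extract_text is a hypothetical helper, not part of the action.

```python
def extract_text(response):
    """Join the text parts of the first candidate in a generateContent response."""
    candidates = response.get("candidates") or []
    if not candidates:
        # No candidates can occur, for example, when the prompt was blocked.
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


# Illustrative raw response, trimmed to the fields used here.
raw = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {"text": "The image shows a group of people hiking "},
                    {"text": "on a mountain trail under a clear sky."},
                ],
            }
        }
    ]
}

result = {"result": extract_text(raw)}
```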

Notes

  • The model parameter lets you select among the available Gemini models that support vision tasks (for example, gemini-1.5-flash).
  • The image input must be an actual image file (PNG, JPEG, etc.).
  • Ensure your Gemini API key has the necessary permissions and quota for vision/model usage.
  • This action handles encoding and passing the image to Gemini as required by the API.
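
When preparing files before they reach the action, a quick extension check can catch unsupported formats early. This is a sketch only: the set of accepted types below is an assumption based on the image formats the Gemini documentation has listed (PNG, JPEG, WEBP, HEIC, HEIF) and should be verified against the current documentation; image_mime_type is a hypothetical helper.

```python
import os

# Assumed mapping of file extensions to MIME types accepted for image input.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def image_mime_type(filename):
    """Return the MIME type for a supported image file, or raise ValueError."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        return EXTENSION_TO_MIME[ext]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {ext or '(none)'}") from None
```

An extension check does not prove the file content is a valid image, so the API may still reject a mislabeled file.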

Troubleshooting

  • PERMISSION_DENIED or UNAUTHORIZED: Check your Gemini API key and model permissions.
  • INVALID_ARGUMENT: Make sure your prompt is a string and your image is a valid file format.
  • UNSUPPORTED_MEDIA_TYPE: Only supported image formats (like jpeg, png) can be processed.
  • Other errors: Refer to the Gemini API documentation for detailed troubleshooting guidance.
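
The checklist above can be turned into a small lookup for logging or user-facing messages. The sketch assumes the error body follows the usual Google API layout, a JSON object with an error field containing code, message, and status; the status names are taken from the list above, and troubleshooting_hint is a hypothetical helper.

```python
import json

DEFAULT_HINT = "Refer to the Gemini API documentation for detailed troubleshooting guidance."

# Hints keyed by the status names used in the troubleshooting list.
HINTS = {
    "PERMISSION_DENIED": "Check your Gemini API key and model permissions.",
    "UNAUTHORIZED": "Check your Gemini API key and model permissions.",
    "INVALID_ARGUMENT": "Make sure your prompt is a string and your image is a valid file format.",
    "UNSUPPORTED_MEDIA_TYPE": "Only supported image formats (like jpeg, png) can be processed.",
}


def troubleshooting_hint(error_body):
    """Map the JSON text of an error response to a troubleshooting hint."""
    try:
        status = json.loads(error_body)["error"]["status"]
        return HINTS.get(status, DEFAULT_HINT)
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like the assumed error layout.
        return DEFAULT_HINT
```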

References

  • Gemini API: Content Understanding
  • Gemini API: Overview & Vision Models

Version notes

2025-06-16 16:45:39
Current
2025-06-30T11:09:30.000+00:00