Gemini -> Audio Understanding

Action summary

Xano / Google Gemini

Overview

This action enables you to perform audio understanding and analysis using the Gemini API. By referencing an existing, uploaded audio file and providing a question, you can instruct Gemini to analyze, summarize, or extract insights from the audio content.

Note: The file_uri must refer to an audio file that has already been uploaded and is accessible to the Gemini API. This action does not perform file uploads.

Inputs

Name            Type  Required  Description
gemini_api_key  text  Yes       Your Gemini API key (from settings registry).
file_uri        text  Yes       The URI of an audio file already uploaded to Google Gemini.
question        text  Yes       Your prompt, task, or question for Gemini about the audio content.
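
Since all three inputs are required text values, a caller may want to check them before invoking the action. The following is a minimal sketch of such a check, not part of the action itself; the helper name missing_inputs is hypothetical, and treating whitespace-only values as missing is an assumption.

```python
# Names of the required inputs, taken from the table above.
REQUIRED_INPUTS = ("gemini_api_key", "file_uri", "question")


def missing_inputs(inputs):
    """Return the required input names that are absent or blank.

    Hypothetical helper: whitespace-only values count as missing (assumption).
    """
    return [
        name
        for name in REQUIRED_INPUTS
        if not str(inputs.get(name) or "").strip()
    ]
```

An empty list means every required input is present.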

Function Stack

  1. Talk to Uploaded Content
    • Uses the given file_uri and question to instruct Gemini to analyze the specific audio file.
  2. API Request
    • Posts a request to
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=YOUR_API_KEY
      with a payload such as:

{
  "contents": [
    {
      "parts": [
        { "file_data": { "file_uri": "<file_uri>" } },
        { "text": "<question>" }
      ]
    }
  ]
}

    • Authenticates with your Gemini API key.
    • Parses Gemini's response.
  3. Precondition
    • Ensures the API response status is 200.
  4. Response
    • Returns the result from gemini_api.response.result.
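
The function stack above can be approximated outside Xano with a short Python sketch. It mirrors the steps as documented (build the payload, POST it, require status 200, read the result) and is not the action's actual implementation. The function names are hypothetical. Reading the answer from candidates[0].content.parts assumes the standard generateContent response shape; how Xano maps that onto gemini_api.response.result is not shown in this document.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the function stack above.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-flash:generateContent"
)


def build_request(api_key, file_uri, question):
    """Return (url, payload) matching the documented request."""
    url = GEMINI_URL + "?" + urllib.parse.urlencode({"key": api_key})
    payload = {
        "contents": [
            {
                "parts": [
                    {"file_data": {"file_uri": file_uri}},
                    {"text": question},
                ]
            }
        ]
    }
    return url, payload


def extract_text(body):
    """Join the text parts of the first candidate (assumed response shape)."""
    parts = body["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def ask_about_audio(api_key, file_uri, question):
    """POST the question and return Gemini's answer as a dict with 'result'."""
    url, payload = build_request(api_key, file_uri, question)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx; the explicit check mirrors
    # the Precondition step for any other non-200 status.
    with urllib.request.urlopen(request) as response:
        if response.status != 200:
            raise RuntimeError(f"Unexpected status {response.status}")
        return {"result": extract_text(json.load(response))}
```

Keeping build_request and extract_text separate from the network call makes the payload and parsing logic easy to check without an API key.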

Example Usage

Request

{
  "gemini_api_key": "AIzaSyD...",
  "file_uri": "https://storage.googleapis.com/path-to-your-audio.wav",
  "question": "Summarize what is being discussed in this meeting recording."
}

Response

{
  "result": "The meeting discusses quarterly revenue figures, marketing strategy, and upcoming project deadlines."
}

Notes

  • The file_uri must point to an audio file that has already been uploaded to Google Gemini and is accessible to the API.
  • Common use cases: summarize calls, extract action items, identify speakers, or answer specific questions about audio content.
  • Ensure your API key’s quota and model access cover the requested usage.

Troubleshooting

  • INVALID_ARGUMENT or NOT_FOUND: Check that your file_uri is correct, exists, and is accessible.
  • PERMISSION_DENIED: Your Gemini API key might be invalid or lack the necessary permissions.
  • UNSUPPORTED_MEDIA_TYPE: Make sure your audio file format is supported.
  • For more help, refer to the Gemini API documentation.
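
As a rough illustration, the status codes above can be mapped to their hints in code. This sketch assumes the error body follows the usual Google API JSON layout, with the status name under error.status; the function name troubleshooting_hint is hypothetical, and the hint texts are taken from the list above.

```python
# Hints keyed by the status names listed in the Troubleshooting section.
HINTS = {
    "INVALID_ARGUMENT": "Check that your file_uri is correct, exists, and is accessible.",
    "NOT_FOUND": "Check that your file_uri is correct, exists, and is accessible.",
    "PERMISSION_DENIED": "Your Gemini API key might be invalid or lack the necessary permissions.",
    "UNSUPPORTED_MEDIA_TYPE": "Make sure your audio file format is supported.",
}

DEFAULT_HINT = "Refer to the Gemini API documentation."


def troubleshooting_hint(error_body):
    """Return a hint for an error body shaped like {"error": {"status": ...}}.

    The body layout is an assumption; unknown statuses get the default hint.
    """
    status = (error_body.get("error") or {}).get("status", "")
    return HINTS.get(status, DEFAULT_HINT)
```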

References

  • Gemini API: Overview & Docs
  • Gemini API: Supported Models

Version notes

2025-06-30 16:36:53
Current
2025-06-30T11:09:29.000+00:00