Gemini -> Audio Understanding

Overview

This action analyzes audio content using the Gemini API. You reference an audio file that has already been uploaded and provide a question, and Gemini analyzes, summarizes, or extracts insights from the audio.

Note: The file_uri must refer to an audio file that has already been uploaded and is accessible to the Gemini API. This action does not perform file uploads.

Inputs

Name	Type	Required	Description
`gemini_api_key`	text	Yes	Your Gemini API key (from settings registry).
`file_uri`	text	Yes	The URI of an audio file already uploaded to Google Gemini.
`question`	text	Yes	Your prompt, task, or question for Gemini about the audio content.

Function Stack

Talk to Uploaded Content
- Uses the given file_uri and question to instruct Gemini to analyze the specific audio file.

API Request
- Sends a POST request to:

https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=YOUR_API_KEY

with a payload such as:

{
  "contents": [
    {
      "parts": [
        { "file_data": { "file_uri": "<file_uri>" } },
        { "text": "<question>" }
      ]
    }
  ]
}

- Authenticates with your Gemini API key.
- Parses Gemini's response.
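
The request step above can be sketched in Python as follows. This is a minimal sketch using only the standard library, not the action's actual implementation. The function names (`build_payload`, `build_request`, `ask_about_audio`) are hypothetical, and the optional `mime_type` field is an assumption that is not part of the payload shown above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the API Request step above.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-flash:generateContent"
)


def build_payload(file_uri, question, mime_type=None):
    """Build the JSON payload: one file_data part and one text part."""
    file_data = {"file_uri": file_uri}
    if mime_type is not None:
        # Assumption: optional hint about the audio format, not in the doc's payload.
        file_data["mime_type"] = mime_type
    return {"contents": [{"parts": [{"file_data": file_data}, {"text": question}]}]}


def build_request(api_key, file_uri, question, mime_type=None):
    """Prepare (but do not send) the POST request, with the key in the query string."""
    query = urllib.parse.urlencode({"key": api_key})
    body = json.dumps(build_payload(file_uri, question, mime_type)).encode("utf-8")
    return urllib.request.Request(
        f"{GEMINI_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_about_audio(api_key, file_uri, question, timeout=60):
    """Send the request and return (HTTP status, parsed JSON body)."""
    request = build_request(api_key, file_uri, question)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```

Keeping `build_request` separate from `ask_about_audio` makes it possible to inspect the URL and payload without making a network call.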

Precondition
- Ensures the API response status is 200.

Response
- Returns the result from gemini_api.response.result.
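
The Precondition and Response steps could look roughly like the sketch below. It assumes the usual `generateContent` response shape, where the generated text sits under `candidates[0].content.parts[].text`. The function name `extract_result` and the `{"result": ...}` wrapper are hypothetical, chosen to mirror the example response in this document rather than the action's internal `gemini_api.response.result` mapping.

```python
def extract_result(status, body):
    """Check the status precondition, then pull the generated text out of the body."""
    # Precondition: only a 200 response is treated as success.
    if status != 200:
        raise RuntimeError(f"Gemini request failed with HTTP status {status}")

    candidates = body.get("candidates") or []
    if not candidates:
        raise ValueError("Gemini response contains no candidates")

    # Assumption: the answer may be split across several text parts; join them.
    parts = candidates[0].get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)
    return {"result": text}
```

A caller would pass the status and parsed JSON body from the API request, for example `extract_result(200, response_body)`.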

Example Usage

Request

{
  "gemini_api_key": "AIzaSyD...",
  "file_uri": "https://storage.googleapis.com/path-to-your-audio.wav",
  "question": "Summarize what is being discussed in this meeting recording."
}

Response

{
  "result": "The meeting discusses quarterly revenue figures, marketing strategy, and upcoming project deadlines."
}

Notes

The file_uri must point to an audio file that has already been uploaded to Google Gemini and that the Gemini API can access.
Common use cases: summarize calls, extract action items, identify speakers, or answer specific questions about audio content.
Ensure your API key’s quota and model access cover the requested usage.

Troubleshooting

INVALID_ARGUMENT or NOT_FOUND: Check that your file_uri is correct, exists, and is accessible.
PERMISSION_DENIED: Your Gemini API key might be invalid or lack the necessary permissions.
UNSUPPORTED_MEDIA_TYPE: Make sure your audio file format is supported.
For more help, refer to the Gemini API documentation.
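
The troubleshooting hints above can be attached to a failed response with a small lookup, sketched below. This assumes the error body follows the common Google API envelope, `{"error": {"code": ..., "message": ..., "status": ...}}`; the names `HINTS` and `explain_error` are hypothetical.

```python
# Hints copied from the Troubleshooting list above, keyed by error status.
HINTS = {
    "INVALID_ARGUMENT": "Check that your file_uri is correct, exists, and is accessible.",
    "NOT_FOUND": "Check that your file_uri is correct, exists, and is accessible.",
    "PERMISSION_DENIED": "Your Gemini API key might be invalid or lack the necessary permissions.",
    "UNSUPPORTED_MEDIA_TYPE": "Make sure your audio file format is supported.",
}


def explain_error(body):
    """Turn an error response body into a one-line, human-readable hint."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    status = error.get("status", "UNKNOWN")
    hint = HINTS.get(status, "Refer to the Gemini API documentation.")
    message = error.get("message", "")
    suffix = f" (API message: {message})" if message else ""
    return f"{status}: {hint}{suffix}"
```

Statuses that are not in the table fall back to the general pointer to the Gemini API documentation.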

Version notes

2025-06-30 16:36:53

Current

2025-06-30T11:09:29.000+00:00
