
Add Microphone Input to MultimodalTextbox #10186

Merged: 19 commits merged into main from microphone, Dec 17, 2024

dawoodkhan82 (Collaborator) commented Dec 11, 2024

Description

Adds microphone support to the multimodal textbox. To test, run the demo: chatbot_multimodal.py

[Attached video: Screen.Recording.2024-12-11.at.6.48.29.PM.mov]

Closes: #9094
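The description above adds a microphone source to the textbox. As a minimal sketch of what an app's handler might do with a submission, the helper below separates it into text, audio files, and other files. It assumes the submitted value is a dict with "text" and "files" keys (the MultimodalTextbox value shape) and, as an unconfirmed assumption, that a microphone recording arrives as an ordinary file path in "files". The names `split_message`, `MultimodalMessage`, and `AUDIO_EXTENSIONS` are hypothetical and not part of Gradio.

```python
from pathlib import Path
from typing import List, Tuple, TypedDict

# Hypothetical set of suffixes treated as audio; adjust to whatever
# format the browser recorder actually produces.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".webm", ".m4a", ".flac"}


class MultimodalMessage(TypedDict):
    """Assumed shape of one MultimodalTextbox submission."""

    text: str
    files: List[str]


def split_message(message: MultimodalMessage) -> Tuple[str, List[str], List[str]]:
    """Return (text, audio_files, other_files) for one submission.

    Assumes a microphone recording shows up as an ordinary file path
    in message["files"], next to any uploaded files.
    """
    audio: List[str] = []
    other: List[str] = []
    for path in message["files"]:
        # Classify by file suffix only; the files are never opened.
        if Path(path).suffix.lower() in AUDIO_EXTENSIONS:
            audio.append(path)
        else:
            other.append(path)
    return message["text"], audio, other


if __name__ == "__main__":
    print(split_message({"text": "hi", "files": ["/tmp/rec.wav", "/tmp/cat.png"]}))
    # ('hi', ['/tmp/rec.wav'], ['/tmp/cat.png'])
```

Classifying by suffix is a deliberate simplification; a production handler might check MIME types instead.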

🎯 PRs Should Target Issues

Before you create a PR, please check to see if there is an existing issue for this change. If not, please create an issue before you create this PR, unless the fix is very small.

Not adhering to this guideline will result in the PR being closed.

Testing and Formatting Your Code

  1. PRs will only be merged if tests pass on CI. We recommend at least running the backend tests locally: set up your Gradio environment locally and run the backend tests with bash scripts/run_backend_tests.sh

  2. Please run these bash scripts to automatically format your code: bash scripts/format_backend.sh, and (if you made any changes to non-Python files) bash scripts/format_frontend.sh

gradio-pr-bot (Collaborator) commented Dec 11, 2024

🪼 branch checks and previews

| Name | Status | URL |
|---|---|---|
| Spaces | ready! | Spaces preview |
| Website | ready! | Website preview |
| Storybook | ready! | Storybook preview |
| 🦄 Changes | detected! | Details |

Install Gradio from this PR

pip install https://gradio-pypi-previews.s3.amazonaws.com/509488017882094dd3b3e3834ea4b5f35819db5f/gradio-5.9.1-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@509488017882094dd3b3e3834ea4b5f35819db5f#subdirectory=client/python"

Install Gradio JS Client from this PR

npm install https://gradio-npm-previews.s3.amazonaws.com/509488017882094dd3b3e3834ea4b5f35819db5f/gradio-client-1.8.0.tgz

Use Lite from this PR

<script type="module" src="https://gradio-lite-previews.s3.amazonaws.com/509488017882094dd3b3e3834ea4b5f35819db5f/dist/lite.js"></script>

gradio-pr-bot (Collaborator) commented Dec 11, 2024

🦄 change detected

This Pull Request includes changes to the following packages.

| Package | Version |
|---|---|
| @gradio/audio | minor |
| @gradio/multimodaltextbox | minor |
| gradio | minor |
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Add Microphone Input to MultimodalTextbox

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.


dawoodkhan82 and others added 2 commits December 11, 2024 19:14
abidlabs (Member) commented:

After building the frontend and running python demo/chatbot_multimodal/run.py, the styling looks off for me:

[screenshot]

Are you seeing this?

dawoodkhan82 (Collaborator, Author) commented:

@abidlabs I see that for a split second as the page loads, but it looks fine afterwards.
[Screenshot 2024-12-12 at 11:44:06 AM]

abidlabs (Member) commented:

Have you built the frontend? The UI stays broken for me. The console looks like this:

[screenshot]

dawoodkhan82 and others added 2 commits December 12, 2024 14:59
abidlabs (Member) commented:

Thanks @dawoodkhan82, loading now and works well. (Passed along some UI feedback in our 1-1)

dawoodkhan82 and others added 5 commits December 13, 2024 15:39

abidlabs (Member) commented Dec 16, 2024

Tested this out @dawoodkhan82 and works well! Just a few suggestions, mostly around design and documentation.

(1) The padding around the "Record" box looks off. The left border should be aligned with the left of the microphone icon underneath. The padding on the top should be increased to match the vertical gap between the box and the icons.

[screenshot]

(2) Bug: If I set sources=["microphone"], the upload file icon still shows up.
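Issue (2) amounts to a simple visibility rule: a toolbar control should render only if its source is listed. The following is a minimal Python sketch of that rule, not Gradio's actual frontend code. It assumes the accepted source values are "upload" and "microphone" and that `sources=None` falls back to ["upload"]; `visible_controls`, `KNOWN_SOURCES`, and `DEFAULT_SOURCES` are hypothetical names.

```python
from typing import Iterable, List, Optional, Union

# Assumed accepted values, in the order the controls would be drawn.
KNOWN_SOURCES = ("upload", "microphone")
# Assumed fallback when sources is None.
DEFAULT_SOURCES = ("upload",)


def visible_controls(sources: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Return the toolbar controls to render for a given `sources` value.

    A control is shown if and only if its source was requested, so
    sources=["microphone"] yields the microphone button and no upload icon.
    """
    if sources is None:
        sources = DEFAULT_SOURCES
    elif isinstance(sources, str):
        # Accept a single source given as a bare string.
        sources = [sources]
    requested = set(sources)
    unknown = requested - set(KNOWN_SOURCES)
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    return [s for s in KNOWN_SOURCES if s in requested]


print(visible_controls(["microphone"]))  # ['microphone']
```

Rejecting unknown values early is a design choice here; it keeps a typo in `sources` from silently hiding every control.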

(3) I created an example using gr.ChatInterface and for some reason, the "record" button always appears, even before you click on the microphone icon. Code to reproduce:

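```python
import gradio as gr

def echo_sound(msg, history):
    # With multimodal=True, msg is a dict with "text" and "files" keys
    print("msg", msg)
    return "x"

# Only the microphone source is requested, yet "Record" shows before the mic icon is clicked
gr.ChatInterface(echo_sound, textbox=gr.MultimodalTextbox(sources=["microphone"]), multimodal=True).launch()
```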

Side note: you could make a second click on the microphone button hide the record button; then you wouldn't need the "X" icon.
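The side note suggests replacing the "X" icon with a toggle. Below is a hypothetical state sketch of that behavior, written in Python only for consistency with the rest of the thread (the real control is frontend code). The record control starts hidden, which would also address issue (3); the first microphone click reveals it, and a second click hides it again. `MicToggle` and its methods are illustrative names, and stopping an in-progress recording on hide is an assumed design choice.

```python
from dataclasses import dataclass


@dataclass
class MicToggle:
    """Hypothetical state for the textbox's microphone button and record control."""

    recorder_visible: bool = False  # hidden until the microphone icon is clicked
    recording: bool = False

    def click_microphone(self) -> None:
        """First click shows the record control; a second click hides it."""
        if self.recorder_visible:
            # Hiding also stops any recording in progress (assumed behavior).
            self.recorder_visible = False
            self.recording = False
        else:
            self.recorder_visible = True

    def click_record(self) -> None:
        """Start or stop recording; ignored while the control is hidden."""
        if self.recorder_visible:
            self.recording = not self.recording


mic = MicToggle()
mic.click_microphone()  # record control appears
mic.click_record()      # recording starts
mic.click_microphone()  # control hides and recording stops; no "X" needed
print(mic)  # MicToggle(recorder_visible=False, recording=False)
```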

(4) It would be great to document this feature in both the ChatInterface guide and the custom Chatbot guide, and to add a Storybook story for the MultimodalTextbox component.

Otherwise lgtm!

dawoodkhan82 merged commit 9b17032 into main on Dec 17, 2024
22 of 23 checks passed
dawoodkhan82 deleted the microphone branch on December 17, 2024, 22:15
Successfully merging this pull request may close these issues.

Add microphone input to gr.MultimodalTextbox
3 participants