multimodal-large-language-models

Here are 137 public repositories matching this topic...

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

multi-modality instruction-following in-context-learning large-language-models chain-of-thought instruction-tuning visual-instruction-tuning large-vision-language-model multimodal-instruction-tuning large-vision-language-models multimodal-large-language-models multimodal-in-context-learning multimodal-chain-of-thought

Updated Dec 6, 2024

X-PLUG / MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

android agent harmony ios app gui automation mobile copilot multimodal mobile-agents mllm multimodal-large-language-models gpt4v multimodal-agent

Updated Sep 26, 2024
Python

modelscope / modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

agent data-science code chatbot android-application multi-agents rag mobile-agents gpts llm multimodal-large-language-models qwen assistantapi chatglm-4 open-gpts mobile-agent codexgraph data-science-assistant

Updated Dec 4, 2024
Python

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

speech-to-text speech-to-speech large-language-models multimodal-large-language-models speech-language-model speech-interaction

Updated Nov 14, 2024
Python

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

computer-vision chatbot representation-learning clip dino large-language-models llms instruction-tuning mllm multimodal-large-language-models

Updated Oct 30, 2024
Python

YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

text-to-image image-editting large-language-models multimodal-large-language-models

Updated Oct 10, 2024
Jupyter Notebook

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

multimodal table-understanding document-understanding mllm multimodal-large-language-models chart-understanding

Updated Sep 28, 2024
Python

VITA-MLLM / VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

multimodal-large-language-models large-multimodal-models

Updated Oct 24, 2024
Python

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent tool-use large-language-models multimodal-large-language-models large-multimodal-models

Updated Feb 1, 2024
Python

BradyFU / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

multimodality hallucination hallucinations large-language-models llm mllm multimodal-large-language-models

Updated Jun 17, 2024
Python

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

speech-processing audio-processing peft music-processing large-language-model multimodal-large-language-models

Updated Dec 4, 2024
Python

richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

medical-imaging multimodal-learning visual-question-answering multimodal-deep-learning large-language-models medical-report-generation multimodal-large-language-models large-multimodal-models

Updated Nov 11, 2024

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot multimodality multimodal vision-language-model multimodal-large-language-models vision-language-learning qwen llama3

Updated Nov 26, 2024
Python

rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

computer-vision dataset llama large-language-models long-video-understanding multimodal-large-language-models

Updated Dec 4, 2024
Python

Henry-23 / VideoChat

Real-time voice-interactive digital human, supporting an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable without training, voice cloning is supported, and first-packet latency is as low as 3s.

streaming real-time end-to-end tts lip-sync dialogue-systems asr talking-head digital-human multimodal-large-language-models musetalk gradio-python-app

Updated Nov 15, 2024
Python

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

video mme large-language-models large-vision-language-models multimodal-large-language-models video-mme

Updated Jun 18, 2024

deepglint / unicom

MLCD & UNICOM: Large-Scale Visual Representation Model

embodied-artificial-intelligence vision-transformer large-language-models large-sacle-pretrained-model laion400m multimodal-large-language-models

Updated Nov 26, 2024
Python

SkyworkAI / Vitron

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

segmentation mllm multimodal-large-language-models

Updated Oct 20, 2024
Python

Paranioar / Awesome_Matching_Pretraining_Transfering

A paper list covering large multi-modality models, parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching, intended as a preliminary overview.

tutorial awesome-list vision-and-language video-text-recognition cross-modal-retrieval visual-semantic-embedding image-text-matching video-text-retrieval image-text-retrieval multimodal-pretraining large-language-models large-vision-language-models multimodal-large-language-models memory-efficient-tuning parameter-efficient-fine-tuning large-vision-models

Updated Jul 11, 2024