A framework for few-shot evaluation of language models.
Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration (see the sketch after this list).
🐢 Open-Source Evaluation & Testing for ML & LLM systems
The LLM Evaluation Framework
This repository accompanies our RecSys 2019 article "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends.
Data-Driven Evaluation for LLM-Powered Applications
Metrics to evaluate the quality of responses of your Retrieval-Augmented Generation (RAG) applications (a rough example metric is sketched after this list).
Python SDK for running evaluations on LLM-generated responses
The official evaluation suite and dynamic data release for MixEval.
A research library for automating experiments on Deep Graph Networks
AI Data Management & Evaluation Platform
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Expressive is a cross-platform expression parsing and evaluation framework. It targets .NET Standard, so it runs on practically any platform.
Test and evaluate LLMs and model configurations across all the scenarios that matter for your application.
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Evaluation suite for large-scale language models.
Multilingual Large Language Models Evaluation Benchmark
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
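Several of the projects above, including the declarative-config tool mentioned earlier, share the same core loop: define test cases as data, run them through a model, and assert on the outputs. The sketch below shows that pattern in plain Python; every name in it (`TestCase`, `run_eval`, `echo_model`) is invented for illustration and is not the API of any listed framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    prompt: str
    must_contain: str  # simple "contains" assertion on the model output


# Declarative test suite: plain data, no framework-specific schema implied.
SUITE = [
    TestCase(prompt="What is the capital of France?", must_contain="Paris"),
    TestCase(prompt="Translate 'hello' to Spanish.", must_contain="hola"),
]


def run_eval(model: Callable[[str], str], suite: list[TestCase]) -> float:
    """Run every test case through `model` and return the pass rate."""
    passed = 0
    for case in suite:
        output = model(case.prompt)
        if case.must_contain.lower() in output.lower():
            passed += 1
    return passed / len(suite)


if __name__ == "__main__":
    # Stand-in model; a real harness would call GPT, Claude, Llama, etc. here.
    def echo_model(prompt: str) -> str:
        return "Paris is the capital of France." if "France" in prompt else "hola"

    print(f"pass rate: {run_eval(echo_model, SUITE):.0%}")
```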
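The RAG-focused libraries above score generated answers against the retrieved context. One common family of metrics asks how much of the answer is actually supported by that context; the following token-overlap version is a deliberately crude, purely illustrative proxy and not the formula used by any listed library, which typically rely on LLM judges or NLI models instead.

```python
import re


def support_score(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that also appear in the retrieved contexts.

    1.0 means every answer token occurs somewhere in the context,
    0.0 means none do.
    """
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(tokenize(c) for c in contexts))
    return len(answer_tokens & context_tokens) / len(answer_tokens)


# Example: a partially supported answer scores between 0 and 1.
print(support_score(
    "The Eiffel Tower is 330 metres tall and was built in 1889.",
    ["The Eiffel Tower was completed in 1889.", "It stands 330 metres tall."],
))
```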