Evaluate language models through red-teaming
Haystack is a model evaluation and red-teaming suite designed to assess the performance and security of large language models (LLMs) through red-teaming.
Used and trusted by a growing community
- Models Evaluated: 0
- Attack Success Rate: 0
- GitHub Stars: 3
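As context for the Attack Success Rate figure above, here is a minimal sketch of how such a metric is typically computed in red-teaming evaluations. The function and data names are hypothetical illustrations, not Haystack's actual API; success judgments are assumed to come from a separate evaluation step.

```python
def attack_success_rate(attempts):
    """Fraction of red-team attempts that elicited a disallowed response.

    `attempts` is a list of (attempt_id, succeeded) pairs, where
    `succeeded` is True if the model produced a harmful output.
    """
    if not attempts:
        return 0.0
    successes = sum(1 for _, succeeded in attempts if succeeded)
    return successes / len(attempts)

# Hypothetical example data: four red-team attempts, two of which succeeded.
attempts = [
    ("prompt-injection-001", False),
    ("jailbreak-roleplay-002", True),
    ("encoding-bypass-003", False),
    ("prefix-attack-004", True),
]
print(attack_success_rate(attempts))  # 0.5
```

A lower attack success rate indicates a model that resisted more of the adversarial attempts in the suite.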