A benchmarking framework and dataset for evaluating the safety and robustness of Large Language Models (LLMs) against red-teaming attacks and adversarial prompts.
Defensibility
Stars: 59
Forks: 10
ALERT is a research artifact (paper-code pairing) from Babelscape that provided an early, comprehensive framework for LLM safety benchmarking. However, with only 59 stars and 10 forks accumulated over two years, it has not achieved the 'standard' status a benchmark needs to be defensible. The LLM benchmark space turns over quickly: newer suites such as HarmBench, JailbreakBench, and Stanford's HELM (Holistic Evaluation of Language Models) enjoy more active community support and broader coverage. Frontier labs like OpenAI and Anthropic are also building internal automated red-teaming (ART) systems that are significantly more sophisticated than static datasets. The project's low velocity (0.0/hr) and age (735 days) suggest it is a stagnant reference rather than a living tool, leaving it highly susceptible to displacement by newer, more diverse adversarial datasets that cover post-GPT-4 jailbreaking techniques.
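For concreteness, the quoted velocity figure is consistent with the stats above. The following is a minimal sketch, assuming "velocity" here means stars accrued per hour over the repo's lifetime (the metric's exact definition is not stated); RepoStats and velocity_per_hour are hypothetical names used only for illustration.

    from dataclasses import dataclass

    @dataclass
    class RepoStats:
        stars: int
        forks: int
        age_days: float

    def velocity_per_hour(stats: RepoStats) -> float:
        # Average stars gained per hour since the repo was created
        # (assumed definition; the source does not specify the units).
        return stats.stars / (stats.age_days * 24)

    alert = RepoStats(stars=59, forks=10, age_days=735)
    print(f"velocity: {velocity_per_hour(alert):.1f}/hr")  # velocity: 0.0/hr

Under this assumption, 59 stars over roughly 17,640 hours works out to about 0.003 stars/hr, which rounds to the 0.0/hr cited above.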
TECH STACK
INTEGRATION: reference_implementation
READINESS