Evaluation Bundesbot

Proof of concept for an automatic evaluation of an ITZBund Chatbot using a language model

This project provides ideas for the automatic evaluation of an instance of the ITZBund chatbot (also known as Bundesbot, see e.g. https://www.itzbund.de/DE/itloesungen/standardloesungen/chatbots/chatbots.html). It should be seen as a proof of concept and is meant to give ideas to others who are tasked with a similar problem. In this project, we use a large language model (LLM) to generate questions that might be sent to the chatbot. The newly generated questions include variations of the pre-defined questions with or without typos, translations, and new questions on a given topic. The answers are retrieved from the chatbot via API calls and checked against the pre-defined set of answers. Additionally, the LLM is used to rate each answer on a scale from 1 to 3 according to whether it is a valid response to the question.
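The evaluation loop described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `evaluate`, `EvalResult`, `normalize`, and `parse_score` are hypothetical, the chatbot and the LLM are passed in as plain callables so that no real API is assumed, and the reading of the scale (1 = not a valid response, 3 = valid response) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalResult:
    question: str
    answer: str
    matches_expected: bool  # answer is contained in the pre-defined answer set
    llm_score: int          # assumed scale: 1 (not valid) to 3 (valid)


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def parse_score(raw: str) -> int:
    """Return the first character 1, 2 or 3 in the LLM reply; raise if none."""
    for ch in raw:
        if ch in "123":
            return int(ch)
    raise ValueError(f"no score between 1 and 3 in LLM reply: {raw!r}")


def evaluate(
    questions: Iterable[str],
    expected_answers: Iterable[str],
    ask_chatbot: Callable[[str], str],  # hypothetical wrapper around the chatbot API
    ask_llm: Callable[[str], str],      # hypothetical wrapper around the LLM
) -> list[EvalResult]:
    expected = {normalize(a) for a in expected_answers}
    results = []
    for question in questions:
        answer = ask_chatbot(question)
        prompt = (
            "Rate from 1 to 3 whether the answer is a valid response.\n"
            f"Question: {question}\nAnswer: {answer}\nScore:"
        )
        results.append(
            EvalResult(
                question=question,
                answer=answer,
                matches_expected=normalize(answer) in expected,
                llm_score=parse_score(ask_llm(prompt)),
            )
        )
    return results


# Stand-ins for the real services, used only to make the sketch runnable.
def fake_chatbot(question: str) -> str:
    if "form" in question.lower():
        return "Please use the online form."
    return "Sorry, I did not understand."


def fake_llm(prompt: str) -> str:
    return "Score: 3" if "online form" in prompt else "Score: 1"


results = evaluate(
    ["Where can I find the form?", "Wher can I fnd the from?"],
    ["Please use the online form."],
    fake_chatbot,
    fake_llm,
)
for r in results:
    print(r.question, "|", r.matches_expected, "|", r.llm_score)
```

Passing the chatbot and the LLM in as functions keeps the loop independent of any particular endpoint or model, so the same code can be run against stand-ins like the ones above or against real clients.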

Software-Details

Created at: 04/17/24

Last updated: 09/25/25

Status: concept

Platform: linux

Software-Version: 0.1

License: MIT License
