Chatbot Development using Retrieval-Augmented Generation (RAG) for Domain-Specific Knowledge in Greek Language

Development of a digital assistant using the Retrieval-Augmented Generation (RAG) method for retrieving and providing specialized information in the Greek language (Greek)

  1. MSc thesis
  2. ΝΙΚΟΛΑΟΣ ΡΑΠΤΗΣ
  3. Postgraduate Specialization in Information Systems (ΠΛΣ)
  4. 21 September 2025
  5. English
  6. 106
  7. Καραπιπέρης, Δημήτριος
  8. Αμανατίδης, Δημήτριος | Βερύκιος, Βασίλειος
  9. Retrieval-Augmented Generation, MyDATA, Chatbot, Large Language Models, Greek NLP, API Integration, Hybrid Retrieval, RAGAS Evaluation, LangChain
  10. Postgraduate Specialization in Information Systems (ΠΛΣ)
  11. 2
  12. 9
  13. 34
    • This dissertation presents the design, implementation, and evaluation of a Retrieval-Augmented Generation (RAG) chatbot developed to support domain-specific question answering in the Greek language, with a particular focus on Greece’s official digital accounting and tax platform, myDATA. Large Language Models (LLMs), although powerful, suffer from limitations such as hallucination, fixed knowledge cutoffs, and limited domain specialization. To address these issues, the system integrates a RAG pipeline capable of retrieving and grounding responses in authoritative myDATA documentation, thereby enhancing factual accuracy and regulatory alignment.

      The implementation follows a zero-shot RAG approach, without model fine-tuning, and is built on the LangChain framework, OpenAI’s GPT-4o, the Chroma vector store, and Streamlit for deployment. A hybrid retrieval strategy combines dense embeddings with BM25-based sparse search, and is further enhanced by reranking through Cohere’s neural API and by context expansion techniques. A multilingual embedding model enables retrieval over the Greek documentation even for queries posed in English or any other language, while API integration with the myDATA platform supports real-time document lookup and invoice data retrieval.

      To assess retrieval and response quality, the chatbot underwent both manual and automated evaluations. Under the RAGAS evaluation framework, the system achieved high scores on key metrics, including context recall (0.98), faithfulness (0.94), and F1 score (0.95). Functional testing confirmed support for multilingual interaction, API connectivity, and explicit citation of the source paragraph or document fragment, reinforcing the system’s practical value in accounting and tax-related workflows.

      Overall, the MyDATA Chatbot serves as a robust case study demonstrating the effectiveness of RAG architectures in low-resource language environments. It highlights the potential of combining retrieval techniques, prompt engineering, and API integration to deliver accurate, explainable, and domain-adapted AI systems tailored to public-sector applications.


  14. Hellenic Open University
  15. Attribution-NonCommercial 4.0 International
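
The abstract describes a hybrid retrieval strategy that combines dense embeddings with BM25-based sparse search. The sketch below is only an illustration of that general idea, not the thesis implementation: it uses plain Python rather than LangChain or Chroma, the three-dimensional document vectors are hypothetical stand-ins for the output of a multilingual embedding model, and reciprocal rank fusion is an assumed way of merging the two rankings, since the record does not state how the scores are combined. All function names and sample texts are invented for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranking(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each contributes 1 / (k + rank) per document."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda i: fused[i], reverse=True)


def hybrid_search(query, query_vector, docs, doc_vectors):
    """Rank documents by fusing a sparse (BM25) and a dense (cosine) ranking."""
    docs_tokens = [d.lower().split() for d in docs]
    sparse = ranking(bm25_scores(query.lower().split(), docs_tokens))
    dense = ranking([cosine(query_vector, v) for v in doc_vectors])
    return reciprocal_rank_fusion([sparse, dense])


# Hypothetical sample passages and hand-made "embeddings" (assumptions).
DOCS = [
    "invoice transmission to mydata requires a valid vat number",
    "expense classification must be submitted by the recipient",
    "the api returns a unique mark for each transmitted invoice",
]
DOC_VECTORS = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1], [0.7, 0.2, 0.6]]
QUERY = "invoice mark"
QUERY_VECTOR = [0.8, 0.1, 0.5]

if __name__ == "__main__":
    for doc_id in hybrid_search(QUERY, QUERY_VECTOR, DOCS, DOC_VECTORS):
        print(doc_id, DOCS[doc_id])
```

In a pipeline like the one the abstract outlines, the fused list would typically be passed to a reranker before the top passages are placed in the prompt; that step is omitted here because it depends on an external service.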