Comparison and evaluation of deep learning architectures for object detection in images

ΜΑΡΙΟΣ ΜΠΑΣΙΝΑΣ


Title in other language: Comparison and evaluation of deep learning architectures for object detection in images (English)

Entity type: MSc thesis
Author: ΜΑΡΙΟΣ ΜΠΑΣΙΝΑΣ
Department: Postgraduate Specialization in Information Systems (ΠΛΣ)
Date of work: 16 May 2026
Work language: Greek
Number of pages: 137
Supervisor: Καστανιώτης Δημήτριος
Committee members: Βασίλειος Βερύκιος | Βασίλειος Καψάλης | Δημήτρης Καστανιώτης
Keywords: object detection | Transformer-based detectors | RT-DETRv2 | Image resolution scaling | Gaussian blur (low-pass invariance) | LPIPS
Course / Module: Postgraduate Specialization in Information Systems / ΠΛΣ
Number of international bibliographic references: 67
Description: Includes figures, diagrams, tables
Abstract
- This thesis systematically examines the behavior of a modern transformer-based object detector under controlled input variations, with the aim of cleanly separating spatial localization accuracy from semantic classification correctness. Rather than only reporting aggregate indicators, the study focuses on how performance changes, and in which component, when the spatial or frequency information of the image is degraded.
  RT-DETRv2-R50 is chosen as the model under study and is evaluated on the COCO 2017 validation set with a fully reproducible protocol. The experimental part is organized around two main axes of input degradation: (a) image resolution scaling and (b) low-pass degradation via Gaussian blur. These controlled perturbations are used to examine whether the performance loss is uniform or whether it disproportionately affects specific aspects of detection.
  The results show that both resolution reduction and blurring affect strict spatial-alignment metrics (such as AP75 and AP_S) more strongly, while recall-type metrics change more mildly. This indicates that, under distortion, semantic recognition tends to be preserved better than the geometric fidelity of the bounding boxes. To clarify the degradation mechanism, a diagnostic error decomposition at the bounding-box level is introduced (IoU and normalized center/size errors); it shows that low-pass degradations mainly affect boundary and scale estimation, particularly for small objects. In addition, LPIPS is used as a model-agnostic indicator of distortion intensity, allowing the “severity” of a perturbation to be related to the detector's behavior. Overall, the thesis proposes an interpretable framework that evaluates invariance as geometric consistency of localization, rather than merely as preservation of an aggregate score.
- This thesis systematically investigates the behavior of a modern transformer-based object detector under controlled input perturbations, with a particular focus on disentangling spatial localization accuracy from semantic classification correctness. Rather than reporting only aggregate performance scores, the study examines how the detector’s predictions change, and in which component, when the spatial detail or the frequency content of the input image is altered.
  RT-DETRv2-R50 is selected as the core model and is evaluated on the COCO 2017 validation set using a fully reproducible evaluation protocol. The experimental study is organized around two main axes of input variation: (a) image-resolution scaling and (b) low-pass appearance degradation via Gaussian blur. These controlled perturbations are used to assess whether performance degradation is uniform or whether it disproportionately affects specific aspects of detection.
  Results indicate that both reduced resolution and blur have a stronger impact on strict localization-sensitive metrics (such as AP75 and AP_S), while recall-type measures degrade more mildly. This suggests that under perturbations the model often retains a substantial degree of semantic recognition, whereas the geometric fidelity of bounding boxes progressively deteriorates. To clarify the underlying failure mechanism, the thesis introduces a diagnostic bounding-box error decomposition (IoU and normalized center/size errors), showing that low-pass degradations primarily destabilize boundary and scale estimation, especially for small objects. In addition, LPIPS is employed as a model-agnostic perceptual severity indicator, enabling a direct mapping between perturbation strength and changes in detector behavior. Overall, the work provides an interpretable evaluation framework that views invariance through the lens of localization consistency and geometric stability, rather than solely through a single aggregate detection score.
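The abstract names a diagnostic bounding-box error decomposition (IoU plus normalized center and size errors) but does not give its formulas. The following is a minimal Python sketch of one plausible version, not the thesis's actual definition. It assumes COCO-style [x, y, width, height] boxes, a prediction that has already been matched to its ground-truth box, a center error normalized by the ground-truth diagonal, and size errors expressed as log ratios. The names `BoxErrors`, `iou_xywh` and `decompose` are hypothetical.

```python
import math
from typing import NamedTuple, Sequence

# Assumption: boxes use the COCO layout [x_min, y_min, width, height].
Box = Sequence[float]


class BoxErrors(NamedTuple):
    iou: float          # overlap between prediction and ground truth
    center_err: float   # center offset divided by the ground-truth diagonal
    log_w_ratio: float  # log(pred_w / gt_w); > 0 means the box is too wide
    log_h_ratio: float  # log(pred_h / gt_h); > 0 means the box is too tall


def iou_xywh(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned [x, y, w, h] boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def decompose(pred: Box, gt: Box) -> BoxErrors:
    """Split the error of a matched prediction into overlap, position and scale terms.

    Hypothetical helper: assumes both boxes have positive width and height
    (degenerate ground-truth boxes are expected to be filtered out beforehand).
    """
    pcx, pcy = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gcx, gcy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    diag = math.hypot(gt[2], gt[3])  # normalizer makes the center error scale-free
    return BoxErrors(
        iou=iou_xywh(pred, gt),
        center_err=math.hypot(pcx - gcx, pcy - gcy) / diag,
        log_w_ratio=math.log(pred[2] / gt[2]),
        log_h_ratio=math.log(pred[3] / gt[3]),
    )


if __name__ == "__main__":
    gt = [10, 10, 40, 30]
    # Pure shift by (4, 3): offset length 5, GT diagonal 50, so center_err is 0.1
    # and both size terms are 0; only IoU and center error register the shift.
    print(decompose([14, 13, 40, 30], gt))
    # Pure widening: same center, width doubled, so log_w_ratio = log(2).
    print(decompose([-10, 10, 80, 30], gt))
```

Normalizing the center offset by the ground-truth diagonal keeps the error comparable between small and large objects, and log ratios treat over- and under-estimation of size symmetrically. Both properties are useful when asking whether blur harms boundary and scale estimation more for small objects.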
Publisher: Hellenic Open University
Licence: Attribution-NonCommercial-NoDerivatives 4.0 International

Identifier: 240179
