Αναζήτηση ακραίων τιμών σε γραπτό κείμενο με μεθόδους μηχανικής μάθησης

Κοροβέσης, Βασίλειος

Title in other language Outlier detection in text data with machine learning techniques (English)

Entity type MSc thesis
Author Κοροβέσης, Βασίλειος
Department Postgraduate Specialization in Information Systems (ΠΛΣ)
Date of work 19 September 2021 [2021-09-19]
Work language Greek
Number of Pages 91
Supervisor Κωτσιαντής, Σωτήριος
Keywords Machine learning | Outliers | Anomalies | Kafka | text | data | Java | Python | Keras | Twitter
Number of international bibliographic references 26
Description This dissertation consists of: 21 equations, 47 images and 6 definitions
Abstract
- Whenever data are collected or transferred, they must either be stored for later processing or be processed in real time. Because the volume of such data keeps growing, the ability of a human or a deterministic model to process them is limited. For this reason, an additional intermediate stage has been introduced into data processing, especially when the data are analysed to find the cause of an event, most commonly an error. In such cases, applications like the one developed in this thesis process the data in real time and categorize them, flagging items as outliers/anomalies or potential outliers/anomalies. In cases such as server logs or tweets about a topic or event, where the text can amount to thousands of lines within a few seconds, this step becomes necessary: the data are collected and classified as anomalies or not, so that the final processing is carried out on the unusual data, which in such cases carry the most, and the most useful, information.
Licence Items in Apothesis are protected by copyright, with all rights reserved, unless otherwise indicated.

Αναζήτηση ακραίων τιμών σε γραπτό κείμενο με μεθόδους μηχανικής μάθησης - Identifier: 160268

