NEWS CLASSIFICATION USING MACHINE LEARNING METHODS: THE CASE OF BOSNIAN LANGUAGE

NEWS CLASSIFICATION USING MACHINE LEARNING METHODS: THE CASE OF BOSNIAN LANGUAGE

MSc student: Kanita Krdzalic-Koric

Mentor: Assist. Prof. Dr. Kanita Karadjuzovic-Hadziabdic

Natural language processing (NLP) combines artificial intelligence and linguistics. The Internet provides a means of collecting large volumes of data from which useful knowledge can be extracted. However, information retrieval can be a time-consuming and costly process, which has increased interest in automatic data-processing systems. This thesis focuses on collecting textual content from a Bosnian news portal and classifying it into predefined categories. The data is first converted into a structured form using existing NLP text-processing methods, and the structured textual data is then analyzed with three selected machine learning algorithms. The experiments showed that the best results were achieved by the support vector machine (SVM) algorithm with TF-IDF features. The overall classification accuracy was 90.86%, which is a very high result considering the characteristics of the Bosnian language.
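The abstract does not specify the preprocessing steps, the SVM settings, or the other two algorithms, so the following is only a minimal pure-Python sketch of the general TF-IDF plus linear SVM idea, not the method used in the thesis. It assumes whitespace tokenization, raw term counts weighted by idf(t) = log(N / df(t)) with L2 normalization, and a one-vs-rest linear SVM without a bias term trained by Pegasos-style subgradient descent (`lam=0.01`, 50 epochs). All function names, hyperparameters, category names and headlines are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed minimal preprocessing: lowercase, split on whitespace and keep
    # purely alphabetic tokens. str.isalpha() accepts č, ć, đ, š and ž.
    return [tok for tok in text.lower().split() if tok.isalpha()]


def fit_idf(docs):
    # idf(t) = log(N / df(t)); a term present in every document gets weight 0.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector (raw term count * idf), L2-normalised.
    # Terms not seen during training are ignored.
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    # Dot product of two sparse vectors stored as {term: value} dicts.
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_binary_svm(vectors, labels, lam=0.01, epochs=50):
    # Linear SVM without bias, trained by Pegasos-style subgradient descent on
    # lam/2 * ||w||^2 + mean hinge loss. Labels must be +1 or -1.
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)
            for tok in w:  # shrink step from the L2 regulariser
                w[tok] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move w towards y * x
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def train_one_vs_rest(vectors, labels, **kwargs):
    # One binary SVM per category: that category (+1) against all others (-1).
    return {
        cat: train_binary_svm(vectors, [1 if y == cat else -1 for y in labels], **kwargs)
        for cat in sorted(set(labels))
    }


def predict(models, x):
    # Choose the category whose SVM gives the largest decision value.
    return max(models, key=lambda cat: dot(models[cat], x))


# Hypothetical toy headlines, not data from the thesis.
TRAIN = [
    ("Željezničar pobijedio Veleža na Grbavici", "sport"),
    ("Košarkaši osvojili zlatnu medalju", "sport"),
    ("Parlament usvojio novi izborni zakon", "politika"),
    ("Vijeće ministara imenovalo ambasadore", "politika"),
    ("Inflacija usporila rast cijena hrane", "ekonomija"),
    ("Izvoz drvne industrije porastao", "ekonomija"),
]

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
models = train_one_vs_rest(vectors, labels)

print(predict(models, tfidf_vector(tokenize("Parlament usvojio zakon"), idf)))  # prints: politika
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` and `LinearSVC` would normally be used; the hand-rolled version only makes the term weighting and the hinge-loss update explicit. Because Bosnian is highly inflected, a real pipeline would likely also need stemming or lemmatization, since this sketch treats every inflected form as a separate term.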

International University of Sarajevo

Contact