NEWS CLASSIFICATION USING MACHINE LEARNING METHODS: THE CASE OF BOSNIAN LANGUAGE
MSc student: Kanita Krdzalic-Koric
Mentor: Assist. Prof. Dr. Kanita Karadjuzovic-Hadziabdic
Natural language processing (NLP) combines artificial intelligence and linguistics. The Internet has made it possible to collect large amounts of data from which useful knowledge can be extracted. Information retrieval, however, can be a time-consuming and costly process, which has increased interest in automatic data-processing systems. This thesis focuses on collecting textual content from a Bosnian news portal and classifying it into predefined categories. Existing NLP text-processing methods are used to convert the data into a structured form, which is then analyzed with three selected machine learning algorithms. The experiments showed that the best results were achieved by a support vector machine (SVM) algorithm combined with TF-IDF term weighting, with an overall classification accuracy of 90.86%. This is a very high accuracy given the characteristics of the Bosnian language.
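The abstract does not name the tooling used. To make the reported TF-IDF + SVM combination concrete, the following is a minimal from-scratch sketch rather than the thesis implementation. It assumes a whitespace tokenizer, a smoothed IDF, a linear SVM without a bias term trained by Pegasos-style subgradient descent on the hinge loss, and a one-vs-rest scheme for multiple categories. The category names and toy headlines are hypothetical. A production pipeline would add Bosnian-specific preprocessing and would more likely use a library such as scikit-learn's `TfidfVectorizer` and `LinearSVC`.

```python
import math
from collections import Counter

# Hypothetical toy data: category names and headlines are illustrative only.
TRAIN_DOCS = [
    "vlada parlament zakon",
    "vlada izbori zakon",
    "utakmica gol klub",
    "utakmica trener klub",
]
TRAIN_LABELS = ["politika", "politika", "sport", "sport"]


def tokenize(text):
    # Assumed minimal tokenizer: lowercase and split on whitespace.
    # It does no stemming, so inflected Bosnian forms stay distinct tokens.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF (assumption): idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector (term -> weight); unknown terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=50):
    """Linear SVM (no bias) via Pegasos-style subgradient descent on the hinge loss.

    ys are +1/-1. Examples are visited in a fixed order, so results are reproducible.
    """
    w = {}
    step = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            step += 1
            eta = 1.0 / (lam * step)
            margin = y * dot(w, x)  # margin under the current weights
            # L2 regularisation step: w <- (1 - eta * lam) * w
            scale = 1.0 - eta * lam
            for t in w:
                w[t] *= scale
            # Hinge-loss subgradient step only for examples inside the margin.
            if margin < 1:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def train_one_vs_rest(docs, labels, lam=0.01, epochs=50):
    """Fit IDF weights and one binary SVM per category (one-vs-rest)."""
    idf = fit_idf(docs)
    xs = [tfidf(d, idf) for d in docs]
    models = {
        c: train_binary_svm(xs, [1 if l == c else -1 for l in labels], lam, epochs)
        for c in sorted(set(labels))
    }
    return idf, models


def predict(doc, idf, models):
    """Return the category whose SVM gives the highest decision score."""
    x = tfidf(doc, idf)
    return max(models, key=lambda c: dot(models[c], x))


if __name__ == "__main__":
    idf, models = train_one_vs_rest(TRAIN_DOCS, TRAIN_LABELS)
    print(predict("novi zakon vlada", idf, models))  # prints "politika"
```

One-vs-rest is used here because the SVM is inherently binary. Each category gets its own weight vector, and a new article is assigned to the category with the largest decision score.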