
Machine Learning / Data Science Project on Text Analyzer




A data science / machine learning project. We are all familiar with the SPAM / NOT SPAM feature in our email. This is an excellent use case for text classification. Our project not only classifies text but also first extracts it from websites, so it has two parts. In the first part, our web scraper is given a list of website URLs. The pages are fetched, and the extracted HTML is cleaned by removing unwanted elements such as HTML tags (<p>, <img>, <h1>), commas, and full stops, so that only numerical and textual data is left behind as cleaned data. Once we have clean data for all the websites we provided to the web scraper, we need to save it somewhere so that we don’t have to extract it again. For this purpose, we save all the cleaned data in a text file. At the same time, we also provide labels. At this stage, the data collection and data cleaning part of the project is done. The second part is modeling (machine learning). We convert the clean text into numerical form and provide it to our models, which do the classification for us.
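The cleaning, saving, and text-to-numbers steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names (clean_html, save_rows, load_rows, bag_of_words), the tab-separated file layout, and the choice of simple word counts as the numerical form are all assumptions made for the example. Fetching the pages is left out, so the HTML is passed in as a string.

```python
import re
import string
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Drop tags and punctuation, keeping lowercase words and numbers."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    # Delete commas, full stops, and other ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", text).strip().lower()


def save_rows(path, rows):
    """Write one 'label<TAB>cleaned text' line per page (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in rows:
            f.write(f"{label}\t{text}\n")


def load_rows(path):
    """Read back the (label, text) pairs written by save_rows."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in f if line.strip()]


def bag_of_words(texts):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.split()})
    vectors = []
    for text in texts:
        counts = Counter(text.split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors
```

With these helpers, each scraped page would be passed through clean_html, stored once with its label via save_rows, and later reloaded with load_rows and converted with bag_of_words before training. Word counts are only one possible numerical form; the source does not say which representation the project uses.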

As per Amir: “The special thing about this project is that, unlike many other data science / machine learning projects, it is lightweight. It requires less data to train and it doesn’t require a GPU. The other important thing is that it does all the work from scratch: extracting the text, cleaning it, saving it to a file for later use, converting it into numerical form, and feeding it to a machine learning algorithm. One more thing! Our project doesn’t rely on a single machine learning algorithm. It makes use of different ML algorithms to get better accuracy / results. Last, it is a generic project. It can be trained on textual data of any nature or field. Just provide the URLs and labels, and you are good to go.”
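The source does not say how the different ML algorithms are combined. One common approach is majority voting, sketched below under that assumption. The three "models" are toy hand-written rules standing in for trained classifiers, and every name here (majority_vote, keyword_model, length_model, digit_model) is hypothetical.

```python
from collections import Counter


def majority_vote(models, text):
    """Return the label most models predict; ties go to the earliest vote.

    Expects a non-empty list of callables that each map text to a label.
    """
    votes = [model(text) for model in models]
    counts = Counter(votes)
    best = max(counts.values())
    for vote in votes:
        if counts[vote] == best:
            return vote


# Toy stand-ins for trained classifiers (hypothetical rules, not real models).
def keyword_model(text):
    return "spam" if "free" in text.split() else "not spam"


def length_model(text):
    return "spam" if len(text.split()) < 4 else "not spam"


def digit_model(text):
    return "spam" if any(ch.isdigit() for ch in text) else "not spam"
```

Calling majority_vote([keyword_model, length_model, digit_model], "win free cash") returns "spam", because two of the three rules vote that way. In practice the callables would wrap trained classifiers; scikit-learn's VotingClassifier offers a ready-made version of the same idea.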
For more details, please contact Amir at kamir6132@gmail.com
Want to feature your project at Consuldents? Upload your FYP by signing up at Consuldents and show it to the industry network to leverage your academic work for feedback and funding.

If you are working on something interesting in the graduate education space in Pakistan and would like to share your views, articles, or blog posts, please write to us at info@consuldents.com

About Consuldents.com: Consuldents provides students an opportunity to work on projects and to find scholarships, internships, jobs, and much more. Employers partner with Consuldents to find skilled students like you.