Web Scraping Project
Web Scraping Demo
I did this web scraping project on the Car Wash Chains & Auto Detailing industry, but the same approach can be applied to any industry. I hope this demo, together with the market expansion analysis project I did, can provide support during the deal sourcing stage.
I also built an app version of this project, which is hosted on Shiny Server. Please access it through this link:
Part 1 - Scraping News Articles
In Part 1, I used two Python packages to scrape news articles on Car Wash Chains industry trends in the U.S. The packages are NLTK and Newspaper3k. With a few lines of code, I was able to extract the title, author(s), publication date, full article text, a summary, and the top keywords.
Note: This is a relatively simple use of Natural Language Processing (NLP). The main idea behind NLP is that language models map words into vector components, which is how the tools were able to correctly retrieve the information I needed.
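The extraction described in Part 1 can be sketched roughly as follows. This is a minimal illustration rather than the exact code used in the project: the function names scrape_article and article_to_record and the URL in the usage comment are hypothetical, and downloading the "punkt" tokenizer data is an assumption about what the summary and keyword step needs in your NLTK version.

```python
from typing import Any, Dict


def article_to_record(article: Any) -> Dict[str, Any]:
    """Collect the fields listed in Part 1 from a parsed Newspaper3k Article."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "published": article.publish_date,  # a datetime, or None if not found
        "text": article.text,
        "summary": article.summary,
        "keywords": list(article.keywords),
    }


def scrape_article(url: str) -> Dict[str, Any]:
    """Download, parse, and summarize one news article (hypothetical helper)."""
    # Imported here so article_to_record can be used without these packages.
    import nltk
    from newspaper import Article

    # Assumption: Article.nlp() relies on NLTK tokenizer data.
    nltk.download("punkt", quiet=True)

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract title, authors, publish date, and body text
    article.nlp()       # compute the summary and keywords
    return article_to_record(article)


# Usage (hypothetical URL, replace with a real article link):
# record = scrape_article("https://example.com/car-wash-industry-trends")
# for field, value in record.items():
#     print(f"{field}: {value}")
```

Keeping the field collection in its own small function makes it easy to write the results for several articles into a table later.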
Part 2 - Scraping A List of Top Car Wash Chains
In this section, I used the rvest package in R to scrape a list of the top 50 car wash chains that I found on carwash.com. The same scraping can also be done in Python with packages like BeautifulSoup.
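As a rough Python counterpart to the table scraping in Part 2, here is a minimal sketch. It uses the standard library's html.parser as a dependency-free stand-in, so it is neither the rvest code from the project nor BeautifulSoup. The table layout, the column names, and the sample rows are placeholders I made up for illustration; the real page structure on carwash.com may differ.

```python
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._in_cell = False
        self._cell_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []            # start a fresh row
        elif tag in ("td", "th"):
            self._in_cell = True      # start collecting cell text
            self._cell_parts = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell_parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append("".join(self._cell_parts).strip())
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)


def parse_table(html: str) -> List[List[str]]:
    """Return the table rows found in an HTML string."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder HTML standing in for the downloaded page (not real data).
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Company</th><th>Locations</th></tr>
  <tr><td>1</td><td>Example Chain A</td><td>100</td></tr>
  <tr><td>2</td><td>Example Chain B</td><td>80</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
header, data = rows[0], rows[1:]
for row in data:
    print(dict(zip(header, row)))
```

In practice the HTML string would come from an HTTP request to the page, and a library such as BeautifulSoup would make the cell selection shorter.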
References
Top 50: Tracking Industry Trends Summary
Promoting 101: How to Advertise Your Carwash