Christina Gao
Experienced M&A professional who is also passionate about machine learning & analytics.

Web Scraping Project

Web Scraping Demo

I did this web scraping project on the Car Wash Chains & Auto Detailing industry, but the same approach can be applied to any industry. I hope this demo and my market expansion analysis project will provide support during the deal sourcing stage.

You can view the app version of this project, which I built and hosted on Shiny Server. Please access it through this link:

Web Scraping Project


Part 1 - Scraping News Articles

In Part 1, I used two Python packages to scrape news articles on Car Wash Chains industry trends in the U.S. The packages I used are NLTK and Newspaper3k. Within a few lines of code, I was able to extract the title, author(s), publication date, full article text, a summary, and top keywords.
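A minimal sketch of what those few lines might look like, assuming Newspaper3k and NLTK are installed. The `ArticleRecord` class and the `to_record` and `scrape_article` helpers are hypothetical names introduced for illustration; the `Article` calls (`download`, `parse`, `nlp`) and the attributes read from it are Newspaper3k's own.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ArticleRecord:
    """One scraped article (hypothetical container for the extracted fields)."""
    title: str
    authors: List[str]
    publish_date: Optional[datetime]
    text: str
    summary: str
    keywords: List[str]


def to_record(article) -> ArticleRecord:
    """Copy the extracted fields off a parsed, Article-like object."""
    return ArticleRecord(
        title=article.title,
        authors=list(article.authors),
        publish_date=article.publish_date,
        text=article.text,
        summary=article.summary,
        keywords=list(article.keywords),
    )


def scrape_article(url: str) -> ArticleRecord:
    """Download, parse, and summarize a single article with Newspaper3k."""
    # Imported here so to_record() can be used without the packages installed.
    import nltk
    from newspaper import Article

    # Sentence tokenizer data that nlp() relies on (assumption: newer NLTK
    # releases may ask for a differently named resource).
    nltk.download("punkt", quiet=True)

    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, authors, publish date, body text
    article.nlp()       # populate summary and keywords
    return to_record(article)
```

Calling `scrape_article` with the URL of a news article returns one record; looping over a list of URLs gives a table of articles that can be written to a CSV or loaded into a data frame.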

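Note: This is a relatively simple application of Natural Language Processing (NLP). Many NLP methods rely on language models that map words into vector representations. Newspaper3k's approach is lighter: the title, author(s), and date come from parsing the page's HTML and metadata, while the summary and keywords largely come from frequency-based scoring of the article text, with NLTK supplying the tokenizer.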
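To make the idea of turning words into vectors concrete, here is a toy bag-of-words sketch using only the standard library. It is an illustration under stated assumptions, not how Newspaper3k or NLTK work internally: the stopword list, the sample sentence, and the function names are all made up for this example.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny hand-picked stopword list (assumption; real libraries ship larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def to_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Represent the text as word counts, one component per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def top_keywords(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


sample = "Car wash chains are growing. The car wash industry is consolidating."
vocab = ["car", "wash", "chains", "detailing"]

print(to_vector(sample, vocab))   # one count per vocabulary word
print(top_keywords(sample, n=2))  # the two most frequent tokens
```

Each position in the vector corresponds to one vocabulary word, so two articles can be compared by comparing their vectors, and the largest components double as candidate keywords.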

Selected News Articles

Part 2 - Scraping A List of Top Car Wash Chains

In this section, I used the rvest package in R to scrape a list of the top 50 car wash chains that I found on carwash.com. The same scraping can also be done in Python with packages like BeautifulSoup.
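For the Python route, here is a rough sketch of pulling rows out of an HTML table. To keep it dependency-free it swaps in the standard library's `html.parser` instead of BeautifulSoup, which would do the same job in fewer lines. The sample HTML, column names, and chain names are placeholders; the real page structure on carwash.com is not shown here and would need to be inspected first.

```python
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []


def parse_table(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder markup standing in for a ranking table (not real data).
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Chain</th><th>Locations</th></tr>
  <tr><td>1</td><td>Example Wash A</td><td>100</td></tr>
  <tr><td>2</td><td>Example Wash B</td><td>80</td></tr>
</table>
"""

header, *body = parse_table(SAMPLE_HTML)
records = [dict(zip(header, row)) for row in body]
print(records)
```

For a live page, the HTML would first be fetched (for example with `urllib.request.urlopen`) and the decoded text passed to `parse_table`, subject to the site's terms of use.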

Top 50 Car Wash Chains in the U.S.

References

Newspaper3k Documentation

rvest Documentation

Top 50: Tracking Industry Trends Summary

2022 M&A Predictions

Promoting 101: How to Advertise Your Carwash

Car Wash Offers a Growing Revenue Stream for Retailers

Top 50 U.S. Conveyor Chain List