New York, NY
Fordham University, MSBA, Class of 2020
RPubs
LinkedIn
Resume
GitHub
Email: ngocdinh1410@gmail.com
School email: bdinh@fordham.edu
I am currently a Master’s in Business Analytics student at Fordham University. If you’re visiting this page, you’re welcome to look at my work!
Hi,
I am a student interested in Business Analytics. I came from a Finance job where I worked with various outdated databases that stored a lot of valuable data. I was frustrated by how little had been done to get the most out of customer data, but at the same time I was not technical enough to work on any higher-level analytics project. I made the transition to Business Analytics by enrolling at Fordham University. The learning curve was steep, as I knew little R or Python when I started, but in a year of study I have explored many areas of analytics. By the end of the first semester, three professors and faculty members had entrusted me to work with them on important research in data visualization and web analytics. I created this page to introduce some of the work from the past two semesters that I’m proud of. If you are interested in learning more about me, like any of my projects, or have an interesting idea you would like to collaborate on, send me a message at ngocdinh1410@gmail.com or on LinkedIn!
MovieLens recommendation system
I used the MovieLens dataset to build two recommendation systems: popularity-based and user-based collaborative filtering. The collaborative filtering (CF) model was built with ALS in PySpark.
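To illustrate the user-based idea without a Spark cluster, here is a minimal sketch in plain Python. It is not the ALS model from the project: it swaps in a simple neighbourhood approach (cosine similarity between users), and the `ratings` data, user names, and the `recommend` helper are all hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Jaws": 4},
    "cat": {"Up": 5, "Coco": 4, "Alien": 1},
}

def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, k=2):
    """Rank unseen movies by similarity-weighted ratings from other users."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann"))
```

Because "ann" rates movies much like "bob" does, bob's unseen pick is ranked ahead of cat's. ALS reaches a similar goal differently, by factorizing the rating matrix into user and item factors.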
Million song recommendation system
The models are again popularity-based and user-based collaborative filtering. However, I kept only the user-song pairs with a listen count greater than 1. The rationale is that if a user listened to a song only once, the user might not have liked it.
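The listen-count filter and the popularity ranking can be sketched in a few lines of plain Python. The `plays` triplets, the song IDs, and the `popular_songs` helper are hypothetical, and popularity is assumed here to mean the number of users who replayed a song.

```python
from collections import Counter

# Hypothetical (user, song, listen_count) triplets, one row per user-song pair
plays = [
    ("u1", "song_a", 5),
    ("u2", "song_a", 1),
    ("u2", "song_b", 3),
    ("u3", "song_b", 2),
    ("u3", "song_c", 1),
    ("u1", "song_c", 1),
    ("u4", "song_b", 4),
]

def popular_songs(triplets, min_listens=2, k=2):
    """Return the k songs replayed by the most users."""
    # Drop rows where the user played the song only once
    kept = [(user, song) for user, song, count in triplets if count >= min_listens]
    # Popularity = number of remaining rows per song
    counts = Counter(song for _, song in kept)
    return [song for song, _ in counts.most_common(k)]

print(popular_songs(plays))
```

In this toy data, "song_c" never appears in the ranking because nobody played it more than once.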
-more to be added-
Web parsing was one of the first things I learned, so these tutorials are meant to provide a starting point for beginners in Python.
Simple web scraping with BeautifulSoup
I scraped several pages of reviews from Rotten Tomatoes, using BeautifulSoup to find the appropriate tags.
I could not scrape dynamic websites without some help from Selenium. Selenium’s browser automation is of great assistance when scraping dynamic websites.
In this repository, I go through the process of scraping Rotten Tomatoes reviews and analyzing the review text. For this project, we used a web parser to scrape data for 11 movies from Rotten Tomatoes. We then used this data to build a classifier, which we evaluated. Lastly, we performed exploratory data analysis on the data. For each of the 11 movies, we scraped ten pages of reviews, gathering 2,089 reviews in total. Our data frame consisted of the following columns: Reviewer’s Name, Rating (fresh or rotten), Review, and Date.
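The "find the right tags" step looks roughly like the sketch below. To keep it runnable without extra installs, it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and it parses an inline sample instead of fetching a page. The markup, the `review-text` class name, and the `extract_reviews` helper are hypothetical; the real Rotten Tomatoes tags will differ.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page structure and class names will differ
SAMPLE_PAGE = """
<div class="review"><p class="review-text">Sharp and funny.</p></div>
<div class="review"><p class="review-text">Overlong but charming.</p></div>
<p class="footer">Not a review</p>
"""

class ReviewParser(HTMLParser):
    """Collects the text inside every <p class="review-text"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "review-text":
            self._in_review = True
            self.reviews.append("")

    def handle_data(self, data):
        if self._in_review:
            self.reviews[-1] += data

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

def extract_reviews(html):
    """Return the stripped text of each review paragraph in the HTML."""
    parser = ReviewParser()
    parser.feed(html)
    return [text.strip() for text in parser.reviews]

print(extract_reviews(SAMPLE_PAGE))
```

With BeautifulSoup, the same lookup would be a single call such as `soup.find_all("p", class_="review-text")`, which is why it is the more convenient tool for this job.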
-more to be added-
What we seek to explore with this dataset: can we predict a movie’s popularity from its type, genre, runtime, IMDb rating, number of IMDb votes, critics rating, critics score, audience rating, and Oscar awards obtained (actor, actress, director and picture)? We hope to find a model with good predictive power for a movie’s IMDb rating.
The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. We use this dataset to explore correlations between certain risk factors.
March Madness Prediction Competition

Model Result:

Conclusion:
-more to be added-
The input dataset is a sample of fashion reviews crawled from Vogue during Fashion Week. The main content is the review text. The file also contains metadata such as “year”, “season”, “brand”, and “author of review”. The Jupyter notebook scans through the review texts and performs some basic text analytics to identify key fashion trends in 2016.
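As a rough idea of what "basic text analytics" can mean here, the sketch below counts the most frequent non-stopword terms across a set of reviews. The review snippets, the tiny stopword list, and the `top_terms` helper are hypothetical stand-ins, not the notebook's actual data or method.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for the Vogue review text
reviews = [
    "Velvet coats and velvet boots dominated the runway.",
    "The runway favored oversized coats in muted tones.",
]

# Deliberately tiny stopword list, for illustration only
STOPWORDS = {"the", "and", "in", "a", "of"}

def top_terms(texts, k=3):
    """Return the k most frequent non-stopword terms with their counts."""
    tokens = []
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        tokens.extend(w for w in words if w not in STOPWORDS)
    return Counter(tokens).most_common(k)

print(top_terms(reviews))
```

Grouping the same counts by the "season" or "brand" metadata would be one simple way to compare trends across collections.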
-more to be added-
Designed a relational database that satisfies third normal form (3NF) and meets the business objectives of the art gallery.
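A 3NF design keeps each fact in exactly one table and links tables through keys. The sketch below shows the idea with Python's built-in `sqlite3` module. The `artist`, `artwork`, and `sale` tables and their columns are hypothetical and much smaller than the project's actual schema.

```python
import sqlite3

# Hypothetical art-gallery schema; table and column names are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE artist (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE artwork (
    artwork_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    artist_id  INTEGER NOT NULL REFERENCES artist(artist_id)
);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    artwork_id INTEGER NOT NULL REFERENCES artwork(artwork_id),
    price      REAL NOT NULL
);
""")

# The artist's name is stored once; artworks and sales refer to it by key
conn.execute("INSERT INTO artist VALUES (1, 'Example Artist')")
conn.execute("INSERT INTO artwork VALUES (10, 'Untitled', 1)")
conn.execute("INSERT INTO sale VALUES (100, 10, 2500.0)")

rows = conn.execute("""
    SELECT artist.name, artwork.title, sale.price
    FROM sale
    JOIN artwork USING (artwork_id)
    JOIN artist USING (artist_id)
""").fetchall()
print(rows)
```

Since no non-key column depends on another non-key column, renaming an artist touches a single row, and the join rebuilds the full sale record on demand.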

-more to be added-
Health Awareness SPSS Modeler Project: Analyzing CDC behavioral risk data in SPSS Modeler using clustering, neural networks, and naive Bayes.
IBM Capstone Project - NYC Neighborhood Clustering: I attempted to cluster NYC neighborhoods by crime statistics for my final IBM Data Science project.
-more to be added-