Projects

A selection of projects that I'm not too ashamed of

Analysis of the Newest Social Media Dataset Focused on “Clean” and Less Toxic Social Media: the Pixstory

A three-month project built for a DSCI course. Examined the target market and user behavior through Correlation Analysis, Content Analysis, Sentiment Analysis, Hate Speech & Sarcasm Detection, Clustering Analysis, Image Analysis, Language Detection & Translation, Location Analysis, and Data Visualization. Used multiple Python libraries, a range of NLP and ML algorithms, and Docker containers.

Time Series Classification for Human Activities

Built a logistic regression classifier and an L1-penalized logistic regression classifier for binary classification (distinguishing bending from other activities). Built an L1-penalized multinomial regression model, a Gaussian Naive Bayes classifier, and a Multinomial Naive Bayes classifier for multi-class classification. Handled class imbalance with case-control sampling. Improved model performance by evaluating confusion matrices, p-values, test error, and accuracy. Reduced the dimensionality of the models through backward selection and recursive feature elimination.
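
To illustrate why an L1 penalty also acts as feature selection, here is a minimal sketch in plain Python. It is not the project's code: the solver (proximal gradient descent with soft-thresholding), the function names, the penalty strength `lam`, and the tiny toy dataset are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(z, t):
    # Proximal operator of t * |w|: shrink toward zero, clip small values to exactly 0.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=500):
    # Hypothetical helper: proximal gradient descent on mean log-loss + lam * ||w||_1.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i under the current model.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        grad_b = sum(resid) / n
        # Gradient step on the log-loss, then L1 shrinkage (the intercept is not penalized).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, row):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5 else 0

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[2.0, 0.5], [1.5, -0.5], [1.0, 0.3], [2.5, -0.3],
     [-2.0, 0.4], [-1.5, -0.4], [-1.0, 0.2], [-2.5, -0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_l1_logistic(X, y, lam=0.05)
accuracy = sum(predict(w, b, row) == yi for row, yi in zip(X, y)) / len(y)
print("weights:", w, "accuracy:", accuracy)
```

On this toy data the weight on the noise feature is driven to exactly zero while the informative feature keeps a positive weight, which is the behavior that makes L1 regularization useful alongside backward selection or recursive feature elimination.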

Biomedical Data Analysis by using K-Nearest Neighbors Algorithms

Built a KNN classifier that predicts the presence of spinal column pathologies from the biomechanical features of the vertebrae, using the Vertebral Column Data created by Dr. Henrique de Mota. Six primary biomechanical features of the vertebrae were measured using X-rays of the spines. Optimized the classifier by testing different distance metrics and cross-validating the parameter K.
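
The tuning loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names, the deterministic fold assignment, the candidate values of K, and the two-feature toy points are assumptions and do not come from the Vertebral Column Data, which has six features.

```python
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, dist):
    # Sort training points by distance to the query and vote among the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, k, dist, folds=3):
    # k-fold cross-validation; sample i is held out in fold i % folds.
    correct = 0
    for f in range(folds):
        train = [(x, lab) for i, (x, lab) in enumerate(zip(X, y)) if i % folds != f]
        held_out = [(x, lab) for i, (x, lab) in enumerate(zip(X, y)) if i % folds == f]
        train_X, train_y = zip(*train)
        for x, lab in held_out:
            correct += knn_predict(train_X, train_y, x, k, dist) == lab
    return correct / len(X)

# Toy data (made up): two well-separated clusters with two features each.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2), (0.2, 0.8),
     (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.2), (10.2, 10.8)]
y = [0] * 6 + [1] * 6

# Score every (metric, K) combination and report the cross-validated accuracy.
metrics = {"euclidean": euclidean, "manhattan": manhattan}
scores = {(name, k): cv_accuracy(X, y, k, dist)
          for name, dist in metrics.items() for k in (1, 3, 5)}
for (name, k), acc in scores.items():
    print(f"{name:9s} K={k}: accuracy={acc:.2f}")
```

On real data the scores would differ across metrics and values of K, and the combination with the highest cross-validated accuracy would be the one to keep.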

Soccer Data Analysis Using SQL

As an enthusiastic soccer fan and a data science student, I was drawn to this dataset and used it to consolidate my SQL skills: writing queries that pull data from the database, then manipulate, sort, and extract it.
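
As a rough illustration of the kind of query involved, the sketch below runs an aggregate-and-sort query against an in-memory SQLite database from Python. The `matches` table, its columns, and the rows are hypothetical placeholders, not the schema or contents of the actual dataset.

```python
import sqlite3

# In-memory database with a hypothetical, made-up schema and rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (home_team TEXT, away_team TEXT, "
    "home_goals INTEGER, away_goals INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        ("Team A", "Team B", 2, 1),
        ("Team B", "Team A", 0, 0),
        ("Team A", "Team C", 3, 2),
        ("Team C", "Team B", 1, 1),
    ],
)

# Pull, aggregate, and sort: goals scored at home per team, highest first.
rows = conn.execute(
    """
    SELECT home_team, SUM(home_goals) AS goals, COUNT(*) AS games
    FROM matches
    GROUP BY home_team
    ORDER BY goals DESC, home_team ASC
    """
).fetchall()

for team, goals, games in rows:
    print(team, goals, games)
conn.close()
```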

Analysis of Los Angeles 1b1b Apartment Market

Gathered real-time data using the Python web-scraping library BeautifulSoup, the Google Review API, and the Google Geolocation API. Conducted a correlation analysis between rent, online reviews, and neighborhood. Organized and modeled the data into CSV files. Visualized the data with Matplotlib.
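
For readers unfamiliar with correlation analysis, the following sketch computes a Pearson correlation coefficient between two numeric columns in plain Python. The function name and the rent and rating values are invented for illustration and are not results from the project.

```python
import math

def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of standard deviations.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up example values: monthly rent (USD) and average review rating per listing.
rents = [1800, 2100, 2500, 1950]
ratings = [3.6, 4.1, 4.4, 3.9]

r = pearson(rents, ratings)
print(f"correlation between rent and rating: {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one; a categorical variable such as neighborhood would need to be encoded or analyzed per group before a coefficient like this applies.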