Liu Jason Tan

Data and Analytics Associate at Morgan Stanley

Welcome! I’m Liu Jason Tan, a data scientist and aspiring software engineer with expertise in machine learning, computer vision, and risk analytics. I combine strong technical skills with collaboration, leadership, and problem-solving abilities to deliver high-quality solutions.
I am pursuing a Master of Science in Computer Science at Columbia University, specializing in vision, graphics, interaction, and robotics, and I hold a Master of Applied Data Science from the University of Michigan. Currently, I work at Morgan Stanley, where I’ve rebuilt mission-critical risk models in Python, developed NLP and machine learning pipelines, and led cross-functional teams to deliver capital analytics solutions ahead of regulatory deadlines.
Beyond industry, I’ve applied my skills to award-winning academic projects, including an NLP pipeline that won first place in a data challenge and predictive models that improved stock performance forecasting accuracy. I am known for my strong communication, time management, and critical thinking, which enable me to tackle complex problems efficiently while collaborating effectively with diverse teams.
On this site, you’ll find my portfolio of projects, technical skills, and professional accomplishments.

Top Skills

Data Science and Data Analysis
Supervised Learning
Unsupervised Learning
Natural Language Processing
Computer Vision
Risk Modeling and Forecasting

Professional Experiences

Morgan Stanley
Data and Analytics Associate

August 2022 – Present

Reconstructed a critical operational risk capital model from scratch, replacing a legacy implementation with 10k+ lines of clean, efficient Python code. Optimized natural language processing and machine learning models to enhance operational risk incident quality assurance, reducing manual review workload by over 50% and increasing data accuracy for risk event analysis. Led and coached 2 team members and 5 consultants by setting clear guidelines, managing timelines, and running daily standups, enabling the team to deliver mission-critical capital analytics solutions ahead of regulatory deadlines.

Poisera
Data Analytics Intern

June 2021 – August 2021

Applied web scraping and API integration pipelines to collect and analyze large-scale public datasets, directly informing product development decisions and driving significant user interface improvements. Conducted user interviews and qualitative research to identify critical insights, guiding key management decisions and boosting customer satisfaction.

Education

Columbia University
Master of Science in Computer Science
May 2027
University of Michigan - Ann Arbor
Master of Applied Data Science
August 2022
GPA: 4.00/4.00
Stony Brook University
Bachelor of Science in Information Systems
May 2020
GPA: 3.64/4.00

Academic Projects

Social Monitoring Dashboard

Developed a full-stack application for real-time sentiment and topic monitoring of company discussions. Utilized supervised and unsupervised learning techniques, including BERT for emotion classification (e.g., surprise, anger, disgust) and non-negative matrix factorization for topic clustering (e.g., account issues, ordering issues, service issues), to gain actionable insights from social media interactions.
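
As a rough illustration of the topic-clustering step, here is a minimal sketch using scikit-learn; the sample posts and topic count are placeholders, not the project's actual data or configuration.

```python
# Minimal sketch: TF-IDF features factorized with non-negative matrix
# factorization (NMF) to surface discussion topics. Sample posts and the
# topic count are placeholders, not the project's real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

posts = [
    "My account was locked again after the update",
    "Order never arrived and support is not responding",
    "Great service today, the agent resolved my issue quickly",
    "Cannot reset my account password on the app",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

nmf = NMF(n_components=2, random_state=42)
doc_topics = nmf.fit_transform(X)      # post-to-topic weights
terms = vectorizer.get_feature_names_out()

# Print the top terms that characterize each topic
for i, topic in enumerate(nmf.components_):
    top = [terms[j] for j in topic.argsort()[::-1][:3]]
    print(f"Topic {i}: {', '.join(top)}")
```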

S&P 500 Stock Performance Forecasting

Achieved 62% precision with a random forest classifier, a substantial improvement over the 20% precision of a dummy classifier. Successfully categorized stocks into top, middle, and bottom tiers using key equity metrics such as price-to-earnings ratio, dividend yield, and volatility.
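
A minimal sketch of the tier-classification setup is shown below, assuming scikit-learn; the synthetic features and labels stand in for the real equity metrics, so the printed score is not the reported result.

```python
# Minimal sketch of the tier-classification setup with a random forest.
# Feature values and labels below are synthetic placeholders, not the
# project's actual S&P 500 data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
# Columns: price-to-earnings ratio, dividend yield, volatility
X = rng.normal(loc=[20.0, 2.0, 0.25], scale=[8.0, 1.0, 0.1], size=(500, 3))
y = rng.integers(0, 3, size=500)  # 0 = bottom, 1 = middle, 2 = top tier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
print("macro precision:", precision_score(y_test, preds, average="macro"))
```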

S&P 500 Stock Sector Clustering

Conducted a comprehensive analysis of S&P 500 stocks using k-means, agglomerative clustering, and affinity propagation to identify optimal stock groupings. Evaluated clustering methods based on Calinski-Harabasz, Davies-Bouldin, and Silhouette scores, leading to a refined unsupervised clustering model that revealed key sectoral insights.
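
The comparison can be sketched roughly as follows with scikit-learn; the blob data stands in for the real equity features, and the 11-cluster setting simply mirrors the number of GICS sectors rather than the grouping the project finally settled on.

```python
# Rough sketch: compare three clustering algorithms with three internal
# validity scores. make_blobs stands in for the real S&P 500 features.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, AffinityPropagation
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

X, _ = make_blobs(n_samples=300, centers=11, n_features=5, random_state=1)

models = {
    "k-means": KMeans(n_clusters=11, n_init=10, random_state=1),
    "agglomerative": AgglomerativeClustering(n_clusters=11),
    "affinity propagation": AffinityPropagation(random_state=1),
}

for name, model in models.items():
    labels = model.fit_predict(X)
    print(
        f"{name}: CH={calinski_harabasz_score(X, labels):.1f}, "
        f"DB={davies_bouldin_score(X, labels):.2f}, "
        f"Silhouette={silhouette_score(X, labels):.2f}"
    )
```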

My Voice Data Challenge

Won first place by leveraging NLP techniques to analyze sentiment in text-message surveys regarding COVID-19. Automated data cleaning, BERT-based text encoding, and hierarchical clustering to improve the efficiency of research and generate deeper insights.
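
A minimal sketch of the encode-then-cluster step is below, assuming the sentence-transformers and SciPy libraries; the model name and sample responses are illustrative assumptions, not the challenge's actual pipeline.

```python
# Sketch: encode short survey responses into vectors, then group them with
# hierarchical (agglomerative) clustering. The model name and texts are
# illustrative assumptions, not the challenge's actual setup.
from sentence_transformers import SentenceTransformer
from scipy.cluster.hierarchy import linkage, fcluster

responses = [
    "I am worried about getting sick at work",
    "It is hard to find a vaccine appointment nearby",
    "Feeling isolated since the lockdown started",
    "My kids are struggling with remote school",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder encoder
embeddings = model.encode(responses)              # one vector per response

Z = linkage(embeddings, method="ward")            # build the cluster tree
labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 clusters
print(labels)
```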

Quicken Loans Data Challenge

Performed predictive analysis on call data to optimize client contact frequency using a Multi-Layer Perceptron neural network. Employed GridSearchCV for hyperparameter tuning and conducted detailed model interpretation and evaluation.
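
The tuning setup can be sketched as follows with scikit-learn; the synthetic features, labels, and parameter grid are placeholders rather than the actual call data or search space.

```python
# Minimal sketch of the MLP-plus-grid-search setup. Features, labels, and the
# parameter grid are placeholders; the project's actual call data differs.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))      # e.g. contact-history features
y = rng.integers(0, 2, size=400)   # e.g. whether the client answered

pipeline = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=2))
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(32,), (64, 32)],
    "mlpclassifier__alpha": [1e-4, 1e-3],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```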

Electric Vehicle Analysis

Conducted comprehensive data analysis on Tesla vehicle performance, evaluating efficiency in relation to temperature, average speed, and driving smoothness. Utilized Python for data preprocessing, analysis, and visualization, and tracked battery degradation over time to identify patterns and trends.
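
As a small illustration of the degradation-tracking idea, here is a pandas sketch on placeholder trip logs rather than the actual vehicle telemetry.

```python
# Sketch: track how rated battery range drifts over time from trip logs.
# The DataFrame below is placeholder data, not the actual Tesla telemetry.
import pandas as pd

trips = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-05", "2021-04-10", "2021-08-20", "2021-12-15"]),
        "rated_range_miles": [310.0, 306.5, 303.0, 299.8],  # range at full charge
        "avg_temp_f": [34, 58, 85, 40],
    }
).sort_values("date")

# Percent of rated range lost relative to the first observation
first = trips["rated_range_miles"].iloc[0]
trips["range_loss_pct"] = 100 * (1 - trips["rated_range_miles"] / first)

print(trips[["date", "rated_range_miles", "range_loss_pct"]])
```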

Awards