Liu Jason Tan

Data and Analytics Associate at Morgan Stanley

Hi there! I’m Liu Jason Tan — a data scientist and aspiring software engineer who loves turning data, code, and ideas into meaningful solutions. My work spans machine learning, computer vision, and risk analytics, and I’m passionate about building tools that make a real impact.
I’m pursuing a Master of Science in Computer Science at Columbia University, where I focus on vision, graphics, interaction, and robotics. Before that, I earned my Master of Applied Data Science from the University of Michigan. Currently, I work at Morgan Stanley, where I’ve rebuilt risk models in Python, developed NLP and machine learning pipelines, and led cross-functional teams to deliver analytics solutions ahead of tight deadlines.
I’ve also enjoyed working on research and competition projects — from an award-winning NLP pipeline to predictive models that improved stock performance forecasting. I’m known for being curious, organized, and collaborative, and I love solving complex problems alongside people who think differently.
When I’m not coding or experimenting with models, you’ll probably find me training for my next marathon or exploring new places to run, think, and recharge.
Thanks for stopping by — here you’ll find my projects, skills, and professional journey so far. I hope something here sparks your curiosity!

Top Skills

Data Science and Data Analysis
Supervised Learning
Unsupervised Learning
Natural Language Processing
Computer Vision
Risk Modeling and Forecasting

Professional Experience

Morgan Stanley
Data and Analytics Associate

August 2022 – Present

Reconstructed a critical operational risk capital model from scratch, replacing a legacy implementation with 10k+ lines of clean, efficient Python. Optimized natural language processing and machine learning models to enhance operational risk incident quality assurance, reducing manual review workload by over 50% and increasing data accuracy for risk event analysis. Led and coached 2 team members and 5 consultants by setting clear guidelines, managing timelines, and running daily standups, enabling the team to deliver mission-critical capital analytics solutions ahead of regulatory deadlines.

Poisera
Data Analytics Intern

June 2021 – August 2021

Built web scraping and API integration pipelines to collect and analyze large-scale public datasets, directly informing product development decisions and driving significant user interface improvements. Conducted user interviews and qualitative research to identify critical insights, driving key management decisions and boosting customer satisfaction.
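
For a flavor of this kind of pipeline, here is a minimal sketch using requests and BeautifulSoup; the page, endpoint, and selectors are placeholder stand-ins, not the actual sources from this internship.

    # Sketch: one scrape-plus-API collection step (placeholder URLs, illustrative only)
    import requests
    from bs4 import BeautifulSoup

    # Scrape headings from a public page (placeholder URL)
    html = requests.get("https://example.com", timeout=10).text
    headings = [h.get_text(strip=True) for h in BeautifulSoup(html, "html.parser").find_all("h1")]

    # Pull structured records from a public JSON API (placeholder endpoint)
    records = requests.get("https://jsonplaceholder.typicode.com/posts", timeout=10).json()

    print(headings, "|", len(records), "API records collected")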

Education

Columbia University
Master of Science in Computer Science
May 2027
University of Michigan - Ann Arbor
Master of Applied Data Science
August 2022
GPA: 4.00/4.00
Stony Brook University
Bachelor of Science in Information Systems
May 2020
GPA: 3.64/4.00

Academic Projects

Social Monitoring Dashboard

Developed a full-stack application for real-time sentiment and topic monitoring of company discussions. Utilized supervised and unsupervised learning techniques, including BERT for emotion classification (e.g., surprise, anger, disgust) and non-negative matrix factorization for topic clustering (e.g., account issues, ordering issues, service issues), to gain actionable insights from social media interactions.
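For illustration, a minimal sketch of the two modeling pieces, assuming a pretrained emotion model from the Hugging Face hub and scikit-learn's NMF; the checkpoint and example posts are stand-ins, not the dashboard's actual code or data.

    # Sketch: emotion classification + NMF topic clustering (illustrative, not the dashboard's code)
    from transformers import pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    posts = [
        "I was charged twice and nobody responds to my emails!",
        "My order never arrived and support keeps transferring me.",
        "The agent fixed my account login in minutes, thank you!",
    ]

    # Emotion classification with a pretrained transformer (assumed checkpoint choice)
    emotion_clf = pipeline("text-classification",
                           model="j-hartmann/emotion-english-distilroberta-base")
    emotions = emotion_clf(posts)

    # Topic clustering via TF-IDF + non-negative matrix factorization
    tfidf = TfidfVectorizer(stop_words="english")
    W = NMF(n_components=2, random_state=0).fit_transform(tfidf.fit_transform(posts))
    topics = W.argmax(axis=1)  # dominant topic per post

    for post, emo, topic in zip(posts, emotions, topics):
        print(f"[topic {topic}] {emo['label']}: {post}")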

S&P 500 Stock Performance Forecasting

Achieved 62% precision with a random forest classifier, a substantial improvement over the 20% precision of a dummy classifier. Successfully categorized stocks into top, middle, and bottom tiers using key equity metrics such as price-to-earnings ratio, dividend yield, and volatility.
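A minimal sketch of the tier-classification setup, using scikit-learn's RandomForestClassifier against a DummyClassifier baseline; the features below are random stand-ins for the equity metrics, so the printed numbers will not match the project's results.

    # Sketch: tiered stock classification vs. a dummy baseline (synthetic data)
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))      # stand-ins for P/E ratio, dividend yield, volatility
    y = rng.integers(0, 3, size=500)   # 0 = bottom, 1 = middle, 2 = top tier

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    baseline = DummyClassifier(strategy="stratified", random_state=0).fit(X_tr, y_tr)

    print("random forest:", precision_score(y_te, rf.predict(X_te), average="macro"))
    print("dummy baseline:", precision_score(y_te, baseline.predict(X_te), average="macro"))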

S&P 500 Stock Sector Clustering

Conducted a comprehensive analysis of S&P 500 stocks using k-means, agglomerative clustering, and affinity propagation to identify optimal stock groupings. Evaluated clustering methods based on Calinski-Harabasz, Davies-Bouldin, and Silhouette scores, leading to a refined unsupervised clustering model that revealed key sectoral insights.
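A minimal sketch of this kind of evaluation loop, run here on synthetic blob data from scikit-learn rather than the actual standardized stock features; each score takes the feature matrix and the cluster labels a method assigns.

    # Sketch: comparing clustering methods by internal validity scores (synthetic data)
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans, AgglomerativeClustering, AffinityPropagation
    from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score, silhouette_score

    X, _ = make_blobs(n_samples=300, centers=5, n_features=5, random_state=0)
    X = StandardScaler().fit_transform(X)   # stand-in for standardized stock features

    methods = {
        "k-means": KMeans(n_clusters=5, n_init=10, random_state=0),
        "agglomerative": AgglomerativeClustering(n_clusters=5),
        "affinity propagation": AffinityPropagation(random_state=0),
    }

    for name, method in methods.items():
        labels = method.fit_predict(X)
        print(f"{name:22s}"
              f" CH={calinski_harabasz_score(X, labels):8.1f}"
              f" DB={davies_bouldin_score(X, labels):5.2f}"
              f" Silhouette={silhouette_score(X, labels):5.2f}")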

My Voice Data Challenge

Won first place by applying NLP techniques to analyze sentiment in text-message surveys about COVID-19. Automated data cleaning, text encoding, and hierarchical clustering on BERT embeddings to improve research efficiency and generate deeper insights.
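As a rough illustration of that last step, the sketch below embeds a few survey-style messages with a BERT-family sentence encoder (the checkpoint and the sentence-transformers package are assumed choices here) and then applies hierarchical clustering from SciPy.

    # Sketch: BERT-style embeddings + hierarchical clustering (illustrative, not the winning pipeline)
    from sentence_transformers import SentenceTransformer
    from scipy.cluster.hierarchy import linkage, fcluster

    responses = [
        "I'm worried about losing my job because of the lockdown.",
        "Honestly the pandemic barely changed my daily routine.",
        "I miss seeing my family and it's been really hard.",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
    embeddings = encoder.encode(responses)

    # Ward linkage over the embeddings, then cut the tree into two clusters
    tree = linkage(embeddings, method="ward")
    clusters = fcluster(tree, t=2, criterion="maxclust")
    print(list(zip(clusters, responses)))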

Quicken Loans Data Challenge

Performed predictive analysis on call data to optimize client contact frequency using a Multi-Layer Perceptron neural network. Employed GridSearchCV for hyperparameter tuning and conducted detailed model interpretation and evaluation.
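A minimal sketch of that tuning step with scikit-learn's GridSearchCV over an MLPClassifier; the features, labels, and parameter grid below are illustrative stand-ins, not the challenge's actual data or search space.

    # Sketch: grid-searching an MLP for contact-frequency prediction (synthetic data)
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 8))        # stand-in for call features
    y = rng.integers(0, 2, size=400)     # stand-in for the contact outcome

    pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))
    param_grid = {
        "mlpclassifier__hidden_layer_sizes": [(32,), (64, 32)],
        "mlpclassifier__alpha": [1e-4, 1e-3],
    }
    search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))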

Electric Vehicle Analysis

Conducted comprehensive data analysis on Tesla vehicle performance, evaluating efficiency in relation to temperature, average speed, and driving smoothness. Utilized Python for data preprocessing, analysis, and visualization, and tracked battery degradation over time to identify patterns and trends.
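A minimal pandas sketch of the kind of analysis described, using hypothetical column names and made-up drive records rather than the actual Tesla logs.

    # Sketch: efficiency vs. conditions and battery degradation over time (made-up records)
    import pandas as pd

    drives = pd.DataFrame({
        "date": pd.to_datetime(["2021-01-05", "2021-06-12", "2021-11-20"]),
        "outside_temp_c": [-2.0, 24.0, 8.0],
        "avg_speed_kmh": [58, 72, 65],
        "wh_per_km": [185, 142, 160],
        "battery_capacity_kwh": [73.5, 72.9, 72.4],
    })

    # How efficiency moves with temperature and average speed
    print(drives[["outside_temp_c", "avg_speed_kmh", "wh_per_km"]].corr())

    # Battery degradation relative to the first recorded drive
    drives["capacity_pct"] = 100 * drives["battery_capacity_kwh"] / drives["battery_capacity_kwh"].iloc[0]
    print(drives[["date", "capacity_pct"]])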

Awards