Zifei Dong
M.S. Data Science @ Vanderbilt | ML, Medical Imaging, Quant Research
I am looking for 2026 full-time roles in machine learning, applied AI, quantitative research, and data-heavy engineering.
My work combines medical imaging, LLM pipelines, quantitative modeling, and end-to-end data systems. I have led projects in chest X-ray segmentation, low-dose X-ray denoising, and financial disclosure analysis.

Profile
I have worked on machine learning problems where the data are difficult and the outputs still need to be reliable. In medical imaging, that has meant handling arbitrary-view chest X-rays, incomplete labels, synthetic data generation, and low-dose reconstruction. In quantitative and applied AI settings, it has meant building LLM pipelines and structured data workflows over noisy financial disclosures and longitudinal datasets.
What I bring is a mix of modeling depth and practical execution. I am comfortable moving from data processing and experimentation to training, evaluation, and pipeline design. That range has been useful in research labs, quantitative research settings, and applied projects that need both technical rigor and delivery.
Selected Experience
Lead Researcher - AIMP Lab, Northwestern University
Dec 2025 - Present
- Leading a 2.5D low-dose X-ray denoising project focused on motion robustness, hallucination suppression, and clinically usable reconstruction quality.
- Led AnyCXR from initial pipeline design to paper submission, building a robust segmentation system for chest X-rays with arbitrary acquisition positions and imperfect annotations.
- Built large-scale synthetic-data and preprocessing pipelines over 8TB of heterogeneous imaging data to improve cross-view generalization.
Quantitative Researcher - AllianceBernstein x Vanderbilt University
Jan 2026 - Present
- Designing LLM-based factor extraction systems that turn 10 years of EDGAR filings into structured, interpretable investment signals.
- Created 5,000+ high-quality financial semantic annotations for supervised fine-tuning and explored multi-agent reasoning for signal fusion.
- Studied RL alignment and process-supervision approaches to improve reasoning quality in long financial documents.
Data Architect and Engineer - Ultimate Consequences Lab, Vanderbilt University
Mar 2025 - Present
- Built end-to-end data infrastructure including database design, automated R-based data collection, and LLM-assisted validation workflows.
- Supported reliable management and analysis of mortality and life-history data covering 639 individuals.
- Refactored legacy R workflows into reusable data pipelines, improving collaboration and reducing manual work.
Quantitative Research Intern - Jinge Liangrui Asset Management
Jun 2025 - Aug 2025
- Built end-to-end pipelines for 10 years of high-dimensional, noisy time-series data, covering cleaning, alignment, feature generation, and modeling.
- Optimized matrix-heavy training workflows with Numba and NumPy for roughly 10x faster iteration.
- Integrated generalized signature neural networks for noisy time-series prediction and validated performance on full-scale data.
Selected Work
Robust Chest X-ray Segmentation Across Arbitrary Acquisition Positions
First-author medical imaging project on anatomy segmentation for chest X-rays collected under diverse positions, noise patterns, and annotation quality.
- Built a multi-stage synthetic-data generation pipeline for hard acquisition conditions
- Combined weak supervision with conditional annotation regularization
- Reached strong Dice performance on 54 anatomical structures and improved downstream disease classification
2.5D Low-Dose X-ray Denoising with Diffusion and Latent Priors
Ongoing reconstruction work that combines diffusion-style denoising, latent priors, and uncertainty-aware feature fusion for motion-heavy clinical settings.
- Built a Brownian-bridge diffusion and RQ-VAE-based denoising pipeline
- Designed smoothing and gating mechanisms to improve slice consistency and suppress hallucinated artifacts
- Targeted practical reliability in non-rigid anatomical regions under difficult motion
LLM-Based Factor Extraction from Financial Disclosures
Applied research system for extracting structured signals from 10-K, 10-Q, and 8-K filings using long-context reasoning, supervised fine-tuning, and evaluation workflows.
- Created a labeled dataset for topic understanding and five-level sentiment classification
- Built structured pipelines over a decade of filings for interpretable factor generation
- Explored reasoning-enhanced workflows with process supervision and backtesting
Technical Toolkit
Machine Learning and Modeling
Data and Systems
Applied Areas
Education
M.S. in Data Science
Aug 2024 - May 2026 (Expected)
Vanderbilt University
GPA 3.8/4.0
B.S. in Computer Science and Statistics
May 2021 - May 2024
UNC-Chapel Hill
GPA 3.625/4.0, graduated with distinction
B.S. in Computer Science
Aug 2020 - May 2021
Case Western Reserve University
GPA 4.0/4.0 before transfer
Open to 2026 Full-Time Opportunities
I am actively looking for full-time roles in machine learning, applied AI, quantitative research, medical AI, and LLM systems. If your team works on difficult real-world data, applied research, or production-facing modeling, I would be glad to connect.