Zifei Dong

M.S. Data Science @ Vanderbilt | ML, Medical Imaging, Quant Research

I am looking for 2026 full-time roles in machine learning, applied AI, quantitative research, and data-heavy engineering.

My work combines medical imaging, LLM pipelines, quantitative modeling, and end-to-end data systems. I have led projects in chest X-ray segmentation, low-dose X-ray denoising, and financial disclosure analysis.

Profile

I have worked on machine learning problems where the data are difficult and the outputs still need to be reliable. In medical imaging, that has meant handling arbitrary-view chest X-rays, incomplete labels, synthetic data generation, and low-dose reconstruction. In quantitative and applied AI settings, it has meant building LLM pipelines and structured data workflows over noisy financial disclosures and longitudinal datasets.

What I bring is a mix of modeling depth and practical execution. I am comfortable moving from data processing and experimentation to training, evaluation, and pipeline design. That range has been useful in research labs, quantitative research settings, and applied projects that need both technical rigor and delivery.

Selected Experience

Lead Researcher - AIMP Lab, Northwestern University

Dec 2025 - Present
  • Leading a 2.5D low-dose X-ray denoising project focused on motion robustness, hallucination suppression, and clinically usable reconstruction quality.
  • Led AnyCXR from initial pipeline design to paper submission, building a robust segmentation system for chest X-rays with arbitrary acquisition positions and imperfect annotations.
  • Built large-scale synthetic-data and preprocessing pipelines over 8TB of heterogeneous imaging data to improve cross-view generalization.

Quantitative Researcher - AllianceBernstein x Vanderbilt University

Jan 2026 - Present
  • Designing LLM-based factor extraction systems that turn 10 years of EDGAR filings into structured, interpretable investment signals.
  • Built 5,000+ high-quality financial semantic annotations for supervised fine-tuning and explored multi-agent reasoning for signal fusion.
  • Studied RL alignment and process-supervision approaches to improve reasoning quality in long financial documents.

Data Architect and Engineer - Ultimate Consequences Lab, Vanderbilt University

Mar 2025 - Present
  • Built end-to-end data infrastructure including database design, automated R-based data collection, and LLM-assisted validation workflows.
  • Supported reliable management and analysis of mortality and life-history data covering 639 individuals.
  • Refactored legacy R workflows into reusable data pipelines, improving collaboration and reducing manual work.

Quantitative Research Intern - Jinge Liangrui Asset Management

Jun 2025 - Aug 2025
  • Built end-to-end pipelines for 10 years of high-dimensional noisy time series data, covering cleaning, alignment, feature generation, and modeling.
  • Optimized matrix-heavy training workflows with Numba and NumPy for roughly 10x faster iteration.
  • Integrated generalized signature neural networks for noisy time-series prediction and validated performance on full-scale data.

Selected Work

AnyCXR

Robust Chest X-ray Segmentation Across Arbitrary Acquisition Positions

First-author medical imaging project on anatomy segmentation for chest X-rays with diverse acquisition positions, noise patterns, and annotation quality.

  • Built a multi-stage synthetic-data generation pipeline for hard acquisition conditions
  • Combined weak supervision with conditional annotation regularization
  • Achieved strong Dice scores across 54 anatomical structures and improved downstream disease classification
Python · PyTorch · Medical Imaging · Synthetic Data · Weak Supervision

Low-Dose X-ray Denoising

2.5D Low-Dose X-ray Denoising with Diffusion and Latent Priors

Ongoing reconstruction work that combines diffusion-style denoising, latent priors, and uncertainty-aware feature fusion for motion-heavy clinical settings.

  • Built a Brownian-bridge diffusion and RQ-VAE-based denoising pipeline
  • Designed smoothing and gating mechanisms to improve slice consistency and suppress hallucinated artifacts
  • Targeted practical reliability in non-rigid anatomical regions under difficult motion
PyTorch · Diffusion Models · Representation Learning · Medical AI · Generative Reconstruction

EDGAR LLM Pipeline

LLM-Based Factor Extraction from Financial Disclosures

Applied research system for extracting structured signals from 10-K, 10-Q, and 8-K filings using long-context reasoning, supervised fine-tuning, and evaluation workflows.

  • Created a labeled dataset for topic understanding and five-level sentiment classification
  • Built structured pipelines over a decade of filings for interpretable factor generation
  • Explored reasoning-enhanced workflows with process supervision and backtesting
LLMs · NLP · Financial Text · SFT · Evaluation

Technical Toolkit

Machine Learning and Modeling

PyTorch
LLMs
Diffusion Models
Medical Imaging
Weak Supervision
Representation Learning

Data and Systems

Python
NumPy
Pandas
Numba
R
SQL

Applied Areas

Quantitative Research
Financial NLP
Synthetic Data
Longitudinal Data Analysis
Experiment Design
Evaluation Pipelines

Education

M.S. in Data Science

Aug 2024 - May 2026 (Expected)

Vanderbilt University

GPA 3.8/4.0

B.S. in Computer Science and Statistics

May 2021 - May 2024

UNC-Chapel Hill

GPA 3.625/4.0; graduated with Distinction

B.S. in Computer Science

Aug 2020 - May 2021

Case Western Reserve University

GPA 4.0/4.0 before transfer

Open to 2026 Full-Time Opportunities

I am actively looking for full-time roles in machine learning, applied AI, quantitative research, medical AI, and LLM systems. If your team works on difficult real-world data, applied research, or production-facing modeling, I would be glad to connect.