About
I'm Daniel Siegle, an AI/ML Engineer based in North Carolina. I contract through Possible Futures, LLC, where I focus on LLM training and fine-tuning as well as traditional ML projects across the biotech and life-sciences space.
I earned my M.S. in Pharmaceutical Sciences from North Carolina Central University. My thesis — Cytochrome P450 Inhibitor Classification with Statistical Learning — benchmarked Bayesian binary QSAR against scikit-learn classifiers on HTS luminescence assay data for five CYP isozymes (1A2, 2C9, 2C19, 2D6, 3A4). Before that I completed a B.S. in Biology at UNC-Chapel Hill.
On the industry side, I spent four years at Q2 Solutions, first as a Scientific Programmer and then as a Bioinformatics Software Engineer at Q2 Genomics, where I built sequencing analysis, QC, and data transfer pipelines for a satellite NGS lab in Beijing. More recently I taught AI for Health and Life Sciences at the North Carolina School of Science and Mathematics (NCSSM). My career began in drug manufacturing at Biogen Idec, where I was a Manufacturing Associate / Associate Scientist.
Outside of work, I run Deep Learning RTP, a 1,700+ member community meetup in the Research Triangle focused on deep learning, with over 400 past events and counting.
Projects
LLM Training Pipeline
Production infrastructure for large language model fine-tuning — 50K+ training examples, 500+ tokens/sec throughput. End-to-end pipeline covering data preparation, training orchestration, and evaluation.
gut-typist
A Lua-based typing tutor built for the terminal. Lightweight, distraction-free typing practice with progress tracking.
View on GitHub →
CYP450 Inhibitor Classification
Master's thesis project comparing Bayesian binary QSAR models with scikit-learn classifiers for predicting cytochrome P450 inhibition. Evaluated performance on HTS luminescence assay data across five CYP isozymes (1A2, 2C9, 2C19, 2D6, 3A4), with applications in early-stage drug metabolism screening.
LLM Tuning Demonstration
End-to-end walkthrough of LLM fine-tuning techniques — from data formatting through training to evaluation. Designed as a practical reference for teams adopting LLM customization.
View on GitHub →
NGS Bioinformatics (Q2 Genomics)
Developed and maintained genomic analysis pipelines for next-generation sequencing data at Q2 Solutions. Supported a satellite lab in Beijing with sequencing QC, analysis workflows, and cross-site data transfer infrastructure. Received the 2019 CEO Team Award as part of the Q2 Genomics Global Expansion Team for Beijing lab support.
Neural Network from Scratch
A hands-on implementation of a neural network using only NumPy: no frameworks, just math. The notebook walks through backpropagation, gradient descent, and activation functions step by step.
View the full notebook →
Writing
Building a Neural Network in NumPy
A step-by-step tutorial walking through the construction of a neural network from scratch — covering forward propagation, loss functions, backpropagation, and training loops, all implemented in pure NumPy.
Read the tutorial →
CYP450 Prediction & Cheminformatics in Pharma
How statistical learning and molecular fingerprints can predict cytochrome P450 inhibition — bridging HTS assay data with QSAR modeling for early ADMET screening in drug development. Coming soon.
Karpathy "Zero to Hero" Workshop Recap
Notes from the Neural Network Coding Workshop series at Deep Learning RTP, working through Andrej Karpathy's "Zero to Hero" curriculum — building language models from scratch, one layer at a time. Coming soon.