Bohan Zhang

I am a Member of Technical Staff at OpenAI. I have broad interests in large-scale data systems, machine learning, and their intersection. Previously, I co-founded OtterTune, an automatic database optimization service based on years of research from Carnegie Mellon University's database group, with Prof. Andy Pavlo and Dr. Dana Van Aken.

I earned my Master of Computational Data Science (MCDS) degree from Carnegie Mellon University in 2018. Prior to that, I graduated with honors from Peking University in 2017 with a B.S. in Computer Science.


Experience

OpenAI

Member of Technical Staff

Data Infrastructure

August 2024 - Present

OtterTune

Co-founder, Engineer

Co-founded OtterTune with Prof. Andy Pavlo and Dr. Dana Van Aken, building on seven years of research at Carnegie Mellon University's database group. OtterTune uses machine learning to automatically optimize databases such as Postgres and MySQL, improving performance and reducing costs. We raised $15 million from top VCs (Accel, Intel Capital, and Race Capital) and angel investors (co-founders of Snowflake, Databricks, Duolingo, etc.).

August 2020 - August 2024

Striim

Software Engineer

Developed time-series forecasting, sampling, and anomaly detection algorithms for large-scale streaming data and integrated them into Striim, a distributed streaming integration and intelligence system.

February 2019 - August 2020

Databricks

Software Engineer Intern

Enhanced graph algorithms within GraphFrames, a parallel graph processing framework in Spark. Achieved a 4x speedup in the Connected Components algorithm for large-diameter graphs. Improved the Motif Finding algorithm by leveraging the Cost-Based Optimizer in Spark, resulting in up to a 20x performance increase.

May 2018 - August 2018

Research Publications

OtterTune

Advisor: Prof. Andy Pavlo
An automatic database knob tuning service that uses machine learning and performs as well as human experts. It was later commercialized as a startup. [project website]

An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems
Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Bilien, Andrew Pavlo
VLDB 2021.

A Demonstration of the OtterTune Automatic Database Management System Tuning Service
Bohan Zhang, Dana Van Aken, Justin Wang, Tao Dai, Shuli Jiang, Jacky Lao, Siyuan Sheng, Andrew Pavlo, Geoffrey J. Gordon
VLDB 2018.

Automatic Database Management System Tuning Through Large-scale Machine Learning 
Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang
SIGMOD 2017.

Grape

Advisor: Prof. Wenfei Fan
A parallel graph computation system that achieves a 10x speedup and a 90% reduction in communication cost compared to state-of-the-art systems. It was commercialized as a graph computing startup that was later acquired by Alibaba. [project website]

Parallelizing Sequential Graph Computations
Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, Chao Tian
SIGMOD 2017.
(Best Paper Award!)

From Think Parallel to Think Sequential
Wenfei Fan, Yang Cao, Jingbo Xu, Wenyuan Yu, Yinghui Wu, Chao Tian, Jiaxin Jiang, Bohan Zhang
SIGMOD Record 2018.
(SIGMOD Research Highlight Award!)

Striim

Advisor: Alok Pareek
A distributed streaming integration and intelligence system. [project website]

A Demonstration of Striim: A Streaming Integration and Intelligence Platform
Alok Pareek, Bohan Zhang, Bhushan Khaladkar
DEBS 2019.
(DEBS Best Demo Award!)


Talks and Blogs

Talks

Everything you want to know about Postgres autovacuum
Postgres Conference 2024.
Bohan discusses the internals of Postgres autovacuum and strategies for monitoring and tuning it effectively, illustrated with real-world examples. [Slides]

The Part of PostgreSQL I Hate the Most
Postgres Conference Silicon Valley 2023.
Bohan explores the implementation of Multi-Version Concurrency Control (MVCC) in Postgres, the issues it causes, and optimization strategies. [Slides]

Lessons Learned from Automatically Optimizing Databases Using Machine Learning in the Real World
HTAP Summit 2023.
Bohan discusses lessons learned at OtterTune from applying machine learning to automate the optimization of real-world databases. [Slides]

OtterTune: Automatic database optimization using machine learning
Postgres Conference Asia 2021, 2020.
Bohan introduces OtterTune, an automatic database knob tuning service using machine learning.

Blogs

I have written technical blog posts on database internals and optimization, some of which reached the Hacker News front page.

The part of PostgreSQL we hate the most
Posted in April 2023. (w/ Andy Pavlo)
Bohan and Andy argue that Postgres’s approach to multi-version concurrency control 'sucks' and explain in detail why.
[HackerNews][Postgres Weekly][Reddit]

Yes, PostgreSQL has problems, but we’re sticking with it!
Posted in June 2023. (w/ Andy Pavlo)
Bohan and Andy explore strategies to address challenges caused by the implementation of multi-version concurrency control in Postgres.

Query best practices: When should you use the IN instead of the OR operator?
Posted in August 2023.
Bohan recommends using IN instead of OR clauses in PostgreSQL queries and explains why.
[HackerNews]

How Amazon RDS replication works and why the FAA’s database problem won’t happen in AWS
Posted in January 2023.
On January 11, 2023, the FAA halted all domestic flight departures in the U.S. after a system outage. Bohan explains how Amazon RDS replication works and why it would prevent a similar incident in AWS.

Fixing slow PostgreSQL queries: How to make a smart optimizer more stupid
Posted in August 2022. (w/ Haonan Wang)
Haonan and Bohan describe how they sped up a customer's query with ORDER BY and LIMIT clauses by up to 10,000x.
[HackerNews]

Run ANALYZE. Run ANALYZE. Run ANALYZE
Posted in May 2022.
Bohan discusses how running ANALYZE reduced a customer's job time from 52 minutes to just 34 seconds.