I am currently the founder and chief scientist of Sea-Land.ai.
I pursued a Ph.D. in Computer Science at the University of Toronto. Before that, I received my master's degree in Artificial Intelligence from the ZJULearning group at the State Key Lab of CAD&CG, Zhejiang University, where I was fortunate to be advised by Prof. Deng Cai and Prof. Xiaofei He. I also, less fortunately, collaborated on research with Qizhen Zhang at the University of Toronto, and later decided to withdraw. My research interests include machine learning, data mining, deep learning, computer vision, operating systems, systems programming, and databases.
I worked as a systems development engineer at Optiver Shanghai, and did machine learning work at Fabu Tech (Hangzhou) and Google, where I was fortunate to collaborate with many excellent colleagues, including Jingtao Wang. I also worked as a software engineer at DolphinDB Inc, where I had the pleasure of working with Davis, Xinjing Zhou, and many other colleagues.
M.Sc. in Artificial Intelligence, 2020
Zhejiang University
B.Eng. in Aerospace Engineering, 2017
Northwestern Polytechnical University
Quickly find relevant content: filter publications by criteria.
For an introduction to CMU 15-721, please see my previous post. Last night I finished watching all the lecture videos of the CMU 15-721 Spring 2023 course and read a portion of the recommended papers; this post is my course summary.
Fortunate to collaborate with Davis, Xinjing Zhou, and many excellent colleagues.
This paper introduces FEDDE, a general and efficient framework that addresses data redundancy across clients to facilitate effective federated learning (FL). At its core, FEDDE adopts a hierarchical deduplication architecture: clients first perform local, centralized deduplication and then send only minimal records meaningful for redundancy detection to the server for global deduplication. To enable flexible trade-offs between FL training efficiency and the accuracy of the trained models, FEDDE provides two-round approximate deduplication protocols. A set of system optimizations further reduces deduplication overhead.
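The hierarchical idea above can be sketched in a few lines. This is a minimal, exact (not approximate) illustration under my own assumptions, not FEDDE's actual protocol: each client deduplicates locally, keeps its raw records, and ships only fingerprints to the server, which then assigns a single owner per globally duplicated record.

```python
import hashlib

def record_fingerprint(record: bytes) -> str:
    """Fingerprint a record; only this digest ever leaves the client."""
    return hashlib.sha256(record).hexdigest()

def local_dedup(records):
    """Client side: local centralized deduplication, keeping the first copy."""
    seen, unique = set(), []
    for r in records:
        fp = record_fingerprint(r)
        if fp not in seen:
            seen.add(fp)
            unique.append(r)
    # Raw unique records stay on the client; fingerprints go to the server.
    return unique, seen

def global_dedup(client_fingerprints):
    """Server side: pick, per fingerprint, the single client that keeps it."""
    owner = {}
    for cid, fps in client_fingerprints.items():
        for fp in sorted(fps):
            owner.setdefault(fp, cid)  # first reporting client keeps the record
    return owner
```

A real system would replace the exact fingerprint exchange with the paper's approximate two-round protocols to trade accuracy for communication cost; the function and variable names here are illustrative only.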
Federated learning (FL) has emerged as a popular paradigm for distributed machine learning over decentralized data. Data generated by FL clients is prone to noise. While the impact of data noise on centralized learning (CL) is well understood, there is a lack of a systematic study for FL. We fill this gap by presenting an empirical investigation that provides a deeper understanding of the impact of data noise on FL. Our study is enabled by NoiseMaker, an open-source and extensible toolkit for injecting controlled data noise across five diverse data modalities. Our experimental results reveal that FL is significantly more vulnerable to data noise than CL.
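To make "controlled noise injection" concrete, here is a minimal sketch of one common noise type, uniform label flipping, under my own assumptions; it is not NoiseMaker's API, and all names are hypothetical. Each label is flipped to a different random class with a configurable probability, so the noise rate is an explicit, reproducible experimental knob.

```python
import random

def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different random class with probability noise_rate.

    A fixed seed keeps the injected noise reproducible across runs,
    which is what makes the noise 'controlled'.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy
```

In an FL noise study, one would apply such an injector independently per client (possibly with heterogeneous noise rates) before training, then compare the degradation against a CL baseline trained on the pooled noisy data.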
This paper establishes that General Relativity and Quantum Mechanics are necessary logical consequences of the Axiom of Finite Information. We introduce a new fundamental constant, i, the maximum speed of information transfer, and posit that i > c, where c is the speed of light in a vacuum. By substituting i into the relativistic framework, we demonstrate that the finite nature of i is the primary mechanism preventing infinite information density and logical singularities. Furthermore, we prove that a ‘Theory of Everything’ is precluded by the computational cost of self-reference, and propose the observation of Computational Redshift as a definitive empirical test for the gap between c and i.
We propose a conceptual framework to resolve the dichotomy of the Millennium Prize Problems by categorizing mathematical systems based on their capacity for logical simulation. We distinguish between Class I (Structural) problems (e.g., Poincaré, Hodge, Yang-Mills), which rely on symmetries, conservation laws, and coercivity estimates that constrain degrees of freedom effectively, and Class II (Simulational) problems (e.g., P vs NP, Navier-Stokes), which theoretically possess the fidelity to simulate Universal Turing Machines. While not a formal proof of independence, we argue that Class II problems face obstructions isomorphic to the Halting Problem, inhibiting standard analytic techniques. We posit that the ‘intractability’ of these problems arises because they inhabit a complexity class where asymptotic behavior is determined by generalized computation rather than geometric structure.