About
Self Introduction
- Yuki Iwai (he/him)
- Tokyo, Japan (UTC+9)
- Software Engineer and Part-time OSS developer at CyberAgent, Inc.
Technical Interest
- AutoML
- Distributed Training
- Batch Workload
- Kubernetes
OSS Activity
Roles
I focus on developing Kubernetes-based distributed systems for AutoML, distributed training, and batch workloads.
- Member of Kubernetes Organization
- kubernetes-sigs/kueue (SIG Scheduling / WG Batch) Lead
- kubernetes/kubernetes kube-controller-manager Job API reviewer
- Member of Kubeflow Organization
- Technical Lead for WG AutoML and WG Training
- kubeflow/katib (WG AutoML) maintainer
- kubeflow/training-operator (WG Training) maintainer
- kubeflow/mpi-operator (WG Training) maintainer
- Member of KServe Organization
- kserve/kserve contributor
Other Activities
- Google Summer of Code 2024 Kubeflow Mentor
Experience
- 2022/04 - current: Software Engineer (Private Cloud) and Part-time OSS developer at CyberAgent, Inc.
- Development of the on-prem Kubernetes as a Service (KaaS).
- Security Policies for many Kubernetes Clusters using open-policy-agent/gatekeeper.
- Development of the on-prem Kubernetes-based Machine Learning Platform.
- Hyperparameter Tuning System
- Job Systems for training ML models and predicting target values (inference)
- Distributed Training System (RDMA/GPU)
- Serving system for ML models
- Fully managed interactive development environment (JupyterLab or Jupyter Notebook)
- 2021/07 - 2021/11: Part-time Infra/Software Engineer (Private Cloud) at CyberAgent, Inc.
- Development of the on-prem Kubernetes as a Service (KaaS).
- Security Policies for many Kubernetes Clusters using open-policy-agent/gatekeeper.
- Development of the on-prem Kubernetes-based Machine Learning Platform.
- Serving system for ML models
- Job Systems for training ML models
Internship
- 2020/10/01 - 2020/10/31: CyberAgent, Inc.
- Survey of Kubeflow features
- Survey of NVIDIA DGX A100 performance and features
- Blog: https://developers.cyberagent.co.jp/blog/archives/27764/ (Japanese)
- 2020/09/02 - 2020/09/15: Yahoo! Japan
- Development of monitoring infrastructure for on-prem Kubernetes as a Service (KaaS)
- 2020/08/03 - 2020/08/14: Cybozu
- Upstream development of Rook
- 2020/07/20 - 2020/07/28: F@N Communications
- Survey of Vitess features
- Blog: https://n.fancs.tech/blog/beginnerofvitess/ (Japanese)
Education
- 2020/04 - 2022/03: Electronic Engineering Major, Graduate School of Science and Engineering, Kindai University
- Master of Engineering (Computer Science)
- 2016/04 - 2020/03: Department of Informatics, Faculty of Science and Engineering, Kindai University
- Bachelor of Engineering (Computer Science)
Talks
2024
KubeCon & CloudNativeCon North America 2024 Salt Lake City
- Democratizing AI Model Training on Kubernetes with Kubeflow TrainJob and JobSet - Andrey Velichkevich, Apple & Yuki Iwai, CyberAgent, Inc. at KubeCon NA 2024 Salt Lake City (English)
KubeCon & CloudNativeCon Europe 2024 Paris
- Advanced Resource Management for Running AI/ML Workloads with Kueue - Michał Woźniak, Google & Yuki Iwai, CyberAgent, Inc. at KubeCon EU 2024 Paris (English)
- WG-Batch Updates: What’s New and What Is Next? - Michał Woźniak, Google & Yuki Iwai, CyberAgent, Inc. at KubeCon EU 2024 Paris (English)
- Panel: AutoML and Training Working Group Updates - Andrey Velichkevich, Apple; Yuki Iwai, CyberAgent; Johnu George, Nutanix; Amber Graner, Open Source Evangelist at Kubeflow Summit Europe (English)
2023
KubeCon & CloudNativeCon North America 2023 Chicago
- Batch Systems in Production with Kueue: Multi-Tenancy and Fungibility - Yuki Iwai, CyberAgent, Inc. & Aldo Culquicondor, Google at Kubernetes AI+HPC Day North America (English)
2020
- ML環境でのRook/Ceph ("Rook/Ceph in ML Environments") at Japan Rook Meetup #3 (Japanese)
Publications
2023
- Kubernetesの知識地図 —— 現場での基礎から本番運用まで ("A Knowledge Map of Kubernetes: From On-Site Fundamentals to Production Operations") (Japanese)
2022
- 入門Kueue〜KubernetesのBatchワークロード最前線〜 ("Introduction to Kueue: The Forefront of Kubernetes Batch Workloads") (Japanese)