Experience
TikTok
Software Engineer 2
Jan. 2024 – Present
AI Data Platform (AIDP) Team, Beijing, China
- Worked on multi-round LLM unit-test data synthesis pipelines for SFT/RL fine-tuning of Doubao code models.
- Designed and implemented multiple backend microservices using a Python/Golang/Kafka/Redis/MySQL stack.
- Implemented and ran intelligent distribution experiments on task assignment strategies, including similar-task clustering, MILP-based globally optimal assignment, and LLM-assisted pre-labeling.
- Integrated a vector database into the existing architecture to push semantically related tasks based on vector similarity, reducing labeler context switching and cutting average handling time by 12% (patent pending).
- Increased labeler efficiency by 10% by implementing a novel task-to-labeler distribution strategy based on labeler profiles and historical data. Evaluated gains by analyzing labeler performance data with Hive SQL.
Microsoft
Software Engineer
Sep. 2021 – Dec. 2023
Bing Multimedia Video Index Generation Team, Beijing, China
- Optimized Bing Video Search index freshness to match and then beat Google’s and Baidu’s in side-by-side metrics.
- Built and maintained real-time and batch ETL pipelines for video discovery, ingestion, and transformation.
- Grew index size from 120M to 300M through code profiling and optimization, without requesting additional capacity. Increased index generation service QPS by 3x, which mitigated our cluster’s Kafka consumer lag issue.
- Developed monitoring solutions to ensure the performance, health, and reliability of ETL pipelines and services.
- Identified and fixed a high-impact memory leak (~10 GB/hour) in a core service.
eBay
SDE Intern
Jun. 2021 – Aug. 2021
San Jose, USA
- Explored query optimization methods on NuGraph, an in-house graph database built on the open-source JanusGraph.
Microsoft Research
Research Intern
Feb. 2017 – Feb. 2018
Beijing, China
- Proposed a machine learning (random forest) failure prediction method that reduced the Azure VM failure rate by 43%.