Experience

Tencent

Algorithm Engineer Intern
Shenzhen, China
July 2025 - Present

Trained a multi-intent financial function-calling component with multi-turn SFT and DAPO on top of Qwen3-30B-A3B.
Reduced hallucinated stock and fund codes by strengthening parameter-level rewards and constructing positive samples for groups that persistently failed during DAPO training.
Received a return offer from the internship and continued to work on production-facing model systems.

Built the interaction workflow used in multi-turn financial search agent training, including the testing pipeline and the post-test iteration SOP.
Supported a Qwen3-30B-A3B-based financial agent that matched DeepSeek V3.2 on internal financial search benchmarks and surpassed it on FinSearchComp under the same business tool setting.
Stabilized evaluation by fixing tool images, lowering judge randomness, and iterating judge prompts, reducing score fluctuation from 10% to 1%.
Used LLM-assisted analysis SOPs to speed up issue localization across tools, evaluation standards, and entity interfaces during rapid system iteration.

Algorithm Engineer Intern
Nanjing, China
December 2024 - May 2025

Improved an internal reinforcement learning training framework to fix crashes and instability during post-training.
Introduced DAPO into the framework and designed a dynamic sampling strategy that prioritizes samples with higher reward variance.
Increased training stability at no additional compute cost, as better sample selection raised effective batch quality.