Experience

Tencent

Algorithm Engineer Intern
Shenzhen, China
July 2025 - Present

Financial function calling

  • Trained a multi-intent financial function-calling component with multi-turn SFT and DAPO on top of Qwen3-30B-A3B.
  • Reduced hallucinated stock and fund codes by strengthening parameter-level rewards and constructing positive samples for groups that stayed persistently wrong during DAPO training.
  • Received a return offer from the internship and continued to work on production-facing model systems.

Financial agent training and evaluation

  • Built the interaction workflow used in multi-turn financial search agent training, including the testing pipeline and the post-test iteration SOP.
  • Supported a Qwen3-30B-A3B-based financial agent that matched DeepSeek V3.2 on internal financial search benchmarks and surpassed it on FinSearchComp under the same business tool setting.
  • Stabilized evaluation by fixing tool images, lowering judge randomness, and iterating judge prompts, reducing score fluctuation from 10% to 1%.
  • Used LLM-assisted analysis SOPs to localize issues faster across tools, evaluation standards, and entity interfaces during rapid system iteration.

Huawei

Algorithm Engineer Intern
Nanjing, China
December 2024 - May 2025

RL training framework optimization

  • Improved an internal reinforcement learning training framework to resolve crashes and instability during post-training.
  • Introduced DAPO into the framework and designed a dynamic sampling strategy that prioritizes samples with higher reward variance.
  • Increased training stability without additional compute, as better sample selection improved effective batch quality.