Looking at ktransformers, three things stand out:

  1. AMX 2. CUDA Graph 3. expert deferral. All three are aimed at maximizing CPU and GPU utilization.