
B1 DAILY STORY

Technology 25 Jun 2026

DanceOPD: On-Policy Generative Field Distillation

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.


Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. So, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models. It routes each sample to one capability field and queries one low-noise student-induced state. It then trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance.
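The training loop described above can be pictured with a small toy example. This is only a rough sketch under stated assumptions, not the authors' implementation: the 2-D state, the linear student, the two toy fields, the time convention (t = 0 is noise), the query range 0.7 to 0.95, and all function names are hypothetical. The only standard piece is the classifier-free guidance formula. Each sample is routed to one field, the student rolls out its own states, the chosen field is queried at that state, and the student is updated with a velocity MSE loss.

```python
import numpy as np

DIM = 2          # toy state dimension (assumption)
NUM_FIELDS = 2   # number of capability fields in this toy

def expert_velocity(x):
    """Hypothetical frozen expert: pulls states toward the point (1, 0)."""
    return np.array([1.0, 0.0]) - x

def cfg_velocity(v_cond, v_uncond, scale):
    """Standard classifier-free guidance mix of two velocity predictions."""
    return v_uncond + scale * (v_cond - v_uncond)

def guided_velocity(x, scale=2.0):
    """Operator-defined field: CFG applied to two hypothetical toy fields."""
    v_cond = np.array([0.0, -0.5]) - x
    v_uncond = -x
    return cfg_velocity(v_cond, v_uncond, scale)

FIELDS = [expert_velocity, guided_velocity]

class LinearStudent:
    """Toy student: one linear velocity map per routing index."""
    def __init__(self):
        self.W = np.zeros((NUM_FIELDS, DIM, DIM))
        self.b = np.zeros((NUM_FIELDS, DIM))

    def velocity(self, x, route):
        return np.einsum("bi,bij->bj", x, self.W[route]) + self.b[route]

def student_rollout(student, route, t_query, rng, steps=8):
    """Euler-integrate the student's own field from noise (t=0) to t_query."""
    x = rng.standard_normal((len(route), DIM))
    dt = t_query / steps
    for _ in range(steps):
        x = x + dt * student.velocity(x, route)
    return x  # used as a constant: no gradient flows through the rollout

def train_step(student, rng, batch=64, lr=0.3):
    route = rng.integers(NUM_FIELDS, size=batch)  # one field per sample
    t_query = rng.uniform(0.7, 0.95)              # assumed "low-noise" range
    x = student_rollout(student, route, t_query, rng)
    target = np.empty_like(x)
    for k, field in enumerate(FIELDS):
        target[route == k] = field(x[route == k])  # query the routed field
    err = student.velocity(x, route) - target
    for k in range(NUM_FIELDS):                    # manual gradient of the MSE
        m = route == k
        student.W[k] -= lr * 2.0 * x[m].T @ err[m] / err.size
        student.b[k] -= lr * 2.0 * err[m].sum(axis=0) / err.size
    return float(np.mean(err ** 2))

def train(num_steps=400, seed=0):
    rng = np.random.default_rng(seed)
    student = LinearStudent()
    losses = [train_step(student, rng) for _ in range(num_steps)]
    return student, losses
```

Calling `student, losses = train()` returns the trained toy student and the loss at each step. In this toy the guided field needs two velocity evaluations, while the student that has absorbed it needs only one, which is one way to read the idea of "absorbing" classifier-free guidance. A real flow-matching model would use a neural network that also takes the time and the prompt as inputs.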

Summary (translated from Thai)

This story comes from arXiv, is filed under Technology, and has been adapted to B1 level. The content is as follows.

Modern image generation needs a single model that combines many abilities, including text-to-image (T2I), local editing, and global editing. But these abilities rarely line up naturally and often conflict. For example, editing tends to lower T2I performance, while global and local editing interfere with each other.

So, combining these abilities effectively has become a key challenge in training image generation models. To solve this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field and trains with a simple velocity MSE objective.

With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states in order to compose the experts' capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance.
