
B1 DAILY STORY

Technology 25 Jun 2026

Scalable Behavior Cloning with Open Data, Training, and Evaluation

We introduce ABC, a fully open-source stack for manipulation with behavior cloning. At its core is ABC-130K: the largest open-source teleoperation dataset to date, featuring 3,500 hours of data spanning over 130K episodes across 195 diverse tasks. Furthermore, we open-source our accessible hardware setup, training infrastructure, and simulation pipeline. We also release 400 hours of sim-teleop data and provide a co-training recipe that produces correlated simulation and real-world evaluation, offering a reliable proxy for ablating model-design and training decisions before costly real-world evaluation. We explore various training recipes and compare common architectural choices for Diffusion Transformers (DiT) and Vision-Language-Action (VLA) models, grounding our findings in real-world evaluations. The resulting policies successfully execute dexterous tasks such as box folding and extracting credit cards from wallets. By providing a reproducible toolkit, we aim to place researchers on an equal footing, establishing the necessary foundation to learn the ABCs of Behavior Cloning together as a community.




We introduce ABC, a fully open-source stack for manipulation with behavior cloning. At its core is ABC-130K, the largest open-source teleoperation dataset to date, featuring 3,500 hours of data spanning over 130K episodes across 195 diverse tasks. Furthermore, we open-source our accessible hardware setup, training infrastructure, and simulation pipeline. We also release 400 hours of sim-teleop data and provide a co-training recipe that produces correlated simulation and real-world evaluation, offering a reliable proxy for ablating model-design and training decisions before costly real-world evaluation. We explore various training recipes and compare common architectural choices for Diffusion Transformers (DiT) and Vision-Language-Action (VLA) models, grounding our findings in real-world evaluations. The resulting policies successfully execute dexterous tasks such as box folding and extracting credit cards from wallets.
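The claim that simulation is a reliable proxy for real-world evaluation can be made concrete with a rank correlation: if several model variants are scored both in simulation and on the real robot, a high correlation means simulation orders the variants the same way the real robot does. The sketch below is only an illustration under stated assumptions. The story does not say which statistic the authors use, so Spearman rank correlation is an assumed choice, and the function names (`average_ranks`, `pearson`, `sim_real_rank_correlation`) are hypothetical, not part of the ABC release.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


def sim_real_rank_correlation(sim_success, real_success):
    """Spearman correlation between per-variant sim and real success rates.

    Assumption: each position in the two lists refers to the same model
    variant, scored once in simulation and once on the real robot.
    """
    if len(sim_success) != len(real_success):
        raise ValueError("need one sim score and one real score per variant")
    if len(sim_success) < 2:
        raise ValueError("need at least two variants to correlate")
    return pearson(average_ranks(sim_success), average_ranks(real_success))
```

A result close to 1 would suggest that ablations can be ranked in simulation before spending time on real-world runs, while a result near 0 would suggest simulation says little about real performance. With only a handful of variants the estimate is noisy, so it is best read as a rough signal.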

Summary (translated from the Thai)

This story comes from arXiv, is filed under Technology, and has been adapted to B1 level. The story reads:

We present ABC, a fully open-source stack for manipulation with behavior cloning. At its core is ABC-130K, the largest open-source teleoperation dataset to date, with 3,500 hours of data covering over 130K episodes across 195 diverse tasks. In addition, we open-source our accessible hardware setup, training infrastructure, and simulation pipeline.

We also release 400 hours of sim-teleop data and provide a co-training recipe that makes simulation results correlate with real-world evaluation. This gives a reliable proxy for ablating model-design and training decisions before expensive real-world evaluation. We explore various training recipes and compare common architectural choices for Diffusion Transformers (DiT) and Vision-Language-Action (VLA) models, grounding our findings in real-world evaluations.

The resulting policies successfully perform dexterous tasks such as folding boxes and taking credit cards out of wallets.
