
B2 DAILY STORY

Technology 22 Jun 2026

Randomized YaRN Improves Length Generalization for Long-Context Reasoning

Large language models (LLMs) are typically pretrained on short sequences and then extended to longer sequences with additional training. However, even these extended models still struggle to generalize to much longer sequences. We propose Randomized YaRN, a training method that improves length generalization by combining YaRN-based positional extrapolation with randomized positional encoding and a length curriculum. During training on short-context data, tokens are assigned YaRN positional encodings sampled from a larger position range, which exposes the model to out-of-distribution (OOD) positional representations even on short-context inputs. We evaluate Randomized YaRN on two challenging long-context reasoning benchmarks, BABILong and Multi-Round Coreference Resolution (MRCR). When trained on data with contexts shorter than 8K tokens, Randomized YaRN consistently improves reasoning performance at context lengths from 16K to 128K and outperforms standard fine-tuning, with the largest gains at lengths far outside the training distribution. Our results suggest that progressively exposing models to OOD positional distributions is an effective recipe for generalizable long-context reasoning.
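The training idea described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' code. It assumes RoPE-style positions, YaRN's published "NTK-by-parts" frequency rescaling (with its default ramp bounds alpha=1 and beta=32), position ids drawn as a sorted random subset of a larger range (the randomized positional encoding idea), and a simple linear curriculum that widens that range during training. The function names, the 4K original context, and the 8K-to-128K schedule are illustrative assumptions.

```python
import math
import random


def yarn_inv_freq(head_dim, base=10000.0, scale=32.0, orig_ctx=4096,
                  alpha=1.0, beta=32.0):
    """RoPE frequencies rescaled with YaRN's "NTK-by-parts" rule.

    Dimensions that complete few rotations within the original context
    (r < alpha) are interpolated by 1/scale; dimensions with short
    wavelengths (r > beta) are kept unchanged; a linear ramp blends them.
    """
    freqs = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)        # original RoPE frequency
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength              # rotations within orig context
        if r < alpha:
            gamma = 0.0
        elif r > beta:
            gamma = 1.0
        else:
            gamma = (r - alpha) / (beta - alpha)
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


def yarn_attention_factor(scale):
    # YaRN's suggested factor applied to queries and keys: 0.1 * ln(s) + 1.
    return 0.1 * math.log(scale) + 1.0


def curriculum_max_position(step, total_steps, start_max, end_max):
    """Hypothetical linear length curriculum for the sampled position range."""
    frac = min(step / max(total_steps, 1), 1.0)
    return int(start_max + frac * (end_max - start_max))


def randomized_position_ids(seq_len, max_position, rng=random):
    """Sample seq_len distinct positions from [0, max_position), sorted.

    A short training sequence thus sees position values from a much larger
    range, while token order is preserved because the ids are increasing.
    """
    if max_position < seq_len:
        raise ValueError("max_position must be >= seq_len")
    return sorted(rng.sample(range(max_position), seq_len))


def rope_angles(position_ids, inv_freq):
    # Rotation angle for every (token, frequency) pair.
    return [[p * f for f in inv_freq] for p in position_ids]


if __name__ == "__main__":
    rng = random.Random(0)
    inv_freq = yarn_inv_freq(head_dim=64, scale=32.0, orig_ctx=4096)
    for step in (0, 500, 1000):
        # Widen the sampled range from 8K to 128K positions over training.
        max_pos = curriculum_max_position(step, 1000, 8192, 131072)
        pos = randomized_position_ids(seq_len=8, max_position=max_pos, rng=rng)
        angles = rope_angles(pos, inv_freq)
        print(step, max_pos, pos[:4], len(angles))
```

The sorted sampling keeps relative order intact while spreading positions out, so at inference time long inputs produce position values the model has already seen during training. A real implementation would operate on tensors and plug these angles into the attention layers; the list-based version here only shows the logic.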


5 min read B2


Summary

This story comes from arXiv, is filed under Technology, and has been adapted to B2 level. In brief: large language models are usually pretrained on short sequences and then extended to longer ones with extra training, yet they still struggle to generalize to very long sequences. Randomized YaRN combines YaRN positional extrapolation with randomized positional encoding and a length curriculum, so that even short training inputs receive positional encodings drawn from a larger range. Trained on data under 8K tokens, it improves reasoning on BABILong and MRCR at 16K to 128K tokens and outperforms standard fine-tuning, with the biggest gains far beyond the training length.
