Problem description

RoPE (Rotary Position Embedding) [medium]


The v1 sinusoidal PE is added to the embeddings. RoPE (Su et al. 2021; the standard in LLaMA, Mistral, and Qwen) applies a rotation instead:

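Formula

Pair up the even and odd indices of the vector and rotate each pair in its own 2D plane. For position $m$, pair $(x_{2i}, x_{2i+1})$ with $i = 0, \dots, d/2 - 1$, and per-pair frequency $\theta_i = 10000^{-2i/d}$, the rotation angle is $m\theta_i$: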

$\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix} = \begin{pmatrix} \cos(m\theta_i) & -\sin(m\theta_i) \\ \sin(m\theta_i) & \cos(m\theta_i) \end{pmatrix} \begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix}$
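As a small worked instance of the formula, the sketch below rotates a single pair with an explicit 2x2 matrix. The helper name `rotate_pair` and the chosen values (d = 4, i = 1, m = 3) are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def rotate_pair(pair, m, theta):
    """Rotate one (x_2i, x_2i+1) pair by the angle m * theta (hypothetical helper)."""
    angle = m * theta
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ pair

d = 4
i = 1
theta_1 = 10000.0 ** (-2 * i / d)   # 10000^(-1/2) = 0.01
pair = np.array([1.0, 0.0])

# The unit vector (1, 0) at position m = 3 lands on (cos(0.03), sin(0.03)).
rotated = rotate_pair(pair, m=3, theta=theta_1)
```

Because the starting pair is the unit vector along the first axis, the result is simply the first column of the rotation matrix.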

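Why it is better than addition

Norm preservation: a rotation is an isometry, so the scale of the vector is not distorted.
Relative encoding is built in: $\langle \text{RoPE}(q, m), \text{RoPE}(k, n) \rangle$ depends on the positions only through $n - m$.
No extra parameters, and the rotation is defined for any position, so it can be applied to contexts longer than those seen in training (how well quality holds up at such lengths depends on the model).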
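
The relative-position property follows from the fact that 2D rotations compose by adding angles. For a single pair, writing $R(\alpha)$ for the rotation matrix by angle $\alpha$ (a shorthand introduced here for brevity):

$\langle R(m\theta_i)\, q,\; R(n\theta_i)\, k \rangle = q^\top R(m\theta_i)^\top R(n\theta_i)\, k = q^\top R((n - m)\theta_i)\, k$

This uses $R(\alpha)^\top = R(-\alpha)$ and $R(\alpha) R(\beta) = R(\alpha + \beta)$. The full inner product is the sum of these terms over the $d/2$ pairs, so it sees the positions only through $n - m$.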

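Implementation

pos has shape (L, 1) and theta has shape (d/2,), so angles = pos · theta broadcasts to shape (L, d/2).
cos and sin each have shape (L, d/2) and are applied elementwise to x_even = x[:, 0::2] and x_odd = x[:, 1::2].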
out[:, 0::2] = x_even * cos - x_odd * sin
out[:, 1::2] = x_even * sin + x_odd * cos
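
The steps above can be put together roughly as follows. This is one possible NumPy sketch rather than a reference solution; it assumes that row m of x sits at position m (0, 1, ..., L-1) and that the output should be float64.

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (L, d), d even.

    Assumption: row m of x is the vector at position m (0-based).
    """
    x = np.asarray(x, dtype=np.float64)
    L, d = x.shape
    if d % 2 != 0:
        raise ValueError("d must be even")

    pos = np.arange(L, dtype=np.float64)[:, None]   # (L, 1)
    i = np.arange(d // 2, dtype=np.float64)          # (d/2,)
    theta = base ** (-2.0 * i / d)                   # (d/2,)
    angles = pos * theta                             # broadcasts to (L, d/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
y = apply_rope(x)   # same shape as x; row 0 is unchanged because its angle is 0
```

Writing into the strided slices `out[:, 0::2]` and `out[:, 1::2]` keeps the interleaved layout of the input, so no final reshape or concatenation is needed.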

Task

Complete the function apply_rope(x, base=10000.0).

x has shape (L, d), with d even.
The return value has shape (L, d).

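Test cases

#	Name	Check
1	Same shape
2	Norm preservation (row-wise)	‖x_m‖ = ‖RoPE(x, m)‖
3	pos=0 → identity	the first row is unchanged
4	Relative-position property	when q = k, ⟨q_m, k_n⟩ = f(n-m)
5	Different positions → different outputs
6	d=2 is a pure 2D rotation
7	Effect of base	with base=1, every frequency is 1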
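
A few of these checks can be sketched in code. Since the grader's actual tests are not shown, the block below is only a guess at their spirit; `rope_complex` is a hypothetical stand-in for apply_rope that treats each pair as a complex number and applies the rotation as a phase factor.

```python
import numpy as np

def rope_complex(x, base=10000.0):
    """Stand-in for apply_rope: each pair becomes a complex number, rotation is a phase."""
    L, d = x.shape
    theta = base ** (-2.0 * np.arange(d // 2) / d)         # (d/2,)
    phase = np.exp(1j * np.arange(L)[:, None] * theta)     # (L, d/2)
    z = (x[:, 0::2] + 1j * x[:, 1::2]) * phase
    out = np.empty((L, d), dtype=np.float64)
    out[:, 0::2], out[:, 1::2] = z.real, z.imag
    return out

rng = np.random.default_rng(0)
L, d = 6, 8
x = rng.normal(size=(L, d))
out = rope_complex(x)

# Cases 2 and 3: row norms are preserved, and position 0 is the identity.
norm_ok = np.allclose(np.linalg.norm(out, axis=1), np.linalg.norm(x, axis=1))
identity_ok = np.allclose(out[0], x[0])

# Case 4: place the same vector at every position; the inner products
# should then depend only on the offset n - m (here, offset 2).
v = rng.normal(size=d)
same = rope_complex(np.tile(v, (L, 1)))
gram = same @ same.T
relative_ok = np.isclose(gram[0, 2], gram[1, 3]) and np.isclose(gram[1, 3], gram[3, 5])

# Case 6: with d = 2 the only frequency is 1, so (1, 0) at position m
# becomes (cos m, sin m).
m = np.arange(L)
unit = np.tile([1.0, 0.0], (L, 1))
rotation_ok = np.allclose(rope_complex(unit), np.stack([np.cos(m), np.sin(m)], axis=1))
```

Multiplying x_even + i·x_odd by cos + i·sin yields the same real and imaginary parts as the two update lines in the implementation notes, so this variant can double as a cross-check for a slice-based solution.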
