@Higuchi Kokoro

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1

flowchart TB
    %% Node definitions
    A[DeepSeek-V3-Base pretrained base model]
    Z[DeepSeek-R1-Zero RL only]
    R[DeepSeek-R1 reasoning + non-reasoning + safety]

    %% DeepSeek-R1-Zero pipeline
    A -->|RL Reasoning Tasks| B[Model update, no SFT]
    B --> Z

    %% DeepSeek-R1 pipeline
    A -->|SFT with Cold-Start Data| C[Model update, initial]
    C -->|RL Reasoning Tasks| D[Model update, reasoning-enhanced]
    D -->|Rejection Sampling, keep correct answers only| E[New SFT data, reasoning + non-reasoning]
    E -->|SFT, about 800k samples| F[Model update, multi-domain]
    F -->|RL all scenarios: safety + helpfulness| R
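
The rejection sampling step in the flowchart (keep only correct answers, then reuse them as SFT data) can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not the paper's actual filtering: `Sample`, `is_correct`, and `rejection_sample` are hypothetical names, and correctness is assumed to be an exact match between the last line of a response and a reference answer.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str


def is_correct(response: str, reference: str) -> bool:
    """Hypothetical check: the last line of the response must equal the reference."""
    lines = response.strip().splitlines()
    return bool(lines) and lines[-1].strip() == reference.strip()


def rejection_sample(prompt: str, reference: str, candidates: list[str]) -> list[Sample]:
    """Keep only the candidate responses that pass the correctness check."""
    return [Sample(prompt, c) for c in candidates if is_correct(c, reference)]


# Toy candidates standing in for responses sampled from the RL-trained checkpoint.
candidates = [
    "2 + 2 is 4, and 4 * 3 is 12.\n12",
    "2 + 2 is 5, and 5 * 3 is 15.\n15",
    "Add first, then multiply.\n12",
]
kept = rejection_sample("What is (2 + 2) * 3?", "12", candidates)
print(len(kept))  # number of responses retained as SFT data
```

In a real pipeline the candidates would be generated by the model and the check would likely be more involved; the sketch only shows the filter-then-collect shape of the step.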

Aider LLM Leaderboards

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking


@FukasawaYusuke

A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

http://35.73.85.241/ demo