VOICE ARCHIVE

Peiyi Wang

@sybilhyz
6 posts
2025-01-28
I hope this formula helps researchers with no experience in RL better understand the RL of LLMs. Additionally, I am grateful to have @pigjunebaba by my side to witness the miracle of RL.
2025-01-28
Wired

DeepSeek's privacy policy shows broad data collection and says content that users give to its models, like text, audio, and files, is stored on servers in China

Amid ongoing fears over TikTok, Chinese generative AI platform DeepSeek says it's sending heaps of US user data straight to its home country …

Last year, I joined DeepSeek with no RL experience. While conducting Mathshepherd and DeepSeekMath research, I independently derived this unified formula to understand various training methods. It felt like an “aha moment”, though I later realized it was PG.
2025-01-28
Wired

DeepSeek's privacy policy shows broad data collection and says content that users give to its models, like text, audio, and files, is stored on servers in China

Amid ongoing fears over TikTok, Chinese generative AI platform DeepSeek says it's sending heaps of US user data straight to its home country …

This unified formula has made me truly believe in the potential of RL. During the training of r1-zero, it was because of this formula that I was able to wait for and eventually witness r1-zero's aha moment.
2025-01-28
Wired

DeepSeek's privacy policy shows broad data collection and says content that users give to its models, like text, audio, and files, is stored on servers in China

Amid ongoing fears over TikTok, Chinese generative AI platform DeepSeek says it's sending heaps of US user data straight to its home country …

I hope this formula helps researchers with no experience in RL better understand the RL of LLMs. Additionally, I am grateful to have @pigjunebaba by my side to witness the miracle of RL.
2025-01-28
Bloomberg

DeepSeek highlights some of China's key advantages: a deep pool of skilled software engineers, a vast domestic market, and government support via subsidies

and MAGA tensions X: Peiyi Wang / @sybilhyz : I hope this formula helps researchers with no experience in RL better understand the RL of LLMs. Additionally, I am grateful to have @...

Last year, I joined DeepSeek with no RL experience. While conducting Mathshepherd and DeepSeekMath research, I independently derived this unified formula to understand various training methods. It felt like an “aha moment”, though I later realized it was PG.
2025-01-28
Bloomberg

DeepSeek highlights some of China's key advantages: a deep pool of skilled software engineers, a vast domestic market, and government support via subsidies

and MAGA tensions X: Peiyi Wang / @sybilhyz : I hope this formula helps researchers with no experience in RL better understand the RL of LLMs. Additionally, I am grateful to have @...

This unified formula has made me truly believe in the potential of RL. During the training of r1-zero, it was because of this formula that I was able to wait for and eventually witness r1-zero's aha moment.
2025-01-28
Bloomberg

DeepSeek highlights some of China's key advantages: a deep pool of skilled software engineers, a vast domestic market, and government support via subsidies

and MAGA tensions X: Peiyi Wang / @sybilhyz : I hope this formula helps researchers with no experience in RL better understand the RL of LLMs. Additionally, I am grateful to have @...