← Previous · All Episodes · Next →
We aren't running out of training data, we are running out of open training data Episode 35

We aren't running out of training data, we are running out of open training data

· 08:29

|
Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data
2:51 Synthetic data: 1 trillion new tokens per day
4:18 Data licensing deals: High costs per token
6:33 Better tokens: Search and new frontiers


Subscribe

Listen to Interconnects Audio using one of many popular podcasting apps or directories.

Apple Podcasts Spotify Overcast Pocket Casts YouTube
← Previous · All Episodes · Next →