OpenWebText2

An enhanced version of OpenWebTextCorpus.

The Pile

A large, diverse, open-source language modeling dataset.