gpt-2-simple can be installed via PyPI:
pip3 install gpt-2-simple
You will also need to install the corresponding TensorFlow for your system (e.g.
An example for downloading the model to the local system, finetuning it on a dataset, and generating some text.
Warning: the pretrained 117M model, and thus any finetuned model, is 500 MB! (the pretrained 345M model is 1.5 GB)
import gpt_2_simple as gpt2
model_name = "117M"
gpt2.download_gpt2(model_name=model_name) # model is saved into current directory under /models/117M/
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'shakespeare.txt',
              model_name=model_name,
              steps=1000)   # steps is max number of training steps

gpt2.generate(sess)
The generated model checkpoints are by default in /checkpoint/run1. If you want to load a model from that folder and generate text from it:
import gpt_2_simple as gpt2
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)  # loads the checkpoint from the default run folder

gpt2.generate(sess)
As with textgenrnn, you can generate and save text for later use (e.g. an API or a bot) by using the return_as_list parameter:
single_text = gpt2.generate(sess, return_as_list=True)[0]  # generate() returns a list; take the first text
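A minimal sketch of the "save for later use" idea, assuming the texts arrive as a plain list of strings (as with return_as_list=True). The helper names save_texts and load_texts and the output file name are hypothetical and not part of gpt-2-simple; the texts list is a stand-in so the snippet runs without a model.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the list returned by gpt2.generate(sess, return_as_list=True).
texts = ["First generated sample.", "Second generated sample."]


def save_texts(texts, path):
    """Write generated texts to a JSON file so an API or bot can read them later."""
    Path(path).write_text(
        json.dumps(texts, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_texts(path):
    """Read back the texts written by save_texts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical output location; any writable path works.
out_path = Path(tempfile.gettempdir()) / "gpt2_samples.json"
save_texts(texts, out_path)
restored = load_texts(out_path)
```

JSON is used here because it preserves newlines and quotes inside each text without extra escaping rules; a database or a plain text file per sample would work as well.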
You can pass a run_name parameter to load_gpt2 if you want to store/load multiple models in a
There is also a command-line interface for both finetuning and generation, with strong defaults for just running on a Cloud VM with a GPU. For finetuning (which will also download the model if not present):
gpt_2_simple finetune shakespeare.txt
And for generation, which generates texts to files in a
Most of the same parameters available in the functions are available as CLI arguments, e.g.:
gpt_2_simple generate --temperature 1.0 --nsamples 20 --batch_size 20 --length 50 --prefix "<|startoftext|>" --truncate "<|endoftext|>" --include_prefix False --nfiles 5
See below for what some of the CLI arguments do.
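If you drive the CLI from a script, one possible approach is to build the argument list from a dictionary. This is only a sketch: the helper build_generate_command is hypothetical, and the flag names are copied from the example command above rather than from a full list of supported options.

```python
import shlex


def build_generate_command(options):
    """Build an argv list for `gpt_2_simple generate` from a dict of options."""
    argv = ["gpt_2_simple", "generate"]
    for name, value in options.items():
        # Each option becomes "--name value"; values are passed as strings.
        argv += [f"--{name}", str(value)]
    return argv


# Flags taken from the example command in the text above.
options = {
    "temperature": 1.0,
    "nsamples": 20,
    "batch_size": 20,
    "length": 50,
    "prefix": "<|startoftext|>",
    "truncate": "<|endoftext|>",
    "include_prefix": False,
    "nfiles": 5,
}

argv = build_generate_command(options)
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

On a machine with gpt-2-simple installed, the list could then be passed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids shell-quoting problems with tokens such as <|startoftext|>.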
NB: Restart the Python session first if you want to finetune on another dataset or load another model.
Differences Between gpt-2-simple And Other Text Generation Utilities
The method GPT-2 uses to generate text is slightly different from that of other packages like textgenrnn (specifically, it generates the full text sequence purely on the GPU and decodes it later), which cannot easily be changed without hacking the underlying model code. As a result:
- In general, GPT-2 is better at maintaining context over its entire generation length, making it good for generating conversational text. The text is also generally grammatically correct, with proper capitalization and few typos.
- The original GPT-2 model was trained on a very large variety of sources, allowing the model to incorporate idioms not seen in the input text.
- GPT-2 can only generate a maximum of 1024 tokens per request (about 3-4 paragraphs of English text).
- GPT-2 cannot stop early upon reaching a specific end token. (workaround: pass the truncate parameter to a generate function to only collect text until a specified end token. You may want to reduce
- Higher temperatures work better (e.g. 0.7 - 1.0) to generate more interesting text, while other frameworks work better between 0.2 - 0.5.
- When finetuning, GPT-2 has no sense of the beginning or end of a document within a larger text. You'll need to use a bespoke character sequence to indicate the beginning and end of a document. Then, while generating, you can specify a prefix targeting the beginning token sequence, and a truncate targeting the end token sequence. You can also set include_prefix=False to discard the prefix token while generating (e.g. if it's something unwanted like
- If you pass a single-column .csv file to finetune(), it will automatically parse the CSV into a format ideal for training with GPT-2 (including prepending <|startoftext|> and suffixing <|endoftext|> to every text document, so the truncate tricks above are helpful when generating output). This is necessary to handle both quotes and newlines in each text document correctly.
- GPT-2 allows you to generate texts in parallel by setting a batch_size that divides nsamples evenly, resulting in much faster generation. This works very well with a GPU (you can set batch_size up to 20 on Colaboratory's K80)!
- Due to GPT-2's architecture, it scales up nicely with more powerful GPUs. For the 117M model, if you want to train for longer periods of time, GCP's P100 GPU is about 3x faster than a K80/T4 for only 3x the price, making it price-comparable (the V100 is about 1.5x faster than the P100 but about 2x the price). Training uses 100% of the P100 GPU even with batch_size=1, and about 88% of the V100 GPU.
- If you have a partially-trained GPT-2 model and want to continue finetuning it, you can pass overwrite=True to finetune, which will continue training and remove the previous iteration of the model without creating a duplicate copy. This can be especially useful for transfer learning (e.g. heavily finetune GPT-2 on one dataset, then finetune on another dataset to get a "merging" of both datasets).
- If your input text dataset is massive (>100 MB), you may want to preencode and compress the dataset using gpt2.encode_dataset(file_path). The output is a compressed .npz file which will load much faster into the GPU for finetuning.
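The delimiter and truncation ideas from the list above can be sketched in plain Python. This is an illustration of the concept only, not the library's internal code: wrap_documents and clean_generated are hypothetical helpers, and the raw string stands in for text a finetuned model might produce.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def wrap_documents(documents):
    """Join documents into one training text, marking each one's start and end."""
    return "\n".join(f"{START_TOKEN}{doc}{END_TOKEN}" for doc in documents)


def clean_generated(text, prefix=None, truncate=None, include_prefix=True):
    """Post-process raw output in the spirit of prefix/truncate/include_prefix."""
    # Drop the leading prefix token when the caller does not want it.
    if prefix and not include_prefix and text.startswith(prefix):
        text = text[len(prefix):]
    # Keep only the text before the first end token, if one is present.
    if truncate:
        end = text.find(truncate)
        if end != -1:
            text = text[:end]
    return text.strip()


# Documents may contain quotes and newlines; the tokens keep them separable.
training_text = wrap_documents(
    ["To be, or not to be.", 'She said "hi"\nand left.']
)

# Stand-in for raw model output that runs past the end of the first document.
raw = (
    START_TOKEN + "All the world's a stage." + END_TOKEN
    + "\n" + START_TOKEN + "And all the"
)
cleaned = clean_generated(
    raw, prefix=START_TOKEN, truncate=END_TOKEN, include_prefix=False
)
print(cleaned)
```

Because generation cannot stop early at the end token, the raw output typically continues into a second, unfinished document; cutting at the first end token discards that trailing fragment.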
Note: this project is intended to have a very tight scope unless demand dictates otherwise.
- Allow users to generate texts longer than 1024 tokens. (GitHub Issue)
- Allow users to use Colaboratory's TPU for finetuning. (GitHub Issue)
- For Colaboratory, allow model to automatically save checkpoints to Google Drive during training to prevent timeouts.
Examples Using gpt-2-simple
Max Woolf (@minimaxir)
Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.
This repo has no affiliation or relationship with OpenAI.