

Unfortunately, the model trained from scratch is ~0.5% worse than the multilingual model on a WikiNER split (80/10/10). I used the same parameters as in the SciBERT paper/repository; that means training with a sequence length of 128 first, then fine-tuning with a sequence length of 512. I also did some experiments with training corpus sizes from 16 to 40 GB.

Is there any script or tutorial to perform this process step by step? I already have a vocab.txt for the PT-BR corpus and I don't want to load any initial weights. Is it possible to use the run_lm_finetuning.py script to do this without using the multilingual BERT model? I would like to train BERT from scratch on a PT-BR text corpus (8 GB of data).

I'll share the results (TF checkpoints + Transformers weights) whenever the training on TPU has finished. Evaluation tasks for that model are a bit limited, so I would evaluate it on PoS tagging and NER (Universal Dependencies and WikiANN) and compare it with the multilingual model.

Could you explain to me how you trained your model from scratch without using multilingual BERT?
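As a rough illustration of what training from scratch with an existing vocab.txt (and no initial weights) can look like, here is a minimal sketch using the current Hugging Face `Trainer` API rather than run_lm_finetuning.py. All file names (`vocab.txt`, `ptbr_corpus.txt`), output paths, and hyperparameters are placeholders, not the settings actually used in this thread.

```python
# Sketch: MLM pre-training from scratch with a custom vocab.txt (no initial weights).
# Assumes `transformers` and `datasets` are installed; all paths are placeholders.
from datasets import load_dataset
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Tokenizer built directly from the existing PT-BR vocab file.
tokenizer = BertTokenizerFast("vocab.txt", do_lower_case=False)

# Randomly initialised BERT-base model: nothing is loaded from a checkpoint.
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# Plain-text corpus, one sentence/document per line (placeholder path).
dataset = load_dataset("text", data_files={"train": "ptbr_corpus.txt"})["train"]

def tokenize(batch):
    # Phase 1: short sequences (128) for the bulk of the training steps.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking for the masked-language-modelling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-ptbr-scratch",        # placeholder output directory
    per_device_train_batch_size=32,
    learning_rate=1e-4,
    warmup_steps=10_000,
    max_steps=900_000,
    save_steps=50_000,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
trainer.save_model("bert-ptbr-scratch/seq128")
tokenizer.save_pretrained("bert-ptbr-scratch/seq128")
```

If I remember correctly, the newer example script (run_mlm.py, the successor of run_lm_finetuning.py) can also train from scratch when you pass `--config_name`/`--tokenizer_name` without `--model_name_or_path`, but the `Trainer` version above is easier to adapt.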
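The 128 → 512 schedule mentioned above would then continue from that checkpoint with longer sequences for a smaller fraction of the steps. The sketch below assumes the phase-1 output directory from the previous snippet; step counts and batch sizes are again illustrative.

```python
# Sketch: phase 2 of pre-training, continuing from the seq-length-128 checkpoint
# with a sequence length of 512 (paths and step counts are placeholders).
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-ptbr-scratch/seq128")
model = BertForMaskedLM.from_pretrained("bert-ptbr-scratch/seq128")

dataset = load_dataset("text", data_files={"train": "ptbr_corpus.txt"})["train"]

def tokenize(batch):
    # Phase 2: long sequences so the position embeddings beyond 128 get trained.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-ptbr-scratch/seq512",
    per_device_train_batch_size=8,         # longer sequences -> smaller batches
    learning_rate=1e-4,
    max_steps=100_000,                     # typically ~10% of the total steps
    save_steps=10_000,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
trainer.save_model()
tokenizer.save_pretrained("bert-ptbr-scratch/seq512")
```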
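For the NER part of the evaluation, a WikiANN token-classification fine-tune could be sketched as follows. The model path and hyperparameters are placeholders, the label alignment is the standard first-sub-token scheme, and the PoS/UD part is not covered; the French model would use the "fr" WikiANN config, the PT-BR one "pt".

```python
# Sketch: NER evaluation on WikiANN by fine-tuning a token-classification head.
# Model path and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("wikiann", "pt")
label_names = raw["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("bert-ptbr-scratch/seq512")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-ptbr-scratch/seq512", num_labels=len(label_names)
)

def encode(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = enc.word_ids(batch_index=i)
        previous = None
        labels = []
        for wid in word_ids:
            if wid is None or wid == previous:
                labels.append(-100)          # ignore special tokens and sub-tokens
            else:
                labels.append(tags[wid])     # label only the first sub-token of a word
            previous = wid
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc

encoded = raw.map(encode, batched=True, remove_columns=raw["train"].column_names)

args = TrainingArguments(
    output_dir="wikiann-pt-eval",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
print(trainer.evaluate())
```

In practice you would also pass a `compute_metrics` function (e.g. based on seqeval) to report entity-level F1 rather than just the evaluation loss.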

I've already created the tfrecords (both cased and uncased) for a French BERT model; the corpus is mainly taken from Wikipedia + OPUS corpora, resulting in ~20 GB of text.
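For reference, one way to produce such tfrecords is the create_pretraining_data.py script from the original google-research/bert repository. A sketch of the invocation (wrapped in Python, with placeholder paths and illustrative flag values, not necessarily the exact command used here):

```python
# Sketch: generating BERT pre-training tfrecords with the original
# google-research/bert create_pretraining_data.py script.
# Paths are placeholders; flag values are illustrative.
import subprocess

subprocess.run(
    [
        "python", "create_pretraining_data.py",
        "--input_file=fr_corpus.txt",        # one sentence per line, blank line between documents
        "--output_file=fr_seq128.tfrecord",
        "--vocab_file=vocab.txt",
        "--do_lower_case=False",             # cased variant; set to True for the uncased one
        "--max_seq_length=128",
        "--max_predictions_per_seq=20",
        "--masked_lm_prob=0.15",
        "--dupe_factor=5",
    ],
    check=True,
)
```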
