There are ways around this, e.g. arxiv.org/abs/1902.04094, but I think uniform-rate MLMs are the cleaner approach. Plus, as shown in Eq. 5 of the Factorization Curse paper, the uniform-rate MLM objective is what you get after marginalizing over all XLNet-style autoregressive permutations.
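Here's a quick toy check of that equivalence. This is my own sketch, not code from the paper, and Eq. 5 may write the weighting differently. I'm assuming "uniform rate" means the number of masked tokens m is drawn uniformly from 1..n, the masked set is uniform among subsets of that size, and the loss is averaged over the masked positions. The conditional log-probs come from a hypothetical random table that stands in for a model, so the identity doesn't depend on the model being trained.

```python
import itertools
import math
import random

def make_toy_logprob(n, seed=0):
    """Hypothetical stand-in for a model's log p(x_i | x_S): an arbitrary
    fixed number for every (target position i, visible set S) pair."""
    rng = random.Random(seed)
    table = {}
    for k in range(n):
        for S in itertools.combinations(range(n), k):
            for i in range(n):
                if i not in S:
                    table[(i, frozenset(S))] = -rng.random()
    return lambda i, S: table[(i, frozenset(S))]

def any_order_ar_loss(logp, n):
    """XLNet-style objective: the summed autoregressive log-likelihood,
    averaged over all n! factorization orders."""
    perms = list(itertools.permutations(range(n)))
    total = 0.0
    for order in perms:
        # At step t the model sees positions order[:t] and predicts order[t].
        total += sum(logp(order[t], order[:t]) for t in range(n))
    return total / len(perms)

def uniform_rate_mlm_loss(logp, n):
    """MLM objective (assumed form): m ~ Uniform{1..n} masked tokens, mask set
    uniform among size-m subsets, loss averaged over masked positions."""
    total = 0.0
    for m in range(1, n + 1):
        masks = list(itertools.combinations(range(n), m))
        per_m = 0.0
        for M in masks:
            visible = [j for j in range(n) if j not in M]
            per_m += sum(logp(i, visible) for i in M) / m
        total += per_m / len(masks)  # expectation over masks of size m
    return total / n  # expectation over the uniform mask count m

n = 4
logp = make_toy_logprob(n)
ar = any_order_ar_loss(logp, n)
mlm = uniform_rate_mlm_loss(logp, n)
print(math.isclose(ar, n * mlm))  # True
```

The reason it works: at step t of a uniformly random ordering, the already-generated positions form a uniformly random subset of size t-1 and the target is uniform over the rest, which is exactly the MLM's visible/masked split with m = n-t+1. Summing over the n steps instead of averaging is where the factor of n comes from.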