Transformer neural networks represent a deep learning architecture exploiting attention for modeling sequence data, thus not requiring input data to actually be processed in sequence, unlike recurrent neural networks. Because of the recent successes of transformers in machine translation, I was curious to experiment with them for their use in modeling multivariate time series, and more specifically multivariate longitudinal clinical patient data. For this purpose, I have now implemented transformers as part of a method that I previously published (paper, code). The method addresses the problem of clustering multivariate time series with potentially many missing values, and uses a variational autoencoder with a Gaussian mixture prior, extended with LSTMs (or GRUs) for modeling multivariate time series, as well as implicit imputation and loss re-weighting for directly dealing with (potentially many) missing values. In addition to LSTMs and GRUs, I have now implemented transformers as part of the package. In addition to variational autoencoders with Gaussian mixture priors, the code allows to train ordinary variational LSTM/GRU/Transformer autoencoders (multivariate gaussian prior) and ordinary LSTM/GRU/Transformer autoencoders (without prior). I will probably report on some experiments comparing LSTMs and transformers for multivariate longitudinal clinical patient data in the near future.