In machine fault diagnosis, conventional data-driven models trained by empirical risk minimization (ERM) often fail to generalize across domains with distinct data distributions arising from varying machine operating conditions. One major reason is that ERM focuses primarily on the informativeness of data labels and pays insufficient attention to the invariance of data features. To enable invariance on top of informativeness, a learning framework, learning invariant features via in-label swapping for generalizing out-of-distribution (Lifeisgood), is proposed in this study. Lifeisgood is inspired by a simple intuition: invariance can be assessed by checking how the loss changes when certain entries are swapped between features that share the same label. Under certain conditions, Lifeisgood also enjoys a theoretical guarantee of improved testing-domain performance, based on a swapping 0-1 loss proposed in this work. To circumvent the training difficulties associated with the swapping 0-1 loss, a swapping cross-entropy loss is derived as a surrogate, and theoretical justifications for this relaxation are provided. As a result, Lifeisgood can be conveniently employed to develop data-driven fault diagnosis models. In the experiments, Lifeisgood outperformed the majority of state-of-the-art methods in average accuracy and exceeded the second-best method by 25% in the frequency of beating generic ERM. The code is available at: https://github.com/mozhenling/doge-lifeisgood.
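To make the in-label swapping idea concrete, the following is a minimal PyTorch sketch of a swapping cross-entropy loss. The function name `swapping_cross_entropy`, the `swap_ratio` parameter, the trade-off weight `lam`, and the particular swap scheme (a random feature-column mask and a random same-label partner within the batch) are illustrative assumptions rather than the paper's exact formulation; the linked repository contains the authors' implementation.

```python
import torch
import torch.nn.functional as F

def swapping_cross_entropy(features, labels, classifier, swap_ratio=0.5):
    """Hypothetical sketch: swap a random subset of feature entries
    between samples that share the same label, then score the swapped
    features with cross-entropy. If the learned features are invariant,
    in-label swapping should change the loss very little."""
    swapped = features.clone()
    for lbl in labels.unique():
        idx = (labels == lbl).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue  # need at least two same-label samples to swap
        # assign each same-label sample a random in-label partner
        perm = idx[torch.randperm(idx.numel(), device=idx.device)]
        # pick which feature entries (columns) to exchange
        cols = torch.rand(features.size(1), device=features.device) < swap_ratio
        swapped[idx] = torch.where(cols, features[perm], features[idx])
    logits = classifier(swapped)
    return F.cross_entropy(logits, labels)

# Toy usage: combine with the standard ERM term via an assumed weight `lam`.
feats = torch.randn(8, 16, requires_grad=True)
labs = torch.randint(0, 3, (8,))
clf = torch.nn.Linear(16, 3)
lam = 1.0
total = F.cross_entropy(clf(feats), labs) + lam * swapping_cross_entropy(feats, labs, clf)
total.backward()
```

The sketch reflects the stated intuition: the cross-entropy on swapped features penalizes representations whose entries are not interchangeable within a label, i.e., features that encode domain-specific rather than invariant information.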