This section presents the performance comparison of BERT-based and GPT-based models on MIMIC mortality and readmission prediction tasks using unstructured clinical notes.
Table 2a: Performance comparison of BERT-based and GPT-based models on MIMIC mortality prediction tasks using unstructured clinical notes. Bold denotes the best performance. Results are reported as mean±std over 100 bootstrap resamples of the full test set. All metrics are multiplied by 100 for readability.
Table 2b: Performance comparison of BERT-based and GPT-based models on MIMIC readmission prediction tasks using unstructured clinical notes. Bold denotes the best performance. Results are reported as mean±std over 100 bootstrap resamples of the full test set. All metrics are multiplied by 100 for readability.
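The bootstrap reporting described in the captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind the tables: the function names, the fixed seed, the choice to skip resamples that contain a single class, the use of the sample standard deviation, and the toy labels and scores are all hypothetical. AUPRC is approximated here by average precision.

```python
import random
import statistics


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, y_score):
    """Average precision as a stand-in for AUPRC (ties broken by sort order)."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


def bootstrap_metric(y_true, y_score, metric, n_rounds=100, seed=0):
    """Return (mean, std) of `metric` x 100 over `n_rounds` bootstrap resamples."""
    n = len(y_true)
    if sum(y_true) in (0, n):
        raise ValueError("both classes must be present in the test set")
    rng = random.Random(seed)
    values = []
    while len(values) < n_rounds:
        # Resample the whole test set with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if 0 < sum(labels) < n:  # assumption: skip single-class resamples
            scores = [y_score[i] for i in idx]
            values.append(100 * metric(labels, scores))
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical toy labels and predicted risks, for illustration only.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
for name, fn in [("AUROC", auroc), ("AUPRC", average_precision)]:
    mean, std = bootstrap_metric(toy_labels, toy_scores, fn)
    print(f"{name}: {mean:.2f}±{std:.2f}")
```

Resampling the full test set with replacement keeps each resample the same size as the original, so the reported std reflects the sampling variability of the metric at that test-set size.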
GPT-based models perform strongly, especially when prompted: models such as DeepSeek-R1 and o3-mini-high achieve the top AUROC and AUPRC for mortality prediction.
DeepSeek-V3 and o3-mini-high also show leading performance for readmission prediction.