References
- Allwein, E., Schapire, R., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1, 113–141. [Google Scholar]
- Alonzo, T. A., & Nakas, C. T. (2007). Comparison of roc umbrella volumes with an application to the assessment of lung cancer diagnostic markers. Biometrical Journal, 49, 654–664. [Google Scholar]
- Alonzo, T. A., Nakas, C. T., Yiannoutsos, C. T., & Bucher, S. (2009). A comparison of tests for restricted orderings in the three-class case. Statistics in Medicine, 28, 1144–1158. [Google Scholar]
- Austin, P. C., & Steyerberg, E. W. (2013). Predictive accuracy of risk factors and markers: A simulation study of the effect of novel markers on different performance measures for logistic regression models. Statistics in Medicine, 32, 661–672. [Google Scholar]
- Beffa, C. B., Slansky, E., Pommerenke, C., Klawonn, F., Li, J., Dai, L., … Pessler, F. (2013). The relative composition of the inflammatory infiltrate as an additional tool for synovial tissue classification. PLoS ONE, 8, e72494. [Google Scholar]
- Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth. [Google Scholar]
- Cox, D. R., & Wermuth, N. (1992). A comment on the coefficient of determination for binary response. The American Statisticians, 46, 1–4. [Taylor & Francis Online], [Google Scholar]
- Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2, 265–292. [Google Scholar]
- Delaigle, A., & Hall, P. (2012). Achieving near-perfect classification for functional data. Journal of the Royal Statistical Society: Series B, 74, 267–286. [Google Scholar]
- Dreiseiltl, S., Ohno-machado, L., & Binder, M. (2000). Comparing three-class diagnostic tests by three-way ROC analysis. Medical Decision Making, 20, 323–331. [Google Scholar]
- Edwards, D. C., & Metz, C. E. (2006). Analysis of proposed three-class classification decision rules in terms of the ideal observer decision rule. Journal of Mathematical Psychology, 50, 478–487. [Google Scholar]
- Edwards, D. C., Metz, C. E., & Kupinski, M. A. (2004). Ideal observers and optimal ROC hypersurfaces in n-class classification. IEEE Transactions on Medical Imaging, 23, 891–895. [Google Scholar]
- Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359–378. [Taylor & Francis Online], [Google Scholar]
- Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, H., … Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531–537. [Google Scholar]
- Hand, D. J., & Till, R. T. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45, 171–186. [Google Scholar]
- He, X., & Frey, E. C. (2007). An optimal three-class linear observer derived from decision theory. IEEE Transactions on Medical Imaging, 26, 77–83. [Google Scholar]
- He, X., Gallas, B. D., & Frey, E. C. (2010). Three-class ROC analysis – toward a general decision theoretic solution. IEEE Transactions on Medical Imaging, 29, 206–215. [Google Scholar]
- Heckerling, P. S. (2001). Parametric three-way receiver operating characteristic surface analysis using mathematica. Medical Decision Making, 20, 409–417. [Google Scholar]
- Hilden, J., & Gerds, Thomas A. (2014). A note on the evaluation of novel biomarkers: Do not rely on integrated discrimination improvement and net reclassification index. Statistics in Medicine, 33(19), 3405–3414. [Google Scholar]
- Hu, B., Palta, M., & Shao, J. (2006). Properties of r2 statistics for logistic regression. Statistics in Medicine, 25, 1383–1395. [Google Scholar]
- Huang, Z., Li, J., Cheng, C. Y., Cheung, C., & Wong, T. Y. (2016, July). Bayesian reclassification statistics for assessing improvements in diagnostic accuracy. Statistics in Medicine, 35, 2574–2592. ISSN 0277-6715. doi: 10.1002/sim.6899. [Google Scholar]
- Kerr, Kathleen F., Wang, Z., Janes, H., McClelland, Robyn L., Psaty, Bruce M., & Pepe, M. S. (2014). Net reclassification indices for evaluating risk prediction instruments: A critical review. Epidemiology, 25(1), 114–121. [Google Scholar]
- Koltchinskii, V., & Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 1–50. [Google Scholar]
- Lee, Y., Lin, Y., & Wahba, G. (2004). Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99, 67–81. [Taylor & Francis Online], [Google Scholar]
- Li, J., & Fine, J. P. (2008). ROC analysis with multiple tests and multiple classes: Methodology and applications in microarray studies. Biostatistics, 9, 566–576. [Google Scholar]
- Li, J., & Fine, J. P. (2010). Weighted area under the receiver operating characteristic curve and its application to gene selection. Journal of the Royal Statistical Society Series C (Applied Statistics), 59, 673–692. [Google Scholar]
- Li, J., & Zhou, X. H. (2009). Nonparametric and semi-parametric estimation of the three way receiver operating characteristic surface. Journal of Statistical Planning and Inference, 139, 4133–4142. [Google Scholar]
- Li, J., Jiang, B., & Fine, J. P. (2013a). Multicategory reclassification statistics for assessing improvements in diagnostic accuracy. Biostatistics, 14(2), 382–394. [Google Scholar]
- Li, J., Jiang, B., & Fine, J. P. (2013b). Letter to editor: Response. Biostatistics, 14(4), 809–810. [Google Scholar]
- Li, J., Chow, Y., Wong, W. K., & Wong, T. Y. (2014). Sorting multiple classes in multi-dimensional ROC analysis: Parametric and nonparametric approaches. Biomarkers, 19(1), 1–8. [Taylor & Francis Online], [Google Scholar]
- Li, J., Feng, Q., Fine, J., Pencina, M., & Van Calster, B. (2017). Nonparametric estimation and inference for polytomous discrimination index. Statistical Methods in Medical Research. doi: 10.1177/0962280217692830 [Google Scholar]
- Luo, J., & Xiong, C. (2013). Youden index and associated cut-points for three ordinal diagnostic groups. Communications in Statistics – Simulation and Computation, 42, 1213–1234. [Taylor & Francis Online], [Google Scholar]
- Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statisticians, 54, 17–24. [Taylor & Francis Online], [Google Scholar]
- Mossman, D. (1999). Three-way ROCs. Medical Decision Making, 19, 78–89. [Google Scholar]
- Nakas, C. T., & Alonzo, T. A. (2007). ROC graphs for assessing the ability of a diagnostic marker to detect three disease classes with an umbrella ordering. Biometrics, 63, 603–609. [Google Scholar]
- Nakas, C. T., & Yiannoutsos, C. T. (2004). Ordered multiple-class ROC analysis with continuous measurements. Statistics in Medicine, 23, 3437–3449. [Google Scholar]
- Nakas, C. T., Alonzo, T. A., & Yiannoutsos, C. T. (2010). Accuracy and cut-off point selection in three-class classification problems using a generalization of the Youden index. Statistics in Medicine, 29, 2946–2955. [Google Scholar]
- Nakas, C. T., Dalrymple-Alford, J. C., Anderson, T. J., & Alonzo, T. A. (2012). Generalization of Youden index for multiple-class classification problems applied to the assessment of externally validated cognition in parkinson disease screening. Statistics in Medicine, 95, 995–1003. [Google Scholar]
- Novoselova, N., Beffa, C. D., Wang, J., Li, J., Pessler, F., & Klawonn, K. (in press). HUM calculator and HUM package for R: Easy-to-use software tools for multicategory receiver operating characteristic analysis. Bioinformatics. [Google Scholar]
- Obuchowski, N. (2005). Estimating and comparing diagnostic tests’ accuracy when the gold standard is not binary. Academic Radiology, 12, 1198–1204. [Google Scholar]
- Ogdie, A., Li, J., Dai, L., Pessler, M. E., Yu, X., et al. (2010). Identification of broadly discriminatory tissue biomarkers of synovitis with binary and multicategory receiver operating characteristic analysis. Biomarkers, 15, 183–190. [Taylor & Francis Online], [Google Scholar]
- Pencina, M. J., D’Agostino Sr, R. B., D’Agostino Jr, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27, 157–172. [Google Scholar]
- Pencina, M. J., D’Agostino Sr, R. B., & Steyerberg, E. W. (2011). Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Statistics in Medicine, 30, 11–21. [Google Scholar]
- Pencina, M. J., D’Agostino Sr, R. B., & Demler, O. V. (2012). Novel metrics for evaluating improvement in discrimination: Net reclassification and integrated discrimination improvements for normal variables and nested models. Statistics in Medicine, 31, 101–113. [Google Scholar]
- Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. New York: Oxford University Press. [Google Scholar]
- Pepe, M. S., Janes, H., Longton, G., Leisenring, W., & Newcomb, P. (2004). Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. American Journal of Epidemiology, 159, 882–890. [Google Scholar]
- Pepe, M. S., Feng, Z., & Gu, J. W. (2008a). Comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M. J. Pencina et al. Statistics in Medicine, 27, 173–181. [Google Scholar]
- Pepe, M. S., Feng, Z., & Gu, J. W. (2008b). Comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M. J. Pencina et al. Statistics in Medicine, 27(2), 173–181. [Google Scholar]
- Ressom, H. W., Varghese, R. S., Drake, S. K., Hortin, G. L., Abdel-Hamid, M., Loffredo, C. A., … Goldman, R. (2007). Peak selection from MALDI-TOF mass spectra using ant colony optimization. Bioinformatics, 23, 619–626. [Google Scholar]
- Ressom, H. W., Varghese, R. S., Goldman, L., Loffredo, C. A., Abdel-Hamid, M., Kyselova, Z., … Goldman, R. (2008). Analysis of MALDI-TOF mass spectrometry data for detection of Glycan biomarkers. Pacific Symposium on Biocomputing, 13, 216–227. [Google Scholar]
- Schubert, C. M., Thorsen, S., & Oxley, M. (2011). The roc manifold for classification systems. Pattern Recognition, 44, 350–362. [Google Scholar]
- Scurfield, B. K. (1996). Multiple-event forced-choice tasks in the theory of signal detectability. Journal of Mathematical Psychology, 40, 253–269. [Google Scholar]
- Selten, R. (1998). Axiomatic characterization of the quadratic scoring rule. Experimental Economics, 1, 43–62. [Google Scholar]
- Shao, F., Li, J., Fine, J., Wong, W. K., & Pencina, M. J. (2015, January). Inference for reclassification statistics under nested and non-nested models for biomarker evaluation. Biomarkers, 20, 240–252. doi: 10.3109/1354750X.2015.1068854. [Taylor & Francis Online], [Google Scholar]
- Shiu, S. Y., & Gatsonis, C. (2012). On ROC analysis with non-binary reference standard. Biometrical Journal, 54, 457–480. [Google Scholar]
- Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., … Kattane, M. W. (2010). Assessing the performance of prediction models, a framework for traditional and novel measures. Epidemiology, 21, 128–138. [Google Scholar]
- Tjur, T. (2009). Coefficients of determination in logistic regression models – a new proposal: The coefficient of discrimination. The American Statistician, 64, 366–372. [Taylor & Francis Online], [Google Scholar]
- Toth, Z., Zhu, Y., & Marchok, T. (2001). The use of ensembles to identify forecasts with small and large uncertainty. Weather and Forecasting, 16, 463–477. [Google Scholar]
- Van Calster, B., Van Belle, V., Vergouwe, Y., Timmerman, D., Van Huffel, S., & Steyerberg, E. W. (2012a). Extending the c-statistic to nominal polytomous outcomes: The polytomous discrimination index. Statistics in Medicine, 31, 2610–2626. [Google Scholar]
- Van Calster, B., Vergouwe, Y., Looman, C. W. N., Van Belle, V., Timmerman, D., & Steyerberg, E. W. (2012b). Assessing the discriminative ability of risk models for more than two outcome categories: A perspective. European Journal of Epidemiology, 27, 761–770. [Google Scholar]
- Vapnik, V. (1998). Statistical learning theory. New York, NY: Wiley. [Google Scholar]
- Xiong, C., van Belle, G., Miller, J. P., & Morris, J. C. (2006). Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine, 25, 1251–1273. [Google Scholar]
- Zhang, Y., & Li, J. (2011). Combining multiple markers for multi-category classification: An ROC surface approach. Australian and New Zealand Journal of Statistics, 53, 63–78. [Google Scholar]
- Zhou, X. H., Obuchowski, N. A. & McClish, D. K. (2002). Statistical methods in diagnostic medicine. New York, NY: John Wiley & Sons. [Google Scholar]