CLASSICAL AND ROBUST MAHALANOBIS DISTANCE MEASURES FOR MULTIVARIATE OUTLIER DETECTION: AN APPLICATION WITH FINANCIAL DATA

The presence of outliers in multivariate data sets complicates the estimation of population parameters and, by increasing the error variance, reduces the power of the statistical test being used. It also leads to departures from the assumptions that the variables have equal variances and follow a multivariate normal distribution. The Mahalanobis distance, one of the techniques used for multivariate outlier detection, is computed from the multivariate mean and the covariance matrix, both of which are sensitive to outliers. For this reason it is also used with robust estimators, which guard against two problems in multivariate data sets: outliers going undetected because they are masked, and normal observations being flagged as outliers. This study aims to compare the outliers detected by classical and robust Mahalanobis measures. The application data consist of 1,239,507 stock buy and sell transactions executed by investors on the New York Stock Exchange and NASDAQ between January 2013 and December 2017. To detect outlying transactions, the quantity and volume variables were used: distance scores based on classical and robust measures were computed for each transaction, and the techniques were compared. The study concludes that masked outliers that could not be detected by the classical Mahalanobis measure or by the Minimum Volume Ellipsoid were detected by the Fast Minimum Covariance Determinant method, and that this method is an effective tool for detecting outliers in multivariate data sets in financial applications.
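The classical distance described above can be sketched as follows. For an observation x, the squared distance is d²(x) = (x − x̄)ᵀ S⁻¹ (x − x̄), where x̄ is the sample mean and S the sample covariance matrix, and a row is flagged when d² exceeds a chi-square quantile with p degrees of freedom. This is a minimal illustration, not the authors' code: the 0.975 cutoff is an assumed, commonly used convention, and the two columns are synthetic data standing in for the quantity and volume variables, not the study's transactions.

```python
import numpy as np
from scipy.stats import chi2


def classical_mahalanobis_sq(X):
    """Squared classical Mahalanobis distance of every row of X.

    Uses the sample mean and the unbiased sample covariance matrix, i.e. the
    non-robust estimates that extreme rows can pull around.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)  # p x p sample covariance (ddof=1)
    # Solve S z = (x - mean) for every row instead of inverting S explicitly.
    z = np.linalg.solve(S, centered.T).T
    return np.einsum("ij,ij->i", centered, z)


def flag_outliers(d2, p, level=0.975):
    """Flag rows whose squared distance exceeds the chi-square(p) quantile (assumed cutoff)."""
    return d2 > chi2.ppf(level, df=p)


# Hypothetical two-variable data: 200 positively correlated "normal" rows plus
# one row that is only moderate in each variable but breaks the correlation.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X = np.vstack([X, [[2.5, -2.5]]])

d2 = classical_mahalanobis_sq(X)
flags = flag_outliers(d2, p=X.shape[1])
print(flags[-1])  # True
```

Because x̄ and S are estimated from all rows, including the suspicious ones, a group of outliers can shift the mean and inflate S enough to keep their own distances below the cutoff. This is the masking problem that motivates the robust estimators.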

CLASSICAL AND ROBUST MAHALANOBIS DISTANCE MEASURES FOR OUTLIER DETECTION: AN APPLICATION IN STOCK EXCHANGES

The existence of outliers in multivariate data sets contaminates parameter estimates and reduces the power of statistical tests by increasing the error variance. This leads to deviations from the assumptions that the variables have equal variances and a multivariate normal distribution. The Mahalanobis distance is one of the techniques most frequently used for multivariate outlier detection; it is calculated from the multivariate location and covariance matrix, both of which are sensitive to outliers. Because of problems such as misidentifying a normal observation as an outlier (swamping) and failing to detect an outlier (masking), robust measures are also used. This study aims to compare the performance of classical and robust Mahalanobis measures. A total of 1,239,507 stock transactions executed by investors on the New York Stock Exchange and NASDAQ between January 2013 and December 2017 were analysed. To identify outlying transactions, the volume and value of each trade were used. Mahalanobis distances based on classical and robust measures were calculated for each transaction, and the measures were compared. The masked observations that could not be detected by the classical measure or by the robust Minimum Volume Ellipsoid (MVE) measure were detected as outlying by the Fast Minimum Covariance Determinant (Fast MCD) measure. It is concluded that Fast MCD can be used as an efficient estimator of multivariate location and scatter for detecting masked outliers in multivariate financial data sets.
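The masking effect and its robust remedy can be illustrated with scikit-learn, assuming it is installed: `EmpiricalCovariance` gives the classical estimate, while `MinCovDet` implements the FastMCD algorithm. This is a sketch under stated assumptions, not the study's procedure: the data are synthetic (a tight cluster of 40 contaminating points among 200 standard normal points), the 0.975 chi-square cutoff is an assumed convention, the helper name `classical_vs_robust_flags` is hypothetical, and MVE is left out because scikit-learn does not provide it.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import EmpiricalCovariance, MinCovDet


def classical_vs_robust_flags(X, level=0.975, random_state=0):
    """Flag outliers using classical and FastMCD-based squared Mahalanobis distances."""
    X = np.asarray(X, dtype=float)
    cutoff = chi2.ppf(level, df=X.shape[1])
    # Classical: mean and maximum-likelihood covariance of all rows.
    d2_classical = EmpiricalCovariance().fit(X).mahalanobis(X)
    # Robust: reweighted location and scatter from the FastMCD estimate.
    d2_robust = MinCovDet(random_state=random_state).fit(X).mahalanobis(X)
    return d2_classical > cutoff, d2_robust > cutoff


# Synthetic data: 200 clean points plus a tight cluster of 40 outliers.
rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 2))
cluster = rng.normal(loc=5.0, scale=0.1, size=(40, 2))
X = np.vstack([clean, cluster])

classical_flags, robust_flags = classical_vs_robust_flags(X)
print(classical_flags[-40:].sum(), robust_flags[-40:].sum())  # 0 40
```

The cluster pulls the classical mean toward itself and inflates the covariance along its own direction, so its classical distances stay below the cutoff. MCD instead estimates location and scatter from the h-subset of rows (about half by default) whose covariance matrix has the smallest determinant, followed by a reweighting step. The cluster is therefore measured against the scatter of the clean points.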

___

Uluslararası İktisadi ve İdari İncelemeler Dergisi
  • ISSN: 1307-9832
  • Publication frequency: 4 issues per year
  • First published: 2008
  • Publisher: Kenan ÇELİK