Methodological Validation of Machine Learning Models for Non-Technical Loss Detection in Electric Power Systems: A Case Study in an Ecuadorian Electricity Distributor
Blog Article
Detecting fraudulent behavior in electricity consumption is a significant challenge for electric utility companies because information is scarce and because it is difficult both to construct consumption patterns and to distinguish regular consumers from fraudulent ones. This study proposes a data analytics methodology that processes the available information to generate lists of metering systems suspected of fraud. The database provided by the electrical distribution company contains 266,298 records, of which 15,013 carry observations of possible fraud. One of the challenges lies in managing the different variables in the training data and choosing appropriate evaluation metrics.
To address this, a balanced database of 27,374 records was used, split evenly between fraud and non-fraud cases. The features used to identify and construct patterns of non-technical losses were crucial, although additional techniques could be applied to determine the most relevant variables. Several popular classification models were then trained. Hyperparameters were optimized with grid search, and the models were validated with cross-validation techniques. The ensemble methods Categorical Boosting (CGB), Light Gradient Boosting Machine (LGB) and Extreme Gradient Boosting (EGB) proved the most suitable for identifying losses, achieving high performance at a reasonable computational cost.
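As an illustration of how grid search and cross-validation fit together, the following is a minimal sketch using only the Python standard library. It is not the study's pipeline: the boosting models are replaced by a toy one-feature nearest-neighbour classifier, the data are synthetic, and every name here (`knn_predict`, `cross_val_accuracy`, `grid_search`, the `n_neighbors` and `weighted` hyperparameters, the five-fold split) is a hypothetical choice made for the example.

```python
import random
from itertools import product
from statistics import mean


def knn_predict(train_x, train_y, x, n_neighbors, weighted):
    """Classify x (1 = fraud, 0 = regular) by a vote among its nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    votes = {0: 0.0, 1: 0.0}
    for i in nearest:
        # Optionally let closer neighbours count for more.
        weight = 1.0 / (abs(train_x[i] - x) + 1e-9) if weighted else 1.0
        votes[train_y[i]] += weight
    return 1 if votes[1] > votes[0] else 0


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_val_accuracy(xs, ys, params, n_folds=5):
    """Mean held-out accuracy: each fold is scored by a model built on the other folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], **params) == ys[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return mean(scores)


def grid_search(xs, ys, grid):
    """Evaluate every combination in the grid and return (best_score, best_params)."""
    names = list(grid)
    best_score, best_params = -1.0, None
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cross_val_accuracy(xs, ys, params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Synthetic, balanced toy data (an assumption for this sketch, not the distributor's
# records): one "monthly consumption" feature, lower on average for fraud cases.
rng = random.Random(42)
xs = [rng.gauss(300, 60) for _ in range(100)] + [rng.gauss(180, 60) for _ in range(100)]
ys = [0] * 100 + [1] * 100

grid = {"n_neighbors": [1, 5, 15], "weighted": [False, True]}
best_score, best_params = grid_search(xs, ys, grid)
print(best_params, round(best_score, 3))
```

Because the toy data are balanced, plain accuracy is used as the selection score here; the same loop would work with any other score, and with a real classifier in place of `knn_predict`.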
The models were compared on accuracy (Acc) and F1 score; the F1 score combines two metrics, detection rate and precision, which allows the various techniques to be evaluated. Although CGB achieved the best performance in terms of accuracy (Acc = 0.897) and F1 (0.894), it was slower than LGB, so it is considered the ideal classifier for the data provided by the electrical distribution company.
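To make the two metrics concrete, here is a small sketch that computes accuracy and F1 from true and predicted labels. It assumes that "detection rate" means recall (the share of actual fraud cases that are flagged) and uses the standard definition of F1 as the harmonic mean of precision and recall. The function names and the ten example labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = fraud and 0 = non-fraud."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall (detection rate) and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 4 fraud cases and 6 regular ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example two of the four fraud cases are caught and one regular consumer is wrongly flagged, so accuracy is 7/10 while F1 is only 4/7: accuracy rewards the many correct non-fraud predictions, whereas F1 depends only on how well the fraud class is handled.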
This study highlights the importance of the techniques used for fraud detection in electricity metering systems, although the results may vary depending on the characteristics of the training data, the number of variables, and the available hardware resources.