Expert in Data Science, Machine Learning, Text Mining, Statistics and Economics with work experience in academia, (central) banking, (re)insurance, an AI startup, consultancy, entertainment and software engineering.
To assess the photovoltaic (PV) potential of a specific site, stations measuring solar radiation are installed before the PV plant is built. The data from these stations undergo quality control, during which erroneous data are excluded from the assessment of the PV potential. Recognizing erroneous data is in many cases non-trivial, especially because the errors are masked by changing weather conditions. Typical causes of erroneous data include shading, soiling, snow, dew and frost on the sensor. Today, data quality assessment is mainly performed by visual inspection and must therefore be done by a domain-skilled data operator.
This poster presents how such quality assessment can be automated by combining domain knowledge, an agile process, statistics, machine learning, and Python. Specifically, it shows how all these areas are applied to automatically recognize the most common issue: shading of the sensor by near and far objects. The poster shows a real-world data example and walks through the steps of the algorithm, which has a recall of 81.2% and a false-positive rate of 2.5%. Finally, possible improvements as well as the challenges and lessons learned are discussed.
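For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how recall and false-positive rate are defined for a binary "shaded / not shaded" label. It is not the poster's algorithm or evaluation code: the function name `recall_and_fpr` and the toy labels are hypothetical and chosen only to illustrate the definitions.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false-positive rate) for binary labels.

    1 means "shaded" (erroneous sample), 0 means "not shaded".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # shaded, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # shaded, missed
    fp = sum(1 for t, p in pairs if not t and p)      # clean, wrongly flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # clean, left alone

    # Recall: share of truly shaded samples that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    # False-positive rate: share of clean samples that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fpr


# Hypothetical toy labels, NOT data from the poster.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
recall, fpr = recall_and_fpr(y_true, y_pred)
print(f"recall={recall:.3f}, false-positive rate={fpr:.3f}")
```

In this reading, a recall of 81.2% means that about four out of five truly shaded samples are flagged, while a false-positive rate of 2.5% means that only a small share of valid samples would be removed by mistake.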
The shading recognition algorithm described in the poster reduces the work of the domain data operator from hours to minutes. Moreover, it allows customers to assess data quality on their own, without needing a domain data operator. The combination of this automatic data quality control with existing visualization and data-handling tools (all coded in Python) makes this product unique on the market.