What’s Labelling Got To Do With It?
It is well known that training supervised machine learning models requires large amounts of systematically labelled data. Once operational, these models can provide significant value to water utilities and communities.
What may be less clear are the immediate benefits of labelling your water data. This post endeavours to shed some light on this topic. But first, a brief overview of data collection on water and wastewater networks.
Over the last 20 years, SCADA monitoring data has been valued primarily for its real-time applications. For example, it has proved invaluable during emergency storm situations and when water networks are stressed by peak day demand events. The introduction of cheap IoT sensors is increasing real-time data collection and has already been especially useful for:
notification of high flows linked to water pipe failure; and
notification of flooding in wastewater manholes, amongst other things.
These real-time applications offer great value to utilities and improve outcomes for communities. But so too do the 20-plus years of historical data, which are often overlooked during the decision-making process.
The primary reasons for this are difficulties in accessing data historians and the messy state of the historical data. Engineers usually use spreadsheets for everyday analysis, but spreadsheets are fundamentally limited for cleaning large data sets (link to blog).
New data cleaning tools such as FSA Data’s SensorClean will open up many possibilities for the productive use of this messy data.
What are some use cases for labelled historical water data?
We may want to estimate non-revenue water loss from corroding cast iron pipe failures. Or perhaps ongoing leakage in the early hours of the morning. Or we might ask: what are the ten highest peak hour volumes in the record? And there are other possibilities for pump, pressure and reservoir data.
To use this information effectively, we need to systematically label critical events so that we can include or exclude them from an analysis depending on the use case.
For example, we don’t want to include data describing pipe failures in peak hour volume assessments. But we do want to include them if we’re interested in non-revenue water loss.
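As a rough illustration of this include/exclude idea, here is a minimal Python sketch. It is not how SensorClean works; the record layout, the label name "pipe_failure", the function names and the sample volumes are all hypothetical and made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HourlyReading:
    timestamp: str               # hour of the reading, ISO 8601 text
    volume_kl: float             # volume recorded in that hour (kilolitres)
    label: Optional[str] = None  # hypothetical event label; None means unlabelled


def top_peak_hours(readings, n=10, exclude=("pipe_failure",)):
    """Return the n highest-volume hours, skipping excluded event labels."""
    kept = [r for r in readings if r.label not in exclude]
    return sorted(kept, key=lambda r: r.volume_kl, reverse=True)[:n]


def labelled_volume(readings, label="pipe_failure"):
    """Total volume recorded during hours carrying the given label.

    This is only the raw volume in those hours; estimating actual
    non-revenue loss would also need a baseline to subtract.
    """
    return sum(r.volume_kl for r in readings if r.label == label)


# Illustrative data only (not real measurements).
readings = [
    HourlyReading("2021-01-14T17:00", 410.0),
    HourlyReading("2021-01-14T18:00", 455.0),
    HourlyReading("2021-03-02T03:00", 620.0, "pipe_failure"),
    HourlyReading("2021-03-02T04:00", 580.0, "pipe_failure"),
    HourlyReading("2021-07-09T18:00", 430.0),
]

peaks = top_peak_hours(readings, n=2)
failure_volume = labelled_volume(readings)
```

The same labelled record serves both questions: the pipe failure hours are left out of the peak hour ranking, but they are exactly the hours we select when looking at water loss.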
There are many other use cases in potable water, too, where decision-making can strongly benefit from an organised history of critical events. For example, it allows analysis of changing consumption behaviours and network performance trends.
What about labelling historical wastewater data?
Utilities’ operating licences are often aimed at limiting the impacts of dry weather sewage overflows. A common approach to reducing the impacts of these overflows is to set a normal range for dry weather flows. If a bound of the normal range is breached, a crew might be sent to check for a blockage.
If there is no blockage (a false positive), the cost of the activity is the cost of the maintenance crew. On the other hand, if the normal range is set too high, a blockage can escape detection and (generally much higher) costs will ensue, especially if a wastewater overflow occurs in an environmentally sensitive area.
If we label historical wastewater data, we can capture both the normal dry weather range and critical rainfall events outside the range. For example, analysis of pump station performance during these critical events (when networks flood) can lead to better decisions on impeller replacements and pump station upgrades.
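To make the idea of a labelled normal range concrete, here is a minimal Python sketch under stated assumptions. The labels "dry_weather" and "rainfall_event", the choice of the 5th and 95th percentiles as bounds, and the sample flows are all hypothetical; a utility would choose its own bounds based on the cost trade-off described above.

```python
import statistics


def dry_weather_range(records, lower_pct=5, upper_pct=95):
    """Estimate a normal flow range from records labelled as dry weather.

    records is a list of (flow, label) pairs. The percentile bounds are
    an assumption for illustration, not a recommended setting.
    """
    dry = [flow for flow, label in records if label == "dry_weather"]
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = statistics.quantiles(dry, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def outside_range(flow, bounds):
    """True if a flow reading falls outside the normal range."""
    low, high = bounds
    return flow < low or flow > high


# Illustrative flows in litres per second (not real measurements).
records = [(float(f), "dry_weather") for f in range(10, 21)]
records.append((80.0, "rainfall_event"))

bounds = dry_weather_range(records)
needs_check = outside_range(80.0, bounds)
```

Because the rainfall event is labelled, it does not inflate the dry weather range, yet it remains available for separate analysis of how the network performed during the event.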
In summary…
Systematic labelling of historical water data can lead to significantly improved economic and environmental outcomes for utilities and their communities. FSA Data’s SensorClean software will facilitate quick cleaning and labelling of water data.
Please feel free to contact us for more information.