On the Feasibility of Reinforcement Learning in Single- and Multi-Agent Systems: The Cases of Indoor Climate and Prosumer Electricity Trading Communities

Abstract: Over half of the world’s population lives in urban areas, a share that is expected to keep growing. This urbanisation poses challenges for the management of urban infrastructure systems. As an essential infrastructure of any city, the energy system is one of the biggest of these challenges. Indeed, as cities expand in population and economically, global energy consumption increases and, as a result, so do greenhouse gas (GHG) emissions. Key to realising the goals laid out by the 2030 Agenda for Sustainable Development is the energy transition, embodied in the goals pertaining to affordable and clean energy, sustainable cities and communities, and climate action. Renewable energy systems (RESs) and energy efficiency have been shown to be key strategies towards achieving these goals. While the building sector is considered to be one of the biggest contributors to climate change, it is also seen as an area with many opportunities for realising the energy transition. Indeed, the emergence of the smart city and the internet of things (IoT), alongside photovoltaic and battery technology, offers opportunities both for the smart management of buildings and for the formation of self-sufficient peer-to-peer (P2P) electricity trading communities. Within this context, advanced building control offers significant potential for mitigating global warming, grid instability, soaring energy costs, and exposure to poor indoor building climates. Most advanced control strategies, however, rely on complex mathematical models, which require a great deal of expertise to construct, thereby costing time and money, and are unlikely to be frequently updated, which can lead to sub-optimal or even incorrect performance. 
Furthermore, in economic settings as complex and dynamic as the P2P electricity markets referred to above, arriving at solutions is often computationally intractable. A model-based approach thus seems unsustainable, and I therefore propose a model-free alternative instead. One such alternative is reinforcement learning (RL). This method addresses many of the limitations seen in more classical approaches to single- and multi-agent systems, namely those based on complex mathematical models. To assess the feasibility of RL in the context of building systems, I have developed four papers. A study of the literature showed that, while there is much review work in support of RL for controlling energy consumption, no work analysed RL from a methodological perspective with respect to controlling the comfort level of building occupants. Thus, in Paper I, a comprehensive review of this area was carried out to fill this gap in knowledge. To follow up, in Paper II, a case study was conducted to further assess, among other things, the computational feasibility of RL for controlling occupant comfort in a single-agent context. It was found that the RL method was able to improve thermal and indoor air quality by more than 90% when compared with historically observed occupant data. Broadening the scope of RL, Papers III and IV considered the feasibility of RL at the district scale by considering the efficient trade of renewable electricity in a peer-to-peer prosumer energy market. In particular, in Paper III, by extending an open-source economic simulation framework, multi-agent reinforcement learning (MARL) was used to optimise a dynamic price policy for trading the locally produced electricity. 
Compared with a benchmark fixed price signal, the dynamic price mechanism arrived at by RL increased community net profit by more than 28%, and median community self-sufficiency by more than 2%. Furthermore, emergent socio-economic behaviours, such as changes in supply with respect to changes in price, were identified. A limitation of Paper III, however, is that it was conducted in a single environment. To address this limitation and to assess the general validity of the proposed MARL solution, in Paper IV a full factorial experiment was conducted, based on the factors of climate (manifested in heterogeneous demand/supply profiles and associated battery parameters), community scale, and price mechanism, in order to ascertain the response of the community with respect to net-loss (financial gain), self-sufficiency, and income equality from trading locally produced electricity. The central finding of Paper IV was that the community, with respect to net-loss, performs significantly better under a learned dynamic price mechanism than under the benchmark fixed price mechanism, and furthermore, that a community under such a dynamic price mechanism has odds of 2 to 1 of achieving increased financial savings. 
