
Position Statement on Reinforcement Learning Infrastructure and Google Brain's New Seed

Submitted by Sarmad Ahmad, Chief Managing Editor, The Indian Learning & Ananya Saraogi, Research Intern.

  • Amid the rapid and bewildering global change brought about by the COVID-19 pandemic, AI and its subset, Machine Learning (ML), continue to make strides in research, which showcases their utility and application across various sectors and industries in these unprecedented times.

  • Google Brain, an AI and ML research and development lab funded by Google, was born out of a collaboration between Jeff Dean and Andrew Ng. While the group was primarily involved with comparatively simple ML applications, such as Natural Language Processing (NLP) and tasks involving perception, recent efforts have been directed towards much more intricate and complex algorithmic structures, such as Deep Learning and Reinforcement Learning (RL). RL was usually the domain of DeepMind, the start-up acquired by Google in 2014 that was responsible for creating AlphaGo, the AI which succeeded in defeating the world champion of the board game Go.

  • Here, it is important to emphasise the differences between various ML algorithms on the basis of the complexity of their structures. Simple ML algorithms process input data and generate output data based upon the rules that they have been trained with (such as an NLP algorithm translating English into French with close attention to the rules of grammar, syntax, pronunciation and semantics). More complex algorithmic structures like Deep Learning (DL) and RL are both autonomous decision-making systems whose objective is to solve the task given to them.

  • How these two kinds of algorithm approach their tasks is what distinguishes them. A DL algorithm learns to maximise the accuracy of its predictions: it learns from an existing dataset in order to categorise and cluster the patterns in it, which it can then apply to a new dataset. An RL algorithm, however, approaches the problem through trial and error with the purpose of achieving an end reward. Instead of being trained on a fixed dataset, it actively engages with the data and the problem, becoming more efficient in the process by learning from its errors what not to do.

  • Researchers at Google Brain have recently open-sourced their Scalable and Efficient Deep Reinforcement Learning framework (SEED RL), making it available to the general programming public. AI systems, and ML among them, are intangible software that relies heavily on effective, well-performing hardware as its foundation. What sets SEED RL apart from its existing brethren of RL algorithms is that it is a distributed architecture that achieves state-of-the-art results at a lower cost and up to 80x faster than previous systems, while performing well across a variety of RL benchmarks.

  • These developments matter because RL algorithms are time-consuming and expensive to train. Current architectures attempt to mitigate much of this inefficiency by dividing the system into one centralised learner and multiple actors. The actors and the learner all hold a copy of the same neural network model. The actors repeatedly engage with the task at hand in a simulation. They then send their experience (the observations gathered, the actions chosen and the results of those actions) back to the learner, which updates the parameters of the shared neural network. The actors periodically refresh their copy of the model to stay on par with the latest version held by the centralised learner.

  • The SEED RL architecture instead uses the centralised learner for both network training and inference. This eliminates the need to send model parameters to the actors, which only pass their observations to the learner and receive actions in return. The learner can use hardware accelerators such as GPUs and TPUs to improve both learning and inference performance, in turn making the entire process faster, more effective and more efficient.

  • While the ripple effect of this change may not be dramatic in the short term, it could have a lasting effect on DL systems in the long term by increasing their efficiency and reducing their hardware requirements, both of which lower the cost of running and maintaining such systems. Arguably, this small shift echoes Moore’s law on a very small scale: the observation that computing power grows at an exponential pace over time while its cost falls.

  • SEED RL is arguably the most significant RL architecture brought forth by Google Brain so far. It has far surpassed the earlier DeepMind-created architecture in speed, which in technical language is measured in frames per second. This branch of AI has demonstrated how efficiently the technology can work when it is built around a central neural network.

  • Let us imagine playing an online game in which the player’s actions create an environment for the AI to develop a player that could completely defeat or, in the language of the article, ‘destroy’ its human opponent. The author has tried to highlight the complexities of this technology, along with certain flaws.

  • The open-sourcing of the algorithm has not been accompanied by any assessment of the risks that the act could bring. There is no guarantee that a technology with the capacity to imbibe the qualities of an actor, based on the environment set through his or her neural systems, would not cause damage to its users. Nor is there any assurance that it could not be used openly in the market and misused by developers.
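
For readers who wish to see the architectural difference discussed above in concrete terms, the following is a minimal Python sketch and not the SEED RL code itself. It assumes a toy two-action environment and a toy "network" made of one preference value per action. All names in it (Learner, choose, env_step, run_classic, run_central) are hypothetical and chosen only for illustration. The first loop mimics the classic split, where each actor fetches a copy of the parameters and runs inference locally; the second mimics central inference, where actors only supply observations and the learner chooses every action.

```python
import random


class Learner:
    """Holds the single authoritative copy of the toy policy parameters."""

    def __init__(self, n_actions=2):
        self.prefs = [0.0] * n_actions  # toy "network": one value per action
        self.param_transfers = 0        # times parameters were shipped to an actor
        self.inference_calls = 0        # times the learner itself ran inference

    def get_params(self):
        # Classic design: an actor pulls a fresh copy of the parameters.
        self.param_transfers += 1
        return list(self.prefs)

    def infer(self, observations, rng):
        # Central design: one batched call picks an action per observation.
        self.inference_calls += 1
        return [choose(self.prefs, rng) for _ in observations]

    def train(self, experience, lr=0.1):
        # Move each action's value towards the reward it just received.
        for action, reward in experience:
            self.prefs[action] += lr * (reward - self.prefs[action])


def choose(prefs, rng, epsilon=0.1):
    """Epsilon-greedy choice: mostly exploit, occasionally explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(prefs))
    return max(range(len(prefs)), key=lambda a: prefs[a])


def env_step(action, rng):
    """Toy environment (an assumption): action 1 is rewarded more often."""
    return 1.0 if rng.random() < (0.8 if action == 1 else 0.2) else 0.0


def run_classic(n_actors=4, rounds=50, steps=5, seed=0):
    """Actors copy the parameters, act locally, then send experience back."""
    rng = random.Random(seed)
    learner = Learner()
    for _ in range(rounds):
        for _ in range(n_actors):
            local = learner.get_params()      # parameters travel to the actor
            experience = []
            for _ in range(steps):
                action = choose(local, rng)   # inference happens on the actor
                experience.append((action, env_step(action, rng)))
            learner.train(experience)
    return learner


def run_central(n_actors=4, rounds=50, steps=5, seed=0):
    """Actors send observations only; the learner infers and trains."""
    rng = random.Random(seed)
    learner = Learner()
    for _ in range(rounds):
        for _ in range(steps):
            observations = [None] * n_actors  # the toy environment has no state
            actions = learner.infer(observations, rng)
            experience = [(a, env_step(a, rng)) for a in actions]
            learner.train(experience)
    return learner


if __name__ == "__main__":
    classic, central = run_classic(), run_central()
    print("classic parameter transfers:", classic.param_transfers)
    print("central parameter transfers:", central.param_transfers)
    print("central inference calls:", central.inference_calls)
```

With the default settings, the classic loop ships parameters to actors 200 times (4 actors over 50 rounds), while the central loop ships none and instead makes 250 batched inference calls on the learner. In a real system, those batched calls are what an accelerator such as a GPU or TPU would serve, which is the saving the statement describes.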

To understand this development, please refer to:

For queries, do not hesitate to mail us at


The Indian Society of Artificial Intelligence and Law is a technology law think tank founded by Abhivardhan in 2018. Our mission as a non-profit industry body for the analytics & AI industry in India is to promote responsible development of artificial intelligence and its standardisation in India.


Since 2022, the research operations of the Society have been subsumed under VLiGTA® by Indic Pacific Legal Research.

ISAIL has supported two independent journals, namely - the Indic Journal of International Law and the Indian Journal of Artificial Intelligence and Law. It also supports an independent media and podcast initiative - The Bharat Pacific.
