Keynote Speakers
Energy Efficiency in Large Scale Information Retrieval Systems
A large part of the energy consumption of a data center can be attributed to inefficiencies in its cooling and power supply systems. However, search companies already adopt state-of-the-art techniques to reduce the energy wasted by such systems, leaving little room for further improvement in those areas. Therefore, new approaches are needed to mitigate the environmental impact and the energy expenditure of Web search engines.
In this talk we will address the reduction of the energy consumption of computing resources as a way to mitigate the energy expenditure and carbon footprint of an IR system. In particular, reducing the energy consumption of CPUs represents an attractive avenue for Web search engines. Currently, CPU core frequencies are typically managed by operating system components called frequency governors. We will discuss how to delegate CPU power management from the OS frequency governors to the query processing application. Such IR system-specific governors can reduce a server's power consumption by up to 24%, with only limited (but uncontrollable) drawbacks in the quality of search results with respect to a system running at maximum CPU frequency.
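One common way for an application to take over frequency control on Linux is the cpufreq `userspace` governor, which exposes a per-core `scaling_setspeed` file in sysfs. The sketch below assumes that interface is available (not every cpufreq driver offers it; intel_pstate in its default active mode does not) and that the process runs with root privileges. The talk does not say which mechanism the IR-specific governors use, and the function name `set_core_frequency` is hypothetical.

```python
import os

# Standard location of the per-CPU cpufreq directories on Linux.
CPUFREQ_ROOT = "/sys/devices/system/cpu"


def set_core_frequency(core: int, freq_khz: int, root: str = CPUFREQ_ROOT) -> None:
    """Pin one core to a fixed frequency via the cpufreq 'userspace' governor.

    Hypothetical helper. Writing to the real sysfs tree requires root and a
    driver that supports the userspace governor; `root` can be redirected
    to another directory for testing.
    """
    base = os.path.join(root, f"cpu{core}", "cpufreq")
    # Take control away from the OS governor and give it to the application.
    with open(os.path.join(base, "scaling_governor"), "w") as f:
        f.write("userspace")
    # Request the target frequency; cpufreq expresses frequencies in kHz.
    with open(os.path.join(base, "scaling_setspeed"), "w") as f:
        f.write(str(freq_khz))
```

For example, `set_core_frequency(3, 1_800_000)` would request 1.8 GHz on core 3, provided that value appears in the core's `scaling_available_frequencies`.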
Since users can hardly notice response times that are faster than their expectations, we argue that Web search engines should not process queries faster than user expectations require. Consequently, we will present the Predictive Energy Saving Online Scheduling (PESOS) algorithm, which selects, on a per-core basis, the most appropriate CPU frequency to process a query by its deadline. PESOS can reduce the CPU energy consumption of a query processing server by 24% to 48% compared to a high-performance system running at maximum CPU core frequency. To conclude, we will compare the performance of PESOS with that of an industry-level baseline, PEGASUS, on a realistic simulation of a distributed Web search engine. Our results show that PESOS can reduce the CPU energy consumption of a distributed Web search engine by up to 18% with respect to PEGASUS, while providing query response times in line with user expectations.
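The idea of deadline-driven frequency selection can be illustrated with a simplified sketch. It is not the PESOS algorithm itself: it assumes a hypothetical per-query predictor that estimates processing time at maximum frequency, assumes processing time scales inversely with CPU frequency, and serves each core's queue in FIFO order. The names `QueuedQuery` and `pick_core_frequency` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueuedQuery:
    predicted_ms_at_max: float  # assumed predictor output: time at maximum frequency
    deadline_ms: float          # absolute deadline, same clock as `now_ms`


def pick_core_frequency(queue: Sequence[QueuedQuery], now_ms: float,
                        freqs_mhz: Sequence[int]) -> int:
    """Return the lowest frequency at which every queued query meets its deadline.

    Simplifying assumption: processing time at frequency f is
    t_max * f_max / f. Falls back to f_max if no setting is feasible.
    """
    f_max = max(freqs_mhz)
    for f in sorted(freqs_mhz):            # try the most energy-efficient first
        finish = now_ms
        feasible = True
        for q in queue:                    # FIFO order on this core
            finish += q.predicted_ms_at_max * f_max / f
            if finish > q.deadline_ms:     # this query would miss its deadline
                feasible = False
                break
        if feasible:
            return f
    return f_max
```

For example, with frequencies from 800 to 3200 MHz, a single query predicted to take 10 ms at 3200 MHz with a 50 ms deadline can run at 800 MHz, since 10 × 3200 / 800 = 40 ms.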
Bio
Nicola Tonellotto (http://pomino.isti.cnr.it/~khast/) is a researcher at the National Research Council of Italy. He received his Ph.D. from the Information Engineering Department of the University of Pisa in 2008. His main research interests include Cloud computing and Web information retrieval. He has published over 50 papers in journals and proceedings of international conferences. He received the Best Paper Award at ACM SIGIR in 2015.
Fast and Efficient Auto-Regressive Inference
We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4× faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN.
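A dual softmax avoids a single 65,536-way output over 16-bit samples. As we understand the WaveRNN design, each sample is split into a coarse (high) and a fine (low) 8-bit half, each predicted by its own 256-way softmax, with the fine half conditioned on the coarse one. The sketch shows only that split and its inverse; the offset to an unsigned range is an assumption, and the helper names are hypothetical.

```python
def split_coarse_fine(sample: int) -> tuple[int, int]:
    """Split a signed 16-bit sample into coarse and fine 8-bit classes (0..255 each)."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample must fit in signed 16 bits")
    u = sample + 32768         # shift to the unsigned range [0, 65535]
    return u >> 8, u & 0xFF    # coarse = top 8 bits, fine = bottom 8 bits


def join_coarse_fine(coarse: int, fine: int) -> int:
    """Recombine the two 8-bit classes into the original signed sample."""
    return ((coarse << 8) | fine) - 32768
```

Each softmax therefore has 256 outputs instead of 65,536, which keeps the output layer small.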
We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.
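The folding step of subscaling can be pictured as reindexing: with a scale of B, sub-sequence i holds samples i, i + B, i + 2B, and so on, so B shorter sequences can be processed as a batch. This minimal sketch covers only that reindexing (B = 16 mirrors the 16 samples per step above); the actual Subscale WaveRNN also conditions each sub-sequence on those already generated, which is not modelled here. The function names are hypothetical.

```python
from typing import List, Sequence


def subscale_fold(x: Sequence[float], scale: int) -> List[List[float]]:
    """Fold a sequence into `scale` interleaved sub-sequences.

    Sub-sequence i holds samples i, i + scale, i + 2*scale, ...
    """
    if len(x) % scale != 0:
        raise ValueError("sequence length must be a multiple of scale")
    return [list(x[i::scale]) for i in range(scale)]


def subscale_unfold(subs: Sequence[Sequence[float]]) -> List[float]:
    """Interleave the sub-sequences back into the original sample order."""
    out: List[float] = []
    for t in range(len(subs[0])):
        for sub in subs:
            out.append(sub[t])
    return out
```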
Bio
Erich Elsen is a Research Scientist at Google Brain, where he collaborates with the Magenta team and DeepMind. He is generally interested in auto-regressive models, sparse neural networks, optimization, generative models (especially of music) and hardware. Recently he combined those interests to enable extremely efficient, WaveNet-quality text-to-speech with sparse WaveRNNs. He also created Onsets and Frames, a state-of-the-art piano transcription model. Prior to joining Google, he developed the DeepSpeech and DeepSpeech 2 speech transcription systems at Baidu's Silicon Valley AI Lab. He received a PhD in Mechanical Engineering from Stanford in 2009.