SPEAKERS.
Song Han, Assistant Prof. at MIT Hardware Intelligence Lab, and Deephi
Talk: Bandwidth efficient deep learning by model compression
Abstract: In the post-ImageNet era, computer vision and machine learning researchers are solving more complicated AI problems using larger datasets, driving the demand for more computation. However, we are in the post-Moore’s Law world, where the amount of computation per unit cost and power is no longer increasing at its historic rate. This mismatch between supply and demand for computation highlights the need for co-designing efficient algorithms and hardware. In this talk, I will discuss bandwidth efficient deep learning by model compression, together with efficient hardware architecture support, saving memory bandwidth, networking bandwidth, and engineer bandwidth.
Shan Liu, Distinguished Scientist and Vice President of Tencent Media Lab
Talk: Deep neural networks for multimedia processing, coding and standardization
Abstract: Deep neural networks have proven effective in many applications, such as computer vision and data mining. Recent advances show that they may also be helpful for video, image, and rich media processing and compression. This talk will introduce some work being carried out at Tencent Media Lab that uses deep neural networks to improve audio/visual experiences, as well as recent advances and ongoing activities in using neural networks for video compression, both in ITU-T and ISO/IEC video coding standard development and in general. Opportunities and challenges will be discussed from the perspective of industry applications.
Christos Louizos, PhD candidate at University of Amsterdam
Talk: Network compression via differentiable pruning and quantization
Abstract: Neural network compression has become an important research area due to its great impact on the deployment of large models on resource-constrained devices. In this talk, we will introduce two novel techniques that allow for differentiable sparsification and quantization of deep neural networks; both are achieved via appropriate smoothing of the overall objective. As a result, we can directly train architectures to be highly compressed and hardware-friendly via off-the-shelf stochastic gradient descent optimizers.
Jian Cheng, Professor, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Talk: Efficient Computation of Deep Convolutional Neural Networks: A Quantization Perspective
Abstract: Deep neural networks have evolved remarkably over the past few years, and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. How to compute these networks efficiently, for example through acceleration and compression, is becoming a critical issue. In this talk, we will first provide a brief introduction to network acceleration and compression, and then focus on efficient computation through quantization. Finally, we will introduce and discuss a few possible future directions.
Anbang Yao, Senior Staff Research Scientist at Intel Labs China
Talk: Deep neural network compression and acceleration
Abstract: In the past several years, Deep Neural Networks (DNNs) have demonstrated record-breaking accuracy on a variety of artificial intelligence tasks. However, the intensive storage and computational costs of DNN models make it difficult to deploy them on mobile and embedded systems for real-time applications. In this technical talk, Dr. Yao will introduce their recent work on deep neural network compression and acceleration, showing how they achieve impressive compression performance without noticeable loss of model prediction accuracy, from the perspective of pruning and quantization.
Dilip Sequeira, Principal Engineer TensorRT, NVIDIA
Talk: TBD
Tim Genewein, Research Scientist at Bosch Center for AI
Talk: Neural network compression in the wild: why aiming for high compression factors is not enough
Abstract: The widespread use of state-of-the-art deep neural network models in the mobile, automotive and embedded domains is often hindered by the steep computational resources required to run such models. However, the recent scientific literature proposes a plethora of ways to alleviate the problem, whether through efficient network architectures, efficiency-optimized hardware or network compression methods. Unfortunately, the usefulness of a network compression method strongly depends on the other aspects (network architecture and target hardware) as well as the task itself (classification, regression, detection, etc.), but very few publications consider this interplay. This talk highlights some of the issues that arise from the strong interplay between network architecture, target hardware, compression algorithm and target task. Additionally, some shortcomings in the current literature on network compression methods are pointed out, such as incomparability of results (different baseline networks, different training and data-augmentation schemes, etc.), lack of results on tasks other than classification, or use of very different (and perhaps not very informative) quantitative performance indicators such as naive compression rate, operations per second, size of stored weight matrices, etc. The talk concludes by proposing some guidelines and best practices for increasing the practical applicability of network compression methods, and with a call for standardizing network compression benchmarks.