Paper Group AWR 154
Visual Interpretability for Deep Learning: a Survey. Multi-Stream Dynamic Video Summarization. Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization. Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF). Backprop Evolution. Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models. BlockQNN: Efficient Block-wise Neural Network Architecture Generation. Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts. Incremental Learning in Person Re-Identification. A formal framework for deliberated judgment. Composable Unpaired Image to Image Translation. FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition. Blended Conditional Gradients: the unconditioning of conditional gradients.
Visual Interpretability for Deep Learning: a Survey
Title | Visual Interpretability for Deep Learning: a Survey |
Authors | Quanshi Zhang, Song-Chun Zhu |
Abstract | This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been the Achilles’ heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communication at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods for diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00614v2 |
http://arxiv.org/pdf/1802.00614v2.pdf | |
PWC | https://paperswithcode.com/paper/visual-interpretability-for-deep-learning-a |
Repo | https://github.com/JepsonWong/CNN_Visualization |
Framework | none |
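One family of techniques the survey covers is visualizing which inputs drive a CNN's prediction. Below is a minimal, self-contained sketch of vanilla gradient saliency on a toy two-layer network; the network, weights and "image" are random placeholders, not anything from the survey or the linked repository.

```python
import numpy as np

# Toy two-layer network and image; vanilla gradient saliency computed by hand.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))            # placeholder "image"
W1 = rng.normal(size=(64, 16)) * 0.1   # placeholder weights
W2 = rng.normal(size=(16, 10)) * 0.1

h_pre = x.reshape(-1) @ W1             # first layer pre-activation
h = np.maximum(h_pre, 0.0)             # ReLU
scores = h @ W2                        # class scores
c = int(scores.argmax())               # class to explain

# Backpropagate d score_c / d x: saliency = |gradient| per input pixel.
dh = W2[:, c]
dh_pre = dh * (h_pre > 0)
dx = (W1 @ dh_pre).reshape(8, 8)
saliency = np.abs(dx)
print("most influential pixel:", np.unravel_index(saliency.argmax(), saliency.shape))
```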
Multi-Stream Dynamic Video Summarization
Title | Multi-Stream Dynamic Video Summarization |
Authors | Mohamed Elfeki, Aidean Sharghi, Srikrishna Karanam, Ziyan Wu, Ali Borji |
Abstract | With vast amounts of video content being uploaded to the Internet every minute, video summarization becomes critical for efficient browsing, searching, and indexing of visual content. Moreover, the spread of social and egocentric cameras creates an abundance of sparse scenarios captured by several devices that ultimately need to be summarized jointly. In this paper, we address the problem of summarizing videos recorded simultaneously by several dynamic cameras that intermittently share the field of view. We present a robust framework that (a) identifies a diverse set of important events among moving cameras that often are not capturing the same scene, and (b) selects the most representative view(s) at each event to be included in a universal summary. Due to the lack of an applicable alternative, we collected a new multi-view egocentric dataset, Multi-Ego. Our dataset is recorded simultaneously by three cameras and covers a wide variety of real-life scenarios. The footage is annotated by multiple individuals under various summarization configurations, with a consensus analysis ensuring a reliable ground truth. We conduct extensive experiments on the compiled dataset, in addition to three other standard benchmarks, that show the robustness and advantage of our approach in both supervised and unsupervised settings. Additionally, we show that our approach learns collectively from data with varying numbers of views and is orthogonal to other summarization methods, making it scalable and generic. Our materials are made publicly available. |
Tasks | Video Summarization |
Published | 2018-12-01 |
URL | https://arxiv.org/abs/1812.00108v3 |
https://arxiv.org/pdf/1812.00108v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-egocentric-video-summarization |
Repo | https://github.com/M-Elfeki/Multi-DPP |
Framework | none |
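The repository name (Multi-DPP) suggests determinantal point processes as the diversity machinery; the abstract itself only asks for a diverse, representative set of events. The sketch below shows generic greedy MAP selection under a DPP over toy event features, as an illustration of that kind of diverse selection rather than the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 16))                      # placeholder per-event features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
L = feats @ feats.T + 1e-6 * np.eye(50)                # similarity kernel (DPP L-kernel)

def greedy_dpp(L, k):
    """Greedily pick k items maximizing log det of the selected submatrix."""
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:        # redundant items give low gain
                best, best_gain = i, logdet
        selected.append(best)
    return selected

print(greedy_dpp(L, k=5))                              # indices of a diverse 5-event summary
```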
Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization
Title | Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization |
Authors | Rishabh Iyer, Pratik Dubal, Kunal Dargan, Suraj Kothawade, Rohan Mahadev, Vishal Kaushal |
Abstract | With increasing amounts of visual data being created in the form of videos and images, visual data selection and summarization are becoming increasingly important problems. We present Vis-DSS, an open-source toolkit for visual data selection and summarization. Vis-DSS implements a framework of models for summarization and data subset selection using submodular functions, which are becoming increasingly popular for these problems. We present several classes of models capturing notions of diversity, coverage, representation and importance, along with optimization/inference and learning algorithms. Vis-DSS is the first open-source toolkit for several data selection and summarization tasks, including image collection summarization, video summarization, training-data selection for classification, and diversified active learning. We demonstrate state-of-the-art performance on all these tasks and also show how we scale to large problems. Vis-DSS allows applications to be easily built on top of it and can also serve as a general skeleton that can be extended to several use cases, including video- and image-sharing platforms for creating GIFs, image montage creation, or as a component of surveillance systems; we demonstrate this by providing a graphical user interface (GUI) desktop app built on the Qt framework. Vis-DSS is available at https://github.com/rishabhk108/vis-dss |
Tasks | Active Learning, Video Summarization |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08846v1 |
http://arxiv.org/pdf/1809.08846v1.pdf | |
PWC | https://paperswithcode.com/paper/vis-dss-an-open-source-toolkit-for-visual |
Repo | https://github.com/Pratik08/vis-dss |
Framework | none |
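Vis-DSS itself is a C++/Qt toolkit; the toy Python sketch below only illustrates the kind of submodular objective it optimizes, using greedy maximization of a facility-location (coverage/representation) function over pairwise similarities.

```python
import numpy as np

rng = np.random.default_rng(2)
feats = rng.normal(size=(100, 32))                     # placeholder image features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
sim = feats @ feats.T                                  # cosine similarities

def facility_location_greedy(sim, budget):
    """Greedily pick `budget` items maximizing sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    selected = []
    best_cover = np.zeros(n)                           # how well each item is covered so far
    for _ in range(budget):
        # marginal gain of each candidate given the current coverage
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf                      # never pick the same item twice
        j = int(gains.argmax())
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected

print(facility_location_greedy(sim, budget=10))        # a 10-image "summary"
```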
Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF)
Title | Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF) |
Authors | Trefor W. Evans, Prasanth B. Nair |
Abstract | We introduce a kernel approximation strategy that enables computation of the Gaussian process log marginal likelihood and all hyperparameter derivatives in $\mathcal{O}(p)$ time. Our GRIEF kernel consists of $p$ eigenfunctions found using a Nystrom approximation from a dense Cartesian product grid of inducing points. By exploiting algebraic properties of Kronecker and Khatri-Rao tensor products, computational complexity of the training procedure can be practically independent of the number of inducing points. This allows us to use arbitrarily many inducing points to achieve a globally accurate kernel approximation, even in high-dimensional problems. The fast likelihood evaluation enables type-I or II Bayesian inference on large-scale datasets. We benchmark our algorithms on real-world problems with up to two-million training points and $10^{33}$ inducing points. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02125v2 |
http://arxiv.org/pdf/1807.02125v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-gaussian-processes-with-grid-1 |
Repo | https://github.com/treforevans/gp_grief |
Framework | none |
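The sketch below shows the plain Nystrom eigenfunction construction the abstract builds on, without the Kronecker/Khatri-Rao grid structure that gives GP-GRIEF its scalability: eigenpairs of the inducing-point kernel matrix define explicit features whose weighted outer product approximates the full kernel. The RBF kernel, 1-D inputs, and inducing grid are illustrative choices.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))          # training inputs (toy 1-D problem)
U = np.linspace(-3, 3, 20).reshape(-1, 1)      # inducing points (a 1-D "grid")
p = 10                                         # number of eigenfunctions kept

Kuu = rbf(U, U)
lam, Q = np.linalg.eigh(Kuu)
lam, Q = lam[::-1][:p], Q[:, ::-1][:, :p]      # keep the p largest eigenpairs

# Nystrom eigenfunctions at the training inputs: phi_i(x) = sqrt(m)/lam_i * k(x, U) q_i
m = U.shape[0]
Phi = rbf(X, U) @ Q * (np.sqrt(m) / lam)       # (n, p) explicit feature matrix

# Approximate kernel: K(X, X) ~ Phi diag(lam / m) Phi^T
K_approx = (Phi * (lam / m)) @ Phi.T
print("max abs error vs exact kernel:", np.abs(K_approx - rbf(X, X)).max())
```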
Backprop Evolution
Title | Backprop Evolution |
Authors | Maximilian Alber, Irwan Bello, Barret Zoph, Pieter-Jan Kindermans, Prajit Ramachandran, Quoc Le |
Abstract | The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain-specific language to describe update equations as a list of primitive functions. An evolution-based method is used to discover new propagation rules that maximize the generalization performance after a few epochs of training. We find several update equations that train faster than standard back-propagation when training time is short, and that perform similarly to standard back-propagation at convergence. |
Tasks | |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02822v1 |
http://arxiv.org/pdf/1808.02822v1.pdf | |
PWC | https://paperswithcode.com/paper/backprop-evolution |
Repo | https://github.com/mmajewsk/ml_research |
Framework | none |
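As a rough illustration of the search described above, the toy sketch below encodes an update rule as a list of primitive functions applied to the gradient and evolves a small population by mutation, scoring candidates by the loss reached on a toy quadratic after a few steps. The primitive set, fitness task, and evolution loop are all stand-ins, not the paper's DSL or experimental setup.

```python
import numpy as np

PRIMITIVES = {                                 # toy DSL: primitives applied to the gradient
    "identity": lambda g: g,
    "sign":     lambda g: np.sign(g),
    "clip":     lambda g: np.clip(g, -1.0, 1.0),
    "scale2":   lambda g: 2.0 * g,
    "cube":     lambda g: g ** 3,
}

def apply_rule(rule, g):
    for name in rule:                          # a rule is a list of primitive names
        g = PRIMITIVES[name](g)
    return g

def fitness(rule, steps=20, lr=0.1):
    """Loss on a toy quadratic after a few update steps with the candidate rule."""
    w = np.array([3.0, -2.0])
    for _ in range(steps):
        w = w - lr * apply_rule(rule, 2.0 * w)  # 2w is the gradient of ||w||^2
        if not np.isfinite(w).all() or w @ w > 1e12:
            return 1e12                         # penalize diverging rules
    return float(w @ w)

def mutate(rule, rng):
    rule = list(rule)
    rule[rng.integers(len(rule))] = rng.choice(list(PRIMITIVES))
    return rule

rng = np.random.default_rng(4)
population = [[rng.choice(list(PRIMITIVES)) for _ in range(2)] for _ in range(8)]
for generation in range(10):
    population.sort(key=fitness)               # lower loss is better
    parents = population[:4]                   # keep the best half
    population = parents + [mutate(parents[rng.integers(4)], rng) for _ in range(4)]

best = min(population, key=fitness)
print("best rule:", list(best), "loss:", fitness(best))
```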
Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models
Title | Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models |
Authors | Rodolfo Quispe, Helio Pedrini |
Abstract | Given a video or an image of a person acquired from a camera, person re-identification is the process of retrieving all instances of the same person from videos or images taken from a different camera with a non-overlapping view. This task has applications in various fields, such as surveillance, forensics, robotics, and multimedia. In this paper, we present a novel framework, named Saliency-Semantic Parsing Re-Identification (SSP-ReID), that takes advantage of two complementary clues, saliency and semantic parsing maps, to guide a backbone convolutional neural network (CNN) to learn complementary representations that improve the results over the original backbones. The insight behind fusing multiple clues is that in specific scenarios one response is better than the other, so combining them increases performance. Due to its definition, our framework can be easily applied to a wide variety of networks and, in contrast to other competitive methods, our training process follows simple and standard protocols. We present an extensive evaluation of our approach over five backbones and three benchmarks. Experimental results demonstrate the effectiveness of our person re-identification framework. In addition, we combine our framework with re-ranking techniques to achieve state-of-the-art results on three benchmarks. |
Tasks | Person Re-Identification |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05618v1 |
http://arxiv.org/pdf/1807.05618v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-person-re-identification-based-on |
Repo | https://github.com/RQuispeC/saliency-semantic-parsing-reid |
Framework | pytorch |
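A hedged sketch of the general fusion idea: reweight backbone feature maps with a saliency map and with semantic-parsing masks, pool each branch, and concatenate the descriptors. The pooling and fusion choices below are illustrative; the paper's exact architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(5)
feat = rng.normal(size=(256, 16, 8))          # placeholder backbone features (C, H, W)
saliency = rng.uniform(size=(16, 8))          # placeholder saliency map in [0, 1]
parsing = rng.integers(0, 5, size=(16, 8))    # placeholder parsing labels (5 body regions)

def gap(x):
    return x.mean(axis=(1, 2))                # global average pooling -> (C,)

global_desc = gap(feat)                       # plain backbone descriptor
saliency_desc = gap(feat * saliency)          # saliency-weighted branch

part_descs = []
for part in range(5):                         # one descriptor per parsed region
    mask = (parsing == part).astype(float)
    part_descs.append((feat * mask).sum(axis=(1, 2)) / (mask.sum() + 1e-6))

descriptor = np.concatenate([global_desc, saliency_desc] + part_descs)
print(descriptor.shape)                       # fused re-id embedding: (256 * 7,)
```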
BlockQNN: Efficient Block-wise Neural Network Architecture Generation
Title | BlockQNN: Efficient Block-wise Neural Network Architecture Generation |
Authors | Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu |
Abstract | Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN, which automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent that is trained to choose component layers sequentially. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35% top-1 error rate on CIFAR-10. (2) It offers a tremendous reduction of the search space in designing networks, requiring only 3 days with 32 GPUs; a faster version can yield a comparable result with only 1 GPU in 20 hours. (3) It has strong generalizability, in that the network built on CIFAR also performs well on the larger-scale ImageNet dataset, where the best network achieves a very competitive 82.0% top-1 and 96.0% top-5 accuracy. |
Tasks | Image Classification, Q-Learning |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05584v1 |
http://arxiv.org/pdf/1808.05584v1.pdf | |
PWC | https://paperswithcode.com/paper/blockqnn-efficient-block-wise-neural-network |
Repo | https://github.com/gomerudo/nas-dmrl |
Framework | tf |
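The toy sketch below shows the shape of the Q-learning loop the abstract describes: an agent picks block component layers sequentially with an epsilon-greedy policy, and a Q-table is updated from a terminal reward. Real BlockQNN trains every sampled network to obtain its validation accuracy; the layer vocabulary and scoring function here are synthetic stand-ins.

```python
import numpy as np

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]
DEPTH = 4                                       # components per block
rng = np.random.default_rng(6)

def toy_reward(block):
    """Synthetic stand-in for the validation accuracy of the generated block."""
    return 0.5 + 0.1 * block.count("conv3x3") - 0.05 * block.count("maxpool") \
           + 0.01 * rng.normal()

Q = np.zeros((DEPTH, len(LAYERS)))              # Q[t, a]: value of choosing layer a at depth t
alpha, gamma, eps = 0.2, 1.0, 0.3

for episode in range(500):
    block, actions = [], []
    for t in range(DEPTH):                      # epsilon-greedy layer choice
        a = rng.integers(len(LAYERS)) if rng.random() < eps else int(Q[t].argmax())
        actions.append(int(a))
        block.append(LAYERS[int(a)])
    r = toy_reward(block)                       # reward arrives only at the end
    for t in reversed(range(DEPTH)):            # back up the terminal reward
        target = r if t == DEPTH - 1 else gamma * Q[t + 1].max()
        Q[t, actions[t]] += alpha * (target - Q[t, actions[t]])

print("best block:", [LAYERS[int(Q[t].argmax())] for t in range(DEPTH)])
```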
Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts
Title | Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts |
Authors | Enrique Noriega-Atala, Paul D. Hein, Shraddha S. Thumsi, Zechy Wong, Xia Wang, Clayton T. Morrison |
Abstract | We present an analysis of the problem of identifying biological context and associating it with biochemical events in biomedical texts. This constitutes a non-trivial, inter-sentential relation extraction task. We focus on biological context as descriptions of the species, tissue type and cell type that are associated with biochemical events. We describe the properties of an annotated corpus of context-event relations and present and evaluate several classifiers for context-event association trained on syntactic, distance and frequency features. |
Tasks | Relation Extraction |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.06199v1 |
http://arxiv.org/pdf/1812.06199v1.pdf | |
PWC | https://paperswithcode.com/paper/inter-sentence-relation-extraction-for |
Repo | https://github.com/ml4ai/BioContext |
Framework | none |
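A minimal sketch of the classifier setup the abstract describes: each candidate context-event pair is mapped to simple distance- and frequency-style features and fed to a linear classifier. The feature set and synthetic labels below are illustrative, not the paper's actual features or corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

def pair_features(sentence_gap, mention_count, same_section):
    """Illustrative distance/frequency features for one context-event pair."""
    return [sentence_gap, np.log1p(mention_count), float(same_section)]

# Synthetic training pairs: nearby, frequently mentioned contexts tend to be associated.
X, y = [], []
for _ in range(300):
    gap, count, same = rng.integers(0, 10), rng.integers(1, 20), rng.random() < 0.5
    X.append(pair_features(gap, count, same))
    y.append(int(gap < 3 and count > 5))        # synthetic association label

clf = LogisticRegression().fit(np.array(X), np.array(y))
print("P(associated | close, frequent):",
      clf.predict_proba([pair_features(1, 12, True)])[0, 1])
```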
Incremental Learning in Person Re-Identification
Title | Incremental Learning in Person Re-Identification |
Authors | Prajjwal Bhargava |
Abstract | Person re-identification is still a challenging task in computer vision for a variety of reasons. At the same time, incremental learning remains an issue, since deep learning models tend to suffer from catastrophic forgetting when trained on subsequent tasks. In this paper, we propose a model that can be used for multiple tasks in person re-identification, provides state-of-the-art results on a variety of tasks, and still achieves considerable accuracy on earlier tasks afterwards. We evaluate our model on two datasets, Market-1501 and DukeMTMC. Extensive experiments show that this method can achieve incremental learning in person re-identification efficiently and can also be applied to other computer vision tasks. |
Tasks | Person Re-Identification |
Published | 2018-08-20 |
URL | https://arxiv.org/abs/1808.06281v5 |
https://arxiv.org/pdf/1808.06281v5.pdf | |
PWC | https://paperswithcode.com/paper/incremental-learning-in-person-re |
Repo | https://github.com/prajjwal1/person-reid-incremental |
Framework | pytorch |
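The abstract does not spell out the mechanism, so the sketch below shows one common recipe for reducing catastrophic forgetting when fine-tuning on a new re-identification dataset: a knowledge-distillation term that keeps the new model's outputs on the old task close to a frozen copy of the old model (learning-without-forgetting style). This is a generic illustration, not necessarily this paper's method.

```python
import torch
import torch.nn.functional as F

def incremental_loss(new_task_logits, labels, old_task_logits_new_model,
                     old_task_logits_old_model, temperature=2.0, alpha=0.5):
    """Cross-entropy on the new task plus a distillation term on the old task.
    Generic learning-without-forgetting recipe, not necessarily this paper's loss."""
    ce = F.cross_entropy(new_task_logits, labels)
    t = temperature
    distill = F.kl_div(F.log_softmax(old_task_logits_new_model / t, dim=1),
                       F.softmax(old_task_logits_old_model / t, dim=1),
                       reduction="batchmean") * (t * t)
    return ce + alpha * distill

# Toy usage: random tensors stand in for model outputs; 702 and 751 are the usual
# numbers of training identities in DukeMTMC-reID and Market-1501.
labels = torch.randint(0, 702, (8,))
loss = incremental_loss(torch.randn(8, 702, requires_grad=True), labels,
                        torch.randn(8, 751, requires_grad=True),  # new model, old head
                        torch.randn(8, 751))                      # frozen old model
loss.backward()                                                   # gradients reach the new model
print(float(loss))
```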
A formal framework for deliberated judgment
Title | A formal framework for deliberated judgment |
Authors | Olivier Cailloux, Yves Meinard |
Abstract | While the philosophical literature has extensively studied how decisions relate to arguments, reasons and justifications, decision theory almost entirely ignores the latter notions and rather focuses on preference and belief. In this article, we argue that decision theory can largely benefit from explicitly taking into account the stance that decision-makers take towards arguments and counter-arguments. To that end, we elaborate a formal framework aiming to integrate the role of arguments and argumentation in decision theory and decision aid. We start from a decision situation, where an individual requests decision support. In this context, we formally define, as a commendable basis for decision-aid, this individual’s deliberated judgment, popularized by Rawls. We explain how models of deliberated judgment can be validated empirically. We then identify conditions upon which the existence of a valid model can be taken for granted, and analyze how these conditions can be relaxed. We then explore the significance of our proposed framework for decision aiding practice. We argue that our concept of deliberated judgment owes its normative credentials both to its normative foundations (the idea of rationality based on arguments) and to its reference to empirical reality (the stance that real, empirical individuals hold towards arguments and counter-arguments, on due reflection). We then highlight that our framework opens promising avenues for future research involving both philosophical and decision theoretic approaches, as well as empirical implementations. |
Tasks | |
Published | 2018-01-17 |
URL | https://arxiv.org/abs/1801.05644v2 |
https://arxiv.org/pdf/1801.05644v2.pdf | |
PWC | https://paperswithcode.com/paper/a-formal-framework-for-deliberated-judgment |
Repo | https://github.com/oliviercailloux/formal-framework-dj |
Framework | none |
Composable Unpaired Image to Image Translation
Title | Composable Unpaired Image to Image Translation |
Authors | Laura Graesser, Anant Gupta |
Abstract | There has been remarkable recent work in unpaired image-to-image translation. However, with some exceptions, these methods are restricted to translation between single pairs of distributions. In this study, we extend one of these works to a scalable multi-distribution translation mechanism. Our translation models not only convert from one distribution to another but can also be stacked to create composite translation functions. We show that this composite property makes it possible to generate images with characteristics not seen in the training set. We also propose a decoupled training mechanism to train multiple distributions separately, which, we show, generates better samples than isolated joint training. Further, we perform a qualitative and quantitative analysis to assess the plausibility of the samples. The code is made available at https://github.com/lgraesser/im2im2im. |
Tasks | Image-to-Image Translation |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05470v1 |
http://arxiv.org/pdf/1804.05470v1.pdf | |
PWC | https://paperswithcode.com/paper/composable-unpaired-image-to-image |
Repo | https://github.com/lgraesser/im2im2im |
Framework | pytorch |
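A tiny sketch of the composability property: if G_ab translates domain A to B and G_bc translates B to C, stacking them yields an A-to-C translator without training on (A, C) pairs directly. The miniature convolutional "generators" below are placeholders, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

def tiny_generator():                             # placeholder, not the paper's architecture
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

G_ab, G_bc = tiny_generator(), tiny_generator()   # would be trained separately

def compose(*generators):
    def translate(x):
        for g in generators:
            x = g(x)
        return x
    return translate

G_ac = compose(G_ab, G_bc)                        # composite A -> C translator
x_a = torch.randn(1, 3, 64, 64)                   # a toy image from domain A
print(G_ac(x_a).shape)                            # torch.Size([1, 3, 64, 64])
```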
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
Title | FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs |
Authors | Vladimir Rybalkin, Alessandro Pappalardo, Muhammad Mohsin Ghaffar, Giulio Gambardella, Norbert Wehn, Michaela Blott |
Abstract | It is well known that many types of artificial neural networks, including recurrent networks, can achieve high classification accuracy even with low-precision weights and activations. The reduction in precision generally yields much more efficient hardware implementations with regard to hardware cost, memory requirements, energy, and achievable throughput. In this paper, we present the first systematic exploration of this design space as a function of precision for Bidirectional Long Short-Term Memory (BiLSTM) neural networks. Specifically, we include an in-depth investigation of precision vs. accuracy using a fully hardware-aware training flow, where quantization of all aspects of the network, including weights, inputs, outputs and in-memory cell activations, is taken into consideration during training. In addition, hardware resource cost, power consumption and throughput scalability are explored as a function of precision for FPGA-based implementations of BiLSTM, along with multiple approaches to parallelizing the hardware. We provide the first open-source HLS library extension of FINN for parameterizable hardware architectures of LSTM layers on FPGAs, which offers full precision flexibility and allows parameterizable performance scaling with different levels of parallelism within the architecture. Based on this library, we present an FPGA-based accelerator for a BiLSTM neural network designed for optical character recognition, along with numerous other experimental proof points for a Zynq UltraScale+ XCZU7EV MPSoC within the given design space. |
Tasks | Optical Character Recognition, Quantization |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04093v1 |
http://arxiv.org/pdf/1807.04093v1.pdf | |
PWC | https://paperswithcode.com/paper/finn-l-library-extensions-and-design-trade |
Repo | https://github.com/Xilinx/LSTM-PYNQ |
Framework | none |
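The sketch below shows the core trick behind the hardware-aware training the abstract refers to: quantize weights (or activations) to a few bits in the forward pass while letting gradients flow through unchanged (a straight-through estimator). FINN-L's actual training flow and HLS generation are far more involved; this is only the quantize-in-forward idea.

```python
import torch

def quantize_ste(x, bits):
    """Symmetric uniform quantizer with a straight-through gradient."""
    qmax = 2.0 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    xq = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (xq - x).detach()     # forward: quantized value, backward: identity

w = torch.randn(4, 4, requires_grad=True)
loss = quantize_ste(w, bits=2).sum()
loss.backward()
print(w.grad)                        # all ones: the gradient passed straight through
```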
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Title | Adafactor: Adaptive Learning Rates with Sublinear Memory Cost |
Authors | Noam Shazeer, Mitchell Stern |
Abstract | In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-row and per-column sums of these moving averages, and estimating the per-parameter second moments based on these sums. We demonstrate empirically that this method produces similar results to the baseline. Secondly, we show that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow. We propose update clipping and a gradually increasing decay rate scheme as remedies. Combining these methods and dropping momentum, we achieve comparable results to the published Adam regime in training the Transformer model on the WMT 2014 English-German machine translation task, while using very little auxiliary storage in the optimizer. Finally, we propose scaling the parameter updates based on the scale of the parameters themselves. |
Tasks | Machine Translation, Stochastic Optimization |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04235v1 |
http://arxiv.org/pdf/1804.04235v1.pdf | |
PWC | https://paperswithcode.com/paper/adafactor-adaptive-learning-rates-with |
Repo | https://github.com/DeadAt0m/adafactor-pytorch |
Framework | pytorch |
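Adafactor's memory saving comes from storing only row and column statistics of the squared gradients. The sketch below keeps exponential moving averages of the row sums R and column sums C and reconstructs the per-parameter second-moment estimate as the rank-1 matrix R C^T / sum(R); update clipping, the increasing decay schedule, and relative step sizes from the paper are omitted, and the toy objective is illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 50, 30
W = rng.normal(size=(n, m))                        # toy weight matrix
R, C = np.zeros(n), np.zeros(m)                    # row / column second-moment accumulators
beta2, lr, eps = 0.999, 0.01, 1e-30

for step in range(100):
    G = W + 0.1 * rng.normal(size=(n, m))          # noisy gradient of 0.5 * ||W||^2
    sq = G * G + eps
    R = beta2 * R + (1 - beta2) * sq.sum(axis=1)   # O(n) row statistics
    C = beta2 * C + (1 - beta2) * sq.sum(axis=0)   # O(m) column statistics
    V_hat = np.outer(R, C) / R.sum()               # factored second-moment estimate
    W -= lr * G / np.sqrt(V_hat)                   # RMSProp-style update with O(n + m) state

print("final loss:", 0.5 * (W * W).sum())
```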
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition
Title | Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition |
Authors | Swathikiran Sudhakaran, Oswald Lanz |
Abstract | In this paper we propose an end-to-end trainable deep neural network model for egocentric activity recognition. Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video. Based on this, we develop a spatial attention mechanism that enables the network to attend to regions containing objects that are correlated with the activity under consideration. We learn highly specialized attention maps for each frame using class-specific activations from a CNN pre-trained for generic image recognition, and use them for spatio-temporal encoding of the video with a convolutional LSTM. Our model is trained in a weakly supervised setting using only raw video-level activity-class labels. Nonetheless, on standard egocentric activity benchmarks our model surpasses the currently best-performing method, which relies on strong supervision from hand segmentation and object locations for training, by up to 6 percentage points in recognition accuracy. We visually analyze the attention maps generated by the network, revealing that it successfully identifies the relevant objects present in the video frames, which may explain the strong recognition performance. We also present an extensive ablation analysis of the design choices. |
Tasks | Activity Recognition, Egocentric Activity Recognition, Hand Segmentation |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11794v1 |
http://arxiv.org/pdf/1807.11794v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-is-all-we-need-nailing-down-object |
Repo | https://github.com/swathikirans/ego-rnn |
Framework | pytorch |
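A small sketch of the class-activation-map attention described above: the classification weights of a pre-trained backbone produce a class-specific spatial map over the last convolutional features, which, after a spatial softmax, reweights those features before temporal encoding. Random tensors stand in for a real backbone, and the convolutional LSTM encoder is omitted.

```python
import numpy as np

rng = np.random.default_rng(9)
feat = rng.normal(size=(512, 7, 7))        # placeholder last-conv features for one frame
fc_w = rng.normal(size=(1000, 512))        # placeholder classifier weights of the backbone

scores = fc_w @ feat.mean(axis=(1, 2))     # class scores from global-average pooling
c = int(scores.argmax())                   # class used to build the attention map

cam = np.tensordot(fc_w[c], feat, axes=1)  # (7, 7) class activation map
attn = np.exp(cam - cam.max())
attn /= attn.sum()                         # spatial softmax

attended = feat * attn                     # attention-weighted features (512, 7, 7)
frame_descriptor = attended.sum(axis=(1, 2))
print(frame_descriptor.shape)              # fed frame-by-frame to a ConvLSTM (not shown)
```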
Blended Conditional Gradients: the unconditioning of conditional gradients
Title | Blended Conditional Gradients: the unconditioning of conditional gradients |
Authors | Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright |
Abstract | We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank–Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex. |
Tasks | |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07311v3 |
https://arxiv.org/pdf/1805.07311v3.pdf | |
PWC | https://paperswithcode.com/paper/blended-conditional-gradients-the |
Repo | https://github.com/pokutta/bcg |
Framework | none |
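For reference, the sketch below runs plain Frank-Wolfe (conditional gradient) on the probability simplex, the setting of the paper's streamlined variant; the blended method additionally mixes in gradient-based steps and lazy LP oracles, which are not shown, and the quadratic objective is a toy example.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 20
A = rng.normal(size=(n, n))
Q = A.T @ A + np.eye(n)                  # f(x) = 0.5 x'Qx - b'x, strongly convex
b = rng.normal(size=n)

def grad(x):
    return Q @ x - b

x = np.ones(n) / n                       # start at the barycenter of the simplex
for t in range(500):
    g = grad(x)
    v = np.zeros(n)
    v[g.argmin()] = 1.0                  # LP oracle over the simplex returns a vertex
    gamma = 2.0 / (t + 2.0)              # classic step-size schedule
    x = (1 - gamma) * x + gamma * v      # projection-free: x stays a convex combination

g = grad(x)
gap = g @ (x - np.eye(n)[g.argmin()])    # Frank-Wolfe dual gap at the final iterate
print("dual gap:", gap, " f(x):", 0.5 * x @ Q @ x - b @ x)
```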