Paper Group AWR 154
Visual Interpretability for Deep Learning: a Survey. Multi-Stream Dynamic Video Summarization. Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization. Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF). Backprop Evolution. Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models. BlockQNN: Efficient Block-wise Neural Network Architecture Generation. Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts. Incremental Learning in Person Re-Identification. A formal framework for deliberated judgment. Composable Unpaired Image to Image Translation. FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition. Blended Conditional Gradients: the unconditioning of conditional gradients.
Visual Interpretability for Deep Learning: a Survey
Title | Visual Interpretability for Deep Learning: a Survey |
Authors | Quanshi Zhang, Song-Chun Zhu |
Abstract | This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been the Achilles’ heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communication at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods for diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00614v2 |
http://arxiv.org/pdf/1802.00614v2.pdf | |
PWC | https://paperswithcode.com/paper/visual-interpretability-for-deep-learning-a |
Repo | https://github.com/JepsonWong/CNN_Visualization |
Framework | none |
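One family of techniques the survey covers is visualizing which inputs drive a CNN's prediction. Below is a minimal, self-contained sketch of vanilla gradient saliency on a toy two-layer network; the network, weights and "image" are random placeholders, not anything from the survey or the linked repository.

```python
import numpy as np

# Toy two-layer network and image; vanilla gradient saliency computed by hand.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))            # placeholder "image"
W1 = rng.normal(size=(64, 16)) * 0.1   # placeholder weights
W2 = rng.normal(size=(16, 10)) * 0.1

h_pre = x.reshape(-1) @ W1             # first layer pre-activation
h = np.maximum(h_pre, 0.0)             # ReLU
scores = h @ W2                        # class scores
c = int(scores.argmax())               # class to explain

# Backpropagate d score_c / d x: saliency = |gradient| per input pixel.
dh = W2[:, c]
dh_pre = dh * (h_pre > 0)
dx = (W1 @ dh_pre).reshape(8, 8)
saliency = np.abs(dx)
print("most influential pixel:", np.unravel_index(saliency.argmax(), saliency.shape))
```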
Multi-Stream Dynamic Video Summarization
Title | Multi-Stream Dynamic Video Summarization |
Authors | Mohamed Elfeki, Aidean Sharghi, Srikrishna Karanam, Ziyan Wu, Ali Borji |
Abstract | With vast amounts of video content being uploaded to the Internet every minute, video summarization becomes critical for efficient browsing, searching, and indexing of visual content. Moreover, the spread of social and egocentric cameras creates an abundance of sparse scenarios captured by several devices that ultimately need to be summarized jointly. In this paper, we address the problem of summarizing videos recorded simultaneously by several dynamic cameras that intermittently share the field of view. We present a robust framework that (a) identifies a diverse set of important events among moving cameras that often are not capturing the same scene, and (b) selects the most representative view(s) at each event to be included in a universal summary. Due to the lack of an applicable alternative, we collected a new multi-view egocentric dataset, Multi-Ego. Our dataset is recorded simultaneously by three cameras and covers a wide variety of real-life scenarios. The footage is annotated by multiple individuals under various summarization configurations, with a consensus analysis ensuring a reliable ground truth. We conduct extensive experiments on the compiled dataset, in addition to three other standard benchmarks, that show the robustness and advantage of our approach in both supervised and unsupervised settings. Additionally, we show that our approach learns collectively from data with varying numbers of views and is orthogonal to other summarization methods, making it scalable and generic. Our materials are made publicly available. |
Tasks | Video Summarization |
Published | 2018-12-01 |
URL | https://arxiv.org/abs/1812.00108v3 |
https://arxiv.org/pdf/1812.00108v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-egocentric-video-summarization |
Repo | https://github.com/M-Elfeki/Multi-DPP |
Framework | none |
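The repository name (Multi-DPP) suggests determinantal point processes as the diversity machinery; the abstract itself only asks for a diverse, representative set of events. The sketch below shows generic greedy MAP selection under a DPP over toy event features, as an illustration of that kind of diverse selection rather than the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 16))                      # placeholder per-event features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
L = feats @ feats.T + 1e-6 * np.eye(50)                # similarity kernel (DPP L-kernel)

def greedy_dpp(L, k):
    """Greedily pick k items maximizing log det of the selected submatrix."""
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:        # redundant items give low gain
                best, best_gain = i, logdet
        selected.append(best)
    return selected

print(greedy_dpp(L, k=5))                              # indices of a diverse 5-event summary
```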
Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization
Title | Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization |
Authors | Rishabh Iyer, Pratik Dubal, Kunal Dargan, Suraj Kothawade, Rohan Mahadev, Vishal Kaushal |
Abstract | With increasing amounts of visual data being created in the form of videos and images, visual data selection and summarization are becoming increasingly important problems. We present Vis-DSS, an open-source toolkit for visual data selection and summarization. Vis-DSS implements a framework of models for summarization and data subset selection using submodular functions, which are becoming increasingly popular for these problems. We present several classes of models capturing notions of diversity, coverage, representation and importance, along with optimization/inference and learning algorithms. Vis-DSS is the first open-source toolkit for several data selection and summarization tasks, including image collection summarization, video summarization, training-data selection for classification, and diversified active learning. We demonstrate state-of-the-art performance on all these tasks and also show how we scale to large problems. Vis-DSS allows applications to be easily built on top of it and can also serve as a general skeleton that can be extended to several use cases, including video- and image-sharing platforms for creating GIFs, image montage creation, or as a component of surveillance systems; we demonstrate this by providing a graphical user interface (GUI) desktop app built on the Qt framework. Vis-DSS is available at https://github.com/rishabhk108/vis-dss |
Tasks | Active Learning, Video Summarization |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08846v1 |
http://arxiv.org/pdf/1809.08846v1.pdf | |
PWC | https://paperswithcode.com/paper/vis-dss-an-open-source-toolkit-for-visual |
Repo | https://github.com/Pratik08/vis-dss |
Framework | none |
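Vis-DSS itself is a C++/Qt toolkit; the toy Python sketch below only illustrates the kind of submodular objective it optimizes, using greedy maximization of a facility-location (coverage/representation) function over pairwise similarities.

```python
import numpy as np

rng = np.random.default_rng(2)
feats = rng.normal(size=(100, 32))                     # placeholder image features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
sim = feats @ feats.T                                  # cosine similarities

def facility_location_greedy(sim, budget):
    """Greedily pick `budget` items maximizing sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    selected = []
    best_cover = np.zeros(n)                           # how well each item is covered so far
    for _ in range(budget):
        # marginal gain of each candidate given the current coverage
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf                      # never pick the same item twice
        j = int(gains.argmax())
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected

print(facility_location_greedy(sim, budget=10))        # a 10-image "summary"
```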
Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF)
Title | Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF) |
Authors | Trefor W. Evans, Prasanth B. Nair |
Abstract | We introduce a kernel approximation strategy that enables computation of the Gaussian process log marginal likelihood and all hyperparameter derivatives in $\mathcal{O}(p)$ time. Our GRIEF kernel consists of $p$ eigenfunctions found using a Nystrom approximation from a dense Cartesian product grid of inducing points. By exploiting algebraic properties of Kronecker and Khatri-Rao tensor products, computational complexity of the training procedure can be practically independent of the number of inducing points. This allows us to use arbitrarily many inducing points to achieve a globally accurate kernel approximation, even in high-dimensional problems. The fast likelihood evaluation enables type-I or II Bayesian inference on large-scale datasets. We benchmark our algorithms on real-world problems with up to two-million training points and $10^{33}$ inducing points. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02125v2 |
http://arxiv.org/pdf/1807.02125v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-gaussian-processes-with-grid-1 |
Repo | https://github.com/treforevans/gp_grief |
Framework | none |
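The sketch below shows the plain Nystrom eigenfunction construction the abstract builds on, without the Kronecker/Khatri-Rao grid structure that gives GP-GRIEF its scalability: eigenpairs of the inducing-point kernel matrix define explicit features whose weighted outer product approximates the full kernel. The RBF kernel, 1-D inputs, and inducing grid are illustrative choices.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))          # training inputs (toy 1-D problem)
U = np.linspace(-3, 3, 20).reshape(-1, 1)      # inducing points (a 1-D "grid")
p = 10                                         # number of eigenfunctions kept

Kuu = rbf(U, U)
lam, Q = np.linalg.eigh(Kuu)
lam, Q = lam[::-1][:p], Q[:, ::-1][:, :p]      # keep the p largest eigenpairs

# Nystrom eigenfunctions at the training inputs: phi_i(x) = sqrt(m)/lam_i * k(x, U) q_i
m = U.shape[0]
Phi = rbf(X, U) @ Q * (np.sqrt(m) / lam)       # (n, p) explicit feature matrix

# Approximate kernel: K(X, X) ~ Phi diag(lam / m) Phi^T
K_approx = (Phi * (lam / m)) @ Phi.T
print("max abs error vs exact kernel:", np.abs(K_approx - rbf(X, X)).max())
```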
Backprop Evolution
Title | Backprop Evolution |
Authors | Maximilian Alber, Irwan Bello, Barret Zoph, Pieter-Jan Kindermans, Prajit Ramachandran, Quoc Le |
Abstract | The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain-specific language to describe update equations as a list of primitive functions. An evolution-based method is used to discover new propagation rules that maximize the generalization performance after a few epochs of training. We find several update equations that train faster than standard back-propagation when training time is short, and that perform similarly to standard back-propagation at convergence. |
Tasks | |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02822v1 |
http://arxiv.org/pdf/1808.02822v1.pdf | |
PWC | https://paperswithcode.com/paper/backprop-evolution |
Repo | https://github.com/mmajewsk/ml_research |
Framework | none |
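As a rough illustration of the search described above, the toy sketch below encodes an update rule as a list of primitive functions applied to the gradient and evolves a small population by mutation, scoring candidates by the loss reached on a toy quadratic after a few steps. The primitive set, fitness task, and evolution loop are all stand-ins, not the paper's DSL or experimental setup.

```python
import numpy as np

PRIMITIVES = {                                 # toy DSL: primitives applied to the gradient
    "identity": lambda g: g,
    "sign":     lambda g: np.sign(g),
    "clip":     lambda g: np.clip(g, -1.0, 1.0),
    "scale2":   lambda g: 2.0 * g,
    "cube":     lambda g: g ** 3,
}

def apply_rule(rule, g):
    for name in rule:                          # a rule is a list of primitive names
        g = PRIMITIVES[name](g)
    return g

def fitness(rule, steps=20, lr=0.1):
    """Loss on a toy quadratic after a few update steps with the candidate rule."""
    w = np.array([3.0, -2.0])
    for _ in range(steps):
        w = w - lr * apply_rule(rule, 2.0 * w)  # 2w is the gradient of ||w||^2
        if not np.isfinite(w).all() or w @ w > 1e12:
            return 1e12                         # penalize diverging rules
    return float(w @ w)

def mutate(rule, rng):
    rule = list(rule)
    rule[rng.integers(len(rule))] = rng.choice(list(PRIMITIVES))
    return rule

rng = np.random.default_rng(4)
population = [[rng.choice(list(PRIMITIVES)) for _ in range(2)] for _ in range(8)]
for generation in range(10):
    population.sort(key=fitness)               # lower loss is better
    parents = population[:4]                   # keep the best half
    population = parents + [mutate(parents[rng.integers(4)], rng) for _ in range(4)]

best = min(population, key=fitness)
print("best rule:", list(best), "loss:", fitness(best))
```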
Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models
Title | Improved Person Re-Identification Based on Saliency and Semantic Parsing with Deep Neural Network Models |
Authors | Rodolfo Quispe, Helio Pedrini |
Abstract | Given a video or an image of a person acquired from a camera, person re-identification is the process of retrieving all instances of the same person from videos or images taken from a different camera with a non-overlapping view. This task has applications in various fields, such as surveillance, forensics, robotics, and multimedia. In this paper, we present a novel framework, named Saliency-Semantic Parsing Re-Identification (SSP-ReID), that takes advantage of two complementary clues, saliency and semantic parsing maps, to guide a backbone convolutional neural network (CNN) to learn complementary representations that improve the results over the original backbones. The insight behind fusing multiple clues is that in specific scenarios one response is better than the other, so combining them increases performance. Due to its definition, our framework can be easily applied to a wide variety of networks and, in contrast to other competitive methods, our training process follows simple and standard protocols. We present an extensive evaluation of our approach over five backbones and three benchmarks. Experimental results demonstrate the effectiveness of our person re-identification framework. In addition, we combine our framework with re-ranking techniques to achieve state-of-the-art results on three benchmarks. |
Tasks | Person Re-Identification |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05618v1 |
http://arxiv.org/pdf/1807.05618v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-person-re-identification-based-on |
Repo | https://github.com/RQuispeC/saliency-semantic-parsing-reid |
Framework | pytorch |
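A hedged sketch of the general fusion idea: reweight backbone feature maps with a saliency map and with semantic-parsing masks, pool each branch, and concatenate the descriptors. The pooling and fusion choices below are illustrative; the paper's exact architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(5)
feat = rng.normal(size=(256, 16, 8))          # placeholder backbone features (C, H, W)
saliency = rng.uniform(size=(16, 8))          # placeholder saliency map in [0, 1]
parsing = rng.integers(0, 5, size=(16, 8))    # placeholder parsing labels (5 body regions)

def gap(x):
    return x.mean(axis=(1, 2))                # global average pooling -> (C,)

global_desc = gap(feat)                       # plain backbone descriptor
saliency_desc = gap(feat * saliency)          # saliency-weighted branch

part_descs = []
for part in range(5):                         # one descriptor per parsed region
    mask = (parsing == part).astype(float)
    part_descs.append((feat * mask).sum(axis=(1, 2)) / (mask.sum() + 1e-6))

descriptor = np.concatenate([global_desc, saliency_desc] + part_descs)
print(descriptor.shape)                       # fused re-id embedding: (256 * 7,)
```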
BlockQNN: Efficient Block-wise Neural Network Architecture Generation
Title | BlockQNN: Efficient Block-wise Neural Network Architecture Generation |
Authors | Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu |
Abstract | Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN, which automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent that is trained to choose component layers sequentially. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35% top-1 error rate on CIFAR-10. (2) It offers a tremendous reduction of the search space in designing networks, requiring only 3 days with 32 GPUs; a faster version can yield a comparable result with only 1 GPU in 20 hours. (3) It has strong generalizability, in that the network built on CIFAR also performs well on the larger-scale ImageNet dataset, where the best network achieves a very competitive 82.0% top-1 and 96.0% top-5 accuracy. |
Tasks | Image Classification, Q-Learning |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05584v1 |
http://arxiv.org/pdf/1808.05584v1.pdf | |
PWC | https://paperswithcode.com/paper/blockqnn-efficient-block-wise-neural-network |
Repo | https://github.com/gomerudo/nas-dmrl |
Framework | tf |
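The toy sketch below shows the shape of the Q-learning loop the abstract describes: an agent picks block component layers sequentially with an epsilon-greedy policy, and a Q-table is updated from a terminal reward. Real BlockQNN trains every sampled network to obtain its validation accuracy; the layer vocabulary and scoring function here are synthetic stand-ins.

```python
import numpy as np

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]
DEPTH = 4                                       # components per block
rng = np.random.default_rng(6)

def toy_reward(block):
    """Synthetic stand-in for the validation accuracy of the generated block."""
    return 0.5 + 0.1 * block.count("conv3x3") - 0.05 * block.count("maxpool") \
           + 0.01 * rng.normal()

Q = np.zeros((DEPTH, len(LAYERS)))              # Q[t, a]: value of choosing layer a at depth t
alpha, gamma, eps = 0.2, 1.0, 0.3

for episode in range(500):
    block, actions = [], []
    for t in range(DEPTH):                      # epsilon-greedy layer choice
        a = rng.integers(len(LAYERS)) if rng.random() < eps else int(Q[t].argmax())
        actions.append(int(a))
        block.append(LAYERS[int(a)])
    r = toy_reward(block)                       # reward arrives only at the end
    for t in reversed(range(DEPTH)):            # back up the terminal reward
        target = r if t == DEPTH - 1 else gamma * Q[t + 1].max()
        Q[t, actions[t]] += alpha * (target - Q[t, actions[t]])

print("best block:", [LAYERS[int(Q[t].argmax())] for t in range(DEPTH)])
```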
Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts
Title | Inter-sentence Relation Extraction for Associating Biological Context with Events in Biomedical Texts |
Authors | Enrique Noriega-Atala, Paul D. Hein, Shraddha S. Thumsi, Zechy Wong, Xia Wang, Clayton T. Morrison |
Abstract | We present an analysis of the problem of identifying biological context and associating it with biochemical events in biomedical texts. This constitutes a non-trivial, inter-sentential relation extraction task. We focus on biological context as descriptions of the species, tissue type and cell type that are associated with biochemical events. We describe the properties of an annotated corpus of context-event relations and present and evaluate several classifiers for context-event association trained on syntactic, distance and frequency features. |
Tasks | Relation Extraction |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.06199v1 |
http://arxiv.org/pdf/1812.06199v1.pdf | |
PWC | https://paperswithcode.com/paper/inter-sentence-relation-extraction-for |
Repo | https://github.com/ml4ai/BioContext |
Framework | none |
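A minimal sketch of the classifier setup the abstract describes: each candidate context-event pair is mapped to simple distance- and frequency-style features and fed to a linear classifier. The feature set and synthetic labels below are illustrative, not the paper's actual features or corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

def pair_features(sentence_gap, mention_count, same_section):
    """Illustrative distance/frequency features for one context-event pair."""
    return [sentence_gap, np.log1p(mention_count), float(same_section)]

# Synthetic training pairs: nearby, frequently mentioned contexts tend to be associated.
X, y = [], []
for _ in range(300):
    gap, count, same = rng.integers(0, 10), rng.integers(1, 20), rng.random() < 0.5
    X.append(pair_features(gap, count, same))
    y.append(int(gap < 3 and count > 5))        # synthetic association label

clf = LogisticRegression().fit(np.array(X), np.array(y))
print("P(associated | close, frequent):",
      clf.predict_proba([pair_features(1, 12, True)])[0, 1])
```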
Incremental Learning in Person Re-Identification
Title | Incremental Learning in Person Re-Identification |
Authors | Prajjwal Bhargava |
Abstract | Person re-identification is still a challenging task in computer vision for a variety of reasons. At the same time, incremental learning remains an issue, since deep learning models tend to suffer from catastrophic forgetting when trained on subsequent tasks. In this paper, we propose a model that can be used for multiple tasks in person re-identification, provides state-of-the-art results on a variety of tasks, and still achieves considerable accuracy on earlier tasks afterwards. We evaluate our model on two datasets, Market-1501 and DukeMTMC. Extensive experiments show that this method can achieve incremental learning in person re-identification efficiently and can also be applied to other computer vision tasks. |
Tasks | Person Re-Identification |
Published | 2018-08-20 |
URL | https://arxiv.org/abs/1808.06281v5 |
https://arxiv.org/pdf/1808.06281v5.pdf | |
PWC | https://paperswithcode.com/paper/incremental-learning-in-person-re |
Repo | https://github.com/prajjwal1/person-reid-incremental |
Framework | pytorch |
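The abstract does not spell out the mechanism, so the sketch below shows one common recipe for reducing catastrophic forgetting when fine-tuning on a new re-identification dataset: a knowledge-distillation term that keeps the new model's outputs on the old task close to a frozen copy of the old model (learning-without-forgetting style). This is a generic illustration, not necessarily this paper's method.

```python
import torch
import torch.nn.functional as F

def incremental_loss(new_task_logits, labels, old_task_logits_new_model,
                     old_task_logits_old_model, temperature=2.0, alpha=0.5):
    """Cross-entropy on the new task plus a distillation term on the old task.
    Generic learning-without-forgetting recipe, not necessarily this paper's loss."""
    ce = F.cross_entropy(new_task_logits, labels)
    t = temperature
    distill = F.kl_div(F.log_softmax(old_task_logits_new_model / t, dim=1),
                       F.softmax(old_task_logits_old_model / t, dim=1),
                       reduction="batchmean") * (t * t)
    return ce + alpha * distill

# Toy usage: random tensors stand in for model outputs; 702 and 751 are the usual
# numbers of training identities in DukeMTMC-reID and Market-1501.
labels = torch.randint(0, 702, (8,))
loss = incremental_loss(torch.randn(8, 702, requires_grad=True), labels,
                        torch.randn(8, 751, requires_grad=True),  # new model, old head
                        torch.randn(8, 751))                      # frozen old model
loss.backward()                                                   # gradients reach the new model
print(float(loss))
```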
A formal framework for deliberated judgment
Title | A formal framework for deliberated judgment |
Authors | Olivier Cailloux, Yves Meinard |
Abstract | While the philosophical literature has extensively studied how decisions relate to arguments, reasons and justifications, decision theory almost entirely ignores the latter notions and rather focuses on preference and belief. In this article, we argue that decision theory can largely benefit from explicitly taking into account the stance that decision-makers take towards arguments and counter-arguments. To that end, we elaborate a formal framework aiming to integrate the role of arguments and argumentation in decision theory and decision aid. We start from a decision situation, where an individual requests decision support. In this context, we formally define, as a commendable basis for decision-aid, this individual’s deliberated judgment, popularized by Rawls. We explain how models of deliberated judgment can be validated empirically. We then identify conditions upon which the existence of a valid model can be taken for granted, and analyze how these conditions can be relaxed. We then explore the significance of our proposed framework for decision aiding practice. We argue that our concept of deliberated judgment owes its normative credentials both to its normative foundations (the idea of rationality based on arguments) and to its reference to empirical reality (the stance that real, empirical individuals hold towards arguments and counter-arguments, on due reflection). We then highlight that our framework opens promising avenues for future research involving both philosophical and decision theoretic approaches, as well as empirical implementations. |
Tasks | |
Published | 2018-01-17 |
URL | https://arxiv.org/abs/1801.05644v2 |
https://arxiv.org/pdf/1801.05644v2.pdf | |
PWC | https://paperswithcode.com/paper/a-formal-framework-for-deliberated-judgment |
Repo | https://github.com/oliviercailloux/formal-framework-dj |
Framework | none |
Composable Unpaired Image to Image Translation
Title | Composable Unpaired Image to Image Translation |
Authors | Laura Graesser, Anant Gupta |
Abstract | There has been remarkable recent work in unpaired image-to-image translation. However, with some exceptions, these methods are restricted to translation between single pairs of distributions. In this study, we extend one of these works to a scalable multi-distribution translation mechanism. Our translation models not only convert from one distribution to another but can also be stacked to create composite translation functions. We show that this composite property makes it possible to generate images with characteristics not seen in the training set. We also propose a decoupled training mechanism to train multiple distributions separately, which, we show, generates better samples than isolated joint training. Further, we perform a qualitative and quantitative analysis to assess the plausibility of the samples. The code is made available at https://github.com/lgraesser/im2im2im. |
Tasks | Image-to-Image Translation |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05470v1 |
http://arxiv.org/pdf/1804.05470v1.pdf | |
PWC | https://paperswithcode.com/paper/composable-unpaired-image-to-image |
Repo | https://github.com/lgraesser/im2im2im |
Framework | pytorch |
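A tiny sketch of the composability property: if G_ab translates domain A to B and G_bc translates B to C, stacking them yields an A-to-C translator without training on (A, C) pairs directly. The miniature convolutional "generators" below are placeholders, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

def tiny_generator():                             # placeholder, not the paper's architecture
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

G_ab, G_bc = tiny_generator(), tiny_generator()   # would be trained separately

def compose(*generators):
    def translate(x):
        for g in generators:
            x = g(x)
        return x
    return translate

G_ac = compose(G_ab, G_bc)                        # composite A -> C translator
x_a = torch.randn(1, 3, 64, 64)                   # a toy image from domain A
print(G_ac(x_a).shape)                            # torch.Size([1, 3, 64, 64])
```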
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
Title | FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs |
Authors | Vladimir Rybalkin, Alessandro Pappalardo, Muhammad Mohsin Ghaffar, Giulio Gambardella, Norbert Wehn, Michaela Blott |
Abstract | It is well known that many types of artificial neural networks, including recurrent networks, can achieve high classification accuracy even with low-precision weights and activations. The reduction in precision generally yields much more efficient hardware implementations with regard to hardware cost, memory requirements, energy, and achievable throughput. In this paper, we present the first systematic exploration of this design space as a function of precision for Bidirectional Long Short-Term Memory (BiLSTM) neural networks. Specifically, we include an in-depth investigation of precision vs. accuracy using a fully hardware-aware training flow, where quantization of all aspects of the network, including weights, inputs, outputs and in-memory cell activations, is taken into consideration during training. In addition, hardware resource cost, power consumption and throughput scalability are explored as a function of precision for FPGA-based implementations of BiLSTM, along with multiple approaches to parallelizing the hardware. We provide the first open-source HLS library extension of FINN for parameterizable hardware architectures of LSTM layers on FPGAs, which offers full precision flexibility and allows parameterizable performance scaling with different levels of parallelism within the architecture. Based on this library, we present an FPGA-based accelerator for a BiLSTM neural network designed for optical character recognition, along with numerous other experimental proof points for a Zynq UltraScale+ XCZU7EV MPSoC within the given design space. |
Tasks | Optical Character Recognition, Quantization |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04093v1 |
http://arxiv.org/pdf/1807.04093v1.pdf | |
PWC | https://paperswithcode.com/paper/finn-l-library-extensions-and-design-trade |
Repo | https://github.com/Xilinx/LSTM-PYNQ |
Framework | none |
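The sketch below shows the core trick behind the hardware-aware training the abstract refers to: quantize weights (or activations) to a few bits in the forward pass while letting gradients flow through unchanged (a straight-through estimator). FINN-L's actual training flow and HLS generation are far more involved; this is only the quantize-in-forward idea.

```python
import torch

def quantize_ste(x, bits):
    """Symmetric uniform quantizer with a straight-through gradient."""
    qmax = 2.0 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    xq = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (xq - x).detach()     # forward: quantized value, backward: identity

w = torch.randn(4, 4, requires_grad=True)
loss = quantize_ste(w, bits=2).sum()
loss.backward()
print(w.grad)                        # all ones: the gradient passed straight through
```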
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Title | Adafactor: Adaptive Learning Rates with Sublinear Memory Cost |
Authors | Noam Shazeer, Mitchell Stern |
Abstract | In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-row and per-column sums of these moving averages, and estimating the per-parameter second moments based on these sums. We demonstrate empirically that this method produces similar results to the baseline. Secondly, we show that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow. We propose update clipping and a gradually increasing decay rate scheme as remedies. Combining these methods and dropping momentum, we achieve comparable results to the published Adam regime in training the Transformer model on the WMT 2014 English-German machine translation task, while using very little auxiliary storage in the optimizer. Finally, we propose scaling the parameter updates based on the scale of the parameters themselves. |
Tasks | Machine Translation, Stochastic Optimization |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04235v1 |
http://arxiv.org/pdf/1804.04235v1.pdf | |
PWC | https://paperswithcode.com/paper/adafactor-adaptive-learning-rates-with |
Repo | https://github.com/DeadAt0m/adafactor-pytorch |
Framework | pytorch |
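Adafactor's memory saving comes from storing only row and column statistics of the squared gradients. The sketch below keeps exponential moving averages of the row sums R and column sums C and reconstructs the per-parameter second-moment estimate as the rank-1 matrix R C^T / sum(R); update clipping, the increasing decay schedule, and relative step sizes from the paper are omitted, and the toy objective is illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 50, 30
W = rng.normal(size=(n, m))                        # toy weight matrix
R, C = np.zeros(n), np.zeros(m)                    # row / column second-moment accumulators
beta2, lr, eps = 0.999, 0.01, 1e-30

for step in range(100):
    G = W + 0.1 * rng.normal(size=(n, m))          # noisy gradient of 0.5 * ||W||^2
    sq = G * G + eps
    R = beta2 * R + (1 - beta2) * sq.sum(axis=1)   # O(n) row statistics
    C = beta2 * C + (1 - beta2) * sq.sum(axis=0)   # O(m) column statistics
    V_hat = np.outer(R, C) / R.sum()               # factored second-moment estimate
    W -= lr * G / np.sqrt(V_hat)                   # RMSProp-style update with O(n + m) state

print("final loss:", 0.5 * (W * W).sum())
```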
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition
Title | Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition |
Authors | Swathikiran Sudhakaran, Oswald Lanz |
Abstract | In this paper we propose an end-to-end trainable deep neural network model for egocentric activity recognition. Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video. Based on this, we develop a spatial attention mechanism that enables the network to attend to regions containing objects that are correlated with the activity under consideration. We learn highly specialized attention maps for each frame using class-specific activations from a CNN pre-trained for generic image recognition, and use them for spatio-temporal encoding of the video with a convolutional LSTM. Our model is trained in a weakly supervised setting using only raw video-level activity-class labels. Nonetheless, on standard egocentric activity benchmarks our model surpasses the currently best-performing method, which relies on strong supervision from hand segmentation and object locations for training, by up to 6 percentage points in recognition accuracy. We visually analyze the attention maps generated by the network, revealing that it successfully identifies the relevant objects present in the video frames, which may explain the strong recognition performance. We also present an extensive ablation analysis of the design choices. |
Tasks | Activity Recognition, Egocentric Activity Recognition, Hand Segmentation |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11794v1 |
http://arxiv.org/pdf/1807.11794v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-is-all-we-need-nailing-down-object |
Repo | https://github.com/swathikirans/ego-rnn |
Framework | pytorch |
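A small sketch of the class-activation-map attention described above: the classification weights of a pre-trained backbone produce a class-specific spatial map over the last convolutional features, which, after a spatial softmax, reweights those features before temporal encoding. Random tensors stand in for a real backbone, and the convolutional LSTM encoder is omitted.

```python
import numpy as np

rng = np.random.default_rng(9)
feat = rng.normal(size=(512, 7, 7))        # placeholder last-conv features for one frame
fc_w = rng.normal(size=(1000, 512))        # placeholder classifier weights of the backbone

scores = fc_w @ feat.mean(axis=(1, 2))     # class scores from global-average pooling
c = int(scores.argmax())                   # class used to build the attention map

cam = np.tensordot(fc_w[c], feat, axes=1)  # (7, 7) class activation map
attn = np.exp(cam - cam.max())
attn /= attn.sum()                         # spatial softmax

attended = feat * attn                     # attention-weighted features (512, 7, 7)
frame_descriptor = attended.sum(axis=(1, 2))
print(frame_descriptor.shape)              # fed frame-by-frame to a ConvLSTM (not shown)
```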
Blended Conditional Gradients: the unconditioning of conditional gradients
Title | Blended Conditional Gradients: the unconditioning of conditional gradients |
Authors | Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright |
Abstract | We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank–Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex. |
Tasks | |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07311v3 |
https://arxiv.org/pdf/1805.07311v3.pdf | |
PWC | https://paperswithcode.com/paper/blended-conditional-gradients-the |
Repo | https://github.com/pokutta/bcg |
Framework | none |
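For reference, the sketch below runs plain Frank-Wolfe (conditional gradient) on the probability simplex, the setting of the paper's streamlined variant; the blended method additionally mixes in gradient-based steps and lazy LP oracles, which are not shown, and the quadratic objective is a toy example.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 20
A = rng.normal(size=(n, n))
Q = A.T @ A + np.eye(n)                  # f(x) = 0.5 x'Qx - b'x, strongly convex
b = rng.normal(size=n)

def grad(x):
    return Q @ x - b

x = np.ones(n) / n                       # start at the barycenter of the simplex
for t in range(500):
    g = grad(x)
    v = np.zeros(n)
    v[g.argmin()] = 1.0                  # LP oracle over the simplex returns a vertex
    gamma = 2.0 / (t + 2.0)              # classic step-size schedule
    x = (1 - gamma) * x + gamma * v      # projection-free: x stays a convex combination

g = grad(x)
gap = g @ (x - np.eye(n)[g.argmin()])    # Frank-Wolfe dual gap at the final iterate
print("dual gap:", gap, " f(x):", 0.5 * x @ Q @ x - b @ x)
```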