Paper Group AWR 217
A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs
Title | A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs |
Authors | Jack Lindsey, Samuel A. Ocko, Surya Ganguli, Stephane Deny |
Abstract | The visual system is hierarchically organized to process visual information in successive stages. Neural representations vary drastically across the first stages of visual processing: at the output of the retina, ganglion cell receptive fields (RFs) exhibit a clear antagonistic center-surround structure, whereas in the primary visual cortex, typical RFs are sharply tuned to a precise orientation. There is currently no unified theory explaining these differences in representations across layers. Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and we find a single model from which both geometries spontaneously emerge at the appropriate stages of visual processing. The key constraint is a reduced number of neurons at the retinal output, consistent with the anatomy of the optic nerve as a stringent bottleneck. Second, we find that, for simple cortical networks, visual representations at the retinal output emerge as nonlinear and lossy feature detectors, whereas they emerge as linear and faithful encoders of the visual scene for more complex cortices. This result predicts that the retinas of small vertebrates should perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli. These predictions could reconcile the two seemingly incompatible views of the retina as either performing feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the degree of neural resources allocated to their visual system. |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00945v1 |
http://arxiv.org/pdf/1901.00945v1.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-theory-of-early-visual |
Repo | https://github.com/ganguli-lab/RetinalResources |
Framework | tf |
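The key manipulation described in the abstract — a convolutional "retina" whose output is forced through very few channels before a "cortex" network — is easy to prototype. Below is a minimal PyTorch sketch of that bottleneck idea; channel counts, kernel sizes, and the grayscale 32x32 input are illustrative assumptions rather than the paper's exact architecture (the TensorFlow code in the linked repository is the reference).

```python
import torch
import torch.nn as nn

def retina_cortex_model(bottleneck_channels=1):
    """Toy version of the setup in the abstract: a 'retina' sub-network whose
    output is squeezed through very few channels (the optic-nerve bottleneck),
    followed by a 'cortex' sub-network and a classifier head. Channel counts,
    kernel sizes, and depth are illustrative, not the paper's architecture."""
    retina = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv2d(32, bottleneck_channels, kernel_size=9, padding=4), nn.ReLU())
    cortex = nn.Sequential(
        nn.Conv2d(bottleneck_channels, 32, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv2d(32, 32, kernel_size=9, padding=4), nn.ReLU())
    head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 10))
    return nn.Sequential(retina, cortex, head)

x = torch.randn(1, 1, 32, 32)          # one grayscale 32x32 image
print(retina_cortex_model()(x).shape)  # torch.Size([1, 10])
```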
ICLR Reproducibility Challenge Report (Padam : Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks)
Title | ICLR Reproducibility Challenge Report (Padam : Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks) |
Authors | Harshal Mittal, Kartikey Pandey, Yash Kant |
Abstract | This work is part of the ICLR Reproducibility Challenge 2019, in which we try to reproduce the results of the conference submission PADAM: Closing The Generalization Gap of Adaptive Gradient Methods In Training Deep Neural Networks. Adaptive gradient methods proposed in the past have demonstrated degraded generalization performance compared to stochastic gradient descent (SGD) with momentum. The authors address this problem by designing a new optimization algorithm that bridges the gap between the space of adaptive gradient algorithms and SGD with momentum. This method introduces a new tunable hyperparameter, the partially adaptive parameter p, which varies in [0, 0.5]. We build the proposed optimizer and use it to mirror the experiments performed by the authors. We review and comment on the empirical analysis performed by the authors. Finally, we also propose a future direction for further study of Padam. Our code is available at: https://github.com/yashkant/Padam-Tensorflow |
Tasks | |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.09517v1 |
http://arxiv.org/pdf/1901.09517v1.pdf | |
PWC | https://paperswithcode.com/paper/iclr-reproducibility-challenge-report-padam |
Repo | https://github.com/yashkant/Padam-Tensorflow |
Framework | tf |
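The partially adaptive update at the core of Padam can be sketched in a few lines. The NumPy snippet below is a hedged illustration of the update described in the abstract — an Adam-style step whose denominator is raised to the power p in [0, 0.5]; the AMSGrad-style running maximum and all hyperparameter defaults are assumptions, and the authors' TensorFlow repository remains the reference implementation.

```python
import numpy as np

def padam_step(theta, grad, m, v, v_hat,
               lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One Padam update on NumPy arrays.

    The denominator is v_hat**p with the partially adaptive parameter
    p in [0, 0.5]: p = 0.5 recovers an AMSGrad-style update, p -> 0
    approaches SGD with momentum. Defaults here are illustrative only.
    """
    m = beta1 * m + (1 - beta1) * grad           # first moment
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment
    v_hat = np.maximum(v_hat, v)                 # AMSGrad-style running max
    theta = theta - lr * m / (v_hat ** p + eps)  # partially adaptive step
    return theta, m, v, v_hat

# toy usage: minimize f(x) = x^2 starting from x = 5
theta = np.array([5.0])
m = v = v_hat = np.zeros_like(theta)
for _ in range(200):
    grad = 2 * theta
    theta, m, v, v_hat = padam_step(theta, grad, m, v, v_hat)
print(theta)  # approaches 0
```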
Improving Neural Machine Translation with Parent-Scaled Self-Attention
Title | Improving Neural Machine Translation with Parent-Scaled Self-Attention |
Authors | Emanuele Bugliarello, Naoaki Okazaki |
Abstract | Most neural machine translation (NMT) models operate on source and target sentences, treating them as sequences of words and neglecting their syntactic structure. Recent studies have shown that embedding the syntax information of a source sentence in recurrent neural networks can improve their translation accuracy, especially for low-resource language pairs. However, state-of-the-art NMT models are based on self-attention networks (e.g., Transformer), in which it is still not clear how to best embed syntactic information. In this work, we explore different approaches to make such models syntactically aware. Moreover, we propose a novel method to incorporate syntactic information in the self-attention mechanism of the Transformer encoder by introducing attention heads that can attend to the dependency parent of each token. The proposed model is simple yet effective, requiring no additional parameter and improving the translation quality of the Transformer model especially for long sentences and low-resource scenarios. We show the efficacy of the proposed approach on NC11 English-German, WMT16 and WMT17 English-German, WMT18 English-Turkish, and WAT English-Japanese translation tasks. |
Tasks | Machine Translation |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03149v1 |
https://arxiv.org/pdf/1909.03149v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-neural-machine-translation-with-3 |
Repo | https://github.com/e-bug/pascal |
Framework | pytorch |
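One plausible way to realize the parent-aware attention heads described in the abstract is to reweight each query's attention distribution toward the position of its dependency parent. The NumPy sketch below does exactly that with a Gaussian weighting; the Gaussian form, its width, and the post-softmax renormalization are assumptions for illustration and may differ from the formulation in the paper and the pascal repository.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def parent_scaled_attention(q, k, v, parent_idx, sigma=1.0):
    """Sketch of one parent-scaled self-attention head.

    q, k, v: (seq_len, d) arrays; parent_idx[i] is the position of token i's
    dependency parent. Each query's attention weights are reweighted by a
    Gaussian centered on its parent position and renormalized (an assumed
    realization, not necessarily the paper's exact formulation)."""
    seq_len, d = q.shape
    weights = softmax(q @ k.T / np.sqrt(d))                 # (seq_len, seq_len)
    positions = np.arange(seq_len)
    dist = positions[None, :] - np.asarray(parent_idx)[:, None]
    weights = weights * np.exp(-dist ** 2 / (2 * sigma ** 2))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# toy usage: 4 tokens; token 0 is the root (its own parent)
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
print(parent_scaled_attention(q, k, v, parent_idx=[0, 0, 1, 1]).shape)  # (4, 8)
```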
Pareto Multi-Task Learning
Title | Pareto Multi-Task Learning |
Authors | Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qingfu Zhang, Sam Kwong |
Abstract | Multi-task learning is a powerful method for solving multiple correlated tasks simultaneously. However, it is often impossible to find one single solution to optimize all the tasks, since different tasks might conflict with each other. Recently, a novel method was proposed to find one single Pareto optimal solution with a good trade-off among different tasks by casting multi-task learning as multiobjective optimization. In this paper, we generalize this idea and propose a novel Pareto multi-task learning algorithm (Pareto MTL) to find a set of well-distributed Pareto solutions which can represent different trade-offs among different tasks. The proposed algorithm first formulates a multi-task learning problem as a multiobjective optimization problem, and then decomposes the multiobjective optimization problem into a set of constrained subproblems with different trade-off preferences. By solving these subproblems in parallel, Pareto MTL can find a set of well-representative Pareto optimal solutions with different trade-offs among all tasks. Practitioners can easily select their preferred solution from these Pareto solutions, or use different trade-off solutions for different situations. Experimental results confirm that the proposed algorithm can generate well-representative solutions and outperform some state-of-the-art algorithms on many multi-task learning applications. |
Tasks | Multiobjective Optimization, Multi-Task Learning |
Published | 2019-12-30 |
URL | https://arxiv.org/abs/1912.12854v1 |
https://arxiv.org/pdf/1912.12854v1.pdf | |
PWC | https://paperswithcode.com/paper/pareto-multi-task-learning-1 |
Repo | https://github.com/Xi-L/ParetoMTL |
Framework | pytorch |
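The decomposition idea in the abstract — splitting the multiobjective problem into subproblems indexed by preference vectors — can be illustrated for two tasks in a few lines. The sketch below only generates preference vectors and identifies which subregion a given loss vector falls into; the full algorithm additionally solves each constrained subproblem with gradient-based multiobjective steps (see the ParetoMTL repository).

```python
import numpy as np

def preference_vectors(n_prefs):
    """Evenly spaced unit preference vectors for a two-task problem; each
    vector defines one constrained subproblem in the decomposition."""
    angles = np.linspace(0.0, np.pi / 2, n_prefs)
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def assigned_subproblem(losses, prefs):
    """Index of the preference vector closest in angle to the current loss
    vector, i.e. the subregion whose constraint Pareto MTL would enforce."""
    unit = losses / np.linalg.norm(losses)
    return int(np.argmax(prefs @ unit))

prefs = preference_vectors(5)
print(prefs.round(2))                                     # 5 preference vectors
print(assigned_subproblem(np.array([0.2, 0.9]), prefs))   # -> 3
```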
A Silver Standard Corpus of Human Phenotype-Gene Relations
Title | A Silver Standard Corpus of Human Phenotype-Gene Relations |
Authors | Diana Sousa, Andre Lamurias, Francisco M. Couto |
Abstract | Human phenotype-gene relations are fundamental to fully understand the origin of some phenotypic abnormalities and their associated diseases. Biomedical literature is the most comprehensive source of these relations; however, we need Relation Extraction tools to automatically recognize them. Most of these tools require an annotated corpus, and to the best of our knowledge, there is no corpus available annotated with human phenotype-gene relations. This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. The corpus consists of 1712 abstracts, 5676 human phenotype annotations, 13835 gene annotations, and 4283 relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. Using the corpus, we were able to obtain promising results with two state-of-the-art deep learning tools, namely a precision of 78.05%. The PGR corpus was made publicly available to the research community. |
Tasks | Named Entity Recognition, Relation Extraction |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10728v1 |
http://arxiv.org/pdf/1903.10728v1.pdf | |
PWC | https://paperswithcode.com/paper/a-silver-standard-corpus-of-human-phenotype |
Repo | https://github.com/lasigeBioTM/PGR |
Framework | jax |
One Framework to Register Them All: PointNet Encoding for Point Cloud Alignment
Title | One Framework to Register Them All: PointNet Encoding for Point Cloud Alignment |
Authors | Vinit Sarode, Xueqian Li, Hunter Goforth, Yasuhiro Aoki, Animesh Dhagat, Rangaprasad Arun Srivatsan, Simon Lucey, Howie Choset |
Abstract | PointNet has recently emerged as a popular representation for unstructured point cloud data, allowing application of deep learning to tasks such as object detection, segmentation and shape completion. However, recent works in literature have shown the sensitivity of the PointNet representation to pose misalignment. This paper presents a novel framework that uses PointNet encoding to align point clouds and perform registration for applications such as 3D reconstruction, tracking and pose estimation. We develop a framework that compares PointNet features of template and source point clouds to find the transformation that aligns them accurately. In doing so, we avoid computationally expensive correspondence-finding steps that are central to popular registration methods such as ICP and its variants. Depending on the prior information about the shape of the object formed by the point clouds, our framework can produce approaches that are shape specific or general to unseen shapes. Our framework produces approaches that are robust to noise and initial misalignment in data and work robustly with sparse as well as partial point clouds. We perform extensive simulation and real-world experiments to validate the efficacy of our approach and compare the performance with state-of-the-art approaches. Code is available at https://github.com/vinits5/pointnet-registration-framework. |
Tasks | 3D Reconstruction, Object Detection, Pose Estimation |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05766v1 |
https://arxiv.org/pdf/1912.05766v1.pdf | |
PWC | https://paperswithcode.com/paper/one-framework-to-register-them-all-pointnet |
Repo | https://github.com/vinits5/pointnet-registration-framework |
Framework | tf |
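The correspondence-free idea in the abstract — align two clouds by matching their PointNet-style global features rather than matching individual points — can be illustrated with a toy optimization. The PyTorch sketch below recovers only a translation by gradient descent on the feature distance, using an untrained random encoder; the actual framework learns the encoder and estimates full rigid transforms (see the linked repository).

```python
import torch

def pointnet_feature(points, mlp):
    """PointNet-style global feature: a shared per-point MLP followed by
    max pooling, which makes the feature order-invariant."""
    return mlp(points).max(dim=0).values

# toy setup: recover a known translation by matching global features
torch.manual_seed(0)
encoder = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 128))
template = torch.randn(256, 3)
source = template + torch.tensor([0.3, -0.2, 0.1])   # misaligned copy

t = torch.zeros(3, requires_grad=True)               # translation estimate
opt = torch.optim.Adam([t], lr=0.01)
target_feat = pointnet_feature(template, encoder).detach()
for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(
        pointnet_feature(source + t, encoder), target_feat)
    loss.backward()
    opt.step()
print(t.detach())  # should move toward [-0.3, 0.2, -0.1]
```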
Is it Raining Outside? Detection of Rainfall using General-Purpose Surveillance Cameras
Title | Is it Raining Outside? Detection of Rainfall using General-Purpose Surveillance Cameras |
Authors | Joakim Bruslund Haurum, Chris H. Bahnsen, Thomas B. Moeslund |
Abstract | In integrated surveillance systems based on visual cameras, the mitigation of adverse weather conditions is an active research topic. Within this field, rain removal algorithms have been developed that artificially remove rain streaks from images or video. In order to deploy such rain removal algorithms in a surveillance setting, one must detect if rain is present in the scene. In this paper, we design a system for the detection of rainfall by the use of surveillance cameras. We reimplement the former state-of-the-art method for rain detection and compare it against a modern CNN-based method that utilizes 3D convolutions. The two methods are evaluated on our new AAU Visual Rain Dataset (VIRADA) that consists of 215 hours of general-purpose surveillance video from two traffic crossings. The results show that the proposed 3D CNN outperforms the previous state-of-the-art method by a large margin on all metrics, for both of the traffic crossings. Finally, it is shown that the choice of region-of-interest has a large influence on performance when trying to generalize the investigated methods. The AAU VIRADA dataset and our implementation of the two rain detection algorithms are publicly available at https://bitbucket.org/aauvap/aau-virada. |
Tasks | Rain Removal |
Published | 2019-08-12 |
URL | https://arxiv.org/abs/1908.04034v1 |
https://arxiv.org/pdf/1908.04034v1.pdf | |
PWC | https://paperswithcode.com/paper/is-it-raining-outside-detection-of-rainfall |
Repo | https://github.com/chrisbahnsen/aau-virada |
Framework | pytorch |
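The CNN-based detector compared in the paper relies on 3D convolutions over short video clips. A minimal PyTorch sketch of such a clip-level rain/no-rain classifier is shown below; the layer sizes and clip shape are illustrative and are not the architecture evaluated on the VIRADA dataset.

```python
import torch
import torch.nn as nn

class TinyRain3DCNN(nn.Module):
    """Minimal 3D-convolutional rain/no-rain classifier over short video clips
    shaped (batch, channels, frames, height, width). Layer sizes are
    illustrative only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, 2)   # rain vs. no rain

    def forward(self, clip):
        x = self.features(clip).flatten(1)
        return self.classifier(x)

clip = torch.randn(1, 3, 16, 112, 112)        # one 16-frame RGB clip
print(TinyRain3DCNN()(clip).shape)            # torch.Size([1, 2])
```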
ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining
Title | ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining |
Authors | Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, Muhammad Abdullah Hanif, Muhammad Shafique |
Abstract | The state-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining can completely be avoided. ALWANN provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations including the multiplier selection problem are solved by means of a multiobjective optimization NSGA-II algorithm. In order to completely avoid the computationally expensive retraining of DNNs, which is usually employed to improve the classification accuracy, we propose a simple weight updating scheme that compensates for the inaccuracy introduced by employing approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source “EvoApprox” library. We report that the proposed approach saves 30% of energy needed for multiplication in convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%. The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate. |
Tasks | Multiobjective Optimization |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1907.07229v2 |
https://arxiv.org/pdf/1907.07229v2.pdf | |
PWC | https://paperswithcode.com/paper/alwann-automatic-layer-wise-approximation-of |
Repo | https://github.com/ehw-fit/tf-approximate |
Framework | tf |
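The retraining-free weight updating scheme mentioned in the abstract can be approximated as follows: for every 8-bit weight, pick the replacement weight whose products under the approximate multiplier best match the exact products over sample activations. The sketch below uses a made-up "drop the low bits" multiplier as a stand-in; the real flow selects multipliers from the EvoApprox library and optimizes their assignment to layers with NSGA-II.

```python
import numpy as np

def compensate_weights(approx_mult, weights, activations):
    """For each 8-bit weight, pick the replacement weight whose products under
    the approximate multiplier best match the exact products over sample
    activations -- a simplified stand-in for the paper's weight-update scheme."""
    candidates = np.arange(-128, 128)
    best = np.empty_like(weights)
    for i, w in enumerate(weights):
        exact = w * activations                       # exact-hardware products
        errors = [np.abs(approx_mult(c, activations) - exact).sum()
                  for c in candidates]
        best[i] = candidates[int(np.argmin(errors))]
    return best

# made-up approximate multiplier: drops the 3 lowest bits of the exact product
toy_mult = lambda w, a: ((w * a) >> 3) << 3

rng = np.random.default_rng(0)
weights8 = rng.integers(-128, 128, size=16)
activations8 = rng.integers(0, 256, size=64)
print(compensate_weights(toy_mult, weights8, activations8))
```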
Learning Outside the Box: Discourse-level Features Improve Metaphor Identification
Title | Learning Outside the Box: Discourse-level Features Improve Metaphor Identification |
Authors | Jesse Mu, Helen Yannakoudakis, Ekaterina Shutova |
Abstract | Most current approaches to metaphor identification use restricted linguistic contexts, e.g. by considering only a verb’s arguments or the sentence containing a phrase. Inspired by pragmatic accounts of metaphor, we argue that broader discourse features are crucial for better metaphor identification. We train simple gradient boosting classifiers on representations of an utterance and its surrounding discourse learned with a variety of document embedding methods, obtaining near state-of-the-art results on the 2018 VU Amsterdam metaphor identification task without the complex metaphor-specific features or deep neural architectures employed by other systems. A qualitative analysis further confirms the need for broader context in metaphor processing. |
Tasks | Document Embedding |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02246v2 |
http://arxiv.org/pdf/1904.02246v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-outside-the-box-discourse-level |
Repo | https://github.com/jayelm/broader-metaphor |
Framework | none |
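The modeling recipe in the abstract is deliberately simple: represent the target lemma, its utterance, and the surrounding discourse as embeddings, concatenate them, and train a gradient boosting classifier. The sketch below mirrors that pipeline with random vectors standing in for real document embeddings (the paper uses methods such as doc2vec and ELMo).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def featurize(lemma_vec, utterance_vec, discourse_vec):
    """Concatenate target-word, utterance, and discourse embeddings. The exact
    feature set and embedding model are assumptions for illustration."""
    return np.concatenate([lemma_vec, utterance_vec, discourse_vec])

# toy usage with random "embeddings" standing in for real ones
rng = np.random.default_rng(0)
X = np.stack([featurize(rng.normal(size=50), rng.normal(size=50),
                        rng.normal(size=50)) for _ in range(200)])
y = rng.integers(0, 2, size=200)              # 1 = metaphorical, 0 = literal
clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict(X[:5]))
```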
Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
Title | Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis |
Authors | Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | Related tasks often have inter-dependence on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. The multi-modal inputs (i.e., text, acoustic and visual frames) of a video convey diverse and distinctive information, and usually do not have equal contribution in the decision making. We propose a context-level inter-modal attention framework for simultaneously predicting the sentiment and expressed emotions of an utterance. We evaluate our proposed approach on the CMU-MOSEI dataset for multi-modal sentiment and emotion analysis. Evaluation results suggest that the multi-task learning framework offers an improvement over the single-task framework. The proposed approach reports new state-of-the-art performance for both sentiment analysis and emotion analysis. |
Tasks | Decision Making, Emotion Recognition, Multi-Task Learning, Sentiment Analysis |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05812v1 |
https://arxiv.org/pdf/1905.05812v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-multi-modal-emotion |
Repo | https://github.com/16631140828/Paper-list |
Framework | none |
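The multi-task aspect of the framework — one shared utterance representation feeding separate sentiment and emotion heads — can be sketched as below. The fused input here is a plain concatenation of modality features with CMU-MOSEI-like dimensions chosen for illustration; the paper's context-level inter-modal attention is not reproduced.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared utterance representation with separate sentiment and emotion
    heads -- the multi-task idea from the abstract. Feature dimensions and the
    concatenation-based fusion are illustrative assumptions."""
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35, n_emotions=6):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(text_dim + audio_dim + video_dim, 128), nn.ReLU())
        self.sentiment = nn.Linear(128, 1)          # sentiment score
        self.emotion = nn.Linear(128, n_emotions)   # multi-label emotions

    def forward(self, text, audio, video):
        h = self.shared(torch.cat([text, audio, video], dim=-1))
        return self.sentiment(h), self.emotion(h)

model = MultiTaskHead()
s, e = model(torch.randn(4, 300), torch.randn(4, 74), torch.randn(4, 35))
print(s.shape, e.shape)   # torch.Size([4, 1]) torch.Size([4, 6])
```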
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments
Title | Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments |
Authors | Maxime Bouton, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer |
Abstract | Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model-checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario. |
Tasks | Decision Making |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11483v1 |
http://arxiv.org/pdf/1904.11483v1.pdf | |
PWC | https://paperswithcode.com/paper/safe-reinforcement-learning-with-scene |
Repo | https://github.com/MaximeBouton/AutomotiveSafeRL |
Framework | none |
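A core ingredient of the approach is restricting the RL policy to actions the model checker certifies as safe. The sketch below shows that masking step in isolation; the belief updater, scene decomposition, and utility fusion described in the abstract are not included.

```python
import numpy as np

def safe_action(q_values, safe_mask):
    """Pick the highest-value action among those a model checker marks safe;
    fall back to the overall best action if nothing is marked safe. A
    simplified illustration of combining a safety mask with an RL policy."""
    if not safe_mask.any():
        return int(np.argmax(q_values))
    masked = np.where(safe_mask, q_values, -np.inf)
    return int(np.argmax(masked))

q = np.array([1.2, 3.4, 0.7, 2.9])           # Q-values for 4 actions
mask = np.array([True, False, True, True])   # action 1 deemed unsafe
print(safe_action(q, mask))                  # -> 3
```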
Multimodal Representation Learning using Deep Multiset Canonical Correlation
Title | Multimodal Representation Learning using Deep Multiset Canonical Correlation |
Authors | Krishna Somandepalli, Naveen Kumar, Ruchir Travadi, Shrikanth Narayanan |
Abstract | We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an extension to representation learning using CCA when the underlying signal is observed across multiple (more than two) modalities. We use a deep learning framework to learn non-linear transformations from different modalities to a shared subspace such that the representations maximize the ratio of between- and within-modality covariance of the observations. Unlike linear discriminant analysis, we do not need class information to learn these representations, and we show that this model can be trained for complex data using mini-batches. Using synthetic data experiments, we show that dMCCA can effectively recover the common signal across the different modalities corrupted by multiplicative and additive noise. We also analyze the sensitivity of our model to recover the correlated components with respect to mini-batch size and dimension of the embeddings. Performance evaluation on noisy handwritten datasets shows that our model outperforms other CCA-based approaches and is comparable to deep neural network models trained end-to-end on this dataset. |
Tasks | Representation Learning |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01775v1 |
http://arxiv.org/pdf/1904.01775v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-representation-learning-using-deep |
Repo | https://github.com/usc-sail/mica-deep-mcca |
Framework | none |
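The dMCCA criterion maximizes between-modality covariance relative to within-modality covariance of the learned embeddings. The NumPy sketch below computes a simplified trace-ratio surrogate of that objective from per-modality embeddings; the paper optimizes its criterion end to end with mini-batch training of per-modality networks.

```python
import numpy as np

def dmcca_objective(embeddings):
    """Trace-ratio surrogate of the dMCCA criterion: between-modality
    covariance relative to within-modality covariance.

    embeddings: list of (n_samples, d) arrays, one per modality, as produced
    by per-modality networks. A simplified illustration, not the paper's
    exact objective."""
    centered = [e - e.mean(axis=0, keepdims=True) for e in embeddings]
    n = centered[0].shape[0]
    within = sum(e.T @ e for e in centered) / (n - 1)
    between = sum(a.T @ b for i, a in enumerate(centered)
                  for j, b in enumerate(centered) if i != j) / (n - 1)
    return np.trace(between) / np.trace(within)

rng = np.random.default_rng(0)
common = rng.normal(size=(128, 4))                        # shared signal
views = [common + 0.1 * rng.normal(size=(128, 4)) for _ in range(3)]
print(dmcca_objective(views))  # near 2 (= M - 1) when the 3 views agree
```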
Attention is not Explanation
Title | Attention is not Explanation |
Authors | Sarthak Jain, Byron C. Wallace |
Abstract | Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful 'explanations' for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. Code for all experiments is available at https://github.com/successar/AttentionExplanation. |
Tasks | Feature Importance |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.10186v3 |
https://arxiv.org/pdf/1902.10186v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-is-not-explanation |
Repo | https://github.com/successar/AttentionExplanation |
Framework | pytorch |
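One of the paper's diagnostics is measuring how well attention weights agree with gradient-based feature importance, e.g. via rank correlation. The sketch below computes that agreement with Kendall's tau on toy inputs; the paper runs such comparisons across many NLP tasks and architectures (see the AttentionExplanation repository).

```python
import numpy as np
from scipy.stats import kendalltau

def attention_gradient_agreement(attention, grad_importance):
    """Kendall tau between a model's attention distribution over tokens and a
    gradient-based importance score for the same tokens -- one of the
    agreement measures the paper examines (toy inputs here)."""
    tau, _ = kendalltau(attention, grad_importance)
    return tau

rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(20))               # attention over 20 tokens
grads = np.abs(rng.normal(size=20))             # |gradient|-based importance
print(attention_gradient_agreement(attn, grads))  # low tau = weak agreement
```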
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer
Title | Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer |
Authors | Edward Beeching, Christian Wolf, Jilles Dibangoye, Olivier Simonin |
Abstract | An important goal of research in Deep Reinforcement Learning in mobile robotics is to train agents capable of solving complex tasks, which require a high level of scene understanding and reasoning from an egocentric perspective. When trained from simulations, optimal environments should satisfy a currently unobtainable combination of high-fidelity photographic observations, massive amounts of different environment configurations and fast simulation speeds. In this paper we argue that research on training agents capable of complex reasoning can be simplified by decoupling from the requirement of high fidelity photographic observations. We present a suite of tasks requiring complex reasoning and exploration in continuous, partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24h. Our scenarios combine two key advantages: (i) they are based on a simple but highly efficient 3D environment (ViZDoom) which allows high speed simulation (12000fps); (ii) the scenarios provide the user with a range of difficulty settings, in order to identify the limitations of current state of the art algorithms and network architectures. We aim to increase accessibility to the field of Deep-RL by providing baselines for challenging scenarios where new ideas can be iterated on quickly. We argue that the community should be able to address challenging problems in reasoning of mobile agents without the need for a large compute infrastructure. |
Tasks | Scene Understanding |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01806v1 |
http://arxiv.org/pdf/1904.01806v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-on-a-budget-3d |
Repo | https://github.com/edbeeching/3d_control_deep_rl |
Framework | pytorch |
IRS: A Large Synthetic Indoor Robotics Stereo Dataset for Disparity and Surface Normal Estimation
Title | IRS: A Large Synthetic Indoor Robotics Stereo Dataset for Disparity and Surface Normal Estimation |
Authors | Qiang Wang, Shizhen Zheng, Qingsong Yan, Fei Deng, Kaiyong Zhao, Xiaowen Chu |
Abstract | Indoor robotics localization, navigation and interaction heavily rely on scene understanding and reconstruction. Compared to monocular vision, which usually does not explicitly introduce any geometrical constraint, stereo vision based schemes are more promising and robust to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models trained with large-scale datasets have shown their superior performance in many stereo vision tasks. However, existing stereo datasets rarely contain high-quality surface normal and disparity ground truth, which hardly satisfies the demands of training a prospective deep model for indoor scenes. To this end, we introduce a large-scale synthetic indoor robotics stereo (IRS) dataset with over 100K stereo RGB images and high-quality surface normal and disparity maps. Leveraging the advanced rendering techniques of our customized rendering engine, the dataset is considerably close to real-world captured images and covers several visual effects, such as brightness changes, light reflection/transmission, lens flare, vivid shadow, etc. We compare the data distribution of IRS with existing stereo datasets to illustrate the typical visual attributes of indoor scenes. In addition, we present a new deep model DispNormNet to simultaneously infer surface normal and disparity from stereo images. Compared to existing models trained on other datasets, DispNormNet trained with IRS produces much better estimation of surface normal and disparity for indoor scenes. |
Tasks | Scene Understanding |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09678v1 |
https://arxiv.org/pdf/1912.09678v1.pdf | |
PWC | https://paperswithcode.com/paper/irs-a-large-synthetic-indoor-robotics-stereo |
Repo | https://github.com/HKBU-HPML/IRS |
Framework | pytorch |
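DispNormNet's defining trait, as described in the abstract, is predicting disparity and surface normals jointly from a stereo pair. The toy PyTorch sketch below shows only that two-head output structure over a shared encoder; the real network is far deeper and built on dedicated stereo-matching architectures, so treat this purely as an illustration.

```python
import torch
import torch.nn as nn

class TinyDispNormNet(nn.Module):
    """Toy two-head network: a shared encoder over a concatenated stereo pair,
    one head for disparity (1 channel) and one for surface normals
    (3 channels). Purely illustrative; not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.disp_head = nn.Conv2d(32, 1, 3, padding=1)
        self.normal_head = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, left, right):
        feat = self.encoder(torch.cat([left, right], dim=1))
        return self.disp_head(feat), self.normal_head(feat)

left = right = torch.randn(1, 3, 64, 96)          # a toy stereo pair
disp, normals = TinyDispNormNet()(left, right)
print(disp.shape, normals.shape)  # (1, 1, 64, 96) and (1, 3, 64, 96)
```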