Paper Group ANR 332
Learning To Detect Keyword Parts And Whole By Smoothed Max Pooling
Title | Learning To Detect Keyword Parts And Whole By Smoothed Max Pooling |
Authors | Hyun-Jin Park, Patrick Violette, Niranjan Subrahmanya |
Abstract | We propose a smoothed max pooling loss and its application to keyword spotting systems. The proposed approach jointly trains an encoder (to detect keyword parts) and a decoder (to detect the whole keyword) in a semi-supervised manner. The proposed new loss function allows training a model to detect the parts and the whole of a keyword without strictly depending on frame-level labeling from LVCSR (large-vocabulary continuous speech recognition), making further optimization possible. The proposed system outperforms the baseline keyword spotting model in [1] due to increased optimizability. Further, it can be more easily adapted for on-device learning applications due to its reduced dependency on LVCSR. |
Tasks | Keyword Spotting, Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09246v1 |
PDF | https://arxiv.org/pdf/2001.09246v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-detect-keyword-parts-and-whole-by |
Repo | |
Framework | |
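For intuition, here is a minimal PyTorch sketch of the general smoothed-max-pooling idea, using a temperature-scaled logsumexp as a generic smooth stand-in for hard max pooling. The function name, tensor shapes, and temperature are assumptions for illustration; the paper's actual loss, encoder/decoder structure, and semi-supervised training details differ.

```python
# Hypothetical sketch: smoothed max pooling over per-frame keyword logits,
# trained with utterance-level labels only (no frame-level LVCSR alignment).
import torch
import torch.nn.functional as F

def smoothed_max_pool_loss(frame_logits, utt_labels, tau=1.0):
    """frame_logits: (batch, time) raw keyword scores at each frame.
    utt_labels: (batch,) 1.0 if the utterance contains the keyword, else 0.0.
    tau: temperature; tau -> 0 recovers hard max pooling over time."""
    # Smooth max over time: tau * logsumexp(logits / tau).
    pooled = tau * torch.logsumexp(frame_logits / tau, dim=1)
    return F.binary_cross_entropy_with_logits(pooled, utt_labels)

# Usage
logits = torch.randn(8, 100, requires_grad=True)   # 8 utterances, 100 frames
labels = torch.randint(0, 2, (8,)).float()
loss = smoothed_max_pool_loss(logits, labels)
loss.backward()
```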
Ecological Semantics: Programming Environments for Situated Language Understanding
Title | Ecological Semantics: Programming Environments for Situated Language Understanding |
Authors | Ronen Tamari, Gabriel Stanovsky, Dafna Shahaf, Reut Tsarfaty |
Abstract | Large-scale natural language understanding (NLU) systems have made impressive progress: they can be applied flexibly across a variety of tasks, and employ minimal structural assumptions. However, extensive empirical research has shown this to be a double-edged sword, coming at the cost of shallow understanding: inferior generalization, grounding and explainability. Grounded language learning approaches offer the promise of deeper understanding by situating learning in richer, more structured training environments, but are limited in scale to relatively narrow, predefined domains. How might we enjoy the best of both worlds: grounded, general NLU? Following extensive contemporary cognitive science, we propose treating environments as “first-class citizens” in semantic representations, worthy of research and development in their own right. Importantly, models should also be partners in the creation and configuration of environments, rather than just actors within them, as in existing approaches. To do so, we argue that models must begin to understand and program in the language of affordances (which define possible actions in a given situation), both for online, situated discourse comprehension and for large-scale, offline common-sense knowledge mining. To this end we propose an environment-oriented ecological semantics, outlining theoretical and practical approaches towards implementation. We further provide concrete demonstrations built upon interactive fiction programming languages. |
Tasks | Common Sense Reasoning |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04567v1 |
PDF | https://arxiv.org/pdf/2003.04567v1.pdf |
PWC | https://paperswithcode.com/paper/ecological-semantics-programming-environments |
Repo | |
Framework | |
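As a toy illustration of treating affordances as first-class, programmable data (in the spirit of the interactive-fiction world models the paper builds on), consider the sketch below. The object schema and helper function are invented for this example and are not from the paper.

```python
# Toy affordance representation: actions, their preconditions, and effects
# stored as plain data that a model could read or write.
kettle = {
    "name": "kettle",
    "affordances": {
        "fill": {"requires": ["water source"], "effect": "kettle is full"},
        "boil": {"requires": ["kettle is full", "heat"], "effect": "water is hot"},
    },
}

def possible_actions(obj, situation):
    """Actions whose preconditions all hold in the current situation."""
    return [a for a, spec in obj["affordances"].items()
            if all(req in situation for req in spec["requires"])]

print(possible_actions(kettle, {"water source"}))            # ['fill']
print(possible_actions(kettle, {"kettle is full", "heat"}))  # ['boil']
```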
Attention-guided Chained Context Aggregation for Semantic Segmentation
Title | Attention-guided Chained Context Aggregation for Semantic Segmentation |
Authors | Quan Tang, Fagui Liu, Jun Jiang, Yu Zhang |
Abstract | Recent breakthroughs in semantic segmentation methods based on Fully Convolutional Networks (FCNs) have aroused great research interest. One of the critical issues is how to aggregate multi-scale contextual information effectively to obtain reliable results. To address this problem, we propose a novel paradigm called the Chained Context Aggregation Module (CAM). CAM captures features at various spatial scales through chain-connected, ladder-style information flows. The features are then guided by Flow Guidance Connections to interact and fuse in a two-stage process, which we refer to as pre-fusion and re-fusion. We further adopt attention models in CAM to productively recombine and select those fused features to refine performance. Based on these developments, we construct the Chained Context Aggregation Network (CANet), which employs a two-step decoder to recover precise spatial details in prediction maps. We conduct extensive experiments on three challenging datasets, including Pascal VOC 2012, CamVid and SUN-RGBD. The results show that CANet achieves state-of-the-art performance. Code will be made available upon publication. |
Tasks | Semantic Segmentation |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12041v1 |
PDF | https://arxiv.org/pdf/2002.12041v1.pdf |
PWC | https://paperswithcode.com/paper/attention-guided-chained-context-aggregation |
Repo | |
Framework | |
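For intuition, below is a loosely CAM-inspired PyTorch sketch: a chain of dilated branches with residual fusion and a simple channel-attention gate. The branch structure, dilation rates, and attention form are assumptions for illustration; the paper's actual CAM, Flow Guidance Connections, and two-step decoder are more elaborate.

```python
# Illustrative sketch only, not the authors' exact design.
import torch
import torch.nn as nn

class ChainedContextAggregation(nn.Module):
    def __init__(self, channels, dilations=(2, 4, 8)):
        super().__init__()
        # Ladder-style branches: each sees the input plus the previous branch.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        ])
        # Channel attention to recombine and select the fused features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out, prev = x, x
        for branch in self.branches:
            prev = torch.relu(branch(prev + x))   # chain + pre-fusion with input
            out = out + prev                      # re-fusion by summation
        return out * self.attn(out)               # attention-guided selection

feats = torch.randn(2, 64, 32, 32)
print(ChainedContextAggregation(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```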
Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds
Title | Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds |
Authors | Vinorth Varatharasan, Hyo-Sang Shin, Antonios Tsourdos, Nick Colosimo |
Abstract | Neural network models are usually trained on large datasets of images with homogeneous backgrounds, and their performance can degrade significantly in complex, heterogeneous environments. To mitigate this issue, this paper develops a framework that autonomously generates training datasets with heterogeneous, cluttered backgrounds, with the expectation that models trained on such data learn more effectively in complex environments than those trained on a typical dataset. In our framework, a state-of-the-art image segmentation technique called DeepLab is used to extract objects of interest from a picture, and the chroma-key technique is then used to merge the extracted objects into specific heterogeneous backgrounds. The performance of the proposed framework is investigated through empirical tests and compared with that of a model trained on the COCO dataset. The results show that the proposed framework outperforms the compared model, implying that its learning effectiveness is superior to that of models trained on a typical dataset. |
Tasks | Object Detection, Semantic Segmentation |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12467v1 |
PDF | https://arxiv.org/pdf/2002.12467v1.pdf |
PWC | https://paperswithcode.com/paper/improving-learning-effectiveness-for-object |
Repo | |
Framework | |
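The compositing step can be illustrated in a few lines of NumPy, assuming a binary object mask (e.g., produced by DeepLab) is already available; the paper's full pipeline also covers background selection and annotation bookkeeping, which are omitted here.

```python
# Minimal chroma-key-style compositing sketch.
import numpy as np

def composite(object_img, mask, background):
    """object_img, background: HxWx3 uint8 arrays of equal size.
    mask: HxW bool array, True where the object of interest is."""
    out = background.copy()
    out[mask] = object_img[mask]   # paste the segmented object onto the clutter
    return out

obj = np.zeros((64, 64, 3), np.uint8)
obj[16:48, 16:48] = 255                                 # fake "object"
mask = obj[..., 0] > 0
bg = np.random.randint(0, 256, (64, 64, 3), np.uint8)   # cluttered background
sample = composite(obj, mask, bg)
```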
Towards Interpretable Semantic Segmentation via Gradient-weighted Class Activation Mapping
Title | Towards Interpretable Semantic Segmentation via Gradient-weighted Class Activation Mapping |
Authors | Kira Vinogradova, Alexandr Dibrov, Gene Myers |
Abstract | Convolutional neural networks have become state-of-the-art in a wide range of image recognition tasks. The interpretation of their predictions, however, is an active area of research. Whereas various interpretation methods have been suggested for image classification, the interpretation of image segmentation remains largely unexplored. To that end, we propose SEG-GRAD-CAM, a gradient-based method for interpreting semantic segmentation. Our method is an extension of the widely used Grad-CAM method, applied locally to produce heatmaps showing the relevance of individual pixels for semantic segmentation. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11434v1 |
PDF | https://arxiv.org/pdf/2002.11434v1.pdf |
PWC | https://paperswithcode.com/paper/towards-interpretable-semantic-segmentation |
Repo | |
Framework | |
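A hedged sketch of the Grad-CAM-for-segmentation recipe follows: the scalar being explained is the class score summed over a chosen pixel region, and gradients are pooled into channel weights as in standard Grad-CAM. The hook placement and model interface are placeholders, not the authors' code.

```python
import torch

def seg_grad_cam(model, image, target_class, region_mask, feature_layer):
    """model: returns per-pixel logits (1, C, H, W) for image (1, 3, H, W).
    region_mask: (H, W) bool tensor selecting the pixels to explain.
    feature_layer: conv module whose activations the heatmap is built from."""
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    logits = model(image)
    # Scalar to explain: class score summed over the chosen region.
    score = logits[0, target_class][region_mask].sum()
    score.backward()
    h1.remove(); h2.remove()
    fmap, grad = feats[0][0], grads[0][0]      # (K, h, w) each
    weights = grad.mean(dim=(1, 2))            # global-average-pooled gradients
    cam = torch.relu((weights[:, None, None] * fmap).sum(0))
    return cam / (cam.max() + 1e-8)            # normalized heatmap
```

Here `feature_layer` would typically be a late convolutional block of the encoder; choosing `region_mask` as a single pixel or an object region gives local or object-level explanations respectively.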
Efficient Semantic Video Segmentation with Per-frame Inference
Title | Efficient Semantic Video Segmentation with Per-frame Inference |
Authors | Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang |
Abstract | In semantic segmentation, most existing real-time deep models, trained on each frame independently, may produce inconsistent results across a video sequence. Advanced methods take the correlations in the video sequence into consideration, e.g., by propagating results to neighboring frames using optical flow or extracting frame representations with the help of other frames, which may lead to inaccurate results or unbalanced latency. In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference. Unlike previous per-frame models, we explicitly consider the temporal consistency among frames as an extra constraint during training and embed the temporal consistency into the segmentation network. Therefore, at inference time we can process each frame independently with no added latency, and improve the temporal consistency with no extra computational cost or post-processing. We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, we design new knowledge distillation methods. Our results outperform previous keyframe-based methods with a better trade-off between accuracy and inference speed on popular benchmarks, including Cityscapes and CamVid. The temporal consistency is also improved compared with corresponding baselines trained on each frame independently. Code is available at: https://tinyurl.com/segment-video |
Tasks | Optical Flow Estimation, Semantic Segmentation, Video Semantic Segmentation |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11433v1 |
PDF | https://arxiv.org/pdf/2002.11433v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-semantic-video-segmentation-with |
Repo | |
Framework | |
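One plausible way to impose temporal consistency only at training time is a warped-consistency term between consecutive predictions, sketched below. The flow-based warp and MSE penalty are illustrative assumptions; the paper's actual consistency constraints and distillation losses differ in detail, and inference remains strictly per-frame.

```python
import torch
import torch.nn.functional as F

def warp(pred, flow):
    """Backward-warp pred (B,C,H,W) by flow (B,2,H,W) given in pixels."""
    b, _, h, w = pred.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(pred.device)   # (2,H,W), x then y
    coords = base.unsqueeze(0) + flow                      # (B,2,H,W)
    cx = 2 * coords[:, 0] / (w - 1) - 1                    # normalize to [-1,1]
    cy = 2 * coords[:, 1] / (h - 1) - 1
    return F.grid_sample(pred, torch.stack((cx, cy), dim=-1),
                         align_corners=True)

def temporal_consistency_loss(pred_t, pred_prev, flow_prev_to_t):
    # Training-time only: penalize disagreement with the warped previous frame.
    return F.mse_loss(pred_t, warp(pred_prev, flow_prev_to_t))
```

The flow here could come from any off-the-shelf optical flow estimator; since the term is dropped at inference, the deployed model keeps its per-frame latency.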
KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge
Title | KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge |
Authors | Peng Zhang, Jianye Hao, Weixun Wang, Hongyao Tang, Yi Ma, Yihai Duan, Yan Zheng |
Abstract | Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally use common sense and prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and the intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge. |
Tasks | Common Sense Reasoning, Continuous Control |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07418v1 |
PDF | https://arxiv.org/pdf/2002.07418v1.pdf |
PWC | https://paperswithcode.com/paper/kogun-accelerating-deep-reinforcement |
Repo | |
Framework | |
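A toy PyTorch sketch of the knowledge-guided pattern: a rule-based prior proposes action preferences, a small refine network adjusts them, and the result biases the policy logits. KoGuN's fuzzy-rule controller is richer than the hard rule invented for this example, and the network sizes are arbitrary.

```python
import torch
import torch.nn as nn

class KnowledgeGuidedPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, prior_fn):
        super().__init__()
        self.prior_fn = prior_fn                      # human rules -> preferences
        self.refine = nn.Sequential(                  # fine-tunes the prior
            nn.Linear(obs_dim + n_actions, 64), nn.Tanh(),
            nn.Linear(64, n_actions))
        self.policy = nn.Sequential(                  # standard policy net
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, n_actions))

    def forward(self, obs):
        prior = self.prior_fn(obs)                    # (B, n_actions)
        refined = prior + self.refine(torch.cat([obs, prior], dim=-1))
        return torch.log_softmax(self.policy(obs) + refined, dim=-1)

# Example rule: "prefer action 0 when the first observation feature is negative".
prior = lambda o: torch.stack([(o[:, 0] < 0).float(),
                               (o[:, 0] >= 0).float()], -1)
pi = KnowledgeGuidedPolicy(obs_dim=4, n_actions=2, prior_fn=prior)
print(pi(torch.randn(3, 4)).exp())   # action probabilities
```

Because the output is an ordinary log-probability over actions, the module drops into any policy-gradient algorithm end to end, matching the paper's claim of compatibility with policy-based RL.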
Taurus: An Intelligent Data Plane
Title | Taurus: An Intelligent Data Plane |
Authors | Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun |
Abstract | Emerging applications – cloud computing, the internet of things, and augmented/virtual reality – need responsive, available, secure, ubiquitous, and scalable datacenter networks. Network management currently uses simple, per-packet, data-plane heuristics (e.g., ECMP and sketches) under an intelligent, millisecond-latency control plane that runs data-driven performance and security policies. However, to meet users’ quality-of-service expectations in a modern data center, networks must operate intelligently at line rate. In this paper, we present Taurus, an intelligent data plane capable of machine-learning inference at line rate. Taurus adds custom hardware based on a map-reduce abstraction to programmable network devices, such as switches and NICs; this new hardware uses pipelined and SIMD parallelism for fast inference. Our evaluation of a Taurus-enabled switch ASIC – supporting several real-world benchmarks – shows that Taurus operates three orders of magnitude faster than a server-based control plane, while increasing area by 24% and latency, on average, by 178 ns. On the long road to self-driving networks, Taurus is the equivalent of adaptive cruise control: deterministic rules steer flows, while machine learning tunes performance and heightens security. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.08987v1 |
PDF | https://arxiv.org/pdf/2002.08987v1.pdf |
PWC | https://paperswithcode.com/paper/taurus-an-intelligent-data-plane |
Repo | |
Framework | |
Exploring Unknown Universes in Probabilistic Relational Models
Title | Exploring Unknown Universes in Probabilistic Relational Models |
Authors | Tanya Braun, Ralf Möller |
Abstract | Large probabilistic models are often shaped by a pool of known individuals (a universe) and relations between them. Lifted inference algorithms handle sets of known individuals to make inference tractable. Universes may not always be known, though, or may only be described by assumptions such as “small universes are more likely”. Without a universe, inference is no longer possible for lifted algorithms, which lose their advantage of tractability. The aim of this paper is to define a semantics for models with unknown universes, decoupled from any specific constraint language, to enable lifted, and thereby tractable, inference. |
Tasks | |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.02021v1 |
PDF | https://arxiv.org/pdf/2001.02021v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-unknown-universes-in-probabilistic |
Repo | |
Framework | |
Knowledge Integration Networks for Action Recognition
Title | Knowledge Integration Networks for Action Recognition |
Authors | Shiwen Zhang, Sheng Guo, Limin Wang, Weilin Huang, Matthew R. Scott |
Abstract | In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features that are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of humans and scenes for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of humans and scenes for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism that contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that KINet transfers well: moving the Kinetics-trained model to UCF-101 obtains a 97.8% top-1 accuracy. |
Tasks | Human Parsing, Scene Recognition, Temporal Action Localization |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07471v1 |
PDF | https://arxiv.org/pdf/2002.07471v1.pdf |
PWC | https://paperswithcode.com/paper/knowledge-integration-networks-for-action |
Repo | |
Framework | |
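For intuition, a speculative sketch of a CBI-style fusion: auxiliary (human/scene) features gate and residually modulate the main action features at a middle layer. The 1x1 projection and sigmoid gating are assumptions; the real CBI and AKG modules are not reproduced here.

```python
import torch
import torch.nn as nn

class CrossBranchIntegration(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)   # align auxiliary features

    def forward(self, main_feat, aux_feat):
        # Residual injection of auxiliary knowledge into the main branch.
        return main_feat + torch.sigmoid(self.proj(aux_feat)) * main_feat

cbi = CrossBranchIntegration(256)
main = torch.randn(2, 256, 14, 14)   # action-branch features
aux = torch.randn(2, 256, 14, 14)    # human-parsing or scene features
print(cbi(main, aux).shape)          # torch.Size([2, 256, 14, 14])
```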
Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection
Title | Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection |
Authors | Zuyao Chen, Qingming Huang |
Abstract | There are two main issues in RGB-D salient object detection: (1) how to effectively integrate the complementarity of the cross-modal RGB-D data; (2) how to prevent contamination from an unreliable depth map. These two problems are linked and intertwined, but previous methods tend to focus only on the first and ignore depth map quality, which may cause the model to fall into a sub-optimal state. In this paper, we address these two issues synergistically in a holistic model, and propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity. By introducing depth potentiality perception, the network can assess the potentiality of the depth information in a learning-based manner and guide the fusion of the two modalities to prevent contamination. The gated multi-modality attention module in the fusion process exploits an attention mechanism with a gate controller to capture long-range dependencies from a cross-modal perspective. Experimental results compared with 15 state-of-the-art methods on 8 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively. |
Tasks | Object Detection, Salient Object Detection |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08608v2 |
PDF | https://arxiv.org/pdf/2003.08608v2.pdf |
PWC | https://paperswithcode.com/paper/depth-potentiality-aware-gated-attention |
Repo | |
Framework | |
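A minimal sketch of gated cross-modal fusion in the spirit of DPANet: a learned gate estimates how trustworthy the depth features are and scales their contribution before fusion. The global-pooled scalar gate is an assumption; DPANet's gated multi-modality attention additionally models long-range cross-modal dependencies.

```python
import torch
import torch.nn as nn

class GatedDepthFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(            # depth-reliability gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(depth_feat)             # (B,1,1,1), reliability in [0,1]
        return rgb_feat + g * depth_feat      # suppress unreliable depth

fuse = GatedDepthFusion(64)
out = fuse(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28))
print(out.shape)                              # torch.Size([2, 64, 28, 28])
```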
HyNNA: Improved Performance for Neuromorphic Vision Sensor based Surveillance using Hybrid Neural Network Architecture
Title | HyNNA: Improved Performance for Neuromorphic Vision Sensor based Surveillance using Hybrid Neural Network Architecture |
Authors | Deepak Singla, Soham Chatterjee, Lavanya Ramapantulu, Andres Ussa, Bharath Ramesh, Arindam Basu |
Abstract | Applications in the Internet of Video Things (IoVT) domain have very tight constraints with respect to power and area. While neuromorphic vision sensors (NVS) may offer advantages over traditional imagers in this domain, existing NVS systems either do not meet the power constraints or have not demonstrated end-to-end system performance. To address this, we improve on a recently proposed hybrid event-frame approach by using morphological image processing algorithms for region proposal, and address the low-power requirement for object detection and classification by exploring various convolutional neural network (CNN) architectures. Specifically, we compare the results obtained from our object detection framework against a state-of-the-art low-power NVS surveillance system and show an improved accuracy of 82.16%, up from 63.1%. Moreover, we show that using multiple bits does not improve accuracy; system designers can therefore save power and area by using only single-bit event polarity information. In addition, we explore the CNN architecture space for object classification and provide useful insights for trading off accuracy for lower power, using less memory and fewer arithmetic operations. |
Tasks | Object Classification, Object Detection |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08603v1 |
PDF | https://arxiv.org/pdf/2003.08603v1.pdf |
PWC | https://paperswithcode.com/paper/hynna-improved-performance-for-neuromorphic |
Repo | |
Framework | |
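The morphological region-proposal step can be sketched with OpenCV as below, assuming events have been accumulated into a frame of per-pixel counts; the kernel size, threshold, and minimum area are illustrative choices, not the paper's tuned values.

```python
import cv2
import numpy as np

def propose_regions(event_frame, min_area=20):
    """event_frame: HxW uint8 count of events per pixel.
    Returns bounding boxes (x, y, w, h) of candidate regions."""
    binary = (event_frame > 0).astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)
    # Close gaps inside objects, then drop isolated noise events.
    blob = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    blob = cv2.morphologyEx(blob, cv2.MORPH_OPEN, kernel)
    n, _, stats, _ = cv2.connectedComponentsWithStats(blob)
    # stats rows: [x, y, w, h, area]; row 0 is the background.
    return [tuple(int(v) for v in s[:4])
            for s in stats[1:] if s[4] >= min_area]

frame = np.zeros((120, 160), np.uint8)
frame[40:60, 50:80] = 3            # a burst of events
print(propose_regions(frame))      # e.g. [(50, 40, 30, 20)]
```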
Learning Perception and Planning with Deep Active Inference
Title | Learning Perception and Planning with Deep Active Inference |
Authors | Ozan Çatal, Tim Verbelen, Johannes Nauta, Cedric De Boom, Bart Dhoedt |
Abstract | Active inference is a process theory of the brain that states that all living organisms infer actions in order to minimize their (expected) free energy. However, current experiments are limited to predefined, often discrete, state spaces. In this paper we use recent advances in deep learning to learn the state space and approximate the necessary probability distributions to engage in active inference. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11841v2 |
PDF | https://arxiv.org/pdf/2001.11841v2.pdf |
PWC | https://paperswithcode.com/paper/learning-perception-and-planning-with-deep |
Repo | |
Framework | |
Scaling up Kernel Ridge Regression via Locality Sensitive Hashing
Title | Scaling up Kernel Ridge Regression via Locality Sensitive Hashing |
Authors | Michael Kapralov, Navid Nouri, Ilya Razenshteyn, Ameya Velingker, Amir Zandieh |
Abstract | Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel, but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes, such as the Gaussian kernel and the Matérn kernel. In this paper, we introduce a simple weighted version of random binning features and show that the corresponding kernel function generates Gaussian processes of any desired smoothness. We show that our weighted random binning features provide a spectral approximation to the corresponding kernel matrix, leading to efficient algorithms for kernel ridge regression. Experiments on large-scale regression datasets show that our method outperforms the random Fourier features method in accuracy. |
Tasks | Gaussian Processes |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09756v1 |
PDF | https://arxiv.org/pdf/2003.09756v1.pdf |
PWC | https://paperswithcode.com/paper/scaling-up-kernel-ridge-regression-via |
Repo | |
Framework | |
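For reference, here is a sketch of the classical, unweighted random binning construction of Rahimi and Recht for the Laplacian kernel k(x, y) = exp(-gamma * ||x - y||_1), which the paper extends with weights to reach smoother kernels; the weighted variant itself is not reproduced.

```python
import numpy as np

def random_binning(X, n_reps=200, gamma=1.0, seed=0):
    """Return bin ids of shape (n_samples, n_reps); two points collide on a
    repetition iff they fall in the same randomly shifted grid cell."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ids = np.empty((n, n_reps), dtype=object)
    for p in range(n_reps):
        # Pitch delta ~ Gamma(2, 1/gamma): the mixture that yields the
        # Laplacian kernel in expectation.
        delta = rng.gamma(shape=2.0, scale=1.0 / gamma, size=d)
        shift = rng.uniform(0, delta)                     # random grid offset
        bins = np.floor((X - shift) / delta).astype(int)  # (n, d)
        ids[:, p] = [tuple(row) for row in bins]
    return ids

def kernel_estimate(ids_x, ids_y):
    return np.mean([a == b for a, b in zip(ids_x, ids_y)])

X = np.array([[0.0], [0.5]])
ids = random_binning(X, gamma=1.0)
print(kernel_estimate(ids[0], ids[1]), np.exp(-0.5))  # both near 0.61
```

In expectation the collision probability of two points at distance t is exactly exp(-gamma * t), which the final line checks empirically; kernel ridge regression then operates on the (sparse) one-hot encoding of these bin ids.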
Deep Entity Matching with Pre-Trained Language Models
Title | Deep Entity Matching with Pre-Trained Language Models |
Authors | Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan |
Abstract | We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models. We fine-tune and cast EM as a sequence-pair classification problem to leverage such models with a simple architecture. Our experiments show that a straightforward application of language models such as BERT, DistilBERT, or ALBERT, pre-trained on large text corpora, already significantly improves the matching quality and outperforms the previous state of the art (SOTA) by up to 19% in F1 score on benchmark datasets. We also developed three optimization techniques to further improve Ditto’s matching capability. Ditto allows domain knowledge to be injected by highlighting important pieces of input information that may be of interest when making matching decisions. Ditto also summarizes strings that are too long, so that only the essential information is retained and used for EM. Finally, Ditto adapts a SOTA data augmentation technique for text to EM, augmenting the training data with (difficult) examples. This way, Ditto is forced to learn “harder” to improve the model’s matching capability. The optimizations we developed further boost Ditto’s performance by up to 8.5%. Perhaps more surprisingly, we establish that Ditto can achieve the previous SOTA results with at most half as much labeled data. Finally, we demonstrate Ditto’s effectiveness on a real-world large-scale EM task. In matching two company datasets consisting of 789K and 412K records, Ditto achieves a high F1 score of 96.5%. |
Tasks | Data Augmentation |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00584v1 |
PDF | https://arxiv.org/pdf/2004.00584v1.pdf |
PWC | https://paperswithcode.com/paper/deep-entity-matching-with-pre-trained |
Repo | |
Framework | |
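A condensed sketch of the Ditto recipe with Hugging Face Transformers: serialize each entity into a tagged string and classify the pair with a fine-tuned sequence-pair model. The [COL]/[VAL] serialization follows the paper's style, but the exact tags, the model choice (DistilBERT here), the example records, and the training loop are simplified assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def serialize(entity):
    """entity: dict attribute -> value, flattened into one tagged sequence."""
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in entity.items())

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)   # match / non-match

a = {"name": "Apple Inc.", "city": "Cupertino"}
b = {"name": "Apple Incorporated", "city": "Cupertino"}
enc = tok(serialize(a), serialize(b), truncation=True, return_tensors="pt")
logits = model(**enc).logits                   # fine-tune on labeled pairs
print(torch.softmax(logits, dim=-1))           # P(non-match), P(match)
```

Fine-tuning proceeds as ordinary sequence-pair classification over labeled match/non-match pairs; Ditto's extra optimizations (knowledge injection, summarization, data augmentation) operate on the serialized strings before tokenization.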