Paper Group ANR 83
DropFilter: A Novel Regularization Method for Learning Convolutional Neural Networks
Title | DropFilter: A Novel Regularization Method for Learning Convolutional Neural Networks |
Authors | Hengyue Pan, Hui Jiang, Xin Niu, Yong Dou |
Abstract | The past few years have witnessed the fast development of regularization methods for deep learning models such as fully-connected deep neural networks (DNNs) and Convolutional Neural Networks (CNNs). Most previous methods drop features from the input data or hidden layers, as in Dropout, Cutout and DropBlock, while DropConnect drops connections between fully-connected layers. By randomly discarding features or connections, these methods mitigate overfitting and improve the performance of neural networks. In this paper, we propose two novel regularization methods for learning CNNs, DropFilter and DropFilter-PLUS, which differ from previous methods in that they operate on the convolution filters themselves. For DropFilter-PLUS, we derive a suitable way to accelerate the learning process based on theoretical analysis. Experimental results on MNIST show that DropFilter and DropFilter-PLUS may improve performance on image classification tasks. |
Tasks | Image Classification |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06783v2 |
http://arxiv.org/pdf/1811.06783v2.pdf | |
PWC | https://paperswithcode.com/paper/dropfilter-a-novel-regularization-method-for |
Repo | |
Framework | |
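The paper's exact update rules are not reproduced here, but the core idea of dropping whole convolution filters (rather than individual activations, as in Dropout) can be sketched in a few lines. The PyTorch module below is a minimal sketch under the assumption that DropFilter applies an inverted-dropout-style Bernoulli mask per output channel; the accelerated DropFilter-PLUS variant is not shown.

```python
import torch
import torch.nn as nn

class DropFilter(nn.Module):
    """Randomly zeroes whole convolution filters (output channels) in training.

    A sketch of the filter-dropping idea; the paper's exact DropFilter and
    DropFilter-PLUS formulations may differ (e.g., in how they rescale).
    """
    def __init__(self, drop_prob: float = 0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.drop_prob == 0.0:
            return x
        n, c, _, _ = x.shape
        # One Bernoulli keep/drop decision per output channel (filter).
        keep = (torch.rand(n, c, 1, 1, device=x.device) >= self.drop_prob).float()
        # Inverted-dropout rescaling keeps the expected activation unchanged.
        return x * keep / (1.0 - self.drop_prob)
```

It would slot in directly after a convolution, e.g. `nn.Sequential(nn.Conv2d(3, 64, 3), DropFilter(0.1), nn.ReLU())`.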
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Title | End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models |
Authors | Fei Tao, Carlos Busso |
Abstract | Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea, proposing a \emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over a VAD baseline using only audio, implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high-definition camera and a close-talking microphone). |
Tasks | Action Detection, Activity Detection, Speech Recognition |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04553v1 |
http://arxiv.org/pdf/1809.04553v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-audiovisual-speech-activity |
Repo | |
Framework | |
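As a rough illustration of the bimodal recurrent framework, the sketch below fuses per-frame acoustic and visual feature streams with separate LSTMs, a fusion LSTM, and a per-frame classifier. All layer sizes and the late-fusion choice are assumptions, not the paper's exact architecture (which also learns the features end-to-end from raw audio and video).

```python
import torch
import torch.nn as nn

class BimodalRNN(nn.Module):
    """Sketch of a bimodal recurrent model for frame-level speech activity
    detection: one LSTM per modality, then a fusion LSTM over the concatenated
    hidden states. Dimensions are illustrative assumptions."""
    def __init__(self, audio_dim=40, visual_dim=128, hidden=64):
        super().__init__()
        self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.visual_rnn = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.fusion_rnn = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)  # per-frame speech/non-speech logit

    def forward(self, audio, visual):
        a, _ = self.audio_rnn(audio)     # (B, T, hidden)
        v, _ = self.visual_rnn(visual)   # (B, T, hidden)
        fused, _ = self.fusion_rnn(torch.cat([a, v], dim=-1))
        return self.classifier(fused).squeeze(-1)  # (B, T) logits
```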
Minimax Lower Bounds for $\mathcal{H}_\infty$-Norm Estimation
Title | Minimax Lower Bounds for $\mathcal{H}_\infty$-Norm Estimation |
Authors | Stephen Tu, Ross Boczar, Benjamin Recht |
Abstract | The problem of estimating the $\mathcal{H}_\infty$-norm of an LTI system from noisy input/output measurements has attracted recent attention as an alternative to parameter identification for bounding unmodeled dynamics in robust control. In this paper, we study lower bounds for $\mathcal{H}_\infty$-norm estimation under a query model where at each iteration the algorithm chooses a bounded input signal and receives the response of the chosen signal corrupted by white noise. We prove that when the underlying system is an FIR filter, $\mathcal{H}_\infty$-norm estimation is no more efficient than model identification for passive sampling. For active sampling, we show that norm estimation is at most a factor of $\log{r}$ more sample efficient than model identification, where $r$ is the length of the filter. We complement our theoretical results with experiments which demonstrate that a simple non-adaptive estimator of the norm is competitive with state-of-the-art adaptive norm estimation algorithms. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10855v1 |
http://arxiv.org/pdf/1809.10855v1.pdf | |
PWC | https://paperswithcode.com/paper/minimax-lower-bounds-for-mathcalh_infty-norm |
Repo | |
Framework | |
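The "simple non-adaptive estimator" the experiments refer to can be illustrated as identify-then-plug-in: fit the length-$r$ FIR filter by least squares from one input/output record, then take the peak gain of its frequency response. The sketch below makes that concrete under the assumption of a white-noise-corrupted output; it is not necessarily the paper's exact estimator.

```python
import numpy as np

def estimate_hinf_nonadaptive(u, y, r, n_freq=4096):
    """Non-adaptive H-infinity norm estimate for an FIR system of length r.

    Sketch: fit FIR coefficients by least squares from the input/output
    record (u, y), then return the peak magnitude of the frequency response.
    """
    T = len(u)
    # Regression matrix of lagged inputs (column k is u delayed by k samples).
    U = np.column_stack(
        [np.concatenate([np.zeros(k), u[:T - k]]) for k in range(r)])
    g, *_ = np.linalg.lstsq(U, y, rcond=None)  # estimated impulse response
    H = np.fft.rfft(g, n=n_freq)               # frequency response on a grid
    return np.max(np.abs(H))                   # H-infinity norm estimate
```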
Encoding Implicit Relation Requirements for Relation Extraction: A Joint Inference Approach
Title | Encoding Implicit Relation Requirements for Relation Extraction: A Joint Inference Approach |
Authors | Liwei Chen, Yansong Feng, Songfang Huang, Bingfeng Luo, Dongyan Zhao |
Abstract | Relation extraction is the task of identifying predefined relationships between entities, and plays an essential role in information extraction, knowledge base construction, question answering and so on. Most existing relation extractors make predictions for each entity pair locally and individually, while ignoring implicit global clues available across different entity pairs and in the knowledge base, which often leads to conflicts among local predictions from different entity pairs. This paper proposes a joint inference framework that employs such global clues to resolve disagreements among local predictions. We exploit two kinds of clues to generate constraints which capture the implicit type and cardinality requirements of a relation. These constraints can be enforced in either a hard style or a soft style, both of which can be effectively explored in an integer linear program formulation. Experimental results on both English and Chinese datasets show that our proposed framework can effectively utilize these two categories of global clues to resolve disagreements among local predictions, thus improving various relation extractors when such clues are applicable to the datasets. Our experiments also indicate that clues learnt automatically from existing knowledge bases perform comparably to or better than those refined by humans. |
Tasks | Question Answering, Relation Extraction |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03796v1 |
http://arxiv.org/pdf/1811.03796v1.pdf | |
PWC | https://paperswithcode.com/paper/encoding-implicit-relation-requirements-for |
Repo | |
Framework | |
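To make the hard-constraint flavor of the ILP concrete, here is a toy sketch with PuLP: binary variables select at most one relation per entity pair, and a cardinality clue (a country has at most one capital) vetoes conflicting local predictions. The entities, relations, and scores are hypothetical, and the paper's full formulation also supports soft constraints.

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# Hypothetical local extractor scores: (subject, object) -> {relation: score}.
scores = {
    ("Paris", "France"): {"capital_of": 0.9, "located_in": 0.6},
    ("Lyon",  "France"): {"capital_of": 0.7, "located_in": 0.8},
}

prob = LpProblem("joint_relation_inference", LpMaximize)
x = {(pair, rel): LpVariable(f"x_{pair[0]}_{rel}", cat=LpBinary)
     for pair, rels in scores.items() for rel in rels}

# Objective: total confidence of the accepted local predictions.
prob += lpSum(scores[pair][rel] * var for (pair, rel), var in x.items())

# Implicit requirement 1: each entity pair takes at most one relation.
for pair, rels in scores.items():
    prob += lpSum(x[pair, rel] for rel in rels) <= 1

# Implicit requirement 2 (hard cardinality clue): at most one capital of France.
prob += lpSum(var for (pair, rel), var in x.items()
              if rel == "capital_of" and pair[1] == "France") <= 1

prob.solve()
print([(pair, rel) for (pair, rel), var in x.items() if var.value() == 1])
# -> Paris capital_of France, Lyon located_in France (conflict resolved)
```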
MONET: Multiview Semi-supervised Keypoint Detection via Epipolar Divergence
Title | MONET: Multiview Semi-supervised Keypoint Detection via Epipolar Divergence |
Authors | Yuan Yao, Yasamin Jafarian, Hyun Soo Park |
Abstract | This paper presents MONET, an end-to-end semi-supervised learning framework for a keypoint detector using multiview image streams. In particular, we consider general subjects such as non-human species, for which attaining a large-scale annotated dataset is challenging. While multiview geometry can be used to self-supervise the unlabeled data, integrating the geometry into learning a keypoint detector is challenging due to representation mismatch. We address this mismatch by formulating a new differentiable representation of the epipolar constraint called epipolar divergence, a generalized distance from the epipolar lines to the corresponding keypoint distribution. Epipolar divergence characterizes when keypoint distributions across two views produce zero reprojection error. We design a twin network that minimizes the epipolar divergence through stereo rectification, which significantly alleviates computational complexity and sampling aliasing in training. We demonstrate that our framework can localize customized keypoints of diverse species, e.g., humans, dogs, and monkeys. |
Tasks | Data Augmentation, Keypoint Detection |
Published | 2018-05-31 |
URL | https://arxiv.org/abs/1806.00104v2 |
https://arxiv.org/pdf/1806.00104v2.pdf | |
PWC | https://paperswithcode.com/paper/monet-multiview-semi-supervised-keypoint-via |
Repo | |
Framework | |
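A minimal sketch of the epipolar-divergence idea, assuming it can be read as the expected point-to-line distance of a view-2 keypoint distribution from the epipolar line induced by a view-1 keypoint. The rectified, efficiently sampled form the paper actually trains with is omitted here.

```python
import numpy as np

def epipolar_divergence(prob_map, F, x1):
    """Expected distance from x1's epipolar line to a keypoint distribution.

    Zero iff the distribution's mass lies on the epipolar line, matching the
    'zero reprojection error' characterization. prob_map is an (H, W)
    distribution over view-2 pixels, F the fundamental matrix, x1 a
    homogeneous point in view 1.
    """
    H, W = prob_map.shape
    a, b, c = F @ x1                      # epipolar line l = F x1 in view 2
    ys, xs = np.mgrid[0:H, 0:W]
    # Point-to-line distance for every pixel of view 2.
    dist = np.abs(a * xs + b * ys + c) / np.sqrt(a ** 2 + b ** 2)
    return float((prob_map * dist).sum())
```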
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Title | Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling |
Authors | Minjie Wang, Chien-chin Huang, Jinyang Li |
Abstract | Deep learning systems have become vital tools across many fields, but increasing model sizes mean that training must be accelerated to maintain such systems' utility. Current systems like TensorFlow and MXNet focus on one specific parallelization strategy, data parallelism, which requires large training batch sizes in order to scale. We cast the problem of finding the best parallelization strategy as the problem of finding the best tiling to partition tensors with the least overall communication. We propose an algorithm that can find the optimal tiling. Our resulting parallelization solution is a hybrid of data parallelism and model parallelism. We build SoyBean, a system that performs this automatic parallelization and can act as a backend for TensorFlow, MXNet, and other deep learning frontends. SoyBean automatically transforms a serial dataflow graph captured by an existing frontend into a parallel dataflow graph based on the optimal tiling it has found. Our evaluations show that SoyBean is 1.5x-4x faster than pure data parallelism for AlexNet and VGG. |
Tasks | |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04170v1 |
http://arxiv.org/pdf/1805.04170v1.pdf | |
PWC | https://paperswithcode.com/paper/unifying-data-model-and-hybrid-parallelism-in |
Repo | |
Framework | |
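The cost model driving such a tiling search can be illustrated on a single matmul layer: each tiling of the operands implies a different communication volume. The toy comparison below contrasts two hand-picked tilings (row-partitioned data parallelism versus column-partitioned model parallelism) with back-of-the-envelope constants; SoyBean itself optimizes over tilings of the whole dataflow graph.

```python
def comm_cost_matmul(batch, d_in, d_out, workers, bytes_per=4):
    """Toy communication-cost comparison for Y = X @ W across `workers` devices.

    Illustrative assumptions only: data parallelism pays ~2x the parameter
    size for a ring all-reduce of dW; model parallelism pays for replicating
    the activations X to every worker.
    """
    data_par = 2 * d_in * d_out * bytes_per                 # all-reduce of dW
    model_par = (workers - 1) * batch * d_in * bytes_per    # broadcast of X
    return {"data_parallel": data_par, "model_parallel": model_par}

# Large batches favor model parallelism for this layer; small batches favor
# data parallelism. A hybrid tiling per tensor can beat both.
print(comm_cost_matmul(batch=64, d_in=4096, d_out=4096, workers=8))
```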
Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution
Title | Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution |
Authors | Jonas Rothfuss, Fabio Ferreira, Eren Erdal Aksoy, You Zhou, Tamim Asfour |
Abstract | We present a novel deep neural network architecture for representing robot experiences in an episodic-like memory which facilitates encoding, recalling, and predicting action experiences. Our proposed unsupervised deep episodic memory model 1) encodes observed actions in a latent vector space and, based on this latent encoding, 2) infers the most similar episodes previously experienced, 3) reconstructs original episodes, and 4) predicts future frames in an end-to-end fashion. Results show that conceptually similar actions are mapped into the same region of the latent vector space. Based on these results, we introduce an action matching and retrieval mechanism, benchmark its performance on two large-scale action datasets, 20BN-something-something and ActivityNet, and evaluate its generalization capability in a real-world scenario on a humanoid robot. |
Tasks | |
Published | 2018-01-12 |
URL | http://arxiv.org/abs/1801.04134v3 |
http://arxiv.org/pdf/1801.04134v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-episodic-memory-encoding-recalling-and |
Repo | |
Framework | |
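The "infers the most similar episodes" step reduces to nearest-neighbor search in the latent space. A minimal sketch, assuming cosine similarity over the encoder outputs:

```python
import numpy as np

def recall_episodes(memory, query, k=3):
    """Retrieve the k most similar stored episodes by cosine similarity.

    Sketch of the matching-and-retrieval step only; in the paper the latent
    vectors come from an unsupervised encoder-decoder over video episodes.
    memory: (N, D) array of episode encodings; query: (D,) encoding.
    """
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = m @ q                      # cosine similarity to every episode
    top = np.argsort(-sims)[:k]      # indices of the k best matches
    return top, sims[top]
```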
Assessing the Contribution of Semantic Congruency to Multisensory Integration and Conflict Resolution
Title | Assessing the Contribution of Semantic Congruency to Multisensory Integration and Conflict Resolution |
Authors | Di Fu, Pablo Barros, German I. Parisi, Haiyan Wu, Sven Magg, Xun Liu, Stefan Wermter |
Abstract | The efficient integration of multisensory observations is a key property of the brain that enables robust interaction with the environment. However, artificial multisensory perception remains an open issue, especially in situations of sensory uncertainty and conflict. In this work, we extend previous studies on audio-visual (AV) conflict resolution in complex environments. In particular, we focus on quantitatively assessing the contribution of semantic congruency during an AV spatial localization task. In addition to conflicts in the spatial domain (i.e., spatially misaligned stimuli), we consider gender-specific conflicts with male and female avatars. Our results suggest that while semantically related stimuli affect the magnitude of the visual bias (perceptually shifting the location of the sound towards a semantically congruent visual cue), humans still strongly rely on environmental statistics to solve AV conflicts. Together with previously reported results, this work contributes to a better understanding of how multisensory integration and conflict resolution can be modelled in artificial agents and robots operating in real-world environments. |
Tasks | |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06748v1 |
http://arxiv.org/pdf/1810.06748v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-the-contribution-of-semantic |
Repo | |
Framework | |
Opacity, Obscurity, and the Geometry of Question-Asking
Title | Opacity, Obscurity, and the Geometry of Question-Asking |
Authors | Christina Boyce-Jacino, Simon DeDeo |
Abstract | Asking questions is a pervasive human activity, but little is understood about what makes them difficult to answer. An analysis of a pair of large databases, of New York Times crosswords and questions from the quiz show Jeopardy, establishes two orthogonal dimensions of question difficulty: obscurity (the rarity of the answer) and opacity (the indirectness of question cues, operationalized with word2vec). The importance of opacity, and the role of synergistic information in resolving it, suggest that accounts of difficulty in terms of prior expectations capture only a part of the question-asking process. A further regression analysis shows the presence of additional dimensions to question-asking: question complexity, the answer's local network density, cue intersection, and the presence of signal words. Our work shows how question-askers can help their interlocutors by using contextual cues, or, conversely, how a particular kind of unfamiliarity with the domain in question can make it harder for individuals to learn from others. Taken together, these results suggest how Bayesian models of question difficulty can be supplemented by process models and accounts of the heuristics individuals use to navigate conceptual spaces. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08291v1 |
http://arxiv.org/pdf/1809.08291v1.pdf | |
PWC | https://paperswithcode.com/paper/opacity-obscurity-and-the-geometry-of |
Repo | |
Framework | |
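One plausible reading of the word2vec operationalization of opacity is the mean cosine distance between a clue's words and its answer. The sketch below uses gensim with a standard pretrained embedding file (the path is a placeholder); the paper's exact measure may differ.

```python
from gensim.models import KeyedVectors

# Placeholder path for any pretrained word2vec file in the standard format.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def opacity(clue_words, answer):
    """Higher when the clue's words point only indirectly at the answer.

    Assumed operationalization: one minus the mean cosine similarity between
    in-vocabulary clue words and the answer vector.
    """
    sims = [kv.similarity(w, answer)
            for w in clue_words if w in kv and answer in kv]
    return 1.0 - sum(sims) / len(sims) if sims else None

print(opacity(["longest", "river", "in", "Egypt"], "Nile"))  # direct cues -> low
```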
Effects of Higher Order and Long-Range Synchronizations for Classification and Computing in Oscillator-Based Spiking Neural Networks
Title | Effects of Higher Order and Long-Range Synchronizations for Classification and Computing in Oscillator-Based Spiking Neural Networks |
Authors | Andrey Velichko, Vadim Putrolaynen, Maksim Belyaev |
Abstract | The development of artificial oscillator-based spiking neural networks (SNNs), which can effectively solve various cybernetics problems including image recognition and adaptive control, is a key line of research. We have thoroughly explored a scheme of two thermally coupled $VO_2$ oscillators and found an effect of high-order synchronization (HOS), which may be used to increase the SNN classification capacity $N_s$. A phase-locking estimation method has been developed to determine values of the subharmonic ratio (SHR) and the synchronization effectiveness $\eta$. The experimental scheme has $N_s=12$, and the SHR distributions are shaped as Arnold tongues. In a model, $N_s$ may exceed 150 at certain levels of coupling strength and noise. We demonstrate a long-range synchronization effect in a one-dimensional chain of oscillators and the phenomenon of synchronization transfer even at low values of $\eta$ for intermediate links. The paper demonstrates the realization of an analogue "multiplication" operation and binary logic, and the possibility of developing an interface between the SNN and a computer. The described effects, which increase the classification capacity of oscillator schemes, and the calculation principles based on the universal physical effect of HOS apply to spiking oscillators of any type with any coupling type, enhancing the practical value of the presented results for expanding SNN capabilities. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03395v1 |
http://arxiv.org/pdf/1804.03395v1.pdf | |
PWC | https://paperswithcode.com/paper/effects-of-higher-order-and-long-range |
Repo | |
Framework | |
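A rough sketch of estimating the subharmonic ratio from two recorded spike trains, assuming synchronization manifests as a stable small-integer ratio of spike counts per time window; the paper's phase-locking estimation method is more involved and also yields the effectiveness $\eta$.

```python
import numpy as np

def subharmonic_ratio(spikes_a, spikes_b, window, max_order=12):
    """Estimate the SHR between two oscillators from spike-time arrays.

    Assumption-laden sketch: bin spike times into windows and pick the
    small-integer ratio (p, q) that best matches the per-window counts.
    """
    edges = np.arange(0.0, max(spikes_a.max(), spikes_b.max()), window)
    na, _ = np.histogram(spikes_a, bins=edges)
    nb, _ = np.histogram(spikes_b, bins=edges)
    ratios = [(p, q) for p in range(1, max_order + 1)
                     for q in range(1, max_order + 1)]
    # na/nb ~ p/q  <=>  na*q ~ nb*p; minimize the mismatch across windows.
    best = min(ratios, key=lambda pq: np.abs(na * pq[1] - nb * pq[0]).mean())
    return best  # e.g. (2, 3) means 2:3 high-order synchronization
```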
Random Temporal Skipping for Multirate Video Analysis
Title | Random Temporal Skipping for Multirate Video Analysis |
Authors | Yi Zhu, Shawn Newsam |
Abstract | Current state-of-the-art approaches to video understanding adopt temporal jittering to simulate analyzing the video at varying frame rates. However, this does not work well for multirate videos, in which actions or subactions occur at different speeds. The frame sampling rate should vary in accordance with the different motion speeds. In this work, we propose a simple yet effective strategy, termed random temporal skipping, to address this situation. This strategy effectively handles multirate videos by randomizing the sampling rate during training. It is an exhaustive approach, which can potentially cover all motion speed variations. Furthermore, due to the large temporal skipping, our network can see video clips that originally cover over 100 frames. Such a time range is enough to analyze most actions/events. We also introduce an occlusion-aware optical flow learning method that generates improved motion maps for human action recognition. Our framework is end-to-end trainable, runs in real-time, and achieves state-of-the-art performance on six widely adopted video benchmarks. |
Tasks | Optical Flow Estimation, Temporal Action Localization, Video Understanding |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12522v1 |
http://arxiv.org/pdf/1810.12522v1.pdf | |
PWC | https://paperswithcode.com/paper/random-temporal-skipping-for-multirate-video |
Repo | |
Framework | |
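The sampler itself is simple enough to sketch: draw a random skip value for each training clip so the effective frame rate varies across iterations. Randomizing per clip (rather than per step within a clip) is an assumption of this sketch.

```python
import numpy as np

def random_temporal_skip(num_frames, clip_len=16, max_skip=8, rng=None):
    """Sample clip frame indices with a random per-clip skipping rate.

    One random skip value per clip changes the effective frame rate, so the
    network sees the same action at many speeds; with max_skip=8 a 16-frame
    clip can span up to 136 raw frames.
    """
    rng = rng or np.random.default_rng()
    skip = rng.integers(0, max_skip + 1)        # frames skipped between samples
    span = (clip_len - 1) * (skip + 1) + 1      # raw frames the clip covers
    start = rng.integers(0, max(1, num_frames - span + 1))
    return start + np.arange(clip_len) * (skip + 1)

print(random_temporal_skip(300))  # 16 indices, stride drawn at random
```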
End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching
Title | End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching |
Authors | Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jan Ernst, Jana Kosecka |
Abstract | Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors. Various learning approaches have been applied in the past to different stages of the matching pipeline, considering detector, descriptor, or metric learning objectives. These objectives were typically addressed separately and most previous work has focused on image data. This paper proposes an end-to-end learning framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans, where the two can be jointly optimized towards task-specific objectives without a need for separate annotations. We employ a Siamese architecture augmented by a sampling layer and a novel score loss function which in turn affects the selection of region proposals. The positive and negative examples are obtained automatically by sampling corresponding region proposals based on their consistency with known 3D pose labels. Matching experiments with depth data on multiple benchmark datasets demonstrate the efficacy of the proposed approach, showing significant improvements over state-of-the-art methods. |
Tasks | Image Retrieval, Keypoint Detection, Metric Learning |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07869v2 |
http://arxiv.org/pdf/1802.07869v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-keypoint-detector-and |
Repo | |
Framework | |
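The descriptor half of such an objective can be illustrated with a standard contrastive loss over the Siamese outputs, where positives and negatives come from pose-consistent region-proposal sampling; the detection-side score loss the paper introduces is not shown here.

```python
import torch
import torch.nn.functional as F

def contrastive_descriptor_loss(desc_a, desc_b, match, margin=1.0):
    """Contrastive loss for descriptor pairs from a Siamese network.

    Illustrative stand-in for the matching objective: pull descriptors of
    corresponding 3D regions together, push non-matching pairs at least
    `margin` apart. desc_a, desc_b: (N, D) descriptors from the two branches;
    match: (N,) float tensor, 1 for positive pairs and 0 for negatives.
    """
    d = F.pairwise_distance(desc_a, desc_b)
    pos = match * d.pow(2)                      # positives: shrink distance
    neg = (1 - match) * F.relu(margin - d).pow(2)  # negatives: enforce margin
    return (pos + neg).mean()
```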
Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data
Title | Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data |
Authors | Kahkashan Afrin, Gurudev Illangovan, Sanjay S. Srivatsa, Satish T. S. Bukkapatnam |
Abstract | Accuracies of survival models for life-expectancy prediction as well as critical-care applications are significantly compromised by the sparsity of samples and the extreme imbalance between the survival (usually, the majority) and mortality class sizes. While the recent random survival forest (RSF) model overcomes the limitations of the proportional-hazards assumption, imbalance in the data results in an underestimation (overestimation) of the hazard of the mortality (survival) classes. A balanced random survival forest (BRSF) model, based on training the RSF model with data generated from a synthetic minority sampling scheme, is presented to address this gap. Theoretical results on the effect of balancing on prediction accuracies in BRSF are reported. Benchmarking studies were conducted using five datasets with different levels of class imbalance from public repositories and an imbalanced dataset of 267 acute cardiac patients, collected at the Heart, Artery, and Vein Center of Fresno, CA. The investigations suggest that BRSF provides improved discriminatory strength between the survival and mortality classes. It outperformed both optimized Cox models (with and without balancing) and RSF, with an average 55% reduction in prediction error over the next best alternative. |
Tasks | |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09177v2 |
http://arxiv.org/pdf/1803.09177v2.pdf | |
PWC | https://paperswithcode.com/paper/balanced-random-survival-forests-for |
Repo | |
Framework | |
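A minimal sketch of the BRSF recipe using off-the-shelf pieces: SMOTE for the synthetic minority sampling and scikit-survival's RandomSurvivalForest for the RSF. Treating survival time as an extra feature during oversampling, so synthetic samples get interpolated times, is a simple assumption of this sketch; the paper's scheme for censored data may differ.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

def fit_brsf(X, event, time, random_state=0):
    """Balance the minority (event) class with SMOTE, then fit an RSF.

    X: (N, d) covariates; event: (N,) 0/1 event indicators; time: (N,)
    observed/censoring times. Time rides along as an extra SMOTE feature so
    synthetic minority samples receive interpolated times.
    """
    Xt = np.column_stack([X, time])
    Xt_res, event_res = SMOTE(random_state=random_state).fit_resample(Xt, event)
    X_res, time_res = Xt_res[:, :-1], Xt_res[:, -1]
    y = Surv.from_arrays(event=event_res.astype(bool), time=time_res)
    return RandomSurvivalForest(
        n_estimators=200, random_state=random_state).fit(X_res, y)
```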
SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems
Title | SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems |
Authors | Ankush Desai, Shromona Ghosh, Sanjit A. Seshia, Natarajan Shankar, Ashish Tiwari |
Abstract | The recent drive towards achieving greater autonomy and intelligence in robotics has led to high levels of complexity. Autonomous robots increasingly depend on third-party off-the-shelf components and complex machine-learning techniques. This trend makes it challenging to provide strong design-time certification of correct operation. To address these challenges, we present SOTER, a robotics programming framework with two key components: (1) a programming language for implementing and testing high-level reactive robotics software and (2) an integrated runtime assurance (RTA) system that helps enable the use of uncertified components while still providing safety guarantees. SOTER provides language primitives to declaratively construct an RTA module consisting of an advanced, high-performance controller (uncertified), a safe, lower-performance controller (certified), and the desired safety specification. The framework provides a formal guarantee that a well-formed RTA module always satisfies the safety specification, without completely sacrificing performance: the higher-performance uncertified component is used whenever it is safe to do so. SOTER allows the complex robotics software stack to be constructed as a composition of RTA modules, where each uncertified component is protected using an RTA module. To demonstrate the efficacy of our framework, we consider a real-world case study of building a safe drone surveillance system. Our experiments both in simulation and on actual drones show that the SOTER-enabled RTA ensures the safety of the system, including when untrusted third-party components have bugs or deviate from the desired behavior. |
Tasks | |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07921v3 |
http://arxiv.org/pdf/1808.07921v3.pdf | |
PWC | https://paperswithcode.com/paper/soter-programming-safe-robotics-system-using |
Repo | |
Framework | |
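The RTA switching pattern at the heart of such a module can be conveyed in a few lines: run the uncertified advanced controller while a monitor certifies the proposed action safe, otherwise fall back to the certified safe controller. This sketch is schematic; SOTER itself provides language primitives and formal well-formedness guarantees that a plain Python class cannot.

```python
class RTAModule:
    """Minimal runtime-assurance switch in the spirit of an RTA module.

    advanced: uncertified high-performance controller, state -> action
    safe:     certified fallback controller, state -> action
    is_safe:  monitor deciding whether an action keeps the system safe
    """
    def __init__(self, advanced, safe, is_safe):
        self.advanced, self.safe, self.is_safe = advanced, safe, is_safe

    def step(self, state):
        action = self.advanced(state)
        # Fall back whenever the uncertified action would leave the safe set.
        return action if self.is_safe(state, action) else self.safe(state)

# Hypothetical usage: keep a 1-D rover's position inside [0, 10].
rta = RTAModule(advanced=lambda s: s + 2.0,
                safe=lambda s: min(max(s, 0.0), 10.0),
                is_safe=lambda s, a: 0.0 <= a <= 10.0)
print(rta.step(9.5))  # advanced proposes 11.5 (unsafe), so safe acts: 9.5
```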
Application of Superpixels to Segment Several Landmarks in Running Rodents
Title | Application of Superpixels to Segment Several Landmarks in Running Rodents |
Authors | Omid Haji Maghsoudi, Annie Vahedipour, Benjamin Robertson, Andrew Spence |
Abstract | Examining locomotion has improved our basic understanding of motor control and aided in treating motor impairment. Mice and rats are the model systems of choice for basic neuroscience studies of human disease. High frame rates are needed to quantify the kinematics of running rodents due to their high stride frequency, and manual tracking, especially of multiple body landmarks, becomes extremely time-consuming. To overcome these limitations, we propose superpixel-based image segmentation, as superpixels utilize both spatial and color information. We segmented several parts of the body and tested the success of segmentation as a function of color space and SLIC segment size. We used a simple merging function to connect segmented regions that are neighbours and share the same intensity-value range. In addition, 28 features were extracted, and t-SNE was used to demonstrate how well the methods can differentiate the regions. Finally, we compared the segmented regions to a manually outlined region. The results showed that, for segmentation, using the RGB image was slightly better than using the hue channel. For merging and classification, however, the hue representation was better, as it captures the relevant color information in a single channel. |
Tasks | Semantic Segmentation |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.02574v1 |
http://arxiv.org/pdf/1804.02574v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-superpixels-to-segment-several |
Repo | |
Framework | |
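A sketch of the pipeline's first two steps with scikit-image: SLIC superpixels on one video frame, then a crude merge that relabels superpixels whose mean hue falls in the same range (the adjacency test is omitted for brevity). The file path and bin count are placeholders.

```python
import numpy as np
from skimage import color, io
from skimage.segmentation import slic

# Placeholder path for one frame of a high-speed rodent video; drop any alpha
# channel so rgb2hsv receives a plain RGB image.
frame = io.imread("frame_0001.png")[..., :3]
labels = slic(frame, n_segments=400, compactness=10)   # SLIC superpixels

# Merge step (simplified): give superpixels the same label when their mean
# hue falls in the same range.
hsv = color.rgb2hsv(frame)
uniq = np.unique(labels)
mean_hue = np.array([hsv[..., 0][labels == l].mean() for l in uniq])
hue_bin = np.digitize(mean_hue, np.linspace(0.0, 1.0, 11))  # 10 hue ranges
merged = hue_bin[np.searchsorted(uniq, labels)]  # per-pixel merged labels
```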