October 20, 2019

Paper Group ANR 30

Robust Text Classifier on Test-Time Budgets. Multi-modality Sensor Data Classification with Selective Attention. Newton Methods for Convolutional Neural Networks. EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction. Fully Convolutional Adaptation Networks for Semantic Segmentation. Multi-Label Learning from Medical Plain Text with Con …

Robust Text Classifier on Test-Time Budgets


Title	Robust Text Classifier on Test-Time Budgets
Authors	Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang, Venkatesh Saligrama
Abstract	We propose a generic and interpretable learning framework for building robust text classification model that achieves accuracy comparable to full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction tasks and passes them to the classifier for processing. The selector is trained jointly with the classifier and directly learns to incorporate with the classifier. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be incorporated with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with a mere loss in accuracy performance.
Tasks	Text Classification
Published	2018-08-24
URL	https://arxiv.org/abs/1808.08270v5
PDF	https://arxiv.org/pdf/1808.08270v5.pdf
PWC	https://paperswithcode.com/paper/building-a-robust-text-classifier-on-a-test
Repo
Framework

Multi-modality Sensor Data Classification with Selective Attention


Title	Multi-modality Sensor Data Classification with Selective Attention
Authors	Xiang Zhang, Lina Yao, Chaoran Huang, Sen Wang, Mingkui Tan, Guodong Long, Can Wang
Abstract	Multimodal wearable sensor data classification plays an important role in ubiquitous computing and has a wide range of applications in scenarios from healthcare to entertainment. However, most existing work in this field employs domain-specific approaches and is thus ineffective in complex situations where multi-modality sensor data are collected. Moreover, the wearable sensor data are less informative than the conventional data such as texts or images. In this paper, to improve the adaptability of such classification methods across different application domains, we turn this classification task into a game and apply a deep reinforcement learning scheme to deal with complex situations dynamically. Additionally, we introduce a selective attention mechanism into the reinforcement learning scheme to focus on the crucial dimensions of the data. This mechanism helps to capture extra information from the signal and thus it is able to significantly improve the discriminative power of the classifier. We carry out several experiments on three wearable sensor datasets and demonstrate the competitive performance of the proposed approach compared to several state-of-the-art baselines.
Tasks
Published	2018-04-16
URL	http://arxiv.org/abs/1804.05493v2
PDF	http://arxiv.org/pdf/1804.05493v2.pdf
PWC	https://paperswithcode.com/paper/multi-modality-sensor-data-classification
Repo
Framework

Newton Methods for Convolutional Neural Networks


Title	Newton Methods for Convolutional Neural Networks
Authors	Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin
Abstract	Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks such as Convolutional Neural Networks (CNN), which are more commonly used in deep-learning applications. One reason is that Newton methods for CNN involve complicated operations, and so far no works have conducted a thorough investigation. In this work, we give details of all building blocks including function, gradient, and Jacobian evaluation, and Gauss-Newton matrix-vector products. These basic components are very important because with them further developments of Newton methods for CNN become possible. We show that an efficient MATLAB implementation can be done in just several hundred lines of code and demonstrate that the Newton method gives competitive test accuracy.
Tasks
Published	2018-11-14
URL	http://arxiv.org/abs/1811.06100v1
PDF	http://arxiv.org/pdf/1811.06100v1.pdf
PWC	https://paperswithcode.com/paper/newton-methods-for-convolutional-neural
Repo
Framework

EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction


Title	EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction
Authors	Sen Jia, Neil D. B. Bruce
Abstract	Saliency prediction can benefit from training that involves scene understanding that may be tangential to the central task; this may include understanding places, spatial layout, objects or involve different datasets and their bias. One can combine models, but to do this in a sophisticated manner can be complex, and also result in unwieldy networks or produce competing objectives that are hard to balance. In this paper, we propose a scalable system to leverage multiple powerful deep CNN models to better extract visual features for saliency prediction. Our design differs from previous studies in that the whole system is trained in an almost end-to-end piece-wise fashion. The encoder and decoder components are separately trained to deal with complexity tied to the computational paradigm and required space. Furthermore, the encoder can contain more than one CNN model to extract features, and models can have different architectures or be pre-trained on different datasets. This parallel design yields a better computational paradigm overcoming limits to the variety of information or inference that can be combined at the encoder stage towards deeper networks and a more powerful encoding. Our network can be easily expanded almost without any additional cost, and other pre-trained CNN models can be incorporated availing a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET and our method achieves the state-of-the-art results on the public saliency benchmarks, SALICON, MIT300 and CAT2000.
Tasks	Saliency Prediction, Scene Understanding
Published	2018-05-02
URL	http://arxiv.org/abs/1805.01047v2
PDF	http://arxiv.org/pdf/1805.01047v2.pdf
PWC	https://paperswithcode.com/paper/eml-netan-expandable-multi-layer-network-for
Repo
Framework

Fully Convolutional Adaptation Networks for Semantic Segmentation


Title	Fully Convolutional Adaptation Networks for Semantic Segmentation
Authors	Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei
Abstract	The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets. Nevertheless, collecting expert labeled datasets especially with pixel-level annotations is an extremely expensive process. An appealing alternative is to render synthetic data (e.g., computer games) and generate ground truth automatically. However, simply applying the models learnt on synthetic images may lead to high generalization error on real images due to domain shift. In this paper, we facilitate this issue from the perspectives of both visual appearance-level and representation-level domain adaptation. The former adapts source-domain images to appear as if drawn from the “style” in the target domain and the latter attempts to learn domain-invariant representations. Specifically, we present Fully Convolutional Adaptation Networks (FCAN), a novel deep architecture for semantic segmentation which combines Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN). AAN learns a transformation from one domain to the other in the pixel space and RAN is optimized in an adversarial learning manner to maximally fool the domain discriminator with the learnt source and target representations. Extensive experiments are conducted on the transfer from GTA5 (game videos) to Cityscapes (urban street scenes) on semantic segmentation and our proposal achieves superior results when comparing to state-of-the-art unsupervised adaptation techniques. More remarkably, we obtain a new record: mIoU of 47.5% on BDDS (drive-cam videos) in an unsupervised setting.
Tasks	Domain Adaptation, Semantic Segmentation
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08286v1
PDF	http://arxiv.org/pdf/1804.08286v1.pdf
PWC	https://paperswithcode.com/paper/fully-convolutional-adaptation-networks-for
Repo
Framework
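
The abstract above reports its result as mIoU (mean intersection-over-union). As a rough illustration of how that metric is usually computed, here is a minimal sketch; the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example, not details taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union over flat label lists."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

Benchmark evaluations typically accumulate intersections and unions over the whole dataset before dividing, rather than averaging per image.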

Multi-Label Learning from Medical Plain Text with Convolutional Residual Models


Title	Multi-Label Learning from Medical Plain Text with Convolutional Residual Models
Authors	Xinyuan Zhang, Ricardo Henao, Zhe Gan, Yitong Li, Lawrence Carin
Abstract	Predicting diagnoses from Electronic Health Records (EHRs) is an important medical application of multi-label learning. We propose a convolutional residual model for multi-label classification from doctor notes in EHR data. A given patient may have multiple diagnoses, and therefore multi-label learning is required. We employ a Convolutional Neural Network (CNN) to encode plain text into a fixed-length sentence embedding vector. Since diagnoses are typically correlated, a deep residual network is employed on top of the CNN encoder, to capture label (diagnosis) dependencies and incorporate information directly from the encoded sentence vector. A real EHR dataset is considered, and we compare the proposed model with several well-known baselines, to predict diagnoses based on doctor notes. Experimental results demonstrate the superiority of the proposed convolutional residual model.
Tasks	Multi-Label Classification, Multi-Label Learning, Sentence Embedding
Published	2018-01-15
URL	http://arxiv.org/abs/1801.05062v2
PDF	http://arxiv.org/pdf/1801.05062v2.pdf
PWC	https://paperswithcode.com/paper/multi-label-learning-from-medical-plain-text
Repo
Framework
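
To make the multi-label setting in the abstract above concrete, the sketch below shows one common output convention: each label gets its own probability and every label above a threshold is returned. The independent sigmoid outputs, the 0.5 threshold, and the label names are assumptions for illustration only; the abstract does not describe the model's output layer, and the residual network that captures label dependencies is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose sigmoid probability meets the threshold."""
    return [name for name, z in zip(label_names, logits)
            if sigmoid(z) >= threshold]


# Hypothetical diagnosis labels and made-up scores for one doctor note.
labels = ["diabetes", "hypertension", "asthma"]
print(predict_labels([2.0, 0.3, -1.5], labels))
```

Unlike single-label classification with a softmax, nothing here forces the probabilities to sum to one, so a note can receive several diagnoses or none.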

Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation


Title	Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation
Authors	Ye Wang, Jongmoo Choi, Yueru Chen, Siyang Li, Qin Huang, Kaitai Zhang, Ming-Sui Lee, C. -C. Jay Kuo
Abstract	Unsupervised video object segmentation is a crucial application in video analysis without knowing any prior information about the objects. It becomes tremendously challenging when multiple objects occur and interact in a given video clip. In this paper, a novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed. DOA models spatial-temporal consistency in video sequences by capturing background dependencies from adjacent frames. Instance proposals are generated by the instance segmentation network for each frame and then selected by motion information as hard negatives if they exist and positives. To adopt high-quality hard negatives, the block matching algorithm is then applied to preceding frames to track the associated hard negatives. General negatives are also introduced in case that there are no hard negatives in the sequence and experiments demonstrate both kinds of negatives (distractors) are complementary. Finally, we conduct DOA using the positive, negative, and hard negative masks to update the foreground/background segmentation. The proposed approach achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59 datasets.
Tasks	Instance Segmentation, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2018-12-19
URL	http://arxiv.org/abs/1812.07712v1
PDF	http://arxiv.org/pdf/1812.07712v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-video-object-segmentation-with
Repo
Framework

Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation


Title	Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation
Authors	Ye Wang, Jongmoo Choi, Yueru Chen, Qin Huang, Siyang Li, Ming-Sui Lee, C. -C. Jay Kuo
Abstract	One major technique debt in video object segmentation is to label the object masks for training instances. As a result, we propose to prepare inexpensive, yet high quality pseudo ground truth corrected with motion cue for video object segmentation training. Our method conducts semantic segmentation using instance segmentation networks and, then, selects the segmented object of interest as the pseudo ground truth based on the motion information. Afterwards, the pseudo ground truth is exploited to finetune the pretrained objectness network to facilitate object segmentation in the remaining frames of the video. We show that the pseudo ground truth could effectively improve the segmentation performance. This straightforward unsupervised video object segmentation method is more efficient than existing methods. Experimental results on DAVIS and FBMS show that the proposed method outperforms state-of-the-art unsupervised segmentation methods on various benchmark datasets. And the category-agnostic pseudo ground truth has great potential to extend to multiple arbitrary object tracking.
Tasks	Instance Segmentation, Object Tracking, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2018-12-13
URL	http://arxiv.org/abs/1812.05206v1
PDF	http://arxiv.org/pdf/1812.05206v1.pdf
PWC	https://paperswithcode.com/paper/design-pseudo-ground-truth-with-motion-cue
Repo
Framework

A Survey and Critique of Multiagent Deep Reinforcement Learning


Title	A Survey and Critique of Multiagent Deep Reinforcement Learning
Authors	Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor
Abstract	Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Tasks
Published	2018-10-12
URL	https://arxiv.org/abs/1810.05587v3
PDF	https://arxiv.org/pdf/1810.05587v3.pdf
PWC	https://paperswithcode.com/paper/is-multiagent-deep-reinforcement-learning-the
Repo
Framework

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation


Title	Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
Authors	Yilin Yang, Liang Huang, Mingbo Ma
Abstract	Beam search is widely used in neural machine translation, and usually improves translation quality compared to greedy search. It has been widely observed that, however, beam sizes larger than 5 hurt translation quality. We explain why this happens, and propose several methods to address this problem. Furthermore, we discuss the optimal stopping criteria for these methods. Results show that our hyperparameter-free methods outperform the widely-used hyperparameter-free heuristic of length normalization by +2.0 BLEU, and achieve the best results among all methods on Chinese-to-English translation.
Tasks	Machine Translation
Published	2018-08-28
URL	http://arxiv.org/abs/1808.09582v3
PDF	http://arxiv.org/pdf/1808.09582v3.pdf
PWC	https://paperswithcode.com/paper/breaking-the-beam-search-curse-a-study-of-re
Repo
Framework
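
The abstract above uses length normalization as its baseline heuristic. The sketch below illustrates that heuristic only, not the paper's proposed methods: a hypothesis is scored by its average per-token log-probability instead of the sum. As general background (not stated in the abstract), summed log-probabilities tend to favor shorter hypotheses because every extra token adds a negative term. The function names and toy numbers are assumptions for this example.

```python
def raw_score(token_logprobs):
    """Sum of token log-probabilities (the unnormalized hypothesis score)."""
    return sum(token_logprobs)


def length_normalized_score(token_logprobs):
    """Average log-probability per token."""
    return sum(token_logprobs) / len(token_logprobs)


# Two made-up hypotheses: a short one with less confident tokens
# and a longer one with more confident tokens.
short_hyp = [-0.9, -0.9]
long_hyp = [-0.5, -0.5, -0.5, -0.5]

for name, hyp in [("short", short_hyp), ("long", long_hyp)]:
    print(name, raw_score(hyp), length_normalized_score(hyp))
```

With these numbers the raw score ranks the short hypothesis first, while the normalized score ranks the long one first.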


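Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks


Title	Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks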
Authors	Clemens Seibold, Wojciech Samek, Anna Hilsmann, Peter Eisert
Abstract	Artificial neural networks tend to learn only what they need for a task. A manipulation of the training data can counter this phenomenon. In this paper, we study the effect of different alterations of the training data, which limit the amount and position of information that is available for the decision making. We analyze the accuracy and robustness against semantic and black box attacks on the networks that were trained on different training data modifications for the particular example of morphing attacks. A morphing attack is an attack on a biometric facial recognition system where the system is fooled to match two different individuals with the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals that should be matched with this image.
Tasks	Decision Making
Published	2018-06-11
URL	http://arxiv.org/abs/1806.04265v1
PDF	http://arxiv.org/pdf/1806.04265v1.pdf
PWC	https://paperswithcode.com/paper/accurate-and-robust-neural-networks-for
Repo
Framework
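
The abstract above describes a morphed image as the result of aligning and blending two face images. The following sketch shows only the blending step as a pixel-wise weighted average, assuming the two images are already aligned and share the same shape; the function name `blend`, the nested-list grayscale representation, and the default weight of 0.5 are assumptions for illustration, not details from the paper.

```python
def blend(image_a, image_b, alpha=0.5):
    """Pixel-wise convex combination of two equally sized grayscale images."""
    same_rows = len(image_a) == len(image_b)
    if not same_rows or any(len(ra) != len(rb)
                            for ra, rb in zip(image_a, image_b)):
        raise ValueError("images must have the same shape")
    return [[(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(image_a, image_b)]


# Two tiny 2x2 "images" standing in for aligned face photos.
face_a = [[0, 100], [200, 50]]
face_b = [[100, 100], [0, 150]]
morph = blend(face_a, face_b)
print(morph)
```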

Classification-Reconstruction Learning for Open-Set Recognition


Title	Classification-Reconstruction Learning for Open-Set Recognition
Authors	Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura
Abstract	Open-set classification is a problem of handling 'unknown' classes that are not contained in the training dataset, whereas traditional classifiers assume that only known classes appear in the test environment. Existing open-set classifiers rely on deep networks trained in a supervised manner on known classes in the training set; this causes specialization of learned representations to known classes and makes it hard to distinguish unknowns from knowns. In contrast, we train networks for joint classification and reconstruction of input data. This enhances the learned representation so as to preserve information useful for separating unknowns from knowns, as well as to discriminate classes of knowns. Our novel Classification-Reconstruction learning for Open-Set Recognition (CROSR) utilizes latent representations for reconstruction and enables robust unknown detection without harming the known-class classification accuracy. Extensive experiments reveal that the proposed method outperforms existing deep open-set classifiers in multiple standard datasets and is robust to diverse outliers. The code is available in https://nae-lab.org/~rei/research/crosr/.
Tasks	Open Set Learning
Published	2018-12-11
URL	https://arxiv.org/abs/1812.04246v3
PDF	https://arxiv.org/pdf/1812.04246v3.pdf
PWC	https://paperswithcode.com/paper/classification-reconstruction-learning-for
Repo
Framework

Recurrent Residual Module for Fast Inference in Videos


Title	Recurrent Residual Module for Fast Inference in Videos
Authors	Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, Cewu Lu
Abstract	Deep convolutional neural networks (CNNs) have made impressive progress in many video recognition tasks such as video pose estimation and video object detection. However, CNN inference on video is computationally expensive due to processing dense frames individually. In this work, we propose a framework called Recurrent Residual Module (RRM) to accelerate the CNN inference for video recognition tasks. This framework has a novel design of using the similarity of the intermediate feature maps of two consecutive frames, to largely reduce the redundant computation. One unique property of the proposed method compared to previous work is that feature maps of each frame are precisely computed. The experiments show that, while maintaining the similar recognition performance, our RRM yields averagely 2x acceleration on the commonly used CNNs such as AlexNet, ResNet, deep compression model (thus 8-12x faster than the original dense models using the efficient inference engine), and impressively 9x acceleration on some binary networks such as XNOR-Nets (thus 500x faster than the original model). We further verify the effectiveness of the RRM on speeding up CNNs for video pose estimation and video object detection.
Tasks	Object Detection, Pose Estimation, Video Object Detection, Video Recognition
Published	2018-02-27
URL	http://arxiv.org/abs/1802.09723v1
PDF	http://arxiv.org/pdf/1802.09723v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-residual-module-for-fast-inference
Repo
Framework
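
The abstract above says the framework exploits the similarity between feature maps of consecutive frames while still computing each frame's output exactly. One way to see why that can work is the linearity of a single layer: W·x_t = W·x_{t-1} + W·(x_t − x_{t-1}), so only inputs that changed need new multiplications. The sketch below demonstrates this identity for a plain linear layer with integer weights; it is an illustration under that assumption, not the RRM implementation, and the function names are hypothetical.

```python
def dense_apply(weights, x):
    """Full matrix-vector product for one frame."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def incremental_apply(weights, prev_output, prev_x, x):
    """Update the previous output using only the inputs that changed."""
    out = list(prev_output)
    for j, (old, new) in enumerate(zip(prev_x, x)):
        delta = new - old
        if delta != 0:  # unchanged inputs contribute nothing new
            for i, row in enumerate(weights):
                out[i] += row[j] * delta
    return out


W = [[1, 2, 0, -1],
     [3, 0, 1, 2]]
frame1 = [4, 5, 6, 7]
frame2 = [4, 5, 9, 7]  # differs from frame1 in a single position

y1 = dense_apply(W, frame1)
y2 = incremental_apply(W, y1, frame1, frame2)
print(y2 == dense_apply(W, frame2))
```

The saving grows with the fraction of inputs that stay the same between frames; nonlinear layers break the identity and need separate handling, which this sketch does not cover.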

Large-Scale Unsupervised Deep Representation Learning for Brain Structure


Title	Large-Scale Unsupervised Deep Representation Learning for Brain Structure
Authors	Ayush Jaiswal, Dong Guo, Cauligi S. Raghavendra, Paul Thompson
Abstract	Machine Learning (ML) is increasingly being used for computer aided diagnosis of brain related disorders based on structural magnetic resonance imaging (MRI) data. Most of such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight and careful quality control to ensure the absence of errors in the computational process. Recent advances in Deep Representation Learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learn generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure, which can be used to reconstruct brain images with very low error and exhibit performance comparable to FreeSurfer features on various classification tasks.
Tasks	Representation Learning
Published	2018-05-02
URL	http://arxiv.org/abs/1805.01049v1
PDF	http://arxiv.org/pdf/1805.01049v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-unsupervised-deep-representation
Repo
Framework

Maximizing Expected Impact in an Agent Reputation Network – Technical Report


Title	Maximizing Expected Impact in an Agent Reputation Network – Technical Report
Authors	Gavin Rens, Abhaya Nayak, Thomas Meyer
Abstract	Many multi-agent systems (MASs) are situated in stochastic environments. Some such systems that are based on the partially observable Markov decision process (POMDP) do not take the benevolence of other agents for granted. We propose a new POMDP-based framework which is general enough for the specification of a variety of stochastic MAS domains involving the impact of agents on each other’s reputations. A unique feature of this framework is that actions are specified as either undirected (regular) or directed (towards a particular agent), and a new directed transition function is provided for modeling the effects of reputation in interactions. Assuming that an agent must maintain a good enough reputation to survive in the network, a planning algorithm is developed for an agent to select optimal actions in stochastic MASs. Preliminary evaluation is provided via an example specification and by determining the algorithm’s complexity.
Tasks
Published	2018-05-14
URL	http://arxiv.org/abs/1805.05230v1
PDF	http://arxiv.org/pdf/1805.05230v1.pdf
PWC	https://paperswithcode.com/paper/maximizing-expected-impact-in-an-agent
Repo
Framework