Paper Group ANR 30
Robust Text Classifier on Test-Time Budgets. Multi-modality Sensor Data Classification with Selective Attention. Newton Methods for Convolutional Neural Networks. EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction. Fully Convolutional Adaptation Networks for Semantic Segmentation. Multi-Label Learning from Medical Plain Text with Con …
Robust Text Classifier on Test-Time Budgets
Title | Robust Text Classifier on Test-Time Budgets |
Authors | Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang, Venkatesh Saligrama |
Abstract | We propose a generic and interpretable learning framework for building robust text classification models that achieve accuracy comparable to full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction task and passes them to the classifier for processing. The selector is trained jointly with the classifier and directly learns to work with it. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be incorporated into any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with only a minor loss in accuracy. |
Tasks | Text Classification |
Published | 2018-08-24 |
URL | https://arxiv.org/abs/1808.08270v5 |
PDF | https://arxiv.org/pdf/1808.08270v5.pdf |
PWC | https://paperswithcode.com/paper/building-a-robust-text-classifier-on-a-test |
Repo | |
Framework | |
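A minimal sketch of the select-then-classify idea described above, assuming a toy setting: the per-word relevance scores, the 0.5 threshold, and the linear bag-of-words classifier are all hypothetical and hand-set here, whereas the paper trains its selector jointly with the classifier. Only the selected words reach the classifier, which is where test-time savings would come from.

```python
from collections import Counter


def select_words(tokens, relevance, threshold=0.5):
    """Keep only tokens whose (hypothetical) selector score exceeds the threshold."""
    return [t for t in tokens if relevance.get(t, 0.0) > threshold]


def classify(tokens, weights, bias=0.0):
    """Toy linear bag-of-words classifier: predict 1 if the weighted count score is positive."""
    counts = Counter(tokens)
    score = bias + sum(weights.get(t, 0.0) * c for t, c in counts.items())
    return 1 if score > 0 else 0


tokens = "the movie was great and the acting was superb".split()
relevance = {"great": 0.9, "superb": 0.8, "movie": 0.3}   # made-up selector scores
weights = {"great": 1.2, "superb": 1.0, "terrible": -1.5}  # made-up classifier weights
selected = select_words(tokens, relevance)
print(selected, classify(selected, weights))
```

Here the classifier sees 2 of the 9 tokens and still reaches the same decision as on the full input.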
Multi-modality Sensor Data Classification with Selective Attention
Title | Multi-modality Sensor Data Classification with Selective Attention |
Authors | Xiang Zhang, Lina Yao, Chaoran Huang, Sen Wang, Mingkui Tan, Guodong Long, Can Wang |
Abstract | Multimodal wearable sensor data classification plays an important role in ubiquitous computing and has a wide range of applications in scenarios from healthcare to entertainment. However, most existing work in this field employs domain-specific approaches and is thus ineffective in complex situations where multi-modality sensor data are collected. Moreover, wearable sensor data are less informative than conventional data such as text or images. In this paper, to improve the adaptability of such classification methods across different application domains, we turn this classification task into a game and apply a deep reinforcement learning scheme to handle complex situations dynamically. Additionally, we introduce a selective attention mechanism into the reinforcement learning scheme to focus on the crucial dimensions of the data. This mechanism helps capture extra information from the signal and can thus significantly improve the discriminative power of the classifier. We carry out several experiments on three wearable sensor datasets and demonstrate the competitive performance of the proposed approach compared to several state-of-the-art baselines. |
Tasks | |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05493v2 |
PDF | http://arxiv.org/pdf/1804.05493v2.pdf |
PWC | https://paperswithcode.com/paper/multi-modality-sensor-data-classification |
Repo | |
Framework | |
Newton Methods for Convolutional Neural Networks
Title | Newton Methods for Convolutional Neural Networks |
Authors | Chien-Chih Wang, Kent Loong Tan, Chih-Jen Lin |
Abstract | Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks such as Convolutional Neural Networks (CNN), which are more commonly used in deep-learning applications. One reason is that Newton methods for CNN involve complicated operations, and so far no work has conducted a thorough investigation. In this work, we give details of all building blocks, including function, gradient, and Jacobian evaluation, and Gauss-Newton matrix-vector products. These basic components are very important because they make further development of Newton methods for CNN possible. We show that an efficient MATLAB implementation can be written in just several hundred lines of code and demonstrate that the Newton method gives competitive test accuracy. |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06100v1 |
PDF | http://arxiv.org/pdf/1811.06100v1.pdf |
PWC | https://paperswithcode.com/paper/newton-methods-for-convolutional-neural |
Repo | |
Framework | |
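The Gauss-Newton matrix-vector product mentioned above can be sketched for a tiny model. This assumes a hypothetical single-layer model z_k = tanh(w . x_k) and the squared loss 0.5 * ||z - y||^2, whose Hessian with respect to the outputs is the identity, so G v = J^T (J v). It is not the paper's CNN derivation: the Jacobian J is formed explicitly here, which is only affordable at toy sizes, but the n x n matrix G itself is never built.

```python
import math


def model_outputs(w, xs):
    """Hypothetical model: z_k = tanh(w . x_k) for each input x_k."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]


def jacobian(w, xs):
    """J[k][j] = dz_k/dw_j = (1 - tanh^2(w . x_k)) * x_k[j]."""
    J = []
    for x in xs:
        t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
        J.append([(1.0 - t * t) * xj for xj in x])
    return J


def gauss_newton_vec(w, xs, v):
    """Return G v = J^T B J v with B = I (squared loss), without forming G."""
    J = jacobian(w, xs)
    Jv = [sum(a * b for a, b in zip(row, v)) for row in J]  # J v
    return [sum(J[k][j] * Jv[k] for k in range(len(J)))     # J^T (J v)
            for j in range(len(w))]


w = [0.1, -0.2]
xs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
print(gauss_newton_vec(w, xs, [1.0, 0.5]))
```

Because G = J^T J here, it is symmetric positive semi-definite, which is one reason Gauss-Newton is preferred over the exact Hessian in Newton-type training.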
EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction
Title | EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction |
Authors | Sen Jia, Neil D. B. Bruce |
Abstract | Saliency prediction can benefit from training that involves scene understanding that may be tangential to the central task; this may include understanding places, spatial layout, or objects, or involve different datasets and their biases. One can combine models, but doing this in a sophisticated manner can be complex, result in unwieldy networks, or produce competing objectives that are hard to balance. In this paper, we propose a scalable system to leverage multiple powerful deep CNN models to better extract visual features for saliency prediction. Our design differs from previous studies in that the whole system is trained in an almost end-to-end, piece-wise fashion. The encoder and decoder components are trained separately to deal with the complexity tied to the computational paradigm and required space. Furthermore, the encoder can contain more than one CNN model to extract features, and these models can have different architectures or be pre-trained on different datasets. This parallel design yields a better computational paradigm, overcoming limits on the variety of information or inference that can be combined at the encoder stage and allowing deeper networks and a more powerful encoding. Our network can be expanded at almost no additional cost, and other pre-trained CNN models can be incorporated to draw on a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET; our method achieves state-of-the-art results on the public saliency benchmarks SALICON, MIT300 and CAT2000. |
Tasks | Saliency Prediction, Scene Understanding |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.01047v2 |
PDF | http://arxiv.org/pdf/1805.01047v2.pdf |
PWC | https://paperswithcode.com/paper/eml-netan-expandable-multi-layer-network-for |
Repo | |
Framework | |
Fully Convolutional Adaptation Networks for Semantic Segmentation
Title | Fully Convolutional Adaptation Networks for Semantic Segmentation |
Authors | Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei |
Abstract | Recent advances in deep neural networks have convincingly demonstrated a high capability for learning vision models on large datasets. Nevertheless, collecting expert-labeled datasets, especially with pixel-level annotations, is an extremely expensive process. An appealing alternative is to render synthetic data (e.g., computer games) and generate ground truth automatically. However, simply applying models learnt on synthetic images may lead to high generalization error on real images due to domain shift. In this paper, we address this issue from the perspectives of both visual appearance-level and representation-level domain adaptation. The former adapts source-domain images to appear as if drawn from the “style” of the target domain, and the latter attempts to learn domain-invariant representations. Specifically, we present Fully Convolutional Adaptation Networks (FCAN), a novel deep architecture for semantic segmentation which combines Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN). AAN learns a transformation from one domain to the other in pixel space, and RAN is optimized in an adversarial manner to maximally fool the domain discriminator with the learnt source and target representations. Extensive experiments are conducted on the transfer from GTA5 (game videos) to Cityscapes (urban street scenes) for semantic segmentation, and our proposal achieves superior results compared with state-of-the-art unsupervised adaptation techniques. More remarkably, we obtain a new record: an mIoU of 47.5% on BDDS (drive-cam videos) in an unsupervised setting. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08286v1 |
PDF | http://arxiv.org/pdf/1804.08286v1.pdf |
PWC | https://paperswithcode.com/paper/fully-convolutional-adaptation-networks-for |
Repo | |
Framework | |
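The abstract reports its result as mIoU (mean intersection over union). A minimal sketch of the metric on flattened per-pixel label lists: for each class, IoU = TP / (TP + FP + FN), then the IoUs are averaged over classes. Assumptions: classes absent from both prediction and ground truth are skipped and there is no "ignore" label; benchmark tooling usually accumulates counts over the whole dataset and has its own conventions.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in pred or target (flattened label lists)."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom:  # skip classes that appear in neither prediction nor ground truth
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no classes present")
    return sum(ious) / len(ious)


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.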
Multi-Label Learning from Medical Plain Text with Convolutional Residual Models
Title | Multi-Label Learning from Medical Plain Text with Convolutional Residual Models |
Authors | Xinyuan Zhang, Ricardo Henao, Zhe Gan, Yitong Li, Lawrence Carin |
Abstract | Predicting diagnoses from Electronic Health Records (EHRs) is an important medical application of multi-label learning. We propose a convolutional residual model for multi-label classification from doctor notes in EHR data. A given patient may have multiple diagnoses, and therefore multi-label learning is required. We employ a Convolutional Neural Network (CNN) to encode plain text into a fixed-length sentence embedding vector. Since diagnoses are typically correlated, a deep residual network is employed on top of the CNN encoder, to capture label (diagnosis) dependencies and incorporate information directly from the encoded sentence vector. A real EHR dataset is considered, and we compare the proposed model with several well-known baselines, to predict diagnoses based on doctor notes. Experimental results demonstrate the superiority of the proposed convolutional residual model. |
Tasks | Multi-Label Classification, Multi-Label Learning, Sentence Embedding |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.05062v2 |
PDF | http://arxiv.org/pdf/1801.05062v2.pdf |
PWC | https://paperswithcode.com/paper/multi-label-learning-from-medical-plain-text |
Repo | |
Framework | |
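A minimal sketch of the multi-label output side described above, assuming independent sigmoid outputs per diagnosis, a 0.5 decision threshold, and mean binary cross-entropy as the training loss (common choices, not confirmed by the abstract). The `residual_block` helper only illustrates the generic skip connection h + f(h); the paper's actual residual layers are not specified here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Independent sigmoid per label, so one patient can receive several diagnoses."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over labels; targets are 0/1 per diagnosis."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


def residual_block(h, f):
    """Generic skip connection: the input h is added back to the transformed f(h)."""
    return [hi + fi for hi, fi in zip(h, f(h))]


print(predict_labels([2.0, -1.0, 0.3]))
```

Unlike a softmax, the sigmoids do not compete with each other, which is what allows several labels to be predicted at once.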
Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation
Title | Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation |
Authors | Ye Wang, Jongmoo Choi, Yueru Chen, Siyang Li, Qin Huang, Kaitai Zhang, Ming-Sui Lee, C. -C. Jay Kuo |
Abstract | Unsupervised video object segmentation is a crucial task in video analysis, performed without any prior information about the objects. It becomes tremendously challenging when multiple objects occur and interact in a given video clip. In this paper, a novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed. DOA models spatial-temporal consistency in video sequences by capturing background dependencies from adjacent frames. Instance proposals are generated by the instance segmentation network for each frame and then selected, using motion information, as positives and, if they exist, hard negatives. To obtain high-quality hard negatives, the block matching algorithm is then applied to preceding frames to track the associated hard negatives. General negatives are also introduced in case there are no hard negatives in the sequence, and experiments demonstrate that both kinds of negatives (distractors) are complementary. Finally, we conduct DOA using the positive, negative, and hard negative masks to update the foreground/background segmentation. The proposed approach achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59. |
Tasks | Instance Segmentation, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07712v1 |
PDF | http://arxiv.org/pdf/1812.07712v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-video-object-segmentation-with |
Repo | |
Framework | |
Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation
Title | Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation |
Authors | Ye Wang, Jongmoo Choi, Yueru Chen, Qin Huang, Siyang Li, Ming-Sui Lee, C. -C. Jay Kuo |
Abstract | One major burden in video object segmentation is labeling the object masks of training instances. We therefore propose to prepare inexpensive yet high-quality pseudo ground truth, corrected with motion cues, for video object segmentation training. Our method conducts semantic segmentation using instance segmentation networks and then selects the segmented object of interest as the pseudo ground truth based on motion information. Afterwards, the pseudo ground truth is exploited to fine-tune the pretrained objectness network to facilitate object segmentation in the remaining frames of the video. We show that the pseudo ground truth can effectively improve segmentation performance. This straightforward unsupervised video object segmentation method is more efficient than existing methods. Experimental results on DAVIS and FBMS show that the proposed method outperforms state-of-the-art unsupervised segmentation methods. Moreover, the category-agnostic pseudo ground truth has great potential to be extended to tracking multiple arbitrary objects. |
Tasks | Instance Segmentation, Object Tracking, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05206v1 |
PDF | http://arxiv.org/pdf/1812.05206v1.pdf |
PWC | https://paperswithcode.com/paper/design-pseudo-ground-truth-with-motion-cue |
Repo | |
Framework | |
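A minimal sketch of choosing the pseudo ground truth by motion, assuming masks are sets of (row, col) pixels, the motion cue is a binary moving-region mask (e.g., from thresholded optical flow), and a hypothetical rule: take the instance with the highest IoU against the motion mask, provided it reaches `min_iou`. The paper's exact selection criterion may differ.

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_pseudo_gt(instance_masks, motion_mask, min_iou=0.5):
    """Index of the instance mask that best agrees with the motion mask, or None."""
    best_idx, best_score = None, min_iou
    for idx, mask in enumerate(instance_masks):
        score = iou(mask, motion_mask)
        if score >= best_score:
            best_idx, best_score = idx, score
    return best_idx


masks = [{(0, 0), (0, 1)}, {(2, 2), (2, 3), (3, 2), (3, 3)}]
motion = {(2, 2), (2, 3), (3, 2)}
print(select_pseudo_gt(masks, motion))  # 1 (IoU 0.75)
```

Returning None when no instance overlaps the moving region enough lets a caller skip frames instead of training on an unreliable mask.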
A Survey and Critique of Multiagent Deep Reinforcement Learning
Title | A Survey and Critique of Multiagent Deep Reinforcement Learning |
Authors | Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor |
Abstract | Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community. |
Tasks | |
Published | 2018-10-12 |
URL | https://arxiv.org/abs/1810.05587v3 |
PDF | https://arxiv.org/pdf/1810.05587v3.pdf |
PWC | https://paperswithcode.com/paper/is-multiagent-deep-reinforcement-learning-the |
Repo | |
Framework | |
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
Title | Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation |
Authors | Yilin Yang, Liang Huang, Mingbo Ma |
Abstract | Beam search is widely used in neural machine translation and usually improves translation quality compared to greedy search. However, it has been widely observed that beam sizes larger than 5 hurt translation quality. We explain why this happens and propose several methods to address this problem. Furthermore, we discuss the optimal stopping criteria for these methods. Results show that our hyperparameter-free methods outperform the widely used hyperparameter-free heuristic of length normalization by +2.0 BLEU and achieve the best results among all methods on Chinese-to-English translation. |
Tasks | Machine Translation |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09582v3 |
PDF | http://arxiv.org/pdf/1808.09582v3.pdf |
PWC | https://paperswithcode.com/paper/breaking-the-beam-search-curse-a-study-of-re |
Repo | |
Framework | |
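To illustrate the length normalization baseline mentioned above (not the authors' proposed rescoring methods): every token adds a negative log-probability, so raw model scores favour shorter hypotheses, and a larger beam is more likely to find such short, early-ending candidates. Length normalization divides a hypothesis's total log-probability by its length. The hypotheses and scores below are made up.

```python
def pick_best(hypotheses, normalize):
    """hypotheses: list of (tokens, total_log_prob); return the tokens of the best one."""
    def score(h):
        tokens, logp = h
        return logp / len(tokens) if normalize else logp
    return max(hypotheses, key=score)[0]


short_hyp = (["a", "</s>"], -2.0)
long_hyp = (["a", "good", "translation", "</s>"], -3.2)
print(pick_best([short_hyp, long_hyp], normalize=False))  # ['a', '</s>']
print(pick_best([short_hyp, long_hyp], normalize=True))   # ['a', 'good', 'translation', '</s>']
```

Per token, the longer hypothesis scores -0.8 against -1.0, so normalization reverses the raw ranking.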
Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks
Title | Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks |
Authors | Clemens Seibold, Wojciech Samek, Anna Hilsmann, Peter Eisert |
Abstract | Artificial neural networks tend to learn only what they need for a task. A manipulation of the training data can counter this phenomenon. In this paper, we study the effect of different alterations of the training data that limit the amount and position of information available for decision making. We analyze the accuracy and robustness against semantic and black-box attacks of networks trained on different training data modifications, for the particular example of morphing attacks. A morphing attack is an attack on a biometric facial recognition system in which the system is fooled into matching two different individuals to the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals who should be matched with it. |
Tasks | Decision Making |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04265v1 |
PDF | http://arxiv.org/pdf/1806.04265v1.pdf |
PWC | https://paperswithcode.com/paper/accurate-and-robust-neural-networks-for |
Repo | |
Framework | |
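The blending step of a face morph can be sketched as a pixel-wise convex combination. This assumes the two face images are already aligned (landmark-based warping is omitted) and given as equally sized grayscale lists of rows; `alpha` is the hypothetical blending weight of the second image.

```python
def blend(img_a, img_b, alpha=0.5):
    """Pixel-wise (1 - alpha) * A + alpha * B of two already-aligned grayscale images."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same shape (align them first)")
    return [[(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(img_a, img_b)]


print(blend([[0, 100]], [[100, 200]]))  # [[50.0, 150.0]]
```

With `alpha=0.5` both identities contribute equally, which is the case a morphing attack relies on to make the image match both people.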
Classification-Reconstruction Learning for Open-Set Recognition
Title | Classification-Reconstruction Learning for Open-Set Recognition |
Authors | Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura |
Abstract | Open-set classification is the problem of handling 'unknown' classes that are not contained in the training dataset, whereas traditional classifiers assume that only known classes appear in the test environment. Existing open-set classifiers rely on deep networks trained in a supervised manner on known classes in the training set; this causes specialization of learned representations to known classes and makes it hard to distinguish unknowns from knowns. In contrast, we train networks for joint classification and reconstruction of input data. This enhances the learned representation so as to preserve information useful for separating unknowns from knowns, as well as to discriminate classes of knowns. Our novel Classification-Reconstruction learning for Open-Set Recognition (CROSR) utilizes latent representations for reconstruction and enables robust unknown detection without harming the known-class classification accuracy. Extensive experiments reveal that the proposed method outperforms existing deep open-set classifiers on multiple standard datasets and is robust to diverse outliers. The code is available at https://nae-lab.org/~rei/research/crosr/. |
Tasks | Open Set Learning |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04246v3 |
PDF | https://arxiv.org/pdf/1812.04246v3.pdf |
PWC | https://paperswithcode.com/paper/classification-reconstruction-learning-for |
Repo | |
Framework | |
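A simplified sketch of open-set rejection on a combined feature vector, in the spirit of using both classification outputs and latent representations. Assumptions: each known class is summarized by a hypothetical mean vector over [logits, latent], prediction is the nearest class mean, and a hard distance threshold `max_dist` declares "unknown". This is not CROSR itself, whose unknown detector the abstract does not detail (OpenMax-style methods, for instance, fit extreme-value models to such distances rather than using a fixed threshold).

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def open_set_predict(logits, latent, class_means, max_dist):
    """Nearest class mean on [logits, latent]; 'unknown' if too far from every known class."""
    feature = list(logits) + list(latent)
    label, dist = min(((c, euclidean(feature, m)) for c, m in class_means.items()),
                      key=lambda t: t[1])
    return label if dist <= max_dist else "unknown"


means = {"cat": [2.0, 0.0, 1.0], "dog": [0.0, 2.0, -1.0]}  # made-up class means
print(open_set_predict([1.9, 0.1], [0.9], means, max_dist=1.0))  # cat
print(open_set_predict([1.0, 1.0], [5.0], means, max_dist=1.0))  # unknown
```

The second input has ambiguous logits and an unusual latent value, so it is far from both known classes and is rejected.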
Recurrent Residual Module for Fast Inference in Videos
Title | Recurrent Residual Module for Fast Inference in Videos |
Authors | Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, Cewu Lu |
Abstract | Deep convolutional neural networks (CNNs) have made impressive progress in many video recognition tasks such as video pose estimation and video object detection. However, CNN inference on video is computationally expensive because dense frames are processed individually. In this work, we propose a framework called Recurrent Residual Module (RRM) to accelerate CNN inference for video recognition tasks. This framework exploits the similarity of the intermediate feature maps of two consecutive frames to largely reduce redundant computation. One unique property of the proposed method compared to previous work is that the feature maps of each frame are computed precisely. The experiments show that, while maintaining similar recognition performance, RRM yields an average 2x acceleration on commonly used CNNs such as AlexNet, ResNet, and deep compression models (thus 8-12x faster than the original dense models using the efficient inference engine), and an impressive 9x acceleration on some binary networks such as XNOR-Nets (thus 500x faster than the original model). We further verify the effectiveness of RRM in speeding up CNNs for video pose estimation and video object detection. |
Tasks | Object Detection, Pose Estimation, Video Object Detection, Video Recognition |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09723v1 |
PDF | http://arxiv.org/pdf/1802.09723v1.pdf |
PWC | https://paperswithcode.com/paper/recurrent-residual-module-for-fast-inference |
Repo | |
Framework | |
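Why reusing the previous frame's features can still give exact results can be sketched for a linear layer: convolution is linear, so conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}), and the difference term is mostly zeros when consecutive frames are similar. This toy uses a hypothetical 1-D "valid" convolution; real savings would need a sparse implementation that skips the zero entries, and how RRM handles nonlinear layers is not described in the abstract.

```python
def conv1d(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers), no bias."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def incremental_conv(prev_input, prev_output, curr_input, kernel):
    """Reuse the previous output: conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1})."""
    delta = [c - p for c, p in zip(curr_input, prev_input)]
    delta_out = conv1d(delta, kernel)  # cheap if delta is sparse (frames barely changed)
    return [po + do for po, do in zip(prev_output, delta_out)]


prev_x = [1, 2, 3, 4, 5]
curr_x = [1, 2, 4, 4, 5]  # only one "pixel" changed between frames
kernel = [1, 0, -1]
prev_y = conv1d(prev_x, kernel)
print(incremental_conv(prev_x, prev_y, curr_x, kernel))  # [-3, -2, -1]
```

The incremental result equals a full recomputation on the new frame, even though the difference vector has a single nonzero entry.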
Large-Scale Unsupervised Deep Representation Learning for Brain Structure
Title | Large-Scale Unsupervised Deep Representation Learning for Brain Structure |
Authors | Ayush Jaiswal, Dong Guo, Cauligi S. Raghavendra, Paul Thompson |
Abstract | Machine Learning (ML) is increasingly being used for computer-aided diagnosis of brain-related disorders based on structural magnetic resonance imaging (MRI) data. Most such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight and careful quality control to ensure the absence of errors in the computational process. Recent advances in Deep Representation Learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learn generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure that can be used to reconstruct brain images with very low error and that exhibit performance comparable to FreeSurfer features on various classification tasks. |
Tasks | Representation Learning |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.01049v1 |
PDF | http://arxiv.org/pdf/1805.01049v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-unsupervised-deep-representation |
Repo | |
Framework | |
Maximizing Expected Impact in an Agent Reputation Network – Technical Report
Title | Maximizing Expected Impact in an Agent Reputation Network – Technical Report |
Authors | Gavin Rens, Abhaya Nayak, Thomas Meyer |
Abstract | Many multi-agent systems (MASs) are situated in stochastic environments. Some such systems that are based on the partially observable Markov decision process (POMDP) do not take the benevolence of other agents for granted. We propose a new POMDP-based framework which is general enough for the specification of a variety of stochastic MAS domains involving the impact of agents on each other’s reputations. A unique feature of this framework is that actions are specified as either undirected (regular) or directed (towards a particular agent), and a new directed transition function is provided for modeling the effects of reputation in interactions. Assuming that an agent must maintain a good enough reputation to survive in the network, a planning algorithm is developed for an agent to select optimal actions in stochastic MASs. Preliminary evaluation is provided via an example specification and by determining the algorithm’s complexity. |
Tasks | |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05230v1 |
PDF | http://arxiv.org/pdf/1805.05230v1.pdf |
PWC | https://paperswithcode.com/paper/maximizing-expected-impact-in-an-agent |
Repo | |
Framework | |