Paper Group ANR 663
Hierarchical Representation Learning for Kinship Verification
Title | Hierarchical Representation Learning for Kinship Verification |
Authors | Naman Kohli, Mayank Vatsa, Richa Singh, Afzel Noore, Angshul Majumdar |
Abstract | Kinship verification has a number of applications such as organizing large collections of images and recognizing resemblances among humans. In this research, first, a human study is conducted to understand the capabilities of the human mind and to identify the discriminatory areas of a face that facilitate kinship cues. Utilizing the information obtained from the human study, a hierarchical Kinship Verification via Representation Learning (KVRL) framework is utilized to learn the representation of different face regions in an unsupervised manner. We propose a novel approach for feature representation termed filtered contractive deep belief networks (fcDBN). The proposed feature representation encodes relational information present in images using filters and a contractive regularization penalty. A compact representation of facial images of kin is extracted as an output from the learned model and a multi-layer neural network is utilized to verify the kin accurately. A new WVU Kinship Database is created which consists of multiple images per subject to facilitate kinship verification. The results show that the proposed deep learning framework (KVRL-fcDBN) yields state-of-the-art kinship verification accuracy on the WVU Kinship database and on four existing benchmark datasets. Further, kinship information is used as a soft biometric modality to boost the performance of face verification via product of likelihood ratio and support vector machine based approaches. Using the proposed KVRL-fcDBN framework, an improvement of over 20% is observed in the performance of face verification. |
Tasks | Face Verification, Representation Learning |
Published | 2018-05-27 |
URL | http://arxiv.org/abs/1805.10557v1 |
http://arxiv.org/pdf/1805.10557v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-representation-learning-for |
Repo | |
Framework | |
Learning with Non-Convex Truncated Losses by SGD
Title | Learning with Non-Convex Truncated Losses by SGD |
Authors | Yi Xu, Shenghuo Zhu, Sen Yang, Chi Zhang, Rong Jin, Tianbao Yang |
Abstract | Learning with a convex loss function has been a dominating paradigm for many years. It remains an interesting question how non-convex loss functions help improve the generalization of learning with broad applicability. In this paper, we study a family of objective functions formed by truncating traditional loss functions, which is applicable to both shallow learning and deep learning. Truncating loss functions has the potential to be less vulnerable and more robust to large noise in observations that could be adversarial. More importantly, it is a generic technique without assuming the knowledge of noise distribution. To justify non-convex learning with truncated losses, we establish excess risk bounds of empirical risk minimization based on truncated losses for heavy-tailed output, and statistical error of an approximate stationary point found by stochastic gradient descent (SGD) method. Our experiments for shallow and deep learning for regression with outliers, corrupted data and heavy-tailed noise further justify the proposed method. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07880v1 |
http://arxiv.org/pdf/1805.07880v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-with-non-convex-truncated-losses-by |
Repo | |
Framework | |
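A minimal sketch of the idea in the abstract above, under stated assumptions: the abstract does not name a specific truncation function, so the log-based form `alpha * log(1 + loss / alpha)`, the toy data, and the helper names (`truncate`, `truncate_slope`, `sgd_slope`) are illustrative choices, not the authors' implementation. The sketch fits a one-parameter regression with SGD on a truncated squared loss and compares it with ordinary least squares on data containing one gross outlier.

```python
import math
import random


def truncate(loss, alpha):
    """Smooth truncation (assumed form): close to `loss` when loss << alpha,
    but growing only logarithmically for large losses."""
    return alpha * math.log1p(loss / alpha)


def truncate_slope(loss, alpha):
    """Derivative of truncate() with respect to `loss`; it shrinks toward 0
    for very large losses, which damps the influence of outliers."""
    return 1.0 / (1.0 + loss / alpha)


def sgd_slope(data, alpha, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x by SGD on the truncated squared loss (non-convex in w)."""
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            residual = w * x - y
            loss = 0.5 * residual ** 2
            # Chain rule: d truncate / d w = truncate'(loss) * d loss / d w.
            w -= lr * truncate_slope(loss, alpha) * residual * x
    return w


# Hypothetical toy data: clean points lie on y = 2x; the last one is an outlier.
DATA = [(-2.0, -4.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 4.0), (1.0, 50.0)]

# Closed-form least-squares slope through the origin, for comparison.
least_squares = sum(x * y for x, y in DATA) / sum(x * x for x, _ in DATA)
robust = sgd_slope(DATA, alpha=1.0)
print(f"least squares: {least_squares:.2f}, truncated SGD: {robust:.2f}")
```

The outlier drags the least-squares slope far from 2, while its gradient under the truncated loss is scaled down by `truncate_slope`, so the SGD iterate settles near the slope of the clean points.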
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers
Title | Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers |
Authors | Emre Ozfatura, Deniz Gunduz, Sennur Ulukus |
Abstract | Distributed gradient descent (DGD) is an efficient way of implementing gradient descent (GD), especially for large data sets, by dividing the computation tasks into smaller subtasks and assigning them to different computing servers (CSs) to be executed in parallel. In standard parallel execution, per-iteration waiting time is limited by the execution time of the straggling servers. Coded DGD techniques have been introduced recently, which can tolerate straggling servers via assigning redundant computation tasks to the CSs. In most of the existing DGD schemes, either with coded computation or coded communication, the non-straggling CSs transmit one message per iteration once they complete all their assigned computation tasks. However, although the straggling servers cannot complete all their assigned tasks, they are often able to complete a certain portion of them. In this paper, we allow multiple transmissions from each CS at each iteration in order to make sure a maximum number of completed computations can be reported to the aggregating server (AS), including the straggling servers. We numerically show that the average completion time per iteration can be reduced significantly by slightly increasing the communication load per server. |
Tasks | |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02240v3 |
http://arxiv.org/pdf/1808.02240v3.pdf | |
PWC | https://paperswithcode.com/paper/speeding-up-distributed-gradient-descent-by |
Repo | |
Framework | |
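The difference between one message per server and multiple messages per server, as described in the abstract above, can be sketched with a toy timing model. This is a simplified, uncoded illustration and not the paper's scheme: the cyclic task assignment, the finish times, and the function name `completion_time` are all assumptions made for the example. The aggregating server is considered done once it has received a result for every data partition.

```python
def completion_time(assignments, finish_times, multi_message):
    """Time at which the aggregating server has one result per partition.

    assignments[s]  : partition ids in the order server s computes them
    finish_times[s] : time at which server s finishes its j-th task
    multi_message   : if True, each result is sent as soon as it is ready;
                      if False, a server sends everything after its last task
    """
    events = []
    for tasks, times in zip(assignments, finish_times):
        for part, t in zip(tasks, times):
            report_time = t if multi_message else times[-1]
            events.append((report_time, part))
    events.sort()

    needed = {part for tasks in assignments for part in tasks}
    seen = set()
    for t, part in events:
        seen.add(part)
        if seen == needed:
            return t
    raise ValueError("some partition was never computed")


# Hypothetical setup: 3 servers, 3 partitions, each partition assigned twice.
ASSIGNMENTS = [[0, 1], [1, 2], [2, 0]]
# Servers 1 and 2 finish their first task quickly but straggle on the second.
FINISH_TIMES = [[1.0, 2.0], [1.0, 8.0], [1.5, 9.0]]

single = completion_time(ASSIGNMENTS, FINISH_TIMES, multi_message=False)
multi = completion_time(ASSIGNMENTS, FINISH_TIMES, multi_message=True)
print(f"one message per server: {single}, multiple messages: {multi}")
```

In this toy model the partial work of the two stragglers is wasted under the single-message rule, whereas the multi-message rule uses it, at the cost of more transmissions per server.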
Analysis of Invariance and Robustness via Invertibility of ReLU-Networks
Title | Analysis of Invariance and Robustness via Invertibility of ReLU-Networks |
Authors | Jens Behrmann, Sören Dittmer, Pascal Fernsel, Peter Maaß |
Abstract | Studying the invertibility of deep neural networks (DNNs) provides a principled approach to better understand the behavior of these powerful models. Despite being a promising diagnostic tool, a consistent theory on their invertibility is still lacking. We derive a theoretically motivated approach to explore the preimages of ReLU-layers and mechanisms affecting the stability of the inverse. Using the developed theory, we numerically show how this approach uncovers characteristic properties of the network. |
Tasks | |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09730v2 |
http://arxiv.org/pdf/1806.09730v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-invariance-and-robustness-via |
Repo | |
Framework | |
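As a small illustration of what a preimage of a ReLU layer looks like: for y = ReLU(Ax + b), an input x maps to y exactly when the pre-activation equals y on the active coordinates (y_i > 0) and is non-positive on the inactive ones (y_i = 0). The sketch below checks this condition directly; the function names and the 2x2 example are hypothetical and are not taken from the paper.

```python
def pre_activation(A, b, x):
    """Compute Ax + b for a matrix given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) + b_i for row, b_i in zip(A, b)]


def relu_layer(A, b, x):
    """Forward pass of a single ReLU layer."""
    return [max(0.0, p) for p in pre_activation(A, b, x)]


def in_preimage(A, b, y, x, tol=1e-9):
    """True if relu_layer(A, b, x) equals y (up to tol).

    Active outputs (y_i > 0) give equality constraints on Ax + b;
    inactive outputs (y_i == 0) only require (Ax + b)_i <= 0.
    """
    for p, y_i in zip(pre_activation(A, b, x), y):
        if y_i > 0:
            if abs(p - y_i) > tol:
                return False
        elif p > tol:
            return False
    return True


# Hypothetical example: identity weights, zero bias.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
y = relu_layer(A, b, [1.0, -2.0])  # second unit is inactive

print(in_preimage(A, b, y, [1.0, -5.0]))  # different input, same output
print(in_preimage(A, b, y, [1.0, 0.5]))   # second unit would become active
```

The inactive unit makes the layer non-injective here: every input of the form (1, t) with t <= 0 produces the same output.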
Testing the Generalization Power of Neural Network Models Across NLI Benchmarks
Title | Testing the Generalization Power of Neural Network Models Across NLI Benchmarks |
Authors | Aarne Talman, Stergios Chatzikyriakidis |
Abstract | Neural network models have been very successful in natural language inference, with the best models reaching 90% accuracy in some benchmarks. However, the success of these models turns out to be largely benchmark specific. We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmarks is the same or similar. We train six high performing neural network models on different datasets and show that each one of these has problems generalizing when we replace the original test set with a test set taken from another corpus designed for the same task. In light of these results, we argue that most of the current neural network models are not able to generalize well in the task of natural language inference. We find that using large pre-trained language models helps with transfer learning when the datasets are similar enough. Our results also highlight that the current NLI datasets do not cover the different nuances of inference extensively enough. |
Tasks | Natural Language Inference, Transfer Learning |
Published | 2018-10-23 |
URL | https://arxiv.org/abs/1810.09774v3 |
https://arxiv.org/pdf/1810.09774v3.pdf | |
PWC | https://paperswithcode.com/paper/testing-the-generalization-power-of-neural |
Repo | |
Framework | |
From direct tagging to Tagging with sentences compression
Title | From direct tagging to Tagging with sentences compression |
Authors | Peihui Chen |
Abstract | In essence, the two tagging methods (direct tagging and tagging with sentences compression) tag the information we need using regular expressions based on the inherent patterns of natural language. Although direct tagging has many advantages in extracting regular data, it is not applicable in some situations. If the data we need to extract is not regular but its surrounding words are relatively regular, we can use information compression to cut the information we do not need before tagging the data we need. In this way we can increase the precision of the extracted data without undermining its recall. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02741v1 |
http://arxiv.org/pdf/1810.02741v1.pdf | |
PWC | https://paperswithcode.com/paper/from-direct-tagging-to-tagging-with-sentences |
Repo | |
Framework | |
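The two approaches described in the abstract above can be sketched with Python's standard `re` module. The example sentence, the tagging pattern, and the compression rule (dropping a comma-delimited "who/which" clause) are hypothetical illustrations and are not taken from the paper.

```python
import re

# Hypothetical input: the word we want ("aspirin") follows a regular cue
# ("patient was given"), but an inserted clause breaks the cue apart.
TEXT = "The patient, who had arrived late on Monday, was given aspirin twice daily."

# Tagging pattern: captures the word right after the regular cue.
TAG = re.compile(r"patient was given (?P<drug>\w+)")

# Compression rule (assumed): remove a comma-delimited relative clause.
CLAUSE = re.compile(r",\s*(?:who|which)\b[^,]*,")


def direct_tag(text):
    """Apply the tagging pattern to the raw text."""
    match = TAG.search(text)
    return match.group("drug") if match else None


def compress(text):
    """Cut the information we do not need before tagging."""
    return CLAUSE.sub("", text)


def tag_with_compression(text):
    """Compress first, then apply the same tagging pattern."""
    return direct_tag(compress(text))


print(direct_tag(TEXT))
print(compress(TEXT))
print(tag_with_compression(TEXT))
```

Direct tagging misses the match because the cue words are not adjacent in the raw sentence; after compression the same pattern applies unchanged, so the tagging rule itself does not need to be loosened.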
Secure Mobile Crowdsensing with Deep Learning
Title | Secure Mobile Crowdsensing with Deep Learning |
Authors | Liang Xiao, Donghua Jiang, Dongjin Xu, Ning An |
Abstract | In order to stimulate secure sensing for Internet of Things (IoT) applications such as healthcare and traffic monitoring, mobile crowdsensing (MCS) systems have to address security threats, such as jamming, spoofing and faked sensing attacks, during both the sensing and the information exchange processes in large-scale dynamic and heterogeneous networks. In this article, we investigate secure mobile crowdsensing and present how to use deep learning (DL) methods such as stacked autoencoder (SAE), deep neural network (DNN), and convolutional neural network (CNN) to improve the MCS security approaches including authentication, privacy protection, faked sensing countermeasures, intrusion detection and anti-jamming transmissions in MCS. We discuss the performance gain of these DL-based approaches compared with traditional security schemes and identify the challenges that need to be addressed to implement them in practical MCS systems. |
Tasks | Intrusion Detection |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07379v1 |
http://arxiv.org/pdf/1801.07379v1.pdf | |
PWC | https://paperswithcode.com/paper/secure-mobile-crowdsensing-with-deep-learning |
Repo | |
Framework | |
Co-occurrence matrix analysis-based semi-supervised training for object detection
Title | Co-occurrence matrix analysis-based semi-supervised training for object detection |
Authors | Min-Kook Choi, Jaehyeong Park, Jihun Jung, Heechul Jung, Jin-Hee Lee, Woong Jae Won, Woo Young Jung, Jincheol Kim, Soon Kwon |
Abstract | One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data accompanying human judgment. Particularly, in object detection or semantic segmentation, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object detection, which makes use of automatic labeling of un-annotated data by applying a network previously trained from an annotated dataset. Because an inferred label by the trained network is dependent on the learned parameters, it is often meaningless for re-training the network. To transfer a valuable inferred label to the unlabeled data, we propose a re-alignment method based on co-occurrence matrix analysis that takes into account one-hot-vector encoding of the estimated label and the correlation between the objects in the image. We used an MS-COCO detection dataset to verify the performance of the proposed SSL method and deformable neural networks (D-ConvNets) as an object detector for basic training. The performance of the existing state-of-the-art detectors (D-ConvNets, YOLO v2, and single shot multi-box detector (SSD)) can be improved by the proposed SSL method without using additional model parameters or modifying the network architecture. |
Tasks | Object Detection, Object Recognition, Semantic Segmentation |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.06964v1 |
http://arxiv.org/pdf/1802.06964v1.pdf | |
PWC | https://paperswithcode.com/paper/co-occurrence-matrix-analysis-based-semi |
Repo | |
Framework | |
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Title | Importance Sampling Policy Evaluation with an Estimated Behavior Policy |
Authors | Josiah P. Hanna, Scott Niekum, Peter Stone |
Abstract | We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different, behavior policy. Importance sampling is a technique for off-policy evaluation that re-weights off-policy returns to account for differences in the likelihood of the returns between the two policies. In this paper, we study importance sampling with an estimated behavior policy where the behavior policy estimate comes from the same set of data used to compute the importance sampling estimate. We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set. Intuitively, estimating the behavior policy in this way corrects for error due to sampling in the action-space. Our empirical results also extend to other popular variants of importance sampling and show that estimating a non-Markovian behavior policy can further lower large-sample mean squared error even when the true behavior policy is Markovian. |
Tasks | |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.01347v3 |
https://arxiv.org/pdf/1806.01347v3.pdf | |
PWC | https://paperswithcode.com/paper/importance-sampling-policy-evaluation-with-an |
Repo | |
Framework | |
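The intuition in the abstract above, that estimating the behavior policy from the same data corrects for sampling error in the action space, can be seen in a one-step toy problem. This is a deliberate simplification: the paper considers Markov decision processes, while the sketch uses a two-action bandit with deterministic rewards, a count-based policy estimate, and hypothetical names (`is_estimate`, `estimate_behavior`).

```python
from collections import Counter


def is_estimate(data, pi_e, pi_b):
    """Ordinary importance sampling for one-step episodes:
    the mean of (pi_e(a) / pi_b(a)) * reward over the logged data."""
    return sum(pi_e[a] / pi_b[a] * r for a, r in data) / len(data)


def estimate_behavior(data):
    """Count-based estimate of the behavior policy from the same data."""
    counts = Counter(a for a, _ in data)
    return {a: c / len(data) for a, c in counts.items()}


# Hypothetical setup: action 0 always pays 1, action 1 always pays 3.
PI_E = {0: 0.2, 1: 0.8}          # evaluation policy
TRUE_PI_B = {0: 0.5, 1: 0.5}     # true behavior policy
TRUE_VALUE = 0.2 * 1.0 + 0.8 * 3.0

# A small logged sample in which action 0 happens to be over-represented.
DATA = [(0, 1.0), (0, 1.0), (0, 1.0), (1, 3.0)]

with_true = is_estimate(DATA, PI_E, TRUE_PI_B)
with_estimated = is_estimate(DATA, PI_E, estimate_behavior(DATA))
print(f"true value: {TRUE_VALUE:.2f}")
print(f"IS with true behavior policy: {with_true:.2f}")
print(f"IS with estimated behavior policy: {with_estimated:.2f}")
```

Because the rewards in this toy problem are deterministic, the only source of error is the uneven action sample, and the empirical action frequencies cancel it exactly. With stochastic rewards the correction would be partial rather than exact.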
DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain
Title | DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain |
Authors | Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. Rivas, Carlos D. Bustamante, James Zou |
Abstract | Large scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. In order to reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends multi-task LSTM with an improved hierarchical objective that captures the semantic structures between diseases. To foster human-machine collaboration, DeepTag also learns to abstain in examples when it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal pre-processing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10722v2 |
http://arxiv.org/pdf/1806.10722v2.pdf | |
PWC | https://paperswithcode.com/paper/deeptag-inferring-all-cause-diagnoses-from |
Repo | |
Framework | |
Dynamic Control of Explore/Exploit Trade-Off In Bayesian Optimization
Title | Dynamic Control of Explore/Exploit Trade-Off In Bayesian Optimization |
Authors | Dipti Jasrasaria, Edward O. Pyzer-Knapp |
Abstract | Bayesian optimization offers the possibility of optimizing black-box operations not accessible through traditional techniques. The success of Bayesian optimization methods such as Expected Improvement (EI) is significantly affected by the degree of trade-off between exploration and exploitation. Too much exploration can lead to inefficient optimization protocols, whilst too much exploitation leaves the protocol open to strong initial biases, and a high chance of getting stuck in a local minimum. Typically, a constant margin is used to control this trade-off, which results in yet another hyper-parameter to be optimized. We propose contextual improvement as a simple, yet effective heuristic to counter this - achieving a one-shot optimization strategy. Our proposed heuristic can be swiftly calculated and improves both the speed and robustness of discovery of optimal solutions. We demonstrate its effectiveness on both synthetic and real world problems and explore the unaccounted for uncertainty in the pre-determination of search hyperparameters controlling explore-exploit trade-off. |
Tasks | |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01279v1 |
http://arxiv.org/pdf/1807.01279v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-control-of-exploreexploit-trade-off |
Repo | |
Framework | |
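The constant margin mentioned in the abstract above can be illustrated with the standard closed form of Expected Improvement under a Gaussian posterior. The sketch below uses the maximization convention and a margin parameter named `xi`; the two candidate points are hypothetical, and the paper's contextual improvement heuristic is not implemented here because the abstract does not define it.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization with posterior mean `mu`, posterior standard
    deviation `sigma`, incumbent value `best`, and constant margin `xi`."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


BEST = 1.0
EXPLOIT = (1.1, 0.05)  # slightly better mean, very little uncertainty
EXPLORE = (0.7, 0.4)   # worse mean, much more uncertainty

for xi in (0.0, 0.2):
    ei_exploit = expected_improvement(*EXPLOIT, BEST, xi)
    ei_explore = expected_improvement(*EXPLORE, BEST, xi)
    choice = "exploit" if ei_exploit > ei_explore else "explore"
    print(f"xi={xi}: exploit={ei_exploit:.4f}, explore={ei_explore:.4f} -> {choice}")
```

With no margin the low-uncertainty candidate wins; a larger margin shifts the preference toward the uncertain candidate. This sensitivity is why the margin acts as an extra hyper-parameter when it is held constant.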
Dual-Primal Graph Convolutional Networks
Title | Dual-Primal Graph Convolutional Networks |
Authors | Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, Michael M. Bronstein |
Abstract | In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows learning both vertex and edge features and generalizes the previous graph attention (GAT) model. We provide extensive experimental validation showing state-of-the-art results on a variety of tasks tested on established graph benchmarks, including CORA and Citeseer citation networks as well as MovieLens, Flixter, Douban and Yahoo Music graph-guided recommender systems. |
Tasks | Recommendation Systems |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00770v1 |
http://arxiv.org/pdf/1806.00770v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-primal-graph-convolutional-networks |
Repo | |
Framework | |
Learning Comment Generation by Leveraging User-Generated Data
Title | Learning Comment Generation by Leveraging User-Generated Data |
Authors | Zhaojiang Lin, Genta Indra Winata, Pascale Fung |
Abstract | Existing models on open-domain comment generation are difficult to train, and they produce repetitive and uninteresting responses. The problem is caused by multiple and contradictory responses to a single article, and by the rigidity of retrieval methods. To solve this problem, we propose a combined approach to retrieval and generation methods. We propose an attentive scorer to retrieve informative and relevant comments by leveraging user-generated data. Then, we use such comments, together with the article, as input for a sequence-to-sequence model with copy mechanism. We show the robustness of our model and how it can alleviate the aforementioned issue by using a large scale comment generation dataset. The result shows that the proposed generative model significantly outperforms strong baselines such as Seq2Seq with attention and Information Retrieval models by around 27 and 30 BLEU-1 points respectively. |
Tasks | Information Retrieval |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12264v2 |
http://arxiv.org/pdf/1810.12264v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-comment-generation-by-leveraging |
Repo | |
Framework | |
Global Navigation Using Predictable and Slow Feature Analysis in Multiroom Environments, Path Planning and Other Control Tasks
Title | Global Navigation Using Predictable and Slow Feature Analysis in Multiroom Environments, Path Planning and Other Control Tasks |
Authors | Stefan Richthofer, Laurenz Wiskott |
Abstract | Extended Predictable Feature Analysis (PFAx) [Richthofer and Wiskott, 2017] is an extension of PFA [Richthofer and Wiskott, 2015] that allows generating a goal-directed control signal of an agent whose dynamics has previously been learned during a training phase in an unsupervised manner. PFAx hardly requires assumptions or prior knowledge of the agent’s sensor or control mechanics, or of the environment. It selects features from a high-dimensional input by intrinsic predictability and organizes them into a reasonably low-dimensional model. While PFA obtains a well predictable model, PFAx yields a model ideally suited for manipulations with predictable outcome. This allows for goal-directed manipulation of an agent and thus for local navigation, i.e. for reaching states where intermediate actions can be chosen by a permanent descent of distance to the goal. The approach is limited when it comes to global navigation, e.g. involving obstacles or multiple rooms. In this article, we extend theoretical results from [Sprekeler and Wiskott, 2008], enabling PFAx to perform stable global navigation. So far, the most widely exploited characteristic of Slow Feature Analysis (SFA) was that slowness yields invariances. We focus on another fundamental characteristic of slow signals: They tend to yield monotonicity and one significant property of monotonicity is that local optimization is sufficient to find a global optimum. We present an SFA-based algorithm that structures an environment such that navigation tasks hierarchically decompose into subgoals. Each of these can be efficiently achieved by PFAx, yielding an overall global solution of the task. The algorithm needs to explore and process an environment only once and can then perform all sorts of navigation tasks efficiently. We support this algorithm by mathematical theory and apply it to different problems. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08565v1 |
http://arxiv.org/pdf/1805.08565v1.pdf | |
PWC | https://paperswithcode.com/paper/global-navigation-using-predictable-and-slow |
Repo | |
Framework | |
Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer
Title | Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer |
Authors | Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, Alec Jacobson |
Abstract | Many machine learning image classifiers are vulnerable to adversarial attacks, inputs with perturbations designed to intentionally trigger misclassification. Current adversarial methods directly alter pixel colors and evaluate against pixel norm-balls: pixel perturbations smaller than a specified magnitude, according to a measurement norm. This evaluation, however, has limited practical utility since perturbations in the pixel space do not correspond to the underlying real-world phenomena of image formation that lead to them and have no security motivation attached. Pixels in natural images are measurements of light that has interacted with the geometry of a physical scene. As such, we propose a novel evaluation measure, parametric norm-balls, by directly perturbing the physical parameters that underlie image formation: lighting and geometry. One enabling contribution we present is a physically-based differentiable renderer that allows us to propagate pixel gradients to the parametric space of lighting and geometry. Our approach enables physically-based adversarial attacks, and our differentiable renderer leverages models from the interactive rendering literature to balance the performance and accuracy trade-offs necessary for a memory-efficient and scalable adversarial data augmentation workflow. |
Tasks | Data Augmentation |
Published | 2018-08-08 |
URL | http://arxiv.org/abs/1808.02651v2 |
http://arxiv.org/pdf/1808.02651v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-pixel-norm-balls-parametric |
Repo | |
Framework | |