Paper Group ANR 990
Fairness GAN. Unsupervised Features for Facial Expression Intensity Estimation over Time. Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training. A Multi-variable Stacked Long-Short Term Memory Network for Wind Speed Forecasting. Data Summarization at Scale: A Two-Stage Submodular Approach. A Storm in an IoT Cup: The Emergen …
Fairness GAN
Title | Fairness GAN |
Authors | Prasanna Sattigeri, Samuel C. Hoffman, Vijil Chenthamarakshan, Kush R. Varshney |
Abstract | In this paper, we introduce the Fairness GAN, an approach for generating a dataset that is plausibly similar to a given multimedia dataset, but is more fair with respect to protected attributes in allocative decision making. We propose a novel auxiliary classifier GAN that strives for demographic parity or equality of opportunity and show empirical results on several datasets, including the CelebFaces Attributes (CelebA) dataset, the Quick, Draw! dataset, and a dataset of soccer player images and the offenses they were called for. The proposed formulation is well-suited to absorbing unlabeled data; we leverage this to augment the soccer dataset with the much larger CelebA dataset. The methodology tends to improve demographic parity and equality of opportunity while generating plausible images. |
Tasks | Decision Making |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09910v1 |
http://arxiv.org/pdf/1805.09910v1.pdf | |
PWC | https://paperswithcode.com/paper/fairness-gan |
Repo | |
Framework | |
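The group-fairness criteria this entry targets, demographic parity and equality of opportunity, can be made concrete with a small NumPy sketch. The grouping variable, sample sizes, and binary predictions below are illustrative assumptions; the snippet measures the criteria rather than implementing the paper's GAN.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute difference in true-positive rates between two groups."""
    y_pred, y_true, group = map(np.asarray, (y_pred, y_true, group))
    pos = y_true == 1
    return abs(y_pred[pos & (group == 0)].mean() - y_pred[pos & (group == 1)].mean())

# Toy usage with binary predictions and a binary protected attribute.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
print(demographic_parity_gap(y_pred, group))
print(equal_opportunity_gap(y_pred, y_true, group))
```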
Unsupervised Features for Facial Expression Intensity Estimation over Time
Title | Unsupervised Features for Facial Expression Intensity Estimation over Time |
Authors | Maren Awiszus, Stella Graßhof, Felix Kuhnke, Jörn Ostermann |
Abstract | The diversity of facial shapes and motions among persons is one of the greatest challenges for automatic analysis of facial expressions. In this paper, we propose a feature describing expression intensity over time, while being invariant to person and the type of performed expression. Our feature is a weighted combination of the dynamics of multiple points adapted to the overall expression trajectory. We evaluate our method on several tasks all related to temporal analysis of facial expression. The proposed feature is compared to a state-of-the-art method for expression intensity estimation, which it outperforms. We use our proposed feature to temporally align multiple sequences of recorded 3D facial expressions. Furthermore, we show how our feature can be used to reveal person-specific differences in performances of facial expressions. Additionally, we apply our feature to identify the local changes in face video sequences based on action unit labels. For all the experiments our feature proves to be robust against noise and outliers, making it applicable to a variety of applications for analysis of facial movements. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00780v2 |
http://arxiv.org/pdf/1805.00780v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-features-for-facial-expression |
Repo | |
Framework | |
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
Title | Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training |
Authors | Saurabh Adya, Vinay Palakkode, Oncel Tuzel |
Abstract | Nonlinear conjugate gradient (NLCG) based optimizers have shown superior loss convergence properties compared to gradient descent based optimizers for traditional optimization problems. However, in Deep Neural Network (DNN) training, the dominant optimization algorithm of choice is still Stochastic Gradient Descent (SGD) and its variants. In this work, we propose and evaluate the stochastic preconditioned nonlinear conjugate gradient algorithm for large scale DNN training tasks. We show that a nonlinear conjugate gradient algorithm improves the convergence speed of DNN training, especially in the large mini-batch scenario, which is essential for scaling synchronous distributed DNN training to a large number of workers. We show how to efficiently use second-order information in the NLCG pre-conditioner for improving DNN training convergence. For the ImageNet classification task, at extremely large mini-batch sizes of greater than 65k, the NLCG optimizer is able to improve top-1 accuracy by more than 10 percentage points for standard training of the Resnet-50 model for 90 epochs. For the CIFAR-100 classification task, at extremely large mini-batch sizes of greater than 16k, the NLCG optimizer is able to improve top-1 accuracy by more than 15 percentage points for standard training of the Resnet-32 model for 200 epochs. |
Tasks | |
Published | 2018-12-07 |
URL | https://arxiv.org/abs/1812.02886v2 |
https://arxiv.org/pdf/1812.02886v2.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-conjugate-gradients-for-scaling |
Repo | |
Framework | |
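For readers unfamiliar with nonlinear conjugate gradients, the sketch below runs a Polak-Ribière+ NLCG loop with a backtracking line search on a toy deterministic objective. The test function, step-size schedule, and restart rule are assumptions for illustration; the paper's optimizer additionally uses stochastic mini-batch gradients and a second-order preconditioner, which this sketch omits.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosenbrock_grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def nlcg(f, grad, x0, iters=200):
    x = x0.astype(float)
    g = grad(x)
    d = -g                                  # start with steepest descent
    for _ in range(iters):
        if g.dot(d) >= 0:                   # safeguard: restart if not a descent direction
            d = -g
        step, c = 1.0, 1e-4                 # backtracking (Armijo) line search
        while f(x + step * d) > f(x) + c * step * g.dot(d) and step > 1e-12:
            step *= 0.5
        x_new = x + step * d
        g_new = grad(x_new)
        # Polak-Ribiere+ coefficient; clipping at zero acts as an automatic restart.
        beta = max(0.0, g_new.dot(g_new - g) / g.dot(g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

print(nlcg(rosenbrock, rosenbrock_grad, np.array([-1.2, 1.0])))
```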
A Multi-variable Stacked Long-Short Term Memory Network for Wind Speed Forecasting
Title | A Multi-variable Stacked Long-Short Term Memory Network for Wind Speed Forecasting |
Authors | Sisheng Liang, Long Nguyen, Fang Jin |
Abstract | Precisely forecasting wind speed is essential for wind power producers and grid operators. However, this task is challenging due to the stochasticity of wind speed. To accurately predict short-term wind speed under uncertainties, this paper proposes a multi-variable stacked LSTM model (MSLSTM). The proposed method utilizes multiple historical meteorological variables, such as wind speed, temperature, humidity, pressure, dew point and solar radiation, to accurately predict wind speed. The prediction performance is extensively assessed using real data collected in West Texas, USA. The experimental results show that the proposed MSLSTM can preferably capture and learn uncertainties while delivering competitive performance. |
Tasks | |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09735v1 |
http://arxiv.org/pdf/1811.09735v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-variable-stacked-long-short-term |
Repo | |
Framework | |
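A minimal PyTorch sketch of a stacked LSTM over multivariate weather inputs is shown below, assuming six input variables, a 24-step history window, and a single-step wind-speed target; the layer sizes and last-time-step pooling are illustrative choices, not the paper's exact MSLSTM configuration.

```python
import torch
import torch.nn as nn

class StackedLSTMForecaster(nn.Module):
    """Two stacked LSTM layers over multivariate weather inputs, predicting
    the next wind-speed value from a fixed-length history window."""
    def __init__(self, n_features=6, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # regress from the last time step

# Toy usage: 24-step windows of 6 variables (speed, temperature, humidity, ...).
model = StackedLSTMForecaster()
window = torch.randn(8, 24, 6)
print(model(window).shape)                 # torch.Size([8, 1])
```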
Data Summarization at Scale: A Two-Stage Submodular Approach
Title | Data Summarization at Scale: A Two-Stage Submodular Approach |
Authors | Marko Mitrovic, Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi |
Abstract | The sheer scale of modern datasets has resulted in a dire need for summarization techniques that identify representative elements in a dataset. Fortunately, the vast majority of data summarization tasks satisfy an intuitive diminishing returns condition known as submodularity, which allows us to find nearly-optimal solutions in linear time. We focus on a two-stage submodular framework where the goal is to use some given training functions to reduce the ground set so that optimizing new functions (drawn from the same distribution) over the reduced set provides almost as much value as optimizing them over the entire ground set. In this paper, we develop the first streaming and distributed solutions to this problem. In addition to providing strong theoretical guarantees, we demonstrate both the utility and efficiency of our algorithms on real-world tasks including image summarization and ride-share optimization. |
Tasks | Data Summarization |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02815v1 |
http://arxiv.org/pdf/1806.02815v1.pdf | |
PWC | https://paperswithcode.com/paper/data-summarization-at-scale-a-two-stage |
Repo | |
Framework | |
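To make the submodular-summarization objective concrete, the sketch below runs the classical greedy algorithm on a facility-location function over a toy similarity matrix; the similarity kernel and summary size are assumptions, and the paper's actual contribution, the streaming and distributed two-stage algorithms, is not reproduced here.

```python
import numpy as np

def facility_location_gain(sim, selected, candidate):
    """Marginal gain of adding `candidate` to `selected` under the
    facility-location function f(S) = sum_i max_{j in S} sim[i, j]."""
    if not selected:
        return sim[:, candidate].sum()
    current = sim[:, selected].max(axis=1)
    return np.maximum(current, sim[:, candidate]).sum() - current.sum()

def greedy_summarize(sim, k):
    selected = []
    for _ in range(k):
        cands = [c for c in range(sim.shape[1]) if c not in selected]
        gains = [facility_location_gain(sim, selected, c) for c in cands]
        selected.append(cands[int(np.argmax(gains))])
    return selected

# Toy usage: RBF similarities between 50 items, pick a 5-item summary.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
sim = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
print(greedy_summarize(sim, 5))
```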
A Storm in an IoT Cup: The Emergence of Cyber-Physical Social Machines
Title | A Storm in an IoT Cup: The Emergence of Cyber-Physical Social Machines |
Authors | Aastha Madaan, Jason R. C. Nurse, David De Roure, Kieron O’Hara, Wendy Hall, Sadie Creese |
Abstract | The concept of social machines is increasingly being used to characterise various socio-cognitive spaces on the Web. Social machines are human collectives using networked digital technology which initiate real-world processes and activities including human communication, interactions and knowledge creation. As such, they continuously emerge and fade on the Web. The relationship between humans and machines is made more complex by the adoption of Internet of Things (IoT) sensors and devices. The scale, automation, continuous sensing, and actuation capabilities of these devices add an extra dimension to the relationship between humans and machines making it difficult to understand their evolution at either the systemic or the conceptual level. This article describes these new socio-technical systems, which we term Cyber-Physical Social Machines, through different exemplars, and considers the associated challenges of security and privacy. |
Tasks | |
Published | 2018-09-16 |
URL | http://arxiv.org/abs/1809.05904v2 |
http://arxiv.org/pdf/1809.05904v2.pdf | |
PWC | https://paperswithcode.com/paper/a-storm-in-an-iot-cup-the-emergence-of-cyber |
Repo | |
Framework | |
Robust identification of thermal models for in-production High-Performance-Computing clusters with machine learning-based data selection
Title | Robust identification of thermal models for in-production High-Performance-Computing clusters with machine learning-based data selection |
Authors | Federico Pittino, Roberto Diversi, Luca Benini, Andrea Bartolini |
Abstract | Power and thermal management are critical components of High-Performance-Computing (HPC) systems, due to their high power density and large total power consumption. The assessment of thermal dissipation by means of compact models directly from the thermal response of the final device enables more robust and precise thermal control strategies as well as automated diagnosis. However, when dealing with large scale systems “in production”, the accuracy of learned thermal models depends on the dynamics of the power excitation, which in turn depends on the executed workload, and on measurement nonidealities such as quantization. In this paper we show that, using an advanced system identification algorithm, we are able to generate very accurate thermal models (average error lower than our sensors' quantization step of 1 °C) for a large scale HPC system on real workloads for very long time periods. However, we also show that: 1) not all real workloads allow for the identification of a good model; 2) starting from the theory of system identification it is very difficult to evaluate if a trace of data leads to a good estimated model. We then propose and validate a set of techniques based on machine learning and deep learning algorithms for the choice of data traces to be used for model identification. We also show that deep learning techniques are necessary to correctly choose such traces up to 96% of the time. |
Tasks | Quantization |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01865v2 |
http://arxiv.org/pdf/1810.01865v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-identification-of-thermal-models-for |
Repo | |
Framework | |
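A compact thermal model of the kind discussed here can be illustrated with a basic ARX fit by least squares; the model orders, synthetic power trace, and first-order dynamics below are assumptions for the example, and the paper relies on a more advanced system-identification algorithm plus learned data selection rather than this plain fit.

```python
import numpy as np

def fit_arx(temp, power, na=2, nb=2):
    """Fit T[k] = sum_i a_i*T[k-i] + sum_j b_j*P[k-j] by least squares.
    `temp` and `power` are 1-D traces sampled at the same rate."""
    n = max(na, nb)
    rows, targets = [], []
    for k in range(n, len(temp)):
        rows.append(np.concatenate([temp[k - na:k][::-1], power[k - nb:k][::-1]]))
        targets.append(temp[k])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:na], theta[na:]

# Toy usage on a synthetic first-order thermal response to a power step.
t = np.arange(500)
power = (t > 100).astype(float)
temp = np.zeros(len(t))
for k in range(1, len(t)):
    temp[k] = 0.95 * temp[k - 1] + 0.5 * power[k - 1]
a, b = fit_arx(temp, power)
print(a, b)   # coefficients approximately reproduce the simulated dynamics
```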
Shared Multi-Task Imitation Learning for Indoor Self-Navigation
Title | Shared Multi-Task Imitation Learning for Indoor Self-Navigation |
Authors | Junhong Xu, Qiwei Liu, Hanqing Guo, Aaron Kageza, Saeed AlQarni, Shaoen Wu |
Abstract | Deep imitation learning enables robots to learn from expert demonstrations to perform tasks such as lane following or obstacle avoidance. However, in the traditional imitation learning framework, one model only learns one task, and thus it lacks the capability to support a robot in performing various navigation tasks with one model in indoor environments. This paper proposes a new framework, Shared Multi-headed Imitation Learning (SMIL), that allows a robot to perform multiple tasks with one model without switching among different models. We model each task as a sub-policy and design a multi-headed policy to learn the shared information among related tasks by summing up activations from all sub-policies. Compared to single or non-shared multi-headed policies, this framework is able to leverage correlated information among tasks to increase performance. We have implemented this framework using a robot based on NVIDIA TX2 and performed extensive experiments in indoor environments with different baseline solutions. The results demonstrate that SMIL has doubled the performance over the non-shared multi-headed policy. |
Tasks | Imitation Learning |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04503v1 |
http://arxiv.org/pdf/1808.04503v1.pdf | |
PWC | https://paperswithcode.com/paper/shared-multi-task-imitation-learning-for |
Repo | |
Framework | |
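A rough PyTorch analogue of the sharing scheme described in the abstract, a shared encoder feeding several sub-policy heads whose activations are summed, is sketched below; the observation and action dimensions, hidden size, and simple MLP encoder are assumptions, since the paper's robot operates on camera images.

```python
import torch
import torch.nn as nn

class SharedMultiHeadPolicy(nn.Module):
    """Shared MLP encoder with one sub-policy head per task; the output is
    the sum of all sub-policy activations, echoing the sharing scheme above."""
    def __init__(self, obs_dim, action_dim, n_tasks, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, action_dim) for _ in range(n_tasks)])

    def forward(self, obs):
        z = self.encoder(obs)
        return torch.stack([head(z) for head in self.heads], dim=0).sum(dim=0)

# Toy usage: 32-dim observations, 2-dim actions, 4 navigation sub-tasks.
policy = SharedMultiHeadPolicy(obs_dim=32, action_dim=2, n_tasks=4)
print(policy(torch.randn(8, 32)).shape)    # torch.Size([8, 2])
```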
Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport
Title | Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport |
Authors | Théo Lacombe, Marco Cuturi, Steve Oudot |
Abstract | Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. Despite several appealing properties, incorporating PDs in learning pipelines can be challenging because their natural geometry is not Hilbertian. Indeed, this was recently exemplified in a string of papers which show that the simple task of averaging a few PDs can be computationally prohibitive. We propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. This framework builds upon a reformulation of PD metrics as optimal transport (OT) problems. Doing so, we can exploit recent computational advances: the OT problem on a planar grid, when regularized with entropy, is convex and can be solved in linear time using the Sinkhorn algorithm and convolutions. This results in scalable computations that can stream on GPUs. We demonstrate the efficiency of our approach by carrying out clustering with diagram metrics on several thousands of PDs, a scale never seen before in the literature. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08331v2 |
http://arxiv.org/pdf/1805.08331v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-computation-of-means-and-clusters |
Repo | |
Framework | |
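The entropy-regularized OT computation at the core of this framework can be sketched with plain dense Sinkhorn iterations; the point set, cost matrix, and regularization strength below are toy assumptions, and the paper's convolution-based implementation on a planar grid, which is what makes it scale, is not reproduced.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, iters=500):
    """Entropy-regularized OT between histograms a and b with a cost matrix."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()              # transport cost of the regularized plan

# Toy usage: two histograms over points in the plane, stand-ins for
# persistence diagrams pushed onto a planar grid.
rng = np.random.default_rng(0)
pts = rng.uniform(size=(30, 2))
cost = ((pts[:, None] - pts[None]) ** 2).sum(-1)
a = rng.random(30); a /= a.sum()
b = rng.random(30); b /= b.sum()
print(sinkhorn(a, b, cost))
```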
Adversarial Alignment of Class Prediction Uncertainties for Domain Adaptation
Title | Adversarial Alignment of Class Prediction Uncertainties for Domain Adaptation |
Authors | Jeroen Manders, Twan van Laarhoven, Elena Marchiori |
Abstract | We consider unsupervised domain adaptation: given labelled examples from a source domain and unlabelled examples from a related target domain, the goal is to infer the labels of target examples. Under the assumption that features from pre-trained deep neural networks are transferable across related domains, domain adaptation reduces to aligning source and target domain at class prediction uncertainty level. We tackle this problem by introducing a method based on adversarial learning which forces the label uncertainty predictions on the target domain to be indistinguishable from those on the source domain. Pre-trained deep neural networks are used to generate deep features having high transferability across related domains. We perform an extensive experimental analysis of the proposed method over a wide set of publicly available pre-trained deep neural networks. Results of our experiments on domain adaptation tasks for image classification show that class prediction uncertainty alignment with features extracted from pre-trained deep neural networks provides an efficient, robust and effective method for domain adaptation. |
Tasks | Domain Adaptation, Image Classification, Unsupervised Domain Adaptation |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04448v2 |
http://arxiv.org/pdf/1804.04448v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-alignment-of-class-prediction |
Repo | |
Framework | |
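Below is a minimal sketch of one adversarial training step in which a discriminator sees class-probability vectors from source and target batches and the classifier is pushed to make them indistinguishable; the feature dimension, network sizes, and the single linear classifier on top of fixed deep features are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes, feat_dim = 10, 256
classifier = nn.Linear(feat_dim, n_classes)      # label predictor on pre-trained features
discriminator = nn.Sequential(nn.Linear(n_classes, 64), nn.ReLU(), nn.Linear(64, 1))
opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_step(src_feat, src_labels, tgt_feat):
    # 1) Discriminator learns to tell source vs. target class-probability vectors.
    src_prob = F.softmax(classifier(src_feat), dim=1).detach()
    tgt_prob = F.softmax(classifier(tgt_feat), dim=1).detach()
    d_loss = F.binary_cross_entropy_with_logits(
        discriminator(src_prob), torch.ones(len(src_prob), 1)) + \
        F.binary_cross_entropy_with_logits(
        discriminator(tgt_prob), torch.zeros(len(tgt_prob), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Classifier: supervised loss on source + fool the discriminator on target.
    cls_loss = F.cross_entropy(classifier(src_feat), src_labels)
    tgt_prob = F.softmax(classifier(tgt_feat), dim=1)
    adv_loss = F.binary_cross_entropy_with_logits(
        discriminator(tgt_prob), torch.ones(len(tgt_prob), 1))
    opt_c.zero_grad(); (cls_loss + adv_loss).backward(); opt_c.step()

train_step(torch.randn(16, feat_dim), torch.randint(0, n_classes, (16,)),
           torch.randn(16, feat_dim))
```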
An Improved Tabu Search Heuristic for Static Dial-A-Ride Problem
Title | An Improved Tabu Search Heuristic for Static Dial-A-Ride Problem |
Authors | Songguang Ho, Sarat Chandra Nagavarapu, Ramesh Ramasamy Pandi, Justin Dauwels |
Abstract | Multi-vehicle routing has become increasingly important with the rapid development of autonomous vehicle technology. The dial-a-ride problem, a variant of the vehicle routing problem (VRP), deals with the allocation of customer requests to vehicles, scheduling the pick-up and drop-off times and the sequence of serving those requests, ensuring high customer satisfaction with minimized travel cost. In this paper, we propose an improved tabu search (ITS) heuristic for the static dial-a-ride problem (DARP) with the objective of obtaining high-quality solutions in a short time. Two new techniques, an initialization heuristic and time window adjustment, are proposed to achieve faster convergence to the global optimum. Various numerical experiments are conducted for the proposed solution methodology using DARP test instances from the literature, and the convergence speed-up is validated. |
Tasks | |
Published | 2018-01-25 |
URL | http://arxiv.org/abs/1801.09547v5 |
http://arxiv.org/pdf/1801.09547v5.pdf | |
PWC | https://paperswithcode.com/paper/an-improved-tabu-search-heuristic-for-static |
Repo | |
Framework | |
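A generic tabu search loop, here applied to a toy tour-length objective with a swap neighborhood, illustrates the metaheuristic the paper builds on; the neighborhood, tabu tenure, and aspiration rule are standard textbook choices and do not include the paper's DARP-specific initialization heuristic or time-window adjustment.

```python
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def tabu_search(dist, iters=500, tenure=15):
    n = len(dist)
    current = list(range(n))
    random.shuffle(current)
    best, best_len = current[:], tour_length(current, dist)
    tabu = {}                                   # move -> iteration at which it expires
    for it in range(iters):
        move, move_len = None, float("inf")
        for i in range(n - 1):
            for j in range(i + 1, n):
                cand = current[:]
                cand[i], cand[j] = cand[j], cand[i]
                length = tour_length(cand, dist)
                # Accept non-tabu moves, or tabu moves beating the best (aspiration).
                if (tabu.get((i, j), 0) <= it or length < best_len) and length < move_len:
                    move, move_len = (i, j), length
        i, j = move
        current[i], current[j] = current[j], current[i]
        tabu[(i, j)] = it + tenure
        if move_len < best_len:
            best, best_len = current[:], move_len
    return best, best_len

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
dist = [[((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 for b in pts] for a in pts]
print(tabu_search(dist))
```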
Multi-task learning for Joint Language Understanding and Dialogue State Tracking
Title | Multi-task learning for Joint Language Understanding and Dialogue State Tracking |
Authors | Abhinav Rastogi, Raghav Gupta, Dilek Hakkani-Tur |
Abstract | This paper presents a novel approach for multi-task learning of language understanding (LU) and dialogue state tracking (DST) in task-oriented dialogue systems. Multi-task training enables the sharing of the neural network layers responsible for encoding the user utterance for both LU and DST and improves performance while reducing the number of network parameters. In our proposed framework, DST operates on a set of candidate values for each slot that has been mentioned so far. These candidate sets are generated using LU slot annotations for the current user utterance, dialogue acts corresponding to the preceding system utterance and the dialogue state estimated for the previous turn, enabling DST to handle slots with a large or unbounded set of possible values and deal with slot values not seen during training. Furthermore, to bridge the gap between training and inference, we investigate the use of scheduled sampling on LU output for the current user utterance as well as the DST output for the preceding turn. |
Tasks | Dialogue State Tracking, Multi-Task Learning, Task-Oriented Dialogue Systems |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05408v1 |
http://arxiv.org/pdf/1811.05408v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-joint-language |
Repo | |
Framework | |
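A hedged sketch of the shared-encoder idea is given below: one utterance encoder feeds both a per-token slot-tagging head (LU) and a candidate-scoring head standing in for DST. The vocabulary, layer sizes, and the way candidate values are pooled are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointLuDstModel(nn.Module):
    """Shared utterance encoder with two heads: per-token slot tagging (LU)
    and a per-slot candidate scorer (a stand-in for the DST component)."""
    def __init__(self, vocab=5000, emb=64, hidden=128, n_slot_tags=20):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.slot_tagger = nn.Linear(2 * hidden, n_slot_tags)    # LU head
        self.candidate_scorer = nn.Bilinear(2 * hidden, emb, 1)  # DST head

    def forward(self, tokens, candidate_tokens):
        h, _ = self.encoder(self.embed(tokens))        # (B, T, 2*hidden)
        slot_logits = self.slot_tagger(h)              # LU output per token
        utt = h.mean(dim=1)                            # pooled utterance encoding
        cand = self.embed(candidate_tokens).mean(dim=1)
        score = self.candidate_scorer(utt, cand)       # DST score per candidate
        return slot_logits, score

model = JointLuDstModel()
tokens = torch.randint(0, 5000, (4, 12))
cands = torch.randint(0, 5000, (4, 3))
slot_logits, score = model(tokens, cands)
print(slot_logits.shape, score.shape)   # (4, 12, 20) and (4, 1)
```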
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Title | Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models |
Authors | Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He |
Abstract | Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are very computationally demanding, largely due to the computational cost of the softmax layer over a large vocabulary. We observe that, in decoding for many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. We demonstrate that FGD reduces the decoding time by an order of magnitude while attaining close to the full softmax baseline accuracy on neural machine translation and language modeling tasks. We also prove a theoretical guarantee on the softmax approximation quality. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04189v1 |
http://arxiv.org/pdf/1806.04189v1.pdf | |
PWC | https://paperswithcode.com/paper/navigating-with-graph-representations-for |
Repo | |
Framework | |
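The decoding shortcut the abstract describes, normalizing only over the top-K candidate words, can be sketched as below. This version still scores the full vocabulary and finds the top K by brute force, whereas FGD replaces that search with a graph-based nearest-neighbor lookup, so the snippet illustrates only the restricted softmax, not the speed-up itself; the vocabulary size and dimensions are arbitrary.

```python
import numpy as np

def topk_softmax(hidden, emb, bias, k=10):
    """Probabilities restricted to the K most likely words for this context.
    `emb` is the output embedding matrix (vocab x dim), `hidden` the context."""
    logits = emb @ hidden + bias                 # full scoring, shown for clarity;
    top = np.argpartition(-logits, k)[:k]        # FGD replaces this with a graph search
    z = np.exp(logits[top] - logits[top].max())
    return top, z / z.sum()

# Toy usage with a random output embedding table.
rng = np.random.default_rng(0)
vocab, dim = 20000, 256
emb, bias = rng.normal(size=(vocab, dim)), rng.normal(size=vocab)
words, probs = topk_softmax(rng.normal(size=dim), emb, bias, k=10)
print(words, probs.round(3))
```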
Advance Prediction of Ventricular Tachyarrhythmias using Patient Metadata and Multi-Task Networks
Title | Advance Prediction of Ventricular Tachyarrhythmias using Patient Metadata and Multi-Task Networks |
Authors | Marek Rei, Joshua Oppenheimer, Marek Sirendi |
Abstract | We describe a novel neural network architecture for the prediction of ventricular tachyarrhythmias. The model receives input features that capture the change in RR intervals and ectopic beats, along with features based on heart rate variability and frequency analysis. Patient age is also included as a trainable embedding, while the whole network is optimized with multi-task objectives. Each of these modifications provides a consistent improvement to the model performance, achieving 74.02% prediction accuracy and 77.22% specificity 60 seconds in advance of the episode. |
Tasks | Heart Rate Variability |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12938v1 |
http://arxiv.org/pdf/1811.12938v1.pdf | |
PWC | https://paperswithcode.com/paper/advance-prediction-of-ventricular |
Repo | |
Framework | |
Multi-class Active Learning: A Hybrid Informative and Representative Criterion Inspired Approach
Title | Multi-class Active Learning: A Hybrid Informative and Representative Criterion Inspired Approach |
Authors | Xi Fang, Zengmao Wang, Xinyao Tang, Chen Wu |
Abstract | Labeling each instance in a large dataset is extremely labor- and time-consuming. One way to alleviate this problem is active learning, which aims to discover the most valuable instances for labeling in order to construct a powerful classifier. Considering both informativeness and representativeness provides a promising way to design a practical active learning method. However, most existing active learning methods select instances favoring either informativeness or representativeness. Moreover, many are designed for the binary-class setting, so they may yield suboptimal solutions on datasets with multiple classes. In this paper, a multi-class active learning approach based on a hybrid informative and representative criterion is proposed. We combine informativeness and representativeness into one formula, which can be solved under a unified framework. Informativeness is measured by the minimum margin, while representativeness is measured by the maximum mean discrepancy. By minimizing an upper bound on the true risk, we generalize the empirical risk minimization principle to the active learning setting. At the same time, our method makes full use of the label information and is designed for multiple classes, so it applies not only to the binary-class setting but also to the multi-class setting. We conduct experiments on twelve benchmark UCI data sets, and the experimental results demonstrate that the proposed method performs better than several state-of-the-art methods. |
Tasks | Active Learning |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02222v1 |
http://arxiv.org/pdf/1803.02222v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-class-active-learning-a-hybrid |
Repo | |
Framework | |
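A toy scoring rule combining a margin-based informativeness term with an MMD-based representativeness term is sketched below; the kernel bandwidth, trade-off weight, and greedy per-candidate evaluation are assumptions for illustration, whereas the paper solves a unified optimization rather than this additive heuristic.

```python
import numpy as np

def margin_score(probs):
    """Gap between the top two class probabilities; smaller = more informative."""
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def mmd_gain(X_labeled, X_pool, candidate, gamma=1.0):
    """Decrease in RBF-kernel MMD between the pool and the labeled set when
    `candidate` (an index into X_pool) is added to the labeled set."""
    def rbf(A, B):
        return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))
    def mmd(A, B):
        return rbf(A, A).mean() - 2 * rbf(A, B).mean() + rbf(B, B).mean()
    grown = np.vstack([X_labeled, X_pool[candidate:candidate + 1]])
    return mmd(X_labeled, X_pool) - mmd(grown, X_pool)

def hybrid_query(probs, X_labeled, X_pool, trade_off=0.5):
    info = 1.0 - margin_score(probs)                  # higher = more informative
    rep = np.array([mmd_gain(X_labeled, X_pool, c) for c in range(len(X_pool))])
    return int(np.argmax(trade_off * info + (1 - trade_off) * rep))

# Toy usage: pick one instance from a 40-point pool given 10 labeled points.
rng = np.random.default_rng(0)
X_pool, X_labeled = rng.normal(size=(40, 5)), rng.normal(size=(10, 5))
probs = rng.dirichlet(np.ones(4), size=40)            # multi-class predictions
print(hybrid_query(probs, X_labeled, X_pool))
```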