Paper Group AWR 12
Multi-Person Pose Estimation with Local Joint-to-Person Associations
Title | Multi-Person Pose Estimation with Local Joint-to-Person Associations |
Authors | Umar Iqbal, Juergen Gall |
Abstract | Despite the recent success of neural networks for human pose estimation, current approaches are limited to pose estimation of a single person and cannot handle humans in groups or crowds. In this work, we propose a method that estimates the poses of multiple persons in an image in which a person can be occluded by another person or might be truncated. To this end, we consider multi-person pose estimation as a joint-to-person association problem. We construct a fully connected graph from a set of detected joint candidates in an image and resolve the joint-to-person association and outlier detection using integer linear programming. Since solving joint-to-person association jointly for all persons in an image is an NP-hard problem and even approximations are expensive, we solve the problem locally for each person. On the challenging MPII Human Pose Dataset for multiple persons, our approach achieves the accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster. |
Tasks | Multi-Person Pose Estimation, Outlier Detection, Pose Estimation |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08526v2 |
PDF | http://arxiv.org/pdf/1608.08526v2.pdf |
PWC | https://paperswithcode.com/paper/multi-person-pose-estimation-with-local-joint |
Repo | https://github.com/MVIG-SJTU/RMPE |
Framework | torch |
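
The paper resolves joint-to-person association with a local integer linear program. As a rough illustration only, the sketch below swaps in a plain assignment problem (SciPy's Hungarian solver) for the ILP; candidates whose association cost stays high are dropped as outliers. The cost values and the 0.5 rejection threshold are made up.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(costs, reject_above=0.5):
    """One-to-one joint-candidate/person assignment (a toy stand-in
    for the paper's ILP, which lets each person collect a full
    skeleton of joints).

    costs: (num_candidates, num_persons) association costs, e.g. a
    distance penalty minus detection confidence.
    """
    cand, person = linear_sum_assignment(costs)
    keep = costs[cand, person] < reject_above  # outlier rejection
    return list(zip(cand[keep], person[keep]))

# Toy example: 4 joint candidates, 2 persons.
rng = np.random.default_rng(0)
print(associate(rng.random((4, 2))))
```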
Whitening-Free Least-Squares Non-Gaussian Component Analysis
Title | Whitening-Free Least-Squares Non-Gaussian Component Analysis |
Authors | Hiroaki Shiino, Hiroaki Sasaki, Gang Niu, Masashi Sugiyama |
Abstract | Non-Gaussian component analysis (NGCA) is an unsupervised linear dimension reduction method that extracts low-dimensional non-Gaussian “signals” from high-dimensional data contaminated with Gaussian noise. NGCA can be regarded as a generalization of projection pursuit (PP) and independent component analysis (ICA) to multi-dimensional and dependent non-Gaussian components. Indeed, seminal approaches to NGCA are based on PP and ICA. Recently, a novel NGCA approach called least-squares NGCA (LSNGCA) has been developed, which gives a solution analytically through least-squares estimation of log-density gradients and eigendecomposition. However, since pre-whitening of data is involved in LSNGCA, it performs unreliably when the data covariance matrix is ill-conditioned, which is often the case in high-dimensional data analysis. In this paper, we propose a whitening-free LSNGCA method and experimentally demonstrate its superiority. |
Tasks | Dimensionality Reduction |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01029v2 |
PDF | http://arxiv.org/pdf/1603.01029v2.pdf |
PWC | https://paperswithcode.com/paper/whitening-free-least-squares-non-gaussian |
Repo | https://github.com/hgeno/WFLSNGCA |
Framework | none |
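
The abstract's key complaint — pre-whitening becomes unreliable when the data covariance is ill-conditioned — is easy to reproduce. The numpy sketch below performs the classical whitening step X_w = X Σ^{-1/2} on data with a nearly degenerate direction; it illustrates the motivation, not the proposed whitening-free estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10

# Data with a nearly degenerate direction -> ill-conditioned covariance.
X = rng.normal(size=(n, d))
X[:, -1] = X[:, 0] + 1e-4 * rng.normal(size=n)

cov = np.cov(X, rowvar=False)
print("condition number:", np.linalg.cond(cov))

# Classical pre-whitening step used by LSNGCA: X_w = X @ cov^{-1/2}.
# Inverting an ill-conditioned covariance amplifies estimation noise,
# which is exactly the failure mode the whitening-free method avoids.
eigval, eigvec = np.linalg.eigh(cov)
whitener = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
X_w = X @ whitener
```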
Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution
Title | Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution |
Authors | Emad Barsoum, Cha Zhang, Cristian Canton Ferrer, Zhengyou Zhang |
Abstract | Crowdsourcing has become a widely adopted scheme to collect ground truth labels. However, it is a well-known problem that these labels can be very noisy. In this paper, we demonstrate how to learn a deep convolutional neural network (DCNN) from noisy labels, using facial expression recognition as an example. More specifically, we have 10 taggers label each input image, and compare four different approaches to utilizing the multiple labels: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority voting scheme does not perform as well as the last two approaches that fully leverage the label distribution. An enhanced FER+ data set with multiple labels for each face image will also be shared with the research community. |
Tasks | Facial Expression Recognition, Multi-Label Learning |
Published | 2016-08-03 |
URL | http://arxiv.org/abs/1608.01041v2 |
PDF | http://arxiv.org/pdf/1608.01041v2.pdf |
PWC | https://paperswithcode.com/paper/training-deep-networks-for-facial-expression |
Repo | https://github.com/Microsoft/FERPlus |
Framework | none |
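
The contrast between majority voting and training on the full label distribution comes down to how the target vector is built from the 10 tagger votes. A minimal sketch — the vote counts are invented; FER+ uses 8 emotion categories:

```python
import numpy as np

NUM_CLASSES = 8  # FER+ uses 8 emotion categories

def majority_target(votes):
    """One-hot target from the most-voted class (majority voting)."""
    counts = np.bincount(votes, minlength=NUM_CLASSES)
    target = np.zeros(NUM_CLASSES)
    target[counts.argmax()] = 1.0
    return target

def distribution_target(votes):
    """Full label-distribution target (used with cross-entropy loss)."""
    counts = np.bincount(votes, minlength=NUM_CLASSES)
    return counts / counts.sum()

# 10 taggers labeled one face image (class indices are made up):
votes = np.array([0, 0, 0, 2, 2, 0, 5, 0, 2, 0])
print(majority_target(votes))      # puts all mass on class 0
print(distribution_target(votes))  # keeps the annotator disagreement
```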
Equality of Opportunity in Supervised Learning
Title | Equality of Opportunity in Supervised Learning |
Authors | Moritz Hardt, Eric Price, Nathan Srebro |
Abstract | We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to the decision maker, who can respond by improving the classification accuracy. In line with other studies, our notion is oblivious: it depends only on the joint statistics of the predictor, the target, and the protected attribute, but not on the interpretation of individual features. We study the inherent limits of defining and identifying biases based on such oblivious measures, outlining what can and cannot be inferred from different oblivious tests. We illustrate our notion using a case study of FICO credit scores. |
Tasks | |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02413v1 |
PDF | http://arxiv.org/pdf/1610.02413v1.pdf |
PWC | https://paperswithcode.com/paper/equality-of-opportunity-in-supervised |
Repo | https://github.com/stes/drk.ki-macht-schule |
Framework | tf |
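
The paper's post-processing idea — adjust a learned predictor per group so that qualified members of every group are accepted at the same rate — can be sketched as picking per-group score thresholds that equalize the true positive rate. This is a simplification of the paper's optimization (which derives the operating point from ROC curves), and the target TPR below is made up:

```python
import numpy as np

def equal_opportunity_thresholds(scores, y, group, target_tpr=0.8):
    """Per-group thresholds giving (approximately) equal true positive
    rates -- a simplified version of the paper's post-processing step."""
    thresholds = {}
    for g in np.unique(group):
        pos = scores[(group == g) & (y == 1)]
        # Accepting everything above this quantile of the positives'
        # scores admits a target_tpr fraction of qualified members.
        thresholds[g] = np.quantile(pos, 1.0 - target_tpr)
    return thresholds

# Toy check: two groups whose score distributions differ.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.6, 0.1, 100),
                         rng.normal(0.4, 0.1, 100)])
y = np.ones(200, dtype=int)
group = np.repeat([0, 1], 100)
print(equal_opportunity_thresholds(scores, y, group))
```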
vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
Title | vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design |
Authors | Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, Stephen W. Keckler |
Abstract | The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher’s flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory-efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN. |
Tasks | |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.08124v3 |
PDF | http://arxiv.org/pdf/1602.08124v3.pdf |
PWC | https://paperswithcode.com/paper/vdnn-virtualized-deep-neural-networks-for |
Repo | https://github.com/adderbyte/MultiClassLabelSegTF |
Framework | tf |
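
The core vDNN idea — park activations in host memory between the forward and backward passes — can be mimicked in modern PyTorch with saved-tensor hooks. This is a toy stand-in, not the paper's system (which overlaps transfers with computation on separate CUDA streams), and it assumes a CUDA device is available:

```python
import torch

def pack_to_cpu(t):
    # Offload the saved activation to host memory right after forward.
    return t.to("cpu", non_blocking=True)

def unpack_to_gpu(t):
    # Bring it back to the GPU when backward needs it.
    return t.to("cuda", non_blocking=True)

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).cuda()
x = torch.randn(64, 4096, device="cuda")

# Activations saved for backward live on the CPU between the forward
# and backward passes, freeing GPU memory for other layers.
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_to_gpu):
    loss = model(x).sum()
loss.backward()
```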
Optimization for Large-Scale Machine Learning with Distributed Features and Observations
Title | Optimization for Large-Scale Machine Learning with Distributed Features and Observations |
Authors | Alexandros Nathan, Diego Klabjan |
Abstract | As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention in the literature. Although previous research has mostly focused on settings where either the observations or the features of the problem at hand are stored in distributed fashion, the situation where both are partitioned across the nodes of a computer cluster (doubly distributed) has barely been studied. In this work we propose two doubly distributed optimization algorithms. The first one falls under the umbrella of distributed dual coordinate ascent methods, while the second one belongs to the class of stochastic gradient/coordinate descent hybrid methods. We conduct numerical experiments in Spark using real-world and simulated data sets and study the scaling properties of our methods. Our empirical evaluation of the proposed algorithms demonstrates that they outperform a block-distributed ADMM method, which, to the best of our knowledge, is the only other existing doubly distributed optimization algorithm. |
Tasks | Distributed Optimization |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10060v2 |
PDF | http://arxiv.org/pdf/1610.10060v2.pdf |
PWC | https://paperswithcode.com/paper/optimization-for-large-scale-machine-learning |
Repo | https://github.com/anathan90/RADiSA |
Framework | none |
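
"Doubly distributed" means the data matrix is partitioned across both observations (rows) and features (columns), one block per node. The partitioning itself is simple to sketch; the proposed algorithms (dual coordinate ascent and the SGD/coordinate-descent hybrid) are not reproduced here:

```python
import numpy as np

def doubly_partition(X, row_blocks, col_blocks):
    """Split a data matrix across both observations and features.

    Returns a row_blocks x col_blocks grid of sub-matrices; in the
    doubly distributed setting each sub-matrix lives on its own node.
    """
    rows = np.array_split(np.arange(X.shape[0]), row_blocks)
    cols = np.array_split(np.arange(X.shape[1]), col_blocks)
    return [[X[np.ix_(r, c)] for c in cols] for r in rows]

X = np.arange(24).reshape(6, 4)
grid = doubly_partition(X, row_blocks=3, col_blocks=2)
print(grid[0][1])  # the block holding rows 0-1, columns 2-3
```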
Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs
Title | Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs |
Authors | Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter |
Abstract | Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic variants such as the tasks of finding the most probable explanation (MPE) in belief networks. Inference-based algorithms are powerful techniques for solving discrete optimization problems, which can be used independently or in combination with other techniques. However, their applicability is often limited by their compute-intensive nature and their space requirements. This paper proposes the design and implementation of a novel inference-based technique, which exploits modern massively parallel architectures, such as those found in Graphics Processing Units (GPUs), to speed up the resolution of exact and approximate inference-based algorithms for discrete optimization. The paper studies the proposed algorithm in both centralized and distributed optimization contexts. The paper demonstrates that the use of GPUs provides significant advantages in terms of runtime and scalability, achieving up to two orders of magnitude in speedups and showing a considerable reduction in execution time (up to 345 times faster) with respect to a sequential version. |
Tasks | Distributed Optimization |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05288v2 |
PDF | http://arxiv.org/pdf/1608.05288v2.pdf |
PWC | https://paperswithcode.com/paper/accelerating-exact-and-approximate-inference |
Repo | https://github.com/nandofioretto/GpuBE |
Framework | none |
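
Inference-based discrete optimization boils down to repeatedly combining cost tables and minimizing out variables, an operation where every entry of the combined table is independent — exactly what a GPU parallelizes. A numpy sketch of one (min, +) elimination step, with made-up ternary cost functions:

```python
import numpy as np

def eliminate_variable(cost_tables, axis):
    """Combine cost tables and min-out one variable.

    cost_tables: arrays of identical shape, one dimension per variable
    in the bucket. Summing combines the cost functions; the min over
    `axis` eliminates that variable. On a GPU, every entry of the
    combined table can be computed in parallel.
    """
    combined = sum(cost_tables)
    return combined.min(axis=axis)

# Two cost functions over ternary variables (x, y); eliminate y.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 10, size=(3, 3))
f2 = rng.integers(0, 10, size=(3, 3))
print(eliminate_variable([f1, f2], axis=1))  # optimal cost per x
```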
Using the Output Embedding to Improve Language Models
Title | Using the Output Embedding to Improve Language Models |
Authors | Ofir Press, Lior Wolf |
Abstract | We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance. |
Tasks | |
Published | 2016-08-20 |
URL | http://arxiv.org/abs/1608.05859v3 |
PDF | http://arxiv.org/pdf/1608.05859v3.pdf |
PWC | https://paperswithcode.com/paper/using-the-output-embedding-to-improve |
Repo | https://github.com/floydhub/word-language-model |
Framework | pytorch |
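
The recommended tying of input and output embeddings is a one-line change in a PyTorch language model: point the decoder's weight at the embedding matrix. This requires the embedding and hidden sizes to match (or a projection in between). A minimal sketch:

```python
import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Weight tying: input and output embeddings share one matrix,
        # which regularizes the model and shrinks it substantially.
        self.decoder.weight = self.embed.weight

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(h)  # logits over the vocabulary
```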
Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Title | Incorporating Copying Mechanism in Sequence-to-Sequence Learning |
Authors | Jiatao Gu, Zhengdong Lu, Hang Li, Victor O. K. Li |
Abstract | We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with an encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real-world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based models by remarkable margins on text summarization tasks. |
Tasks | Text Summarization |
Published | 2016-03-21 |
URL | http://arxiv.org/abs/1603.06393v3 |
PDF | http://arxiv.org/pdf/1603.06393v3.pdf |
PWC | https://paperswithcode.com/paper/incorporating-copying-mechanism-in-sequence |
Repo | https://github.com/adamklec/copynet |
Framework | pytorch |
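
The heart of the model is a decoder step that mixes a generation distribution over the vocabulary with a copy distribution over source positions. The sketch below is a simplified pointer-style mixture with a fixed 50/50 gate — CopyNet's actual scoring is richer and the gate is learned — and all tensor shapes are hypothetical:

```python
import torch

def mix_generate_and_copy(gen_logits, copy_scores, src_ids):
    """Combine generation and copying into one output distribution.

    gen_logits:  (batch, vocab_size) scores for generating each word
    copy_scores: (batch, src_len) scores for copying each source token
    src_ids:     (batch, src_len) vocabulary ids of the source tokens
    """
    p_gen = torch.softmax(gen_logits, dim=-1)
    p_copy = torch.softmax(copy_scores, dim=-1)
    # Scatter copy probabilities onto the vocabulary positions of the
    # source tokens, so copied words compete with generated ones.
    # A learned gate would replace the fixed 0.5 mixture weights.
    mixed = 0.5 * p_gen
    mixed = mixed.scatter_add(1, src_ids, 0.5 * p_copy)
    return mixed  # (batch, vocab_size), rows sum to 1
```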
We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next!
Title | We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next! |
Authors | Ankesh Anand, Tanmoy Chakraborty, Noseong Park |
Abstract | Online content publishers often use catchy headlines for their articles in order to attract users to their websites. These headlines, popularly known as clickbaits, exploit a user’s curiosity gap and lure them to click on links that often disappoint them. Existing methods for automatically detecting clickbaits rely on heavy feature engineering and domain knowledge. Here, we introduce a neural network architecture based on Recurrent Neural Networks for detecting clickbaits. Our model relies on distributed word representations learned from large unannotated corpora, and character embeddings learned via Convolutional Neural Networks. Experimental results on a dataset of news headlines show that our model outperforms existing techniques for clickbait detection with an accuracy of 0.98, an F1-score of 0.98, and a ROC-AUC of 0.99. |
Tasks | Clickbait Detection, Feature Engineering |
Published | 2016-12-05 |
URL | https://arxiv.org/abs/1612.01340v2 |
PDF | https://arxiv.org/pdf/1612.01340v2.pdf |
PWC | https://paperswithcode.com/paper/we-used-neural-networks-to-detect-clickbaits |
Repo | https://github.com/ankeshanand/deep-clickbait-detection |
Framework | tf |
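
Stripped to its essentials, the architecture is a recurrent network over pretrained word embeddings with a classification head. The sketch below is a minimal BiLSTM variant; the full model also feeds in character embeddings learned by a CNN, and all dimensions here are invented:

```python
import torch.nn as nn

class ClickbaitRNN(nn.Module):
    """Minimal BiLSTM headline classifier in the spirit of the paper."""
    def __init__(self, vocab_size, embed_dim=300, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.clf = nn.Linear(2 * hidden, 1)  # clickbait logit

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.clf(out[:, -1])  # logit from the last time step
```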
Human pose estimation via Convolutional Part Heatmap Regression
Title | Human pose estimation via Convolutional Part Heatmap Regression |
Authors | Adrian Bulat, Georgios Tzimiropoulos |
Abstract | This paper is on human pose estimation using Convolutional Neural Networks. Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. To this end, we propose a detection-followed-by-regression CNN cascade. The first part of our cascade outputs part detection heatmaps and the second part performs regression on these heatmaps. The benefits of the proposed architecture are multi-fold: It guides the network where to focus in the image and effectively encodes part constraints and context. More importantly, it can effectively cope with occlusions because part detection heatmaps for occluded parts provide low confidence scores which subsequently guide the regression part of our network to rely on contextual information in order to predict the location of these parts. Additionally, we show that the proposed cascade is flexible enough to readily allow the integration of various CNN architectures for both detection and regression, including recent ones based on residual learning. Finally, we illustrate that our cascade achieves top performance on the MPII and LSP data sets. Code can be downloaded from http://www.cs.nott.ac.uk/~psxab5/ |
Tasks | Pose Estimation |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01743v1 |
PDF | http://arxiv.org/pdf/1609.01743v1.pdf |
PWC | https://paperswithcode.com/paper/human-pose-estimation-via-convolutional-part |
Repo | https://github.com/1adrianb/human-pose-estimation |
Framework | torch |
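
The detection-followed-by-regression design can be sketched as two stacked subnetworks, where the regressor sees the image together with the detector's part heatmaps, so occluded parts can be filled in from context. A toy version with invented channel sizes — the paper uses much deeper, residual-style stages:

```python
import torch
import torch.nn as nn

class DetectThenRegress(nn.Module):
    """Toy detection-followed-by-regression cascade."""
    def __init__(self, num_parts=16):
        super().__init__()
        self.detector = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_parts, 1),      # part detection heatmaps
        )
        self.regressor = nn.Sequential(
            nn.Conv2d(3 + num_parts, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_parts, 1),      # regressed heatmaps
        )

    def forward(self, img):
        det = self.detector(img)
        # The regressor conditions on both the image and the detection
        # heatmaps; low-confidence (occluded) parts push it to rely on
        # the surrounding context.
        return self.regressor(torch.cat([img, det], dim=1))
```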
3D Fully Convolutional Network for Vehicle Detection in Point Cloud
Title | 3D Fully Convolutional Network for Vehicle Detection in Point Cloud |
Authors | Bo Li |
Abstract | 2D fully convolutional networks have recently been applied successfully to object detection from images. In this paper, we extend fully convolutional network based detection techniques to 3D and apply them to point cloud data. The proposed approach is verified on the task of vehicle detection from lidar point cloud for autonomous driving. Experiments on the KITTI dataset show a significant performance improvement over the previous point cloud based detection approaches. |
Tasks | Autonomous Driving, Object Detection |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08069v2 |
PDF | http://arxiv.org/pdf/1611.08069v2.pdf |
PWC | https://paperswithcode.com/paper/3d-fully-convolutional-network-for-vehicle |
Repo | https://github.com/s10803926/3D-Object-detection-from-Pointcloud |
Framework | tf |
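
Moving FCN detection from 2D to 3D starts with turning the lidar point cloud into a dense grid that 3D convolutions can consume. A minimal occupancy-grid voxelizer; the grid resolution and range below are illustrative, not the paper's:

```python
import numpy as np

def voxelize(points, grid=(100, 100, 20),
             extent=((0, 70), (-40, 40), (-2, 2))):
    """Turn an (N, 3) lidar point cloud into a binary occupancy grid,
    the input representation for a 3D fully convolutional network."""
    vol = np.zeros(grid, dtype=np.float32)
    lo = np.array([e[0] for e in extent], dtype=np.float64)
    hi = np.array([e[1] for e in extent], dtype=np.float64)
    # Keep points inside the extent, then map them to voxel indices.
    keep = np.all((points >= lo) & (points < hi), axis=1)
    idx = ((points[keep] - lo) / (hi - lo) * np.array(grid)).astype(int)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vol
```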
DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
Title | DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks |
Authors | Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, Tat-Seng Chua |
Abstract | The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually requires an unaffordable amount of memory to store all the intermediate variables needed to exactly reverse the forward training procedure. In this work we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on several image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters, with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks. The code can be downloaded from https://github.com/bigaidream-projects/drmad |
Tasks | |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00917v5 |
PDF | http://arxiv.org/pdf/1601.00917v5.pdf |
PWC | https://paperswithcode.com/paper/drmad-distilling-reverse-mode-automatic |
Repo | https://github.com/bigaidream-projects/drmad |
Framework | none |
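
DrMAD's shortcut replaces the stored training trajectory with a straight line between the initial and final weights; the reverse pass for hypergradients walks this line instead of the true iterates. The interpolation itself fits in a few lines of numpy:

```python
import numpy as np

def shortcut_weights(theta0, thetaT, num_steps):
    """DrMAD's core approximation: instead of storing every iterate of
    training, reconstruct intermediate weights as points on a straight
    line between the initial (theta0) and final (thetaT) parameters.
    The reverse pass for hypergradients then walks this shortcut."""
    for t in range(num_steps, 0, -1):
        beta = t / num_steps
        yield (1.0 - beta) * theta0 + beta * thetaT

theta0 = np.zeros(5)
thetaT = np.ones(5)
for theta in shortcut_weights(theta0, thetaT, num_steps=4):
    print(theta)  # walks from thetaT back toward theta0
```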
Real-Time Anomaly Detection for Streaming Analytics
Title | Real-Time Anomaly Detection for Streaming Analytics |
Authors | Subutai Ahmad, Scott Purdy |
Abstract | Much of the world's data is streaming, time-series data, where anomalies give significant information in critical situations. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, and learn while simultaneously making predictions. We present a novel anomaly detection technique based on an on-line sequence memory algorithm called Hierarchical Temporal Memory (HTM). We show results from a live application that detects anomalies in financial metrics in real-time. We also test the algorithm on NAB, a published benchmark for real-time anomaly detection, where our algorithm achieves best-in-class results. |
Tasks | Anomaly Detection, Time Series |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02480v1 |
PDF | http://arxiv.org/pdf/1607.02480v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-anomaly-detection-for-streaming |
Repo | https://github.com/SudeepSarkar/matlabHTM |
Framework | none |
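
While HTM itself is involved, the step that turns raw prediction errors into a usable anomaly signal — modeling the recent error distribution and scoring new values by how far they sit in its tail — can be sketched directly. This is a simplification of the anomaly-likelihood computation from the authors' NAB work; the window size and warm-up length below are arbitrary:

```python
import math
from collections import deque

class AnomalyLikelihood:
    """Simplified anomaly likelihood: model the recent distribution of
    raw anomaly scores and flag values far in its tail. The published
    method uses windowed averages of the scores instead."""
    def __init__(self, window=100):
        self.scores = deque(maxlen=window)

    def update(self, raw_score):
        self.scores.append(raw_score)
        n = len(self.scores)
        if n < 10:
            return 0.0  # not enough history yet
        mean = sum(self.scores) / n
        var = sum((s - mean) ** 2 for s in self.scores) / n
        std = math.sqrt(var) or 1e-6
        # Gaussian CDF of the newest score: close to 1 when it sits
        # far above the recent mean, i.e. likely anomalous.
        z = (raw_score - mean) / std
        return 1.0 - 0.5 * math.erfc(z / math.sqrt(2))
```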
MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos
Title | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos |
Authors | Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency |
Abstract | People are sharing their opinions, stories and reviews through online video sharing websites every day. Studying sentiment and subjectivity in these opinion videos is attracting growing attention from academia and industry. While sentiment analysis has been successful for text, it is an understudied research question for videos and multimedia content. The biggest setbacks for studies in this direction are the lack of a proper dataset, methodology, baselines, and statistical analysis of how information from different modality sources relates to each other. This paper introduces to the scientific community the first opinion-level annotated corpus of sentiment and subjectivity analysis in online videos called Multimodal Opinion-level Sentiment Intensity dataset (MOSI). The dataset is rigorously annotated with labels for subjectivity, sentiment intensity, per-frame and per-opinion annotated visual features, and per-millisecond annotated audio features. Furthermore, we present baselines for future studies in this direction as well as a new multimodal fusion approach that jointly models spoken words and visual gestures. |
Tasks | Sentiment Analysis, Subjectivity Analysis |
Published | 2016-06-20 |
URL | http://arxiv.org/abs/1606.06259v2 |
PDF | http://arxiv.org/pdf/1606.06259v2.pdf |
PWC | https://paperswithcode.com/paper/mosi-multimodal-corpus-of-sentiment-intensity |
Repo | https://github.com/soujanyaporia/multimodal-sentiment-analysis |
Framework | tf |
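
The paper's fusion approach jointly models spoken words and gestures; the simplest baseline in that spirit embeds each modality and predicts sentiment intensity from the concatenated features. A minimal late-fusion sketch with invented feature dimensions — MOSI's intensity labels range over [-3, 3]:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Minimal multimodal fusion baseline: embed each modality, then
    regress sentiment intensity from the concatenated features. The
    paper's fusion approach is more elaborate."""
    def __init__(self, text_dim=300, visual_dim=64, hidden=128):
        super().__init__()
        self.text = nn.Linear(text_dim, hidden)
        self.visual = nn.Linear(visual_dim, hidden)
        self.head = nn.Linear(2 * hidden, 1)  # intensity in [-3, 3]

    def forward(self, text_feat, visual_feat):
        h = torch.cat([torch.relu(self.text(text_feat)),
                       torch.relu(self.visual(visual_feat))], dim=-1)
        return self.head(h)
```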