July 29, 2019

Paper Group AWR 112

Detailed, accurate, human shape estimation from clothed 3D scan sequences


Title	Detailed, accurate, human shape estimation from clothed 3D scan sequences
Authors	Chao Zhang, Sergi Pujades, Michael Black, Gerard Pons-Moll
Abstract	We address the problem of estimating human pose and body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited body models produce smooth shapes lacking personalized details. We contribute a new approach to recover a personalized shape of the person. The estimated shape deviates from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available BUFF, a new 4D dataset that enables quantitative evaluation (http://buff.is.tue.mpg.de). Our method outperforms the state of the art in both pose estimation and shape estimation, qualitatively and quantitatively.
Tasks	Pose Estimation
Published	2017-03-13
URL	http://arxiv.org/abs/1703.04454v2
PDF	http://arxiv.org/pdf/1703.04454v2.pdf
PWC	https://paperswithcode.com/paper/detailed-accurate-human-shape-estimation-from
Repo	https://github.com/maria-korosteleva/Body-Shape-Estimation
Framework	none

Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks


Title	Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks
Authors	Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu
Abstract	We present a new loss function, namely Wing loss, for robust facial landmark localisation with Convolutional Neural Networks (CNNs). We first compare and analyse different loss functions including L2, L1 and smooth L1. The analysis of these loss functions suggests that, for the training of a CNN-based localisation model, more attention should be paid to small and medium range errors. To this end, we design a piece-wise loss function. The new loss amplifies the impact of errors from the interval (-w, w) by switching from L1 loss to a modified logarithm function. To address the problem of under-representation of samples with large out-of-plane head rotations in the training set, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation approaches. Last, the proposed approach is extended to create a two-stage framework for robust facial landmark localisation. The experimental results obtained on AFLW and 300W demonstrate the merits of the Wing loss function, and prove the superiority of the proposed method over the state-of-the-art approaches.
Tasks	Data Augmentation, Face Alignment
Published	2017-11-17
URL	http://arxiv.org/abs/1711.06753v5
PDF	http://arxiv.org/pdf/1711.06753v5.pdf
PWC	https://paperswithcode.com/paper/wing-loss-for-robust-facial-landmark
Repo	https://github.com/xialuxi/arcface-caffe
Framework	none
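
The piece-wise loss described in the abstract can be sketched as follows. This is a minimal illustration, assuming the commonly cited form of the Wing loss: a scaled logarithm for errors inside (-w, w) and a shifted L1 loss outside, with the offset chosen so the two pieces meet at |x| = w. The default values `w=10.0` and `epsilon=2.0` are illustrative assumptions, not values stated in the abstract.

```python
import math


def wing_loss(error, w=10.0, epsilon=2.0):
    """Wing loss for a single landmark coordinate error (sketch).

    Inside (-w, w) the loss is logarithmic, which gives small and
    medium errors a larger gradient than L1 or L2 would. Outside,
    it behaves like L1 shifted by a constant.
    """
    abs_err = abs(error)
    # Offset that makes the log piece and the linear piece meet at |x| = w.
    offset = w - w * math.log(1.0 + w / epsilon)
    if abs_err < w:
        return w * math.log(1.0 + abs_err / epsilon)
    return abs_err - offset


def mean_wing_loss(predicted, target, w=10.0, epsilon=2.0):
    """Average Wing loss over paired landmark coordinates."""
    errors = [p - t for p, t in zip(predicted, target)]
    return sum(wing_loss(e, w, epsilon) for e in errors) / len(errors)
```

Because the slope of the logarithmic piece near zero is w / epsilon, small errors are weighted more heavily than under plain L1, which is the behaviour the abstract argues for.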

Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering


Title	Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering
Authors	Seunghyun Yoon, Joongbo Shin, Kyomin Jung
Abstract	In this paper, we propose a novel end-to-end neural architecture for ranking candidate answers that adapts a hierarchical recurrent neural network and a latent topic clustering module. With our proposed model, a text is encoded to a vector representation from a word level to a chunk level to effectively capture the entire meaning. In particular, by adapting the hierarchical structure, our model shows very small performance degradations in longer text comprehension while other state-of-the-art recurrent neural network models suffer from it. Additionally, the latent topic clustering module extracts semantic information from target samples. This clustering module is useful for any text-related task by allowing each data sample to find its nearest topic cluster, thus helping the neural network model analyze the entire data. We evaluate our models on the Ubuntu Dialogue Corpus and a consumer electronics domain question answering dataset, which is related to Samsung products. The proposed model shows state-of-the-art results for ranking question-answer pairs.
Tasks	Answer Selection, Learning-To-Rank, Question Answering
Published	2017-10-10
URL	http://arxiv.org/abs/1710.03430v3
PDF	http://arxiv.org/pdf/1710.03430v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-rank-question-answer-pairs-using
Repo	https://github.com/david-yoon/QA_HRDE_LTC
Framework	tf

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis


Title	On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
Authors	Jose Camacho-Collados, Mohammad Taher Pilehvar
Abstract	Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact in its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.
Tasks	Sentiment Analysis, Text Categorization, Text Classification, Tokenization, Word Embeddings
Published	2017-07-06
URL	http://arxiv.org/abs/1707.01780v3
PDF	http://arxiv.org/pdf/1707.01780v3.pdf
PWC	https://paperswithcode.com/paper/on-the-role-of-text-preprocessing-in-neural
Repo	https://github.com/changji2069/Scope-Project
Framework	none
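
Three of the preprocessing decisions named in the abstract (tokenizing, lowercasing and multiword grouping) can be illustrated with a short sketch. The regular expression, the function names and the example multiword list are all assumptions made for illustration; they are not the tools used in the paper. Lemmatizing is omitted because it needs an external lexicon.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def group_multiwords(tokens, multiwords):
    """Join known multiword expressions into single tokens.

    `multiwords` is a set of token tuples, e.g. {("new", "york")}.
    Matching is greedy: the longest expression starting at a
    position wins.
    """
    max_len = max((len(m) for m in multiwords), default=1)
    grouped, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in multiwords:
                grouped.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword expression starts here; keep the token.
            grouped.append(tokens[i])
            i += 1
    return grouped
```

Each function is a separate step, so a classifier can be trained on any combination of them, which is the kind of comparison the study reports.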

Improving Discourse Relation Projection to Build Discourse Annotated Corpora


Title	Improving Discourse Relation Projection to Build Discourse Annotated Corpora
Authors	Majid Laali, Leila Kosseim
Abstract	The naive approach to annotation projection is not effective to project discourse annotations from one language to another because implicit discourse relations are often changed to explicit ones and vice-versa in the translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse-usage of French discourse connectives and show a 15% improvement of F1-score compared to the classifier trained on the non-filtered annotations.
Tasks	Word Alignment
Published	2017-07-20
URL	http://arxiv.org/abs/1707.06357v1
PDF	http://arxiv.org/pdf/1707.06357v1.pdf
PWC	https://paperswithcode.com/paper/improving-discourse-relation-projection-to
Repo	https://github.com/mjlaali/Europarl-ConcoDisco
Framework	none

Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet


Title	Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet
Authors	Naimish Agarwal, Artus Krohn-Grimberghe, Ranjana Vyas
Abstract	Facial Key Points (FKPs) Detection is an important and challenging problem in the fields of computer vision and machine learning. It involves predicting the co-ordinates of the FKPs, e.g., nose tip, center of eyes, etc., for a given face. In this paper, we propose a LeNet-adapted Deep CNN model, NaimishNet, to operate on facial key points data and compare our model’s performance against existing state-of-the-art approaches.
Tasks
Published	2017-10-03
URL	http://arxiv.org/abs/1710.00977v1
PDF	http://arxiv.org/pdf/1710.00977v1.pdf
PWC	https://paperswithcode.com/paper/facial-key-points-detection-using-deep
Repo	https://github.com/anitagold/30-days-of-Udacity
Framework	pytorch

Evolution Strategies as a Scalable Alternative to Reinforcement Learning


Title	Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Authors	Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever
Abstract	We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
Tasks	Atari Games, Q-Learning
Published	2017-03-10
URL	http://arxiv.org/abs/1703.03864v2
PDF	http://arxiv.org/pdf/1703.03864v2.pdf
PWC	https://paperswithcode.com/paper/evolution-strategies-as-a-scalable
Repo	https://github.com/aspk/space_battle
Framework	none
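
The communication idea in the abstract (workers exchange only scalars because the noise is reproducible from shared random seeds) can be sketched on a toy problem. Everything here is an illustrative assumption: the quadratic `reward`, the hyperparameters, and the use of standardized returns as a simple variance-reduction step. It is a single-process simulation, not the authors' distributed implementation.

```python
import random


def reward(theta):
    """Toy objective: higher is better, maximum at (3, -2)."""
    target = [3.0, -2.0]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def noise_from_seed(seed, dim):
    """Rebuild a worker's Gaussian perturbation from its seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def es_step(theta, seeds, sigma=0.1, lr=0.02):
    dim = len(theta)
    # Each "worker" evaluates one perturbed parameter vector and
    # reports a single scalar return.
    returns = []
    for seed in seeds:
        eps = noise_from_seed(seed, dim)
        returns.append(reward([t + sigma * e for t, e in zip(theta, eps)]))
    # Standardize returns (an assumption made for this sketch).
    mean = sum(returns) / len(returns)
    std = (sum((r - mean) ** 2 for r in returns) / len(returns)) ** 0.5 or 1.0
    # Any worker can rebuild every perturbation from the known seeds,
    # so only the scalar returns have to be communicated.
    grad = [0.0] * dim
    for seed, r in zip(seeds, returns):
        eps = noise_from_seed(seed, dim)
        for j in range(dim):
            grad[j] += (r - mean) / std * eps[j]
    scale = lr / (len(seeds) * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]


def run_es(iterations=200, workers=50, master_seed=0):
    seed_rng = random.Random(master_seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        seeds = [seed_rng.randrange(2 ** 31) for _ in range(workers)]
        theta = es_step(theta, seeds)
    return theta
```

Calling `run_es()` moves the parameters from the origin toward the optimum of the toy objective. No gradient of `reward` is ever computed, which is what makes the method a black box optimizer.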

Evolving Deep Convolutional Neural Networks for Image Classification


Title	Evolving Deep Convolutional Neural Networks for Image Classification
Authors	Yanan Sun, Bing Xue, Mengjie Zhang, Gary G. Yen
Abstract	Evolutionary computation methods have been successfully applied to neural networks since two decades ago, while those methods cannot scale well to the modern deep neural networks due to the complicated architectures and large quantities of connection weights. In this paper, we propose a new method using genetic algorithms for evolving the architectures and connection weight initialization values of a deep convolutional neural network to address image classification problems. In the proposed algorithm, an efficient variable-length gene encoding strategy is designed to represent the different building blocks and the unpredictable optimal depth in convolutional neural networks. In addition, a new representation scheme is developed for effectively initializing connection weights of deep convolutional neural networks, which is expected to avoid networks getting stuck into local minima which is typically a major issue in the backward gradient-based optimization. Furthermore, a novel fitness evaluation method is proposed to speed up the heuristic search with substantially less computational resource. The proposed algorithm is examined and compared with 22 existing algorithms on nine widely used image classification tasks, including the state-of-the-art methods. The experimental results demonstrate the remarkable superiority of the proposed algorithm over the state-of-the-art algorithms in terms of classification error rate and the number of parameters (weights).
Tasks	Image Classification
Published	2017-10-30
URL	http://arxiv.org/abs/1710.10741v3
PDF	http://arxiv.org/pdf/1710.10741v3.pdf
PWC	https://paperswithcode.com/paper/evolving-deep-convolutional-neural-networks
Repo	https://github.com/MagnusCaligo/EvoCNN
Framework	none

Target contrastive pessimistic risk for robust domain adaptation


Title	Target contrastive pessimistic risk for robust domain adaptation
Authors	Wouter M. Kouw, Marco Loog
Abstract	In domain adaptation, classifiers with information from a source domain adapt to generalize to a target domain. However, an adaptive classifier can perform worse than a non-adaptive classifier due to invalid assumptions, increased sensitivity to estimation errors or model misspecification. Our goal is to develop a domain-adaptive classifier that is robust in the sense that it does not rely on restrictive assumptions on how the source and target domains relate to each other and that it does not perform worse than the non-adaptive classifier. We formulate a conservative parameter estimator that only deviates from the source classifier when a lower risk is guaranteed for all possible labellings of the given target samples. We derive the classical least-squares and discriminant analysis cases and show that these perform on par with state-of-the-art domain adaptive classifiers in sample selection bias settings, while outperforming them in more general domain adaptation settings.
Tasks	Domain Adaptation
Published	2017-06-25
URL	http://arxiv.org/abs/1706.08082v1
PDF	http://arxiv.org/pdf/1706.08082v1.pdf
PWC	https://paperswithcode.com/paper/target-contrastive-pessimistic-risk-for
Repo	https://github.com/wmkouw/libTLDA
Framework	none

Structured Attention Networks


Title	Structured Attention Networks
Authors	Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
Abstract	Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention.
Tasks	Machine Translation, Natural Language Inference, Question Answering
Published	2017-02-03
URL	http://arxiv.org/abs/1702.00887v3
PDF	http://arxiv.org/pdf/1702.00887v3.pdf
PWC	https://paperswithcode.com/paper/structured-attention-networks
Repo	https://github.com/harvardnlp/struct-attn
Framework	none

Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade


Title	Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade
Authors	Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
Abstract	We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC) that is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed-forward harder regions to the next sub-model for processing. Convolutions are only calculated on these regions to reduce computations. The proposed method possesses several advantages. First, LC classifies most of the easy regions in the shallow stage and makes the deeper stage focus on a few hard regions. Such adaptive and ‘difficulty-aware’ learning improves segmentation performance. Second, LC accelerates both training and testing of the deep network thanks to early decisions in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on the PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed.
Tasks	Semantic Segmentation
Published	2017-04-05
URL	http://arxiv.org/abs/1704.01344v1
PDF	http://arxiv.org/pdf/1704.01344v1.pdf
PWC	https://paperswithcode.com/paper/not-all-pixels-are-equal-difficulty-aware
Repo	https://github.com/liuziwei7/region-conv
Framework	none
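
The routing logic of a layer cascade can be sketched independently of any network. In this illustration each stage is a hypothetical callable that maps one pixel's features to class probabilities; pixels whose top probability reaches a confidence threshold are labelled immediately, and only the remaining hard pixels are passed on. The threshold value and the function signature are assumptions, not details taken from the abstract.

```python
def cascade_predict(pixels, stages, threshold=0.95):
    """Label pixels with a cascade of stages (at least one stage).

    Returns (labels, exit_stage), where exit_stage[i] is the index
    of the stage that decided pixel i.
    """
    labels = {}
    exit_stage = {}
    remaining = list(range(len(pixels)))
    for depth, stage in enumerate(stages):
        is_last = depth == len(stages) - 1
        still_hard = []
        for idx in remaining:
            probs = stage(pixels[idx])
            best = max(range(len(probs)), key=probs.__getitem__)
            # Confident pixels exit early; the last stage decides the rest.
            if is_last or probs[best] >= threshold:
                labels[idx] = best
                exit_stage[idx] = depth
            else:
                still_hard.append(idx)
        # Later, more expensive stages only see the hard pixels.
        remaining = still_hard
    order = range(len(pixels))
    return [labels[i] for i in order], [exit_stage[i] for i in order]
```

The saving comes from the shrinking `remaining` list: deeper stages run on fewer pixels, which mirrors the abstract's point that convolutions are only computed on the harder regions.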

Max-value Entropy Search for Efficient Bayesian Optimization


Title	Max-value Entropy Search for Efficient Bayesian Optimization
Authors	Zi Wang, Stefanie Jegelka
Abstract	Entropy Search (ES) and Predictive Entropy Search (PES) are popular and empirically successful Bayesian Optimization techniques. Both rely on a compelling information-theoretic motivation, and maximize the information gained about the $\arg\max$ of the unknown function; yet, both are plagued by the expensive computation for estimating entropies. We propose a new criterion, Max-value Entropy Search (MES), that instead uses the information about the maximum function value. We show relations of MES to other Bayesian optimization methods, and establish a regret bound. We observe that MES maintains or improves the good empirical performance of ES/PES, while tremendously lightening the computational burden. In particular, MES is much more robust to the number of samples used for computing the entropy, and hence more efficient for higher dimensional problems.
Tasks
Published	2017-03-06
URL	http://arxiv.org/abs/1703.01968v3
PDF	http://arxiv.org/pdf/1703.01968v3.pdf
PWC	https://paperswithcode.com/paper/max-value-entropy-search-for-efficient
Repo	https://github.com/zi-w/Max-value-Entropy-Search
Framework	none
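
For a Gaussian posterior at a candidate point, the information gained about the maximum function value has a closed-form approximation that is cheap to evaluate. The sketch below assumes that form: for each sampled maximum y*, with gamma = (y* - mu) / sigma, the term is gamma * pdf(gamma) / (2 * cdf(gamma)) - log cdf(gamma), averaged over the samples. How the maxima are sampled is outside this sketch, and the function names are illustrative.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mes_acquisition(mu, sigma, max_value_samples):
    """Approximate max-value entropy search score at one point.

    mu, sigma: posterior mean and standard deviation at the point.
    max_value_samples: sampled values of the global maximum.
    """
    total = 0.0
    for y_star in max_value_samples:
        gamma = (y_star - mu) / sigma
        # Clamp to avoid log(0) when y_star is far below mu.
        cdf = max(normal_cdf(gamma), 1e-300)
        total += gamma * normal_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(max_value_samples)
```

Each sampled maximum contributes a one-dimensional expression, so the cost grows with the number of samples rather than with the input dimension, consistent with the abstract's claim of a lighter computational burden.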

Completion of High Order Tensor Data with Missing Entries via Tensor-train Decomposition


Title	Completion of High Order Tensor Data with Missing Entries via Tensor-train Decomposition
Authors	Longhao Yuan, Qibin Zhao, Jianting Cao
Abstract	In this paper, we aim at the completion problem of high order tensor data with missing entries. The existing tensor factorization and completion methods suffer from the curse of dimensionality when the order of tensor $N \gg 3$. To overcome this problem, we propose an efficient algorithm called TT-WOPT (Tensor-train Weighted OPTimization) to find the latent core tensors of tensor data and recover the missing entries. Tensor-train decomposition, which has the powerful representation ability with linear scalability to tensor order, is employed in our algorithm. The experimental results on synthetic data and natural image completion demonstrate that our method significantly outperforms the other related methods. Especially when the missing rate of data is very high, e.g., 85% to 99%, our algorithm can achieve much better performance than other state-of-the-art algorithms.
Tasks
Published	2017-09-08
URL	http://arxiv.org/abs/1709.02641v2
PDF	http://arxiv.org/pdf/1709.02641v2.pdf
PWC	https://paperswithcode.com/paper/completion-of-high-order-tensor-data-with
Repo	https://github.com/yuanlonghao/T3C_tensor_completion
Framework	none
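
In a tensor-train representation, each entry of an order-N tensor is a product of N small matrices, one selected from each core, which is why storage grows linearly with N. The sketch below shows entry reconstruction and a squared error restricted to observed entries. The nested-list layout and the function names are assumptions for illustration; the optimization step that TT-WOPT performs on the cores is not shown.

```python
def tt_entry(cores, index):
    """Reconstruct one tensor entry from tensor-train cores.

    cores[k][i] is the r_{k-1} x r_k matrix selected by index i in
    mode k, stored as a list of rows. Boundary ranks are 1.
    """
    vec = [1.0]  # 1 x r_0 row vector with r_0 = 1
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [
            sum(vec[a] * mat[a][b] for a in range(len(vec)))
            for b in range(len(mat[0]))
        ]
    return vec[0]


def observed_loss(cores, observed):
    """Half the squared error over observed entries only.

    `observed` maps index tuples to known values; missing entries
    are absent and contribute nothing to the loss.
    """
    return 0.5 * sum(
        (tt_entry(cores, idx) - val) ** 2 for idx, val in observed.items()
    )
```

Once the cores fit the observed entries, calling `tt_entry` on an unobserved index gives the completed value.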

A study of Thompson Sampling with Parameter h


Title	A study of Thompson Sampling with Parameter h
Authors	Qiang Ha
Abstract	The Thompson Sampling algorithm is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step the algorithm chooses each arm with probability proportional to the probability of it being the current best arm. We modify the strategy by introducing a parameter h which alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson sampling is robust to this perturbation within a range of parameter values for two-arm bandits.
Tasks
Published	2017-10-05
URL	http://arxiv.org/abs/1710.02174v1
PDF	http://arxiv.org/pdf/1710.02174v1.pdf
PWC	https://paperswithcode.com/paper/a-study-of-thompson-sampling-with-parameter-h
Repo	https://github.com/qiangha/thompson_sampling
Framework	none
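
The abstract does not say exactly how h enters the selection rule. One plausible reading, used in the sketch below purely as an assumption, is that each arm is chosen with probability proportional to p^h, where p is the posterior probability that the arm is currently the best; h = 1 then recovers standard Thompson sampling. The example uses Bernoulli arms with Beta posteriors and a Monte Carlo estimate of p.

```python
import random


def prob_best(alphas, betas, rng, n_samples=2000):
    """Monte Carlo estimate of P(arm is best) under Beta posteriors."""
    counts = [0] * len(alphas)
    for _ in range(n_samples):
        draws = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        counts[draws.index(max(draws))] += 1
    return [c / n_samples for c in counts]


def selection_probs(p_best, h):
    """Reweight best-arm probabilities by the exponent h (h >= 0).

    h = 1 keeps them unchanged, h > 1 sharpens toward the leading
    arm, and h = 0 gives uniform exploration. This rule is an
    assumed interpretation, not the paper's stated definition.
    """
    weights = [p ** h for p in p_best]
    total = sum(weights)
    return [w / total for w in weights]


def choose_arm(alphas, betas, h, rng):
    """Pick one arm according to the h-modified probabilities."""
    probs = selection_probs(prob_best(alphas, betas, rng), h)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

After pulling the chosen arm, the usual Beta update applies: add one to that arm's alpha on success or to its beta on failure.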

Conditional Variance Penalties and Domain Shift Robustness


Title	Conditional Variance Penalties and Domain Shift Robustness
Authors	Christina Heinze-Deml, Nicolai Meinshausen
Abstract	When training a deep neural network for image classification, one can broadly distinguish between two types of latent features of images that will drive the classification. We can divide latent features into (i) “core” or “conditionally invariant” features $X^\text{core}$ whose distribution $X^\text{core}\vert Y$, conditional on the class $Y$, does not change substantially across domains and (ii) “style” features $X^{\text{style}}$ whose distribution $X^{\text{style}} \vert Y$ can change substantially across domains. Examples for style features include position, rotation, image quality or brightness but also more complex ones like hair color, image quality or posture for images of persons. Our goal is to minimize a loss that is robust under changes in the distribution of these style features. In contrast to previous work, we assume that the domain itself is not observed and hence a latent variable. We do assume that we can sometimes observe a typically discrete identifier or “$\mathrm{ID}$ variable”. In some applications we know, for example, that two images show the same person, and $\mathrm{ID}$ then refers to the identity of the person. The proposed method requires only a small fraction of images to have $\mathrm{ID}$ information. We group observations if they share the same class and identifier $(Y,\mathrm{ID})=(y,\mathrm{id})$ and penalize the conditional variance of the prediction or the loss if we condition on $(Y,\mathrm{ID})$. Using a causal framework, this conditional variance regularization (CoRe) is shown to protect asymptotically against shifts in the distribution of the style variables. Empirically, we show that the CoRe penalty improves predictive accuracy substantially in settings where domain changes occur in terms of image quality, brightness and color while we also look at more complex changes such as changes in movement and posture.
Tasks	Image Classification
Published	2017-10-31
URL	http://arxiv.org/abs/1710.11469v5
PDF	http://arxiv.org/pdf/1710.11469v5.pdf
PWC	https://paperswithcode.com/paper/conditional-variance-penalties-and-domain
Repo	https://github.com/christinaheinze/core
Framework	tf
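
The grouping step described in the abstract can be sketched as a plain function: observations that share the same class and identifier form a group, and the penalty is the variance of the model's predictions within each group. Averaging the group variances, skipping observations without an identifier, and the weight `lam` are assumptions made for this illustration rather than details given in the abstract.

```python
from collections import defaultdict


def core_penalty(predictions, labels, ids):
    """Average within-group variance of predictions.

    Groups are defined by (label, id). Observations with id None
    carry no grouping information and are skipped, as are groups
    with a single member.
    """
    groups = defaultdict(list)
    for pred, y, ident in zip(predictions, labels, ids):
        if ident is None:
            continue
        groups[(y, ident)].append(pred)
    variances = []
    for preds in groups.values():
        if len(preds) < 2:
            continue
        mean = sum(preds) / len(preds)
        variances.append(sum((p - mean) ** 2 for p in preds) / len(preds))
    return sum(variances) / len(variances) if variances else 0.0


def regularized_loss(base_loss, predictions, labels, ids, lam=1.0):
    """Training objective: base loss plus the weighted penalty."""
    return base_loss + lam * core_penalty(predictions, labels, ids)
```

If two images of the same person and class receive very different predictions, the difference must come from style features, so penalizing it pushes the model toward the conditionally invariant core features.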