February 2, 2020

3239 words 16 mins read

Paper Group AWR 69

Reconstructing faces from voices. Embedded Neural Networks for Robot Autonomy. Weakly-supervised Caricature Face Parsing through Domain Adaptation. Attribute-Driven Spontaneous Motion in Unpaired Image Translation. Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction. MinAtar: An Atari-Inspired Testbed for Thorough and Repro …

Reconstructing faces from voices

Title Reconstructing faces from voices
Authors Yandong Wen, Rita Singh, Bhiksha Raj
Abstract Voice profiling aims to infer various human parameters, such as gender and age, from speech. In this paper, we address the challenge posed by a subtask of voice profiling: reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that shares as many identity-related associations with the speaker as possible? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers on a training set. We evaluate the performance of the network by leveraging a closely related task, cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and achieves matching accuracies well above chance.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10604v2
PDF https://arxiv.org/pdf/1905.10604v2.pdf
PWC https://paperswithcode.com/paper/reconstructing-faces-from-voices
Repo https://github.com/cmu-mlsp/reconstructing_faces_from_voices
Framework pytorch
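
A minimal sketch of the training signal the abstract describes: a generator maps a voice embedding to a face image, and a frozen face classifier pushes the generated face toward the speaker's identity. All architectures and names here are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation): a generator maps a
# voice embedding to a face; a frozen identity classifier supplies the
# identity-matching loss that trains the generator.
import torch
import torch.nn as nn

class VoiceToFace(nn.Module):
    def __init__(self, voice_dim=64):
        super().__init__()
        self.net = nn.Sequential(                              # 1x1 -> 32x32
            nn.ConvTranspose2d(voice_dim, 256, 4), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, voice_emb):                              # (B, voice_dim)
        return self.net(voice_emb[:, :, None, None])

# Stand-in for a pretrained face-identity classifier, kept frozen.
face_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1000))
for p in face_classifier.parameters():
    p.requires_grad_(False)

gen = VoiceToFace()
voice_emb = torch.randn(8, 64)                 # embeddings from a voice encoder
speaker_ids = torch.randint(0, 1000, (8,))
faces = gen(voice_emb)
# Identity matching: the generated face should be classified as its speaker.
id_loss = nn.functional.cross_entropy(face_classifier(faces), speaker_ids)
id_loss.backward()
```

In the paper this objective sits inside a GAN, with a discriminator enforcing realism alongside the identity loss.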

Embedded Neural Networks for Robot Autonomy

Title Embedded Neural Networks for Robot Autonomy
Authors Sarah Aguasvivas Manzano, Dana Hughes, Cooper Simpson, Radhen Patel, Nikolaus Correll
Abstract We present a library to automatically embed signal processing and neural network predictions into the material robots are made of. Deep and shallow neural network models are first trained offline using state-of-the-art machine learning tools and then transferred onto general-purpose microcontrollers that are co-located with a robot's sensors and actuators. We validate this approach using multiple examples: a smart robotic tire for terrain classification, a robotic finger sensor for load classification, and a smart composite capable of regressing impact source localization. In each example, sensing and computation are embedded inside the material, creating artifacts that serve as stand-in replacements for otherwise inert conventional parts. The open-source software library takes as input trained model files from higher-level learning software, such as Tensorflow/Keras, and outputs C code that can be compiled for any microcontroller supporting C. We compare the performance of this approach on various embedded platforms. In particular, we show that low-cost off-the-shelf microcontrollers can match the accuracy of a desktop computer, while being fast enough for real-time applications across different neural network configurations. We provide means to estimate the maximum number of parameters that the hardware will support based on the microcontroller's specifications.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03848v1
PDF https://arxiv.org/pdf/1911.03848v1.pdf
PWC https://paperswithcode.com/paper/embedded-neural-networks-for-robot-autonomy
Repo https://github.com/correlllab/nn4mc
Framework tf
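
The final sentence describes estimating the maximum parameter count a given microcontroller can support. A back-of-envelope sketch of that calculation (not nn4mc's actual routine; the code-overhead constant is an assumption):

```python
# Rough upper bound on storable weights: the weights live in flash alongside
# the program image, so subtract an assumed code overhead and divide by the
# storage size of one weight.
def max_parameters(flash_bytes, bytes_per_weight=4, code_overhead_bytes=32_000):
    return max(flash_bytes - code_overhead_bytes, 0) // bytes_per_weight

# e.g. a microcontroller with 1 MB of flash and float32 weights:
print(max_parameters(1_000_000))  # -> 242000
```

In practice RAM also matters, since the largest layer's activations must fit in memory at inference time.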

Weakly-supervised Caricature Face Parsing through Domain Adaptation

Title Weakly-supervised Caricature Face Parsing through Domain Adaptation
Authors Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Deng Cai, Ming-Hsuan Yang
Abstract A caricature is an artistic rendering of a person's picture in which certain striking characteristics are abstracted or exaggerated to create a humorous or sarcastic effect. For numerous caricature-related applications such as attribute recognition and caricature editing, face parsing is an essential pre-processing step that provides a complete understanding of facial structure. However, current state-of-the-art face parsing methods require large amounts of pixel-level labeled data, and producing such annotations for caricatures is tedious and labor-intensive. For real photos, by contrast, numerous labeled face parsing datasets already exist. Thus, we formulate caricature face parsing as a domain adaptation problem, where real photos play the role of the source domain, adapting to the target caricatures. Specifically, we first leverage a spatial transformer based network to enable shape domain shifts. A feed-forward style transfer network is then utilized to capture texture-level domain gaps. With these two steps, we synthesize face caricatures from real photos, and thus we can use the parsing ground truths of the original photos to learn the parsing model. Experimental results on synthetic and real caricatures demonstrate the effectiveness of the proposed domain adaptation algorithm. Code is available at: https://github.com/ZJULearning/CariFaceParsing .
Tasks Caricature, Domain Adaptation, Style Transfer
Published 2019-05-13
URL https://arxiv.org/abs/1905.05091v1
PDF https://arxiv.org/pdf/1905.05091v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-caricature-face-parsing
Repo https://github.com/ZJULearning/CariFaceParsing
Framework pytorch
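
The label-transfer idea at the heart of this pipeline is that whatever geometric warp deforms a photo toward a caricature can be applied identically to the photo's parsing ground truth, so the source labels stay valid. A minimal sketch with a fixed affine warp standing in for the paper's learned spatial transformer:

```python
# Apply one geometric transform to both the photo and its parsing mask, so
# the labels remain aligned after the shape exaggeration. (Illustrative
# affine warp, not the paper's learned spatial transformer network.)
import torch
import torch.nn.functional as F

def warp_pair(image, labels, theta):
    """image: (B,3,H,W) float; labels: (B,1,H,W) class ids; theta: (B,2,3)."""
    grid = F.affine_grid(theta, image.size(), align_corners=False)
    warped_img = F.grid_sample(image, grid, align_corners=False)
    warped_lbl = F.grid_sample(labels.float(), grid, mode="nearest",
                               align_corners=False).long()  # keep ids discrete
    return warped_img, warped_lbl

img = torch.rand(1, 3, 64, 64)
lbl = torch.randint(0, 11, (1, 1, 64, 64))                  # e.g. 11 face regions
theta = torch.tensor([[[1.1, 0.2, 0.0], [0.1, 0.9, 0.0]]])  # exaggerated shape
c_img, c_lbl = warp_pair(img, lbl, theta)
```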

Attribute-Driven Spontaneous Motion in Unpaired Image Translation

Title Attribute-Driven Spontaneous Motion in Unpaired Image Translation
Authors Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia
Abstract Current image translation methods, albeit effective at producing high-quality results in various applications, still give little consideration to geometric transformations. In this paper, we propose a spontaneous motion estimation module, along with a refinement part, to learn attribute-driven deformation between source and target domains. Extensive experiments and visualizations demonstrate the effectiveness of these modules. We achieve promising results on unpaired image translation tasks, and enable interesting applications based on spontaneous motion.
Tasks Motion Estimation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01452v2
PDF https://arxiv.org/pdf/1907.01452v2.pdf
PWC https://paperswithcode.com/paper/attribute-driven-spontaneous-motion-in
Repo https://github.com/mikirui/ADSPM
Framework pytorch
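
A sketch of the mechanism such a motion module feeds into: a predicted dense offset field deforms the source image through differentiable sampling. The motion network itself is omitted; only the warping step is shown.

```python
# Deform an image with a per-pixel motion (flow) field via differentiable
# sampling. The network that predicts the flow is assumed, not shown.
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """image: (B,C,H,W); flow: (B,2,H,W), (x, y) offsets in pixels."""
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys)).float().expand(B, -1, -1, -1)
    coords = base + flow
    coords[:, 0] = 2 * coords[:, 0] / (W - 1) - 1   # normalise x to [-1, 1]
    coords[:, 1] = 2 * coords[:, 1] / (H - 1) - 1   # normalise y to [-1, 1]
    return F.grid_sample(image, coords.permute(0, 2, 3, 1), align_corners=True)

img = torch.rand(2, 3, 64, 64)
flow = torch.zeros(2, 2, 64, 64)   # would come from the motion-estimation module
flow[:, 0] = 3.0                   # e.g. sample 3 px to the right everywhere
out = warp_with_flow(img, flow)
```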

Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction

Title Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction
Authors Yova Kementchedjhieva, Mareike Hartmann, Anders Søgaard
Abstract The task of bilingual dictionary induction (BDI) is commonly used for intrinsic evaluation of cross-lingual word embeddings. The largest dataset for BDI was generated automatically, so its quality is dubious. We study the composition and quality of the test sets for five diverse languages from this dataset, with concerning findings: (1) a quarter of the data consists of proper nouns, which can hardly be indicative of BDI performance, and (2) there are pervasive gaps in the gold-standard targets. These issues appear to affect both the ranking between cross-lingual embedding systems on individual languages and the overall degree to which the systems differ in performance. With proper nouns removed from the data, the margin between the top two systems included in the study grows from 3.4% to 17.2%. Manual verification of the predictions, on the other hand, reveals that gaps in the gold-standard targets artificially inflate the margin between the two systems on English-to-Bulgarian BDI from 0.1% to 6.7%. We thus suggest that future research either avoid drawing conclusions from quantitative results on this BDI dataset, or accompany such evaluation with rigorous error analysis.
Tasks Word Embeddings
Published 2019-09-12
URL https://arxiv.org/abs/1909.05708v2
PDF https://arxiv.org/pdf/1909.05708v2.pdf
PWC https://paperswithcode.com/paper/lost-in-evaluation-misleading-benchmarks-for
Repo https://github.com/coastalcph/MUSE_dicos
Framework none
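
The first finding is easy to reproduce on any BDI output: recompute precision@1 with proper-noun source words removed and watch the ranking move. A toy sketch (capitalization as a crude proper-noun test; the tiny dictionary is made up):

```python
# Recompute BDI precision@1 under a filter on source words. The 'house' row
# also illustrates a gold gap: a valid translation missing from the gold set.
def precision_at_1(predictions, gold, keep=lambda w: True):
    pairs = [(s, p) for s, p in predictions.items() if keep(s)]
    hits = sum(p in gold.get(s, set()) for s, p in pairs)
    return hits / len(pairs)

gold = {"Paris": {"París"}, "dog": {"perro"}, "house": {"casa"}}
preds = {"Paris": "París", "dog": "perro", "house": "hogar"}

print(precision_at_1(preds, gold))                                 # 0.67, all words
print(precision_at_1(preds, gold, keep=lambda w: not w.istitle())) # 0.50, filtered
```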

MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments

Title MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments
Authors Kenny Young, Tian Tian
Abstract The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games demonstrate aspects of competency we expect from an intelligent agent and are not biased toward any particular solution approach. The challenge of the ALE includes (1) the representation learning problem of extracting pertinent information from raw pixels, and (2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, the research questions we are interested in pertain more to the latter, but the representation learning problem adds significant computational expense. We introduce MinAtar, short for miniature Atari, a new set of environments that capture the general mechanics of specific Atari games while simplifying the representational complexity to focus more on the behavioural challenges. MinAtar consists of analogues of five Atari games: Seaquest, Breakout, Asterix, Freeway and Space Invaders. Each MinAtar environment provides the agent with a 10x10xn binary state representation. Each game plays out on a 10x10 grid with n channels corresponding to game-specific objects, such as ball, paddle and brick in the game Breakout. To investigate the behavioural challenges posed by MinAtar, we evaluated a smaller version of the DQN architecture as well as online actor-critic with eligibility traces. With the representation learning problem simplified, we can perform experiments with significantly less computational expense. In our experiments, we use the saved compute time to perform step-size parameter sweeps and more runs than is typical for the ALE. Experiments like this improve reproducibility, and allow us to draw more confident conclusions. We hope that MinAtar can allow researchers to thoroughly investigate behavioural challenges similar to those inherent in the ALE.
Tasks Atari Games, Representation Learning
Published 2019-03-07
URL https://arxiv.org/abs/1903.03176v2
PDF https://arxiv.org/pdf/1903.03176v2.pdf
PWC https://paperswithcode.com/paper/minatar-an-atari-inspired-testbed-for-more
Repo https://github.com/kenjyoung/MinAtar
Framework pytorch
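
Given the 10x10xn binary observations, the "smaller version of the DQN architecture" can be very small indeed. A plausible reduced network (exact layer sizes in the paper may differ):

```python
# A compact DQN-style value network for MinAtar's 10x10xn binary states.
# Layer sizes are a plausible guess, not necessarily the paper's exact ones.
import torch
import torch.nn as nn

class MinAtarDQN(nn.Module):
    def __init__(self, in_channels, num_actions):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_channels, 16, 3), nn.ReLU())
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(16 * 8 * 8, 128), nn.ReLU(),
                                  nn.Linear(128, num_actions))

    def forward(self, obs):                    # obs: (B, n, 10, 10) in {0, 1}
        return self.head(self.conv(obs))

q = MinAtarDQN(in_channels=4, num_actions=6)   # channel count is game-specific
print(q(torch.zeros(1, 4, 10, 10)).shape)      # -> torch.Size([1, 6])
```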

Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation

Title Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation
Authors George De Ath, Richard M. Everson, Alma A. M. Rahat, Jonathan E. Fieldsend
Abstract The performance of acquisition functions for Bayesian optimisation is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement and the Upper Confidence Bound always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is never guaranteed to do so, and Weighted Expected Improvement does so only for a restricted range of weights. We introduce two novel $\epsilon$-greedy acquisition functions. Extensive empirical evaluation of these, together with random search, purely exploratory, and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions, shows that $\epsilon$-greedy algorithms are generally at least as effective as conventional acquisition functions, particularly with a limited budget. In higher dimensions, $\epsilon$-greedy approaches are shown to improve on conventional approaches. These results are borne out on a real-world computational fluid dynamics optimisation problem and a robotics active learning problem.
Tasks Active Learning, Bayesian Optimisation
Published 2019-11-28
URL https://arxiv.org/abs/1911.12809v1
PDF https://arxiv.org/pdf/1911.12809v1.pdf
PWC https://paperswithcode.com/paper/greed-is-good-exploration-and-exploitation
Repo https://github.com/georgedeath/egreedy
Framework none
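
The $\epsilon$-greedy recipe itself is one line of control flow: with probability $\epsilon$ evaluate an exploratory point, otherwise exploit by minimising the surrogate's posterior mean. A sketch over a finite candidate set (the paper's two variants differ in how the exploratory point is chosen):

```python
# Epsilon-greedy acquisition over a candidate set: explore with probability
# epsilon, otherwise take the minimiser of the surrogate's predicted mean.
import numpy as np

def egreedy_next(surrogate, candidates, epsilon=0.1, rng=None):
    """surrogate: fitted model with .predict(X); candidates: (N, d) array."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return candidates[rng.integers(len(candidates))]   # explore
    mu = surrogate.predict(candidates)
    return candidates[np.argmin(mu)]                       # exploit posterior mean
```

Any surrogate exposing a `.predict` method works here, e.g. scikit-learn's `GaussianProcessRegressor`.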

InceptionTime: Finding AlexNet for Time Series Classification

Title InceptionTime: Finding AlexNet for Time Series Classification
Authors Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar, Pierre-Alain Muller, François Petitjean
Abstract Time series classification (TSC) is the area of machine learning concerned with learning to assign labels to time series. The last few decades of work in this area have led to significant progress in classifier accuracy, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE is infeasible to use in many applications because of its very high training time complexity of O(N^2 * T^4) for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 72,000s to learn from a small dataset with N=700 time series of short length T=46. Deep learning, on the other hand, has received enormous attention because of its high scalability and state-of-the-art accuracy in computer vision and natural language processing tasks. Deep learning for TSC has only recently started to be explored, with the first few architectures developed in just the last three years. The accuracy of deep learning for TSC has been raised to a competitive level, but has not quite reached that of HIVE-COTE. This is what this paper achieves: outperforming HIVE-COTE's accuracy while retaining scalability. We take an important step towards finding the AlexNet of TSC by presenting InceptionTime, an ensemble of deep Convolutional Neural Network (CNN) models inspired by the Inception-v4 architecture. Our experiments show that InceptionTime slightly outperforms HIVE-COTE, with a win/draw/loss on the UCR archive of 40/6/39. Not only is InceptionTime more accurate, it is also much faster: InceptionTime learns from the same 700-series dataset in 2,300s, and can also learn from a dataset of 8M time series in 13 hours, a quantity of data fully out of reach of HIVE-COTE.
Tasks Time Series, Time Series Classification
Published 2019-09-11
URL https://arxiv.org/abs/1909.04939v2
PDF https://arxiv.org/pdf/1909.04939v2.pdf
PWC https://paperswithcode.com/paper/dreamtime-finding-alexnet-for-time-series
Repo https://github.com/hfawaz/InceptionTime
Framework none
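
The building block of InceptionTime is an Inception module adapted to one dimension: a bottleneck 1x1 convolution feeding parallel convolutions of several kernel lengths plus a max-pool branch, concatenated. A sketch (the paper uses kernel lengths 10/20/40; odd lengths are used below for simple symmetric padding):

```python
# One Inception module for time series: bottleneck, parallel multi-scale
# convolutions, and a max-pool branch, concatenated along channels.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, n_filters=32, kernels=(9, 19, 39)):
        super().__init__()
        self.bottleneck = nn.Conv1d(in_ch, n_filters, 1, bias=False)
        self.convs = nn.ModuleList(
            nn.Conv1d(n_filters, n_filters, k, padding=k // 2, bias=False)
            for k in kernels)
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(3, stride=1, padding=1),
            nn.Conv1d(in_ch, n_filters, 1, bias=False))
        self.bn_act = nn.Sequential(
            nn.BatchNorm1d(n_filters * (len(kernels) + 1)), nn.ReLU())

    def forward(self, x):                      # x: (B, in_ch, T)
        z = self.bottleneck(x)
        out = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        return self.bn_act(torch.cat(out, dim=1))

m = InceptionModule(in_ch=1)
print(m(torch.randn(2, 1, 46)).shape)          # -> torch.Size([2, 128, 46])
```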

Hyperspherical Prototype Networks

Title Hyperspherical Prototype Networks
Authors Pascal Mettes, Elise van der Pol, Cees G. M. Snoek
Abstract This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression, by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1901.10514v3
PDF https://arxiv.org/pdf/1901.10514v3.pdf
PWC https://paperswithcode.com/paper/hyperspherical-prototype-networks
Repo https://github.com/psmmettes/hpn
Framework pytorch
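
The data-independent prototype positioning can be sketched directly: optimise C unit vectors so that the largest pairwise cosine similarity shrinks, spreading the class prototypes over the hypersphere before any training data is seen. The separation loss below is one plausible choice, not necessarily the paper's exact objective.

```python
# Position class prototypes on the unit hypersphere with large margins by
# pushing each prototype away from its nearest neighbour.
import torch
import torch.nn.functional as F

def place_prototypes(num_classes, dim, steps=1000, lr=0.1):
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=lr, momentum=0.9)
    for _ in range(steps):
        p = F.normalize(protos, dim=1)
        sim = p @ p.t() - 2 * torch.eye(num_classes)  # mask self-similarity
        loss = sim.max(dim=1).values.mean()           # shrink nearest-pair cosines
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(protos.detach(), dim=1)

P = place_prototypes(num_classes=100, dim=64)
print((P @ P.t()).fill_diagonal_(-1.0).max())  # largest remaining cosine similarity
```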

Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS

Title Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS
Authors Shengyi Huang, Santiago Ontañón
Abstract This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation, where the observation covers the whole game state and the RL agent must choose which unit to issue actions to and which actions to execute; and (2) a local representation, where the observation is taken from the point of view of an individual unit and the RL agent picks actions for each unit independently. We evaluate these representations in MicroRTS, showing that the local representation seems to outperform the global representation when training agents on the task of harvesting resources.
Tasks
Published 2019-10-26
URL https://arxiv.org/abs/1910.12134v2
PDF https://arxiv.org/pdf/1910.12134v2.pdf
PWC https://paperswithcode.com/paper/comparing-observation-and-action
Repo https://github.com/vwxyzjn/gym-microrts
Framework pytorch
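
The contrast between the two representations is easiest to see in the shapes of the observation and action spaces. A sketch with illustrative sizes (not gym-microrts' exact spaces):

```python
# Global view: one observation of the whole map; the action picks a unit and
# what it does. Local view: one observation and one action per unit.
from gym import spaces

H, W, FEAT, N_ACTIONS = 8, 8, 27, 6

global_obs = spaces.Box(low=0, high=1, shape=(H, W, FEAT))
global_act = spaces.MultiDiscrete([H * W, N_ACTIONS])   # which unit, which action

local_obs = spaces.Box(low=0, high=1, shape=(FEAT,))    # one unit's view
local_act = spaces.Discrete(N_ACTIONS)                  # action for that unit
```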

Adaptive Estimation for Approximate k-Nearest-Neighbor Computations

Title Adaptive Estimation for Approximate k-Nearest-Neighbor Computations
Authors Daniel LeJeune, Richard G. Baraniuk, Reinhard Heckel
Abstract Algorithms often carry out equally many computations for “easy” and “hard” problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate k-nearest-neighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of k nearest neighbors of a given query point. We propose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal among algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method.
Tasks
Published 2019-02-25
URL http://arxiv.org/abs/1902.09465v1
PDF http://arxiv.org/pdf/1902.09465v1.pdf
PWC https://paperswithcode.com/paper/adaptive-estimation-for-approximate-k-nearest
Repo https://github.com/dlej/adaptive-knn
Framework none
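
A simplified sketch of adaptive distance estimation (not the paper's exact algorithm or confidence bounds): each surviving point's distance estimate is refined by sampling more coordinates, and points whose estimate is confidently worse than the current k-th best stop receiving samples.

```python
# Bandit-flavoured approximate k-NN: sample coordinates to estimate squared
# distances, and eliminate points whose lower bound exceeds the k-th smallest
# upper bound. The confidence width here is a crude placeholder.
import numpy as np

def adaptive_knn(X, q, k, batch=32, seed=0):
    """Return a small candidate set expected to contain q's k nearest rows."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sums = np.zeros(n)                       # running sums of sampled sq. gaps
    counts = np.zeros(n)
    active = np.ones(n, dtype=bool)
    while active.sum() > k and counts.max() < d:
        coords = rng.integers(0, d, size=batch)
        # A full implementation touches only these coordinates per point.
        sums[active] += ((X[active][:, coords] - q[coords]) ** 2).sum(axis=1)
        counts[active] += batch
        mean = sums / np.maximum(counts, 1)
        width = 1.0 / np.sqrt(np.maximum(counts, 1))
        kth_ucb = np.sort((mean + width)[active])[k - 1]
        active &= (mean - width) <= kth_ucb   # drop confident non-neighbours
    return np.flatnonzero(active)

X = np.random.default_rng(1).standard_normal((500, 256))
print(adaptive_knn(X, X[0], k=3))            # candidate set containing row 0
```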

Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition

Title Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition
Authors Ninareh Mehrabi, Thamme Gowda, Fred Morstatter, Nanyun Peng, Aram Galstyan
Abstract We study the bias in several state-of-the-art named entity recognition (NER) models—specifically, a difference in the ability to recognize male and female names as PERSON entity types. We evaluate NER models on a dataset containing 139 years of U.S. census baby names and find that relatively more female names, as opposed to male names, are not recognized as PERSON entities. We study the extent of this bias in several NER systems that are used prominently in industry and academia. In addition, we also report a bias in the datasets on which these models were trained. The result of this analysis yields a new benchmark for gender bias evaluation in named entity recognition systems. The data and code for the application of this benchmark will be publicly available for researchers to use.
Tasks Named Entity Recognition
Published 2019-10-24
URL https://arxiv.org/abs/1910.10872v1
PDF https://arxiv.org/pdf/1910.10872v1.pdf
PWC https://paperswithcode.com/paper/man-is-to-person-as-woman-is-to-location
Repo https://github.com/Ninarehm/NERGenderBias
Framework none
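
The measurement itself is simple to reproduce: embed names in a fixed sentence template and count how often the model tags them as PERSON. A sketch using spaCy as a stand-in for the systems audited in the paper (assumes the en_core_web_sm model is installed; the name lists are illustrative, not the census data):

```python
# Fraction of names recognised as PERSON when slotted into a fixed template.
import spacy

nlp = spacy.load("en_core_web_sm")
TEMPLATE = "{} is heading to the office tomorrow."

def person_recall(names):
    hits = 0
    for name in names:
        doc = nlp(TEMPLATE.format(name))
        hits += any(ent.label_ == "PERSON" and name in ent.text
                    for ent in doc.ents)
    return hits / len(names)

female = ["Mary", "Linda", "Shanice"]   # in the paper: U.S. census baby names
male = ["James", "Robert", "Darnell"]
print(person_recall(female), person_recall(male))
```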

Sequential Attention-based Network for Noetic End-to-End Response Selection

Title Sequential Attention-based Network for Noetic End-to-End Response Selection
Authors Qian Chen, Wen Wang
Abstract The noetic end-to-end response selection challenge, one track in the Dialog System Technology Challenges 7 (DSTC7), aims to push the state of the art in utterance classification for real-world goal-oriented dialog systems, where participants must select the correct next utterance from a set of candidates given the multi-turn context. This paper describes our systems, which ranked top on both datasets under this challenge: one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among different turns' utterances for context modeling. In this paper, we investigate a sequential matching model based only on the chain sequence for multi-turn response selection. Our results demonstrate that the potential of sequential matching approaches has not yet been fully exploited for multi-turn response selection. In addition to ranking top in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, achieving new state-of-the-art performance on two large-scale public multi-turn response selection benchmark datasets.
Tasks Conversational Response Selection, Goal-Oriented Dialog
Published 2019-01-09
URL https://arxiv.org/abs/1901.02609v3
PDF https://arxiv.org/pdf/1901.02609v3.pdf
PWC https://paperswithcode.com/paper/sequential-attention-based-network-for-noetic
Repo https://github.com/alibaba/esim-response-selection
Framework tf
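
The chain-sequence idea can be sketched in a few lines: flatten the multi-turn context into one token sequence with turn separators, then score each candidate response with a shared encoder. The scorer below is a plain BiLSTM stand-in, far simpler than the paper's ESIM-style model.

```python
# Score (flattened context, candidate response) pairs with a shared encoder.
import torch
import torch.nn as nn

class ChainScorer(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.out = nn.Bilinear(2 * dim, 2 * dim, 1)

    def encode(self, ids):                     # ids: (B, T)
        h, _ = self.enc(self.emb(ids))
        return h.max(dim=1).values             # max-pool over time

    def forward(self, context_ids, response_ids):
        return self.out(self.encode(context_ids),
                        self.encode(response_ids)).squeeze(-1)

EOT = 1                                        # turn-separator token id
turns = [[5, 6, 7], [8, 9]]                    # token ids of two context turns
context = torch.tensor([sum(([*t, EOT] for t in turns), [])])  # one chain
response = torch.tensor([[10, 11, 12]])
score = ChainScorer(vocab=1000)(context, response)
```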

A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources

Title A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources
Authors Shuguan Yang, Wei Ma, Xidong Pi, Sean Qian
Abstract A deep learning model is applied for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model's performance is evaluated through a case study in the Pittsburgh downtown area. The proposed model outperforms other baseline methods, including multi-layer LSTM and Lasso, with an average testing MAPE of 10.6% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in general, the prediction model works better for business areas than for recreational locations. We find that incorporating traffic speed and weather information can significantly improve prediction performance; weather data is particularly useful for improving prediction accuracy in recreational areas.
Tasks
Published 2019-01-21
URL https://arxiv.org/abs/1901.06758v5
PDF https://arxiv.org/pdf/1901.06758v5.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-to-real-time-parking
Repo https://github.com/BreadYang/GraphCNN_parking
Framework pytorch
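
A sketch of the spatio-temporal architecture described above: a graph convolution mixes neighbouring blocks' features at each time step, and an LSTM then models the temporal dynamics. Layer sizes, the toy graph, and the feature set are illustrative assumptions.

```python
# Graph convolution over the road network per time step, followed by an LSTM
# over time, predicting one occupancy value per block.
import torch
import torch.nn as nn

class GraphConvLSTM(nn.Module):
    def __init__(self, adj, in_feat, hidden=64):
        super().__init__()
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        self.register_buffer("A", adj / deg)     # row-normalised adjacency
        self.gc = nn.Linear(in_feat, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (B, T, N, F)
        z = torch.relu(self.gc(torch.einsum("nm,btmf->btnf", self.A, x)))
        B, T, N, H = z.shape
        h, _ = self.lstm(z.permute(0, 2, 1, 3).reshape(B * N, T, H))
        return self.head(h[:, -1]).view(B, N)    # e.g. occupancy 30 min ahead

N = 5                                                    # parking blocks
adj = torch.eye(N) + torch.diag(torch.ones(N - 1), 1)    # toy road graph
model = GraphConvLSTM(adj, in_feat=3)            # occupancy, speed, weather
print(model(torch.rand(2, 12, N, 3)).shape)      # -> torch.Size([2, 5])
```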

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Title Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Authors Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma
Abstract Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07413v2
PDF https://arxiv.org/pdf/1906.07413v2.pdf
PWC https://paperswithcode.com/paper/learning-imbalanced-datasets-with-label
Repo https://github.com/feidfoe/AdjustBnd4Imbalance
Framework pytorch
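
The LDAM loss is concrete enough to sketch: subtract a per-class margin, larger for rarer classes (proportional to n_j^{-1/4} in the paper), from the true-class logit before the cross-entropy. The scale constant below follows the authors' released code as far as we can tell, but treat it as an assumption.

```python
# Label-distribution-aware margin loss: rare classes get larger margins.
import torch
import torch.nn.functional as F

def ldam_loss(logits, targets, class_counts, max_margin=0.5, s=30.0):
    margins = 1.0 / class_counts.float() ** 0.25       # ~ n_j^(-1/4)
    margins = margins * (max_margin / margins.max())   # rarest class gets 0.5
    adjusted = logits.clone()
    adjusted[torch.arange(len(targets)), targets] -= margins[targets]
    return F.cross_entropy(s * adjusted, targets)

logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 0])
counts = torch.tensor([5000, 500, 50])                 # head-to-tail imbalance
loss = ldam_loss(logits, targets, counts)
```

The paper pairs this loss with a deferred re-weighting schedule, switching on class re-weighting only after an initial training stage.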