Paper Group AWR 69
Reconstructing faces from voices
Title | Reconstructing faces from voices |
Authors | Yandong Wen, Rita Singh, Bhiksha Raj |
Abstract | Voice profiling aims at inferring various human parameters from their speech, e.g. gender, age, etc. In this paper, we address the challenge posed by a subtask of voice profiling - reconstructing someone’s face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that has as many common elements, or associations as possible with the speaker, in terms of identity? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set. We evaluate the performance of the network by leveraging a closely related task - cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and results in matching accuracies that are much better than chance. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10604v2, https://arxiv.org/pdf/1905.10604v2.pdf |
PWC | https://paperswithcode.com/paper/reconstructing-faces-from-voices |
Repo | https://github.com/cmu-mlsp/reconstructing_faces_from_voices |
Framework | pytorch |
Embedded Neural Networks for Robot Autonomy
Title | Embedded Neural Networks for Robot Autonomy |
Authors | Sarah Aguasvivas Manzano, Dana Hughes, Cooper Simpson, Radhen Patel, Nikolaus Correll |
Abstract | We present a library to automatically embed signal processing and neural network predictions into the material robots are made of. Deep and shallow neural network models are first trained offline using state-of-the-art machine learning tools and then transferred onto general purpose microcontrollers that are co-located with a robot’s sensors and actuators. We validate this approach using multiple examples: a smart robotic tire for terrain classification, a robotic finger sensor for load classification and a smart composite capable of regressing impact source localization. In each example, sensing and computation are embedded inside the material, creating artifacts that serve as stand-in replacement for otherwise inert conventional parts. The open source software library takes as inputs trained model files from higher level learning software, such as Tensorflow/Keras, and outputs code that is readable in a microcontroller that supports C. We compare the performance of this approach for various embedded platforms. In particular, we show that low-cost off-the-shelf microcontrollers can match the accuracy of a desktop computer, while being fast enough for real-time applications at different neural network configurations. We provide means to estimate the maximum number of parameters that the hardware will support based on the microcontroller’s specifications. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03848v1, https://arxiv.org/pdf/1911.03848v1.pdf |
PWC | https://paperswithcode.com/paper/embedded-neural-networks-for-robot-autonomy |
Repo | https://github.com/correlllab/nn4mc |
Framework | tf |
Weakly-supervised Caricature Face Parsing through Domain Adaptation
Title | Weakly-supervised Caricature Face Parsing through Domain Adaptation |
Authors | Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Deng Cai, Ming-Hsuan Yang |
Abstract | A caricature is an artistic form of a person’s picture in which certain striking characteristics are abstracted or exaggerated in order to create a humorous or sarcastic effect. For numerous caricature-related applications such as attribute recognition and caricature editing, face parsing is an essential pre-processing step that provides a complete facial structure understanding. However, current state-of-the-art face parsing methods require large amounts of pixel-level labeled data, and producing such labels for caricatures is tedious and labor-intensive. For real photos, there are numerous labeled datasets for face parsing. Thus, we formulate caricature face parsing as a domain adaptation problem, where real photos play the role of the source domain, adapting to the target caricatures. Specifically, we first leverage a spatial transformer based network to enable shape domain shifts. A feed-forward style transfer network is then utilized to capture texture-level domain gaps. With these two steps, we synthesize face caricatures from real photos, and thus we can use parsing ground truths of the original photos to learn the parsing model. Experimental results on the synthetic and real caricatures demonstrate the effectiveness of the proposed domain adaptation algorithm. Code is available at: https://github.com/ZJULearning/CariFaceParsing . |
Tasks | Caricature, Domain Adaptation, Style Transfer |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05091v1, https://arxiv.org/pdf/1905.05091v1.pdf |
PWC | https://paperswithcode.com/paper/weakly-supervised-caricature-face-parsing |
Repo | https://github.com/ZJULearning/CariFaceParsing |
Framework | pytorch |
Attribute-Driven Spontaneous Motion in Unpaired Image Translation
Title | Attribute-Driven Spontaneous Motion in Unpaired Image Translation |
Authors | Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia |
Abstract | Current image translation methods, although effective at producing high-quality results in various applications, still give little consideration to geometric transformation. In this paper, we propose a spontaneous motion estimation module, along with a refinement part, to learn attribute-driven deformation between source and target domains. Extensive experiments and visualizations demonstrate the effectiveness of these modules. We achieve promising results in unpaired image translation tasks and enable interesting applications based on spontaneous motion. |
Tasks | Motion Estimation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01452v2, https://arxiv.org/pdf/1907.01452v2.pdf |
PWC | https://paperswithcode.com/paper/attribute-driven-spontaneous-motion-in |
Repo | https://github.com/mikirui/ADSPM |
Framework | pytorch |
Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction
Title | Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction |
Authors | Yova Kementchedjhieva, Mareike Hartmann, Anders Søgaard |
Abstract | The task of bilingual dictionary induction (BDI) is commonly used for intrinsic evaluation of cross-lingual word embeddings. The largest dataset for BDI was generated automatically, so its quality is dubious. We study the composition and quality of the test sets for five diverse languages from this dataset, with concerning findings: (1) a quarter of the data consists of proper nouns, which can hardly be indicative of BDI performance, and (2) there are pervasive gaps in the gold-standard targets. These issues appear to affect the ranking between cross-lingual embedding systems on individual languages, and the overall degree to which the systems differ in performance. With proper nouns removed from the data, the margin between the top two systems included in the study grows from 3.4% to 17.2%. Manual verification of the predictions, on the other hand, reveals that gaps in the gold-standard targets artificially inflate the margin between the two systems on English to Bulgarian BDI from 0.1% to 6.7%. We thus suggest that future research either avoids drawing conclusions from quantitative results on this BDI dataset, or accompanies such evaluation with rigorous error analysis. |
Tasks | Word Embeddings |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05708v2, https://arxiv.org/pdf/1909.05708v2.pdf |
PWC | https://paperswithcode.com/paper/lost-in-evaluation-misleading-benchmarks-for |
Repo | https://github.com/coastalcph/MUSE_dicos |
Framework | none |
MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments
Title | MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments |
Authors | Kenny Young, Tian Tian |
Abstract | The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games demonstrate aspects of competency we expect from an intelligent agent and are not biased toward any particular solution approach. The challenge of the ALE includes (1) the representation learning problem of extracting pertinent information from raw pixels, and (2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, the research questions we are interested in pertain more to the latter, but the representation learning problem adds significant computational expense. We introduce MinAtar, short for miniature Atari, a new set of environments that capture the general mechanics of specific Atari games while simplifying the representational complexity to focus more on the behavioural challenges. MinAtar consists of analogues of five Atari games: Seaquest, Breakout, Asterix, Freeway and Space Invaders. Each MinAtar environment provides the agent with a 10x10xn binary state representation. Each game plays out on a 10x10 grid with n channels corresponding to game-specific objects, such as ball, paddle and brick in the game Breakout. To investigate the behavioural challenges posed by MinAtar, we evaluated a smaller version of the DQN architecture as well as online actor-critic with eligibility traces. With the representation learning problem simplified, we can perform experiments with significantly less computational expense. In our experiments, we use the saved compute time to perform step-size parameter sweeps and more runs than is typical for the ALE. Experiments like this improve reproducibility, and allow us to draw more confident conclusions. We hope that MinAtar can allow researchers to thoroughly investigate behavioural challenges similar to those inherent in the ALE. |
Tasks | Atari Games, Representation Learning |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.03176v2, https://arxiv.org/pdf/1903.03176v2.pdf |
PWC | https://paperswithcode.com/paper/minatar-an-atari-inspired-testbed-for-more |
Repo | https://github.com/kenjyoung/MinAtar |
Framework | pytorch |
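The 10x10xn binary state described in the MinAtar abstract above can be pictured with a small sketch. This is not the MinAtar API: the helper names `empty_state` and `place` are hypothetical, the channel names are the Breakout objects mentioned in the abstract, and the object positions are invented for illustration.

```python
# Hypothetical layout for a Breakout-like scene. Channel names come from the
# abstract; the grid positions below are made up for illustration only.
GRID = 10
CHANNELS = ["ball", "paddle", "brick"]


def empty_state(n_channels, size=GRID):
    """Return a size x size x n_channels array of zeros as nested lists."""
    return [[[0] * n_channels for _ in range(size)] for _ in range(size)]


def place(state, channel, row, col):
    """Mark one grid cell as occupied in the given channel."""
    state[row][col][CHANNELS.index(channel)] = 1


state = empty_state(len(CHANNELS))
place(state, "paddle", 9, 4)      # paddle on the bottom row
place(state, "ball", 5, 5)        # ball somewhere mid-grid
for col in range(GRID):           # one full row of bricks
    place(state, "brick", 1, col)

# Count how many (cell, channel) entries are switched on.
occupied = sum(cell[c] for row in state for cell in row for c in range(len(CHANNELS)))
print(occupied)
```

Each cell holds one bit per object type, so an agent reads object positions directly instead of recovering them from raw pixels.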
Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation
Title | Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation |
Authors | George De Ath, Richard M. Everson, Alma A. M. Rahat, Jonathan E. Fieldsend |
Abstract | The performance of acquisition functions for Bayesian optimisation is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement and the Upper Confidence Bound always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is never guaranteed to do so and Weighted Expected Improvement does only for a restricted range of weights. We introduce two novel $\epsilon$-greedy acquisition functions. Extensive empirical evaluation of these together with random search, purely exploratory and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions shows that $\epsilon$-greedy algorithms are generally at least as effective as conventional acquisition functions, particularly with a limited budget. In higher dimensions $\epsilon$-greedy approaches are shown to have improved performance over conventional approaches. These results are borne out on a real world computational fluid dynamics optimisation problem and a robotics active learning problem. |
Tasks | Active Learning, Bayesian Optimisation |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12809v1, https://arxiv.org/pdf/1911.12809v1.pdf |
PWC | https://paperswithcode.com/paper/greed-is-good-exploration-and-exploitation |
Repo | https://github.com/georgedeath/egreedy |
Framework | none |
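The abstract above does not define its two $\epsilon$-greedy acquisition functions, so the following is only a generic $\epsilon$-greedy selection rule, written as a minimal sketch. It assumes a minimisation problem and a surrogate model that has already produced a predicted mean for each candidate; the function name `epsilon_greedy_pick` and its parameters are hypothetical.

```python
import random


def epsilon_greedy_pick(predicted_means, epsilon, rng):
    """Pick the index of the next candidate to evaluate (minimisation).

    With probability epsilon, explore by choosing a candidate uniformly at
    random; otherwise exploit by choosing the lowest predicted mean.
    This is a generic rule, not necessarily either function from the paper.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(predicted_means))
    return min(range(len(predicted_means)), key=predicted_means.__getitem__)


rng = random.Random(0)
choice = epsilon_greedy_pick([3.0, 1.0, 2.0], epsilon=0.1, rng=rng)
```

Setting `epsilon=0` gives purely exploitative search, and `epsilon=1` gives random search over the candidates, which are two of the baselines the abstract compares against.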
InceptionTime: Finding AlexNet for Time Series Classification
Title | InceptionTime: Finding AlexNet for Time Series Classification |
Authors | Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar, Pierre-Alain Muller, François Petitjean |
Abstract | Time series classification (TSC) is the area of machine learning interested in learning how to assign labels to time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE is infeasible to use in many applications because of its very high training time complexity in O(N^2*T^4) for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 72,000s to learn from a small dataset with N=700 time series of short length T=46. Deep learning, on the other hand, has now received enormous attention because of its high scalability and state-of-the-art accuracy in computer vision and natural language processing tasks. Deep learning for TSC has only very recently started to be explored, with the first few architectures developed over the last 3 years only. The accuracy of deep learning for TSC has been raised to a competitive level, but has not quite reached the level of HIVE-COTE. This is what this paper achieves: outperforming HIVE-COTE’s accuracy together with scalability. We take an important step towards finding the AlexNet network for TSC by presenting InceptionTime—an ensemble of deep Convolutional Neural Network (CNN) models, inspired by the Inception-v4 architecture. Our experiments show that InceptionTime slightly outperforms HIVE-COTE with a win/draw/loss on the UCR archive of 40/6/39. Not only is InceptionTime more accurate, but it is much faster: InceptionTime learns from that same dataset with 700 time series in 2,300s but can also learn from a dataset with 8M time series in 13 hours, a quantity of data that is fully out of reach of HIVE-COTE. |
Tasks | Time Series, Time Series Classification |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04939v2, https://arxiv.org/pdf/1909.04939v2.pdf |
PWC | https://paperswithcode.com/paper/dreamtime-finding-alexnet-for-time-series |
Repo | https://github.com/hfawaz/InceptionTime |
Framework | none |
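The figures quoted in the InceptionTime abstract above can be checked with a few lines of arithmetic. This sketch only plugs the abstract's numbers into the stated O(N^2*T^4) term; constant factors are ignored, so the operation count is an order-of-magnitude indication and not a measured cost.

```python
# Numbers taken from the abstract: N=700 series of length T=46.
n_series, length = 700, 46

# Dominant term of the stated training complexity, constants ignored.
hive_cote_ops = n_series ** 2 * length ** 4

# Training-time ratio on that dataset. The abstract says "more than 72,000s"
# for HIVE-COTE, so this ratio is a lower bound on the speedup.
speedup = 72_000 / 2_300

# The win/draw/loss record implies the number of datasets compared.
wins, draws, losses = 40, 6, 39
datasets = wins + draws + losses

print(hive_cote_ops, round(speedup, 1), datasets)
```

The win/draw/loss split shows how narrow the accuracy advantage is (40 wins against 39 losses), which matches the abstract's wording that InceptionTime "slightly outperforms" HIVE-COTE.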
Hyperspherical Prototype Networks
Title | Hyperspherical Prototype Networks |
Authors | Pascal Mettes, Elise van der Pol, Cees G. M. Snoek |
Abstract | This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression, by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches. |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10514v3, https://arxiv.org/pdf/1901.10514v3.pdf |
PWC | https://paperswithcode.com/paper/hyperspherical-prototype-networks |
Repo | https://github.com/psmmettes/hpn |
Framework | pytorch |
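To illustrate the idea of class prototypes fixed a priori on a hypersphere, here is a minimal sketch for the simplest case, a circle, where equally spaced angles give well-separated prototypes. The abstract does not specify the loss or the positioning procedure, so the helpers `circle_prototypes`, `cosine` and `predict`, and the choice of cosine similarity for prediction, are assumptions made for illustration.

```python
import math


def circle_prototypes(n_classes):
    """Place n_classes unit vectors at equal angles on the unit circle."""
    return [(math.cos(2 * math.pi * k / n_classes),
             math.sin(2 * math.pi * k / n_classes)) for k in range(n_classes)]


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def predict(output, prototypes):
    """Return the index of the prototype most similar to the network output."""
    return max(range(len(prototypes)), key=lambda k: cosine(output, prototypes[k]))


prototypes = circle_prototypes(4)           # 4 classes in a 2-D output space
label = predict((0.2, 3.0), prototypes)     # output pointing roughly "up"
```

Four classes fit in a two-dimensional output space here, which reflects the abstract's point that the output dimensionality need not equal the number of classes, and the prototypes never change during training.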
Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS
Title | Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS |
Authors | Shengyi Huang, Santiago Ontañón |
Abstract | This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation where the observation represents the whole game state, and the RL agent needs to choose which unit to issue actions to, and which actions to execute; and (2) a local representation where the observation is represented from the point of view of an individual unit, and the RL agent picks actions for each unit independently. We evaluate these representations in MicroRTS showing that the local representation seems to outperform the global representation when training agents with the task of harvesting resources. |
Tasks | |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1910.12134v2, https://arxiv.org/pdf/1910.12134v2.pdf |
PWC | https://paperswithcode.com/paper/comparing-observation-and-action |
Repo | https://github.com/vwxyzjn/gym-microrts |
Framework | pytorch |
Adaptive Estimation for Approximate k-Nearest-Neighbor Computations
Title | Adaptive Estimation for Approximate k-Nearest-Neighbor Computations |
Authors | Daniel LeJeune, Richard G. Baraniuk, Reinhard Heckel |
Abstract | Algorithms often carry out equally many computations for “easy” and “hard” problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate k-nearest-neighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of k nearest neighbors of a given query point. We propose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal out of algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method. |
Tasks | |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09465v1, http://arxiv.org/pdf/1902.09465v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-estimation-for-approximate-k-nearest |
Repo | https://github.com/dlej/adaptive-knn |
Framework | none |
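The general idea of adaptively estimating distances, as described in the abstract above, can be sketched as follows. This is not the paper's algorithm and carries none of its guarantees: the halving schedule, the function name `approx_knn_candidates`, and the parameters `batch`, `keep_factor` and `seed` are all assumptions chosen to keep the example short. Distances are estimated from a growing sample of coordinates, and candidates that already look far away are dropped before their full distance is ever computed.

```python
import random


def approx_knn_candidates(points, query, k, batch=4, keep_factor=2, seed=0):
    """Return indices of at most keep_factor * k points likely to be near query.

    Simplified illustration: partial squared distances are accumulated over
    randomly ordered coordinates, and the worse half of the surviving
    candidates is discarded after each batch of coordinates.
    """
    rng = random.Random(seed)
    dim = len(query)
    order = list(range(dim))
    rng.shuffle(order)                  # sample coordinates without replacement
    alive = list(range(len(points)))
    sums = {i: 0.0 for i in alive}      # partial squared distance per candidate
    used = 0
    target = keep_factor * k
    while len(alive) > target and used < dim:
        coords = order[used:used + batch]
        used += len(coords)
        for i in alive:
            sums[i] += sum((points[i][c] - query[c]) ** 2 for c in coords)
        # All survivors have seen the same coordinates, so sums are comparable.
        alive.sort(key=lambda i: sums[i])
        alive = alive[:max(target, len(alive) // 2)]
    alive.sort(key=lambda i: sums[i])
    return alive[:target]


query = [0.0] * 16
points = [[5.0] * 16 for _ in range(20)]
points[7] = [0.0] * 16                  # one point identical to the query
candidates = approx_knn_candidates(points, query, k=1)
```

On "easy" instances like this one, most candidates are eliminated after only a few coordinates, whereas the naive method computes every full distance regardless of difficulty.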
Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition
Title | Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition |
Authors | Ninareh Mehrabi, Thamme Gowda, Fred Morstatter, Nanyun Peng, Aram Galstyan |
Abstract | We study the bias in several state-of-the-art named entity recognition (NER) models—specifically, a difference in the ability to recognize male and female names as PERSON entity types. We evaluate NER models on a dataset containing 139 years of U.S. census baby names and find that relatively more female names, as opposed to male names, are not recognized as PERSON entities. We study the extent of this bias in several NER systems that are used prominently in industry and academia. In addition, we also report a bias in the datasets on which these models were trained. The result of this analysis yields a new benchmark for gender bias evaluation in named entity recognition systems. The data and code for the application of this benchmark will be publicly available for researchers to use. |
Tasks | Named Entity Recognition |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.10872v1, https://arxiv.org/pdf/1910.10872v1.pdf |
PWC | https://paperswithcode.com/paper/man-is-to-person-as-woman-is-to-location |
Repo | https://github.com/Ninarehm/NERGenderBias |
Framework | none |
Sequential Attention-based Network for Noetic End-to-End Response Selection
Title | Sequential Attention-based Network for Noetic End-to-End Response Selection |
Authors | Qian Chen, Wen Wang |
Abstract | The noetic end-to-end response selection challenge, one track of the Dialog System Technology Challenges 7 (DSTC7), aims to push the state of the art of utterance classification for real-world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context. This paper describes our systems, which ranked top on both datasets under this challenge, one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among different turns’ utterances for context modeling. In this paper, we investigate a sequential matching model based only on chain sequence for multi-turn response selection. Our results demonstrate that the potential of sequential matching approaches has not yet been fully exploited for multi-turn response selection. In addition to ranking top in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, and achieves new state-of-the-art performance on two large-scale public multi-turn response selection benchmark datasets. |
Tasks | Conversational Response Selection, Goal-Oriented Dialog |
Published | 2019-01-09 |
URL | https://arxiv.org/abs/1901.02609v3, https://arxiv.org/pdf/1901.02609v3.pdf |
PWC | https://paperswithcode.com/paper/sequential-attention-based-network-for-noetic |
Repo | https://github.com/alibaba/esim-response-selection |
Framework | tf |
A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources
Title | A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources |
Authors | Shuguan Yang, Wei Ma, Xidong Pi, Sean Qian |
Abstract | A deep learning model is applied for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model performance is evaluated through a case study in the Pittsburgh downtown area. The proposed model outperforms other baseline methods including multi-layer LSTM and Lasso with an average testing MAPE of 10.6% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in general, the prediction model works better for business areas than for recreational locations. We found that incorporating traffic speed and weather information can significantly improve the prediction performance. Weather data is particularly useful for improving prediction accuracy in recreational areas. |
Tasks | |
Published | 2019-01-21 |
URL | https://arxiv.org/abs/1901.06758v5, https://arxiv.org/pdf/1901.06758v5.pdf |
PWC | https://paperswithcode.com/paper/a-deep-learning-approach-to-real-time-parking |
Repo | https://github.com/BreadYang/GraphCNN_parking |
Framework | pytorch |
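The parking abstract above reports its headline result as a mean absolute percentage error (MAPE). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition; the occupancy values in the usage lines are invented and are not data from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every actual value must be non-zero, since each error is divided by it.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up occupancy rates for two blocks, each predicted 10% off.
score = mape([0.5, 0.8], [0.45, 0.88])
```

Because each error is scaled by the true value, blocks with very low occupancy can dominate the average, which is worth keeping in mind when comparing MAPE across areas.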
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Title | Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss |
Authors | Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma |
Abstract | Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07413v2, https://arxiv.org/pdf/1906.07413v2.pdf |
PWC | https://paperswithcode.com/paper/learning-imbalanced-datasets-with-label |
Repo | https://github.com/feidfoe/AdjustBnd4Imbalance |
Framework | pytorch |
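The abstract above describes the LDAM loss only as a margin-based replacement for cross-entropy, so the sketch below fills in details that are assumptions. It uses per-class margins proportional to n_j^(-1/4), where n_j is the class count (the form I recall from the paper, but not stated in the abstract), rescaled so the rarest class gets `max_margin`. The function names and the default `max_margin=0.5` are hypothetical, and the code is plain Python instead of a PyTorch module.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25 (assumed form).

    Margins are rescaled so that the largest one equals max_margin, which
    means rarer classes receive larger margins.
    """
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]


def ldam_loss(logits, label, margins):
    """Cross-entropy after subtracting the true class's margin from its logit."""
    adjusted = list(logits)
    adjusted[label] -= margins[label]
    peak = max(adjusted)                # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_sum - adjusted[label]


margins = ldam_margins([10000, 100, 1])
loss = ldam_loss([2.0, 1.0, 0.0], label=2, margins=margins)
```

Subtracting the margin from the true-class logit means a rare-class example must be classified with extra confidence before its loss becomes small, which pushes the decision boundary away from rare classes.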