Paper Group ANR 79
Can DNNs Learn to Lipread Full Sentences?
Title | Can DNNs Learn to Lipread Full Sentences? |
Authors | George Sterpu, Christian Saam, Naomi Harte |
Abstract | Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement over a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread. |
Tasks | Language Modelling, Lipreading |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11685v1 |
PDF | http://arxiv.org/pdf/1805.11685v1.pdf |
PWC | https://paperswithcode.com/paper/can-dnns-learn-to-lipread-full-sentences |
Repo | |
Framework | |
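The joint CTC-sequence-to-sequence loss named in the abstract reduces to interpolating two standard criteria. Below is a minimal PyTorch sketch; the weighting `lam`, the tensor shapes, and the reuse of label index 0 as both the CTC blank and the padding index are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def joint_ctc_seq2seq_loss(ctc_logits, dec_logits, targets,
                           input_lens, target_lens, lam=0.2):
    """ctc_logits: (T, B, V) per-frame encoder outputs;
    dec_logits: (B, L, V) decoder outputs aligned with targets (B, L);
    label index 0 doubles as CTC blank and padding (an assumption here)."""
    ctc = F.ctc_loss(ctc_logits.log_softmax(-1), targets,
                     input_lens, target_lens, blank=0, zero_infinity=True)
    ce = F.cross_entropy(dec_logits.transpose(1, 2), targets, ignore_index=0)
    return lam * ctc + (1 - lam) * ce
```

The usual motivation for such a joint loss is that the CTC branch biases the attention decoder toward monotonic alignments, which stabilizes training.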
Comparing phonemes and visemes with DNN-based lipreading
Title | Comparing phonemes and visemes with DNN-based lipreading |
Authors | Kwanchiva Thangthai, Helen L Bear, Richard Harvey |
Abstract | There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system modeling visual speech with either 13 viseme or 38 phoneme units, and report its accuracy at both the word and unit levels. The evaluation task is large-vocabulary continuous speech on the TCD-TIMIT corpus. We model visual speech via hybrid DNN-HMMs, decode with a Weighted Finite-State Transducer (WFST), and use DCT and Eigenlips as representations of the mouth ROI image. The phoneme-based system outperforms the viseme-based system in word accuracy. However, the phoneme system achieves lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words. |
Tasks | Lipreading |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02924v1 |
PDF | http://arxiv.org/pdf/1805.02924v1.pdf |
PWC | https://paperswithcode.com/paper/comparing-phonemes-and-visemes-with-dnn-based |
Repo | |
Framework | |
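Once a phoneme-to-viseme mapping is fixed, collapsing a phoneme transcript to viseme units is a dictionary lookup. The sketch below is purely illustrative: the partial mapping covers only a few widely agreed visually-confusable groups (e.g., the bilabials /p b m/), not the full 38-phoneme/13-viseme mapping the paper uses.

```python
# Illustrative, partial phoneme-to-viseme mapping; only a few widely agreed
# groups are shown (the paper uses a full 38-phoneme / 13-viseme mapping).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "th": "V_dental", "dh": "V_dental",
    "w": "V_rounded", "r": "V_rounded",
}

def collapse_to_visemes(phonemes):
    """Collapse a phoneme sequence to visemes, merging adjacent repeats."""
    visemes = [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]
    out = []
    for v in visemes:
        if not out or out[-1] != v:
            out.append(v)
    return out

# e.g. collapse_to_visemes(["b", "m", "f"]) -> ["V_bilabial", "V_labiodental"]
```

The merging of adjacent repeats is exactly why viseme units are easier to recognize but harder to decode into words: distinct phoneme strings collapse to the same viseme string, so the dictionary must disambiguate.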
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Title | Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge |
Authors | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord |
Abstract | We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this task. We are also releasing the ARC Corpus, a corpus of 14M science sentences relevant to the task, and implementations of the three neural baseline models tested. Can your model perform better? We pose ARC as a challenge to the community. |
Tasks | Question Answering |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05457v1 |
PDF | http://arxiv.org/pdf/1803.05457v1.pdf |
PWC | https://paperswithcode.com/paper/think-you-have-solved-question-answering-try |
Repo | |
Framework | |
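The Challenge Set is defined by the failure of two simple solvers. The toy sketch below conveys the flavor of a word co-occurrence baseline: score each answer option by how often its words co-occur with the question's words in corpus sentences. It is a simplification for illustration only; the official baseline implementations are released alongside the ARC Corpus.

```python
def cooccurrence_score(question, option, corpus_sentences):
    """Fraction of corpus sentences containing words from both question and option."""
    q_words = set(question.lower().split())
    o_words = set(option.lower().split())
    hits = sum(
        1 for s in corpus_sentences
        if q_words & set(s.lower().split()) and o_words & set(s.lower().split())
    )
    return hits / max(len(corpus_sentences), 1)

def answer(question, options, corpus_sentences):
    """Pick the option whose words co-occur most often with the question's."""
    return max(options, key=lambda o: cooccurrence_score(question, o, corpus_sentences))
```

Challenge Set questions are, by construction, exactly those where this kind of surface statistic points to the wrong option.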
Deep Positron: A Deep Neural Network Using the Posit Number System
Title | Deep Positron: A Deep Neural Network Using the Posit Number System |
Authors | Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, Dhireesha Kudithipudi |
Abstract | The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on how low-precision operations can be realized for both DNN training and inference. In this work, we propose a DNN architecture, Deep Positron, with the posit numerical format operating successfully at $\leq$8 bits for inference. We propose a precision-adaptable FPGA soft core for exact multiply-and-accumulate, enabling a uniform comparison across three numerical formats: fixed-point, floating-point, and posit. Preliminary results demonstrate that 8-bit posits achieve better accuracy than 8-bit fixed- or floating-point numbers on three different low-dimensional datasets. Moreover, the accuracy is comparable to 32-bit floating-point on a Xilinx Virtex-7 FPGA device. The trade-offs between DNN performance and hardware resources, i.e., latency, power, and resource utilization, show that the posit format outperforms the others in accuracy and latency at 8 bits and below. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01762v2 |
PDF | http://arxiv.org/pdf/1812.01762v2.pdf |
PWC | https://paperswithcode.com/paper/deep-positron-a-deep-neural-network-using-the |
Repo | |
Framework | |
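As a concrete illustration of the posit format itself, here is a minimal Python decoder for an n-bit posit with es exponent bits, following the standard sign/regime/exponent/fraction layout. This is an editorial sketch of posit decoding, not the authors' FPGA soft core; Deep Positron evaluates several ⟨n, es⟩ configurations.

```python
def decode_posit(word, n=8, es=2):
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    word &= mask
    if word == 0:
        return 0.0
    if word == 1 << (n - 1):
        return float("nan")            # NaR ("not a real")
    sign = -1.0 if (word >> (n - 1)) & 1 else 1.0
    if sign < 0:
        word = (-word) & mask          # negate via two's complement
    bits = format(word, f"0{n}b")[1:]  # payload after the sign bit
    run = len(bits) - len(bits.lstrip(bits[0]))      # regime run length
    k = run - 1 if bits[0] == "1" else -run
    rest = bits[run + 1:]              # skip the regime terminator bit
    e = int(rest[:es].ljust(es, "0") or "0", 2)      # truncated bits read as 0
    frac = rest[es:]
    f = int(frac, 2) / (1 << len(frac)) if frac else 0.0
    return sign * 2.0 ** ((1 << es) * k + e) * (1.0 + f)
```

For posit⟨8, 2⟩ this gives `decode_posit(0x40) == 1.0` and `decode_posit(0x60) == 16.0`, and the tapered precision around 1.0 is what makes 8-bit posits competitive with wider floats for DNN inference.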
EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis
Title | EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis |
Authors | Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Daniela Moctezuma |
Abstract | Sentiment analysis (SA) is a task related to understanding people’s feelings in written text; the starting point would be to identify the polarity level (positive, neutral or negative) of a given text, moving on to identify emotions or whether a text is humorous or not. This task has been the subject of several research competitions in a number of languages, e.g., English, Spanish, and Arabic, among others. In this contribution, we propose an SA system, namely EvoMSA, that unifies our participating systems in various SA competitions, making it domain independent and multilingual by processing text using only language-independent techniques. EvoMSA is a classifier, based on Genetic Programming, that works by combining the output of different text classifiers and text models to produce the final prediction. We analyze EvoMSA on different SA competitions to provide a global overview of its performance; as the results show, EvoMSA is competitive, obtaining top rankings in several SA competitions. Furthermore, we performed an analysis of EvoMSA’s components to measure their contribution to the performance; the idea is to make it easier for a practitioner or newcomer to implement a competitive SA classifier. Finally, it is worth mentioning that EvoMSA is available as open-source software. |
Tasks | Sentiment Analysis |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1812.02307v4 |
PDF | https://arxiv.org/pdf/1812.02307v4.pdf |
PWC | https://paperswithcode.com/paper/evomsa-a-multilingual-evolutionary-approach |
Repo | |
Framework | |
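Architecturally, EvoMSA is a stacking scheme: several independent text models each emit decision values, and a final model combines them. The sketch below shows that stacking idea with scikit-learn parts standing in for EvoMSA's Genetic-Programming combiner (EvoDAG); the vectorizers and meta-classifier are illustrative assumptions, not the released open-source package.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def fit_stacked_sa(texts, labels):
    """Train two base text classifiers; a meta-classifier combines their outputs."""
    word_vec = TfidfVectorizer().fit(texts)
    char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(texts)
    base = [
        (word_vec, LinearSVC().fit(word_vec.transform(texts), labels)),
        (char_vec, LinearSVC().fit(char_vec.transform(texts), labels)),
    ]
    # Stack base decision values as features for the combiner.
    # (A production stack would use out-of-fold predictions to avoid leakage.)
    meta_X = np.hstack([
        clf.decision_function(v.transform(texts)).reshape(len(texts), -1)
        for v, clf in base
    ])
    meta = LogisticRegression(max_iter=1000).fit(meta_X, labels)
    return base, meta

def predict_stacked_sa(base, meta, texts):
    meta_X = np.hstack([
        clf.decision_function(v.transform(texts)).reshape(len(texts), -1)
        for v, clf in base
    ])
    return meta.predict(meta_X)
```

Character n-grams are one concrete example of the language-independent text processing the abstract refers to, since they need no tokenizer or lexicon for a particular language.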
Playing for Depth
Title | Playing for Depth |
Authors | Mohammad Mahdi Haji-Esmaeili, Gholamali Montazer |
Abstract | Estimating the relative depth of a scene is a significant step towards understanding the general structure of the depicted scenery, the relations of entities in the scene, and their interactions. When faced with the task of estimating depth without the use of stereo images, we are dependent on the availability of large-scale depth datasets and high-capacity models to capture the intrinsic nature of depth. Unfortunately, creating datasets of depth images is not a trivial task, as the camera's requirements mainly limit us to areas where we can provide what it needs to operate. In this work, we present a new depth dataset captured from video games in an easy and reproducible way. The nature of open-world video games gives us the ability to capture high-quality depth maps in the wild without the constraints of stereo cameras. Experiments on this dataset show that using such synthetic data increases the accuracy of Monocular Depth Estimation in the wild, where other approaches usually fail to generalize. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06268v1 |
PDF | http://arxiv.org/pdf/1810.06268v1.pdf |
PWC | https://paperswithcode.com/paper/playing-for-depth |
Repo | |
Framework | |
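The dataset feeds standard monocular depth training. As a purely editorial illustration of such training, a scale-invariant log-depth loss in the style of Eigen et al. is a common objective for this task; the paper does not state which loss it uses, so everything below is an assumption.

```python
import torch

def scale_invariant_log_loss(pred_depth, gt_depth, lam=0.5, eps=1e-6):
    """Scale-invariant log-depth loss (Eigen et al. style), masked to valid pixels."""
    valid = gt_depth > 0                 # synthetic depth may mark sky/holes as 0
    d = torch.log(pred_depth[valid] + eps) - torch.log(gt_depth[valid] + eps)
    return (d ** 2).mean() - lam * d.mean() ** 2
```

Working in log-depth with the subtracted mean term forgives a global scale error, which matters because monocular depth is only recoverable up to scale.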
Extracting Contact and Motion from Manipulation Videos
Title | Extracting Contact and Motion from Manipulation Videos |
Authors | Konstantinos Zampogiannis, Kanishka Ganguly, Cornelia Fermuller, Yiannis Aloimonos |
Abstract | When we physically interact with our environment using our hands, we touch objects and force them to move: contact and motion are defining properties of manipulation. In this paper, we present an active, bottom-up method for the detection of actor-object contacts and the extraction of moved objects and their motions in RGBD videos of manipulation actions. At the core of our approach lies non-rigid registration: we continuously warp a point cloud model of the observed scene to the current video frame, generating a set of dense 3D point trajectories. Under loose assumptions, we employ simple point cloud segmentation techniques to extract the actor and subsequently detect actor-environment contacts based on the estimated trajectories. For each such interaction, using the detected contact as an attention mechanism, we obtain an initial motion segment for the manipulated object by clustering trajectories in the contact area vicinity and then we jointly refine the object segment and estimate its 6DOF pose in all observed frames. Because of its generality and the fundamental, yet highly informative, nature of its outputs, our approach is applicable to a wide range of perception and planning tasks. We qualitatively evaluate our method on a number of input sequences and present a comprehensive robot imitation learning example, in which we demonstrate the crucial role of our outputs in developing action representations/plans from observation. |
Tasks | Imitation Learning |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04870v3 |
PDF | http://arxiv.org/pdf/1807.04870v3.pdf |
PWC | https://paperswithcode.com/paper/extracting-contact-and-motion-from |
Repo | |
Framework | |
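One step of the pipeline, clustering point trajectories near a detected contact to seed the object segment, fits in a compact sketch. The version below uses DBSCAN on net trajectory displacements as a stand-in; the actual method's non-rigid registration and joint segment/pose refinement are much richer, so treat this as an illustration of the contact-as-attention idea only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def seed_object_segment(trajectories, contact_point, radius=0.05, eps=0.01):
    """trajectories: (P, T, 3) dense 3D point trajectories; contact_point: (3,).

    Returns indices of points whose motion clusters together near the contact.
    """
    start = trajectories[:, 0, :]
    near = np.linalg.norm(start - contact_point, axis=1) < radius
    motion = trajectories[near, -1, :] - trajectories[near, 0, :]  # net displacement
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(motion)
    valid = labels[labels >= 0]          # drop DBSCAN noise (label -1)
    if valid.size == 0:
        return np.array([], dtype=int)
    best = np.bincount(valid).argmax()   # keep the largest motion cluster
    return np.flatnonzero(near)[labels == best]
```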
Deep Learning for Causal Inference
Title | Deep Learning for Causal Inference |
Authors | Vikas Ramachandra |
Abstract | In this paper, we propose deep learning techniques for econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space. This deep-learning-based technique is shown to perform better than simple k-nearest-neighbor matching for estimating treatment effects, especially when the data points have several features/covariates but reside in a low-dimensional manifold in high-dimensional space. We also observe better performance than manifold learning methods for neighbor matching. 2. Propensity score matching is one specific and popular way to perform matching in order to estimate average and individual treatment effects. We propose the use of deep neural networks (DNNs) for propensity score matching, and present a network called PropensityNet for this. This is a generalization of the logistic regression technique traditionally used to estimate propensity scores, and we show empirically that DNNs perform better than logistic regression at propensity score matching. Code for both methods will be made available shortly on Github at: https://github.com/vikas84bf |
Tasks | Causal Inference, Dimensionality Reduction |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00149v1 |
PDF | http://arxiv.org/pdf/1803.00149v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-causal-inference |
Repo | |
Framework | |
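The second contribution, replacing logistic regression with a neural network for propensity estimation and then matching on the estimated scores, fits in a short sketch. The network size and the 1-nearest-neighbor matching rule below are illustrative assumptions, not the paper's PropensityNet specification.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def att_dnn_propensity_matching(X, treated, y):
    """Estimate the average treatment effect on the treated (ATT).

    X: (n, d) covariates; treated: (n,) boolean mask; y: (n,) outcomes.
    """
    # A small MLP in place of logistic regression for propensity estimation.
    net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    net.fit(X, treated)
    p = net.predict_proba(X)[:, 1]                 # estimated propensity scores
    p_t, y_t = p[treated], y[treated]
    p_c, y_c = p[~treated], y[~treated]
    match = np.abs(p_t[:, None] - p_c[None, :]).argmin(axis=1)  # 1-NN on score
    return float(np.mean(y_t - y_c[match]))
```

Matching with a caliper on the score distance is a common refinement; the sketch keeps plain 1-NN matching for brevity.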
Analyzing DNA Hybridization via machine learning
Title | Analyzing DNA Hybridization via machine learning |
Authors | Weijun Zhu |
Abstract | In DNA computing, it is impossible to decide within an acceptable time whether a specific hybridization among complex DNA molecules is effective. To address this common problem, we introduce a new method based on machine learning. First, a sample set is employed to train the Boosted Tree (BT) algorithm, and the corresponding model is obtained. Second, this model is used to predict the classification results of molecular hybridizations. The experiments show that the average accuracy of the new method is over 94.2%, and its average efficiency is over 90839 times higher than that of the existing method. These results indicate that the new method can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.11062v2 |
PDF | http://arxiv.org/pdf/1803.11062v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-dna-hybridization-via-machine |
Repo | |
Framework | |
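The method itself is a standard supervised pipeline: train a boosted-tree classifier on labeled hybridization examples, then predict on new designs. A hedged scikit-learn sketch follows; the feature encoding of strand pairs and the hyperparameters are assumptions, and the paper does not tie itself to this library.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def train_hybridization_classifier(features, effective):
    """features: (n, d) numeric encodings of DNA strand pairs (an assumption);
    effective: (n,) binary labels (1 = effective hybridization)."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, effective, test_size=0.2)
    bt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    bt.fit(X_tr, y_tr)
    print(f"held-out accuracy: {bt.score(X_te, y_te):.3f}")
    return bt
```

The reported speedup comes from replacing an expensive per-hybridization analysis with a single forward pass through the trained model.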
Snapshot Distillation: Teacher-Student Optimization in One Generation
Title | Snapshot Distillation: Teacher-Student Optimization in One Generation |
Authors | Chenglin Yang, Lingxi Xie, Chi Su, Alan L. Yuille |
Abstract | Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims at providing complementary cues from a model trained previously, but these approaches are often considerably slow due to the pipeline of training a few generations in sequence, i.e., time complexity is increased by several times. This paper presents snapshot distillation (SD), the first framework which enables teacher-student optimization in one generation. The idea of SD is very simple: instead of borrowing supervision signals from previous generations, we extract such information from earlier epochs in the same generation, while making sure that the difference between teacher and student is sufficiently large to prevent under-fitting. To achieve this goal, we implement SD in a cyclic learning rate policy, in which the last snapshot of each cycle is used as the teacher for all iterations in the next cycle, and the teacher signal is smoothed to provide richer information. On standard image classification benchmarks such as CIFAR100 and ILSVRC2012, SD achieves consistent accuracy gains without heavy computational overheads. We also verify that models pre-trained with SD transfer well to object detection and semantic segmentation on the PascalVOC dataset. |
Tasks | Image Classification, Object Detection, Semantic Segmentation |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00123v1 |
PDF | http://arxiv.org/pdf/1812.00123v1.pdf |
PWC | https://paperswithcode.com/paper/snapshot-distillation-teacher-student |
Repo | |
Framework | |
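The cyclic-teacher idea condenses into a short training loop. Below is a hedged PyTorch sketch of snapshot distillation, assuming a cosine-cyclic learning rate and a temperature-softened KL distillation term; the cycle count, temperature `T`, and loss weight `alpha` are illustrative, not the paper's settings.

```python
import copy
import math
import torch
import torch.nn.functional as F

def train_sd(model, loader, epochs=120, cycles=4, base_lr=0.1, T=4.0, alpha=0.5):
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    epochs_per_cycle = epochs // cycles
    teacher = None                      # no teacher during the first cycle
    for epoch in range(epochs):
        # Cosine-annealed LR, restarted at the start of each cycle.
        t = (epoch % epochs_per_cycle) / epochs_per_cycle
        for g in opt.param_groups:
            g["lr"] = 0.5 * base_lr * (1 + math.cos(math.pi * t))
        for x, y in loader:
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            if teacher is not None:
                with torch.no_grad():
                    t_logits = teacher(x)
                # Temperature-smoothed teacher distribution (KL distillation).
                loss = (1 - alpha) * loss + alpha * T * T * F.kl_div(
                    F.log_softmax(logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean",
                )
            opt.zero_grad()
            loss.backward()
            opt.step()
        # The last snapshot of each cycle becomes the teacher for the next one.
        if (epoch + 1) % epochs_per_cycle == 0:
            teacher = copy.deepcopy(model).eval()
    return model
```

The learning rate restart at each cycle boundary is what keeps the student sufficiently different from its own snapshot teacher, per the abstract's under-fitting argument.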
On the Inter-relationships among Drift rate, Forgetting rate, Bias/variance profile and Error
Title | On the Inter-relationships among Drift rate, Forgetting rate, Bias/variance profile and Error |
Authors | Nayyar A. Zaidi, Geoffrey I. Webb, Francois Petitjean, Germain Forestier |
Abstract | We propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner’s forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems. |
Tasks | |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09354v2 |
PDF | http://arxiv.org/pdf/1801.09354v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-inter-relationships-among-drift-rate |
Repo | |
Framework | |
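The notion of a tunable "forgetting rate" can be made concrete with an exponentially decayed incremental estimator, where each new observation down-weights all previous ones by a fixed decay factor. This sketch illustrates the concept only; it is not one of the paper's learners.

```python
class ForgettingMean:
    """Incremental mean with exponential forgetting.

    forgetting_rate = 0 reduces to the ordinary running mean (never forget);
    values near 1 track only the most recent observations (forget quickly).
    """

    def __init__(self, forgetting_rate=0.01):
        self.decay = 1.0 - forgetting_rate
        self.weighted_sum = 0.0
        self.weight = 0.0

    def update(self, x):
        self.weighted_sum = self.decay * self.weighted_sum + x
        self.weight = self.decay * self.weight + 1.0
        return self.weighted_sum / self.weight
```

Read against the paper's first hypothesis: as the drift rate of the data stream increases, the `forgetting_rate` that minimizes generalization error should increase as well.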
CSI-based Outdoor Localization for Massive MIMO: Experiments with a Learning Approach
Title | CSI-based Outdoor Localization for Massive MIMO: Experiments with a Learning Approach |
Authors | Alexis Decurninge, Luis García Ordóñez, Paul Ferrand, He Gaoning, Li Bojie, Zhang Wei, Maxime Guillaud |
Abstract | We report on experimental results on the use of a learning-based approach to infer the location of a mobile user of a cellular network within a cell, for a 5G-type massive multiple-input, multiple-output (MIMO) system. We describe how the sample spatial covariance matrix computed from the CSI can be used as the input to a learning algorithm which attempts to relate it to user location. We discuss several learning approaches, and analyze in depth the application of extreme learning machines, for which theoretical approximate performance benchmarks are available, to the localization problem. We validate the proposed approach using experimental data collected on a Huawei 5G testbed, provide some performance and robustness benchmarks, and discuss practical issues related to the deployment of such a technique in 5G networks. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07447v1 |
PDF | http://arxiv.org/pdf/1806.07447v1.pdf |
PWC | https://paperswithcode.com/paper/csi-based-outdoor-localization-for-massive |
Repo | |
Framework | |
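Both ingredients the abstract names, the sample spatial covariance features and an extreme learning machine, are short to sketch. In the NumPy sketch below, the feature extraction, hidden-layer size, and tanh activation are editorial assumptions, not the paper's exact setup.

```python
import numpy as np

def covariance_features(H_csi):
    """H_csi: (T, n_ant) complex CSI snapshots -> real feature vector."""
    R = H_csi.conj().T @ H_csi / H_csi.shape[0]   # sample spatial covariance
    iu = np.triu_indices(R.shape[0])              # R is Hermitian: keep upper part
    return np.concatenate([R[iu].real, R[iu].imag])

class ELMRegressor:
    """Extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y, reg=1e-3):
        d = X.shape[1]
        # Random, untrained hidden layer: only the output weights are learned.
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Ridge-regularized least squares for the output layer.
        self.beta = np.linalg.solve(
            H.T @ H + reg * np.eye(self.n_hidden), H.T @ Y
        )
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

Here `X` stacks `covariance_features` vectors and `Y` holds known user coordinates; the closed-form output solve is what makes ELMs attractive for the theoretical performance benchmarks the abstract mentions.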
Interactive Naming for Explaining Deep Neural Networks: A Formative Study
Title | Interactive Naming for Explaining Deep Neural Networks: A Formative Study |
Authors | Mandana Hamidi-Haines, Zhongang Qi, Alan Fern, Fuxin Li, Prasad Tadepalli |
Abstract | We consider the problem of explaining the decisions of deep neural networks for image recognition in terms of human-recognizable visual concepts. In particular, given a test set of images, we aim to explain each classification in terms of a small number of image regions, or activation maps, which have been associated with semantic concepts by a human annotator. This allows for generating summary views of the typical reasons for classifications, which can help build trust in a classifier and/or identify example types for which the classifier may not be trusted. For this purpose, we developed a user interface for “interactive naming,” which allows a human annotator to manually cluster significant activation maps in a test set into meaningful groups called “visual concepts”. The main contribution of this paper is a systematic study of the visual concepts produced by five human annotators using the interactive naming interface. In particular, we consider the adequacy of the concepts for explaining the classification of test-set images, correspondence of the concepts to activations of individual neurons, and the inter-annotator agreement of visual concepts. We find that a large fraction of the activation maps have recognizable visual concepts, and that there is significant agreement between the different annotators about their denotations. Our work is an exploratory study of the interplay between machine learning and human recognition mediated by visualizations of the results of learning. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07150v2 |
PDF | http://arxiv.org/pdf/1812.07150v2.pdf |
PWC | https://paperswithcode.com/paper/interactive-naming-for-explaining-deep-neural |
Repo | |
Framework | |
Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback
Title | Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback |
Authors | Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu |
Abstract | In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results. We study active image search, where human feedback is solicited exclusively in visual form, without the relative attribute annotations used by prior work, which are not typically found in many datasets. In order to optimize the image selection strategy, a deep reinforcement model is trained to learn which images are informative, rather than relying on the hand-crafted measures typically leveraged in prior work. Additionally, we extend the recently introduced Conditional Similarity Network to incorporate global similarity when training visual embeddings, which results in more natural transitions as the user explores the learned similarity embeddings. Our experiments demonstrate the effectiveness of our approach, producing compelling results on both active image search and image attribute representation tasks. |
Tasks | Image Retrieval |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08714v1 |
PDF | http://arxiv.org/pdf/1809.08714v1.pdf |
PWC | https://paperswithcode.com/paper/give-me-a-hint-navigating-image-databases |
Repo | |
Framework | |
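The extension the abstract describes, adding a global-similarity term alongside the Conditional Similarity Network's masked triplet loss, can be sketched in a few lines of PyTorch. The mask handling and the weighting `w` below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def csn_global_triplet_loss(anchor, pos, neg, mask, margin=0.2, w=0.5):
    """anchor/pos/neg: (B, D) embeddings; mask: (B, D) per-condition weights."""
    def trip(a, p, n):
        return F.relu(
            (a - p).pow(2).sum(1) - (a - n).pow(2).sum(1) + margin
        ).mean()

    conditional = trip(anchor * mask, pos * mask, neg * mask)  # CSN-style term
    global_term = trip(anchor, pos, neg)                       # unmasked embedding
    return conditional + w * global_term
```

The unmasked term pulls the full embedding space toward overall visual similarity, which is one way to obtain the smoother transitions the abstract reports during exploration.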
Visual Object Categorization Based on Hierarchical Shape Motifs Learned From Noisy Point Cloud Decompositions
Title | Visual Object Categorization Based on Hierarchical Shape Motifs Learned From Noisy Point Cloud Decompositions |
Authors | Christian A. Mueller, Andreas Birk |
Abstract | Object shape is a key cue that contributes to the semantic understanding of objects. In this work we focus on the categorization of real-world object point clouds into particular shape types. Surface description and the representation of object shape structure have a significant influence on shape categorization accuracy when dealing with real-world scenes featuring noisy, partial and occluded object observations. An unsupervised hierarchical learning procedure is utilized here to symbolically describe surface characteristics on multiple semantic levels. Furthermore, a constellation model is proposed that hierarchically decomposes objects. The decompositions are described as constellations of symbols (shape motifs) in a gradual order, reflecting shape structure from local to global, i.e., from parts over groups of parts to entire objects. The combination of this multi-level description of surfaces and the hierarchical decomposition of shapes leads to a representation which makes it possible to conceptualize shapes. Object discrimination has been observed in experiments with seven categories featuring instances with sensor noise, occlusions, and inter-category as well as intra-category similarities. The experiments include an evaluation of the proposed description and shape decomposition approach, and comparisons to Fast Point Feature Histograms, a Vocabulary Tree, and a neural-network-based Deep Learning method. Furthermore, experiments on alternative datasets analyze the generalization capability of the proposed approach. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.01117v1 |
PDF | http://arxiv.org/pdf/1804.01117v1.pdf |
PWC | https://paperswithcode.com/paper/visual-object-categorization-based-on |
Repo | |
Framework | |