Paper Group ANR 79
Can DNNs Learn to Lipread Full Sentences?
Title | Can DNNs Learn to Lipread Full Sentences? |
Authors | George Sterpu, Christian Saam, Naomi Harte |
Abstract | Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement over a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread. |
Tasks | Language Modelling, Lipreading |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11685v1 |
PDF | http://arxiv.org/pdf/1805.11685v1.pdf |
PWC | https://paperswithcode.com/paper/can-dnns-learn-to-lipread-full-sentences |
Repo | |
Framework | |
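The joint CTC-sequence-to-sequence loss named in the abstract reduces to interpolating two standard criteria. Below is a minimal PyTorch sketch; the weighting `lam`, the tensor shapes, and the reuse of label index 0 as both the CTC blank and the padding index are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def joint_ctc_seq2seq_loss(ctc_logits, dec_logits, targets,
                           input_lens, target_lens, lam=0.2):
    """ctc_logits: (T, B, V) per-frame encoder outputs;
    dec_logits: (B, L, V) decoder outputs aligned with targets (B, L);
    label index 0 doubles as CTC blank and padding (an assumption here)."""
    ctc = F.ctc_loss(ctc_logits.log_softmax(-1), targets,
                     input_lens, target_lens, blank=0, zero_infinity=True)
    ce = F.cross_entropy(dec_logits.transpose(1, 2), targets, ignore_index=0)
    return lam * ctc + (1 - lam) * ce
```

The usual motivation for such a joint loss is that the CTC branch biases the attention decoder toward monotonic alignments, which stabilizes training.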
Comparing phonemes and visemes with DNN-based lipreading
Title | Comparing phonemes and visemes with DNN-based lipreading |
Authors | Kwanchiva Thangthai, Helen L Bear, Richard Harvey |
Abstract | There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system modeling visual speech with either 13 viseme or 38 phoneme units, and report its accuracy at both the word and unit levels. The evaluation task is large-vocabulary continuous speech on the TCD-TIMIT corpus. We model visual speech via hybrid DNN-HMMs, decode with a Weighted Finite-State Transducer (WFST), and use DCT and Eigenlips as representations of the mouth ROI image. The phoneme-based system outperforms the viseme-based system in word accuracy. However, the phoneme system achieves lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words. |
Tasks | Lipreading |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02924v1 |
PDF | http://arxiv.org/pdf/1805.02924v1.pdf |
PWC | https://paperswithcode.com/paper/comparing-phonemes-and-visemes-with-dnn-based |
Repo | |
Framework | |
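Once a phoneme-to-viseme mapping is fixed, collapsing a phoneme transcript to viseme units is a dictionary lookup. The sketch below is purely illustrative: the partial mapping covers only a few widely agreed visually-confusable groups (e.g., the bilabials /p b m/), not the full 38-phoneme/13-viseme mapping the paper uses.

```python
# Illustrative, partial phoneme-to-viseme mapping; only a few widely agreed
# groups are shown (the paper uses a full 38-phoneme / 13-viseme mapping).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "th": "V_dental", "dh": "V_dental",
    "w": "V_rounded", "r": "V_rounded",
}

def collapse_to_visemes(phonemes):
    """Collapse a phoneme sequence to visemes, merging adjacent repeats."""
    visemes = [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]
    out = []
    for v in visemes:
        if not out or out[-1] != v:
            out.append(v)
    return out

# e.g. collapse_to_visemes(["b", "m", "f"]) -> ["V_bilabial", "V_labiodental"]
```

The merging of adjacent repeats is exactly why viseme units are easier to recognize but harder to decode into words: distinct phoneme strings collapse to the same viseme string, so the dictionary must disambiguate.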
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Title | Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge |
Authors | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord |
Abstract | We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this task. We are also releasing the ARC Corpus, a corpus of 14M science sentences relevant to the task, and implementations of the three neural baseline models tested. Can your model perform better? We pose ARC as a challenge to the community. |
Tasks | Question Answering |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05457v1 |
PDF | http://arxiv.org/pdf/1803.05457v1.pdf |
PWC | https://paperswithcode.com/paper/think-you-have-solved-question-answering-try |
Repo | |
Framework | |
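The Challenge Set is defined by the failure of two simple solvers. The toy sketch below conveys the flavor of a word co-occurrence baseline: score each answer option by how often its words co-occur with the question's words in corpus sentences. It is a simplification for illustration only; the official baseline implementations are released alongside the ARC Corpus.

```python
def cooccurrence_score(question, option, corpus_sentences):
    """Fraction of corpus sentences containing words from both question and option."""
    q_words = set(question.lower().split())
    o_words = set(option.lower().split())
    hits = sum(
        1 for s in corpus_sentences
        if q_words & set(s.lower().split()) and o_words & set(s.lower().split())
    )
    return hits / max(len(corpus_sentences), 1)

def answer(question, options, corpus_sentences):
    """Pick the option whose words co-occur most often with the question's."""
    return max(options, key=lambda o: cooccurrence_score(question, o, corpus_sentences))
```

Challenge Set questions are, by construction, exactly those where this kind of surface statistic points to the wrong option.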
Deep Positron: A Deep Neural Network Using the Posit Number System
Title | Deep Positron: A Deep Neural Network Using the Posit Number System |
Authors | Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, Dhireesha Kudithipudi |
Abstract | The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on how low-precision operations can be realized for both DNN training and inference. In this work, we propose a DNN architecture, Deep Positron, with the posit numerical format operating successfully at $\leq$8 bits for inference. We propose a precision-adaptable FPGA soft core for exact multiply-and-accumulate, enabling a uniform comparison across three numerical formats: fixed-point, floating-point, and posit. Preliminary results demonstrate that 8-bit posits achieve better accuracy than 8-bit fixed- or floating-point numbers on three different low-dimensional datasets. Moreover, the accuracy is comparable to 32-bit floating-point on a Xilinx Virtex-7 FPGA device. The trade-offs between DNN performance and hardware resources, i.e., latency, power, and resource utilization, show that the posit format outperforms the others in accuracy and latency at 8 bits and below. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01762v2 |
PDF | http://arxiv.org/pdf/1812.01762v2.pdf |
PWC | https://paperswithcode.com/paper/deep-positron-a-deep-neural-network-using-the |
Repo | |
Framework | |
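As a concrete illustration of the posit format itself, here is a minimal Python decoder for an n-bit posit with es exponent bits, following the standard sign/regime/exponent/fraction layout. This is an editorial sketch of posit decoding, not the authors' FPGA soft core; Deep Positron evaluates several ⟨n, es⟩ configurations.

```python
def decode_posit(word, n=8, es=2):
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    word &= mask
    if word == 0:
        return 0.0
    if word == 1 << (n - 1):
        return float("nan")            # NaR ("not a real")
    sign = -1.0 if (word >> (n - 1)) & 1 else 1.0
    if sign < 0:
        word = (-word) & mask          # negate via two's complement
    bits = format(word, f"0{n}b")[1:]  # payload after the sign bit
    run = len(bits) - len(bits.lstrip(bits[0]))      # regime run length
    k = run - 1 if bits[0] == "1" else -run
    rest = bits[run + 1:]              # skip the regime terminator bit
    e = int(rest[:es].ljust(es, "0") or "0", 2)      # truncated bits read as 0
    frac = rest[es:]
    f = int(frac, 2) / (1 << len(frac)) if frac else 0.0
    return sign * 2.0 ** ((1 << es) * k + e) * (1.0 + f)
```

For posit⟨8, 2⟩ this gives `decode_posit(0x40) == 1.0` and `decode_posit(0x60) == 16.0`, and the tapered precision around 1.0 is what makes 8-bit posits competitive with wider floats for DNN inference.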
EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis
Title | EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis |
Authors | Mario Graff, Sabino Miranda-Jiménez, Eric S. Tellez, Daniela Moctezuma |
Abstract | Sentiment analysis (SA) is a task related to understanding people’s feelings in written text; the starting point would be to identify the polarity level (positive, neutral or negative) of a given text, moving on to identify emotions or whether a text is humorous or not. This task has been the subject of several research competitions in a number of languages, e.g., English, Spanish, and Arabic, among others. In this contribution, we propose an SA system, namely EvoMSA, that unifies our participating systems in various SA competitions, making it domain independent and multilingual by processing text using only language-independent techniques. EvoMSA is a classifier, based on Genetic Programming, that works by combining the output of different text classifiers and text models to produce the final prediction. We analyze EvoMSA on different SA competitions to provide a global overview of its performance; as the results show, EvoMSA is competitive, obtaining top rankings in several SA competitions. Furthermore, we performed an analysis of EvoMSA’s components to measure their contribution to the performance; the idea is to make it easier for a practitioner or newcomer to implement a competitive SA classifier. Finally, it is worth mentioning that EvoMSA is available as open-source software. |
Tasks | Sentiment Analysis |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1812.02307v4 |
PDF | https://arxiv.org/pdf/1812.02307v4.pdf |
PWC | https://paperswithcode.com/paper/evomsa-a-multilingual-evolutionary-approach |
Repo | |
Framework | |
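Architecturally, EvoMSA is a stacking scheme: several independent text models each emit decision values, and a final model combines them. The sketch below shows that stacking idea with scikit-learn parts standing in for EvoMSA's Genetic-Programming combiner (EvoDAG); the vectorizers and meta-classifier are illustrative assumptions, not the released open-source package.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def fit_stacked_sa(texts, labels):
    """Train two base text classifiers; a meta-classifier combines their outputs."""
    word_vec = TfidfVectorizer().fit(texts)
    char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(texts)
    base = [
        (word_vec, LinearSVC().fit(word_vec.transform(texts), labels)),
        (char_vec, LinearSVC().fit(char_vec.transform(texts), labels)),
    ]
    # Stack base decision values as features for the combiner.
    # (A production stack would use out-of-fold predictions to avoid leakage.)
    meta_X = np.hstack([
        clf.decision_function(v.transform(texts)).reshape(len(texts), -1)
        for v, clf in base
    ])
    meta = LogisticRegression(max_iter=1000).fit(meta_X, labels)
    return base, meta

def predict_stacked_sa(base, meta, texts):
    meta_X = np.hstack([
        clf.decision_function(v.transform(texts)).reshape(len(texts), -1)
        for v, clf in base
    ])
    return meta.predict(meta_X)
```

Character n-grams are one concrete example of the language-independent text processing the abstract refers to, since they need no tokenizer or lexicon for a particular language.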
Playing for Depth
Title | Playing for Depth |
Authors | Mohammad Mahdi Haji-Esmaeili, Gholamali Montazer |
Abstract | Estimating the relative depth of a scene is a significant step towards understanding the general structure of the depicted scenery, the relations of entities in the scene, and their interactions. When faced with the task of estimating depth without the use of stereo images, we are dependent on the availability of large-scale depth datasets and high-capacity models to capture the intrinsic nature of depth. Unfortunately, creating datasets of depth images is not a trivial task, as the camera's requirements mainly limit us to areas where we can provide what it needs to operate. In this work, we present a new depth dataset captured from video games in an easy and reproducible way. The nature of open-world video games gives us the ability to capture high-quality depth maps in the wild without the constraints of stereo cameras. Experiments on this dataset show that using such synthetic data increases the accuracy of Monocular Depth Estimation in the wild, where other approaches usually fail to generalize. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06268v1 |
PDF | http://arxiv.org/pdf/1810.06268v1.pdf |
PWC | https://paperswithcode.com/paper/playing-for-depth |
Repo | |
Framework | |
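The dataset feeds standard monocular depth training. As a purely editorial illustration of such training, a scale-invariant log-depth loss in the style of Eigen et al. is a common objective for this task; the paper does not state which loss it uses, so everything below is an assumption.

```python
import torch

def scale_invariant_log_loss(pred_depth, gt_depth, lam=0.5, eps=1e-6):
    """Scale-invariant log-depth loss (Eigen et al. style), masked to valid pixels."""
    valid = gt_depth > 0                 # synthetic depth may mark sky/holes as 0
    d = torch.log(pred_depth[valid] + eps) - torch.log(gt_depth[valid] + eps)
    return (d ** 2).mean() - lam * d.mean() ** 2
```

Working in log-depth with the subtracted mean term forgives a global scale error, which matters because monocular depth is only recoverable up to scale.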
Extracting Contact and Motion from Manipulation Videos
Title | Extracting Contact and Motion from Manipulation Videos |
Authors | Konstantinos Zampogiannis, Kanishka Ganguly, Cornelia Fermuller, Yiannis Aloimonos |
Abstract | When we physically interact with our environment using our hands, we touch objects and force them to move: contact and motion are defining properties of manipulation. In this paper, we present an active, bottom-up method for the detection of actor-object contacts and the extraction of moved objects and their motions in RGBD videos of manipulation actions. At the core of our approach lies non-rigid registration: we continuously warp a point cloud model of the observed scene to the current video frame, generating a set of dense 3D point trajectories. Under loose assumptions, we employ simple point cloud segmentation techniques to extract the actor and subsequently detect actor-environment contacts based on the estimated trajectories. For each such interaction, using the detected contact as an attention mechanism, we obtain an initial motion segment for the manipulated object by clustering trajectories in the contact area vicinity and then we jointly refine the object segment and estimate its 6DOF pose in all observed frames. Because of its generality and the fundamental, yet highly informative, nature of its outputs, our approach is applicable to a wide range of perception and planning tasks. We qualitatively evaluate our method on a number of input sequences and present a comprehensive robot imitation learning example, in which we demonstrate the crucial role of our outputs in developing action representations/plans from observation. |
Tasks | Imitation Learning |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.04870v3 |
PDF | http://arxiv.org/pdf/1807.04870v3.pdf |
PWC | https://paperswithcode.com/paper/extracting-contact-and-motion-from |
Repo | |
Framework | |
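One step of the pipeline, clustering point trajectories near a detected contact to seed the object segment, fits in a compact sketch. The version below uses DBSCAN on net trajectory displacements as a stand-in; the actual method's non-rigid registration and joint segment/pose refinement are much richer, so treat this as an illustration of the contact-as-attention idea only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def seed_object_segment(trajectories, contact_point, radius=0.05, eps=0.01):
    """trajectories: (P, T, 3) dense 3D point trajectories; contact_point: (3,).

    Returns indices of points whose motion clusters together near the contact.
    """
    start = trajectories[:, 0, :]
    near = np.linalg.norm(start - contact_point, axis=1) < radius
    motion = trajectories[near, -1, :] - trajectories[near, 0, :]  # net displacement
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(motion)
    valid = labels[labels >= 0]          # drop DBSCAN noise (label -1)
    if valid.size == 0:
        return np.array([], dtype=int)
    best = np.bincount(valid).argmax()   # keep the largest motion cluster
    return np.flatnonzero(near)[labels == best]
```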
Deep Learning for Causal Inference
Title | Deep Learning for Causal Inference |
Authors | Vikas Ramachandra |
Abstract | In this paper, we propose deep learning techniques for econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space. This deep-learning-based technique is shown to perform better than simple k-nearest-neighbor matching for estimating treatment effects, especially when the data points have several features/covariates but reside in a low-dimensional manifold in high-dimensional space. We also observe better performance than manifold learning methods for neighbor matching. 2. Propensity score matching is one specific and popular way to perform matching in order to estimate average and individual treatment effects. We propose the use of deep neural networks (DNNs) for propensity score matching, and present a network called PropensityNet for this. This is a generalization of the logistic regression technique traditionally used to estimate propensity scores, and we show empirically that DNNs perform better than logistic regression at propensity score matching. Code for both methods will be made available shortly on Github at: https://github.com/vikas84bf |
Tasks | Causal Inference, Dimensionality Reduction |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00149v1 |
PDF | http://arxiv.org/pdf/1803.00149v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-causal-inference |
Repo | |
Framework | |
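The second contribution, replacing logistic regression with a neural network for propensity estimation and then matching on the estimated scores, fits in a short sketch. The network size and the 1-nearest-neighbor matching rule below are illustrative assumptions, not the paper's PropensityNet specification.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def att_dnn_propensity_matching(X, treated, y):
    """Estimate the average treatment effect on the treated (ATT).

    X: (n, d) covariates; treated: (n,) boolean mask; y: (n,) outcomes.
    """
    # A small MLP in place of logistic regression for propensity estimation.
    net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    net.fit(X, treated)
    p = net.predict_proba(X)[:, 1]                 # estimated propensity scores
    p_t, y_t = p[treated], y[treated]
    p_c, y_c = p[~treated], y[~treated]
    match = np.abs(p_t[:, None] - p_c[None, :]).argmin(axis=1)  # 1-NN on score
    return float(np.mean(y_t - y_c[match]))
```

Matching with a caliper on the score distance is a common refinement; the sketch keeps plain 1-NN matching for brevity.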
Analyzing DNA Hybridization via machine learning
Title | Analyzing DNA Hybridization via machine learning |
Authors | Weijun Zhu |
Abstract | In DNA computing, it is impossible to decide within an acceptable time whether a specific hybridization among complex DNA molecules is effective. To address this common problem, we introduce a new method based on machine learning. First, a sample set is employed to train the Boosted Tree (BT) algorithm, and the corresponding model is obtained. Second, this model is used to predict the classification results of molecular hybridizations. The experiments show that the average accuracy of the new method is over 94.2%, and its average efficiency is over 90839 times higher than that of the existing method. These results indicate that the new method can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.11062v2 |
PDF | http://arxiv.org/pdf/1803.11062v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-dna-hybridization-via-machine |
Repo | |
Framework | |
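The method itself is a standard supervised pipeline: train a boosted-tree classifier on labeled hybridization examples, then predict on new designs. A hedged scikit-learn sketch follows; the feature encoding of strand pairs and the hyperparameters are assumptions, and the paper does not tie itself to this library.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def train_hybridization_classifier(features, effective):
    """features: (n, d) numeric encodings of DNA strand pairs (an assumption);
    effective: (n,) binary labels (1 = effective hybridization)."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, effective, test_size=0.2)
    bt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    bt.fit(X_tr, y_tr)
    print(f"held-out accuracy: {bt.score(X_te, y_te):.3f}")
    return bt
```

The reported speedup comes from replacing an expensive per-hybridization analysis with a single forward pass through the trained model.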
Snapshot Distillation: Teacher-Student Optimization in One Generation
Title | Snapshot Distillation: Teacher-Student Optimization in One Generation |
Authors | Chenglin Yang, Lingxi Xie, Chi Su, Alan L. Yuille |
Abstract | Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims at providing complementary cues from a model trained previously, but these approaches are often considerably slow due to the pipeline of training a few generations in sequence, i.e., time complexity is increased by several times. This paper presents snapshot distillation (SD), the first framework which enables teacher-student optimization in one generation. The idea of SD is very simple: instead of borrowing supervision signals from previous generations, we extract such information from earlier epochs in the same generation, while making sure that the difference between teacher and student is sufficiently large to prevent under-fitting. To achieve this goal, we implement SD in a cyclic learning rate policy, in which the last snapshot of each cycle is used as the teacher for all iterations in the next cycle, and the teacher signal is smoothed to provide richer information. On standard image classification benchmarks such as CIFAR100 and ILSVRC2012, SD achieves consistent accuracy gains without heavy computational overheads. We also verify that models pre-trained with SD transfer well to object detection and semantic segmentation on the PascalVOC dataset. |
Tasks | Image Classification, Object Detection, Semantic Segmentation |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00123v1 |
PDF | http://arxiv.org/pdf/1812.00123v1.pdf |
PWC | https://paperswithcode.com/paper/snapshot-distillation-teacher-student |
Repo | |
Framework | |
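The cyclic-teacher idea condenses into a short training loop. Below is a hedged PyTorch sketch of snapshot distillation, assuming a cosine-cyclic learning rate and a temperature-softened KL distillation term; the cycle count, temperature `T`, and loss weight `alpha` are illustrative, not the paper's settings.

```python
import copy
import math
import torch
import torch.nn.functional as F

def train_sd(model, loader, epochs=120, cycles=4, base_lr=0.1, T=4.0, alpha=0.5):
    opt = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    epochs_per_cycle = epochs // cycles
    teacher = None                      # no teacher during the first cycle
    for epoch in range(epochs):
        # Cosine-annealed LR, restarted at the start of each cycle.
        t = (epoch % epochs_per_cycle) / epochs_per_cycle
        for g in opt.param_groups:
            g["lr"] = 0.5 * base_lr * (1 + math.cos(math.pi * t))
        for x, y in loader:
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            if teacher is not None:
                with torch.no_grad():
                    t_logits = teacher(x)
                # Temperature-smoothed teacher distribution (KL distillation).
                loss = (1 - alpha) * loss + alpha * T * T * F.kl_div(
                    F.log_softmax(logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean",
                )
            opt.zero_grad()
            loss.backward()
            opt.step()
        # The last snapshot of each cycle becomes the teacher for the next one.
        if (epoch + 1) % epochs_per_cycle == 0:
            teacher = copy.deepcopy(model).eval()
    return model
```

The learning rate restart at each cycle boundary is what keeps the student sufficiently different from its own snapshot teacher, per the abstract's under-fitting argument.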
On the Inter-relationships among Drift rate, Forgetting rate, Bias/variance profile and Error
Title | On the Inter-relationships among Drift rate, Forgetting rate, Bias/variance profile and Error |
Authors | Nayyar A. Zaidi, Geoffrey I. Webb, Francois Petitjean, Germain Forestier |
Abstract | We propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner’s forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems. |
Tasks | |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09354v2 |
PDF | http://arxiv.org/pdf/1801.09354v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-inter-relationships-among-drift-rate |
Repo | |
Framework | |
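The notion of a tunable "forgetting rate" can be made concrete with an exponentially decayed incremental estimator, where each new observation down-weights all previous ones by a fixed decay factor. This sketch illustrates the concept only; it is not one of the paper's learners.

```python
class ForgettingMean:
    """Incremental mean with exponential forgetting.

    forgetting_rate = 0 reduces to the ordinary running mean (never forget);
    values near 1 track only the most recent observations (forget quickly).
    """

    def __init__(self, forgetting_rate=0.01):
        self.decay = 1.0 - forgetting_rate
        self.weighted_sum = 0.0
        self.weight = 0.0

    def update(self, x):
        self.weighted_sum = self.decay * self.weighted_sum + x
        self.weight = self.decay * self.weight + 1.0
        return self.weighted_sum / self.weight
```

Read against the paper's first hypothesis: as the drift rate of the data stream increases, the `forgetting_rate` that minimizes generalization error should increase as well.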
CSI-based Outdoor Localization for Massive MIMO: Experiments with a Learning Approach
Title | CSI-based Outdoor Localization for Massive MIMO: Experiments with a Learning Approach |
Authors | Alexis Decurninge, Luis García Ordóñez, Paul Ferrand, He Gaoning, Li Bojie, Zhang Wei, Maxime Guillaud |
Abstract | We report on experimental results on the use of a learning-based approach to infer the location of a mobile user of a cellular network within a cell, for a 5G-type massive multiple-input, multiple-output (MIMO) system. We describe how the sample spatial covariance matrix computed from the CSI can be used as the input to a learning algorithm which attempts to relate it to user location. We discuss several learning approaches, and analyze in depth the application of extreme learning machines, for which theoretical approximate performance benchmarks are available, to the localization problem. We validate the proposed approach using experimental data collected on a Huawei 5G testbed, provide some performance and robustness benchmarks, and discuss practical issues related to the deployment of such a technique in 5G networks. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07447v1 |
PDF | http://arxiv.org/pdf/1806.07447v1.pdf |
PWC | https://paperswithcode.com/paper/csi-based-outdoor-localization-for-massive |
Repo | |
Framework | |
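Both ingredients the abstract names, the sample spatial covariance features and an extreme learning machine, are short to sketch. In the NumPy sketch below, the feature extraction, hidden-layer size, and tanh activation are editorial assumptions, not the paper's exact setup.

```python
import numpy as np

def covariance_features(H_csi):
    """H_csi: (T, n_ant) complex CSI snapshots -> real feature vector."""
    R = H_csi.conj().T @ H_csi / H_csi.shape[0]   # sample spatial covariance
    iu = np.triu_indices(R.shape[0])              # R is Hermitian: keep upper part
    return np.concatenate([R[iu].real, R[iu].imag])

class ELMRegressor:
    """Extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y, reg=1e-3):
        d = X.shape[1]
        # Random, untrained hidden layer: only the output weights are learned.
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Ridge-regularized least squares for the output layer.
        self.beta = np.linalg.solve(
            H.T @ H + reg * np.eye(self.n_hidden), H.T @ Y
        )
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

Here `X` stacks `covariance_features` vectors and `Y` holds known user coordinates; the closed-form output solve is what makes ELMs attractive for the theoretical performance benchmarks the abstract mentions.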
Interactive Naming for Explaining Deep Neural Networks: A Formative Study
Title | Interactive Naming for Explaining Deep Neural Networks: A Formative Study |
Authors | Mandana Hamidi-Haines, Zhongang Qi, Alan Fern, Fuxin Li, Prasad Tadepalli |
Abstract | We consider the problem of explaining the decisions of deep neural networks for image recognition in terms of human-recognizable visual concepts. In particular, given a test set of images, we aim to explain each classification in terms of a small number of image regions, or activation maps, which have been associated with semantic concepts by a human annotator. This allows for generating summary views of the typical reasons for classifications, which can help build trust in a classifier and/or identify example types for which the classifier may not be trusted. For this purpose, we developed a user interface for “interactive naming,” which allows a human annotator to manually cluster significant activation maps in a test set into meaningful groups called “visual concepts”. The main contribution of this paper is a systematic study of the visual concepts produced by five human annotators using the interactive naming interface. In particular, we consider the adequacy of the concepts for explaining the classification of test-set images, correspondence of the concepts to activations of individual neurons, and the inter-annotator agreement of visual concepts. We find that a large fraction of the activation maps have recognizable visual concepts, and that there is significant agreement between the different annotators about their denotations. Our work is an exploratory study of the interplay between machine learning and human recognition mediated by visualizations of the results of learning. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07150v2 |
PDF | http://arxiv.org/pdf/1812.07150v2.pdf |
PWC | https://paperswithcode.com/paper/interactive-naming-for-explaining-deep-neural |
Repo | |
Framework | |
Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback
Title | Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback |
Authors | Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu |
Abstract | In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results. We study active image search, where human feedback is solicited exclusively in visual form, without the relative attribute annotations used by prior work, which are not typically found in many datasets. In order to optimize the image selection strategy, a deep reinforcement model is trained to learn which images are informative, rather than relying on the hand-crafted measures typically leveraged in prior work. Additionally, we extend the recently introduced Conditional Similarity Network to incorporate global similarity when training visual embeddings, which results in more natural transitions as the user explores the learned similarity embeddings. Our experiments demonstrate the effectiveness of our approach, producing compelling results on both active image search and image attribute representation tasks. |
Tasks | Image Retrieval |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08714v1 |
PDF | http://arxiv.org/pdf/1809.08714v1.pdf |
PWC | https://paperswithcode.com/paper/give-me-a-hint-navigating-image-databases |
Repo | |
Framework | |
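The extension the abstract describes, adding a global-similarity term alongside the Conditional Similarity Network's masked triplet loss, can be sketched in a few lines of PyTorch. The mask handling and the weighting `w` below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def csn_global_triplet_loss(anchor, pos, neg, mask, margin=0.2, w=0.5):
    """anchor/pos/neg: (B, D) embeddings; mask: (B, D) per-condition weights."""
    def trip(a, p, n):
        return F.relu(
            (a - p).pow(2).sum(1) - (a - n).pow(2).sum(1) + margin
        ).mean()

    conditional = trip(anchor * mask, pos * mask, neg * mask)  # CSN-style term
    global_term = trip(anchor, pos, neg)                       # unmasked embedding
    return conditional + w * global_term
```

The unmasked term pulls the full embedding space toward overall visual similarity, which is one way to obtain the smoother transitions the abstract reports during exploration.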
Visual Object Categorization Based on Hierarchical Shape Motifs Learned From Noisy Point Cloud Decompositions
Title | Visual Object Categorization Based on Hierarchical Shape Motifs Learned From Noisy Point Cloud Decompositions |
Authors | Christian A. Mueller, Andreas Birk |
Abstract | Object shape is a key cue that contributes to the semantic understanding of objects. In this work we focus on the categorization of real-world object point clouds into particular shape types. Surface description and the representation of object shape structure have a significant influence on shape categorization accuracy when dealing with real-world scenes featuring noisy, partial and occluded object observations. An unsupervised hierarchical learning procedure is utilized here to symbolically describe surface characteristics on multiple semantic levels. Furthermore, a constellation model is proposed that hierarchically decomposes objects. The decompositions are described as constellations of symbols (shape motifs) in a gradual order, reflecting shape structure from local to global, i.e., from parts over groups of parts to entire objects. The combination of this multi-level description of surfaces and the hierarchical decomposition of shapes leads to a representation which makes it possible to conceptualize shapes. Object discrimination has been observed in experiments with seven categories featuring instances with sensor noise, occlusions, and inter-category as well as intra-category similarities. The experiments include an evaluation of the proposed description and shape decomposition approach, and comparisons to Fast Point Feature Histograms, a Vocabulary Tree, and a neural-network-based Deep Learning method. Furthermore, experiments on alternative datasets analyze the generalization capability of the proposed approach. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.01117v1 |
PDF | http://arxiv.org/pdf/1804.01117v1.pdf |
PWC | https://paperswithcode.com/paper/visual-object-categorization-based-on |
Repo | |
Framework | |