January 28, 2020

3588 words 17 mins read

Paper Group ANR 919


Local Area Transform for Cross-Modality Correspondence Matching and Deep Scene Recognition

Title Local Area Transform for Cross-Modality Correspondence Matching and Deep Scene Recognition
Authors Seungchul Ryu
Abstract Establishing correspondences is a fundamental task in a variety of image processing and computer vision applications. In particular, finding the correspondences between an image pair nonlinearly deformed by different modality conditions is a challenging problem. This paper describes an efficient but powerful image transform, the local area transform (LAT), for modality-robust correspondence estimation. Specifically, LAT transforms an image from the intensity domain to the local area domain, which is invariant under nonlinear intensity deformations, especially radiometric, photometric, and spectral deformations. In addition, robust feature descriptors are reformulated with LAT for several practical applications. Furthermore, a LAT-convolution layer and an Aception block are proposed, and with these novel components a deep neural network called LAT-Net is built, targeting the scene recognition task in particular. Experimental results show that LAT-transformed images remain consistent for nonlinearly deformed inputs, even under random intensity deformations, and that LAT reduces the mean absolute difference compared to conventional methods. Furthermore, descriptors reformulated with LAT outperform their conventional counterparts, a promising result for cross-spectral and cross-modality correspondence matching. The local area domain can thus be considered an alternative to the intensity domain for robust correspondence matching and image recognition, with applications including feature matching, stereo matching, dense correspondence matching, image recognition, and image retrieval.
Tasks Image Retrieval, Scene Recognition, Stereo Matching
Published 2019-01-03
URL http://arxiv.org/abs/1901.00927v1
PDF http://arxiv.org/pdf/1901.00927v1.pdf
PWC https://paperswithcode.com/paper/local-area-transform-for-cross-modality
Repo
Framework
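
The abstract does not spell out LAT’s exact formula, but the invariance claim is easy to illustrate with a classical stand-in: a local rank transform, which (like LAT’s local area domain) is unchanged by any monotonic nonlinear intensity deformation. A minimal sketch, assuming grayscale numpy images; this is a generic modality-robust transform, not the paper’s LAT:

```python
import numpy as np

def rank_transform(img, radius=2):
    """Map each pixel to the count of strictly darker neighbors in its
    (2*radius+1)^2 window; invariant to monotonic intensity deformations."""
    H, W = img.shape
    pad = np.pad(img, radius, mode="edge")
    out = np.zeros((H, W), dtype=np.int32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            window = pad[radius + dy:radius + dy + H, radius + dx:radius + dx + W]
            out += (window < img).astype(np.int32)
    return out

img = np.random.rand(64, 64)
deformed = np.tanh(3.0 * img)   # a monotonic nonlinear intensity deformation
assert np.array_equal(rank_transform(img), rank_transform(deformed))
```

Descriptors built on such a domain inherit the invariance, which is the property the paper exploits for cross-modality matching.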

Simultaneous Neural Machine Translation using Connectionist Temporal Classification

Title Simultaneous Neural Machine Translation using Connectionist Temporal Classification
Authors Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
Abstract Simultaneous machine translation is a variant of machine translation that starts the translation process before the entire input has been received. This task faces a trade-off between translation accuracy and latency: to achieve good practical performance, we have to decide when to start translating from the inputs observed so far. In this work, we propose a neural machine translation method that determines this timing in an adaptive manner. The proposed method introduces a special token ‘<wait>’, which is generated when the translation model chooses to read the next input token instead of generating an output token. It also introduces an objective function that handles the ambiguity in wait timings and can be optimized using an algorithm called Connectionist Temporal Classification (CTC). The use of CTC enables the optimization to consider all possible output sequences, including placements of ‘<wait>’, that are equivalent to the reference translations, and to choose the best one adaptively. We apply the proposed method to simultaneous translation from English to Japanese and investigate its performance and remaining problems.
Tasks Machine Translation
Published 2019-11-27
URL https://arxiv.org/abs/1911.11933v1
PDF https://arxiv.org/pdf/1911.11933v1.pdf
PWC https://paperswithcode.com/paper/simultaneous-neural-machine-translation-using
Repo
Framework
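
The CTC trick described in the abstract is structurally the same as CTC’s usual marginalization over blank placements: summing over every output sequence that collapses to the reference. A minimal PyTorch sketch, using the built-in CTC loss with the blank index standing in for the ‘<wait>’ token (an analogy only; standard CTC also collapses repeated labels, which the paper’s formulation handles differently):

```python
import torch
import torch.nn as nn

# Index 0 plays the role of the '<wait>' token (the CTC blank). The loss
# sums over all placements of it that yield the reference translation,
# i.e., over all possible wait timings.
vocab_size, T, batch, tgt_len = 100, 20, 4, 8
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

log_probs = torch.randn(T, batch, vocab_size, requires_grad=True).log_softmax(dim=-1)
targets = torch.randint(1, vocab_size, (batch, tgt_len))   # reference tokens (no blanks)
input_lengths = torch.full((batch,), T, dtype=torch.long)
target_lengths = torch.full((batch,), tgt_len, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # gradient accounts for every wait timing at once
```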

Neural Machine Translation with Explicit Phrase Alignment

Title Neural Machine Translation with Explicit Phrase Alignment
Authors Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Yang Liu
Abstract While neural machine translation (NMT) has achieved state-of-the-art translation performance, it is unable to capture the alignment between the input and output during the translation process. The lack of alignment in NMT models leads to three problems: it is hard to (1) interpret the translation process, (2) impose lexical constraints, and (3) impose structural constraints. To alleviate these problems, we propose to introduce explicit phrase alignment into the translation process of arbitrary NMT models. The key idea is to build a search space similar to that of phrase-based statistical machine translation for NMT where phrase alignment is readily available. We design a new decoding algorithm that can easily impose lexical and structural constraints. Experiments show that our approach makes the translation process of NMT more interpretable without sacrificing translation quality. In addition, our approach achieves significant improvements in lexically and structurally constrained translation tasks.
Tasks Machine Translation
Published 2019-11-26
URL https://arxiv.org/abs/1911.11520v3
PDF https://arxiv.org/pdf/1911.11520v3.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-with-explicit
Repo
Framework
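
The abstract’s claim that explicit alignment makes lexical constraints easy to impose can be made concrete with a toy greedy decoder that guarantees a required phrase appears in the output. This is an illustration of the general mechanism, not the paper’s decoding algorithm; `next_logits` is a hypothetical scoring callback:

```python
import numpy as np

def constrained_greedy_decode(next_logits, constraint, eos_id, max_len=50):
    """Greedy decoding that forces a contiguous phrase (list of token ids)
    into the output: EOS is blocked until the phrase is emitted, and the
    phrase is forced outright once the length budget runs low."""
    out, pos = [], 0                          # pos: progress through the phrase
    for _ in range(max_len):
        logits = np.array(next_logits(out), dtype=float)
        remaining = len(constraint) - pos
        if remaining > 0:
            logits[eos_id] = -np.inf          # cannot stop before the phrase is done
            if max_len - len(out) <= remaining:
                out.append(constraint[pos])   # budget exhausted: force the phrase
                pos += 1
                continue
        tok = int(np.argmax(logits))
        if remaining > 0:
            pos = pos + 1 if tok == constraint[pos] else 0   # contiguity (simplified)
        out.append(tok)
        if tok == eos_id:
            break
    return out
```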

Precipitation Nowcasting with Satellite Imagery

Title Precipitation Nowcasting with Satellite Imagery
Authors Vadim Lebedev, Vladimir Ivashkin, Irina Rudenko, Alexander Ganshin, Alexander Molchanov, Sergey Ovcharenko, Ruslan Grokhovetskiy, Ivan Bushmarinov, Dmitry Solomentsev
Abstract Precipitation nowcasting is a short-range forecast of rain/snow (up to 2 hours), often displayed on top of the geographical map by a weather service. Modern precipitation nowcasting algorithms rely on the extrapolation of observations from ground-based radars via optical flow techniques or neural network models. Because it depends on these radars, typical nowcasting is limited to the regions around their locations. We have developed a method for precipitation nowcasting based on geostationary satellite imagery and incorporated the resulting data into the Yandex.Weather precipitation map (including an alerting service with push notifications for products in the Yandex ecosystem), thus expanding its coverage and paving the way to a truly global nowcasting service.
Tasks Optical Flow Estimation
Published 2019-05-23
URL https://arxiv.org/abs/1905.09932v1
PDF https://arxiv.org/pdf/1905.09932v1.pdf
PWC https://paperswithcode.com/paper/precipitation-nowcasting-with-satellite
Repo
Framework
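
A standard extrapolation baseline of the kind the abstract mentions: estimate dense optical flow between the two most recent frames and advect the latest frame forward. A sketch with OpenCV’s Farneback flow, assuming single-channel uint8 satellite/radar frames; Yandex’s production model is more involved:

```python
import cv2
import numpy as np

def extrapolate(prev_frame, curr_frame, steps=6):
    """Nowcast by warping the latest frame along the estimated flow field;
    each step advances one frame interval into the future."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_frame, curr_frame, None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = curr_frame.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    forecasts, frame = [], curr_frame
    for _ in range(steps):
        # backward warping: each output pixel samples where the flow came from
        map_x = (grid_x - flow[..., 0]).astype(np.float32)
        map_y = (grid_y - flow[..., 1]).astype(np.float32)
        frame = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
        forecasts.append(frame)
    return forecasts
```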

Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning

Title Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning
Authors Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu
Abstract Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models on large vocabularies due to the limited amount of training data. To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT), with 430,000 street view images, which is at least 14 times as large as existing Chinese text reading benchmarks. To recognize Chinese text in the wild while keeping the labeling of large-scale datasets cost-effective, we propose to annotate one part of the C-SVT dataset (30,000 images) with locations and text labels as full annotations, and to add 400,000 more images for which only the text-of-interest in each region is given as a weak annotation. To exploit the rich information in the weakly annotated data, we design a text reading network in a partially supervised learning framework, which can localize and recognize text while learning from fully and weakly annotated data simultaneously. To localize the best-matched text proposals in weakly labeled images, we propose an online proposal matching module incorporated into the whole model, which spots the keyword regions by sharing parameters for end-to-end training. Compared with fully supervised training algorithms, this model improves the end-to-end recognition performance remarkably, by 4.03% in F-score at the same labeling cost. The proposed model also achieves state-of-the-art results on the ICDAR 2017-RCTW dataset, which demonstrates the effectiveness of the proposed partially supervised learning framework.
Tasks
Published 2019-09-17
URL https://arxiv.org/abs/1909.07808v2
PDF https://arxiv.org/pdf/1909.07808v2.pdf
PWC https://paperswithcode.com/paper/chinese-street-view-text-large-scale-chinese
Repo
Framework
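
The online proposal matching idea can be sketched in a few lines: among the text proposals detected in a weakly labeled image, keep the one whose recognized transcript is closest to the given text-of-interest, and let only that proposal contribute to the recognition loss. A hedged sketch; `recognize` and the 0.5 threshold are hypothetical stand-ins, and the real module matches proposals inside the network with shared parameters:

```python
from difflib import SequenceMatcher

def edit_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1] (1.0 = identical strings)."""
    return SequenceMatcher(None, a, b).ratio()

def match_proposal(proposals, weak_label, recognize):
    """Return the text proposal whose recognized transcript best matches
    the weak annotation, or None if nothing matches well enough."""
    scored = [(edit_similarity(recognize(p), weak_label), p) for p in proposals]
    best_score, best_prop = max(scored, key=lambda s: s[0])
    return best_prop if best_score > 0.5 else None   # threshold is illustrative
```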

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning

Title Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning
Authors Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
Abstract We present GQSAT, a branching heuristic for a Boolean SAT solver trained with value-based reinforcement learning (RL), using Graph Neural Networks for function approximation. Solvers using GQSAT are complete SAT solvers that either provide a satisfying assignment or a proof of unsatisfiability, which is required for many SAT applications. The branching heuristics commonly used in SAT solvers today suffer from bad decisions during their warm-up period, whereas GQSAT has been trained to examine the structure of the particular problem instance to make better decisions at the beginning of the search. Training GQSAT is data efficient and requires no elaborate dataset preparation or feature engineering. We train GQSAT on small SAT problems using RL interfacing with an existing SAT solver. We show that GQSAT is able to reduce the number of iterations required to solve SAT problems by 2-3X, and that it generalizes to unsatisfiable SAT instances, as well as to problems with 5X more variables than it was trained on. We also show that, to a lesser extent, it generalizes to SAT problems from different domains by evaluating it on graph coloring. Our experiments show that augmenting SAT solvers with agents trained with RL and graph neural networks can improve performance on the SAT search problem.
Tasks Feature Engineering
Published 2019-09-26
URL https://arxiv.org/abs/1909.11830v1
PDF https://arxiv.org/pdf/1909.11830v1.pdf
PWC https://paperswithcode.com/paper/improving-sat-solver-heuristics-with-graph
Repo
Framework
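
Value-based branching as the abstract describes it boils down to scoring every legal decision with a learned Q-function and taking the argmax. A minimal sketch; the GNN itself and the clause-variable graph encoding are stubbed out, since those are the substance of the paper:

```python
import torch

def choose_branching_literal(q_net, graph, unassigned_vars):
    """Pick the branching decision with the highest predicted Q-value.

    q_net maps the clause-variable graph to a (num_vars, 2) tensor:
    column 0 = Q(assign False), column 1 = Q(assign True)."""
    with torch.no_grad():
        q = q_net(graph)                          # (num_vars, 2)
    mask = torch.full_like(q, float("-inf"))
    mask[unassigned_vars] = 0.0                   # only unassigned vars are legal
    flat = (q + mask).flatten().argmax().item()
    var, polarity = divmod(flat, 2)
    return var, bool(polarity)
```

During training, the reward would come from the solver itself (e.g., fewer search iterations), with the solver providing the environment transitions, matching the abstract’s description of RL interfacing with an existing SAT solver.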

Slices of Attention in Asynchronous Video Job Interviews

Title Slices of Attention in Asynchronous Video Job Interviews
Authors Léo Hemamou, Ghazi Felhi, Jean-Claude Martin, Chloé Clavel
Abstract The impact of non-verbal behaviour on a hiring decision remains an open question. Investigating this question is important, as it could provide a better understanding of how to train candidates for job interviews and make recruiters aware of influential non-verbal behaviour. This research has recently been accelerated by the development of tools for the automatic analysis of social signals and the emergence of machine learning methods. However, these studies are still mainly based on hand-engineered features, which limits the discovery of influential social signals. Deep learning methods, on the other hand, are a promising tool for discovering complex patterns without the need for feature engineering. In this paper, we focus on studying influential non-verbal social signals in asynchronous job video interviews that are discovered by deep learning methods. We use a previously published deep learning system that aims at inferring the hirability of a candidate with regard to a sequence of interview questions. One particularity of this system is the use of attention mechanisms, which aim at identifying the relevant parts of an answer. Thus, information at a fine-grained temporal level can be extracted using global (interview-level) annotations of hirability. While most deep learning systems use attention mechanisms to offer a quick visualization of slices where a rise of attention occurs, we perform an in-depth analysis to understand what happens during these moments. First, we propose a methodology to automatically extract slices where there is a rise of attention (attention slices). Second, we study the content of attention slices by comparing them with randomly sampled slices. Finally, we show that they carry significantly more information about hirability than randomly sampled slices.
Tasks Feature Engineering
Published 2019-09-19
URL https://arxiv.org/abs/1909.08845v1
PDF https://arxiv.org/pdf/1909.08845v1.pdf
PWC https://paperswithcode.com/paper/slices-of-attention-in-asynchronous-video-job
Repo
Framework
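
The first step of the methodology, extracting slices where attention rises, can be sketched directly: find contiguous time segments where the attention weight exceeds a threshold. The mean + k·std criterion below is an assumption for illustration; the paper defines its own rise criterion:

```python
import numpy as np

def attention_slices(attn, k=1.0):
    """Return (start, end) pairs (end exclusive) of contiguous segments
    where attention exceeds mean + k * std."""
    attn = np.asarray(attn, dtype=float)
    above = attn > attn.mean() + k * attn.std()
    starts = list(np.flatnonzero(~above[:-1] & above[1:]) + 1)
    ends = list(np.flatnonzero(above[:-1] & ~above[1:]) + 1)
    if above[0]:
        starts = [0] + starts
    if above[-1]:
        ends = ends + [len(attn)]
    return list(zip(starts, ends))

attn = np.array([0.1, 0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.1])
print(attention_slices(attn, k=0.5))   # [(2, 4), (6, 7)]
```

The paper’s analysis then compares the content of these slices against randomly sampled ones.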

Communication and Memory Efficient Testing of Discrete Distributions

Title Communication and Memory Efficient Testing of Discrete Distributions
Authors Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, Sankeerth Rao
Abstract We study distribution testing with communication and memory constraints in the following computational models: (1) the *one-pass streaming model*, where the goal is to minimize the sample complexity of the protocol subject to a memory constraint, and (2) a *distributed model*, where the data samples reside at multiple machines and the goal is to minimize the communication cost of the protocol. In both of these models, we provide efficient algorithms for uniformity/identity testing (goodness of fit) and closeness testing (two-sample testing). Moreover, we show nearly-tight lower bounds on (1) the sample complexity of any one-pass streaming tester for uniformity, subject to the memory constraint, and (2) the communication cost of any uniformity testing protocol, in a restricted ‘one-pass’ model of communication.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04709v1
PDF https://arxiv.org/pdf/1906.04709v1.pdf
PWC https://paperswithcode.com/paper/communication-and-memory-efficient-testing-of
Repo
Framework
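
For context, the statistical primitive underneath uniformity testing is simple; the paper’s contribution is achieving it under memory and communication constraints. A classical collision-based tester as a sketch: the pairwise collision count concentrates near m(m-1)/(2n) under the uniform distribution over a domain of size n, and is noticeably inflated for distributions far from uniform (the acceptance threshold below is illustrative, not tuned):

```python
import random
from collections import Counter

def collision_uniformity_test(samples, domain_size, eps=0.25):
    """Accept 'uniform' iff the pairwise collision count stays close to its
    expectation under the uniform distribution."""
    m = len(samples)
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    expected = m * (m - 1) / (2 * domain_size)   # E[collisions] if uniform
    return collisions <= expected * (1 + eps ** 2 / 2)

n = 1000
uniform = [random.randrange(n) for _ in range(2000)]
skewed = [random.randrange(n // 10) for _ in range(2000)]
print(collision_uniformity_test(uniform, n))   # True (with high probability)
print(collision_uniformity_test(skewed, n))    # False
```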

Evaluating the Effectiveness of Automated Identity Masking (AIM) Methods with Human Perception and a Deep Convolutional Neural Network (CNN)

Title Evaluating the Effectiveness of Automated Identity Masking (AIM) Methods with Human Perception and a Deep Convolutional Neural Network (CNN)
Authors Kimberley D. Orsten-Hooge, Asal Baragchizadeh, Thomas P. Karnowski, David S. Bolme, Regina Ferrell, Parisa R. Jesudasen, Carlos D. Castillo, Alice J. O’Toole
Abstract Face de-identification algorithms have been developed in response to the prevalent use of public video recordings and surveillance cameras. Here, we evaluated the success of identity masking in the context of monitoring drivers as they actively operate a motor vehicle. We studied the effectiveness of eight de-identification algorithms using human perceivers and a state-of-the-art deep convolutional neural network (CNN). We used a standard face recognition experiment in which human subjects studied high-resolution (studio-style) images to learn driver identities. Subjects were tested subsequently on their ability to recognize those identities in low-resolution videos depicting the drivers operating a motor vehicle. The videos were in either unmasked format, or were masked by one of the eight de-identification algorithms. All masking algorithms lowered identification accuracy substantially, relative to the unmasked video. In all cases, identifications were made with stringent decision criteria indicating the subjects had low confidence in their decisions. When matching the identities in high-resolution still images to those in the masked videos, the CNN performed at chance. Next, we examined CNN performance on the same task, but using the unmasked videos and their masked counterparts. In this case, the network scored surprisingly well on a subset of mask conditions. We conclude that carefully tested de-identification approaches, used alone or in combination, can be an effective tool for protecting the privacy of individuals captured in videos. We note that no approach is equally effective in masking all stimuli, and that future work should examine possible methods for determining the most effective mask per individual stimulus.
Tasks Edge Detection, Face Recognition, Temporal Action Localization
Published 2019-02-19
URL https://arxiv.org/abs/1902.06967v3
PDF https://arxiv.org/pdf/1902.06967v3.pdf
PWC https://paperswithcode.com/paper/evaluating-the-effectiveness-of-automated
Repo
Framework

Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

Title Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation
Authors Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy Hospedales, Maja Pantic
Abstract Training deep neural networks with spatio-temporal (i.e., 3D) or higher-order multidimensional convolutions is computationally challenging due to millions of unknown parameters across dozens of layers. To alleviate this, one approach is to apply low-rank tensor decompositions to the convolution kernels in order to compress the network and reduce its number of parameters. Alternatively, new convolutional blocks, such as those in MobileNet, can be directly designed for efficiency. In this paper, we unify these two approaches by proposing a tensor factorization framework for efficient higher-order multidimensional (separable) convolutions. Interestingly, the proposed framework enables a novel higher-order transduction: a network can be trained on a given domain (e.g., 2D images, or N-dimensional data in general) and then generalized via transduction to higher-order data such as videos (or (N+K)-dimensional data in general), capturing, for instance, temporal dynamics while preserving the learnt spatial information. We apply the proposed methodology, coined CP-Higher-Order Convolution (HO-CPConv), to spatio-temporal facial emotion analysis. Most existing facial affect models focus on static imagery and discard all temporal information, due to the above-mentioned burden of training 3D convolutional nets and the lack of large bodies of video data annotated by experts. We address both issues with our proposed framework: initial training is first done on static imagery before using transduction to generalize to the temporal domain. We demonstrate superior performance on three challenging large-scale affect estimation datasets: AffectNet, SEWA, and AFEW-VA.
Tasks Emotion Recognition, Image Classification
Published 2019-06-14
URL https://arxiv.org/abs/1906.06196v2
PDF https://arxiv.org/pdf/1906.06196v2.pdf
PWC https://paperswithcode.com/paper/efficient-n-dimensional-convolutions-via
Repo
Framework
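
The kernel factorization the abstract builds on has a concrete network form: a rank-R CP decomposition turns a dense K×K convolution into a 1×1 input projection, per-rank K×1 and 1×K depthwise convolutions, and a 1×1 output projection. A PyTorch sketch of this generic CP-convolution structure (HO-CPConv’s exact formulation and the transduction mechanism are in the paper):

```python
import torch
import torch.nn as nn

class CPConv2d(nn.Module):
    """Rank-R CP-factorized KxK convolution (generic structure)."""
    def __init__(self, in_ch, out_ch, kernel_size, rank):
        super().__init__()
        k, pad = kernel_size, kernel_size // 2
        self.factor_in = nn.Conv2d(in_ch, rank, 1, bias=False)      # channel factor
        self.factor_h = nn.Conv2d(rank, rank, (k, 1), padding=(pad, 0),
                                  groups=rank, bias=False)          # vertical factor
        self.factor_w = nn.Conv2d(rank, rank, (1, k), padding=(0, pad),
                                  groups=rank, bias=False)          # horizontal factor
        self.factor_out = nn.Conv2d(rank, out_ch, 1, bias=False)    # output factor

    def forward(self, x):
        return self.factor_out(self.factor_w(self.factor_h(self.factor_in(x))))

conv = CPConv2d(64, 128, kernel_size=3, rank=16)
print(conv(torch.randn(2, 64, 32, 32)).shape)   # torch.Size([2, 128, 32, 32])
# 64*16 + 16*3 + 16*3 + 16*128 = 3168 parameters vs 64*128*3*3 = 73728 dense
```

Transduction to video then, as the abstract describes it, extends such factorized kernels along the temporal dimension while preserving the learnt spatial factors.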

Adaptive Structure-constrained Robust Latent Low-Rank Coding for Image Recovery

Title Adaptive Structure-constrained Robust Latent Low-Rank Coding for Image Recovery
Authors Zhao Zhang, Lei Wang, Sheng Li, Yang Wang, Zheng Zhang, Zhengjun Zha, Meng Wang
Abstract In this paper, we propose a robust representation learning model called Adaptive Structure-constrained Low-Rank Coding (AS-LRC) for the latent representation of data. To recover the underlying subspaces more accurately, AS-LRC seamlessly integrates an adaptive-weighting-based block-diagonal structure-constrained low-rank representation and group sparse salient feature extraction into a unified framework. Specifically, AS-LRC performs the latent decomposition of the given data into a low-rank reconstruction by a block-diagonal codes matrix, a group sparse locality-adaptive salient feature part, and a sparse error part. To enforce block-diagonal structures adaptive to different real datasets for the low-rank recovery, AS-LRC computes an auto-weighting matrix based on the locality-adaptive features and multiplies it by the low-rank coefficients for direct minimization at the same time. This encourages the codes to be block-diagonal and avoids the tricky issue of choosing an optimal neighborhood size or kernel width for weight assignment, from which most local geometrical-structure-preserving low-rank coding methods suffer. In addition, our AS-LRC imposes the L2,1-norm on the projection for extracting group sparse features, rather than learning low-rank features by nuclear-norm regularization, which makes the learnt features robust to noise and outliers in the samples and also makes the feature coding process efficient. Extensive visualizations and numerical results demonstrate the effectiveness of our AS-LRC for image representation and recovery.
Tasks Representation Learning
Published 2019-08-21
URL https://arxiv.org/abs/1908.07860v2
PDF https://arxiv.org/pdf/1908.07860v2.pdf
PWC https://paperswithcode.com/paper/190807860
Repo
Framework
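
For reference, the L2,1-norm the abstract imposes on the projection, together with its proximal operator (group-wise soft-thresholding), in a short numpy sketch. Whether the groups are rows or columns follows the paper’s convention; rows are assumed here:

```python
import numpy as np

def l21_norm(X):
    """||X||_{2,1}: sum of the l2 norms of the rows of X."""
    return np.linalg.norm(X, axis=1).sum()

def prox_l21(X, tau):
    """prox of tau * ||.||_{2,1}: shrink each row's norm by tau; rows with
    norm <= tau collapse to zero, which is what yields group sparsity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

X = np.random.randn(5, 3)
X_shrunk = prox_l21(X, tau=1.5)
print(l21_norm(X), l21_norm(X_shrunk))   # the norm strictly decreases
```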

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Title Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Authors Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain
Abstract Model-free reinforcement learning is known to be memory and computation efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodicity assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm, by Abbasi-Yadkori et al. (2019a), for ergodic MDPs in the infinite-horizon average-reward setting.
Tasks Multi-Armed Bandits
Published 2019-10-15
URL https://arxiv.org/abs/1910.07072v2
PDF https://arxiv.org/pdf/1910.07072v2.pdf
PWC https://paperswithcode.com/paper/model-free-reinforcement-learning-in-infinite
Repo
Framework
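
For reference, the standard definitions behind the abstract’s regret statements (written out here, not quoted from the paper): the average reward (gain) of a policy, and the regret whose growth rates $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(\sqrt{T})$ are being compared:

```latex
% gain (average reward) of policy \pi started from state s
J^{\pi}(s) = \lim_{T \to \infty} \frac{1}{T}\,
             \mathbb{E}\!\left[\sum_{t=1}^{T} r(s_t, a_t) \,\middle|\, s_1 = s\right]

% regret over T steps against the optimal gain J^*
R_T = T\, J^{*} - \sum_{t=1}^{T} r(s_t, a_t)
```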

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory

Title LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory
Authors Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez
Abstract The effectiveness of LSTM neural networks for popular tasks such as Automatic Speech Recognition has fostered increasing interest in LSTM inference acceleration. Due to the recurrent nature and data dependencies of LSTM computations, designing a customized architecture specifically tailored to their computation pattern is crucial for efficiency. Since LSTMs are used for a variety of tasks, generalizing this efficiency to diverse configurations, i.e., adaptiveness, is another key feature of these accelerators. In this work, we first show the problems of low resource utilization and poor adaptiveness in state-of-the-art LSTM implementations on GPU, FPGA, and ASIC architectures. To solve these issues, we propose an intelligent tile-based dispatching mechanism that efficiently handles the data dependencies and increases the adaptiveness of LSTM computation. Building on it, we propose LSTM-Sharp, a hardware accelerator that pipelines LSTM computation using an effective scheduling scheme to hide most of the dependency-induced serialization. Furthermore, LSTM-Sharp employs a dynamically reconfigurable architecture to adapt to each model’s characteristics. LSTM-Sharp achieves 1.5x, 2.86x, and 82x speedups on average over the state-of-the-art ASIC, FPGA, and GPU implementations respectively, for different LSTM models and resource budgets. Furthermore, we achieve a significant energy reduction with respect to previous solutions, due to the low power dissipation of LSTM-Sharp (383 GFLOPs/Watt).
Tasks Speech Recognition
Published 2019-11-04
URL https://arxiv.org/abs/1911.01258v1
PDF https://arxiv.org/pdf/1911.01258v1.pdf
PWC https://paperswithcode.com/paper/lstm-sharp-an-adaptable-energy-efficient
Repo
Framework
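
The recurrent dependency the abstract refers to is visible in the standard LSTM cell equations: the input-side products can be computed for all timesteps in parallel, but the hidden-side products chain through the previous step’s output, which is exactly the serialization an accelerator must pipeline around. A minimal numpy sketch of one cell (standard equations, not LSTM-Sharp’s kernels):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W @ x_t is independent across time (parallelizable);
    U @ h_prev depends on the previous step's output (serializing)."""
    z = W @ x_t + U @ h_prev + b                  # packed gate pre-activations, (4H,)
    H = h_prev.shape[0]
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c_prev + i * g                        # cell state update
    h = o * np.tanh(c)
    return h, c

D, H = 8, 16
W, U, b = np.random.randn(4*H, D), np.random.randn(4*H, H), np.zeros(4*H)
h = c = np.zeros(H)
for x_t in np.random.randn(10, D):                # steps must run in order
    h, c = lstm_step(x_t, h, c, W, U, b)
```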

Unsupervised Domain Adaptation Learning Algorithm for RGB-D Staircase Recognition

Title Unsupervised Domain Adaptation Learning Algorithm for RGB-D Staircase Recognition
Authors Jing Wang, Kuangen Zhang
Abstract Detecting and recognizing staircases as upstairs, downstairs, or negative (e.g., a ladder) is fundamental to assisting the visually impaired in traveling independently through unfamiliar environments. Previous research has focused on using massive amounts of RGB-D scene data to train traditional machine learning (ML) models to detect and recognize staircases. However, the performance of traditional ML techniques is limited by the amount of labeled RGB-D staircase data. In this paper, we apply an unsupervised domain adaptation approach in deep architectures to transfer knowledge learned from a labeled RGB-D stationary staircase dataset to an unlabeled RGB-D escalator dataset. Using the domain adaptation method, our feedforward convolutional neural network (CNN) based feature extractor with 5 convolution layers achieves 100% classification accuracy on the labeled stationary staircase test data and 80.6% classification accuracy on the unlabeled escalator data. We demonstrate the success of the approach for classifying staircases across two domains with a limited amount of data. To further demonstrate the effectiveness of the approach, we also evaluate the same CNN model without domain adaptation and compare its results with those of our proposed architecture.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-03-04
URL http://arxiv.org/abs/1903.01212v4
PDF http://arxiv.org/pdf/1903.01212v4.pdf
PWC https://paperswithcode.com/paper/unsupervised-domain-adaptation-learning
Repo
Framework
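
The abstract does not name the specific adaptation technique; a common building block for unsupervised domain adaptation in deep nets is the gradient reversal layer from domain-adversarial training (DANN, Ganin & Lempitsky), sketched below in PyTorch as an assumption rather than the paper’s exact method:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated (scaled) gradient on the
    backward pass. Training a domain classifier through this layer pushes
    the shared feature extractor toward domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Sketch of use: staircase (source) features feed the label classifier;
# both source and escalator (target) features feed a domain classifier
# through grad_reverse(features), so the extractor learns to fool it.
```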

Informing Computer Vision with Optical Illusions

Title Informing Computer Vision with Optical Illusions
Authors Nasim Nematzadeh, David M. W. Powers, Trent Lewis
Abstract Illusions are fascinating and immediately catch people’s attention and interest, but they are also valuable in terms of giving us insights into human cognition and perception. A good theory of human perception should be able to explain the illusion, and a correct theory will actually give quantifiable results. We investigate here the efficiency of a computational filtering model, used to model the lateral inhibition of retinal ganglion cells and their responses to a range of geometric illusions, using isotropic Difference-of-Gaussians filters. This study explores the ways in which illusions have been explained and shows how a simple standard model of vision based on classical receptive fields can predict the existence of these illusions as well as the degree of effect. A fundamental contribution of this work is to link bottom-up processes to higher-level perception and cognition, consistent with Marr’s theory of vision and edge-map representation.
Tasks
Published 2019-02-08
URL http://arxiv.org/abs/1902.02922v1
PDF http://arxiv.org/pdf/1902.02922v1.pdf
PWC https://paperswithcode.com/paper/informing-computer-vision-with-optical
Repo
Framework
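
The isotropic Difference-of-Gaussians filter at the heart of the model is a two-liner: subtract a wide Gaussian blur (the inhibitory surround) from a narrow one (the excitatory center). A sketch with scipy; the specific sigmas and the multi-scale setup are assumptions here, not the paper’s exact parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_filter(img, sigma_center=1.0, surround_ratio=2.0):
    """Isotropic Difference of Gaussians: a center-surround response
    modelling the lateral inhibition of retinal ganglion cells."""
    center = gaussian_filter(img, sigma_center)
    surround = gaussian_filter(img, sigma_center * surround_ratio)
    return center - surround

# filter a simple grating at several scales, as in multi-scale edge maps
img = (np.indices((128, 128)).sum(axis=0) % 32 < 16).astype(float)
responses = [dog_filter(img, s) for s in (1.0, 2.0, 4.0)]
```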