October 21, 2019

Paper Group AWR 35

Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. Standing Wave Decomposition Gaussian Process. Fast Graph-Cut Based Optimization for Practical Dense …

Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person

Title Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person
Authors Fania Mokhayeri, Eric Granger, Guillaume-Alexandre Bilodeau
Abstract The performance of still-to-video FR systems can decline significantly because faces captured in an unconstrained operational domain (OD) over multiple video cameras have a different underlying data distribution compared to faces captured under controlled conditions in the enrollment domain (ED) with a still camera. This is particularly true when individuals are enrolled to the system using a single reference still. To improve the robustness of these systems, it is possible to augment the reference set by generating synthetic faces based on the original still. However, without knowledge of the OD, many synthetic images must be generated to account for all possible capture conditions. FR systems may, therefore, require complex implementations and yield lower accuracy when training on many less relevant images. This paper introduces an algorithm for domain-specific face synthesis (DSFS) that exploits the representative intra-class variation information available from the OD. Prior to operation, a compact set of faces from unknown persons appearing in the OD is selected through clustering in the captured condition space. The domain-specific variations of these face images are projected onto the reference stills by integrating an image-based face relighting technique inside the 3D reconstruction framework. A compact set of synthetic faces is generated that resemble individuals of interest under the capture conditions relevant to the OD. In a particular implementation based on sparse representation classification, the synthetic faces generated with the DSFS are employed to form a cross-domain dictionary that accounts for structured sparsity. Experimental results reveal that augmenting the reference gallery set of FR systems using the proposed DSFS approach can provide a higher level of accuracy compared to state-of-the-art approaches, with only a moderate increase in its computational complexity.
Tasks 3D Reconstruction, Face Generation, Face Recognition
Published 2018-01-06
URL http://arxiv.org/abs/1801.01974v2
PDF http://arxiv.org/pdf/1801.01974v2.pdf
PWC https://paperswithcode.com/paper/domain-specific-face-synthesis-for-video-face
Repo https://github.com/faniamokhayeri/DSFS
Framework none

SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

Title SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
Authors Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, Martin Monperrus
Abstract This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well as on the Defects4J benchmark used in program repair research. SequenceR is able to perfectly predict the fixed line for 950/4711 testing samples, and find correct patches for 14 bugs in Defects4J. It captures a wide range of repair operators without any domain-specific top-down design.
Tasks
Published 2018-12-24
URL https://arxiv.org/abs/1901.01808v3
PDF https://arxiv.org/pdf/1901.01808v3.pdf
PWC https://paperswithcode.com/paper/sequencer-sequence-to-sequence-learning-for
Repo https://github.com/kth/SequenceR
Framework none
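
The abstract says only that a copy mechanism is used to handle the unlimited vocabulary of source code. As a rough illustration, the sketch below assumes a pointer-generator style mixture: a gate `p_gen` blends the decoder's fixed-vocabulary distribution with attention weights over the buggy source tokens, so identifiers outside the vocabulary can still be emitted. The function name, the gate value, and the toy numbers are all hypothetical and are not taken from the SequenceR code.

```python
def mix_copy_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Blend a generation distribution with a copy distribution.

    p_gen         -- probability of generating from the fixed vocabulary
    vocab_probs   -- dict token -> probability over the fixed vocabulary
    attention     -- attention weights over the source tokens (sums to 1)
    source_tokens -- tokens of the input line, aligned with `attention`
    """
    # Generation part: scale every vocabulary probability by p_gen.
    final = {tok: p_gen * p for tok, p in vocab_probs.items()}
    # Copy part: add the remaining mass to the tokens present in the source.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy example: "myCustomVar" is not in the vocabulary but can be copied.
vocab_probs = {"return": 0.6, "if": 0.3, "<unk>": 0.1}
source_tokens = ["return", "myCustomVar", ";"]
attention = [0.2, 0.7, 0.1]
dist = mix_copy_distribution(0.5, vocab_probs, attention, source_tokens)
```

In this toy setting the out-of-vocabulary identifier receives probability mass purely through the copy term, which is the effect the abstract attributes to the copy mechanism.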

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Title Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Authors Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Abstract Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior. As an image can be described in infinite ways depending on the goal and the context at hand, a higher degree of controllability is needed to apply captioning algorithms in complex scenarios. In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. Given a control signal in the form of a sequence or set of image regions, we generate the corresponding caption through a recurrent architecture which predicts textual chunks explicitly grounded on regions, following the constraints of the given control. Experiments are conducted on Flickr30k Entities and on COCO Entities, an extended version of COCO in which we add grounding annotations collected in a semi-automatic manner. Results demonstrate that our method achieves state-of-the-art performance on controllable image captioning, in terms of caption quality and diversity. Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell.
Tasks Image Captioning
Published 2018-11-26
URL https://arxiv.org/abs/1811.10652v3
PDF https://arxiv.org/pdf/1811.10652v3.pdf
PWC https://paperswithcode.com/paper/show-control-and-tell-a-framework-for
Repo https://github.com/aimagelab/show-control-and-tell
Framework pytorch

Standing Wave Decomposition Gaussian Process

Title Standing Wave Decomposition Gaussian Process
Authors Chi-Ken Lu, Scott Cheng-Hsin Yang, Patrick Shafto
Abstract We propose a Standing Wave Decomposition (SWD) approximation to Gaussian Process regression (GP). GP involves a costly matrix inversion operation, which limits applicability to large data analysis. For an input space that can be approximated by a grid and when correlations among data are short-ranged, the kernel matrix inversion can be replaced by analytic diagonalization using the SWD. We show that this approach applies to uni- and multi-dimensional input data, extends to include longer-range correlations, and the grid can be in a latent space and used as inducing points. Through simulations, we show that our approximate method applied to the squared exponential kernel outperforms existing methods in predictive accuracy per unit time in the regime where data are plentiful. Our SWD-GP is recommended for regression analyses where there is a relatively large amount of data and/or there are constraints on computation time.
Tasks
Published 2018-03-09
URL http://arxiv.org/abs/1803.03666v4
PDF http://arxiv.org/pdf/1803.03666v4.pdf
PWC https://paperswithcode.com/paper/standing-wave-decomposition-gaussian-process
Repo https://github.com/CoDaS-Lab/LG-SWD-GP
Framework none
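
To make "analytic diagonalization" concrete, the sketch below checks a textbook fact that fits the setting described in the abstract: on a regular one-dimensional grid with only nearest-neighbour correlations, the kernel matrix is tridiagonal Toeplitz, and its eigenvectors are sine standing waves with closed-form eigenvalues. Whether the paper uses exactly this matrix is an assumption; the values of `a` (diagonal) and `b` (off-diagonal) are hypothetical.

```python
import math


def tridiagonal_toeplitz(n, a, b):
    """n x n matrix with a on the diagonal and b on the two neighbouring diagonals."""
    return [
        [a if i == j else b if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def standing_wave(n, k):
    """k-th sine mode (k = 1..n) on an n-point grid, unnormalized."""
    return [math.sin(math.pi * k * (j + 1) / (n + 1)) for j in range(n)]


def eigenvalue(n, k, a, b):
    """Closed-form eigenvalue paired with the k-th standing wave."""
    return a + 2.0 * b * math.cos(math.pi * k / (n + 1))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


n, a, b = 6, 1.0, 0.3
K = tridiagonal_toeplitz(n, a, b)

# Largest entry of K v - lambda v over all n modes; it should be near zero.
residual = max(
    abs(kv - eigenvalue(n, k, a, b) * v)
    for k in range(1, n + 1)
    for kv, v in zip(matvec(K, standing_wave(n, k)), standing_wave(n, k))
)
```

Because eigenvectors and eigenvalues are known in closed form, applying the inverse of such a matrix needs no numerical factorization, which is the kind of saving the abstract points to.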

Fast Graph-Cut Based Optimization for Practical Dense Deformable Registration of Volume Images

Title Fast Graph-Cut Based Optimization for Practical Dense Deformable Registration of Volume Images
Authors Simon Ekström, Filip Malmberg, Håkan Ahlström, Joel Kullberg, Robin Strand
Abstract Objective: Deformable image registration is a fundamental problem in medical image analysis, with applications such as longitudinal studies, population modeling, and atlas based image segmentation. Registration is often phrased as an optimization problem, i.e., finding a deformation field that is optimal according to a given objective function. Discrete, combinatorial, optimization techniques have successfully been employed to solve the resulting optimization problem. Specifically, optimization based on $\alpha$-expansion with minimal graph cuts has been proposed as a powerful tool for image registration. The high computational cost of the graph-cut based optimization approach, however, limits the utility of this approach for registration of large volume images. Methods: Here, we propose to accelerate graph-cut based deformable registration by dividing the image into overlapping sub-regions and restricting the $\alpha$-expansion moves to a single sub-region at a time. Results: We demonstrate empirically that this approach can achieve a large reduction in computation time – from days to minutes – with only a small penalty in terms of solution quality. Conclusion: The reduction in computation time provided by the proposed method makes graph cut based deformable registration viable for large volume images. Significance: Graph cut based image registration has previously been shown to produce excellent results, but the high computational cost has hindered the adoption of the method for registration of large medical volume images. Our proposed method lifts this restriction, requiring only a small fraction of the computational cost to produce results of comparable quality.
Tasks Combinatorial Optimization, Image Registration, Semantic Segmentation
Published 2018-10-19
URL http://arxiv.org/abs/1810.08427v1
PDF http://arxiv.org/pdf/1810.08427v1.pdf
PWC https://paperswithcode.com/paper/fast-graph-cut-based-optimization-for
Repo https://github.com/simeks/deform
Framework none
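
The Methods sentence describes splitting the volume into overlapping sub-regions and restricting each α-expansion move to one sub-region. The sketch below covers only the partitioning step, under assumed parameters: a cubic block size and a fixed overlap in voxels, both hypothetical. The graph-cut solve itself is not shown, and the layout and visiting order used in the paper may differ.

```python
from itertools import product


def overlapping_blocks(length, block, overlap):
    """Index ranges (start, stop) of size `block` covering [0, length).

    Consecutive ranges share `overlap` voxels; the last range is clipped
    to the end of the axis.
    """
    if not 0 <= overlap < block:
        raise ValueError("overlap must satisfy 0 <= overlap < block")
    step = block - overlap
    ranges = []
    start = 0
    while True:
        stop = min(start + block, length)
        ranges.append((start, stop))
        if stop == length:
            break
        start += step
    return ranges


def subregions(shape, block, overlap):
    """All sub-regions of a volume as tuples of per-axis (start, stop) ranges."""
    per_axis = [overlapping_blocks(n, block, overlap) for n in shape]
    return list(product(*per_axis))


# A 10 x 10 x 4 volume, blocks of 4 voxels per axis, 2 voxels of overlap.
regions = subregions((10, 10, 4), block=4, overlap=2)
```

An optimizer built on this would loop over `regions` and solve one small graph-cut problem per block instead of one problem on the whole volume.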

Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

Title Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition
Authors Okan Köpüklü, Neslihan Köse, Gerhard Rigoll
Abstract Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets - Jester, ChaLearn LAP IsoGD and NVIDIA Dynamic Hand Gesture Datasets - which require capturing long-term temporal relations of hand movements. Our approach obtains very competitive performance on the Jester and ChaLearn benchmarks with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark.
Tasks Action Classification, Gesture Recognition, Hand Gesture Recognition
Published 2018-04-19
URL http://arxiv.org/abs/1804.07187v2
PDF http://arxiv.org/pdf/1804.07187v2.pdf
PWC https://paperswithcode.com/paper/motion-fused-frames-data-level-fusion
Repo https://github.com/okankop/MFF-pytorch
Framework pytorch
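
A minimal sketch of what data level fusion can look like, assuming that motion is represented by optical-flow fields and fused by appending them to the RGB frame as extra channels. The abstract does not spell out the representation, so the channel layout, the function name, and the toy frame are hypothetical; nested lists stand in for image tensors.

```python
def fuse_motion(rgb, flows):
    """Append optical-flow channels to every pixel of an RGB frame.

    rgb   -- H x W nested list of 3-channel pixels
    flows -- list of H x W nested lists of 2-channel (dx, dy) flow vectors
    Returns an H x W nested list with 3 + 2 * len(flows) channels per pixel.
    """
    fused = []
    for y, row in enumerate(rgb):
        fused_row = []
        for x, pixel in enumerate(row):
            channels = list(pixel)
            for flow in flows:
                channels.extend(flow[y][x])
            fused_row.append(channels)
        fused.append(fused_row)
    return fused


# Toy 2 x 2 frame fused with three copies of one flow field.
rgb = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
flow = [[(1, -1), (0, 2)], [(3, 0), (-2, -2)]]
fused = fuse_motion(rgb, [flow, flow, flow])
```

Since only the channel count of the input changes, a network would need a wider first layer and nothing else, which matches the abstract's remark about very little modification.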

Learning from graphs with structural variation

Title Learning from graphs with structural variation
Authors Rune Kok Nielsen, Andreas Nugaard Holm, Aasa Feragen
Abstract We study the effect of structural variation in graph data on the predictive performance of graph kernels. To this end, we introduce a novel, noise-robust adaptation of the GraphHopper kernel and validate it on benchmark data, obtaining modestly improved predictive performance on a range of datasets. Next, we investigate the performance of the state-of-the-art Weisfeiler-Lehman graph kernel under increasing synthetic structural errors and find that the effect of introducing errors depends strongly on the dataset.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1806.11377v1
PDF http://arxiv.org/pdf/1806.11377v1.pdf
PWC https://paperswithcode.com/paper/learning-from-graphs-with-structural
Repo https://github.com/RuneKokNielsen/graphhopper
Framework none

AST-Based Deep Learning for Detecting Malicious PowerShell

Title AST-Based Deep Learning for Detecting Malicious PowerShell
Authors Gili Rusak, Abdullah Al-Dujaili, Una-May O’Reilly
Abstract With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional neural nets to detect obfuscated malicious commands at a character level. While these representations may express salient PowerShell properties, our hypothesis is that tools from static program analysis will be more effective. We propose a hybrid approach combining traditional program analysis (in the form of abstract syntax trees) and deep learning. This poster presents preliminary results of a fundamental step in our approach: learning embeddings for nodes of PowerShell ASTs. We classify malicious scripts by family type and explore embedded program vector representations.
Tasks
Published 2018-10-03
URL http://arxiv.org/abs/1810.09230v1
PDF http://arxiv.org/pdf/1810.09230v1.pdf
PWC https://paperswithcode.com/paper/181009230
Repo https://github.com/zekiesenalp/zararli_powershell_analizi
Framework none

Foreign English Accent Adjustment by Learning Phonetic Patterns

Title Foreign English Accent Adjustment by Learning Phonetic Patterns
Authors Fedor Kitashov, Elizaveta Svitanko, Debojyoti Dutta
Abstract State-of-the-art automatic speech recognition (ASR) systems struggle with the lack of data for rare accents. For sufficiently large datasets, neural engines tend to outshine statistical models in most natural language processing problems. However, a speech accent remains a challenge for both approaches. Phonologists manually create general rules describing a speaker’s accent, but their results remain underutilized. In this paper, we propose a model that automatically retrieves phonological generalizations from a small dataset. This method leverages the difference in pronunciation between a particular dialect and General American English (GAE) and creates new accented samples of words. The proposed model is able to learn all generalizations that previously were manually obtained by phonologists. We use this statistical method to generate a million phonological variations of words from the CMU Pronouncing Dictionary and train a sequence-to-sequence RNN to recognize accented words with 59% accuracy.
Tasks Speech Recognition
Published 2018-07-09
URL http://arxiv.org/abs/1807.03625v1
PDF http://arxiv.org/pdf/1807.03625v1.pdf
PWC https://paperswithcode.com/paper/foreign-english-accent-adjustment-by-learning
Repo https://github.com/CiscoAI/accent_transfer
Framework tf

Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations

Title Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations
Authors Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee
Abstract Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing. We present a novel RvNN architecture that can provide dynamic compositionality by considering comprehensive syntactic information derived from both the structure and linguistic tags. Specifically, we introduce a structure-aware tag representation constructed by a separate tag-level tree-LSTM. With this, we can control the composition function of the existing word-level tree-LSTM by augmenting the representation as a supplementary input to the gate functions of the tree-LSTM. In extensive experiments, we show that models built upon the proposed architecture obtain superior or competitive performance on several sentence-level tasks such as sentiment analysis and natural language inference when compared against previous tree-structured models and other sophisticated neural models.
Tasks Natural Language Inference, Sentiment Analysis
Published 2018-09-07
URL http://arxiv.org/abs/1809.02286v2
PDF http://arxiv.org/pdf/1809.02286v2.pdf
PWC https://paperswithcode.com/paper/dynamic-compositionality-in-recursive-neural
Repo https://github.com/galsang/SATA-Tree-LSTM
Framework pytorch

MRI Reconstruction via Cascaded Channel-wise Attention Network

Title MRI Reconstruction via Cascaded Channel-wise Attention Network
Authors Qiaoying Huang, Dong Yang, Pengxiang Wu, Hui Qu, Jingru Yi, Dimitris Metaxas
Abstract We consider an MRI reconstruction problem with input of k-space data at a very low undersampled rate. This can benefit patients in practice by reducing MRI scan time, but it is also challenging since the quality of the reconstruction may be compromised. Currently, deep learning based methods dominate MRI reconstruction over traditional approaches such as Compressed Sensing, but they rarely show satisfactory performance in the case of low undersampled k-space data. One explanation is that these methods treat channel-wise features equally, which results in degraded representation ability of the neural network. To solve this problem, we propose a new model called MRI Cascaded Channel-wise Attention Network (MICCAN), highlighted by three components: (i) a variant of U-net with Channel-wise Attention (UCA) module, (ii) a long skip connection and (iii) a combined loss. Our model is able to attend to salient information by filtering irrelevant features and also concentrate on high-frequency information by enforcing that low-frequency information is bypassed to the final output. We conduct both quantitative evaluation and qualitative analysis of our method on a cardiac dataset. The experiment shows that our method achieves very promising results in terms of three common metrics on the MRI reconstruction with low undersampled k-space data.
Tasks
Published 2018-10-18
URL http://arxiv.org/abs/1810.08229v2
PDF http://arxiv.org/pdf/1810.08229v2.pdf
PWC https://paperswithcode.com/paper/mri-reconstruction-via-cascaded-channel-wise
Repo https://github.com/charwing10/isbi2019miccan
Framework pytorch

DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention

Title DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention
Authors Aniketh Janardhan Reddy, Gil Rocha, Diego Esteves
Abstract In this paper, we describe DeFactoNLP, the system we designed for the FEVER 2018 Shared Task. The aim of this task was to conceive a system that can not only automatically assess the veracity of a claim but also retrieve evidence supporting this assessment from Wikipedia. In our approach, the Wikipedia documents whose Term Frequency-Inverse Document Frequency (TFIDF) vectors are most similar to the vector of the claim and those documents whose names are similar to those of the named entities (NEs) mentioned in the claim are identified as the documents which might contain evidence. The sentences in these documents are then supplied to a textual entailment recognition module. This module calculates the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information to assess the veracity of the claim. Various features computed using these probabilities are finally used by a Random Forest classifier to determine the overall truthfulness of the claim. The sentences which support this classification are returned as evidence. Our approach achieved a 0.4277 evidence F1-score, a 0.5136 label accuracy and a 0.3833 FEVER score.
Tasks Natural Language Inference
Published 2018-09-03
URL http://arxiv.org/abs/1809.00509v1
PDF http://arxiv.org/pdf/1809.00509v1.pdf
PWC https://paperswithcode.com/paper/defactonlp-fact-verification-using-entity
Repo https://github.com/DeFacto/DeFactoNLP
Framework tf
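
The document-retrieval step can be sketched as ranking candidate documents by the cosine similarity between their TFIDF vectors and the vector of the claim. The version below is a simplified stand-in, not the DeFactoNLP implementation: whitespace tokenization, the smoothed IDF formula, and the toy claim and documents are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TFIDF vectors (dict token -> weight) for a list of texts."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


claim = "paris is the capital of france"
documents = [
    "the amazon river flows through brazil",
    "paris is the capital and largest city of france",
]
vectors = tfidf_vectors([claim] + documents)
scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
best = max(range(len(documents)), key=scores.__getitem__)
```

In the pipeline the abstract describes, the top-ranked documents would then be merged with the documents matched by named-entity titles before sentences are passed to the entailment module.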

Network Recasting: A Universal Method for Network Architecture Transformation

Title Network Recasting: A Universal Method for Network Architecture Transformation
Authors Joonsang Yu, Sungbum Kang, Kiyoung Choi
Abstract This paper proposes network recasting as a general method for network architecture transformation. The primary goal of this method is to accelerate the inference process through the transformation, but there can be many other practical applications. The method is based on block-wise recasting; it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such a block-by-block recasting in a sequential manner transforms the network architecture while preserving the accuracy. This method can be used to transform an arbitrary teacher network type to an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of block. The network recasting can generate a network with fewer parameters and/or activations, which reduces the inference time significantly. Naturally, it can be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU.
Tasks
Published 2018-09-14
URL https://arxiv.org/abs/1809.05262v2
PDF https://arxiv.org/pdf/1809.05262v2.pdf
PWC https://paperswithcode.com/paper/network-recasting-a-universal-method-for
Repo https://github.com/joonsang-yu/Network-Recasting
Framework pytorch
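
The core training signal, a target block fitted so that its output approximates that of the source block, can be illustrated with a deliberately tiny example. Everything below is a toy assumption: the "source block" is a fixed scalar function, the "target block" is a one-parameter-pair linear map, and plain gradient descent on the mean squared error stands in for whatever optimizer and loss the paper actually uses.

```python
def source_block(x):
    """Stand-in for a block of the pre-trained teacher network (hypothetical)."""
    return 3.0 * x - 1.0


def recast_block(inputs, steps=500, lr=0.1):
    """Fit a target block y = w * x + b to mimic the source block's outputs."""
    w, b = 0.0, 0.0
    targets = [source_block(x) for x in inputs]
    n = len(inputs)
    for _ in range(steps):
        errors = [w * x + b - t for x, t in zip(inputs, targets)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
w, b = recast_block(inputs)
mse = sum((w * x + b - source_block(x)) ** 2 for x in inputs) / len(inputs)
```

In the sequential scheme the abstract describes, this fit would be repeated block by block, with each new target block replacing its source block in the network.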

Detection of Structural Change in Geographic Regions of Interest by Self Organized Mapping: Las Vegas City and Lake Mead across the Years

Title Detection of Structural Change in Geographic Regions of Interest by Self Organized Mapping: Las Vegas City and Lake Mead across the Years
Authors John M. Wandeto, Henry O. Nyongesa, Birgitta Dresp-Langley
Abstract Time-series of satellite images may reveal important data about changes in environmental conditions and natural or urban landscape structures that are of potential interest to citizens, historians, or policymakers. We applied a fast method of image analysis using Self Organized Maps (SOM) and, more specifically, the quantization error (QE), for the visualization of critical changes in satellite images of Las Vegas, generated across the years 1984-2008, a period of major restructuring of the urban landscape. As shown in our previous work, the QE from the SOM output is a reliable measure of variability in local image contents. In the present work, we use statistical trend analysis to show how the QE from SOM run on specific geographic regions of interest extracted from satellite images can be exploited to detect both the magnitude and the direction of structural change across time at a glance. Significantly correlated demographic data for the same reference time period are highlighted. The approach is fast and reliable, and can be implemented for the rapid detection of potentially critical changes in time series of large bodies of image data.
Tasks Quantization, Time Series
Published 2018-03-29
URL http://arxiv.org/abs/1803.11125v1
PDF http://arxiv.org/pdf/1803.11125v1.pdf
PWC https://paperswithcode.com/paper/detection-of-structural-change-in-geographic
Repo https://github.com/JustGlowing/minisom
Framework none
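
For readers unfamiliar with the quantization error, the sketch below uses its usual definition: the mean Euclidean distance between each input vector and its best-matching unit in the SOM codebook. SOM training is omitted; the codebook is assumed to be already trained, and the two-dimensional toy vectors are hypothetical stand-ins for pixel features.

```python
import math


def quantization_error(samples, codebook):
    """Mean distance from each sample to its best-matching unit (BMU)."""
    total = 0.0
    for sample in samples:
        # The BMU is the codebook vector closest to the sample.
        total += min(math.dist(sample, unit) for unit in codebook)
    return total / len(samples)


codebook = [(0.0, 0.0), (10.0, 10.0)]
samples = [(0.0, 1.0), (10.0, 9.0), (3.0, 4.0)]
qe = quantization_error(samples, codebook)
```

Computing this value for the same region of interest in each year's image, against a map trained once, would give the kind of QE time series the abstract analyses for trends.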

Exploit the Connectivity: Multi-Object Tracking with TrackletNet

Title Exploit the Connectivity: Multi-Object Tracking with TrackletNet
Authors Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang
Abstract Multi-object tracking (MOT) is an important and practical task related to both surveillance systems and moving camera applications, such as autonomous driving and robotic vision. However, due to unreliable detection, occlusion and fast camera motion, tracked targets can be easily lost, which makes MOT very challenging. Most recent works treat tracking as a re-identification (Re-ID) task, but how to combine appearance and temporal features is still not well addressed. In this paper, we propose an innovative and effective tracking method called TrackletNet Tracker (TNT) that combines temporal and appearance information together as a unified framework. First, we define a graph model which treats each tracklet as a vertex. The tracklets are generated by appearance similarity with CNN features and intersection-over-union (IOU) with epipolar constraints to compensate for camera movement between adjacent frames. Then, for every pair of two tracklets, the similarity is measured by our designed multi-scale TrackletNet. Afterwards, the tracklets are clustered into groups which represent individual object IDs. Our proposed TNT has the ability to handle most of the challenges in MOT, and achieves promising results on MOT16 and MOT17 benchmark datasets compared with other state-of-the-art methods.
Tasks Autonomous Driving, Multi-Object Tracking, Object Tracking
Published 2018-11-18
URL http://arxiv.org/abs/1811.07258v1
PDF http://arxiv.org/pdf/1811.07258v1.pdf
PWC https://paperswithcode.com/paper/exploit-the-connectivity-multi-object
Repo https://github.com/zhengthomastang/2018AICity_TeamUW
Framework none
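
Intersection-over-union, one of the two cues the abstract names for linking detections into tracklets, is shown below for axis-aligned boxes. The `(x1, y1, x2, y2)` corner format is an assumption, and the epipolar compensation and CNN appearance features mentioned in the abstract are not modelled here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes overlapping in a 1 x 1 square: 1 / (4 + 4 - 1).
overlap_score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A tracker would typically link a detection in one frame to the detection in the next frame with the highest IOU above some threshold; the threshold value is a design choice not given in the abstract.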