Paper Group AWR 35
Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. Standing Wave Decomposition Gaussian Process. Fast Graph-Cut Based Optimization for Practical Dense …
Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person
Title | Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person |
Authors | Fania Mokhayeri, Eric Granger, Guillaume-Alexandre Bilodeau |
Abstract | The performance of still-to-video face recognition (FR) systems can decline significantly because faces captured in an unconstrained operational domain (OD) over multiple video cameras have a different underlying data distribution from faces captured under controlled conditions in the enrollment domain (ED) with a still camera. This is particularly true when individuals are enrolled in the system using a single reference still. To improve the robustness of these systems, it is possible to augment the reference set by generating synthetic faces based on the original still. However, without knowledge of the OD, many synthetic images must be generated to account for all possible capture conditions. FR systems may, therefore, require complex implementations and yield lower accuracy when training on many less relevant images. This paper introduces an algorithm for domain-specific face synthesis (DSFS) that exploits the representative intra-class variation information available from the OD. Prior to operation, a compact set of faces from unknown persons appearing in the OD is selected through clustering in the capture-condition space. The domain-specific variations of these face images are projected onto the reference stills by integrating an image-based face relighting technique inside the 3D reconstruction framework. A compact set of synthetic faces is generated that resemble individuals of interest under the capture conditions relevant to the OD. In a particular implementation based on sparse representation classification, the synthetic faces generated with the DSFS are employed to form a cross-domain dictionary that accounts for structured sparsity. Experimental results reveal that augmenting the reference gallery set of FR systems using the proposed DSFS approach can provide a higher level of accuracy compared to state-of-the-art approaches, with only a moderate increase in computational complexity. |
Tasks | 3D Reconstruction, Face Generation, Face Recognition |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.01974v2 |
PDF | http://arxiv.org/pdf/1801.01974v2.pdf |
PWC | https://paperswithcode.com/paper/domain-specific-face-synthesis-for-video-face |
Repo | https://github.com/faniamokhayeri/DSFS |
Framework | none |
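The selection step in the DSFS abstract (clustering OD faces in a capture-condition space and keeping a compact representative set) can be illustrated with plain k-means, keeping the face nearest each centroid. This is a minimal sketch under stated assumptions, not the authors' code: the low-dimensional condition vectors (e.g. hypothetical pose and illumination scores), the use of k-means with farthest-first seeding, and the helper name `select_representatives` are all illustrative choices.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first seeding; returns k centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all existing seeds.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def select_representatives(condition_vectors, k):
    """Hypothetical helper: indices of the faces closest to each cluster centroid."""
    centroids = kmeans(condition_vectors, k)
    chosen = {
        min(range(len(condition_vectors)), key=lambda i: math.dist(condition_vectors[i], c))
        for c in centroids
    }
    return sorted(chosen)
```

Picking an actual face per cluster (rather than the centroid itself) keeps the selected set made of real OD images, which is what a relighting/synthesis stage would need as input.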
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
Title | SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair |
Authors | Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, Martin Monperrus |
Abstract | This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well as on the Defects4J benchmark used in program repair research. SequenceR is able to perfectly predict the fixed line for 950/4711 testing samples, and find correct patches for 14 bugs in Defects4J. It captures a wide range of repair operators without any domain-specific top-down design. |
Tasks | |
Published | 2018-12-24 |
URL | https://arxiv.org/abs/1901.01808v3 |
PDF | https://arxiv.org/pdf/1901.01808v3.pdf |
PWC | https://paperswithcode.com/paper/sequencer-sequence-to-sequence-learning-for |
Repo | https://github.com/kth/SequenceR |
Framework | none |
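The SequenceR abstract credits a copy mechanism with handling the unlimited vocabulary of source code. One common formulation, assumed here (the pointer-generator style mixture; SequenceR's exact formulation may differ), is P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ over source positions i with token w of a_i. The sketch below shows how an out-of-vocabulary identifier gets probability only through copying; all inputs are toy values and the function name is hypothetical.

```python
from collections import defaultdict

def copy_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix generation and copy probabilities into one output distribution.

    p_gen         -- probability of generating from the fixed vocabulary (0..1)
    vocab_probs   -- dict token -> probability over the fixed vocabulary
    attention     -- attention weights over source positions (summing to 1)
    source_tokens -- the buggy input tokens, aligned with `attention`
    """
    final = defaultdict(float)
    for token, p in vocab_probs.items():
        final[token] += p_gen * p
    # Copying lets rare identifiers (e.g. a project-specific variable name)
    # receive probability mass directly from the input sequence.
    for token, a in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * a
    return dict(final)
```

For example, with `p_gen = 0.4`, vocabulary `{"if": 0.5, "return": 0.3, "<unk>": 0.2}` and source `["return", "myCounter", ";"]` attended with `[0.2, 0.7, 0.1]`, the identifier `myCounter` receives 0.6 × 0.7 = 0.42 even though it is not in the vocabulary.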
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Title | Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions |
Authors | Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara |
Abstract | Current captioning approaches can describe images using black-box architectures whose behavior is difficult to control and explain from the outside. As an image can be described in infinite ways depending on the goal and the context at hand, a higher degree of controllability is needed to apply captioning algorithms in complex scenarios. In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. Given a control signal in the form of a sequence or set of image regions, we generate the corresponding caption through a recurrent architecture which predicts textual chunks explicitly grounded on regions, following the constraints of the given control. Experiments are conducted on Flickr30k Entities and on COCO Entities, an extended version of COCO in which we add grounding annotations collected in a semi-automatic manner. Results demonstrate that our method achieves state-of-the-art performance on controllable image captioning, in terms of caption quality and diversity. Code and annotations are publicly available at: https://github.com/aimagelab/show-control-and-tell. |
Tasks | Image Captioning |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10652v3 |
PDF | https://arxiv.org/pdf/1811.10652v3.pdf |
PWC | https://paperswithcode.com/paper/show-control-and-tell-a-framework-for |
Repo | https://github.com/aimagelab/show-control-and-tell |
Framework | pytorch |
Standing Wave Decomposition Gaussian Process
Title | Standing Wave Decomposition Gaussian Process |
Authors | Chi-Ken Lu, Scott Cheng-Hsin Yang, Patrick Shafto |
Abstract | We propose a Standing Wave Decomposition (SWD) approximation to Gaussian Process regression (GP). GP involves a costly matrix inversion operation, which limits applicability to large data analysis. For an input space that can be approximated by a grid and when correlations among data are short-ranged, the kernel matrix inversion can be replaced by analytic diagonalization using the SWD. We show that this approach applies to uni- and multi-dimensional input data, extends to include longer-range correlations, and the grid can be in a latent space and used as inducing points. Through simulations, we show that our approximate method applied to the squared exponential kernel outperforms existing methods in predictive accuracy per unit time in the regime where data are plentiful. Our SWD-GP is recommended for regression analyses where there is a relatively large amount of data and/or there are constraints on computation time. |
Tasks | |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03666v4 |
PDF | http://arxiv.org/pdf/1803.03666v4.pdf |
PWC | https://paperswithcode.com/paper/standing-wave-decomposition-gaussian-process |
Repo | https://github.com/CoDaS-Lab/LG-SWD-GP |
Framework | none |
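To make the SWD claim concrete ("the kernel matrix inversion can be replaced by analytic diagonalization"), consider the simplest case, assumed here: a 1-D regular grid where only nearest neighbours are correlated. The noisy kernel matrix is then tridiagonal Toeplitz, with diagonal `a` (signal plus noise variance) and off-diagonal `b` (neighbour covariance). Such a matrix is diagonalized by sine "standing waves" v_k(j) = sqrt(2/(N+1)) sin(jkπ/(N+1)) with eigenvalues λ_k = a + 2b cos(kπ/(N+1)), so the product K⁻¹y used in the GP predictive mean can be formed without a numerical inversion. This sketches that textbook identity only, not the paper's full method (which covers multi-dimensional inputs, longer-range correlations and latent grids); `swd_solve` is a hypothetical name, and positive definiteness requires a > 2|b|.

```python
import math

def swd_solve(a, b, y):
    """Solve K x = y for an N x N tridiagonal Toeplitz K (diagonal a, off-diagonal b)
    using its analytic sine ("standing wave") eigenbasis; O(N^2), no matrix inversion."""
    n = len(y)
    scale = math.sqrt(2.0 / (n + 1))
    x = [0.0] * n
    for k in range(1, n + 1):
        # Orthonormal eigenvector: v_k[j] = scale * sin(j k pi / (N + 1)), j = 1..N
        v = [scale * math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]
        lam = a + 2.0 * b * math.cos(k * math.pi / (n + 1))
        # Project y onto v_k, divide by the eigenvalue, and accumulate.
        coeff = sum(vi * yi for vi, yi in zip(v, y)) / lam
        for j in range(n):
            x[j] += coeff * v[j]
    return x
```

The eigenvectors vanish at the virtual grid points 0 and N+1, which is exactly what makes the boundary rows of K consistent with the same eigenvalue formula; the projections are discrete sine transforms, so a fast-transform implementation could reduce the cost further.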
Fast Graph-Cut Based Optimization for Practical Dense Deformable Registration of Volume Images
Title | Fast Graph-Cut Based Optimization for Practical Dense Deformable Registration of Volume Images |
Authors | Simon Ekström, Filip Malmberg, Håkan Ahlström, Joel Kullberg, Robin Strand |
Abstract | Objective: Deformable image registration is a fundamental problem in medical image analysis, with applications such as longitudinal studies, population modeling, and atlas-based image segmentation. Registration is often phrased as an optimization problem, i.e., finding a deformation field that is optimal according to a given objective function. Discrete combinatorial optimization techniques have successfully been employed to solve the resulting optimization problem. Specifically, optimization based on $\alpha$-expansion with minimal graph cuts has been proposed as a powerful tool for image registration. The high computational cost of the graph-cut-based optimization approach, however, limits the utility of this approach for registration of large volume images. Methods: Here, we propose to accelerate graph-cut-based deformable registration by dividing the image into overlapping sub-regions and restricting the $\alpha$-expansion moves to a single sub-region at a time. Results: We demonstrate empirically that this approach can achieve a large reduction in computation time (from days to minutes) with only a small penalty in terms of solution quality. Conclusion: The reduction in computation time provided by the proposed method makes graph-cut-based deformable registration viable for large volume images. Significance: Graph-cut-based image registration has previously been shown to produce excellent results, but the high computational cost has hindered the adoption of the method for registration of large medical volume images. Our proposed method lifts this restriction, requiring only a small fraction of the computational cost to produce results of comparable quality. |
Tasks | Combinatorial Optimization, Image Registration, Semantic Segmentation |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08427v1 |
PDF | http://arxiv.org/pdf/1810.08427v1.pdf |
PWC | https://paperswithcode.com/paper/fast-graph-cut-based-optimization-for |
Repo | https://github.com/simeks/deform |
Framework | none |
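The key idea in the Methods section (overlapping sub-regions, with α-expansion restricted to one sub-region at a time) can be sketched as a block scheduler. This is an illustration under assumptions, not the `deform` implementation: the block size, overlap and pass limit are arbitrary, and `expand_block` is a hypothetical callback standing in for a real graph-cut α-expansion over one block, assumed to return `True` when it lowered the energy.

```python
from itertools import product

def overlapping_blocks(shape, block_size, overlap):
    """Yield (start, stop) index tuples covering an n-D volume with overlapping blocks.

    shape      -- volume size along each axis, e.g. (z, y, x)
    block_size -- block extent along each axis before padding
    overlap    -- voxels each block extends into its neighbours on every side
    """
    ranges_per_axis = []
    for size, bs in zip(shape, block_size):
        axis_ranges = []
        for start in range(0, size, bs):
            lo = max(0, start - overlap)
            hi = min(size, start + bs + overlap)
            axis_ranges.append((lo, hi))
        ranges_per_axis.append(axis_ranges)
    for combo in product(*ranges_per_axis):
        yield tuple(lo for lo, _ in combo), tuple(hi for _, hi in combo)

def optimize_by_subregions(shape, block_size, overlap, expand_block, labels, max_passes=10):
    """Sweep all blocks, running one alpha-expansion per label inside each block,
    until a full pass produces no improvement (or max_passes is reached)."""
    for _ in range(max_passes):
        changed = False
        for start, stop in overlapping_blocks(shape, block_size, overlap):
            for alpha in labels:
                changed |= expand_block(start, stop, alpha)
        if not changed:
            break
```

Each graph cut now spans only a block instead of the full volume, which is where the time savings would come from; the overlap lets a move in one block see the labels just across its border.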
Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition
Title | Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition |
Authors | Okan Köpüklü, Neslihan Köse, Gerhard Rigoll |
Abstract | Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification to the network. We evaluate MFFs on hand gesture recognition tasks using three video datasets (Jester, ChaLearn LAP IsoGD and NVIDIA Dynamic Hand Gesture Datasets), which require capturing long-term temporal relations of hand movements. Our approach obtains competitive performance on the Jester and ChaLearn benchmarks with classification accuracies of 96.28% and 57.4%, respectively, while achieving state-of-the-art performance with 84.7% accuracy on the NVIDIA benchmark. |
Tasks | Action Classification, Gesture Recognition, Hand Gesture Recognition |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07187v2 |
PDF | http://arxiv.org/pdf/1804.07187v2.pdf |
PWC | https://paperswithcode.com/paper/motion-fused-frames-data-level-fusion |
Repo | https://github.com/okankop/MFF-pytorch |
Framework | pytorch |
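As a rough picture of "data level fusion", the sketch below stacks precomputed optical-flow channels (x and y components for several preceding frames) onto an RGB frame to form a single multi-channel input. It is a minimal sketch under assumptions, not the MFF-pytorch code: frames are plain nested lists, the flow is assumed to come from some external estimator, and the channel order (RGB first, then flow) and the function name are illustrative.

```python
def motion_fused_frame(rgb, flows):
    """Build a data-level fused input by stacking flow channels onto an RGB frame.

    rgb   -- list of 3 channels (R, G, B), each a 2-D list of pixel values
    flows -- list of (flow_x, flow_y) pairs for preceding frames, each channel 2-D
    Returns a list of 3 + 2 * len(flows) channels sharing one spatial size.
    """
    height, width = len(rgb[0]), len(rgb[0][0])
    channels = list(rgb)
    for flow_x, flow_y in flows:
        for ch in (flow_x, flow_y):
            if len(ch) != height or len(ch[0]) != width:
                raise ValueError("flow and RGB frames must share spatial size")
            channels.append(ch)
    return channels
```

Under this layout, the only architectural change would presumably be the first layer's input-channel count (3 to 3 + 2k for k flow frames), which is consistent with the abstract's claim that MFFs need very little modification to the network.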
Learning from graphs with structural variation
Title | Learning from graphs with structural variation |
Authors | Rune Kok Nielsen, Andreas Nugaard Holm, Aasa Feragen |
Abstract | We study the effect of structural variation in graph data on the predictive performance of graph kernels. To this end, we introduce a novel, noise-robust adaptation of the GraphHopper kernel and validate it on benchmark data, obtaining modestly improved predictive performance on a range of datasets. Next, we investigate the performance of the state-of-the-art Weisfeiler-Lehman graph kernel under increasing synthetic structural errors and find that the effect of introducing errors depends strongly on the dataset. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11377v1 |
PDF | http://arxiv.org/pdf/1806.11377v1.pdf |
PWC | https://paperswithcode.com/paper/learning-from-graphs-with-structural |
Repo | https://github.com/RuneKokNielsen/graphhopper |
Framework | none |
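The abstract probes how the Weisfeiler-Lehman (WL) graph kernel reacts to structural errors. The sketch below shows a plain WL subtree kernel, to make clear why edge errors matter: every refinement round relabels a node with its own label plus the sorted multiset of its neighbours' labels, so a wrong edge perturbs every label within a few hops. It does not show the paper's noise-robust GraphHopper adaptation. Using the concatenated label string directly, instead of compressing it to a new integer, is a simplification assumed here; the function names are illustrative.

```python
from collections import Counter

def wl_features(adjacency, labels, iterations):
    """WL subtree features: counts of node labels across all refinement rounds.

    adjacency -- dict node -> list of neighbour nodes
    labels    -- dict node -> initial string label
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels
        # (the comprehension reads the previous round's labels throughout).
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adjacency[v])) + ")"
            for v in adjacency
        }
        features.update(current.values())
    return features

def wl_kernel(g1, g2, iterations=2):
    """Kernel value: dot product of the two graphs' WL label-count vectors."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())
```

For the path A-B-A with one iteration, the features are {A: 2, B: 1, A(B): 2, B(A,A): 1}, so the kernel of the graph with itself is 4 + 1 + 4 + 1 = 10.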
AST-Based Deep Learning for Detecting Malicious PowerShell
Title | AST-Based Deep Learning for Detecting Malicious PowerShell |
Authors | Gili Rusak, Abdullah Al-Dujaili, Una-May O’Reilly |
Abstract | With the celebrated success of deep learning, some attempts to develop effective methods for detecting malicious PowerShell programs employ neural nets in a traditional natural language processing setup while others employ convolutional neural nets to detect obfuscated malicious commands at a character level. While these representations may express salient PowerShell properties, our hypothesis is that tools from static program analysis will be more effective. We propose a hybrid approach combining traditional program analysis (in the form of abstract syntax trees) and deep learning. This poster presents preliminary results of a fundamental step in our approach: learning embeddings for nodes of PowerShell ASTs. We classify malicious scripts by family type and explore embedded program vector representations. |
Tasks | |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.09230v1 |
PDF | http://arxiv.org/pdf/1810.09230v1.pdf |
PWC | https://paperswithcode.com/paper/181009230 |
Repo | https://github.com/zekiesenalp/zararli_powershell_analizi |
Framework | none |
Foreign English Accent Adjustment by Learning Phonetic Patterns
Title | Foreign English Accent Adjustment by Learning Phonetic Patterns |
Authors | Fedor Kitashov, Elizaveta Svitanko, Debojyoti Dutta |
Abstract | State-of-the-art automatic speech recognition (ASR) systems struggle with the lack of data for rare accents. For sufficiently large datasets, neural engines tend to outshine statistical models in most natural language processing problems. However, a speech accent remains a challenge for both approaches. Phonologists manually create general rules describing a speaker’s accent, but their results remain underutilized. In this paper, we propose a model that automatically retrieves phonological generalizations from a small dataset. This method leverages the difference in pronunciation between a particular dialect and General American English (GAE) and creates new accented samples of words. The proposed model is able to learn all generalizations that previously were manually obtained by phonologists. We use this statistical method to generate a million phonological variations of words from the CMU Pronouncing Dictionary and train a sequence-to-sequence RNN to recognize accented words with 59% accuracy. |
Tasks | Speech Recognition |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03625v1 |
PDF | http://arxiv.org/pdf/1807.03625v1.pdf |
PWC | https://paperswithcode.com/paper/foreign-english-accent-adjustment-by-learning |
Repo | https://github.com/CiscoAI/accent_transfer |
Framework | tf |
Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations
Title | Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations |
Authors | Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee |
Abstract | Most existing recursive neural network (RvNN) architectures utilize only the structure of parse trees, ignoring syntactic tags which are provided as by-products of parsing. We present a novel RvNN architecture that can provide dynamic compositionality by considering comprehensive syntactic information derived from both the structure and linguistic tags. Specifically, we introduce a structure-aware tag representation constructed by a separate tag-level tree-LSTM. With this, we can control the composition function of the existing word-level tree-LSTM by augmenting the representation as a supplementary input to the gate functions of the tree-LSTM. In extensive experiments, we show that models built upon the proposed architecture obtain superior or competitive performance on several sentence-level tasks such as sentiment analysis and natural language inference when compared against previous tree-structured models and other sophisticated neural models. |
Tasks | Natural Language Inference, Sentiment Analysis |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02286v2 |
PDF | http://arxiv.org/pdf/1809.02286v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-compositionality-in-recursive-neural |
Repo | https://github.com/galsang/SATA-Tree-LSTM |
Framework | pytorch |
MRI Reconstruction via Cascaded Channel-wise Attention Network
Title | MRI Reconstruction via Cascaded Channel-wise Attention Network |
Authors | Qiaoying Huang, Dong Yang, Pengxiang Wu, Hui Qu, Jingru Yi, Dimitris Metaxas |
Abstract | We consider the problem of MRI reconstruction from k-space data acquired at a very low sampling rate. Heavy undersampling can benefit patients in practice by reducing MRI scan time, but it is also challenging because reconstruction quality may be compromised. Currently, deep learning based methods dominate MRI reconstruction over traditional approaches such as Compressed Sensing, but they rarely show satisfactory performance on heavily undersampled k-space data. One explanation is that these methods treat channel-wise features equally, which degrades the representational ability of the neural network. To solve this problem, we propose a new model called MRI Cascaded Channel-wise Attention Network (MICCAN), highlighted by three components: (i) a variant of U-net with Channel-wise Attention (UCA) module, (ii) a long skip connection and (iii) a combined loss. Our model is able to attend to salient information by filtering out irrelevant features, and to concentrate on high-frequency information by forcing low-frequency information to bypass the network to the final output. We conduct both quantitative evaluation and qualitative analysis of our method on a cardiac dataset. Experiments show that our method achieves very promising results in terms of three common metrics for MRI reconstruction from heavily undersampled k-space data. |
Tasks | |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.08229v2 |
PDF | http://arxiv.org/pdf/1810.08229v2.pdf |
PWC | https://paperswithcode.com/paper/mri-reconstruction-via-cascaded-channel-wise |
Repo | https://github.com/charwing10/isbi2019miccan |
Framework | pytorch |
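The MICCAN abstract attributes weak performance to treating channel-wise features equally and adds a channel-wise attention module. A common form of channel attention, assumed here to approximate the UCA's attention step, is squeeze-and-excitation: global average pooling per channel, a small FC-ReLU-FC-sigmoid network, then rescaling each channel. The pure-Python sketch below uses toy weights and an illustrative function name; it is not the paper's layer configuration.

```python
import math

def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention.

    feature_maps -- list of C channels, each a 2-D list of activations
    w1 -- reduction weights, C_r rows of length C
    w2 -- expansion weights, C rows of length C_r
    Returns the feature maps rescaled by a per-channel weight in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Reweight: low-scoring channels are suppressed instead of treated equally.
    return [[[s * v for v in row] for row in ch] for s, ch in zip(scale, feature_maps)]
```

With two 2×2 channels (all ones and all zeros), `w1 = [[1.0, 1.0]]` and `w2 = [[2.0], [-2.0]]`, the first channel is scaled by sigmoid(2) ≈ 0.881 and the second by sigmoid(−2) ≈ 0.119.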
DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention
Title | DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention |
Authors | Aniketh Janardhan Reddy, Gil Rocha, Diego Esteves |
Abstract | In this paper, we describe DeFactoNLP, the system we designed for the FEVER 2018 Shared Task. The aim of this task was to conceive a system that can not only automatically assess the veracity of a claim but also retrieve evidence supporting this assessment from Wikipedia. In our approach, the Wikipedia documents whose Term Frequency-Inverse Document Frequency (TFIDF) vectors are most similar to the vector of the claim and those documents whose names are similar to those of the named entities (NEs) mentioned in the claim are identified as the documents which might contain evidence. The sentences in these documents are then supplied to a textual entailment recognition module. This module calculates the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information to assess the veracity of the claim. Various features computed using these probabilities are finally used by a Random Forest classifier to determine the overall truthfulness of the claim. The sentences which support this classification are returned as evidence. Our approach achieved a 0.4277 evidence F1-score, a 0.5136 label accuracy and a 0.3833 FEVER score. |
Tasks | Natural Language Inference |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00509v1 |
PDF | http://arxiv.org/pdf/1809.00509v1.pdf |
PWC | https://paperswithcode.com/paper/defactonlp-fact-verification-using-entity |
Repo | https://github.com/DeFacto/DeFactoNLP |
Framework | tf |
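The first retrieval criterion in the DeFactoNLP abstract (Wikipedia documents whose TF-IDF vectors are most similar to the claim's vector) can be sketched in pure Python with cosine similarity. The tokenization, the smoothed IDF formula log((1 + n) / (1 + df)) + 1 and the function names are assumptions for illustration; the named-entity-based retrieval and the entailment stage are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dict term -> weight) for tokenized docs, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}  # smoothed IDF
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def top_k_documents(claim, docs, k=1):
    """Rank tokenized documents by TF-IDF cosine similarity to a tokenized claim."""
    vectors, idf = tfidf_vectors(docs)
    # Claim terms that never occur in the corpus get no weight.
    claim_vec = {t: c * idf[t] for t, c in Counter(claim).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(claim_vec, vectors[i]), reverse=True)
    return ranked[:k]
```

The IDF factor down-weights terms such as "is" that appear in every document, so a claim like "paris capital france" ranks a document about Paris being the capital of France above one that merely mentions Paris.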
Network Recasting: A Universal Method for Network Architecture Transformation
Title | Network Recasting: A Universal Method for Network Architecture Transformation |
Authors | Joonsang Yu, Sungbum Kang, Kiyoung Choi |
Abstract | This paper proposes network recasting as a general method for network architecture transformation. The primary goal of this method is to accelerate the inference process through the transformation, but there can be many other practical applications. The method is based on block-wise recasting; it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such a block-by-block recasting in a sequential manner transforms the network architecture while preserving the accuracy. This method can be used to transform an arbitrary teacher network type to an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of blocks. The network recasting can generate a network with fewer parameters and/or activations, which significantly reduces inference time. Naturally, it can be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU. |
Tasks | |
Published | 2018-09-14 |
URL | https://arxiv.org/abs/1809.05262v2 |
PDF | https://arxiv.org/pdf/1809.05262v2.pdf |
PWC | https://paperswithcode.com/paper/network-recasting-a-universal-method-for |
Repo | https://github.com/joonsang-yu/Network-Recasting |
Framework | pytorch |
Detection of Structural Change in Geographic Regions of Interest by Self Organized Mapping: Las Vegas City and Lake Mead across the Years
Title | Detection of Structural Change in Geographic Regions of Interest by Self Organized Mapping: Las Vegas City and Lake Mead across the Years |
Authors | John M. Wandeto, Henry O. Nyongesa, Birgitta Dresp-Langley |
Abstract | Time series of satellite images may reveal important data about changes in environmental conditions and natural or urban landscape structures that are of potential interest to citizens, historians, or policymakers. We applied a fast method of image analysis using Self Organized Maps (SOM) and, more specifically, the quantization error (QE), for the visualization of critical changes in satellite images of Las Vegas, generated across the years 1984-2008, a period of major restructuring of the urban landscape. As shown in our previous work, the QE from the SOM output is a reliable measure of variability in local image contents. In the present work, we use statistical trend analysis to show how the QE from SOM run on specific geographic regions of interest extracted from satellite images can be exploited to detect both the magnitude and the direction of structural change across time at a glance. Significantly correlated demographic data for the same reference time period are highlighted. The approach is fast and reliable, and can be implemented for the rapid detection of potentially critical changes in time series of large bodies of image data. |
Tasks | Quantization, Time Series |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11125v1 |
PDF | http://arxiv.org/pdf/1803.11125v1.pdf |
PWC | https://paperswithcode.com/paper/detection-of-structural-change-in-geographic |
Repo | https://github.com/JustGlowing/minisom |
Framework | none |
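The quantization error (QE) that the abstract uses as its change signal is commonly defined as the mean distance between each input vector and its best-matching unit (BMU) in the trained SOM. The sketch below computes that quantity from an already-trained codebook. It assumes the codebook comes from some SOM training run (for example, one produced with the linked library), and the function names are illustrative. A rising or falling QE for the same region across years would then be the kind of trend the authors analyse.

```python
import math

def best_matching_unit(x, codebook):
    """Index of the SOM unit (weight vector) closest to input x."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))

def quantization_error(inputs, codebook):
    """Mean Euclidean distance between each input and its best-matching unit."""
    total = sum(math.dist(x, codebook[best_matching_unit(x, codebook)]) for x in inputs)
    return total / len(inputs)
```

For instance, a one-unit codebook `[[0.0, 0.0]]` and inputs `[[3.0, 4.0], [0.0, 0.0]]` give a QE of (5 + 0) / 2 = 2.5: the more an image region's content departs from what the map has learned, the larger this average becomes.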
Exploit the Connectivity: Multi-Object Tracking with TrackletNet
Title | Exploit the Connectivity: Multi-Object Tracking with TrackletNet |
Authors | Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang |
Abstract | Multi-object tracking (MOT) is an important and practical task related to both surveillance systems and moving camera applications, such as autonomous driving and robotic vision. However, due to unreliable detection, occlusion and fast camera motion, tracked targets can be easily lost, which makes MOT very challenging. Most recent works treat tracking as a re-identification (Re-ID) task, but how to combine appearance and temporal features is still not well addressed. In this paper, we propose an innovative and effective tracking method called TrackletNet Tracker (TNT) that combines temporal and appearance information in a unified framework. First, we define a graph model which treats each tracklet as a vertex. The tracklets are generated by appearance similarity with CNN features and intersection-over-union (IOU) with epipolar constraints to compensate for camera movement between adjacent frames. Then, for every pair of tracklets, the similarity is measured by our designed multi-scale TrackletNet. Afterwards, the tracklets are clustered into groups which represent individual object IDs. Our proposed TNT has the ability to handle most of the challenges in MOT, and achieves promising results on the MOT16 and MOT17 benchmark datasets compared with other state-of-the-art methods. |
Tasks | Autonomous Driving, Multi-Object Tracking, Object Tracking |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07258v1 |
PDF | http://arxiv.org/pdf/1811.07258v1.pdf |
PWC | https://paperswithcode.com/paper/exploit-the-connectivity-multi-object |
Repo | https://github.com/zhengthomastang/2018AICity_TeamUW |
Framework | none |
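The tracklet-generation step in the TNT abstract links detections across adjacent frames using appearance similarity and IOU with epipolar constraints. The sketch below keeps only the IOU criterion, with greedy frame-to-frame matching and an assumed threshold of 0.5. It omits the CNN appearance features, the epipolar compensation and the later TrackletNet clustering, and its function names are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def build_tracklets(frames, threshold=0.5):
    """Greedily link detections in consecutive frames into tracklets by IOU.

    frames -- list of frames, each a list of boxes
    Returns a list of tracklets, each a list of (frame_index, box).
    """
    tracklets = []
    active = []  # tracklets that received a box in the previous frame
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        next_active = []
        for tr in active:
            if not unmatched:
                break  # no detections left; remaining tracklets end here
            last_box = tr[-1][1]
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= threshold:
                tr.append((t, best))
                unmatched.remove(best)
                next_active.append(tr)
        for box in unmatched:  # unmatched detections start new tracklets
            tr = [(t, box)]
            tracklets.append(tr)
            next_active.append(tr)
        active = next_active
    return tracklets
```

Short, high-confidence tracklets like these are the graph vertices that a learned similarity (TrackletNet, in the paper) would then compare pairwise and cluster into object identities.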