October 20, 2019

Paper Group AWR 206

Contextualize, Show and Tell: A Neural Visual Storyteller

Title Contextualize, Show and Tell: A Neural Visual Storyteller
Authors Diana Gonzalez-Rico, Gibran Fuentes-Pineda
Abstract We present a neural model for generating short stories from image sequences, which extends the image description model of Vinyals et al. (2015). This extension relies on an encoder LSTM to compute a context vector of each story from the image sequence. This context vector is used as the first state of multiple independent decoder LSTMs, each of which generates the portion of the story corresponding to each image in the sequence by taking the image embedding as the first input. Our model showed competitive results with the METEOR metric and human ratings in the internal track of the Visual Storytelling Challenge 2018.
Tasks Visual Storytelling
Published 2018-06-03
URL http://arxiv.org/abs/1806.00738v1
PDF http://arxiv.org/pdf/1806.00738v1.pdf
PWC https://paperswithcode.com/paper/contextualize-show-and-tell-a-neural-visual
Repo https://github.com/dgonzalez-ri/neural-visual-storyteller
Framework tf

A Binary Optimization Approach for Constrained K-Means Clustering

Title A Binary Optimization Approach for Constrained K-Means Clustering
Authors Huu Le, Anders Eriksson, Thanh-Toan Do, Michael Milford
Abstract K-Means clustering still plays an important role in many computer vision problems. While the conventional Lloyd method, which alternates between centroid update and cluster assignment, is primarily used in practice, it may converge to a solution with empty clusters. Furthermore, some applications may require the clusters to satisfy a specific set of constraints, e.g., cluster sizes, must-link/cannot-link. Several methods have been introduced to solve constrained K-Means clustering. Due to the non-convex nature of K-Means, however, existing approaches may result in sub-optimal solutions that poorly approximate the true clusters. In this work, we provide a new perspective to tackle this problem. Particularly, we reconsider constrained K-Means as a Binary Optimization Problem and propose a novel optimization scheme to search for feasible solutions in the binary domain. This approach allows us to solve constrained K-Means where multiple types of constraints can be simultaneously enforced. Experimental results on synthetic and real datasets show that our method provides better clustering accuracy with faster runtime compared to several commonly used techniques.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1810.10134v2
PDF http://arxiv.org/pdf/1810.10134v2.pdf
PWC https://paperswithcode.com/paper/a-binary-optimization-approach-for
Repo https://github.com/intellhave/BCKM
Framework none
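
The abstract above notes that Lloyd's alternation between cluster assignment and centroid update can converge to a solution with empty clusters. The following is a minimal pure-Python sketch of that unconstrained baseline on 1-D data, not the binary optimization method the paper proposes. The function names, the toy data, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for illustration.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations on 1-D data; returns (centroids, assignments)."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: (p - centroids[k]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = sum(members) / len(members)
    return centroids, assignments


def cluster_sizes(assignments, k):
    """Count how many points were assigned to each of the k clusters."""
    return [assignments.count(j) for j in range(k)]


# Toy data with two obvious groups and a badly initialised third centroid.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centroids, assignments = lloyd_kmeans(points, centroids=[0.0, 5.0, 100.0])
sizes = cluster_sizes(assignments, 3)
```

Here the third centroid never attracts a point, so `sizes` ends with a zero entry. A size constraint of the kind mentioned in the abstract would rule this solution out.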

Accelerated Inference in Markov Random Fields via Smooth Riemannian Optimization

Title Accelerated Inference in Markov Random Fields via Smooth Riemannian Optimization
Authors Siyi Hu, Luca Carlone
Abstract Markov Random Fields (MRFs) are a popular model for several pattern recognition and reconstruction problems in robotics and computer vision. Inference in MRFs is intractable in general and related work resorts to approximation algorithms. Among those techniques, semidefinite programming (SDP) relaxations have been shown to provide accurate estimates while scaling poorly with the problem size and being typically slow for practical applications. Our first contribution is to design a dual ascent method to solve standard SDP relaxations that takes advantage of the geometric structure of the problem to speed up computation. This technique, named Dual Ascent Riemannian Staircase (DARS), is able to solve large problem instances in seconds. Our second contribution is to develop a second and faster approach. The backbone of this second approach is a novel SDP relaxation combined with a fast and scalable solver based on smooth Riemannian optimization. We show that this approach, named Fast Unconstrained SEmidefinite Solver (FUSES), can solve large problems in milliseconds. Contrary to local MRF solvers, e.g., loopy belief propagation, our approaches do not require an initial guess. Moreover, we leverage recent results from optimization theory to provide per-instance sub-optimality guarantees. We demonstrate the proposed approaches in multi-class image segmentation problems. Extensive experimental evidence shows that (i) FUSES and DARS produce near-optimal solutions, attaining an objective within 0.1% of the optimum, (ii) FUSES and DARS are remarkably faster than general-purpose SDP solvers, and FUSES is more than two orders of magnitude faster than DARS while attaining similar solution quality, (iii) FUSES is faster than local search methods while being a global solver.
Tasks Semantic Segmentation
Published 2018-10-27
URL http://arxiv.org/abs/1810.11689v2
PDF http://arxiv.org/pdf/1810.11689v2.pdf
PWC https://paperswithcode.com/paper/accelerated-inference-in-markov-random-fields
Repo https://github.com/MIT-SPARK/FUSES
Framework tf

Instance-level Human Parsing via Part Grouping Network

Title Instance-level Human Parsing via Part Grouping Network
Authors Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin
Abstract Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass. Several related works all follow the “parsing-by-detection” pipeline that heavily relies on separately trained detection models to localize instances and then performs human parsing for each instance sequentially. Nonetheless, two discrepant optimization targets of detection and parsing lead to suboptimal representation learning and error accumulation for final results. In this work, we make the first attempt to explore a detection-free Part Grouping Network (PGN) for efficiently parsing multiple people in an image in a single pass. Our PGN reformulates instance-level human parsing as two twinned sub-tasks that can be jointly learned and mutually refined via a unified network: 1) semantic part segmentation for assigning each pixel as a human part (e.g., face, arms); 2) instance-aware edge detection to group semantic parts into distinct person instances. Thus the shared intermediate representation would be endowed with capabilities in both characterizing fine-grained parts and inferring instance belongings of each part. Finally, a simple instance partition process is employed to get final results during inference. We conducted experiments on the PASCAL-Person-Part dataset and our PGN outperforms all state-of-the-art methods. Furthermore, we show its superiority on a newly collected multi-person parsing dataset (CIHP) including 38,280 diverse images, which is the largest dataset so far and can facilitate more advanced human analysis. The CIHP benchmark and our source code are available at http://sysu-hcp.net/lip/.
Tasks Edge Detection, Human Parsing, Human Part Segmentation, Representation Learning
Published 2018-08-01
URL http://arxiv.org/abs/1808.00157v1
PDF http://arxiv.org/pdf/1808.00157v1.pdf
PWC https://paperswithcode.com/paper/instance-level-human-parsing-via-part
Repo https://github.com/Engineering-Course/CIHP_PGN
Framework tf

Efficient Dependency-Guided Named Entity Recognition

Title Efficient Dependency-Guided Named Entity Recognition
Authors Zhanming Jie, Aldrian Obaja Muis, Wei Lu
Abstract Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several down-stream natural language processing (NLP) tasks such as relation extraction and event extraction. Dependency trees, on the other hand, also convey crucial semantic-level information. It has been shown previously that such information can be used to improve the performance of NER (Sasano and Kurohashi 2008, Ling and Weld 2012). In this work, we investigate how to better utilize the structured information conveyed by dependency trees to improve the performance of NER. Specifically, unlike existing approaches which only exploit dependency information for designing local features, we show that certain global structured information of the dependency trees can be exploited when building NER models where such information can provide guided learning and inference. Through extensive experiments, we show that our proposed novel dependency-guided NER model performs competitively with models based on conventional semi-Markov conditional random fields, while requiring significantly less running time.
Tasks Named Entity Recognition, Relation Extraction
Published 2018-10-19
URL http://arxiv.org/abs/1810.08436v2
PDF http://arxiv.org/pdf/1810.08436v2.pdf
PWC https://paperswithcode.com/paper/efficient-dependency-guided-named-entity
Repo https://github.com/allanj/dependency-guided-ner
Framework none

Scalable Importance Tempering and Bayesian Variable Selection

Title Scalable Importance Tempering and Bayesian Variable Selection
Authors Giacomo Zanella, Gareth Roberts
Abstract We propose a Monte Carlo algorithm to sample from high dimensional probability distributions that combines Markov chain Monte Carlo and importance sampling. We provide a careful theoretical analysis, including guarantees on robustness to high dimensionality, explicit comparison with standard Markov chain Monte Carlo methods and illustrations of the potential improvements in efficiency. Simple and concrete intuition is provided for when the novel scheme is expected to outperform standard schemes. When applied to Bayesian variable-selection problems, the novel algorithm is orders of magnitude more efficient than available alternative sampling schemes and enables fast and reliable fully Bayesian inferences with tens of thousands of regressors.
Tasks
Published 2018-05-01
URL https://arxiv.org/abs/1805.00541v2
PDF https://arxiv.org/pdf/1805.00541v2.pdf
PWC https://paperswithcode.com/paper/scalable-importance-tempering-and-bayesian
Repo https://github.com/gzanella/TGS
Framework none

UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification

Title UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
Authors Andreas Hanselowski, Hao Zhang, Zile Li, Daniil Sorokin, Benjamin Schiller, Claudia Schulz, Iryna Gurevych
Abstract The Fact Extraction and VERification (FEVER) shared task was launched to support the development of systems able to verify claims by extracting supporting or refuting facts from raw text. The shared task organizers provide a large-scale dataset for the consecutive steps involved in claim verification, in particular, document retrieval, fact extraction, and claim classification. In this paper, we present our claim verification pipeline approach, which, according to the preliminary results, scored third in the shared task, out of 23 competing systems. For the document retrieval, we implemented a new entity linking approach. In order to be able to rank candidate facts and classify a claim on the basis of several selected facts, we introduce two extensions to the Enhanced LSTM (ESIM).
Tasks Entity Linking, Natural Language Inference
Published 2018-09-03
URL https://arxiv.org/abs/1809.01479v5
PDF https://arxiv.org/pdf/1809.01479v5.pdf
PWC https://paperswithcode.com/paper/ukp-athene-multi-sentence-textual-entailment
Repo https://github.com/UKPLab/fever-2018-team-athene
Framework tf

A Framework of Transfer Learning in Object Detection for Embedded Systems

Title A Framework of Transfer Learning in Object Detection for Embedded Systems
Authors Ioannis Athanasiadis, Panagiotis Mousouliotis, Loukas Petrou
Abstract Transfer learning is one of the subjects undergoing intense study in the area of machine learning. In object recognition and object detection there are known experiments for the transferability of parameters, but not for neural networks which are suitable for object detection in real-time embedded applications, such as the SqueezeDet neural network. We use transfer learning to accelerate the training of SqueezeDet to a new group of classes. Also, experiments are conducted to study the transferability and co-adaptation phenomena introduced by the transfer learning process. To accelerate training, we propose a new implementation of the SqueezeDet training which provides a faster pipeline for data processing and achieves a 1.8 times speedup compared to the initial implementation. Finally, we created a mechanism for automatic hyperparameter optimization using an empirical method.
Tasks Hyperparameter Optimization, Object Detection, Object Recognition, Transfer Learning
Published 2018-11-12
URL http://arxiv.org/abs/1811.04863v2
PDF http://arxiv.org/pdf/1811.04863v2.pdf
PWC https://paperswithcode.com/paper/a-framework-of-transfer-learning-in-object
Repo https://github.com/supernlogn/squeezeDetTL
Framework tf

Improved Speech Enhancement with the Wave-U-Net

Title Improved Speech Enhancement with the Wave-U-Net
Authors Craig Macartney, Tillman Weyde
Abstract We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for the separation of music vocals and accompaniment. This end-to-end learning method for audio source separation operates directly in the time domain, permitting the integrated modelling of phase information and being able to take large temporal contexts into account. Our experiments show that the proposed method improves several metrics, namely PESQ, CSIG, CBAK, COVL and SSNR, over the state-of-the-art with respect to the speech enhancement task on the Voice Bank corpus (VCTK) dataset. We find that a reduced number of hidden layers is sufficient for speech enhancement in comparison to the original system designed for singing voice separation in music. We see this initial result as an encouraging signal to further explore speech enhancement in the time-domain, both as an end in itself and as a pre-processing step to speech recognition systems.
Tasks Speech Enhancement, Speech Recognition
Published 2018-11-27
URL http://arxiv.org/abs/1811.11307v1
PDF http://arxiv.org/pdf/1811.11307v1.pdf
PWC https://paperswithcode.com/paper/improved-speech-enhancement-with-the-wave-u
Repo https://github.com/MattSegal/speech-enhancement
Framework pytorch

Differentiable Learning of Quantum Circuit Born Machine

Title Differentiable Learning of Quantum Circuit Born Machine
Authors Jin-Guo Liu, Lei Wang
Abstract Quantum circuit Born machines are generative models which represent the probability distribution of a classical dataset as quantum pure states. Computational complexity considerations of the quantum sampling problem suggest that the quantum circuits exhibit stronger expressibility compared to classical neural networks. One can efficiently draw samples from the quantum circuits via projective measurements on qubits. However, similar to the leading implicit generative models in deep learning, such as the generative adversarial networks, the quantum circuits cannot provide the likelihood of the generated samples, which poses a challenge to the training. We devise an efficient gradient-based learning algorithm for the quantum circuit Born machine by minimizing the kerneled maximum mean discrepancy loss. We simulated generative modeling of the Bars-and-Stripes dataset and Gaussian mixture distributions using deep quantum circuits. Our experiments show the importance of circuit depth and of the gradient-based optimization algorithm. The proposed learning algorithm is runnable on near-term quantum devices and can exhibit quantum advantages for generative modeling.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.04168v1
PDF http://arxiv.org/pdf/1804.04168v1.pdf
PWC https://paperswithcode.com/paper/differentiable-learning-of-quantum-circuit
Repo https://github.com/GiggleLiu/QuantumCircuitBornMachine
Framework none
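
The abstract above trains the model by minimizing a kerneled maximum mean discrepancy (MMD) loss. As a rough illustration of what such a loss measures, the sketch below computes the squared MMD between two explicit probability vectors over integer-labelled states. The Gaussian kernel, its bandwidth `sigma`, the function names, and the toy distributions are assumptions for this example and are not taken from the paper or its repository.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two integer-labelled states."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def squared_mmd(p, q, sigma=1.0):
    """Squared MMD between probability vectors p and q over states 0..n-1.

    Computes sum_{i,j} (p_i - q_i) * K(i, j) * (p_j - q_j).
    """
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    return sum(
        diff[i] * gaussian_kernel(i, j, sigma) * diff[j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical target distribution and a uniform "model" distribution.
target = [0.5, 0.0, 0.0, 0.5]
uniform = [0.25, 0.25, 0.25, 0.25]

loss_same = squared_mmd(target, target)   # identical distributions
loss_diff = squared_mmd(target, uniform)  # mismatched distributions
```

The loss vanishes when the two distributions coincide and is positive otherwise. In a sampling-based setting the probability vectors would be replaced by empirical frequencies, which is why no explicit likelihood is needed.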

Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

Title Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
Authors Jason Lee, Elman Mansimov, Kyunghyun Cho
Abstract We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En-De and En-Ro) and image caption generation, and observe that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.
Tasks Denoising, Latent Variable Models, Machine Translation
Published 2018-02-19
URL http://arxiv.org/abs/1802.06901v3
PDF http://arxiv.org/pdf/1802.06901v3.pdf
PWC https://paperswithcode.com/paper/deterministic-non-autoregressive-neural
Repo https://github.com/nyu-dl/dl4mt-nonauto
Framework pytorch

Learning Rate Adaptation for Federated and Differentially Private Learning

Title Learning Rate Adaptation for Federated and Differentially Private Learning
Authors Antti Koskela, Antti Honkela
Abstract We propose an algorithm for the adaptation of the learning rate for stochastic gradient descent (SGD) that avoids the need for validation set use. The idea for the adaptiveness comes from the technique of extrapolation: to get an estimate for the error against the gradient flow which underlies SGD, we compare the result obtained by one full step and two half-steps. The algorithm is applied in two separate frameworks: federated and differentially private learning. Using examples of deep neural networks we empirically show that the adaptive algorithm is competitive with manually tuned commonly used optimisation methods for differentially private training. We also show that it works robustly in the case of federated learning unlike commonly used optimisation methods.
Tasks
Published 2018-09-11
URL https://arxiv.org/abs/1809.03832v3
PDF https://arxiv.org/pdf/1809.03832v3.pdf
PWC https://paperswithcode.com/paper/learning-rate-adaptation-for-federated-and
Repo https://github.com/DPBayes/ADADP
Framework pytorch
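
The abstract above estimates the error against the gradient flow by comparing one full step with two half-steps. The sketch below shows that comparison for a scalar parameter with a deterministic gradient. The learning-rate update rule (a clamped square-root controller of the kind used in adaptive ODE solvers), the tolerance, and all function names are assumptions for illustration and may differ from the paper's algorithm.

```python
def step_doubling_error(grad, theta, lr):
    """Take one full step and two half-steps; return (new_theta, error)."""
    full = theta - lr * grad(theta)
    half = theta - 0.5 * lr * grad(theta)
    two_half = half - 0.5 * lr * grad(half)
    # The gap between the two results estimates the local discretisation error.
    return two_half, abs(full - two_half)


def adapt_learning_rate(lr, error, tolerance, low=0.5, high=2.0):
    """Hypothetical controller: grow lr when error < tolerance, shrink otherwise."""
    factor = (tolerance / error) ** 0.5 if error > 0 else high
    return lr * min(high, max(low, factor))


def grad(x):
    """Gradient of the toy objective f(x) = x**2 / 2."""
    return x


theta, lr = 1.0, 0.1
theta_next, error = step_doubling_error(grad, theta, lr)
new_lr = adapt_learning_rate(lr, error, tolerance=0.01)
```

With these toy values the two half-steps land at about 0.9025 against 0.9 for the full step, so the error estimate is small relative to the tolerance and the controller raises the learning rate to its upper clamp.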

Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Title Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Authors Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi
Abstract We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a real-life visual urban environment to a goal position, and then identify in the observed image a location described in natural language to find a hidden object. The data contains 9,326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays richer use of spatial reasoning compared to related resources. Empirical analysis shows the data presents an open challenge to existing methods.
Tasks
Published 2018-11-29
URL https://arxiv.org/abs/1811.12354v6
PDF https://arxiv.org/pdf/1811.12354v6.pdf
PWC https://paperswithcode.com/paper/touchdown-natural-language-navigation-and
Repo https://github.com/clic-lab/touchdown
Framework pytorch

New Approaches to Inverse Structural Modification Theory using Random Projections

Title New Approaches to Inverse Structural Modification Theory using Random Projections
Authors Prasad Cheema, Mehrisadat M. Alamdari, Gareth A. Vio
Abstract In many contexts the modal properties of a structure change, either due to the impact of a changing environment, fatigue, or due to the presence of structural damage. For example during flight, an aircraft’s modal properties are known to change with both altitude and velocity. It is thus important to quantify these changes given only a truncated set of modal data, which is usually the case experimentally. This procedure is formally known as the generalised inverse eigenvalue problem. In this paper we experimentally show that first-order gradient-based methods that optimise objective functions defined over a modal are prohibitive due to the required small step sizes. This in turn leads to the justification of using a non-gradient, black box optimiser in the form of particle swarm optimisation. We further show how it is possible to solve such inverse eigenvalue problems in a lower dimensional space by the use of random projections, which in many cases reduces the total dimensionality of the optimisation problem by 80% to 99%. Two example problems are explored involving a ten-dimensional mass-stiffness toy problem, and a one-dimensional finite element mass-stiffness approximation for a Boeing 737-300 aircraft.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04214v1
PDF http://arxiv.org/pdf/1812.04214v1.pdf
PWC https://paperswithcode.com/paper/new-approaches-to-inverse-structural
Repo https://github.com/Lance-Q/Awesome-Embedding-Optimization-Paper-List
Framework none
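
The abstract above solves the inverse eigenvalue problem in a lower dimensional space by means of random projections. One common way to set this up is to search over a short vector `z` and lift it to the full parameter vector through a fixed random matrix. The sketch below shows only that lifting step. The Gaussian matrix, its scaling, the dimensions, and the function names are assumptions, since the abstract does not say how the paper constructs its projection.

```python
import random


def random_projection_matrix(high_dim, low_dim, seed=0):
    """Fixed Gaussian matrix mapping low_dim coordinates to high_dim ones."""
    rng = random.Random(seed)
    scale = low_dim ** -0.5  # assumption: a common variance-preserving scaling
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(low_dim)]
        for _ in range(high_dim)
    ]


def lift(matrix, z):
    """Map a low-dimensional search point z to the full parameter vector."""
    return [sum(row[j] * z[j] for j in range(len(z))) for row in matrix]


high_dim, low_dim = 100, 5
projection = random_projection_matrix(high_dim, low_dim)

z = [0.1, -0.2, 0.3, 0.0, 0.5]      # point proposed by a black-box optimiser
full_parameters = lift(projection, z)  # vector passed to the original objective
reduction = 1.0 - low_dim / high_dim   # fraction of dimensions removed
```

In this toy setting the optimiser searches 5 coordinates instead of 100, a 95% reduction, which sits inside the 80% to 99% range quoted in the abstract.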

Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions

Title Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions
Authors Zhilin Zheng, Li Sun
Abstract VAE requires the standard Gaussian distribution as a prior in the latent space. Since all codes tend to follow the same prior, it often suffers from the so-called “posterior collapse”. To avoid this, this paper introduces a class-specific distribution for the latent code. Unlike CVAE, however, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$, for a single input. We apply two separate encoders to map the input into $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$ respectively, and then give the concatenated code to the decoder to reconstruct the input. The label irrelevant code $\bm{\mathrm{z}}_u$ represents the common characteristics of all inputs, hence it is constrained by the standard Gaussian, and its encoder is trained in an amortized variational inference way, like VAE, while $\bm{\mathrm{z}}_s$ is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters for the Gaussian components in the $\bm{\mathrm{z}}_s$ encoder are optimized by the label supervision in a global stochastic way. In theory, we show that our method is actually equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, and it can directly increase the mutual information between $\bm{\mathrm{z}}_s$ and the label $c$. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high quality and diverse images.
Tasks
Published 2018-12-22
URL http://arxiv.org/abs/1812.09502v4
PDF http://arxiv.org/pdf/1812.09502v4.pdf
PWC https://paperswithcode.com/paper/disentangling-latent-space-for-vae-by-label
Repo https://github.com/ZhilZheng/Lr-LiVAE
Framework tf