October 20, 2019

Paper Group AWR 214

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

Title Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream
Authors Maurits Kaptein, Paul Ketelaar
Abstract In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented means that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we coin oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high-level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented data sets. Finally, we introduce the oFMLR R package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.
Tasks
Published 2018-02-28
URL http://arxiv.org/abs/1802.10529v1
PDF http://arxiv.org/pdf/1802.10529v1.pdf
PWC https://paperswithcode.com/paper/maximum-likelihood-estimation-of-a-finite
Repo https://github.com/MKaptein/ofmlr
Framework none
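
The online estimation idea is compact enough to sketch. Below is a minimal, illustrative sketch of online EM for a mixture of logistic regressions, the general approach behind oFMLR; the simulated stream, variable names, and step-size schedule are assumptions for illustration, not the package's API.

```python
# Illustrative online EM for a mixture of K logistic regressions.
# Each arriving (x, y) triggers one E-step (responsibilities) and one
# small weighted SGD step per component (M-step). Names and the
# step-size schedule are assumptions, not the oFMLR package's API.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
K, d = 2, 3                      # components, feature dimension
beta = rng.normal(size=(K, d))   # per-component coefficients
pi = np.full(K, 1.0 / K)         # mixing weights

true_beta = np.array([[1.5, -2.0, 0.5], [-1.5, 2.0, -0.5]])
for t in range(1, 10_000):
    k_true = rng.integers(K)                     # simulated stream
    x = rng.normal(size=d)
    y = float(rng.random() < sigmoid(x @ true_beta[k_true]))

    p = sigmoid(beta @ x)                        # per-component P(y=1|x)
    lik = np.where(y, p, 1 - p)                  # Bernoulli likelihoods
    r = pi * lik
    r /= r.sum()                                 # E-step: responsibilities

    gamma = t ** -0.6                            # decaying step size
    beta += gamma * (r * (y - p))[:, None] * x   # M-step: weighted SGD step
    pi += gamma * (r - pi)                       # online mixing-weight update
```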

Bringing a Blurry Frame Alive at High Frame-Rate with an Event Camera

Title Bringing a Blurry Frame Alive at High Frame-Rate with an Event Camera
Authors Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai
Abstract Event-based cameras can measure intensity changes (called “events”) with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), the event camera allows simultaneous output of the intensity frames. However, the output images are captured at a relatively low frame-rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while the events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data to a latent image. In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our EDI model and optimization method in comparison to the state-of-the-art.
Tasks Video Generation
Published 2018-11-26
URL http://arxiv.org/abs/1811.10180v2
PDF http://arxiv.org/pdf/1811.10180v2.pdf
PWC https://paperswithcode.com/paper/bringing-a-blurry-frame-alive-at-high-frame
Repo https://github.com/panpanfei/Bringing-a-Blurry-Frame-Alive-at-High-Frame-Rate-with-an-Event-Camera
Framework none
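
The EDI relation itself is a one-liner once the contrast threshold is known. Here is a toy sketch, assuming a given threshold c and event frames already binned onto the pixel grid; the real method recovers c by a one-dimensional non-convex search.

```python
# Toy Event-based Double Integral (EDI) sketch: a blurry frame equals
# the latent frame times the temporal average of exp(c * E(t)), where
# E(t) is the running integral of events, so the latent frame follows
# by division. The threshold c and the (T, H, W) event binning are
# assumptions for illustration.
import numpy as np

def latent_from_blur(blurry, events, c):
    """blurry: (H, W) frame; events: (T, H, W) signed event counts."""
    E = np.cumsum(events, axis=0) * c        # inner integral of events
    denom = np.exp(E).mean(axis=0)           # outer integral (time average)
    return blurry / np.maximum(denom, 1e-8)  # latent sharp-frame estimate
```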

KONG: Kernels for ordered-neighborhood graphs

Title KONG: Kernels for ordered-neighborhood graphs
Authors Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic
Abstract We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation of explicit graph feature maps using sketching techniques. We obtain precise bounds for the approximation accuracy and computational complexity of the proposed approaches and demonstrate their applicability on real datasets. In particular, our experiments demonstrate that neighborhood ordering results in more informative features. For the special case of general graphs, i.e. graphs without ordered neighborhoods, the new graph kernels yield efficient and simple algorithms for the comparison of label distributions between graphs.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10014v2
PDF http://arxiv.org/pdf/1805.10014v2.pdf
PWC https://paperswithcode.com/paper/kong-kernels-for-ordered-neighborhood-graphs
Repo https://github.com/kokiche/KONG
Framework none
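
To make the "ordered neighborhood as string" idea concrete, here is a small sketch that turns each node's creation-ordered neighbor labels into a string and hashes its k-grams into a fixed-width explicit feature map; k, the width, and the CRC hash are illustrative stand-ins for the paper's sketching techniques.

```python
# Sketch of KONG's feature-map idea: each node's ordered neighborhood
# becomes a label string, and hashed k-gram counts of those strings
# give an explicit graph feature vector. The CRC-based hashing stands
# in for the paper's more careful sketching with accuracy guarantees.
import zlib
import numpy as np

def ordered_neighborhood_features(adj, labels, k=2, width=64):
    """adj: dict node -> neighbors in creation order; labels: node -> str."""
    phi = np.zeros(width)
    for node, nbrs in adj.items():
        s = labels[node] + "".join(labels[v] for v in nbrs)
        for i in range(len(s) - k + 1):                       # all k-grams
            phi[zlib.crc32(s[i:i + k].encode()) % width] += 1
    return phi

g = {0: [1, 2], 1: [0], 2: [0, 1]}
lab = {0: "A", 1: "B", 2: "A"}
print(ordered_neighborhood_features(g, lab))
```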

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Title Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
Authors Nicholas Carlini, David Wagner
Abstract We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to DeepSpeech, Mozilla’s end-to-end implementation, and show it has a 100% success rate. The feasibility of this attack introduces a new domain in which to study adversarial examples.
Tasks Speech Recognition
Published 2018-01-05
URL http://arxiv.org/abs/1801.01944v2
PDF http://arxiv.org/pdf/1801.01944v2.pdf
PWC https://paperswithcode.com/paper/audio-adversarial-examples-targeted-attacks
Repo https://github.com/carlini/audio_adversarial_examples
Framework tf
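
The attack family is standard iterative optimization; a generic sketch is below. `model` is a placeholder returning per-frame log-probabilities, and the uniform distortion bound is a simplification; the paper's attack adds refinements (such as a shrinking distortion bound) not shown here.

```python
# Generic iterative targeted attack against a CTC speech-to-text model,
# in the spirit of Carlini & Wagner. `model` is a placeholder that maps
# a waveform to (T, 1, num_chars) log-probabilities; eps, lr, and steps
# are illustrative. This is a sketch, not the paper's exact attack.
import torch
import torch.nn.functional as F

def targeted_attack(model, audio, target, steps=1000, eps=0.05, lr=1e-3):
    """audio: (num_samples,) waveform; target: (1, S) character indices."""
    delta = torch.zeros_like(audio, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        log_probs = model(audio + delta)
        loss = F.ctc_loss(log_probs, target,
                          input_lengths=torch.tensor([log_probs.size(0)]),
                          target_lengths=torch.tensor([target.size(1)]))
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation small
    return (audio + delta).detach()
```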

Superpixel Sampling Networks

Title Superpixel Sampling Networks
Authors Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
Abstract Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks. Existing superpixel algorithms are not differentiable, making them difficult to integrate into otherwise end-to-end trainable deep neural networks. We develop a new differentiable model for superpixel sampling that leverages deep networks for learning superpixel segmentation. The resulting “Superpixel Sampling Network” (SSN) is end-to-end trainable, which allows learning task-specific superpixels with flexible loss functions and has fast runtime. Extensive experimental analysis indicates that SSNs not only outperform existing superpixel algorithms on traditional segmentation benchmarks, but can also learn superpixels for other tasks. In addition, SSNs can be easily integrated into downstream deep networks resulting in performance improvements.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10174v1
PDF http://arxiv.org/pdf/1807.10174v1.pdf
PWC https://paperswithcode.com/paper/superpixel-sampling-networks
Repo https://github.com/NVlabs/ssn_superpixels
Framework caffe2
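
The differentiable core is soft k-means over deep pixel features: replace hard nearest-centroid assignment with a softmax so gradients flow. A bare-bones sketch follows; SSN restricts each pixel to its nine surrounding superpixels for efficiency, which this toy version omits.

```python
# Soft pixel-to-superpixel assignment, the differentiable step at the
# heart of SSN: iterate softmax assignments and weighted centroid
# updates over (deep) pixel features. The temperature and iteration
# count are illustrative; SSN also limits candidates to 9 neighbors.
import torch

def soft_superpixels(feats, centers, iters=5, temp=1.0):
    """feats: (N, d) pixel features; centers: (K, d) initial centroids."""
    for _ in range(iters):
        d2 = torch.cdist(feats, centers) ** 2   # (N, K) squared distances
        q = torch.softmax(-d2 / temp, dim=1)    # soft assignments
        centers = (q.t() @ feats) / q.sum(0).unsqueeze(1).clamp_min(1e-8)
    return q, centers
```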

ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes

Title ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes
Authors Taihong Xiao, Jiapeng Hong, Jinwen Ma
Abstract Recent studies on face attribute transfer have achieved great success. A lot of models are able to transfer face attributes given an input image. However, they suffer from three limitations: (1) inability to generate images by exemplars; (2) being unable to transfer multiple face attributes simultaneously; (3) low quality of generated images, such as low resolution or artifacts. To address these limitations, we propose a novel model which receives two images of opposite attributes as inputs. Our model can transfer exactly the same type of attributes from one image to another by exchanging certain parts of their encodings. All the attributes are encoded in a disentangled manner in the latent space, which enables us to manipulate several attributes simultaneously. Besides, our model learns the residual images so as to facilitate training on higher-resolution images. With the help of multi-scale discriminators for adversarial training, it can even generate high-quality images with finer details and fewer artifacts. We demonstrate the effectiveness of our model in overcoming the above three limitations by comparing it with other methods on the CelebA face database. A PyTorch implementation is available at https://github.com/Prinsphield/ELEGANT.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1803.10562v2
PDF http://arxiv.org/pdf/1803.10562v2.pdf
PWC https://paperswithcode.com/paper/elegant-exchanging-latent-encodings-with-gan
Repo https://github.com/Prinsphield/ELEGANT
Framework pytorch
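
The exchange step reduces to swapping one attribute's slice of two latent codes. A toy sketch, with the encoder/decoder as placeholders and the slice layout an assumption:

```python
# Toy version of ELEGANT's latent exchange: encode two images with
# opposite values of an attribute, swap the latent slice reserved for
# that attribute, and decode. enc/dec are placeholder networks and
# attr_slice is an assumed layout; ELEGANT additionally predicts
# residual images rather than full images.
import torch

def swap_attribute(enc, dec, img_a, img_b, attr_slice):
    z_a, z_b = enc(img_a), enc(img_b)          # disentangled (B, D) codes
    z_a2, z_b2 = z_a.clone(), z_b.clone()
    z_a2[:, attr_slice] = z_b[:, attr_slice]   # exchange one attribute's part
    z_b2[:, attr_slice] = z_a[:, attr_slice]
    return dec(z_a2), dec(z_b2)                # attribute-transferred pair
```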

BodyNet: Volumetric Inference of 3D Human Body Shapes

Title BodyNet: Volumetric Inference of 3D Human Body Shapes
Authors Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
Abstract Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
Tasks
Published 2018-04-13
URL http://arxiv.org/abs/1804.04875v3
PDF http://arxiv.org/pdf/1804.04875v3.pdf
PWC https://paperswithcode.com/paper/bodynet-volumetric-inference-of-3d-human-body
Repo https://github.com/gulvarol/bodynet
Framework pytorch
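
Two of the listed losses are simple to write down. Here is a hedged sketch of both, with an orthographic max-projection standing in for the paper's camera re-projection:

```python
# Sketch of BodyNet-style losses: per-voxel binary cross-entropy on the
# predicted occupancy grid, plus a silhouette loss on an orthographic
# projection of the prediction. The max-projection is a simplification
# of the paper's camera re-projection.
import torch
import torch.nn.functional as F

def volumetric_loss(pred_logits, gt_voxels):
    """pred_logits, gt_voxels: (B, D, H, W); gt is 0/1 occupancy (float)."""
    return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels)

def reprojection_loss(pred_logits, gt_silhouette, depth_axis=1):
    sil = torch.sigmoid(pred_logits).max(dim=depth_axis).values  # (B, H, W)
    return F.binary_cross_entropy(sil, gt_silhouette)
```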

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

Title On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses
Authors Anish Athalye, Nicholas Carlini
Abstract Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03286v1
PDF http://arxiv.org/pdf/1804.03286v1.pdf
PWC https://paperswithcode.com/paper/on-the-robustness-of-the-cvpr-2018-white-box
Repo https://github.com/anishathalye/Guided-Denoise
Framework tf

Hierarchical binary CNNs for landmark localization with limited resources

Title Hierarchical binary CNNs for landmark localization with limited resources
Authors Adrian Bulat, Georgios Tzimiropoulos
Abstract Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. (e) We further provide additional results for the problem of facial part segmentation. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
Tasks Face Alignment, Pose Estimation
Published 2018-08-14
URL http://arxiv.org/abs/1808.04803v1
PDF http://arxiv.org/pdf/1808.04803v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-binary-cnns-for-landmark
Repo https://github.com/1996scarlet/OpenVtuber
Framework mxnet
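
The binarization primitive these blocks build on is short; here is the usual sign-plus-straight-through-estimator formulation (the paper's contribution is the hierarchical block architecture around it, not this primitive):

```python
# Standard binarization primitive for binary CNNs: sign() forward,
# straight-through gradient clipped to |x| <= 1 backward. The paper's
# hierarchical, parallel, multi-scale block is built around layers
# binarized this way.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped straight-through

w = torch.randn(8, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()   # gradients flow through sign()
```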

Neural Factor Graph Models for Cross-lingual Morphological Tagging

Title Neural Factor Graph Models for Cross-lingual Morphological Tagging
Authors Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig
Abstract Morphological analysis involves predicting the syntactic traits of a word (e.g. {POS: Noun, Case: Acc, Gender: Fem}). Previous work in morphological tagging improves performance for low-resource languages (LRLs) through cross-lingual training with a high-resource language (HRL) from the same family, but is limited by the strict, often false, assumption that tag sets exactly overlap between the HRL and LRL. In this paper we propose a method for cross-lingual morphological tagging that aims to improve information sharing between languages by relaxing this assumption. The proposed model uses factorial conditional random fields with neural network potentials, making it possible to (1) utilize the expressive power of neural network representations to smooth over superficial differences in the surface forms, (2) model pairwise and transitive relationships between tags, and (3) accurately generate tag sets that are unseen or rare in the training data. Experiments on four languages from the Universal Dependencies Treebank demonstrate superior tagging accuracies over existing cross-lingual approaches.
Tasks Morphological Analysis, Morphological Tagging
Published 2018-05-11
URL http://arxiv.org/abs/1805.04570v3
PDF http://arxiv.org/pdf/1805.04570v3.pdf
PWC https://paperswithcode.com/paper/neural-factor-graph-models-for-cross-lingual
Repo https://github.com/chaitanyamalaviya/NeuralFactorGraph
Framework pytorch

Incremental kernel PCA and the Nyström method

Title Incremental kernel PCA and the Nyström method
Authors Fredrik Hallgren, Paul Northrop
Abstract Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nyström approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nyström approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1802.00043v1
PDF http://arxiv.org/pdf/1802.00043v1.pdf
PWC https://paperswithcode.com/paper/incremental-kernel-pca-and-the-nystrom-method
Repo https://github.com/cfjhallgren/inkpca
Framework none
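
The rank-one eigendecomposition update underneath the algorithm can be sketched naively: with K = V diag(lam) V^T, the spectrum of K + sigma v v^T is that of a small diagonal-plus-rank-one core. The dense eigensolver below stands in for the faster machinery the paper develops, and the sketch assumes v lies in the span of V.

```python
# Naive rank-one eigendecomposition update, the building block behind
# incremental kernel PCA. With K = V @ diag(lam) @ V.T, the updated
# matrix K + sigma * outer(v, v) has the spectrum of the small core
# diag(lam) + sigma * outer(z, z) with z = V.T @ v. A dense eigensolver
# on the core stands in for the paper's more efficient approach, and v
# is assumed to lie in the span of V (true when V holds all eigenvectors).
import numpy as np

def rank_one_update(V, lam, v, sigma):
    z = V.T @ v
    core = np.diag(lam) + sigma * np.outer(z, z)
    lam_new, W = np.linalg.eigh(core)
    return V @ W, lam_new   # updated eigenvectors and eigenvalues
```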

Morphological analysis using a sequence decoder

Title Morphological analysis using a sequence decoder
Authors Ekin Akyürek, Erenay Dayanık, Deniz Yuret
Abstract We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than, e.g., an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research (see https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet and https://github.com/ai-ku/TrMor2018 for the new Turkish dataset).
Tasks Morphological Analysis, Transfer Learning
Published 2018-05-21
URL https://arxiv.org/abs/1805.07946v2
PDF https://arxiv.org/pdf/1805.07946v2.pdf
PWC https://paperswithcode.com/paper/morphnet-a-sequence-to-sequence-model-that
Repo https://github.com/ai-ku/Morse.jl
Framework none

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Title SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
Authors Alexander Mathews, Lexing Xie, Xuming He
Abstract Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work provides possibilities to learn richer image descriptions from the plethora of linguistic data available on the web.
Tasks Image Captioning, Language Modelling
Published 2018-05-18
URL http://arxiv.org/abs/1805.07030v1
PDF http://arxiv.org/pdf/1805.07030v1.pdf
PWC https://paperswithcode.com/paper/semstyle-learning-to-generate-stylised-image
Repo https://github.com/computationalmedia/semstyle
Framework pytorch

SNAS: Stochastic Neural Architecture Search

Title SNAS: Stochastic Neural Architecture Search
Authors Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin
Abstract We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in the same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with a locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes fewer epochs than non-differentiable evolution-based and reinforcement-learning-based NAS to find a cell architecture with state-of-the-art accuracy, and this architecture is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain validation accuracy during the search, an accuracy that attention-based NAS requires parameter retraining to match, exhibiting the potential to stride towards efficient NAS on big datasets. We have released our implementation at https://github.com/SNAS-Series/SNAS-Series.
Tasks Neural Architecture Search
Published 2018-12-24
URL https://arxiv.org/abs/1812.09926v3
PDF https://arxiv.org/pdf/1812.09926v3.pdf
PWC https://paperswithcode.com/paper/snas-stochastic-neural-architecture-search
Repo https://github.com/SNAS-Series/SNAS-Series
Framework pytorch
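
The sampling step that makes the search differentiable is small; below is a toy edge module using PyTorch's Gumbel-softmax (the candidate operations and temperature here are placeholders):

```python
# Toy SNAS-style edge: candidate operations mixed by a Gumbel-softmax
# sample over learned architecture logits, so architecture parameters
# train by ordinary back-propagation. Ops and temperature are
# illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SNASEdge(nn.Module):
    def __init__(self, ops, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.logits = nn.Parameter(torch.zeros(len(ops)))
        self.tau = tau

    def forward(self, x):
        w = F.gumbel_softmax(self.logits, tau=self.tau)  # one soft sample
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

edge = SNASEdge([nn.Identity(), nn.Conv2d(8, 8, 3, padding=1)])
out = edge(torch.randn(2, 8, 16, 16))
```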

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

Title Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Authors Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Abstract Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces also improve by this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks.
Tasks Word Embeddings
Published 2018-08-27
URL http://arxiv.org/abs/1808.08780v1
PDF http://arxiv.org/pdf/1808.08780v1.pdf
PWC https://paperswithcode.com/paper/improving-cross-lingual-word-embeddings-by
Repo https://github.com/yeraidm/meemi
Framework pytorch
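
The refinement has a very direct reading: move each seed-dictionary pair toward its midpoint. The sketch below only constructs those midpoint targets; the paper then learns a second linear transformation toward them rather than snapping the dictionary words directly.

```python
# "Meeting in the middle" target construction: after the usual linear
# alignment, each translation pair's embeddings are pulled toward their
# average. The paper fits a second linear map toward these midpoints;
# this sketch applies the move directly to the dictionary words only.
import numpy as np

def meet_in_the_middle(src_emb, tgt_emb, pairs):
    """pairs: iterable of (src_index, tgt_index) from the seed dictionary."""
    src_new, tgt_new = src_emb.copy(), tgt_emb.copy()
    for i, j in pairs:
        mid = 0.5 * (src_emb[i] + tgt_emb[j])
        src_new[i], tgt_new[j] = mid, mid
    return src_new, tgt_new
```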