October 20, 2019

Paper Group AWR 214

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

Title Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream
Authors Maurits Kaptein, Paul Ketelaar
Abstract In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented means that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we coin oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high-level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented data sets. Finally, we introduce the oFMLR R package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.
Tasks
Published 2018-02-28
URL http://arxiv.org/abs/1802.10529v1
PDF http://arxiv.org/pdf/1802.10529v1.pdf
PWC https://paperswithcode.com/paper/maximum-likelihood-estimation-of-a-finite
Repo https://github.com/MKaptein/ofmlr
Framework none
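
The online estimation idea is compact enough to sketch. Below is a minimal, illustrative sketch of online EM for a mixture of logistic regressions, the general approach behind oFMLR; the simulated stream, variable names, and step-size schedule are assumptions for illustration, not the package's API.

```python
# Illustrative online EM for a mixture of K logistic regressions.
# Each arriving (x, y) triggers one E-step (responsibilities) and one
# small weighted SGD step per component (M-step). Names and the
# step-size schedule are assumptions, not the oFMLR package's API.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
K, d = 2, 3                      # components, feature dimension
beta = rng.normal(size=(K, d))   # per-component coefficients
pi = np.full(K, 1.0 / K)         # mixing weights

true_beta = np.array([[1.5, -2.0, 0.5], [-1.5, 2.0, -0.5]])
for t in range(1, 10_000):
    k_true = rng.integers(K)                     # simulated stream
    x = rng.normal(size=d)
    y = float(rng.random() < sigmoid(x @ true_beta[k_true]))

    p = sigmoid(beta @ x)                        # per-component P(y=1|x)
    lik = np.where(y, p, 1 - p)                  # Bernoulli likelihoods
    r = pi * lik
    r /= r.sum()                                 # E-step: responsibilities

    gamma = t ** -0.6                            # decaying step size
    beta += gamma * (r * (y - p))[:, None] * x   # M-step: weighted SGD step
    pi += gamma * (r - pi)                       # online mixing-weight update
```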

Bringing a Blurry Frame Alive at High Frame-Rate with an Event Camera

Title Bringing a Blurry Frame Alive at High Frame-Rate with an Event Camera
Authors Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai
Abstract Event-based cameras can measure intensity changes (called “events”) with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), the event camera allows simultaneous output of the intensity frames. However, the output images are captured at a relatively low frame-rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while the events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data to a latent image. In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our EDI model and optimization method in comparison to the state-of-the-art.
Tasks Video Generation
Published 2018-11-26
URL http://arxiv.org/abs/1811.10180v2
PDF http://arxiv.org/pdf/1811.10180v2.pdf
PWC https://paperswithcode.com/paper/bringing-a-blurry-frame-alive-at-high-frame
Repo https://github.com/panpanfei/Bringing-a-Blurry-Frame-Alive-at-High-Frame-Rate-with-an-Event-Camera
Framework none
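
The EDI relation itself is a one-liner once the contrast threshold is known. Here is a toy sketch, assuming a given threshold c and event frames already binned onto the pixel grid; the real method recovers c by a one-dimensional non-convex search.

```python
# Toy Event-based Double Integral (EDI) sketch: a blurry frame equals
# the latent frame times the temporal average of exp(c * E(t)), where
# E(t) is the running integral of events, so the latent frame follows
# by division. The threshold c and the (T, H, W) event binning are
# assumptions for illustration.
import numpy as np

def latent_from_blur(blurry, events, c):
    """blurry: (H, W) frame; events: (T, H, W) signed event counts."""
    E = np.cumsum(events, axis=0) * c        # inner integral of events
    denom = np.exp(E).mean(axis=0)           # outer integral (time average)
    return blurry / np.maximum(denom, 1e-8)  # latent sharp-frame estimate
```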

KONG: Kernels for ordered-neighborhood graphs

Title KONG: Kernels for ordered-neighborhood graphs
Authors Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic
Abstract We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation of explicit graph feature maps using sketching techniques. We obtain precise bounds for the approximation accuracy and computational complexity of the proposed approaches and demonstrate their applicability on real datasets. In particular, our experiments demonstrate that neighborhood ordering results in more informative features. For the special case of general graphs, i.e. graphs without ordered neighborhoods, the new graph kernels yield efficient and simple algorithms for the comparison of label distributions between graphs.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10014v2
PDF http://arxiv.org/pdf/1805.10014v2.pdf
PWC https://paperswithcode.com/paper/kong-kernels-for-ordered-neighborhood-graphs
Repo https://github.com/kokiche/KONG
Framework none
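
To make the "ordered neighborhood as string" idea concrete, here is a small sketch that turns each node's creation-ordered neighbor labels into a string and hashes its k-grams into a fixed-width explicit feature map; k, the width, and the CRC hash are illustrative stand-ins for the paper's sketching techniques.

```python
# Sketch of KONG's feature-map idea: each node's ordered neighborhood
# becomes a label string, and hashed k-gram counts of those strings
# give an explicit graph feature vector. The CRC-based hashing stands
# in for the paper's more careful sketching with accuracy guarantees.
import zlib
import numpy as np

def ordered_neighborhood_features(adj, labels, k=2, width=64):
    """adj: dict node -> neighbors in creation order; labels: node -> str."""
    phi = np.zeros(width)
    for node, nbrs in adj.items():
        s = labels[node] + "".join(labels[v] for v in nbrs)
        for i in range(len(s) - k + 1):                       # all k-grams
            phi[zlib.crc32(s[i:i + k].encode()) % width] += 1
    return phi

g = {0: [1, 2], 1: [0], 2: [0, 1]}
lab = {0: "A", 1: "B", 2: "A"}
print(ordered_neighborhood_features(g, lab))
```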

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Title Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
Authors Nicholas Carlini, David Wagner
Abstract We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to DeepSpeech, Mozilla’s end-to-end implementation, and show it has a 100% success rate. The feasibility of this attack introduces a new domain in which to study adversarial examples.
Tasks Speech Recognition
Published 2018-01-05
URL http://arxiv.org/abs/1801.01944v2
PDF http://arxiv.org/pdf/1801.01944v2.pdf
PWC https://paperswithcode.com/paper/audio-adversarial-examples-targeted-attacks
Repo https://github.com/carlini/audio_adversarial_examples
Framework tf
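
The attack family is standard iterative optimization; a generic sketch is below. `model` is a placeholder returning per-frame log-probabilities, and the uniform distortion bound is a simplification; the paper's attack adds refinements (such as a shrinking distortion bound) not shown here.

```python
# Generic iterative targeted attack against a CTC speech-to-text model,
# in the spirit of Carlini & Wagner. `model` is a placeholder that maps
# a waveform to (T, 1, num_chars) log-probabilities; eps, lr, and steps
# are illustrative. This is a sketch, not the paper's exact attack.
import torch
import torch.nn.functional as F

def targeted_attack(model, audio, target, steps=1000, eps=0.05, lr=1e-3):
    """audio: (num_samples,) waveform; target: (1, S) character indices."""
    delta = torch.zeros_like(audio, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        log_probs = model(audio + delta)
        loss = F.ctc_loss(log_probs, target,
                          input_lengths=torch.tensor([log_probs.size(0)]),
                          target_lengths=torch.tensor([target.size(1)]))
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the perturbation small
    return (audio + delta).detach()
```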

Superpixel Sampling Networks

Title Superpixel Sampling Networks
Authors Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
Abstract Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks. Existing superpixel algorithms are not differentiable, making them difficult to integrate into otherwise end-to-end trainable deep neural networks. We develop a new differentiable model for superpixel sampling that leverages deep networks for learning superpixel segmentation. The resulting “Superpixel Sampling Network” (SSN) is end-to-end trainable, which allows learning task-specific superpixels with flexible loss functions and has fast runtime. Extensive experimental analysis indicates that SSNs not only outperform existing superpixel algorithms on traditional segmentation benchmarks, but can also learn superpixels for other tasks. In addition, SSNs can be easily integrated into downstream deep networks resulting in performance improvements.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10174v1
PDF http://arxiv.org/pdf/1807.10174v1.pdf
PWC https://paperswithcode.com/paper/superpixel-sampling-networks
Repo https://github.com/NVlabs/ssn_superpixels
Framework caffe2
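
The differentiable core is soft k-means over deep pixel features: replace hard nearest-centroid assignment with a softmax so gradients flow. A bare-bones sketch follows; SSN restricts each pixel to its nine surrounding superpixels for efficiency, which this toy version omits.

```python
# Soft pixel-to-superpixel assignment, the differentiable step at the
# heart of SSN: iterate softmax assignments and weighted centroid
# updates over (deep) pixel features. The temperature and iteration
# count are illustrative; SSN also limits candidates to 9 neighbors.
import torch

def soft_superpixels(feats, centers, iters=5, temp=1.0):
    """feats: (N, d) pixel features; centers: (K, d) initial centroids."""
    for _ in range(iters):
        d2 = torch.cdist(feats, centers) ** 2   # (N, K) squared distances
        q = torch.softmax(-d2 / temp, dim=1)    # soft assignments
        centers = (q.t() @ feats) / q.sum(0).unsqueeze(1).clamp_min(1e-8)
    return q, centers
```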

ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes

Title ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes
Authors Taihong Xiao, Jiapeng Hong, Jinwen Ma
Abstract Recent studies on face attribute transfer have achieved great success. A lot of models are able to transfer face attributes given an input image. However, they suffer from three limitations: (1) inability to generate images by exemplars; (2) being unable to transfer multiple face attributes simultaneously; (3) low quality of generated images, such as low resolution or artifacts. To address these limitations, we propose a novel model which receives two images of opposite attributes as inputs. Our model can transfer exactly the same type of attributes from one image to another by exchanging certain parts of their encodings. All the attributes are encoded in a disentangled manner in the latent space, which enables us to manipulate several attributes simultaneously. Besides, our model learns the residual images so as to facilitate training on higher-resolution images. With the help of multi-scale discriminators for adversarial training, it can even generate high-quality images with finer details and fewer artifacts. We demonstrate the effectiveness of our model in overcoming the above three limitations by comparing it with other methods on the CelebA face database. A PyTorch implementation is available at https://github.com/Prinsphield/ELEGANT.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1803.10562v2
PDF http://arxiv.org/pdf/1803.10562v2.pdf
PWC https://paperswithcode.com/paper/elegant-exchanging-latent-encodings-with-gan
Repo https://github.com/Prinsphield/ELEGANT
Framework pytorch
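
The exchange step reduces to swapping one attribute's slice of two latent codes. A toy sketch, with the encoder/decoder as placeholders and the slice layout an assumption:

```python
# Toy version of ELEGANT's latent exchange: encode two images with
# opposite values of an attribute, swap the latent slice reserved for
# that attribute, and decode. enc/dec are placeholder networks and
# attr_slice is an assumed layout; ELEGANT additionally predicts
# residual images rather than full images.
import torch

def swap_attribute(enc, dec, img_a, img_b, attr_slice):
    z_a, z_b = enc(img_a), enc(img_b)          # disentangled (B, D) codes
    z_a2, z_b2 = z_a.clone(), z_b.clone()
    z_a2[:, attr_slice] = z_b[:, attr_slice]   # exchange one attribute's part
    z_b2[:, attr_slice] = z_a[:, attr_slice]
    return dec(z_a2), dec(z_b2)                # attribute-transferred pair
```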

BodyNet: Volumetric Inference of 3D Human Body Shapes

Title BodyNet: Volumetric Inference of 3D Human Body Shapes
Authors Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid
Abstract Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
Tasks
Published 2018-04-13
URL http://arxiv.org/abs/1804.04875v3
PDF http://arxiv.org/pdf/1804.04875v3.pdf
PWC https://paperswithcode.com/paper/bodynet-volumetric-inference-of-3d-human-body
Repo https://github.com/gulvarol/bodynet
Framework pytorch
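
Two of the listed losses are simple to write down. Here is a hedged sketch of both, with an orthographic max-projection standing in for the paper's camera re-projection:

```python
# Sketch of BodyNet-style losses: per-voxel binary cross-entropy on the
# predicted occupancy grid, plus a silhouette loss on an orthographic
# projection of the prediction. The max-projection is a simplification
# of the paper's camera re-projection.
import torch
import torch.nn.functional as F

def volumetric_loss(pred_logits, gt_voxels):
    """pred_logits, gt_voxels: (B, D, H, W); gt is 0/1 occupancy (float)."""
    return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels)

def reprojection_loss(pred_logits, gt_silhouette, depth_axis=1):
    sil = torch.sigmoid(pred_logits).max(dim=depth_axis).values  # (B, H, W)
    return F.binary_cross_entropy(sil, gt_silhouette)
```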

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

Title On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses
Authors Anish Athalye, Nicholas Carlini
Abstract Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03286v1
PDF http://arxiv.org/pdf/1804.03286v1.pdf
PWC https://paperswithcode.com/paper/on-the-robustness-of-the-cvpr-2018-white-box
Repo https://github.com/anishathalye/Guided-Denoise
Framework tf

Hierarchical binary CNNs for landmark localization with limited resources

Title Hierarchical binary CNNs for landmark localization with limited resources
Authors Adrian Bulat, Georgios Tzimiropoulos
Abstract Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and more importantly propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. (e) We further provide additional results for the problem of facial part segmentation. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
Tasks Face Alignment, Pose Estimation
Published 2018-08-14
URL http://arxiv.org/abs/1808.04803v1
PDF http://arxiv.org/pdf/1808.04803v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-binary-cnns-for-landmark
Repo https://github.com/1996scarlet/OpenVtuber
Framework mxnet
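
The binarization primitive these blocks build on is short; here is the usual sign-plus-straight-through-estimator formulation (the paper's contribution is the hierarchical block architecture around it, not this primitive):

```python
# Standard binarization primitive for binary CNNs: sign() forward,
# straight-through gradient clipped to |x| <= 1 backward. The paper's
# hierarchical, parallel, multi-scale block is built around layers
# binarized this way.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped straight-through

w = torch.randn(8, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()   # gradients flow through sign()
```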

Neural Factor Graph Models for Cross-lingual Morphological Tagging

Title Neural Factor Graph Models for Cross-lingual Morphological Tagging
Authors Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig
Abstract Morphological analysis involves predicting the syntactic traits of a word (e.g. {POS: Noun, Case: Acc, Gender: Fem}). Previous work in morphological tagging improves performance for low-resource languages (LRLs) through cross-lingual training with a high-resource language (HRL) from the same family, but is limited by the strict, often false, assumption that tag sets exactly overlap between the HRL and LRL. In this paper we propose a method for cross-lingual morphological tagging that aims to improve information sharing between languages by relaxing this assumption. The proposed model uses factorial conditional random fields with neural network potentials, making it possible to (1) utilize the expressive power of neural network representations to smooth over superficial differences in the surface forms, (2) model pairwise and transitive relationships between tags, and (3) accurately generate tag sets that are unseen or rare in the training data. Experiments on four languages from the Universal Dependencies Treebank demonstrate superior tagging accuracies over existing cross-lingual approaches.
Tasks Morphological Analysis, Morphological Tagging
Published 2018-05-11
URL http://arxiv.org/abs/1805.04570v3
PDF http://arxiv.org/pdf/1805.04570v3.pdf
PWC https://paperswithcode.com/paper/neural-factor-graph-models-for-cross-lingual
Repo https://github.com/chaitanyamalaviya/NeuralFactorGraph
Framework pytorch

Incremental kernel PCA and the Nyström method

Title Incremental kernel PCA and the Nyström method
Authors Fredrik Hallgren, Paul Northrop
Abstract Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nyström approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nyström approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1802.00043v1
PDF http://arxiv.org/pdf/1802.00043v1.pdf
PWC https://paperswithcode.com/paper/incremental-kernel-pca-and-the-nystrom-method
Repo https://github.com/cfjhallgren/inkpca
Framework none
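
The rank-one eigendecomposition update underneath the algorithm can be sketched naively: with K = V diag(lam) V^T, the spectrum of K + sigma v v^T is that of a small diagonal-plus-rank-one core. The dense eigensolver below stands in for the faster machinery the paper develops, and the sketch assumes v lies in the span of V.

```python
# Naive rank-one eigendecomposition update, the building block behind
# incremental kernel PCA. With K = V @ diag(lam) @ V.T, the updated
# matrix K + sigma * outer(v, v) has the spectrum of the small core
# diag(lam) + sigma * outer(z, z) with z = V.T @ v. A dense eigensolver
# on the core stands in for the paper's more efficient approach, and v
# is assumed to lie in the span of V (true when V holds all eigenvectors).
import numpy as np

def rank_one_update(V, lam, v, sigma):
    z = V.T @ v
    core = np.diag(lam) + sigma * np.outer(z, z)
    lam_new, W = np.linalg.eigh(core)
    return V @ W, lam_new   # updated eigenvectors and eigenvalues
```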

Morphological analysis using a sequence decoder

Title Morphological analysis using a sequence decoder
Authors Ekin Akyürek, Erenay Dayanık, Deniz Yuret
Abstract We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than, e.g., an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research (see https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet and https://github.com/ai-ku/TrMor2018 for the new Turkish dataset).
Tasks Morphological Analysis, Transfer Learning
Published 2018-05-21
URL https://arxiv.org/abs/1805.07946v2
PDF https://arxiv.org/pdf/1805.07946v2.pdf
PWC https://paperswithcode.com/paper/morphnet-a-sequence-to-sequence-model-that
Repo https://github.com/ai-ku/Morse.jl
Framework none

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Title SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
Authors Alexander Mathews, Lexing Xie, Xuming He
Abstract Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work provides possibilities to learn richer image descriptions from the plethora of linguistic data available on the web.
Tasks Image Captioning, Language Modelling
Published 2018-05-18
URL http://arxiv.org/abs/1805.07030v1
PDF http://arxiv.org/pdf/1805.07030v1.pdf
PWC https://paperswithcode.com/paper/semstyle-learning-to-generate-stylised-image
Repo https://github.com/computationalmedia/semstyle
Framework pytorch

SNAS: Stochastic Neural Architecture Search

Title SNAS: Stochastic Neural Architecture Search
Authors Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin
Abstract We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in the same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with a locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes fewer epochs than non-differentiable evolution-based and reinforcement-learning-based NAS to find a cell architecture with state-of-the-art accuracy, and this architecture is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain validation accuracy during the search, an accuracy that attention-based NAS requires parameter retraining to match, exhibiting the potential to stride towards efficient NAS on big datasets. We have released our implementation at https://github.com/SNAS-Series/SNAS-Series.
Tasks Neural Architecture Search
Published 2018-12-24
URL https://arxiv.org/abs/1812.09926v3
PDF https://arxiv.org/pdf/1812.09926v3.pdf
PWC https://paperswithcode.com/paper/snas-stochastic-neural-architecture-search
Repo https://github.com/SNAS-Series/SNAS-Series
Framework pytorch
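
The sampling step that makes the search differentiable is small; below is a toy edge module using PyTorch's Gumbel-softmax (the candidate operations and temperature here are placeholders):

```python
# Toy SNAS-style edge: candidate operations mixed by a Gumbel-softmax
# sample over learned architecture logits, so architecture parameters
# train by ordinary back-propagation. Ops and temperature are
# illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SNASEdge(nn.Module):
    def __init__(self, ops, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.logits = nn.Parameter(torch.zeros(len(ops)))
        self.tau = tau

    def forward(self, x):
        w = F.gumbel_softmax(self.logits, tau=self.tau)  # one soft sample
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

edge = SNASEdge([nn.Identity(), nn.Conv2d(8, 8, 3, padding=1)])
out = edge(torch.randn(2, 8, 16, 16))
```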

Improving Cross-Lingual Word Embeddings by Meeting in the Middle

Title Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Authors Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Abstract Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces also improve by this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks.
Tasks Word Embeddings
Published 2018-08-27
URL http://arxiv.org/abs/1808.08780v1
PDF http://arxiv.org/pdf/1808.08780v1.pdf
PWC https://paperswithcode.com/paper/improving-cross-lingual-word-embeddings-by
Repo https://github.com/yeraidm/meemi
Framework pytorch
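
The refinement has a very direct reading: move each seed-dictionary pair toward its midpoint. The sketch below only constructs those midpoint targets; the paper then learns a second linear transformation toward them rather than snapping the dictionary words directly.

```python
# "Meeting in the middle" target construction: after the usual linear
# alignment, each translation pair's embeddings are pulled toward their
# average. The paper fits a second linear map toward these midpoints;
# this sketch applies the move directly to the dictionary words only.
import numpy as np

def meet_in_the_middle(src_emb, tgt_emb, pairs):
    """pairs: iterable of (src_index, tgt_index) from the seed dictionary."""
    src_new, tgt_new = src_emb.copy(), tgt_emb.copy()
    for i, j in pairs:
        mid = 0.5 * (src_emb[i] + tgt_emb[j])
        src_new[i], tgt_new[j] = mid, mid
    return src_new, tgt_new
```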