July 29, 2019

3185 words 15 mins read

Paper Group AWR 91

Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models. Data Selection Strategies for Multi-Domain Sentiment Analysis. Real-Time Machine Learning: The Missing Pieces. Style Transfer in Text: Exploration and Evaluation. Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering. …

Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models


Title	Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models
Authors	Aditya Grover, Manik Dhar, Stefano Ermon
Abstract	Adversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density, which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples.
Tasks
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08868v2
PDF	http://arxiv.org/pdf/1705.08868v2.pdf
PWC	https://paperswithcode.com/paper/flow-gan-combining-maximum-likelihood-and
Repo	https://github.com/ermongroup/flow-gan
Framework	tf
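Flow-GAN gets exact likelihoods by using an invertible generator, so the change-of-variables formula applies. The sketch below illustrates that idea with a one-dimensional affine flow and a standard normal prior, plus a hybrid objective of the form "adversarial loss + lam * NLL". The class `AffineFlow`, the function `hybrid_loss`, and the weight `lam` are hypothetical names for illustration; this is not the authors' code, which uses a much richer invertible network.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the standard normal prior p_z."""
    return -0.5 * (z * z + math.log(2 * math.pi))


class AffineFlow:
    """Toy invertible generator x = exp(log_scale) * z + shift (hypothetical stand-in for a real flow)."""

    def __init__(self, log_scale: float, shift: float):
        self.log_scale = log_scale
        self.shift = shift

    def generate(self, z: float) -> float:
        return math.exp(self.log_scale) * z + self.shift

    def inverse(self, x: float) -> float:
        return (x - self.shift) * math.exp(-self.log_scale)

    def log_likelihood(self, x: float) -> float:
        # Change of variables: log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1}(x) / dx|,
        # and d f^{-1}(x) / dx = exp(-log_scale) for this affine map.
        return standard_normal_logpdf(self.inverse(x)) - self.log_scale


def hybrid_loss(adversarial_loss: float, data, flow: AffineFlow, lam: float) -> float:
    """Hypothetical hybrid objective: adversarial term plus lam times the mean negative log-likelihood."""
    nll = -sum(flow.log_likelihood(x) for x in data) / len(data)
    return adversarial_loss + lam * nll
```

Setting `lam = 0` recovers pure adversarial training and a very large `lam` approaches pure maximum likelihood, which is the trade-off the abstract describes.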

Data Selection Strategies for Multi-Domain Sentiment Analysis


Title	Data Selection Strategies for Multi-Domain Sentiment Analysis
Authors	Sebastian Ruder, Parsa Ghaffari, John G. Breslin
Abstract	Domain adaptation is important in sentiment analysis as sentiment-indicating words vary between domains. Recently, multi-domain adaptation has become more pervasive, but existing approaches train on all available source domains including dissimilar ones. However, the selection of appropriate training data is as important as the choice of algorithm. We undertake – to our knowledge for the first time – an extensive study of domain similarity metrics in the context of sentiment analysis and propose novel representations, metrics, and a new scope for data selection. We evaluate the proposed methods on two large-scale multi-domain adaptation settings on tweets and reviews and demonstrate that they consistently outperform strong random and balanced baselines, while our proposed selection strategy outperforms instance-level selection and yields the best score on a large reviews corpus.
Tasks	Domain Adaptation, Sentiment Analysis
Published	2017-02-08
URL	http://arxiv.org/abs/1702.02426v1
PDF	http://arxiv.org/pdf/1702.02426v1.pdf
PWC	https://paperswithcode.com/paper/data-selection-strategies-for-multi-domain
Repo	https://github.com/andy-yangz/writing_style_transfer
Framework	none
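To make "domain similarity metric" concrete, here is a minimal sketch of one common choice, Jensen-Shannon divergence between the term distributions of two domains, used to rank source domains by closeness to a target domain. Treating JSD over whitespace tokens as the metric, and the helper names `term_distribution`, `js_divergence` and `select_domains`, are our assumptions for illustration; the paper studies several representations and metrics beyond this one.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative frequency of each lower-cased whitespace token across a domain's documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, so 0 <= JSD <= 1 (0 means identical distributions)."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def select_domains(target_docs, source_domains, k):
    """Return the names of the k source domains whose term distributions are closest to the target."""
    target = term_distribution(target_docs)
    ranked = sorted(
        source_domains,
        key=lambda name: js_divergence(target, term_distribution(source_domains[name])),
    )
    return ranked[:k]
```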

Real-Time Machine Learning: The Missing Pieces


Title	Real-Time Machine Learning: The Missing Pieces
Authors	Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica
Abstract	Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.
Tasks	Decision Making
Published	2017-03-11
URL	http://arxiv.org/abs/1703.03924v2
PDF	http://arxiv.org/pdf/1703.03924v2.pdf
PWC	https://paperswithcode.com/paper/real-time-machine-learning-the-missing-pieces
Repo	https://github.com/richardliaw/ray
Framework	tf

Style Transfer in Text: Exploration and Evaluation


Title	Style Transfer in Text: Exploration and Evaluation
Authors	Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, Rui Yan
Abstract	Style transfer is an important problem in natural language processing (NLP). However, progress in language style transfer lags behind other domains, such as computer vision, mainly because of the lack of parallel data and principled evaluation metrics. In this paper, we propose to learn style transfer with non-parallel data. We explore two models to achieve this goal, and the key idea behind the proposed models is to learn separate content representations and style representations using adversarial networks. We also propose novel evaluation metrics which measure two aspects of style transfer: transfer strength and content preservation. We assess our models and the evaluation metrics on two tasks: paper-news title transfer, and positive-negative review transfer. Results show that the proposed content preservation metric is highly correlated with human judgments, and the proposed models are able to generate sentences with higher style transfer strength and a similar content preservation score compared to an auto-encoder.
Tasks	Style Transfer, Text Style Transfer
Published	2017-11-18
URL	http://arxiv.org/abs/1711.06861v2
PDF	http://arxiv.org/pdf/1711.06861v2.pdf
PWC	https://paperswithcode.com/paper/style-transfer-in-text-exploration-and
Repo	https://github.com/fuzhenxin/text_style_transfer
Framework	none
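The content preservation metric compares a source sentence with its transferred version in an embedding space. A minimal sketch, assuming (as we understand the paper) that each sentence embedding concatenates the element-wise min, max and mean of its word vectors and that the score is their cosine similarity; the toy `vectors` table stands in for pretrained word embeddings and is hypothetical. Transfer strength, the other metric, is a classifier's accuracy at recognising the target style and is not shown.

```python
import math


def sentence_embedding(tokens, vectors):
    """Concatenate element-wise min, max and mean of the word vectors (unknown words are skipped)."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    if not vecs:
        raise ValueError("no known words in sentence")
    dims = range(len(vecs[0]))
    mins = [min(v[i] for v in vecs) for i in dims]
    maxs = [max(v[i] for v in vecs) for i in dims]
    means = [sum(v[i] for v in vecs) / len(vecs) for i in dims]
    return mins + maxs + means


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def content_preservation(source, transferred, vectors):
    """Cosine similarity between the source and transferred sentence embeddings."""
    return cosine(sentence_embedding(source, vectors), sentence_embedding(transferred, vectors))
```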

Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering


Title	Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering
Authors	Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, Dacheng Tao
Abstract	Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multi-modal feature fusion that is able to capture the complex interactions between multi-modal features; 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a 'co-attention' mechanism is developed by using a deep neural network architecture to jointly learn the attentions for both the image and the question, which can allow us to reduce the irrelevant features effectively and obtain more discriminative features for image and question representations. For multi-modal feature fusion, a generalized Multi-modal Factorized High-order pooling approach (MFH) is developed to achieve more effective fusion of multi-modal features by exploiting their correlations sufficiently, which can further result in superior VQA performance as compared with the state-of-the-art approaches. For answer prediction, the KL (Kullback-Leibler) divergence is used as the loss function to achieve precise characterization of the complex correlations between multiple diverse answers with the same or similar meaning, which can allow us to achieve faster convergence rate and obtain slightly better accuracy on answer prediction. A deep neural network architecture is designed to integrate all these aforementioned modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve state-of-the-art performance on the large-scale VQA datasets and finish as runner-up in the VQA Challenge 2017.
Tasks	Question Answering, Visual Question Answering
Published	2017-08-10
URL	https://arxiv.org/abs/1708.03619v2
PDF	https://arxiv.org/pdf/1708.03619v2.pdf
PWC	https://paperswithcode.com/paper/beyond-bilinear-generalized-multi-modal
Repo	https://github.com/yuzcccc/vqa-mfb
Framework	caffe2
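The MFH fusion step can be sketched in plain Python. Each block projects the image feature `x` and question feature `y` to `k * o` dimensions, multiplies them element-wise (MFB), and later blocks also multiply in the previous block's expanded product, which is what makes the pooling "high-order". Each block's output is sum-pooled over windows of size `k`, power-normalised and L2-normalised, and the block outputs are concatenated. This is our reading of the method; dropout is omitted, and the function names and list-of-rows matrix format are assumptions for illustration.

```python
import math


def matvec(W, v):
    """W is a list of rows; returns the matrix-vector product W @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sum_pool(v, k):
    """Sum non-overlapping windows of size k (k * o values -> o values)."""
    return [sum(v[i:i + k]) for i in range(0, len(v), k)]


def normalize(z):
    z = [math.copysign(math.sqrt(abs(x)), x) for x in z]  # power normalisation
    norm = math.sqrt(sum(x * x for x in z)) or 1.0          # L2 normalisation (guard against zero)
    return [x / norm for x in z]


def mfh(x, y, blocks, k):
    """blocks: list of (U, V) projection pairs, each mapping its input to k * o dimensions."""
    outputs, prev = [], None
    for U, V in blocks:
        expanded = [a * b for a, b in zip(matvec(U, x), matvec(V, y))]  # MFB expand step
        if prev is not None:
            expanded = [a * b for a, b in zip(expanded, prev)]          # cascade: higher order
        prev = expanded
        outputs.extend(normalize(sum_pool(expanded, k)))                # squeeze step
    return outputs
```

With a single block this reduces to MFB, the second-order special case.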

Kernel method for persistence diagrams via kernel embedding and weight factor


Title	Kernel method for persistence diagrams via kernel embedding and weight factor
Authors	Genki Kusano, Kenji Fukumizu, Yasuaki Hiraoka
Abstract	Topological data analysis is an emerging mathematical concept for characterizing shapes in multi-scale data. In this field, persistence diagrams are widely used as a descriptor of the input data, and can distinguish robust and noisy topological properties. Nowadays, it is highly desirable to develop a statistical framework on persistence diagrams to deal with practical data. This paper proposes a kernel method on persistence diagrams. A theoretical contribution of our method is that the proposed kernel allows one to control the effect of persistence, and, if necessary, noisy topological properties can be discounted in data analysis. Furthermore, the method provides a fast approximation technique. The method is applied to several problems, including practical data in physics, and the results show its advantage over the existing kernel method on persistence diagrams.
Tasks	Graph Classification, Topological Data Analysis
Published	2017-06-12
URL	http://arxiv.org/abs/1706.03472v1
PDF	http://arxiv.org/pdf/1706.03472v1.pdf
PWC	https://paperswithcode.com/paper/kernel-method-for-persistence-diagrams-via
Repo	https://github.com/genki-kusano/python-pwgk
Framework	none
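"Controlling the effect of persistence" can be illustrated with a weighted Gaussian kernel between two diagrams: every (birth, death) point is weighted by a function of its persistence, so points near the diagonal (likely noise) contribute little. This sketch computes the linear kernel between the two weighted kernel embeddings, assuming an arctangent weight `atan(C * (death - birth) ** p)`; the parameters `sigma`, `C` and `p` are illustrative, and the fast approximation mentioned in the abstract is not shown.

```python
import math


def weight(point, C=1.0, p=1.0):
    """Persistence weight: zero on the diagonal, increasing and bounded for long-lived features."""
    birth, death = point
    return math.atan(C * (death - birth) ** p)


def gaussian(a, b, sigma):
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


def pwgk_linear(D, E, sigma=1.0, C=1.0, p=1.0):
    """Inner product of the weighted kernel embeddings of diagrams D and E (lists of (birth, death))."""
    return sum(
        weight(x, C, p) * weight(y, C, p) * gaussian(x, y, sigma)
        for x in D
        for y in E
    )
```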

End-to-end Learning of Image based Lane-Change Decision


Title	End-to-end Learning of Image based Lane-Change Decision
Authors	Seong-Gyun Jeong, Jiwon Kim, Sujung Kim, Jaesik Min
Abstract	We propose an image based end-to-end learning framework that helps lane-change decisions for human drivers and autonomous vehicles. The proposed system, Safe Lane-Change Aid Network (SLCAN), trains a deep convolutional neural network to classify the status of adjacent lanes from rear view images acquired by cameras mounted on both sides of the vehicle. Rather than depending on any explicit object detection or tracking scheme, SLCAN reads the whole input image and directly decides whether initiating a lane change at that moment is safe or not. We collected and annotated 77,273 rear side view images to train and test SLCAN. Experimental results show that the proposed framework achieves 96.98% classification accuracy even though the test images are from unseen roadways. We also visualize the saliency map to understand which parts of the image SLCAN looks at when making correct decisions.
Tasks	Autonomous Vehicles, Object Detection
Published	2017-06-26
URL	http://arxiv.org/abs/1706.08211v1
PDF	http://arxiv.org/pdf/1706.08211v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-learning-of-image-based-lane
Repo	https://github.com/jsgyun/SLCAN
Framework	none

Chessboard and chess piece recognition with the support of neural networks


Title	Chessboard and chess piece recognition with the support of neural networks
Authors	Maciej A. Czyzewski, Artur Laskowski, Szymon Wasik
Abstract	Chessboard and chess piece recognition is a computer vision problem that has not yet been efficiently solved. However, its solution is crucial for many experienced players who wish to compete against AI bots, but also prefer to make decisions based on the analysis of a physical chessboard. It is also important for organizers of chess tournaments who wish to digitize play for online broadcasting or ordinary players who wish to share their gameplay with friends. Typically, such digitization tasks are performed by humans or with the aid of specialized chessboards and pieces. However, neither solution is easy or convenient. To solve this problem, we propose a novel algorithm for digitizing chessboard configurations. We designed a method that is resistant to lighting conditions and the angle at which images are captured, and works correctly with numerous chessboard styles. The proposed algorithm processes pictures iteratively. During each iteration, it executes three major sub-processes: detecting straight lines, finding lattice points, and positioning the chessboard. Finally, we identify all chess pieces and generate a description of the board utilizing standard notation. For each of these steps, we designed our own algorithm that surpasses existing solutions. We support our algorithms by utilizing machine learning techniques whenever possible. The described method performs extraordinarily well and achieves an accuracy over 99.5% for detecting chessboard lattice points (compared to 74% for the best alternative), 95% (compared to 60% for the best alternative) for positioning the chessboard in an image, and almost 95% for chess piece recognition.
Tasks	Object Recognition
Published	2017-08-13
URL	http://arxiv.org/abs/1708.03898v2
PDF	http://arxiv.org/pdf/1708.03898v2.pdf
PWC	https://paperswithcode.com/paper/chessboard-and-chess-piece-recognition-with
Repo	https://github.com/maciejczyzewski/kck2019
Framework	none
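The final step, describing the recognised board in standard notation, is easy to make concrete. Assuming the notation is the piece-placement field of Forsyth-Edwards Notation (FEN), this sketch turns an 8x8 grid of recognised pieces into that string; the grid layout (rank 8 first, uppercase for white, `None` for empty) is our assumption, not the paper's data format.

```python
def board_to_fen(board):
    """board: 8 rows ordered from rank 8 down to rank 1, each with 8 squares (files a to h).
    A square is a piece letter (uppercase = white, lowercase = black) or None if empty.
    Returns the FEN piece-placement field."""
    ranks = []
    for row in board:
        out, empty = [], 0
        for square in row:
            if square is None:
                empty += 1                 # count consecutive empty squares
            else:
                if empty:
                    out.append(str(empty))
                    empty = 0
                out.append(square)
        if empty:
            out.append(str(empty))         # trailing empty squares on this rank
        ranks.append("".join(out))
    return "/".join(ranks)
```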

Scalable Multi-Domain Dialogue State Tracking


Title	Scalable Multi-Domain Dialogue State Tracking
Authors	Abhinav Rastogi, Dilek Hakkani-Tur, Larry Heck
Abstract	Dialogue state tracking (DST) is a key component of task-oriented dialogue systems. DST estimates the user's goal at each user turn given the interaction until then. State-of-the-art approaches for state tracking rely on deep learning methods, and represent dialogue state as a distribution over all possible slot values for each slot present in the ontology. Such a representation is not scalable when the set of possible values is unbounded (e.g., date, time or location) or dynamic (e.g., movies or usernames). Furthermore, training of such models requires labeled data, where each user turn is annotated with the dialogue state, which makes building models for new domains challenging. In this paper, we present a scalable multi-domain deep learning based approach for DST. We introduce a novel framework for state tracking which is independent of the slot value set, and represent the dialogue state as a distribution over a set of values of interest (candidate set) derived from the dialogue history or knowledge. Restricting these candidate sets to be bounded in size addresses the problem of slot-scalability. Furthermore, by leveraging the slot-independent architecture and transfer learning, we show that our proposed approach facilitates quick adaptation to new domains.
Tasks	Dialogue State Tracking, Task-Oriented Dialogue Systems, Transfer Learning
Published	2017-12-29
URL	http://arxiv.org/abs/1712.10224v2
PDF	http://arxiv.org/pdf/1712.10224v2.pdf
PWC	https://paperswithcode.com/paper/scalable-multi-domain-dialogue-state-tracking
Repo	https://github.com/google-research-datasets/simulated-dialogue
Framework	none
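A bounded candidate set can be sketched as follows: values mentioned in the current turn are merged into the slot's candidate list, and if the list exceeds `max_size`, older values with the lowest previous-turn scores are evicted. The model then outputs a distribution over the candidates plus special `null` and `dontcare` values. The eviction rule, the special labels, and all function names are assumptions for illustration; the scoring network itself is omitted and replaced by given logits.

```python
import math


def update_candidates(candidates, prev_scores, new_values, max_size):
    """Merge newly mentioned values into a slot's candidate list, keeping at most max_size values."""
    merged = list(candidates)
    for v in new_values:
        if v not in merged:
            merged.append(v)
    if len(merged) <= max_size:
        return merged

    # Newly mentioned values are always kept; older ones are ranked by last-turn score.
    def priority(v):
        return math.inf if v in new_values else prev_scores.get(v, 0.0)

    keep = set(sorted(merged, key=priority, reverse=True)[:max_size])
    return [v for v in merged if v in keep]  # preserve the original order


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def slot_distribution(candidates, logits):
    """Distribution over the special values 'null' and 'dontcare' plus the bounded candidate set."""
    labels = ["null", "dontcare"] + list(candidates)
    if len(logits) != len(labels):
        raise ValueError("need one logit per label")
    return dict(zip(labels, softmax(logits)))
```

Because the output layer only ever scores `max_size + 2` labels, its size does not grow with the number of possible slot values.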

Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks


Title	Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks
Authors	Tom Veniat, Ludovic Denoyer
Abstract	We propose to focus on the problem of discovering neural network architectures efficient in terms of both prediction quality and cost. For instance, our approach is able to solve the following tasks: learn a neural network able to predict well in less than 100 milliseconds or learn an efficient model that fits in 50 Mb of memory. Our contribution is a novel family of models called Budgeted Super Networks (BSN). They are learned using gradient descent techniques applied to a budgeted learning objective function which integrates a maximum authorized cost, while making no assumption on the nature of this cost. We present a set of experiments on computer vision problems and analyze the ability of our technique to deal with three different costs: the computation cost, the memory consumption cost and a distributed computation cost. We particularly show that our model can discover neural network architectures that have a better accuracy than the ResNet and Convolutional Neural Fabrics architectures on CIFAR-10 and CIFAR-100, at a lower cost.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1706.00046v4
PDF	http://arxiv.org/pdf/1706.00046v4.pdf
PWC	https://paperswithcode.com/paper/learning-timememory-efficient-deep
Repo	https://github.com/TomVeniat/bsn
Framework	pytorch
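One way to read "a budgeted learning objective which integrates a maximum authorized cost" is a task loss plus a hinge penalty that only activates once the cost exceeds the budget. The sketch below applies that penalty to a few fixed candidate architectures. The hinge form, the weight `lam`, and the candidate numbers are assumptions for illustration; BSN itself learns a distribution over sub-networks of a super network by gradient-based training rather than enumerating candidates.

```python
def budgeted_objective(loss, cost, budget, lam):
    """Task loss plus a hinge penalty that is non-zero only when cost exceeds the budget."""
    return loss + lam * max(0.0, cost - budget)


def pick_architecture(candidates, budget, lam):
    """candidates: dict name -> (validation_loss, cost). Returns the name minimising the budgeted objective."""
    return min(candidates, key=lambda name: budgeted_objective(*candidates[name], budget, lam))
```

For example, with hypothetical candidates `{"big": (0.20, 150.0), "small": (0.25, 90.0)}`, a 100 ms budget and `lam = 0.01`, the over-budget "big" network scores 0.70 and "small" is chosen; with `lam = 0` the budget is ignored and "big" wins.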

3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition


Title	3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
Authors	Amirsina Torfi, Seyed Mehdi Iranmanesh, Nasser M. Nasrabadi, Jeremy Dawson
Abstract	Audio-visual recognition (AVR) has been considered a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information. The essential problem is to find the correspondence between the audio and visual streams, which is the goal of this work. We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features. The proposed architecture incorporates both spatial and temporal information jointly to effectively find the correlation between temporal information for different modalities. By using a relatively small network architecture and a much smaller training dataset, our proposed method surpasses the performance of the existing similar methods for audio-visual matching which use 3D CNNs for feature representation. We also demonstrate that an effective pair selection method can significantly increase the performance. The proposed method achieves relative improvements of over 20% on the Equal Error Rate (EER) and over 7% on the Average Precision (AP) in comparison to the state-of-the-art method.
Tasks	Speaker Verification, Speech Recognition
Published	2017-06-18
URL	http://arxiv.org/abs/1706.05739v5
PDF	http://arxiv.org/pdf/1706.05739v5.pdf
PWC	https://paperswithcode.com/paper/3d-convolutional-neural-networks-for-cross
Repo	https://github.com/astorfi/lip-reading-deeplearning
Framework	tf
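The Equal Error Rate reported above is the operating point where the false acceptance rate (impostor pairs scored as matches) equals the false rejection rate (genuine pairs scored as non-matches). A minimal sketch that sweeps every observed score as a threshold and returns the mean of FAR and FRR where they are closest; this is a simple approximation, not necessarily the evaluation code used in the paper.

```python
def equal_error_rate(scores, labels):
    """scores: higher means 'genuine match'; labels: 1 for genuine pairs, 0 for impostor pairs."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both genuine and impostor pairs")
    best = None
    for t in sorted(set(scores)):
        # FAR: impostors accepted at threshold t; FRR: genuine pairs rejected at threshold t.
        far = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t) / neg
        frr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t) / pos
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```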

Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball


Title	Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball
Authors	Scott Powers, Trevor Hastie, Robert Tibshirani
Abstract	We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the response categories to make better predictions. We apply our method, nuclear penalized multinomial regression (NPMR), to Major League Baseball play-by-play data to predict outcome probabilities based on batter-pitcher matchups. The interpretation of the results meshes well with subject-area expertise and also suggests a novel understanding of what differentiates players.
Tasks
Published	2017-06-30
URL	http://arxiv.org/abs/1706.10272v1
PDF	http://arxiv.org/pdf/1706.10272v1.pdf
PWC	https://paperswithcode.com/paper/nuclear-penalized-multinomial-regression-with
Repo	https://github.com/saberpowers/npmr
Framework	none

Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification


Title	Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification
Authors	Bikash Joshi, Massih-Reza Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov
Abstract	We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.
Tasks	Text Classification
Published	2017-01-23
URL	http://arxiv.org/abs/1701.06511v3
PDF	http://arxiv.org/pdf/1701.06511v3.pdf
PWC	https://paperswithcode.com/paper/aggressive-sampling-for-multi-class-to-binary
Repo	https://github.com/bikash617/Aggressive-Sampling-for-Multi-class-to-BinaryReduction
Framework	none
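The double sampling idea can be sketched as two passes over the training set: first keep each example with a probability inversely proportional to its class frequency (flattening the long tail), then pair each kept example with a few sampled negative classes to form binary "prefer the true class over this one" instances. The keep probability `min_freq / freq[y]` and uniform negative sampling are our simplifying assumptions; the paper's actual sampling distributions may differ, and the binary learner is not shown.

```python
import random
from collections import Counter


def double_sample(data, num_neg, rng):
    """data: list of (x, y) pairs. Returns binary-reduction triples (x, positive_class, negative_class)."""
    freq = Counter(y for _, y in data)
    classes = sorted(freq)
    min_freq = min(freq.values())
    triples = []
    for x, y in data:
        # First sampling: rare classes are always kept, frequent classes are subsampled.
        if rng.random() >= min_freq / freq[y]:
            continue
        # Second sampling: draw a few negative classes instead of using all of them.
        others = [c for c in classes if c != y]
        for neg in rng.sample(others, min(num_neg, len(others))):
            triples.append((x, y, neg))
    return triples
```

A binary scorer is then trained so that `score(x, positive_class) > score(x, negative_class)` for each triple, which is the pairwise form of the multi-class to binary reduction.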

XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings


Title	XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Authors	Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy
Abstract	Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN (“Cross-GAN”), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset, CartoonSet, we collected for this purpose is publicly available at google.github.io/cartoonset/ as a new benchmark for semantic style transfer.
Tasks	Domain Adaptation, Image-to-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation
Published	2017-11-14
URL	http://arxiv.org/abs/1711.05139v6
PDF	http://arxiv.org/pdf/1711.05139v6.pdf
PWC	https://paperswithcode.com/paper/xgan-unsupervised-image-to-image-translation
Repo	https://github.com/CS2470FinalProject/X-GAN
Framework	tf
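The semantic consistency loss asks that an image keep the same embedding after translation: embed `x` with the domain-1 encoder, decode it into domain 2, re-embed the result with the domain-2 encoder, and penalise the distance between the two embeddings. In this sketch the encoders and decoder are arbitrary callables and the L1 distance is our assumption; the adversarial, reconstruction and domain-adversarial terms of the full XGAN objective are omitted.

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def semantic_consistency_loss(x, encode_1, decode_2, encode_2):
    """Distance between the embedding of x and the embedding of its domain-2 translation."""
    z = encode_1(x)                # shared embedding of the domain-1 input
    x_translated = decode_2(z)     # translation 1 -> 2 goes through the shared embedding
    return l1(z, encode_2(x_translated))
```

A symmetric term for the 2 -> 1 direction is formed the same way with the encoders and decoder swapped.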

Dimensionality Reduction using Similarity-induced Embeddings


Title	Dimensionality Reduction using Similarity-induced Embeddings
Authors	Nikolaos Passalis, Anastasios Tefas
Abstract	The vast majority of Dimensionality Reduction (DR) techniques rely on second-order statistics to define their optimization objective. Even though this provides adequate results in most cases, it comes with several shortcomings. These methods require carefully designed regularizers and they are usually prone to outliers. In this work, a new DR framework, that can directly model the target distribution using the notion of similarity instead of distance, is introduced. The proposed framework, called Similarity Embedding Framework, can overcome the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to existing DR techniques. Deriving a new DR technique using the Similarity Embedding Framework becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised dimensionality reduction and providing out-of-sample extensions, as well as novel techniques, such as providing fast linear embeddings for complex techniques, are demonstrated in this paper using the proposed framework. Six datasets from a diverse range of domains are used to evaluate the proposed method and it is demonstrated that it can outperform many existing DR techniques.
Tasks	Dimensionality Reduction
Published	2017-06-18
URL	http://arxiv.org/abs/1706.05692v3
PDF	http://arxiv.org/pdf/1706.05692v3.pdf
PWC	https://paperswithcode.com/paper/dimensionality-reduction-using-similarity
Repo	https://github.com/passalis/sef
Framework	pytorch
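"Choosing an appropriate target similarity matrix" can be made concrete for supervised DR: the target says two samples are similar (1) if they share a label and dissimilar (0) otherwise, and a linear projection `W` is scored by how closely the similarity matrix of the projected data matches that target. This sketch uses a Gaussian similarity and a squared Frobenius-norm objective as assumptions (the paper's exact similarity function and weighting may differ), and it only evaluates the objective instead of optimising `W`.

```python
import math


def project(X, W):
    """X: n rows of d features; W: d rows of m weights. Returns the n x m projection X @ W."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]


def similarity_matrix(Y, sigma=1.0):
    """Gaussian similarity between every pair of projected samples."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(yi, yj)) / sigma ** 2) for yj in Y]
        for yi in Y
    ]


def supervised_target(labels):
    """Target similarity: 1 for pairs with the same label, 0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def sef_objective(X, W, target, sigma=1.0):
    """Squared Frobenius distance between the projected similarity matrix and the target."""
    S = similarity_matrix(project(X, W), sigma)
    return sum((s - t) ** 2 for srow, trow in zip(S, target) for s, t in zip(srow, trow))
```

Swapping `supervised_target` for a different target matrix (for example, similarities computed by another DR method) is how the framework expresses other tasks.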