Paper Group AWR 91
Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models
Title | Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models |
Authors | Aditya Grover, Manik Dhar, Stefano Ermon |
Abstract | Adversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08868v2 |
http://arxiv.org/pdf/1705.08868v2.pdf | |
PWC | https://paperswithcode.com/paper/flow-gan-combining-maximum-likelihood-and |
Repo | https://github.com/ermongroup/flow-gan |
Framework | tf |
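The abstract's central point, that a generator built from an invertible map admits exact likelihood evaluation, can be illustrated with a toy one-dimensional flow. The sketch below is not the authors' code: the affine map, the standard-normal prior, the function names, and the weight `lam` are all assumptions chosen for illustration, and the hybrid loss shown is one plausible way to combine the two training signals.

```python
import math


def affine_flow_log_likelihood(x, scale, shift):
    """Exact log-density of x under x = scale * z + shift, with z ~ N(0, 1).

    Uses the change-of-variables formula:
    log p_X(x) = log p_Z(f^{-1}(x)) + log |d f^{-1}(x) / dx|.
    """
    if scale == 0:
        raise ValueError("scale must be non-zero for the map to be invertible")
    z = (x - shift) / scale                               # inverse map
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))   # log N(z; 0, 1)
    log_det = -math.log(abs(scale))                       # log-Jacobian of the inverse
    return log_base + log_det


def hybrid_objective(adversarial_loss, log_likelihoods, lam):
    """Hypothetical generator loss: adversarial term minus lam * mean log-likelihood."""
    mean_ll = sum(log_likelihoods) / len(log_likelihoods)
    return adversarial_loss - lam * mean_ll
```

Setting `lam` to zero recovers purely adversarial training in this sketch, while a large `lam` makes the likelihood term dominate.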
Data Selection Strategies for Multi-Domain Sentiment Analysis
Title | Data Selection Strategies for Multi-Domain Sentiment Analysis |
Authors | Sebastian Ruder, Parsa Ghaffari, John G. Breslin |
Abstract | Domain adaptation is important in sentiment analysis as sentiment-indicating words vary between domains. Recently, multi-domain adaptation has become more pervasive, but existing approaches train on all available source domains including dissimilar ones. However, the selection of appropriate training data is as important as the choice of algorithm. We undertake – to our knowledge for the first time – an extensive study of domain similarity metrics in the context of sentiment analysis and propose novel representations, metrics, and a new scope for data selection. We evaluate the proposed methods on two large-scale multi-domain adaptation settings on tweets and reviews and demonstrate that they consistently outperform strong random and balanced baselines, while our proposed selection strategy outperforms instance-level selection and yields the best score on a large reviews corpus. |
Tasks | Domain Adaptation, Sentiment Analysis |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02426v1 |
http://arxiv.org/pdf/1702.02426v1.pdf | |
PWC | https://paperswithcode.com/paper/data-selection-strategies-for-multi-domain |
Repo | https://github.com/andy-yangz/writing_style_transfer |
Framework | none |
Real-Time Machine Learning: The Missing Pieces
Title | Real-Time Machine Learning: The Missing Pieces |
Authors | Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica |
Abstract | Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application. |
Tasks | Decision Making |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03924v2 |
http://arxiv.org/pdf/1703.03924v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-machine-learning-the-missing-pieces |
Repo | https://github.com/richardliaw/ray |
Framework | tf |
Style Transfer in Text: Exploration and Evaluation
Title | Style Transfer in Text: Exploration and Evaluation |
Authors | Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, Rui Yan |
Abstract | Style transfer is an important problem in natural language processing (NLP). However, progress in language style transfer lags behind other domains, such as computer vision, mainly because of the lack of parallel data and principled evaluation metrics. In this paper, we propose to learn style transfer with non-parallel data. We explore two models to achieve this goal, and the key idea behind the proposed models is to learn separate content representations and style representations using adversarial networks. We also propose novel evaluation metrics which measure two aspects of style transfer: transfer strength and content preservation. We assess our models and the evaluation metrics on two tasks: paper-news title transfer, and positive-negative review transfer. Results show that the proposed content preservation metric is highly correlated with human judgments, and the proposed models are able to generate sentences with higher style transfer strength and similar content preservation scores compared to an auto-encoder. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06861v2 |
http://arxiv.org/pdf/1711.06861v2.pdf | |
PWC | https://paperswithcode.com/paper/style-transfer-in-text-exploration-and |
Repo | https://github.com/fuzhenxin/text_style_transfer |
Framework | none |
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering
Title | Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering |
Authors | Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, Dacheng Tao |
Abstract | Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multi-modal feature fusion that is able to capture the complex interactions between multi-modal features; 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a ‘co-attention’ mechanism is developed by using a deep neural network architecture to jointly learn the attentions for both the image and the question, which allows us to reduce the irrelevant features effectively and obtain more discriminative features for image and question representations. For multi-modal feature fusion, a generalized Multi-modal Factorized High-order pooling approach (MFH) is developed to achieve more effective fusion of multi-modal features by exploiting their correlations sufficiently, which can further result in superior VQA performance as compared with the state-of-the-art approaches. For answer prediction, the KL (Kullback-Leibler) divergence is used as the loss function to achieve precise characterization of the complex correlations between multiple diverse answers with the same or similar meaning, which allows us to achieve a faster convergence rate and obtain slightly better accuracy on answer prediction. A deep neural network architecture is designed to integrate all these aforementioned modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve state-of-the-art performance on the large-scale VQA datasets and were the runner-up in the VQA Challenge 2017. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-08-10 |
URL | https://arxiv.org/abs/1708.03619v2 |
https://arxiv.org/pdf/1708.03619v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-bilinear-generalized-multi-modal |
Repo | https://github.com/yuzcccc/vqa-mfb |
Framework | caffe2 |
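The abstract states that KL divergence is used as the answer-prediction loss so that several acceptable answers to the same question can share probability mass. A minimal sketch of that idea follows; it assumes the target distribution is obtained by normalizing annotator answer counts, which the abstract does not specify, and the function names and the `eps` guard are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def kl_answer_loss(answer_counts, logits, eps=1e-12):
    """KL(target || predicted) over a fixed answer vocabulary.

    answer_counts: how many annotators gave each candidate answer (assumed input).
    logits: unnormalized model scores for the same candidate answers.
    """
    counts = np.asarray(answer_counts, dtype=float)
    target = counts / counts.sum()                 # soft label over answers
    predicted = softmax(np.asarray(logits, dtype=float))
    mask = target > 0                              # 0 * log(0) terms contribute nothing
    ratio = target[mask] / (predicted[mask] + eps)
    return float(np.sum(target[mask] * np.log(ratio)))
```

Unlike a one-hot cross-entropy target, the soft target in this sketch does not penalize the model for spreading probability across answers that annotators actually gave.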
Kernel method for persistence diagrams via kernel embedding and weight factor
Title | Kernel method for persistence diagrams via kernel embedding and weight factor |
Authors | Genki Kusano, Kenji Fukumizu, Yasuaki Hiraoka |
Abstract | Topological data analysis is an emerging mathematical concept for characterizing shapes in multi-scale data. In this field, persistence diagrams are widely used as a descriptor of the input data, and can distinguish robust and noisy topological properties. Nowadays, it is highly desired to develop a statistical framework on persistence diagrams to deal with practical data. This paper proposes a kernel method on persistence diagrams. A theoretical contribution of our method is that the proposed kernel allows one to control the effect of persistence, and, if necessary, noisy topological properties can be discounted in data analysis. Furthermore, the method provides a fast approximation technique. The method is applied to several problems, including practical data in physics, and the results show its advantage over the existing kernel method on persistence diagrams. |
Tasks | Graph Classification, Topological Data Analysis |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03472v1 |
http://arxiv.org/pdf/1706.03472v1.pdf | |
PWC | https://paperswithcode.com/paper/kernel-method-for-persistence-diagrams-via |
Repo | https://github.com/genki-kusano/python-pwgk |
Framework | none |
End-to-end Learning of Image based Lane-Change Decision
Title | End-to-end Learning of Image based Lane-Change Decision |
Authors | Seong-Gyun Jeong, Jiwon Kim, Sujung Kim, Jaesik Min |
Abstract | We propose an image based end-to-end learning framework that helps lane-change decisions for human drivers and autonomous vehicles. The proposed system, Safe Lane-Change Aid Network (SLCAN), trains a deep convolutional neural network to classify the status of adjacent lanes from rear view images acquired by cameras mounted on both sides of the vehicle. Rather than depending on any explicit object detection or tracking scheme, SLCAN reads the whole input image and directly decides whether initiation of the lane-change at the moment is safe or not. We collected and annotated 77,273 rear side view images to train and test SLCAN. Experimental results show that the proposed framework achieves 96.98% classification accuracy although the test images are from unseen roadways. We also visualize the saliency map to understand which part of image SLCAN looks at for correct decisions. |
Tasks | Autonomous Vehicles, Object Detection |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08211v1 |
http://arxiv.org/pdf/1706.08211v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-image-based-lane |
Repo | https://github.com/jsgyun/SLCAN |
Framework | none |
Chessboard and chess piece recognition with the support of neural networks
Title | Chessboard and chess piece recognition with the support of neural networks |
Authors | Maciej A. Czyzewski, Artur Laskowski, Szymon Wasik |
Abstract | Chessboard and chess piece recognition is a computer vision problem that has not yet been efficiently solved. However, its solution is crucial for many experienced players who wish to compete against AI bots, but also prefer to make decisions based on the analysis of a physical chessboard. It is also important for organizers of chess tournaments who wish to digitize play for online broadcasting or ordinary players who wish to share their gameplay with friends. Typically, such digitization tasks are performed by humans or with the aid of specialized chessboards and pieces. However, neither solution is easy or convenient. To solve this problem, we propose a novel algorithm for digitizing chessboard configurations. We designed a method that is resistant to lighting conditions and the angle at which images are captured, and works correctly with numerous chessboard styles. The proposed algorithm processes pictures iteratively. During each iteration, it executes three major sub-processes: detecting straight lines, finding lattice points, and positioning the chessboard. Finally, we identify all chess pieces and generate a description of the board utilizing standard notation. For each of these steps, we designed our own algorithm that surpasses existing solutions. We support our algorithms by utilizing machine learning techniques whenever possible. The described method performs extraordinarily well and achieves an accuracy over 99.5% for detecting chessboard lattice points (compared to 74% for the best alternative), 95% (compared to 60% for the best alternative) for positioning the chessboard in an image, and almost 95% for chess piece recognition. |
Tasks | Object Recognition |
Published | 2017-08-13 |
URL | http://arxiv.org/abs/1708.03898v2 |
http://arxiv.org/pdf/1708.03898v2.pdf | |
PWC | https://paperswithcode.com/paper/chessboard-and-chess-piece-recognition-with |
Repo | https://github.com/maciejczyzewski/kck2019 |
Framework | none |
Scalable Multi-Domain Dialogue State Tracking
Title | Scalable Multi-Domain Dialogue State Tracking |
Authors | Abhinav Rastogi, Dilek Hakkani-Tur, Larry Heck |
Abstract | Dialogue state tracking (DST) is a key component of task-oriented dialogue systems. DST estimates the user’s goal at each user turn given the interaction until then. State-of-the-art approaches for state tracking rely on deep learning methods, and represent dialogue state as a distribution over all possible slot values for each slot present in the ontology. Such a representation is not scalable when the set of possible values is unbounded (e.g., date, time or location) or dynamic (e.g., movies or usernames). Furthermore, training of such models requires labeled data, where each user turn is annotated with the dialogue state, which makes building models for new domains challenging. In this paper, we present a scalable multi-domain deep learning based approach for DST. We introduce a novel framework for state tracking which is independent of the slot value set, and represent the dialogue state as a distribution over a set of values of interest (candidate set) derived from the dialogue history or knowledge. Restricting these candidate sets to be bounded in size addresses the problem of slot-scalability. Furthermore, by leveraging the slot-independent architecture and transfer learning, we show that our proposed approach facilitates quick adaptation to new domains. |
Tasks | Dialogue State Tracking, Task-Oriented Dialogue Systems, Transfer Learning |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1712.10224v2 |
http://arxiv.org/pdf/1712.10224v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-multi-domain-dialogue-state-tracking |
Repo | https://github.com/google-research-datasets/simulated-dialogue |
Framework | none |
Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks
Title | Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks |
Authors | Tom Veniat, Ludovic Denoyer |
Abstract | We propose to focus on the problem of discovering neural network architectures efficient in terms of both prediction quality and cost. For instance, our approach is able to solve the following tasks: learn a neural network able to predict well in less than 100 milliseconds or learn an efficient model that fits in a 50 Mb memory. Our contribution is a novel family of models called Budgeted Super Networks (BSN). They are learned using gradient descent techniques applied on a budgeted learning objective function which integrates a maximum authorized cost, while making no assumption on the nature of this cost. We present a set of experiments on computer vision problems and analyze the ability of our technique to deal with three different costs: the computation cost, the memory consumption cost and a distributed computation cost. We particularly show that our model can discover neural network architectures that have a better accuracy than the ResNet and Convolutional Neural Fabrics architectures on CIFAR-10 and CIFAR-100, at a lower cost. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00046v4 |
http://arxiv.org/pdf/1706.00046v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-timememory-efficient-deep |
Repo | https://github.com/TomVeniat/bsn |
Framework | pytorch |
3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
Title | 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition |
Authors | Amirsina Torfi, Seyed Mehdi Iranmanesh, Nasser M. Nasrabadi, Jeremy Dawson |
Abstract | Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information. The essential problem is to find the correspondence between the audio and visual streams, which is the goal of this work. We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features. The proposed architecture will incorporate both spatial and temporal information jointly to effectively find the correlation between temporal information for different modalities. By using a relatively small network architecture and much smaller dataset for training, our proposed method surpasses the performance of the existing similar methods for audio-visual matching which use 3D CNNs for feature representation. We also demonstrate that an effective pair selection method can significantly increase the performance. The proposed method achieves relative improvements over 20% on the Equal Error Rate (EER) and over 7% on the Average Precision (AP) in comparison to the state-of-the-art method. |
Tasks | Speaker Verification, Speech Recognition |
Published | 2017-06-18 |
URL | http://arxiv.org/abs/1706.05739v5 |
http://arxiv.org/pdf/1706.05739v5.pdf | |
PWC | https://paperswithcode.com/paper/3d-convolutional-neural-networks-for-cross |
Repo | https://github.com/astorfi/lip-reading-deeplearning |
Framework | tf |
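The abstract reports results in terms of Equal Error Rate (EER), the operating point at which the false-accept rate equals the false-reject rate. The sketch below shows one simple way to estimate it from matching scores; it is not taken from the linked repository, and the function name, the convention that higher scores mean a better match, and the threshold sweep over observed scores are assumptions.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from similarity scores (higher = more likely a match).

    Sweeps every observed score as a threshold and returns the mean of the
    false-accept and false-reject rates where the two are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # non-matching pairs wrongly accepted
        frr = np.mean(genuine < t)     # matching pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

A relative improvement of 20% on this metric means the new EER is at most 0.8 times the baseline EER, not that it dropped by 20 percentage points.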
Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball
Title | Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball |
Authors | Scott Powers, Trevor Hastie, Robert Tibshirani |
Abstract | We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the response categories to make better predictions. We apply our method, nuclear penalized multinomial regression (NPMR), to Major League Baseball play-by-play data to predict outcome probabilities based on batter-pitcher matchups. The interpretation of the results meshes well with subject-area expertise and also suggests a novel understanding of what differentiates players. |
Tasks | |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10272v1 |
http://arxiv.org/pdf/1706.10272v1.pdf | |
PWC | https://paperswithcode.com/paper/nuclear-penalized-multinomial-regression-with |
Repo | https://github.com/saberpowers/npmr |
Framework | none |
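The nuclear norm named in the abstract is the sum of a matrix's singular values, and penalizing it pushes the coefficient matrix toward low rank. The sketch below computes the norm and its proximal operator (singular-value soft-thresholding), a standard building block for proximal gradient solvers. Whether the authors' package uses this exact solver is an assumption, and the function names are hypothetical.

```python
import numpy as np


def nuclear_norm(B):
    """Sum of the singular values of the coefficient matrix B."""
    return float(np.linalg.svd(B, compute_uv=False).sum())


def prox_nuclear(B, threshold):
    """Proximal operator of threshold * ||B||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)   # small singular values become exactly zero
    return (U * s_shrunk) @ Vt                  # scale columns of U, then recombine
```

In contrast, the ridge penalty shrinks all coefficients uniformly and does not zero out singular values, so it cannot by itself uncover shared structure among the response categories.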
Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification
Title | Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification |
Authors | Bikash Joshi, Massih-Reza Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov |
Abstract | We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches. |
Tasks | Text Classification |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06511v3 |
http://arxiv.org/pdf/1701.06511v3.pdf | |
PWC | https://paperswithcode.com/paper/aggressive-sampling-for-multi-class-to-binary |
Repo | https://github.com/bikash617/Aggressive-Sampling-for-Multi-class-to-BinaryReduction |
Framework | none |
XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Title | XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings |
Authors | Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy |
Abstract | Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN (“Cross-GAN”), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset, CartoonSet, we collected for this purpose is publicly available at google.github.io/cartoonset/ as a new benchmark for semantic style transfer. |
Tasks | Domain Adaptation, Image-to-Image Translation, Style Transfer, Unsupervised Image-To-Image Translation |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.05139v6 |
http://arxiv.org/pdf/1711.05139v6.pdf | |
PWC | https://paperswithcode.com/paper/xgan-unsupervised-image-to-image-translation |
Repo | https://github.com/CS2470FinalProject/X-GAN |
Framework | tf |
Dimensionality Reduction using Similarity-induced Embeddings
Title | Dimensionality Reduction using Similarity-induced Embeddings |
Authors | Nikolaos Passalis, Anastasios Tefas |
Abstract | The vast majority of Dimensionality Reduction (DR) techniques rely on second-order statistics to define their optimization objective. Even though this provides adequate results in most cases, it comes with several shortcomings. The methods require carefully designed regularizers and they are usually prone to outliers. In this work, a new DR framework that can directly model the target distribution using the notion of similarity instead of distance is introduced. The proposed framework, called Similarity Embedding Framework, can overcome the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to existing DR techniques. Deriving a new DR technique using the Similarity Embedding Framework becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised dimensionality reduction and providing out-of-sample extensions, as well as novel techniques, such as providing fast linear embeddings for complex techniques, are demonstrated in this paper using the proposed framework. Six datasets from a diverse range of domains are used to evaluate the proposed method and it is demonstrated that it can outperform many existing DR techniques. |
Tasks | Dimensionality Reduction |
Published | 2017-06-18 |
URL | http://arxiv.org/abs/1706.05692v3 |
http://arxiv.org/pdf/1706.05692v3.pdf | |
PWC | https://paperswithcode.com/paper/dimensionality-reduction-using-similarity |
Repo | https://github.com/passalis/sef |
Framework | pytorch |