Paper Group AWR 34
Do Deep Generative Models Know What They Don’t Know?
Title | Do Deep Generative Models Know What They Don’t Know? |
Authors | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan |
Abstract | A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density of the input features can be used to detect novel, out-of-distribution inputs. In this paper we challenge this assumption. We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: FashionMNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. To investigate this curious behavior, we focus analysis on flow-based generative models in particular since they are trained and evaluated via the exact marginal likelihood. We find such behavior persists even when we restrict the flows to constant-volume transformations. These transformations admit some theoretical analysis, and we show that the difference in likelihoods can be explained by the location and variances of the data and the model curvature. Our results caution against using the density estimates from deep generative models to identify inputs similar to the training distribution until their behavior for out-of-distribution inputs is better understood. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09136v3 |
http://arxiv.org/pdf/1810.09136v3.pdf | |
PWC | https://paperswithcode.com/paper/do-deep-generative-models-know-what-they-dont |
Repo | https://github.com/y0ast/Glow-PyTorch |
Framework | pytorch |
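A minimal sketch of the paper's headline experiment, assuming a flow model that exposes exact log-likelihoods via a hypothetical `log_prob` method (the linked Glow-PyTorch repo provides one such model): convert log-likelihoods to bits per dimension and compare averages across an in-distribution and an out-of-distribution test set.

```python
import math
import torch

def bits_per_dim(model, x):
    """Convert exact log-likelihood (nats) into bits per dimension."""
    n_dims = x[0].numel()                      # e.g. 3*32*32 for CIFAR/SVHN
    log_px = model.log_prob(x)                 # (batch,); hypothetical API
    return -log_px / (n_dims * math.log(2.0))  # lower = higher likelihood

@torch.no_grad()
def mean_bpd(model, loader, device="cuda"):
    total, count = 0.0, 0
    for x, _ in loader:
        bpd = bits_per_dim(model, x.to(device))
        total += bpd.sum().item()
        count += bpd.numel()
    return total / count

# The paper's surprising finding corresponds to
#   mean_bpd(flow, svhn_loader) < mean_bpd(flow, cifar_loader)
# even though the flow was trained only on CIFAR-10.
```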
Learning a SAT Solver from Single-Bit Supervision
Title | Learning a SAT Solver from Single-Bit Supervision |
Authors | Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill |
Abstract | We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. Although it is not competitive with state-of-the-art SAT solvers, NeuroSAT can solve problems that are substantially larger and more difficult than it ever saw during training by simply running for more iterations. Moreover, NeuroSAT generalizes to novel distributions; after training only on random SAT problems, at test time it can solve SAT problems encoding graph coloring, clique detection, dominating set, and vertex cover problems, all on a range of distributions over small random graphs. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03685v4 |
http://arxiv.org/pdf/1802.03685v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-sat-solver-from-single-bit |
Repo | https://github.com/dselsam/neurosat |
Framework | none |
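A hedged sketch of one round of the literal-clause message passing the abstract describes, assuming a dense literal-clause incidence matrix `M` (`M[i, j] = 1` iff literal i appears in clause j) with literals ordered x_1..x_n, ¬x_1..¬x_n; the module names and dense formulation are ours, not the paper's exact code.

```python
import torch
import torch.nn as nn

d = 128                               # embedding size used in the paper
L_msg = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
C_msg = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
L_update = nn.LSTMCell(2 * d, d)      # literals also see their negations
C_update = nn.LSTMCell(d, d)

def flip(L):
    """Swap each literal's embedding with that of its negation."""
    n = L.shape[0] // 2
    return torch.cat([L[n:], L[:n]], dim=0)

def mp_round(M, L, C, L_cell, C_cell):
    # clauses aggregate messages from the literals they contain
    C, C_cell = C_update(M.t() @ L_msg(L), (C, C_cell))
    # literals aggregate messages from their clauses plus the flipped view
    L, L_cell = L_update(torch.cat([M @ C_msg(C), flip(L)], dim=1),
                         (L, L_cell))
    return L, C, L_cell, C_cell
```

Running `mp_round` for more iterations at test time is how NeuroSAT tackles problems larger than anything it saw during training.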
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising
Title | RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising |
Authors | David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, Alexandros Karatzoglou |
Abstract | Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Current research directions, largely based on supervised learning from historical data, appear to be showing diminishing returns, with many practitioners reporting a discrepancy between improvements in offline metrics for supervised learning and the online performance of the newly proposed models. One possible reason is that we are using the wrong paradigm: looking at the long-term cycle of collecting historical performance data, creating a new version of the recommendation model, A/B testing it, and then rolling it out, we see many commonalities with the reinforcement learning (RL) setup, where the agent observes the environment and acts upon it in order to move towards better states (states with higher rewards). To this end we introduce RecoGym, an RL environment for recommendation, defined by a model of user traffic patterns on e-commerce sites and of user responses to recommendations on publisher websites. We believe this is an important step forward for recommender systems research that could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics. |
Tasks | Product Recommendation, Recommendation Systems |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00720v2 |
http://arxiv.org/pdf/1808.00720v2.pdf | |
PWC | https://paperswithcode.com/paper/recogym-a-reinforcement-learning-environment |
Repo | https://github.com/criteo-research/reco-gym |
Framework | pytorch |
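A minimal interaction loop, based on the usage shown in the criteo-research/reco-gym README; the environment id `'reco-gym-v1'` and `env_1_args` come from that repo, and the exact stepping API should be treated as an assumption to verify against the README.

```python
import gym
from recogym import env_1_args  # default simulator configuration

env_1_args['random_seed'] = 42
env = gym.make('reco-gym-v1')
env.init_gym(env_1_args)        # RecoGym-specific configuration step

env.reset()
done = False
while not done:
    # A real agent would return a product id to recommend here; stepping
    # with None lets the simulator generate organic user behaviour.
    observation, reward, done, info = env.step(None)
```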
Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation
Title | Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation |
Authors | Helge Rhodin, Mathieu Salzmann, Pascal Fua |
Abstract | Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed. In this paper, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without annotations. To this end, we use an encoder-decoder that predicts an image from one viewpoint given an image from another viewpoint. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes it easier to learn a mapping from it to 3D human pose. As evidenced by our experiments, our approach significantly outperforms fully-supervised methods given the same amount of labeled data, and improves over other semi-supervised methods while using as little as 1% of the labeled data. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.01110v1 |
http://arxiv.org/pdf/1804.01110v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-geometry-aware-representation |
Repo | https://github.com/hrhodin/UnsupervisedGeometryAwareRepresentationLearning |
Framework | pytorch |
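A schematic version of the geometry-aware encoder-decoder described above: the latent code is interpreted as a 3D point cloud and rotated by the known relative camera rotation before decoding the other view, which is what forces the representation to encode geometry. The encoder/decoder internals and point count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GeometryAwareAE(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module, n_points=200):
        super().__init__()
        self.encoder, self.decoder, self.n = encoder, decoder, n_points

    def forward(self, img_view_a, R_a_to_b):
        z = self.encoder(img_view_a)             # (B, 3 * n_points)
        pts = z.view(-1, self.n, 3)              # latent 3D point cloud
        pts_b = pts @ R_a_to_b.transpose(1, 2)   # rotate into camera b
        return self.decoder(pts_b.flatten(1))    # predict the view-b image

# Training minimizes a reconstruction loss against the real view-b image.
# Because geometry lives in `pts`, a small head mapping `pts` to 3D joints
# can then be trained with very little labeled data (the semi-supervised step).
```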
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Title | Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization |
Authors | Edoardo Maria Ponti, Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen |
Abstract | Semantic specialization is the process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data. |
Tasks | Cross-Lingual Transfer, Lexical Simplification |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04163v1 |
http://arxiv.org/pdf/1809.04163v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-propagation-and-zero-shot-cross |
Repo | https://github.com/cambridgeltl/adversarial-postspec |
Framework | pytorch |
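A hedged sketch of the combined objective the abstract describes: a global mapping G from distributional to specialized vectors, trained on seen words with an L2 term plus an adversarial term from a discriminator D. Layer sizes and the loss weight are illustrative, not the paper's exact choices.

```python
import torch
import torch.nn as nn

dim = 300
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim))
D = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))
bce = nn.BCEWithLogitsLoss()

def generator_loss(x_dist, y_spec, lam=1.0):
    """x_dist: distributional vectors of seen words; y_spec: their
    specialized targets from the post-processing step."""
    y_hat = G(x_dist)
    l2 = ((y_hat - y_spec) ** 2).sum(dim=1).mean()
    adv = bce(D(y_hat), torch.ones(len(y_hat), 1))   # fool the discriminator
    return l2 + lam * adv

def discriminator_loss(x_dist, y_spec):
    real = bce(D(y_spec), torch.ones(len(y_spec), 1))
    fake = bce(D(G(x_dist).detach()), torch.zeros(len(x_dist), 1))
    return real + fake

# Once trained, G(x) specializes *any* vector, including words that never
# appear in WordNet or other external lexicons.
```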
Automated Evaluation of Out-of-Context Errors
Title | Automated Evaluation of Out-of-Context Errors |
Authors | Patrick Huber, Jan Niehues, Alex Waibel |
Abstract | We present a new approach to evaluating computational models for the task of text understanding by means of out-of-context error detection. Through the novel design of our automated modification process, existing large-scale data sources can be adopted for a vast number of text understanding tasks. The data is altered on a semantic level, allowing models to be tested against a challenging set of modified text passages whose detection requires comprehension of a broader narrative discourse. Our newly introduced task targets real-world problems of transcription and translation systems by inserting authentic out-of-context errors. The automated modification process is applied to the 2016 TED Talk corpus. Entirely automating the process allows complete datasets to be adopted at low cost, facilitating supervised learning and allowing deeper networks to be trained and tested. To evaluate the quality of the modification algorithm, a language model and a supervised binary classification model are trained and tested on the altered dataset. A human baseline evaluation is examined to compare the results with human performance. The outcome of the evaluation task indicates how difficult it is for both machine-learning algorithms and humans to detect semantic errors, showing that the errors cannot be identified when the context is limited to a single sentence. |
Tasks | Language Modelling |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08983v1 |
http://arxiv.org/pdf/1803.08983v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-evaluation-of-out-of-context-errors |
Repo | https://github.com/isl-mt/SemanticWordReplacement-LREC2018 |
Framework | none |
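A toy version of the modification mechanism: replace a content word in one passage with a word drawn from an unrelated passage, producing text that stays locally plausible but is semantically out of context. The real pipeline (applied to the 2016 TED Talk corpus) is far more careful about which words it swaps; this sketch only conveys the idea.

```python
import random

def insert_ooc_error(target_tokens, donor_tokens, rng=random.Random(0)):
    """Swap one token of `target_tokens` for a word from `donor_tokens`;
    the position doubles as the label for the detection task."""
    pos = rng.randrange(len(target_tokens))
    donor = rng.choice([t for t in donor_tokens if t.isalpha() and len(t) > 3])
    corrupted = list(target_tokens)
    corrupted[pos] = donor
    return corrupted, pos

talk_a = "the speaker described how coral reefs recover after bleaching".split()
talk_b = "quantum processors require error correction at massive scale".split()
print(insert_ooc_error(talk_a, talk_b))
```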
Mapper Comparison with Wasserstein Metrics
Title | Mapper Comparison with Wasserstein Metrics |
Authors | Michael McCabe |
Abstract | The challenge of describing model drift is an open question in unsupervised learning. It can be difficult to evaluate at what point an unsupervised model has deviated beyond what would be expected from a different sample from the same population. This is particularly true for models without a probabilistic interpretation. One such family of techniques, Topological Data Analysis, and the Mapper algorithm in particular, has found use in a variety of fields, but describing model drift for Mapper graphs is an understudied area: even existing techniques for measuring distances between related constructs, such as graphs or simplicial complexes, fail to account for the fact that Mapper graphs represent a combination of topological, metric, and density information. In this paper, we develop an optimal transport based metric, which we call the Network Augmented Wasserstein Distance, for evaluating distances between Mapper graphs, and demonstrate its value for model drift analysis by using it to transform the model drift problem into an anomaly detection problem over dynamic graphs. |
Tasks | Anomaly Detection, Topological Data Analysis |
Published | 2018-12-15 |
URL | http://arxiv.org/abs/1812.06232v1 |
http://arxiv.org/pdf/1812.06232v1.pdf | |
PWC | https://paperswithcode.com/paper/mapper-comparison-with-wasserstein-metrics |
Repo | https://github.com/mikemccabe210/mapper_comparison |
Framework | none |
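A hedged sketch of comparing two Mapper graphs with optimal transport, in the spirit of the Network Augmented Wasserstein Distance: node masses carry the density information and a pairwise cost matrix carries the metric information (the paper additionally augments the cost with intra-graph network structure, omitted here). Uses the POT library (`pip install pot`).

```python
import numpy as np
import ot  # Python Optimal Transport

def mapper_ot_distance(centroids_a, sizes_a, centroids_b, sizes_b):
    """centroids_*: (n, d) per-node summaries; sizes_*: points per node."""
    a = np.asarray(sizes_a, float); a /= a.sum()   # density -> node mass
    b = np.asarray(sizes_b, float); b /= b.sum()
    M = ot.dist(np.asarray(centroids_a), np.asarray(centroids_b))
    return ot.emd2(a, b, M)                        # exact transport cost

# Drift monitoring as anomaly detection: compare each new data window's
# Mapper graph to a reference window and flag distances that are anomalous
# relative to the history.
```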
Named Entity Disambiguation using Deep Learning on Graphs
Title | Named Entity Disambiguation using Deep Learning on Graphs |
Authors | Alberto Cetoli, Mohammad Akbari, Stefano Bragaglia, Andrew D. O’Harney, Marc Sloan |
Abstract | We tackle named entity disambiguation (NED) by comparing entities in short sentences with Wikidata graphs. Creating a context vector from graphs through deep learning is a challenging problem that has never been applied to NED. Our main contribution is to present an experimental study of recent neural techniques, as well as a discussion about which graph features are most important for the disambiguation task. In addition, a new dataset (Wikidata-Disamb) is created to allow a clean and scalable evaluation of NED with Wikidata entries, and to be used as a reference in future research. In the end our results show that a Bi-LSTM encoding of the graph triplets performs best, improving upon the baseline models and scoring an F1 value of 91.6% on the Wikidata-Disamb test set. |
Tasks | Entity Disambiguation |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09164v1 |
http://arxiv.org/pdf/1810.09164v1.pdf | |
PWC | https://paperswithcode.com/paper/named-entity-disambiguation-using-deep |
Repo | https://github.com/ContextScout/ned-graphs |
Framework | none |
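A rough sketch of the best-performing configuration reported above: encode the candidate entity's Wikidata triplets with a Bi-LSTM and score them against a Bi-LSTM encoding of the sentence. The embedding sizes and scoring head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TripletScorer(nn.Module):
    def __init__(self, vocab_size, dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.graph_enc = nn.LSTM(3 * dim, hidden, bidirectional=True,
                                 batch_first=True)
        self.sent_enc = nn.LSTM(dim, hidden, bidirectional=True,
                                batch_first=True)
        self.score = nn.Linear(4 * hidden, 1)

    def forward(self, triplets, sentence):
        # triplets: (B, T, 3) ids for (subject, relation, object)
        t = self.emb(triplets).flatten(2)          # (B, T, 3*dim)
        _, (h_g, _) = self.graph_enc(t)
        _, (h_s, _) = self.sent_enc(self.emb(sentence))
        g = torch.cat([h_g[0], h_g[1]], dim=1)     # both LSTM directions
        s = torch.cat([h_s[0], h_s[1]], dim=1)
        return self.score(torch.cat([g, s], dim=1))  # match/no-match logit
```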
Learning to Write with Cooperative Discriminators
Title | Learning to Write with Cooperative Discriminators |
Authors | Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, Yejin Choi |
Abstract | Recurrent Neural Networks (RNNs) are powerful autoregressive sequence models, but when used to generate natural language their output tends to be overly generic, repetitive, and self-contradictory. We postulate that the objective function optimized by RNN language models, which amounts to the overall perplexity of a text, is not expressive enough to capture the notion of communicative goals described by linguistic principles such as Grice’s Maxims. We propose learning a mixture of multiple discriminative models that can be used to complement the RNN generator and guide the decoding process. Human evaluation demonstrates that text generated by our system is preferred over that of baselines by a large margin and significantly enhances the overall coherence, style, and information content of the generated text. |
Tasks | |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06087v1 |
http://arxiv.org/pdf/1805.06087v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-write-with-cooperative |
Repo | https://github.com/ari-holtzman/l2w |
Framework | pytorch |
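The decoding idea in miniature: rescore candidate continuations with the base language-model log-probability plus a weighted sum of discriminator scores, each discriminator capturing one communicative principle (repetition, relevance, and so on). A hedged re-ranking sketch; the paper learns the mixture weights rather than fixing them.

```python
def cooperative_score(candidate, lm_logprob, discriminators, weights):
    """candidate: token sequence; discriminators: scoring callables."""
    return lm_logprob(candidate) + sum(
        w * d(candidate) for d, w in zip(discriminators, weights))

def rerank_beam(candidates, lm_logprob, discriminators, weights):
    # applied at each expansion step of beam search in the full system
    return max(candidates, key=lambda c: cooperative_score(
        c, lm_logprob, discriminators, weights))
```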
Segmenting root systems in X-ray computed tomography images using level sets
Title | Segmenting root systems in X-ray computed tomography images using level sets |
Authors | Amy Tabb, Keith E. Duncan, Christopher N. Topp |
Abstract | The segmentation of plant roots from soil and other growing media in X-ray computed tomography images is needed to effectively study the root system architecture without excavation. However, segmentation is a challenging problem in this context because the root and non-root regions share similar features. In this paper, we describe a method based on level sets and specifically adapted for this segmentation problem. In particular, we deal with the issues of using a level sets approach on large image volumes for root segmentation, and track active regions of the front using an occupancy grid. This method allows for straightforward modifications to a narrow-band algorithm such that excessive forward and backward movements of the front can be avoided, distance map computations in a narrow band context can be done in linear time through modification of Meijster et al.'s distance transform algorithm, and regions of the image volume are iteratively used to estimate distributions for root versus non-root classes. Results are shown for three plant species of different maturity levels, grown in three different media. Our method compares favorably to a state-of-the-art method for root segmentation in X-ray CT image volumes. |
Tasks | |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06398v1 |
http://arxiv.org/pdf/1809.06398v1.pdf | |
PWC | https://paperswithcode.com/paper/segmenting-root-systems-in-x-ray-computed |
Repo | https://github.com/amy-tabb/tabb-level-set-segmentation |
Framework | none |
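One explicit level-set evolution step on a 3D volume, as a hedged illustration of the machinery the paper adapts; the paper's actual contributions (a narrow band with an occupancy grid of active front regions, the linear-time distance transform, iterative class-distribution estimation) are not shown.

```python
import numpy as np

def level_set_step(phi, speed, dt=0.5):
    """phi: signed distance volume for the front; speed: scalar force from
    the root-vs-soil appearance model (positive values expand the front)."""
    grad = np.gradient(phi)                  # central differences per axis
    grad_mag = np.sqrt(sum(g ** 2 for g in grad))
    return phi - dt * speed * grad_mag       # advect the interface

phi = np.random.randn(32, 32, 32)    # stand-in for an initialized front
speed = 0.1 * np.ones_like(phi)      # stand-in for the class likelihoods
phi = level_set_step(phi, speed)
```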
EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals
Title | EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals |
Authors | Kay Gregor Hartmann, Robin Tibor Schirrmeister, Tonio Ball |
Abstract | Generative adversarial networks (GANs) have recently been highly successful in generative applications involving images and are starting to be applied to time series data. Here we describe EEG-GAN as a framework to generate electroencephalographic (EEG) brain signals. We introduce a modification to the improved training of Wasserstein GANs to stabilize training and investigate a range of architectural choices critical for time series generation (most notably up- and down-sampling). For evaluation we consider and compare different metrics such as Inception score, Fréchet inception distance and sliced Wasserstein distance, which together show that our EEG-GAN framework generates naturalistic EEG examples. It thus opens up a range of new generative application scenarios in the neuroscientific and neurological context, such as data augmentation in brain-computer interfacing tasks, EEG super-sampling, or restoration of corrupted data segments. The possibility to generate signals of a certain class and/or with specific properties may also open a new avenue for research into the underlying structure of brain signals. |
Tasks | Data Augmentation, EEG, Time Series |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01875v1 |
http://arxiv.org/pdf/1806.01875v1.pdf | |
PWC | https://paperswithcode.com/paper/eeg-gan-generative-adversarial-networks-for |
Repo | https://github.com/MichaelMurashov/ecg-testing |
Framework | tf |
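The gradient penalty from the improved Wasserstein GAN training that the abstract builds on, written for 1-D signals shaped (batch, channels, time). This is the standard Gulrajani et al. penalty; EEG-GAN's specific modification to the improved training is not reproduced here.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize critic gradients away from unit norm on interpolated signals."""
    eps = torch.rand(real.size(0), 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```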
A Novel Hybrid Machine Learning Model for Auto-Classification of Retinal Diseases
Title | A Novel Hybrid Machine Learning Model for Auto-Classification of Retinal Diseases |
Authors | C. -H. Huck Yang, Jia-Hong Huang, Fangyu Liu, Fang-Yi Chiu, Mengya Gao, Weifeng Lyu, I-Hung Lin M. D., Jesper Tegner |
Abstract | Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists. We propose a novel visual-assisted diagnosis hybrid model based on the support vector machine (SVM) and deep neural networks (DNNs). The model incorporates the complementary strengths of DNNs and SVMs. Furthermore, we present EyeNet, a new clinical retina label collection for ophthalmology covering 32 retinal disease classes. Using EyeNet, our model achieves 89.73% diagnosis accuracy, and its performance is comparable to that of professional ophthalmologists. |
Tasks | |
Published | 2018-06-17 |
URL | http://arxiv.org/abs/1806.06423v1 |
http://arxiv.org/pdf/1806.06423v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-hybrid-machine-learning-model-for |
Repo | https://github.com/huckiyang/EyeNet |
Framework | none |
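A generic deep-features-plus-SVM pipeline, as a hedged sketch of the hybrid idea; the paper's exact backbone, feature layer, and 32-class label set are not reproduced, and `backbone` here stands for any frozen network exposing a feature vector.

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(backbone, image_batches):
    """backbone: callable mapping a batch of images to (B, d) features."""
    return np.concatenate([backbone(batch) for batch in image_batches])

def fit_hybrid(backbone, train_batches, train_labels):
    X = extract_features(backbone, train_batches)
    clf = SVC(kernel="rbf", C=1.0)   # SVM decision layer on deep features
    clf.fit(X, train_labels)
    return clf
```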
Conditional Prior Networks for Optical Flow
Title | Conditional Prior Networks for Optical Flow |
Authors | Yanchao Yang, Stefano Soatto |
Abstract | Classical computation of optical flow involves generic priors (regularizers) that capture rudimentary statistics of images, but not long-range correlations or semantics. On the other hand, fully supervised methods learn the regularity in the annotated data, without explicit regularization and with the risk of overfitting. We seek to learn richer priors on the set of possible flows that are statistically compatible with an image. Once the prior is learned in a supervised fashion, one can easily learn the full map to infer optical flow directly from two or more images, without any need for (additional) supervision. We introduce a novel architecture, called Conditional Prior Network (CPN), and show how to train it to yield a conditional prior. When used in conjunction with a simple optical flow architecture, the CPN beats all variational methods and all unsupervised learning-based ones using the same data term. It performs comparably to fully supervised ones, that however are fine-tuned to a particular dataset. Our method, on the other hand, performs well even when transferred between datasets. |
Tasks | Optical Flow Estimation |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10378v1 |
http://arxiv.org/pdf/1807.10378v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-prior-networks-for-optical-flow |
Repo | https://github.com/YanchaoYang/Conditional-Prior-Networks |
Framework | tf |
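A schematic Conditional Prior Network under our own simplifying assumptions: an autoencoder over flows whose bottleneck is fused with image features, trained to reconstruct the flow. Once trained, reconstruction error acts as a learned prior on how compatible a candidate flow is with an image and can regularize an unsupervised flow objective.

```python
import torch
import torch.nn as nn

class CPN(nn.Module):
    def __init__(self, flow_enc: nn.Module, img_enc: nn.Module,
                 dec: nn.Module):
        super().__init__()
        self.flow_enc, self.img_enc, self.dec = flow_enc, img_enc, dec

    def forward(self, flow, image):
        z = torch.cat([self.flow_enc(flow), self.img_enc(image)], dim=1)
        return self.dec(z)   # reconstructed flow

def prior_term(cpn, flow, image):
    # low reconstruction error <=> flow is "typical" given the image
    return ((cpn(flow, image) - flow) ** 2).mean()
```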
Solving the Exponential Growth of Symbolic Regression Trees in Geometric Semantic Genetic Programming
Title | Solving the Exponential Growth of Symbolic Regression Trees in Geometric Semantic Genetic Programming |
Authors | Joao Francisco B. S. Martins, Luiz Otavio V. B. Oliveira, Luis F. Miranda, Felipe Casadei, Gisele L. Pappa |
Abstract | Advances in Geometric Semantic Genetic Programming (GSGP) have shown that this variant of Genetic Programming (GP) reaches better results than its predecessor for supervised machine learning problems, particularly in the task of symbolic regression. However, by construction, the geometric semantic crossover operator generates individuals that grow exponentially with the number of generations, resulting in solutions with limited use. This paper presents a new method for individual simplification named GSGP with Reduced trees (GSGP-Red). GSGP-Red works by expanding the functions generated by the geometric semantic operators. The resulting expanded function is guaranteed to be a linear combination that, in a second step, has its repeated structures and respective coefficients aggregated. Experiments in 12 real-world datasets show that it is not only possible to create smaller and completely equivalent individuals in competitive computational time, but also to reduce the number of nodes composing them by 58 orders of magnitude, on average. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06808v1 |
http://arxiv.org/pdf/1804.06808v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-the-exponential-growth-of-symbolic |
Repo | https://github.com/laic-ufmg/GSGP-Red |
Framework | none |
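The core GSGP-Red idea in miniature: a geometric semantic crossover offspring r·T1 + (1 − r)·T2 is kept as an explicit linear combination over base trees, so repeated structures collapse into aggregated coefficients instead of ever-deeper expression trees. The dict representation is ours.

```python
def gs_crossover(parent1, parent2, r):
    """Parents are dicts {tree_id: coefficient}; the offspring stays a flat
    linear combination rather than a nested tree."""
    child = {}
    for tree, coef in parent1.items():
        child[tree] = child.get(tree, 0.0) + r * coef
    for tree, coef in parent2.items():
        child[tree] = child.get(tree, 0.0) + (1.0 - r) * coef
    return child

# Crossing {T1: 1.0} with {T1: 0.5, T2: 0.5} at r = 0.6 yields
# {T1: 0.8, T2: 0.2}: two coefficients, not an exponentially deeper tree.
```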
DataBright: Towards a Global Exchange for Decentralized Data Ownership and Trusted Computation
Title | DataBright: Towards a Global Exchange for Decentralized Data Ownership and Trusted Computation |
Authors | David Dao, Dan Alistarh, Claudiu Musat, Ce Zhang |
Abstract | It is safe to assume that, for the foreseeable future, machine learning, and deep learning in particular, will remain both data- and computation-hungry. In this paper, we ask: can we build a global exchange where everyone can contribute computation and data to train the next generation of machine learning applications? We present an early but running prototype of DataBright, a system that turns the creation of training examples and the sharing of computation into an investment mechanism. Unlike most crowdsourcing platforms, where the contributor gets paid when they submit their data, DataBright pays dividends whenever a contributor's data or hardware is used by someone to train a machine learning model. The contributor becomes a shareholder in the dataset they created. To enable the measurement of usage, a computation platform that contributors can trust is also necessary. DataBright thus merges both a data market and a trusted computation market. We illustrate that trusted computation can enable the creation of an AI market, where each data point has an exact value that should be paid to its creator. DataBright allows data creators to retain ownership of their contribution and attaches to it a measurable value. The value of the data is given by its utility in subsequent distributed computation done on the DataBright computation market. The computation market allocates tasks and subsequent payments to pooled hardware. This leads to the creation of a decentralized AI cloud. Our experiments show that trusted hardware such as Intel SGX can be added to the usual ML pipeline with no additional costs. We use this setting to orchestrate distributed computation that enables the creation of a computation market. DataBright is available for download at https://github.com/ds3lab/databright. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04780v1 |
http://arxiv.org/pdf/1802.04780v1.pdf | |
PWC | https://paperswithcode.com/paper/databright-towards-a-global-exchange-for |
Repo | https://github.com/ds3lab/databright |
Framework | none |