Paper Group ANR 966
Neural Argument Generation Augmented with Externally Retrieved Evidence
Title | Neural Argument Generation Augmented with Externally Retrieved Evidence |
Authors | Xinyu Hua, Lu Wang |
Abstract | High quality arguments are essential elements for human reasoning and decision-making processes. However, effective argument construction is a challenging task for both humans and machines. In this work, we study the novel task of automatically generating arguments of a different stance for a given statement. We propose an encoder-decoder style neural network-based argument generation model enriched with externally retrieved evidence from Wikipedia. Our model first generates a set of talking point phrases as an intermediate representation, followed by a separate decoder producing the final argument based on both the input and the keyphrases. Experiments on a large-scale dataset collected from Reddit show that our model constructs arguments with more topic-relevant content than a popular sequence-to-sequence generation model, according to both automatic evaluation and human assessments. |
Tasks | Decision Making |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10254v1, http://arxiv.org/pdf/1805.10254v1.pdf |
PWC | https://paperswithcode.com/paper/neural-argument-generation-augmented-with |
Repo | |
Framework | |
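To make the two-stage decoding idea above concrete, here is a minimal, hypothetical PyTorch sketch: an encoder reads the statement, a first decoder emits keyphrases as the intermediate representation, and a second decoder produces the argument conditioned on both. The class name, dimensions, and the simple state-fusion step are illustrative assumptions, not the authors' architecture (which also uses retrieved Wikipedia evidence and attention).

```python
# Hypothetical sketch only -- not the authors' implementation. A GRU encoder
# reads the statement, a keyphrase decoder emits the intermediate talking
# points, and an argument decoder is initialised from both states.
import torch
import torch.nn as nn

class TwoStageArgumentGenerator(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.keyphrase_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.argument_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.fuse = nn.Linear(2 * hid_dim, hid_dim)   # fuse statement + keyphrase states
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, statement, keyphrases, argument):
        _, h_enc = self.encoder(self.embed(statement))                 # (1, B, H)
        kp_out, h_kp = self.keyphrase_decoder(self.embed(keyphrases), h_enc)
        kp_logits = self.out(kp_out)                                   # keyphrase targets
        h0 = torch.tanh(self.fuse(torch.cat([h_enc, h_kp], dim=-1)))
        arg_out, _ = self.argument_decoder(self.embed(argument), h0)
        return kp_logits, self.out(arg_out)                            # argument targets

# Toy forward pass with random token ids (teacher forcing on both decoders).
model = TwoStageArgumentGenerator(vocab_size=1000)
stmt = torch.randint(0, 1000, (2, 12))
kps = torch.randint(0, 1000, (2, 6))
arg = torch.randint(0, 1000, (2, 20))
kp_logits, arg_logits = model(stmt, kps, arg)
print(kp_logits.shape, arg_logits.shape)  # (2, 6, 1000) and (2, 20, 1000)
```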
Nearly Zero-Shot Learning for Semantic Decoding in Spoken Dialogue Systems
Title | Nearly Zero-Shot Learning for Semantic Decoding in Spoken Dialogue Systems |
Authors | Lina M. Rojas-Barahona, Stefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Milica Gasic, Bo-Hsiang Tseng, Steve Young |
Abstract | This paper presents two ways of dealing with scarce data in semantic decoding using N-Best speech recognition hypotheses. First, we learn features by using a deep learning architecture in which the weights for the unknown and known categories are jointly optimised. Second, an unsupervised method is used for further tuning the weights. Sharing weights injects prior knowledge to unknown categories. The unsupervised tuning (i.e. the risk minimisation) improves the F-Measure when recognising nearly zero-shot data on the DSTC3 corpus. This unsupervised method can be applied subject to two assumptions: the rank of the class marginal is assumed to be known and the class-conditional scores of the classifier are assumed to follow a Gaussian distribution. |
Tasks | Speech Recognition, Spoken Dialogue Systems, Zero-Shot Learning |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05484v2, http://arxiv.org/pdf/1806.05484v2.pdf |
PWC | https://paperswithcode.com/paper/nearly-zero-shot-learning-for-semantic |
Repo | |
Framework | |
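As a rough illustration of the weight-sharing idea, the sketch below (a hypothetical simplification, not the paper's model) uses a single shared encoder and one output layer whose rows cover both known and nearly zero-shot categories, so that jointly optimised weights inject prior knowledge into the unknown categories. The N-best hypothesis features, the DSTC3 ontology, and the unsupervised risk-minimisation step are omitted.

```python
# Hypothetical simplification: one shared feature layer and a single output
# matrix over known + unknown categories, optimised jointly so that gradients
# from well-observed labels shape the representation used for rare ones.
import torch
import torch.nn as nn

class SharedSemanticDecoder(nn.Module):
    def __init__(self, input_dim, n_known, n_unknown, hid_dim=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(input_dim, hid_dim), nn.ReLU())
        self.out = nn.Linear(hid_dim, n_known + n_unknown)

    def forward(self, x):
        return self.out(self.shared(x))   # per-category scores (multi-label)

model = SharedSemanticDecoder(input_dim=300, n_known=40, n_unknown=5)
scores = model(torch.randn(8, 300))       # 8 utterance feature vectors
print(scores.shape)                       # (8, 45)
```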
Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering
Title | Hindsight is Only 50/50: Unsuitability of MDP based Approximate POMDP Solvers for Multi-resolution Information Gathering |
Authors | Sankalp Arora, Sanjiban Choudhury, Sebastian Scherer |
Abstract | Partially Observable Markov Decision Processes (POMDPs) offer an elegant framework to model sequential decision making in uncertain environments. Solving POMDPs online is an active area of research, and given the size of real-world problems, approximate solvers are used. Recently, a few approaches have been suggested for solving POMDPs by using MDP solvers in conjunction with imitation learning. MDP based POMDP solvers work well for some cases, while catastrophically failing for others. The main failure point of such solvers is the lack of motivation for MDP solvers to gain information, since under their assumption the environment is either already known as much as it can be, or the uncertainty will disappear after the next step. However, for POMDP problems, gaining information can lead to efficient solutions. In this paper we derive a set of conditions under which MDP based POMDP solvers are provably sub-optimal. We then use the well-known tiger problem to demonstrate this sub-optimality. We show that multi-resolution, budgeted information gathering cannot be addressed using MDP based POMDP solvers. The contribution of the paper helps identify the properties of a POMDP problem for which the use of MDP based POMDP solvers is inappropriate, enabling better design choices. |
Tasks | Decision Making, Imitation Learning |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.02573v1, http://arxiv.org/pdf/1804.02573v1.pdf |
PWC | https://paperswithcode.com/paper/hindsight-is-only-5050-unsuitability-of-mdp |
Repo | |
Framework | |
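The sketch below illustrates the kind of MDP-based approximation the paper critiques, using QMDP on the classic tiger problem; it is an illustrative example under standard tiger-problem parameters, not the paper's experimental setup.

```python
# Illustrative example, not the paper's setup: QMDP on the classic tiger
# problem. QMDP solves the underlying MDP and scores a belief b with
# Q(b, a) = sum_s b(s) * Q_MDP(s, a), i.e. it assumes the state becomes fully
# observed after one step and therefore never values reducing uncertainty.
import numpy as np

ACTIONS = ["listen", "open-left", "open-right"]
GAMMA = 0.95

# R[s, a] for states (tiger-left, tiger-right): listening costs 1; opening the
# tiger's door costs 100, opening the other door pays 10.
R = np.array([[-1.0, -100.0, 10.0],
              [-1.0, 10.0, -100.0]])
# T[a, s, s']: listening leaves the tiger in place; opening resets it uniformly.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]],
              [[0.5, 0.5], [0.5, 0.5]]])

# Value iteration on the fully observable MDP.
V = np.zeros(2)
for _ in range(500):
    Q_mdp = R + GAMMA * np.einsum("ast,t->sa", T, V)
    V = Q_mdp.max(axis=1)

b = np.array([0.5, 0.5])                      # maximally uncertain belief
for action, q in zip(ACTIONS, b @ Q_mdp):     # Q(b, a) = sum_s b(s) Q_MDP(s, a)
    print(f"{action:10s} Q(b, a) = {q:7.2f}")
# In this symmetric problem the score of "listen" is identical at every belief:
# it reflects only the listening cost and the (assumed fully observed)
# next-step value, never the observation it would yield -- which is why
# budgeted, multi-resolution information gathering cannot be traded off by
# this class of solvers.
```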
Differential Analysis of Directed Networks
Title | Differential Analysis of Directed Networks |
Authors | Min Ren, Dabao Zhang |
Abstract | We developed a novel statistical method to identify structural differences between networks characterized by structural equation models. We propose to reparameterize the model to separate the differential structures from the common structures, and then design an algorithm with calibration and construction stages to identify these differential structures. The calibration stage serves to obtain consistent predictions by building an L2-regularized regression of each endogenous variable against pre-screened exogenous variables, correcting for the potential endogeneity issue. The construction stage consistently selects and estimates both common and differential effects by undertaking an L1-regularized regression of each endogenous variable against the predictions of the other endogenous variables as well as its anchoring exogenous variables. Our method allows easy parallel computation at each stage. Theoretical results are obtained to establish nonasymptotic error bounds of predictions and estimates at both stages, as well as the consistency of the identified common and differential effects. Our studies on synthetic data demonstrated that our proposed method performed much better than independently constructing the networks. A real data set is analyzed to illustrate the applicability of our method. |
Tasks | Calibration |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10173v3, http://arxiv.org/pdf/1807.10173v3.pdf |
PWC | https://paperswithcode.com/paper/differential-analysis-of-directed-networks |
Repo | |
Framework | |
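The following is a rough sketch of the two-stage regression idea using off-the-shelf scikit-learn estimators; the pre-screening, the reparameterisation separating common from differential structures, and the theoretically tuned penalties are omitted, and all variable names, data, and penalty values are illustrative assumptions.

```python
# Rough, hypothetical sketch of the calibration/construction idea, not the
# paper's estimator: ridge predictions of each endogenous variable on the
# exogenous variables, then a lasso of each endogenous variable on the
# predictions of the others plus its exogenous anchors.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p_endo, p_exo = 200, 5, 10
X_exo = rng.normal(size=(n, p_exo))                         # exogenous variables
Y = X_exo @ rng.normal(size=(p_exo, p_endo)) + rng.normal(scale=0.5, size=(n, p_endo))

# Calibration stage: L2-regularised regression of each endogenous variable on
# the exogenous variables gives endogeneity-corrected predictions.
Y_hat = np.column_stack([
    Ridge(alpha=1.0).fit(X_exo, Y[:, j]).predict(X_exo) for j in range(p_endo)
])

# Construction stage: L1-regularised regression of each endogenous variable on
# the *predictions* of the other endogenous variables plus the exogenous
# variables (standing in for its anchoring exogenous variables).
edges = {}
for j in range(p_endo):
    others = [k for k in range(p_endo) if k != j]
    design = np.column_stack([Y_hat[:, others], X_exo])
    coef = Lasso(alpha=0.05).fit(design, Y[:, j]).coef_
    edges[j] = {others[i]: c for i, c in enumerate(coef[:len(others)]) if abs(c) > 1e-8}

print(edges)   # selected directed effects among endogenous variables
```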
Generating Photo-Realistic Training Data to Improve Face Recognition Accuracy
Title | Generating Photo-Realistic Training Data to Improve Face Recognition Accuracy |
Authors | Daniel Sáez Trigueros, Li Meng, Margaret Hartnett |
Abstract | In this paper we investigate the feasibility of using synthetic data to augment face datasets. In particular, we propose a novel generative adversarial network (GAN) that can disentangle identity-related attributes from non-identity-related attributes. This is done by training an embedding network that maps discrete identity labels to an identity latent space that follows a simple prior distribution, and training a GAN conditioned on samples from that distribution. Our proposed GAN allows us to augment face datasets by generating both synthetic images of subjects in the training set and synthetic images of new subjects not in the training set. By using recent advances in GAN training, we show that the synthetic images generated by our model are photo-realistic, and that training with augmented datasets can indeed increase the accuracy of face recognition models as compared with models trained with real images alone. |
Tasks | Face Recognition |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00112v1, http://arxiv.org/pdf/1811.00112v1.pdf |
PWC | https://paperswithcode.com/paper/generating-photo-realistic-training-data-to |
Repo | |
Framework | |
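A minimal, hypothetical sketch of the conditioning scheme described above: an embedding network maps discrete identity labels to an identity latent, the generator consumes that latent plus a separate noise vector for non-identity attributes, and sampling the identity latent from its simple prior yields new subjects. The discriminator, losses, and prior-matching machinery are not reproduced, and all sizes are illustrative.

```python
# Hypothetical sketch, not the paper's GAN: identity embedding -> identity
# latent; generator consumes (identity latent, noise); new subjects come from
# sampling the identity latent from its prior instead of the embedding table.
import torch
import torch.nn as nn

class IdentityConditionedGenerator(nn.Module):
    def __init__(self, n_identities, id_dim=64, noise_dim=64, img_dim=32 * 32 * 3):
        super().__init__()
        self.id_embed = nn.Embedding(n_identities, id_dim)   # identity -> latent
        self.gen = nn.Sequential(
            nn.Linear(id_dim + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, identity_latent, noise):
        return self.gen(torch.cat([identity_latent, noise], dim=1))

G = IdentityConditionedGenerator(n_identities=1000)
noise = torch.randn(4, 64)

# Known subjects: identity latent comes from the embedding network.
z_id_known = G.id_embed(torch.tensor([3, 7, 42, 99]))
imgs_known = G(z_id_known, noise)

# New subjects: identity latent sampled from the simple prior (here N(0, I)).
z_id_new = torch.randn(4, 64)
imgs_new = G(z_id_new, noise)
print(imgs_known.shape, imgs_new.shape)   # (4, 3072) each
```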
Primal Meaning Recommendation via On-line Encyclopedia
Title | Primal Meaning Recommendation via On-line Encyclopedia |
Authors | Zhiyuan Zhang, Wei Li, Jingjing Xu, Xu Sun |
Abstract | Polysemy is a very common phenomenon in modern languages. Under many circumstances, there exists a primal meaning for the expression. We define the primal meaning of an expression to be a frequently used sense of that expression from which its other frequent senses can be deduced. Many newly appearing meanings of expressions either originate from a primal meaning or are merely literal references to the original expression, e.g., apple (fruit), Apple (Inc), and Apple (movie). When constructing a knowledge base from on-line encyclopedia data, it would be more efficient to be aware of the relative importance of the senses. In this paper, we explore a way to automatically recommend the primal meaning of an expression based on the textual descriptions of the multiple senses of that expression from on-line encyclopedia websites. We propose a hybrid model that captures both the pattern of the descriptions and the relationship between different descriptions, combining weakly supervised and unsupervised models. The experimental results show that our method yields a good result, with a P@1 (precision) score of 83.3 per cent and a MAP (mean average precision) of 90.5 per cent, surpassing the UMFS-WE baseline by a large margin (P@1 of 61.1 per cent and MAP of 76.3 per cent). |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04660v2, http://arxiv.org/pdf/1808.04660v2.pdf |
PWC | https://paperswithcode.com/paper/primal-meaning-recommendation-for-chinese |
Repo | |
Framework | |
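Since the abstract reports P@1 and MAP, a short reference implementation of those two standard ranking metrics is given below; the toy sense lists are made up for illustration and are not the paper's data.

```python
# Reference computation of P@1 and MAP over ranked sense lists. The queries
# below are toy examples, not the paper's evaluation data.
def precision_at_1(ranked, relevant):
    return 1.0 if ranked and ranked[0] in relevant else 0.0

def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for i, sense in enumerate(ranked, start=1):
        if sense in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

queries = [
    (["apple (fruit)", "Apple (Inc)", "Apple (movie)"], {"apple (fruit)"}),
    (["bank (river)", "bank (finance)"], {"bank (finance)"}),
]
p_at_1 = sum(precision_at_1(r, rel) for r, rel in queries) / len(queries)
mean_ap = sum(average_precision(r, rel) for r, rel in queries) / len(queries)
print(f"P@1 = {p_at_1:.3f}, MAP = {mean_ap:.3f}")
```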
A sequential guiding network with attention for image captioning
Title | A sequential guiding network with attention for image captioning |
Authors | Daouda Sow, Zengchang Qin, Mouhamed Niasse, Tao Wan |
Abstract | The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide us a new way of understanding semantics, allowing us to tackle more challenging tasks such as automatic description generation from natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained in an end-to-end manner using image/description pairs. We validate our approach by conducting extensive experiments on a benchmark dataset, i.e., MS COCO Captions. The proposed model achieves a significant improvement compared with other state-of-the-art deep learning models. |
Tasks | Image Captioning |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00228v3, http://arxiv.org/pdf/1811.00228v3.pdf |
PWC | https://paperswithcode.com/paper/a-sequential-guiding-network-with-attention |
Repo | |
Framework | |
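A loose, hypothetical sketch of one decoder step with an additional guiding LSTM is given below; the exact wiring in the paper may differ, and the attention form, dimensions, and fusion choices here are illustrative assumptions.

```python
# Hypothetical decoder step, not the paper's implementation: a guiding LSTM
# keeps its own state, and its output is mixed into the attention query over
# the CNN feature map before the main decoder cell is updated.
import torch
import torch.nn as nn

class GuidedAttentionDecoderStep(nn.Module):
    def __init__(self, feat_dim=512, emb_dim=256, hid_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.guide_cell = nn.LSTMCell(hid_dim, hid_dim)          # guiding LSTM
        self.decode_cell = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.attn = nn.Linear(hid_dim * 2 + feat_dim, 1)          # additive attention score
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_word, feats, dec_state, guide_state):
        h_dec, c_dec = dec_state
        h_g, c_g = self.guide_cell(h_dec, guide_state)            # update guidance
        # Attention query combines decoder state and guidance vector.
        query = torch.cat([h_dec, h_g], dim=1).unsqueeze(1).expand(-1, feats.size(1), -1)
        scores = self.attn(torch.cat([query, feats], dim=2)).squeeze(2)
        context = (scores.softmax(dim=1).unsqueeze(2) * feats).sum(dim=1)
        h_dec, c_dec = self.decode_cell(
            torch.cat([self.embed(prev_word), context], dim=1), (h_dec, c_dec))
        return self.out(h_dec), (h_dec, c_dec), (h_g, c_g)

step = GuidedAttentionDecoderStep()
feats = torch.randn(2, 49, 512)                                   # 7x7 CNN grid
state = (torch.zeros(2, 512), torch.zeros(2, 512))
guide = (torch.zeros(2, 512), torch.zeros(2, 512))
logits, state, guide = step(torch.tensor([1, 2]), feats, state, guide)
print(logits.shape)   # (2, 10000)
```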
Automatic Inference of Cross-modal Connection Topologies for X-CNNs
Title | Automatic Inference of Cross-modal Connection Topologies for X-CNNs |
Authors | Laurynas Karazija, Petar Veličković, Pietro Liò |
Abstract | This paper introduces a way to learn cross-modal convolutional neural network (X-CNN) architectures from a base convolutional network (CNN) and the training data, reducing the design cost and enabling the application of cross-modal networks in sparse data environments. Two approaches for building X-CNNs are presented. The base approach learns the topology in a data-driven manner, using measurements performed on the base CNN and the supplied data. The iterative approach performs further optimisation of the topology through a combined learning procedure, simultaneously learning the topology and training the network. The approaches were evaluated against examples of hand-designed X-CNNs and their base variants, showing superior performance and, in some cases, gaining an additional 9% of accuracy. From further considerations, we conclude that the presented methodology takes less time than any manual approach would, whilst also significantly reducing the design complexity. The application of the methods is fully automated and implemented in the Xsertion library. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00987v1, http://arxiv.org/pdf/1805.00987v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-inference-of-cross-modal-connection |
Repo | |
Framework | |
Too Fast Causal Inference under Causal Insufficiency
Title | Too Fast Causal Inference under Causal Insufficiency |
Authors | Mieczysław A. Kłopotek |
Abstract | Causally insufficient structures (models with latent or hidden variables, or with confounding, etc.) of joint probability distributions have been the subject of intense study not only in statistics but also in various AI systems. In AI, belief networks, being representations of joint probability distributions with an underlying directed acyclic graph structure, are paid special attention due to the fact that efficient reasoning (uncertainty propagation) methods have been developed for belief network structures. Algorithms have therefore been developed to acquire the belief network structure from data. As artifacts due to variable hiding negatively influence the performance of derived belief networks, models with latent variables have been studied and several algorithms for learning belief network structure under causal insufficiency have also been developed. Regrettably, some of them are already known to be erroneous (e.g., the IC algorithm of [Pearl:Verma:91]). This paper is devoted to another algorithm, the Fast Causal Inference (FCI) algorithm of [Spirtes:93]. It is proven by a specially constructed example that this algorithm, as it stands in [Spirtes:93], is also erroneous. The fundamental reason for the failure of this algorithm is the temporary introduction of non-real links between nodes of the network with the intention of later removal. While for trivial dependency structures these non-real links may actually be removed, this may not be the case for complex ones, e.g., for the case described in this paper. A remedy for this failure is proposed. |
Tasks | Causal Inference |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1806.00352v1, http://arxiv.org/pdf/1806.00352v1.pdf |
PWC | https://paperswithcode.com/paper/too-fast-causal-inference-under-causal |
Repo | |
Framework | |
DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training
Title | DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training |
Authors | Nathan Kallus |
Abstract | We study optimal covariate balance for causal inferences from observational data when rich covariates and complex relationships necessitate flexible modeling with neural networks. Standard approaches such as propensity weighting and matching/balancing fail in such settings due to miscalibrated propensity nets and inappropriate covariate representations, respectively. We propose a new method based on adversarial training of a weighting and a discriminator network that effectively addresses this methodological gap. This is demonstrated through new theoretical characterizations of the method as well as empirical results using both fully connected architectures to learn complex relationships and convolutional architectures to handle image confounders, showing how this new method can enable strong causal analyses in these challenging settings. |
Tasks | Causal Inference |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05664v1, http://arxiv.org/pdf/1802.05664v1.pdf |
PWC | https://paperswithcode.com/paper/deepmatch-balancing-deep-covariate |
Repo | |
Framework | |
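The sketch below illustrates adversarial covariate balancing in the spirit of the abstract, not the paper's DeepMatch code: a weighting network assigns weights to control units while a discriminator tries to separate the treated sample from the re-weighted control sample, and the weights are trained to fool it. Architectures, losses, learning rates, and the toy data are all assumptions.

```python
# Hypothetical adversarial-balancing sketch (not DeepMatch itself): the
# discriminator classifies treated (1) vs weighted control (0); the weighting
# network re-weights control units to make that classification fail.
import torch
import torch.nn as nn

d = 10
weight_net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
discriminator = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
opt_w = torch.optim.Adam(weight_net.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss(reduction="none")

x_treated = torch.randn(256, d) + 0.5      # toy covariates with a mean shift
x_control = torch.randn(256, d)
ones, zeros = torch.ones(256), torch.zeros(256)

for step in range(200):
    # Discriminator: label treated as 1, weighted control as 0.
    w = weight_net(x_control).squeeze(1).detach()
    w = w / w.mean()
    loss_d = bce(discriminator(x_treated).squeeze(1), ones).mean() + \
             (w * bce(discriminator(x_control).squeeze(1), zeros)).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Weighting network: re-weight control so the discriminator cannot tell it
    # from the treated sample (maximise the discriminator's control loss).
    w = weight_net(x_control).squeeze(1)
    w = w / w.mean()
    loss_w = -(w * bce(discriminator(x_control).squeeze(1), zeros)).mean()
    opt_w.zero_grad()
    loss_w.backward()
    opt_w.step()

print("first few balancing weights:", weight_net(x_control).squeeze(1)[:5].detach())
```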
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
Title | Human-level performance in first-person multiplayer games with population-based deep reinforcement learning |
Authors | Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel |
Abstract | Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents is trained concurrently from thousands of parallel matches, with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending, based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation, the trained agents exceeded the win-rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence. |
Tasks | |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01281v1, http://arxiv.org/pdf/1807.01281v1.pdf |
PWC | https://paperswithcode.com/paper/human-level-performance-in-first-person |
Repo | |
Framework | |
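As a toy illustration of the population-based outer loop described above, the sketch below periodically copies and perturbs a stand-in "internal reward weight" across a small population; the actual RL training, team matches, and the agents in the paper are not reproduced, and the objective is a made-up placeholder.

```python
# Toy population-based training loop: evaluate the population, let weak members
# copy (exploit) a strong member's hyperparameter and perturb (explore) it.
# The "reward_weight" and its scoring function are placeholders, not the paper's.
import random

random.seed(0)
population = [{"reward_weight": random.uniform(0.0, 1.0), "score": 0.0}
              for _ in range(8)]

def evaluate(agent):
    # Stand-in for win-rate from parallel matches: peaks at reward_weight = 0.7.
    return 1.0 - abs(agent["reward_weight"] - 0.7) + random.gauss(0, 0.05)

for generation in range(20):
    for agent in population:
        agent["score"] = evaluate(agent)
    population.sort(key=lambda a: a["score"], reverse=True)
    top, bottom = population[:2], population[-2:]
    for loser in bottom:                                   # exploit: copy a strong agent
        winner = random.choice(top)
        loser["reward_weight"] = winner["reward_weight"] + random.gauss(0, 0.05)  # explore

print(f"best internal reward weight ~ {population[0]['reward_weight']:.2f}")
```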
Expanding the Reach of Federated Learning by Reducing Client Resource Requirements
Title | Expanding the Reach of Federated Learning by Reducing Client Resource Requirements |
Authors | Sebastian Caldas, Jakub Konečny, H. Brendan McMahan, Ameet Talwalkar |
Abstract | Communication on heterogeneous edge networks is a fundamental bottleneck in Federated Learning (FL), restricting both model capacity and user participation. To address this issue, we introduce two novel strategies to reduce communication costs: (1) the use of lossy compression on the global model sent server-to-client; and (2) Federated Dropout, which allows users to efficiently train locally on smaller subsets of the global model and also provides a reduction in both client-to-server communication and local computation. We empirically show that these strategies, combined with existing compression approaches for client-to-server communication, collectively provide up to a $14\times$ reduction in server-to-client communication, a $1.7\times$ reduction in local computation, and a $28\times$ reduction in upload communication, all without degrading the quality of the final model. We thus comprehensively reduce FL’s impact on client device resources, allowing higher capacity models to be trained, and a more diverse set of users to be reached. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07210v2, http://arxiv.org/pdf/1812.07210v2.pdf |
PWC | https://paperswithcode.com/paper/expanding-the-reach-of-federated-learning-by |
Repo | |
Framework | |
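A small, hypothetical numpy sketch of the Federated Dropout idea follows: the server sends a client only the rows and columns of its weight matrices corresponding to a random subset of hidden units, the client updates that sub-model, and the server scatters the update back. The lossy downlink compression and the real local training loop are omitted, and the layer sizes are illustrative.

```python
# Hypothetical Federated Dropout round for one client on a 2-layer model.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, keep = 100, 64, 10, 32     # keep 50% of hidden units

W1 = rng.normal(size=(d_in, d_hidden))
W2 = rng.normal(size=(d_hidden, d_out))

# Server: sample the hidden units this client will train.
kept = np.sort(rng.choice(d_hidden, size=keep, replace=False))
sub_W1, sub_W2 = W1[:, kept].copy(), W2[kept, :].copy()   # downlink payload

# Client: (placeholder for local training) only the sub-model is updated.
sub_W1 += 0.01 * rng.normal(size=sub_W1.shape)
sub_W2 += 0.01 * rng.normal(size=sub_W2.shape)

# Server: scatter the updated sub-matrices back into the global model.
W1[:, kept], W2[kept, :] = sub_W1, sub_W2

uplink = sub_W1.size + sub_W2.size
full = W1.size + W2.size
print(f"uplink parameters: {uplink} of {full} ({uplink / full:.0%})")
```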
Properties of interaction networks, structure coefficients, and benefit-to-cost ratios
Title | Properties of interaction networks, structure coefficients, and benefit-to-cost ratios |
Authors | Hendrik Richter |
Abstract | In structured populations the spatial arrangement of cooperators and defectors on the interaction graph together with the structure of the graph itself determines the game dynamics and particularly whether or not fixation of cooperation (or defection) is favored. For a single cooperator (and a single defector) and a network described by a regular graph the question of fixation can be addressed by a single parameter, the structure coefficient. As this quantity is generic for any regular graph, we may call it the generic structure coefficient. For two and more cooperators (or several defectors) fixation properties can also be assigned by structure coefficients. These structure coefficients, however, depend on the arrangement of cooperators and defectors which we may interpret as a configuration of the game. Moreover, the coefficients are specific for a given interaction network modeled as regular graph, which is why we may call them specific structure coefficients. In this paper, we study how specific structure coefficients vary over interaction graphs and link the distributions obtained over different graphs to spectral properties of interaction networks. We also discuss implications for the benefit-to-cost ratios of donation games. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11359v2, http://arxiv.org/pdf/1805.11359v2.pdf |
PWC | https://paperswithcode.com/paper/properties-of-interaction-networks-structure |
Repo | |
Framework | |
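For reference, the textbook single-coefficient condition in which the generic structure coefficient appears, and its specialisation to the donation game's benefit-to-cost ratio, can be written as follows; this is a standard weak-selection result the abstract builds on, not a derivation from this paper.

```latex
% Structure-coefficient (sigma) rule for a 2x2 game with payoffs
% a (C vs C), b (C vs D), c (D vs C), d (D vs D): under weak selection,
% cooperation is favoured over defection iff  sigma*a + b > c + sigma*d.
% For the donation game with benefit B and cost C (a = B - C, b = -C, c = B,
% d = 0) this rearranges to a benefit-to-cost threshold:
\[
  \sigma (B - C) - C > B
  \quad\Longleftrightarrow\quad
  \frac{B}{C} > \frac{\sigma + 1}{\sigma - 1}.
\]
```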
Neural Named Entity Recognition from Subword Units
Title | Neural Named Entity Recognition from Subword Units |
Authors | Abdalghani Abujabal, Judith Gaspers |
Abstract | Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and training time. Second, these models are not able to learn morphological or phonological representations. To remedy the above shortcomings, we adopt a neural solution based on bidirectional LSTMs and conditional random fields, where we rely on subword units, namely characters, phonemes, and bytes. For each word in an utterance, our model learns a representation from each of the subword units. We conducted experiments in a real-world large-scale setting for the use case of a voice-controlled device covering four languages with up to 5.5M utterances per language. Our experiments show that (1) with increasing training data, performance of models trained solely on subword units becomes closer to that of models with dedicated word-level embeddings (91.35 vs 93.92 F1 for English), while using a much smaller vocabulary size (332 vs 74K), (2) subword units enhance models with dedicated word-level embeddings, and (3) combining different subword units improves performance. |
Tasks | Named Entity Recognition, Spoken Language Understanding |
Published | 2018-08-22 |
URL | https://arxiv.org/abs/1808.07364v3, https://arxiv.org/pdf/1808.07364v3.pdf |
PWC | https://paperswithcode.com/paper/neural-named-entity-recognition-from-subword |
Repo | |
Framework | |
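A simplified, hypothetical sketch of the subword idea is given below: a character-level BiLSTM builds word vectors, a word-level BiLSTM contextualises them, and a CRF layer (omitted here) would decode the tag sequence; phoneme and byte channels would be built analogously and concatenated. Sizes and names are illustrative, not the paper's configuration.

```python
# Hypothetical character-only variant of a subword BiLSTM tagger; a CRF layer
# would consume the emission scores produced at the end.
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_chars=100, n_tags=9, char_dim=32, char_hid=64, word_hid=128):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hid, bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(2 * char_hid, word_hid, bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(2 * word_hid, n_tags)   # a CRF would sit on top

    def forward(self, char_ids):                 # (batch, words, chars)
        b, w, c = char_ids.shape
        chars = self.char_embed(char_ids.view(b * w, c))
        _, (h, _) = self.char_lstm(chars)        # h: (2, b*w, char_hid)
        word_vecs = torch.cat([h[0], h[1]], dim=1).view(b, w, -1)
        word_out, _ = self.word_lstm(word_vecs)
        return self.emissions(word_out)          # per-word tag scores

model = CharWordTagger()
scores = model(torch.randint(0, 100, (2, 7, 12)))   # 2 utterances, 7 words, 12 chars
print(scores.shape)                                  # (2, 7, 9)
```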
Energy Efficient Hadamard Neural Networks
Title | Energy Efficient Hadamard Neural Networks |
Authors | T. Ceren Deveci, Serdar Cakir, A. Enis Cetin |
Abstract | Deep learning has made significant improvements on many image processing tasks in recent years, such as image classification, object recognition and object detection. Convolutional neural networks (CNNs), a popular deep learning architecture designed to process data in multiple-array form, show great success on almost all detection & recognition problems and computer vision tasks. However, the number of parameters in a CNN is so high that computers require more energy and a larger memory size. In order to solve this problem, we propose a novel energy-efficient model, the Binary Weight and Hadamard-transformed Image Network (BWHIN), which is a combination of the Binary Weight Network (BWN) and the Hadamard-transformed Image Network (HIN). It is observed that energy efficiency is achieved with a slight sacrifice in classification accuracy. Among all energy-efficient networks, our novel ensemble model outperforms other energy-efficient models. |
Tasks | Image Classification, Object Detection, Object Recognition |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05421v1, http://arxiv.org/pdf/1805.05421v1.pdf |
PWC | https://paperswithcode.com/paper/energy-efficient-hadamard-neural-networks |
Repo | |
Framework | |
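The sketch below shows, under illustrative sizes, the two ingredients named in the abstract: a 2D Hadamard transform applied to an input patch (via scipy.linalg.hadamard) and an XNOR-Net-style binary-weight approximation W ≈ α·sign(W); the actual BWHIN architecture, training, and ensembling are not reproduced.

```python
# Illustrative building blocks only, not the BWHIN model itself.
import numpy as np
from scipy.linalg import hadamard

# 2D Hadamard transform of an 8x8 patch (scaled by 1/n).
n = 8
H = hadamard(n)
patch = np.random.default_rng(0).normal(size=(n, n))
patch_ht = H @ patch @ H.T / n           # transformed input fed to the network

# Binary-weight approximation of a convolution kernel: alpha * sign(W),
# with alpha the mean absolute value of the weights.
W = np.random.default_rng(1).normal(size=(3, 3, 16))
alpha = np.abs(W).mean()
W_bin = alpha * np.sign(W)               # multiplications reduce to sign flips

print(patch_ht.shape, float(alpha), np.unique(np.sign(W)))
```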