Paper Group ANR 644
Maximum Likelihood Estimation for Learning Populations of Parameters. UDS–DFKI Submission to the WMT2019 Similar Language Translation Shared Task. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation. Learning Key-Value Store Design. Online Explanation Generation for Human-Robot Teaming. Approaching Machine Learning Fairness through Adversarial Network. Representation Learning for Discovering Phonemic Tone Contours. Unsupervised Representation for EHR Signals and Codes as Patient Status Vector. Learning Structural Graph Layouts and 3D Shapes for Long Span Bridges 3D Reconstruction. Distributed interference cancellation in multi-agent scenarios. Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace. Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation. Semi-supervised Learning for Word Sense Disambiguation. Semantic Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval.
Maximum Likelihood Estimation for Learning Populations of Parameters
Title | Maximum Likelihood Estimation for Learning Populations of Parameters |
Authors | Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade |
Abstract | Consider a setting with $N$ independent individuals, each with an unknown parameter, $p_i \in [0, 1]$ drawn from some unknown distribution $P^\star$. After observing the outcomes of $t$ independent Bernoulli trials, i.e., $X_i \sim \text{Binomial}(t, p_i)$ per individual, our objective is to accurately estimate $P^\star$. This problem arises in numerous domains, including the social sciences, psychology, health-care, and biology, where the size of the population under study is usually large while the number of observations per individual is often limited. Our main result shows that, in the regime where $t \ll N$, the maximum likelihood estimator (MLE) is both statistically minimax optimal and efficiently computable. Precisely, for sufficiently large $N$, the MLE achieves the information theoretic optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with regards to the earth mover’s distance (between the estimated and true distributions). More generally, in an exponentially large interval of $t$ beyond $c \log{N}$, the MLE achieves the minimax error bound of $\mathcal{O}(\frac{1}{\sqrt{t\log N}})$. In contrast, regardless of how large $N$ is, the naive “plug-in” estimator for this problem only achieves the sub-optimal error of $\Theta(\frac{1}{\sqrt{t}})$. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04553v1 |
http://arxiv.org/pdf/1902.04553v1.pdf | |
PWC | https://paperswithcode.com/paper/maximum-likelihood-estimation-for-learning |
Repo | |
Framework | |
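The observation model in the abstract above is straightforward to simulate. Below is a minimal sketch that draws per-individual Binomial outcomes and computes the naive plug-in estimates $X_i/t$ whose $\Theta(1/\sqrt{t})$ error the paper contrasts with the MLE; the Beta prior standing in for $P^\star$ is an arbitrary choice for illustration, not from the paper.

```python
# Illustrative simulation of the setting: N individuals with unknown p_i ~ P*,
# each observed through X_i ~ Binomial(t, p_i). The naive "plug-in" estimator
# is the empirical distribution of X_i / t. (P* = Beta(2, 5) is assumed here.)
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
N, t = 100_000, 10                      # large population, few trials per individual
p_true = rng.beta(2.0, 5.0, size=N)     # p_i drawn from the unknown P*
X = rng.binomial(t, p_true)             # t Bernoulli trials per individual

plug_in = X / t                         # naive per-individual estimates
# Earth mover's distance between the plug-in distribution and P*
# (approximated by the sampled p_i themselves).
emd = wasserstein_distance(plug_in, p_true)
print(f"plug-in EMD ~ {emd:.4f}  vs. 1/sqrt(t) = {1/np.sqrt(t):.4f}")
```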
UDS–DFKI Submission to the WMT2019 Similar Language Translation Shared Task
Title | UDS–DFKI Submission to the WMT2019 Similar Language Translation Shared Task |
Authors | Santanu Pal, Marcos Zampieri, Josef van Genabith |
Abstract | In this paper we present the UDS-DFKI system submitted to the Similar Language Translation shared task at WMT 2019. The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish. Participants could choose to participate in any of these three tracks and submit system outputs in any translation direction. We report the results obtained by our system in translating from Czech to Polish and comment on the impact of out-of-domain test data in the performance of our system. UDS-DFKI achieved competitive performance ranking second among ten teams in Czech to Polish translation. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06138v1 |
https://arxiv.org/pdf/1908.06138v1.pdf | |
PWC | https://paperswithcode.com/paper/uds-dfki-submission-to-the-wmt2019-similar |
Repo | |
Framework | |
Graph Matching Networks for Learning the Similarity of Graph Structured Objects
Title | Graph Matching Networks for Learning the Similarity of Graph Structured Objects |
Authors | Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, Pushmeet Kohli |
Abstract | This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions. First, we demonstrate how Graph Neural Networks (GNN), which have emerged as an effective model for various supervised prediction problems defined on structured data, can be trained to produce embeddings of graphs in vector spaces that enable efficient similarity reasoning. Second, we propose a novel Graph Matching Network model that, given a pair of graphs as input, computes a similarity score between them by jointly reasoning on the pair through a new cross-graph attention-based matching mechanism. We demonstrate the effectiveness of our models on different domains including the challenging problem of control-flow-graph based function similarity search that plays an important role in the detection of vulnerabilities in software systems. The experimental analysis demonstrates that our models are not only able to exploit structure in the context of similarity learning but they can also outperform domain-specific baseline systems that have been carefully hand-engineered for these problems. |
Tasks | Graph Matching |
Published | 2019-04-29 |
URL | https://arxiv.org/abs/1904.12787v2 |
https://arxiv.org/pdf/1904.12787v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-matching-networks-for-learning-the-1 |
Repo | |
Framework | |
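A minimal numpy sketch of the cross-graph attention idea described in the abstract above: every node of one graph attends over the nodes of the other, and the attention-weighted mismatch becomes a cross-graph message. The dot-product attention form and shapes are assumptions for illustration, not the authors' exact implementation.

```python
# Cross-graph attention between two sets of node embeddings.
# h1: (n1, d) node embeddings of graph 1; h2: (n2, d) node embeddings of graph 2.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_graph_messages(h1, h2):
    """For each node of graph 1, an attention-weighted mismatch against graph 2."""
    scores = h1 @ h2.T                 # (n1, n2) similarity logits
    attn = softmax(scores, axis=1)     # graph-1 nodes attend over graph-2 nodes
    matched = attn @ h2                # soft "match" for every graph-1 node
    return h1 - matched                # cross-graph message fed into the node update

rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=(5, 16)), rng.normal(size=(7, 16))
print(cross_graph_messages(h1, h2).shape)   # (5, 16)
```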
Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation
Title | Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation |
Authors | Arthur Bražinskas, Mirella Lapata, Ivan Titov |
Abstract | Summarization of opinions is the process of automatically creating text summaries that reflect subjective information expressed in input documents, such as product reviews. While most previous research in opinion summarization has focused on the extractive setting, i.e. selecting fragments of the input documents to produce a summary, we let the model generate novel sentences and hence produce fluent text. Supervised abstractive summarization methods typically rely on large quantities of document-summary pairs which are expensive to acquire. In contrast, we consider the unsupervised setting, in other words, we do not use any summaries in training. We define a generative model for a multi-product review collection. Intuitively, we want to design such a model that, when generating a new review given a set of other reviews of the product, we can control the ‘amount of novelty’ going into the new review or, equivalently, vary the degree of deviation from the input reviews. At test time, when generating summaries, we force the novelty to be minimal, and produce a text reflecting consensus opinions. We capture this intuition by defining a hierarchical variational autoencoder model. Both individual reviews and products they correspond to are associated with stochastic latent codes, and the review generator (‘decoder’) has direct access to the text of input reviews through the pointer-generator mechanism. In experiments on Amazon and Yelp data, we show that in this model by setting at test time the review’s latent code to its mean, we produce fluent and coherent summaries. |
Tasks | Abstractive Text Summarization |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02247v1 |
https://arxiv.org/pdf/1911.02247v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-multi-document-opinion |
Repo | |
Framework | |
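The key test-time trick in the abstract above, using the mean of the latent code instead of sampling from it so that the generated text stays close to consensus, can be sketched with a toy latent model; the linear encoder/decoder below are stand-ins, not the paper's hierarchical pointer-generator architecture.

```python
# Toy sketch: sample the latent at training time (novel review),
# use its mean at summary time (minimal novelty, consensus text).
import torch
import torch.nn as nn

class ToyLatentModel(nn.Module):
    def __init__(self, d_in=32, d_z=8):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)       # posterior mean
        self.logvar = nn.Linear(d_in, d_z)   # posterior log-variance
        self.decoder = nn.Linear(d_z, d_in)  # stand-in for the review decoder

    def forward(self, reviews_enc, generate_summary=False):
        mu, logvar = self.mu(reviews_enc), self.logvar(reviews_enc)
        if generate_summary:
            z = mu                                # latent set to its mean
        else:
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)  # reparameterised sample
        return self.decoder(z)

model = ToyLatentModel()
pooled = torch.randn(1, 32)                       # pooled encoding of a product's reviews
summary_like = model(pooled, generate_summary=True)
```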
Learning Key-Value Store Design
Title | Learning Key-Value Store Design |
Authors | Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, Zichen Zhu |
Abstract | We introduce the concept of design continuums for the data layout of key-value stores. A design continuum unifies major distinct data structure designs under the same model. The critical insight and potential long-term impact is that such unifying models 1) render what we consider up to now as fundamentally different data structures to be seen as views of the very same overall design space, and 2) allow seeing new data structure designs with performance properties that are not feasible by existing designs. The core intuition behind the construction of design continuums is that all data structures arise from the very same set of fundamental design principles, i.e., a small set of data layout design concepts out of which we can synthesize any design that exists in the literature as well as new ones. We show how to construct, evaluate, and expand design continuums, and we also present the first continuum that unifies major data structure designs, i.e., B+tree, B-epsilon-tree, LSM-tree, and LSH-table. The practical benefit of a design continuum is that it creates a fast inference engine for the design of data structures. For example, we can predict near instantly how a specific design change in the underlying storage of a data system would affect performance, or reversely what would be the optimal data structure (from a given set of designs) given workload characteristics and a memory budget. In turn, these properties allow us to envision a new class of self-designing key-value stores with a substantially improved ability to adapt to workload and hardware changes by transitioning between drastically different data structure designs to assume a diverse set of performance properties at will. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05443v1 |
https://arxiv.org/pdf/1907.05443v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-key-value-store-design |
Repo | |
Framework | |
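As a rough picture of what a "fast inference engine for the design of data structures" could look like, here is a toy selector that picks a design given a workload mix and a memory budget; the candidate designs and cost numbers are placeholders invented for this sketch, not the paper's continuum or cost model.

```python
# Toy design selection: choose the cheapest feasible data-structure design
# for a given read/write mix under a memory budget. Costs are placeholders.
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    read_cost: float    # abstract cost units per point lookup
    write_cost: float   # abstract cost units per insert
    memory_mb: float    # memory the design requires

CANDIDATES = [
    Design("B+tree",    read_cost=1.0, write_cost=3.0, memory_mb=512),
    Design("LSM-tree",  read_cost=2.5, write_cost=0.8, memory_mb=256),
    Design("LSH-table", read_cost=0.6, write_cost=1.0, memory_mb=1024),
]

def best_design(read_fraction: float, memory_budget_mb: float) -> Design:
    """Return the cheapest feasible design for the workload."""
    feasible = [d for d in CANDIDATES if d.memory_mb <= memory_budget_mb]
    cost = lambda d: read_fraction * d.read_cost + (1 - read_fraction) * d.write_cost
    return min(feasible, key=cost)

print(best_design(read_fraction=0.2, memory_budget_mb=600).name)  # write-heavy -> LSM-tree
```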
Online Explanation Generation for Human-Robot Teaming
Title | Online Explanation Generation for Human-Robot Teaming |
Authors | Mehrdad Zakershahrak, Ze Gong, Nikhillesh Sadassivam, Yu Zhang |
Abstract | As AI becomes an integral part of our lives, the development of explainable AI, embodied in the decision-making process of an AI or robotic agent, becomes imperative. For a robotic teammate, the ability to generate explanations to explain its behavior is one of the key requirements of explainable agency. Prior work on explanation generation focuses on supporting the rationale behind the robot’s decision (or behavior). These approaches, however, fail to consider the mental workload needed to understand the received explanation. In other words, the human teammate is expected to understand any explanation provided no matter how much information is presented. In this work, we argue that explanations, especially ones of a complex nature, should be made in an online fashion during the execution, which helps spread out the information to be explained and thus reduce the mental workload of humans in highly demanding tasks. However, a challenge here is that the different parts of an explanation may be dependent on each other, which must be taken into account when generating online explanations. To this end, a general formulation of online explanation generation is presented with three variations satisfying different properties. The new explanation generation methods are based on a model reconciliation setting introduced in our prior work. We evaluate our methods both with human subjects in a standard planning competition (IPC) domain, using NASA Task Load Index (TLX), as well as in simulation with ten different problems across two IPC domains. Results strongly suggest that our methods not only generate explanations that are perceived as less cognitively demanding and much preferred over the baselines but also are computationally efficient. |
Tasks | Decision Making |
Published | 2019-03-15 |
URL | https://arxiv.org/abs/1903.06418v5 |
https://arxiv.org/pdf/1903.06418v5.pdf | |
PWC | https://paperswithcode.com/paper/online-explanation-generation-for-human-robot |
Repo | |
Framework | |
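One way to picture online explanation with inter-dependent parts is to stream explanation units in an order that respects their dependencies; the sketch below is a generic topological-order scheduler with hypothetical unit names, not the paper's model-reconciliation formulation.

```python
# Stream explanation units step by step, presenting a unit only after the
# units it depends on. Unit names and dependencies are hypothetical.
from graphlib import TopologicalSorter

deps = {
    "goal_change":   set(),
    "new_obstacle":  set(),
    "detour_action": {"new_obstacle"},
    "longer_plan":   {"detour_action", "goal_change"},
}

for step, unit in enumerate(TopologicalSorter(deps).static_order(), start=1):
    print(f"execution step {step}: explain '{unit}'")
```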
Approaching Machine Learning Fairness through Adversarial Network
Title | Approaching Machine Learning Fairness through Adversarial Network |
Authors | Xiaoqian Wang, Heng Huang |
Abstract | Fairness is becoming a rising concern in machine learning. Especially in sensitive fields such as criminal justice and loan decisions, eliminating prediction discrimination towards a certain group of the population (characterized by sensitive features such as race and gender) is important for enhancing the trustworthiness of the model. In this paper, we present a new general framework to improve machine learning fairness. The goal of our model is to minimize the influence of the sensitive feature from the perspectives of both the data input and the predictive model. To achieve this goal, we reformulate the data input by removing the sensitive information and strengthen model fairness by minimizing the marginal contribution of the sensitive feature. We propose to learn the non-sensitive input via sampling among features and design an adversarial network to minimize the dependence between the reformulated input and the sensitive information. Extensive experiments on three benchmark datasets suggest that our model achieves better results than related state-of-the-art methods with respect to both fairness metrics and prediction performance. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03013v1 |
https://arxiv.org/pdf/1909.03013v1.pdf | |
PWC | https://paperswithcode.com/paper/approaching-machine-learning-fairness-through |
Repo | |
Framework | |
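The adversarial component described in the abstract above can be sketched as a minimax loop: an adversary tries to recover the sensitive attribute from the reformulated representation while the encoder is trained to defeat it. The architecture, losses, and trade-off weight below are assumptions for illustration, not the paper's exact model.

```python
# Minimal adversarial-fairness training loop (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_in, d_rep = 20, 16
encoder   = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())
predictor = nn.Linear(d_rep, 1)     # main-task head (binary label)
adversary = nn.Linear(d_rep, 1)     # tries to recover the sensitive feature

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)
opt_adv  = torch.optim.Adam(adversary.parameters(), lr=1e-3)
lam = 1.0                           # fairness/accuracy trade-off (assumed)

x = torch.randn(64, d_in)
y = torch.randint(0, 2, (64, 1)).float()   # task labels
s = torch.randint(0, 2, (64, 1)).float()   # sensitive attribute

for _ in range(100):
    rep = encoder(x)
    # 1) train the adversary to recover s from the (detached) representation
    opt_adv.zero_grad()
    F.binary_cross_entropy_with_logits(adversary(rep.detach()), s).backward()
    opt_adv.step()
    # 2) train encoder+predictor: low task loss, high adversary loss
    opt_main.zero_grad()
    task_loss = F.binary_cross_entropy_with_logits(predictor(rep), y)
    fool_loss = -F.binary_cross_entropy_with_logits(adversary(rep), s)
    (task_loss + lam * fool_loss).backward()
    opt_main.step()
```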
Representation Learning for Discovering Phonemic Tone Contours
Title | Representation Learning for Discovering Phonemic Tone Contours |
Authors | Bai Li, Jing Yi Xie, Frank Rudzicz |
Abstract | Tone is a prosodic feature used to distinguish words in many languages, some of which are endangered and scarcely documented. In this work, we use unsupervised representation learning to identify probable clusters of syllables that share the same phonemic tone. Our method extracts the pitch for each syllable, then trains a convolutional autoencoder to learn a low dimensional representation for each contour. We then apply the mean shift algorithm to cluster tones in high-density regions of the latent space. Furthermore, by feeding the centers of each cluster into the decoder, we produce a prototypical contour that represents each cluster. We apply this method to spoken multi-syllable words in Mandarin Chinese and Cantonese and evaluate how closely our clusters match the ground truth tone categories. Finally, we discuss some difficulties with our approach, including contextual tone variation and allophony effects. |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08987v1 |
https://arxiv.org/pdf/1910.08987v1.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-for-discovering |
Repo | |
Framework | |
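The clustering stage described in the abstract above maps directly onto scikit-learn's mean shift. In the sketch below the convolutional autoencoder is replaced by random latent vectors, since the point is only the high-density clustering and the decoding of cluster centres into prototypical contours.

```python
# Mean shift over latent contour embeddings; each cluster centre would then be
# decoded into a prototypical pitch contour (decoder not shown). The latent
# vectors are random stand-ins for the autoencoder embeddings used in the paper.
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
latents = np.vstack([
    rng.normal(loc=-2.0, scale=0.3, size=(200, 2)),   # pretend tone cluster 1
    rng.normal(loc=+2.0, scale=0.3, size=(200, 2)),   # pretend tone cluster 2
])

ms = MeanShift(bandwidth=1.0).fit(latents)
print("clusters found:", len(ms.cluster_centers_))
```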
Unsupervised Representation for EHR Signals and Codes as Patient Status Vector
Title | Unsupervised Representation for EHR Signals and Codes as Patient Status Vector |
Authors | Sajad Darabi, Mohammad Kachuee, Majid Sarrafzadeh |
Abstract | Effective modeling of electronic health records presents many challenges, as they contain large amounts of irregularity, most of which is due to the varying procedures and diagnoses a patient may have. Despite the recent progress in machine learning, unsupervised learning remains largely an open problem, especially in the healthcare domain. In this work, we present a two-step unsupervised representation learning scheme to summarize multi-modal clinical time series consisting of signals and medical codes into a patient status vector. First, an auto-encoder step is used to reduce sparse medical codes and clinical time series into a distributed representation. Subsequently, the concatenation of the distributed representations is further fine-tuned using a forecasting task. We evaluate the usefulness of the representation on two downstream tasks: mortality and readmission. Our proposed method shows improved generalization performance for both short-duration and long-duration ICU visits. |
Tasks | Representation Learning, Time Series, Unsupervised Representation Learning |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01803v1 |
https://arxiv.org/pdf/1910.01803v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representation-for-ehr-signals |
Repo | |
Framework | |
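A schematic of the two-step scheme in the abstract above: encode codes and signals separately, concatenate the two distributed representations into a patient status vector, and attach a forecasting head for fine-tuning. Layer sizes and the forecasting target are placeholders, not the paper's configuration.

```python
# Two-step representation sketch: separate encoders, concatenated status vector,
# forecasting head for fine-tuning. All dimensions are placeholders.
import torch
import torch.nn as nn

code_encoder   = nn.Sequential(nn.Linear(500, 64), nn.ReLU())    # sparse codes -> dense
signal_encoder = nn.GRU(input_size=8, hidden_size=64, batch_first=True)
forecast_head  = nn.Linear(128, 8)                               # predict next-step signals

codes   = torch.rand(32, 500)            # bag of medical codes per stay
signals = torch.randn(32, 48, 8)         # 48 time steps of 8 clinical signals

code_rep = code_encoder(codes)                                   # (32, 64)
_, h_n = signal_encoder(signals)                                 # (1, 32, 64)
status_vector = torch.cat([code_rep, h_n.squeeze(0)], dim=1)     # (32, 128)
next_step_pred = forecast_head(status_vector)                    # forecasting objective
```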
Learning Structural Graph Layouts and 3D Shapes for Long Span Bridges 3D Reconstruction
Title | Learning Structural Graph Layouts and 3D Shapes for Long Span Bridges 3D Reconstruction |
Authors | Fangqiao Hu, Jin Zhao, Yong Huang, Hui Li |
Abstract | A learning-based 3D reconstruction method for long-span bridges is proposed in this paper. 3D reconstruction generates a 3D computer model of a real object or scene from images; it involves many stages and open problems. Existing point-based methods focus on generating 3D point clouds and their reconstructed polygonal meshes, or on fitting geometrical models to civil structures in urban scenes under Manhattan-world constraints, and have made great achievements. Difficulties arise when an attempt is made to transfer these systems to structures with complex topology and part relations, such as steel trusses and long-span bridges: point clouds are often unevenly distributed, noisy, occluded, and incomplete, and recovering a satisfactory 3D model from such highly unstructured point clouds in a bottom-up manner while preserving geometrical and topological properties poses an enormous challenge to existing algorithms. Considering the prior human knowledge that these structures conform to regular spatial layouts in terms of their components, a learning-based, topology-aware 3D reconstruction method that can obtain high-level structural graph layouts and low-level 3D shapes from images is proposed in this paper. We demonstrate the feasibility of this method by testing it on two real long-span steel-truss cable-stayed bridges. |
Tasks | 3D Reconstruction, Generating 3D Point Clouds |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03387v1 |
https://arxiv.org/pdf/1907.03387v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-structural-graph-layouts-and-3d |
Repo | |
Framework | |
Distributed interference cancellation in multi-agent scenarios
Title | Distributed interference cancellation in multi-agent scenarios |
Authors | Mahdi Shamsi, Alireza Moslemi Haghighi, Farokh Marvasti |
Abstract | This paper considers the problem of detecting impaired and noisy nodes over a network. In a distributed algorithm, many processing units cooperate and communicate with each other to reach a global goal. Depending on their state in the shared environment, they can help the other nodes or mislead them (due to noise or a deliberate attempt). Previous works mainly focused on properly locating agents and on weight assignment based on the initial environment state to minimize the effect of malfunctioning noisy nodes. We propose an algorithm that adapts the sharing weights according to the behavior of the agents. Applying the introduced algorithm to a multi-agent RL scenario and to the well-known diffusion LMS demonstrates its capability and generality. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10109v1 |
https://arxiv.org/pdf/1910.10109v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-interference-cancellation-in |
Repo | |
Framework | |
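Since the abstract above evaluates against the well-known diffusion LMS, here is a compact adapt-then-combine diffusion LMS sketch in which the combination weights are re-weighted by how far each neighbour's intermediate estimate strays; the exponential trust rule is an illustrative choice, not the paper's algorithm.

```python
# Adapt-then-combine diffusion LMS over a small fully connected network,
# with combination weights that down-weight neighbours whose intermediate
# estimates look noisy. The trust rule is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, mu = 5, 4, 0.05
w_true = rng.normal(size=dim)
noise_std = np.array([0.1, 0.1, 0.1, 0.1, 2.0])     # node 4 is impaired/noisy

w = np.zeros((n_nodes, dim))
for _ in range(2000):
    psi = np.empty_like(w)
    for k in range(n_nodes):                         # local LMS adaptation
        u = rng.normal(size=dim)
        d = u @ w_true + noise_std[k] * rng.normal()
        psi[k] = w[k] + mu * (d - u @ w[k]) * u
    for k in range(n_nodes):                         # adaptive combination
        dist = np.linalg.norm(psi - psi[k], axis=1)
        trust = np.exp(-dist)                        # trust nearby estimates more
        w[k] = (trust[:, None] * psi).sum(axis=0) / trust.sum()

print(np.round(np.linalg.norm(w - w_true, axis=1), 3))   # per-node estimation error
```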
Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace
Title | Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace |
Authors | Dimitrios Kollias, Stefanos Zafeiriou |
Abstract | Affective computing has been largely limited in terms of available data resources. The need to collect and annotate diverse in-the-wild datasets has become apparent with the rise of deep learning models, as the default approach to address any computer vision task. Some in-the-wild databases have been recently proposed. However: i) their size is small, ii) they are not audiovisual, iii) only a small part is manually annotated, iv) they contain a small number of subjects, or v) they are not annotated for all main behavior tasks (valence-arousal estimation, action unit detection and basic expression classification). To address these, we substantially extend the largest available in-the-wild database (Aff-Wild) to study continuous emotions such as valence and arousal. Furthermore, we annotate parts of the database with basic expressions and action units. As a consequence, for the first time, this allows the joint study of all three types of behavior states. We call this database Aff-Wild2. We conduct extensive experiments with CNN and CNN-RNN architectures that use visual and audio modalities; these networks are trained on Aff-Wild2 and their performance is then evaluated on 10 publicly available emotion databases. We show that the networks achieve state-of-the-art performance for the emotion recognition tasks. Additionally, we adapt the ArcFace loss function in the emotion recognition context and use it for training two new networks on Aff-Wild2 and then re-train them in a variety of diverse expression recognition databases. The networks are shown to improve the existing state-of-the-art. The database, emotion recognition models and source code are available at http://ibug.doc.ic.ac.uk/resources/aff-wild2. |
Tasks | Action Unit Detection, Emotion Recognition, Multi-Task Learning |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1910.04855v1 |
https://arxiv.org/pdf/1910.04855v1.pdf | |
PWC | https://paperswithcode.com/paper/expression-affect-action-unit-recognition-aff |
Repo | |
Framework | |
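Because the abstract above adapts the ArcFace loss to emotion recognition, the generic additive-angular-margin (ArcFace-style) loss is sketched below; the scale and margin are typical values, and this is the standard formulation rather than the authors' adapted version.

```python
# Generic ArcFace-style additive angular margin loss (scale s, margin m).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginLoss(nn.Module):
    def __init__(self, in_features, n_classes, s=30.0, m=0.50):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, in_features))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)  # margin on target class
        return F.cross_entropy(self.s * logits, labels)

loss_fn = ArcMarginLoss(in_features=128, n_classes=7)   # e.g. 7 basic expressions
loss = loss_fn(torch.randn(16, 128), torch.randint(0, 7, (16,)))
```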
Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation
Title | Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation |
Authors | Rusheng Zhang, Romain Leteurtre, Benjamin Striner, Ammar Alanazi, Abdullah Alghafis, Ozan K. Tonguz |
Abstract | Partially Detected Intelligent Traffic Signal Control (PD-ITSC) systems that can optimize traffic signals based on limited detected information could be a cost-efficient solution for mitigating traffic congestion in the future. In this paper, we focus on a particular problem in PD-ITSC - adaptation to changing environments. To this end, we investigate different reinforcement learning algorithms, including Q-learning, Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), and Actor-Critic with Kronecker-Factored Trust Region (ACKTR). Our findings suggest that RL algorithms can find optimal strategies under partial vehicle detection; however, policy-based algorithms can adapt to changing environments more efficiently than value-based algorithms. We use these findings to draw conclusions about the value of different models for PD-ITSC systems. |
Tasks | Q-Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10808v1 |
https://arxiv.org/pdf/1910.10808v1.pdf | |
PWC | https://paperswithcode.com/paper/partially-detected-intelligent-traffic-signal |
Repo | |
Framework | |
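As a point of reference for the value-based baseline named in the abstract above, here is a tabular Q-learning loop for a toy signal-control agent; the state, actions, and reward are placeholder abstractions, not the paper's PD-ITSC environment.

```python
# Tabular Q-learning sketch for a toy traffic-signal agent.
import random
from collections import defaultdict

actions = ["keep_phase", "switch_phase"]
Q = defaultdict(lambda: {a: 0.0 for a in actions})
alpha, gamma, eps = 0.1, 0.95, 0.1

def toy_env_step(state, action):
    """Hypothetical environment: state = detected queue-length bucket."""
    next_state = max(0, min(5, state + random.choice([-1, 0, 1])))
    return next_state, -next_state            # fewer queued vehicles is better

state = 3
for _ in range(10_000):
    action = (random.choice(actions) if random.random() < eps
              else max(Q[state], key=Q[state].get))
    next_state, reward = toy_env_step(state, action)
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    state = next_state
```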
Semi-supervised Learning for Word Sense Disambiguation
Title | Semi-supervised Learning for Word Sense Disambiguation |
Authors | Darío Garigliotti |
Abstract | This work is a study of the impact of multiple aspects in a classic unsupervised word sense disambiguation algorithm. We identify relevant factors in a decision rule algorithm, including the initial labeling of examples, the formalization of the rule confidence, and the criteria for accepting a decision rule. Some of these factors are only implicitly considered in the original literature. We then propose a lightly supervised version of the algorithm, and employ a pseudo-word-based strategy to evaluate the impact of these factors. The obtained performances are comparable with those of highly optimized formulations of the word sense disambiguation method. |
Tasks | Word Sense Disambiguation |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09641v1 |
https://arxiv.org/pdf/1908.09641v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-for-word-sense |
Repo | |
Framework | |
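A minimal version of the decision-rule confidence the abstract refers to: a Yarowsky-style decision list in which each collocation rule is scored by a smoothed log-likelihood ratio between senses and accepted only above a threshold. The collocations, counts, and threshold are made up for illustration.

```python
# Decision-list rule confidence for word sense disambiguation (sketch).
import math

# counts[(collocation, sense)] from a (pseudo-)labelled corpus -- invented numbers
counts = {("river", "bank/GEO"): 40, ("river", "bank/FIN"): 1,
          ("loan",  "bank/FIN"): 55, ("loan",  "bank/GEO"): 2}

def rule_confidence(collocation, sense_a="bank/GEO", sense_b="bank/FIN", smooth=0.1):
    a = counts.get((collocation, sense_a), 0) + smooth
    b = counts.get((collocation, sense_b), 0) + smooth
    return abs(math.log(a / b))              # higher = more decisive rule

THRESHOLD = 2.0                              # acceptance criterion (assumed)
for col in ("river", "loan"):
    conf = rule_confidence(col)
    print(col, round(conf, 2), "accepted" if conf >= THRESHOLD else "rejected")
```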
Semantic Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval
Title | Semantic Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval |
Authors | Xuefei Zhe, Le Ou-Yang, Shifeng Chen, Hong Yan |
Abstract | Convolutional neural networks have been widely used in content-based image retrieval. To better deal with large-scale data, the deep hashing model is proposed as an effective method, which maps an image to a binary code that can be used for hashing search. However, most existing deep hashing models only utilize fine-level semantic labels or convert them to similar/dissimilar labels for training. The natural semantic hierarchy structures are ignored in the training stage of the deep hashing model. In this paper, we present an effective algorithm to train a deep hashing model that can preserve a semantic hierarchy structure for large-scale image retrieval. Experiments on two datasets show that our method improves the fine-level retrieval performance. Meanwhile, our model achieves state-of-the-art results in terms of hierarchical retrieval. |
Tasks | Content-Based Image Retrieval, Image Retrieval |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11259v2 |
http://arxiv.org/pdf/1901.11259v2.pdf | |
PWC | https://paperswithcode.com/paper/semantic-hierarchy-preserving-deep-hashing |
Repo | |
Framework | |
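The retrieval mechanics the abstract above relies on, mapping an embedding to a binary code and ranking by Hamming distance, can be sketched independently of the hierarchy-preserving training objective; the embeddings below are random stand-ins for the trained CNN features.

```python
# Binary-code retrieval: binarise embeddings with a sign threshold, then rank
# database items by Hamming distance to the query code.
import numpy as np

rng = np.random.default_rng(0)
db_embeddings = rng.normal(size=(1000, 48))       # 48-bit hash codes
query_embedding = rng.normal(size=48)

db_codes = (db_embeddings > 0).astype(np.uint8)   # binarisation
query_code = (query_embedding > 0).astype(np.uint8)

hamming = (db_codes != query_code).sum(axis=1)    # Hamming distance to every item
top10 = np.argsort(hamming)[:10]                  # nearest codes = retrieved images
print(top10, hamming[top10])
```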