Paper Group ANR 243
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Title | Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs |
Authors | Alon Brutzkus, Amir Globerson |
Abstract | Deep learning models are often successfully trained using gradient descent, despite the worst case hardness of the underlying non-convex optimization problem. The key question is then under what conditions can one prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer and a convolutional structure with no overlap and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations. |
Tasks | |
Published | 2017-02-26 |
URL | http://arxiv.org/abs/1702.07966v1 |
http://arxiv.org/pdf/1702.07966v1.pdf | |
PWC | https://paperswithcode.com/paper/globally-optimal-gradient-descent-for-a |
Repo | |
Framework | |
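The setting in the abstract above can be illustrated with a small numerical sketch. It assumes a teacher-student setup that the abstract does not spell out: labels come from a hypothetical "teacher" filter `w_star`, the network averages ReLU(w . patch) over `k` non-overlapping patches, and plain gradient descent is run on a finite Gaussian sample rather than on the population loss analysed in the paper. All names, sizes, and the step size are illustrative choices.

```python
import numpy as np

def forward(w, X, k):
    """Average of ReLU(w . patch) over k non-overlapping patches of each row of X."""
    patches = X.reshape(X.shape[0], k, -1)            # (n, k, m)
    return np.maximum(patches @ w, 0.0).mean(axis=1)  # (n,)

def loss_and_grad(w, y, X, k):
    """Half mean squared error on the sample and its gradient with respect to w."""
    patches = X.reshape(X.shape[0], k, -1)
    pre = patches @ w                                 # pre-activations, (n, k)
    err = np.maximum(pre, 0.0).mean(axis=1) - y       # residuals, (n,)
    loss = 0.5 * np.mean(err ** 2)
    active = (pre > 0).astype(float)                  # ReLU derivative
    grad = ((err[:, None] * active)[:, :, None] * patches).sum(axis=1).mean(axis=0) / k
    return loss, grad

rng = np.random.default_rng(0)
k, m, n = 4, 5, 4000                    # patches, patch length, sample size (assumed)
X = rng.standard_normal((n, k * m))     # Gaussian inputs
w_star = rng.standard_normal(m)         # hypothetical teacher filter
y = forward(w_star, X, k)

w = 0.1 * rng.standard_normal(m)        # random initialisation
losses = []
for _ in range(500):
    loss, grad = loss_and_grad(w, y, X, k)
    losses.append(loss)
    w -= 0.5 * grad                     # fixed step size, chosen for illustration

print("initial loss:", losses[0], "final loss:", losses[-1])
print("distance to teacher:", np.linalg.norm(w - w_star))
```

This only shows the loss decreasing on one random instance; it is not a substitute for the convergence guarantee stated in the abstract.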
Dialogue Act Segmentation for Vietnamese Human-Human Conversational Texts
Title | Dialogue Act Segmentation for Vietnamese Human-Human Conversational Texts |
Authors | Thi Lan Ngo, Khac Linh Pham, Minh Son Cao, Son Bao Pham, Xuan Hieu Phan |
Abstract | Dialog act identification plays an important role in understanding conversations. It has been widely applied in many fields such as dialogue systems, automatic machine translation, and automatic speech recognition, and is especially useful in systems with human-computer natural language dialogue interfaces such as virtual assistants and chatbots. The first step in identifying dialog acts is identifying the boundaries of the dialog acts in utterances. In this paper, we focus on segmenting utterances according to dialog act boundaries, i.e. functional segment identification, for Vietnamese utterances. We carefully investigate functional segment identification using two approaches: (1) a machine learning approach using maximum entropy (ME) and conditional random fields (CRFs); (2) a deep learning approach using bidirectional Long Short-Term Memory (LSTM) with a CRF layer (Bi-LSTM-CRF), on two different conversational datasets: (1) Facebook messages (Message data); (2) transcriptions of phone conversations (Phone data). To the best of our knowledge, this is the first work that applies a deep learning based approach to dialog act segmentation. As the results show, the deep learning approach performs appreciably better than traditional machine learning approaches. Moreover, it is also the first study that tackles dialog act and functional segment identification for Vietnamese. |
Tasks | Machine Translation, Speech Recognition |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04765v1 |
http://arxiv.org/pdf/1708.04765v1.pdf | |
PWC | https://paperswithcode.com/paper/dialogue-act-segmentation-for-vietnamese |
Repo | |
Framework | |
Demystifying Relational Latent Representations
Title | Demystifying Relational Latent Representations |
Authors | Sebastijan Dumančić, Hendrik Blockeel |
Abstract | Latent features learned by deep learning approaches have proven to be a powerful tool for machine learning. They serve as a data abstraction that makes learning easier by capturing regularities in data explicitly. Their benefits motivated their adaptation to the relational learning context. In our previous work, we introduced an approach that learns relational latent features by means of clustering instances and their relations. The major drawback of latent representations is that they are often black-box and difficult to interpret. This work addresses these issues and shows that (1) latent features created by clustering are interpretable and capture interesting properties of data; (2) they identify local regions of instances that match well with the label, which partially explains their benefit; and (3) although the number of latent features generated by this approach is large, often many of them are highly redundant and can be removed without hurting performance much. |
Tasks | Relational Reasoning |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05785v3 |
http://arxiv.org/pdf/1705.05785v3.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-relational-latent |
Repo | |
Framework | |
A Greedy Part Assignment Algorithm for Real-time Multi-person 2D Pose Estimation
Title | A Greedy Part Assignment Algorithm for Real-time Multi-person 2D Pose Estimation |
Authors | Srenivas Varadarajan, Parual Datta, Omesh Tickoo |
Abstract | Human pose-estimation in a multi-person image involves detection of various body parts and grouping them into individual person clusters. While the former task is challenging due to mutual occlusions, the combinatorial complexity of the latter task is very high. We propose a greedy part assignment algorithm that exploits the inherent structure of the human body to achieve a lower complexity compared to any of the prior published works. This is accomplished by (i) reducing the number of part-candidates using the estimated number of people in the image, (ii) doing a greedy sequential assignment of part-classes, following the kinematic chain from head to ankle, (iii) doing a greedy assignment of parts in each part-class set to person-clusters, (iv) limiting the candidate person clusters to the most proximal clusters using human anthropometric data, and (v) using only a specific subset of pre-assigned parts for establishing pairwise structural constraints. We show that these steps result in a sparse body-part relationship graph and reduce the complexity. We also propose methods for improving the accuracy of pose-estimation by (i) spawning person-clusters from any unassigned significant body part and (ii) suppressing hallucinated parts. On the MPII multi-person pose database, pose-estimation using the proposed method takes only 0.14 seconds per image. We show that our proposed algorithm, by using a large spatial and structural context, achieves state-of-the-art accuracy on both the MPII and WAF multi-person pose datasets, demonstrating the robustness of our approach. |
Tasks | Pose Estimation |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09182v1 |
http://arxiv.org/pdf/1708.09182v1.pdf | |
PWC | https://paperswithcode.com/paper/a-greedy-part-assignment-algorithm-for-real |
Repo | |
Framework | |
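The greedy grouping idea in the abstract above can be sketched in a heavily simplified form. This is not the authors' algorithm: part candidates are bare 2D points, `max_dist` is a hypothetical stand-in for the anthropometric proximity limit, each candidate is compared only against the most recently assigned part of each person, and the candidate-reduction and hallucination-suppression steps are omitted. The function name and data layout are assumptions made for illustration.

```python
import math

def greedy_assign(candidates, part_order, max_dist):
    """Group part candidates into people, one part class at a time.

    candidates: dict mapping a part class to a list of (x, y) points.
    part_order: part classes ordered along the kinematic chain.
    max_dist:   hypothetical proximity limit for attaching a part to a person.
    """
    people = []  # each person is a dict: part class -> (x, y)
    for part in part_order:
        for point in candidates.get(part, []):
            best, best_d = None, max_dist
            for person in people:
                if part in person:
                    continue  # this person already has that part
                anchor = person[list(person)[-1]]  # most recently assigned part
                d = math.dist(point, anchor)
                if d <= best_d:
                    best, best_d = person, d
            if best is None:
                people.append({part: point})  # spawn a new person cluster
            else:
                best[part] = point
    return people

# Two well-separated people, three part classes each (made-up coordinates).
candidates = {
    "head": [(0, 0), (10, 0)],
    "neck": [(0, 1), (10, 1)],
    "hip": [(0, 4), (10, 4)],
}
people = greedy_assign(candidates, ["head", "neck", "hip"], max_dist=5.0)
for person in people:
    print(person)
```

Because each candidate is committed as soon as it is seen, the cost grows with the number of candidates times the number of people, rather than with the number of joint assignments.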
Induction of Interpretable Possibilistic Logic Theories from Relational Data
Title | Induction of Interpretable Possibilistic Logic Theories from Relational Data |
Authors | Ondrej Kuzelka, Jesse Davis, Steven Schockaert |
Abstract | The field of Statistical Relational Learning (SRL) is concerned with learning probabilistic models from relational data. Learned SRL models are typically represented using some kind of weighted logical formulas, which make them considerably more interpretable than those obtained by e.g. neural networks. In practice, however, these models are often still difficult to interpret correctly, as they can contain many formulas that interact in non-trivial ways and weights do not always have an intuitive meaning. To address this, we propose a new SRL method which uses possibilistic logic to encode relational models. Learned models are then essentially stratified classical theories, which explicitly encode what can be derived with a given level of certainty. Compared to Markov Logic Networks (MLNs), our method is faster and produces considerably more interpretable models. |
Tasks | Relational Reasoning |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07095v1 |
http://arxiv.org/pdf/1705.07095v1.pdf | |
PWC | https://paperswithcode.com/paper/induction-of-interpretable-possibilistic |
Repo | |
Framework | |
Stability Enhanced Large-Margin Classifier Selection
Title | Stability Enhanced Large-Margin Classifier Selection |
Authors | Will Wei Sun, Guang Cheng, Yufeng Liu |
Abstract | Stability is an important aspect of a classification procedure because unstable predictions can potentially reduce users’ trust in a classification system and also harm the reproducibility of scientific conclusions. The major goal of our work is to introduce a novel concept of classification instability, i.e., decision boundary instability (DBI), and incorporate it with the generalization error (GE) as a standard for selecting the most accurate and stable classifier. Specifically, we implement a two-stage algorithm: (i) initially select a subset of classifiers whose estimated GEs are not significantly different from the minimal estimated GE among all the candidate classifiers; (ii) the optimal classifier is chosen as the one achieving the minimal DBI among the subset selected in stage (i). This general selection principle applies to both linear and nonlinear classifiers. Large-margin classifiers are used as a prototypical example to illustrate the above idea. Our selection method is shown to be consistent in the sense that the optimal classifier simultaneously achieves the minimal GE and the minimal DBI. Various simulations and real examples further demonstrate the advantage of our method over several alternative approaches. |
Tasks | |
Published | 2017-01-20 |
URL | http://arxiv.org/abs/1701.05672v1 |
http://arxiv.org/pdf/1701.05672v1.pdf | |
PWC | https://paperswithcode.com/paper/stability-enhanced-large-margin-classifier |
Repo | |
Framework | |
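The two-stage selection described in the abstract above can be sketched as follows. The abstract says stage (i) keeps classifiers whose estimated GE is "not significantly different" from the minimum; this sketch replaces that statistical test with a fixed, hypothetical `tolerance`, and the classifier names and numbers are made up for illustration.

```python
def select_classifier(ge, dbi, tolerance):
    """Two-stage selection: shortlist by generalization error, then pick minimal instability.

    ge:        dict mapping classifier name -> estimated generalization error (GE).
    dbi:       dict mapping classifier name -> decision boundary instability (DBI).
    tolerance: hypothetical stand-in for the significance test in stage (i).
    """
    best_ge = min(ge.values())
    # Stage (i): keep candidates whose GE is close to the best GE.
    shortlist = [name for name, err in ge.items() if err - best_ge <= tolerance]
    # Stage (ii): among those, choose the most stable classifier.
    return min(shortlist, key=lambda name: dbi[name])

# Illustrative numbers only, not results from the paper.
ge = {"svm": 0.101, "dwd": 0.104, "logistic": 0.150}
dbi = {"svm": 0.30, "dwd": 0.12, "logistic": 0.05}
chosen = select_classifier(ge, dbi, tolerance=0.01)
print(chosen)
```

Here the candidate with the lowest DBI overall is excluded in stage (i) because its error is too far from the best, so stability only breaks ties among comparably accurate classifiers.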
Inferring Networked Device Categories from Low-Level Activity Indicators
Title | Inferring Networked Device Categories from Low-Level Activity Indicators |
Authors | Kyumars Sheykh Esmaili, Jaideep Chandrashekar, Pascal Le Guyadec |
Abstract | We study the problem of inferring the type of a networked device in a home network by leveraging low level traffic activity indicators seen at commodity home gateways. We analyze a dataset of detailed device network activity obtained from 240 subscriber homes of a large European ISP and extract a number of traffic and spatial fingerprints for individual devices. We develop a two level taxonomy to describe devices onto which we map individual devices using a number of heuristics. We leverage the heuristically derived labels to train classifiers that distinguish device classes based on the traffic and spatial fingerprints of a device. Our results show an accuracy level up to 91% for the coarse level category and up to 84% for the fine grained category. By incorporating information from other sources (e.g., MAC OUI), we are able to further improve accuracy to above 97% and 92%, respectively. Finally, we also extract a set of simple and human-readable rules that concisely capture the behaviour of these distinct device categories. |
Tasks | |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00348v1 |
http://arxiv.org/pdf/1709.00348v1.pdf | |
PWC | https://paperswithcode.com/paper/inferring-networked-device-categories-from |
Repo | |
Framework | |
Geospatial Semantics
Title | Geospatial Semantics |
Authors | Yingjie Hu |
Abstract | Geospatial semantics is a broad field that involves a variety of research areas. The term semantics refers to the meaning of things, and is in contrast with the term syntactics. Accordingly, studies on geospatial semantics usually focus on understanding the meaning of geographic entities as well as their counterparts in the cognitive and digital world, such as cognitive geographic concepts and digital gazetteers. Geospatial semantics can also facilitate the design of geographic information systems (GIS) by enhancing the interoperability of distributed systems and developing more intelligent interfaces for user interactions. During the past years, a lot of research has been conducted, approaching geospatial semantics from different perspectives, using a variety of methods, and targeting different problems. Meanwhile, the arrival of big geo data, especially the large amount of unstructured text data on the Web, and the fast development of natural language processing methods enable new research directions in geospatial semantics. This chapter, therefore, provides a systematic review on the existing geospatial semantic research. Six major research areas are identified and discussed, including semantic interoperability, digital gazetteers, geographic information retrieval, geospatial Semantic Web, place semantics, and cognitive geographic concepts. |
Tasks | Information Retrieval |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03550v2 |
http://arxiv.org/pdf/1707.03550v2.pdf | |
PWC | https://paperswithcode.com/paper/geospatial-semantics |
Repo | |
Framework | |
A New Rank Constraint on Multi-view Fundamental Matrices, and its Application to Camera Location Recovery
Title | A New Rank Constraint on Multi-view Fundamental Matrices, and its Application to Camera Location Recovery |
Authors | Soumyadip Sengupta, Tal Amir, Meirav Galun, Tom Goldstein, David W. Jacobs, Amit Singer, Ronen Basri |
Abstract | Accurate estimation of camera matrices is an important step in structure from motion algorithms. In this paper we introduce a novel rank constraint on collections of fundamental matrices in multi-view settings. We show that in general, with the selection of proper scale factors, a matrix formed by stacking fundamental matrices between pairs of images has rank 6. Moreover, this matrix forms the symmetric part of a rank 3 matrix whose factors relate directly to the corresponding camera matrices. We use this new characterization to produce better estimations of fundamental matrices by optimizing an L1-cost function using Iterative Re-weighted Least Squares and the Alternating Direction Method of Multipliers. We further show that this procedure can improve the recovery of camera locations, particularly in multi-view settings in which fewer images are available. |
Tasks | |
Published | 2017-02-10 |
URL | http://arxiv.org/abs/1702.03023v1 |
http://arxiv.org/pdf/1702.03023v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-rank-constraint-on-multi-view |
Repo | |
Framework | |
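The rank-6 claim in the abstract above can be checked numerically on synthetic cameras. The sketch assumes a parametrization that the abstract does not state: camera i has calibration K_i, rotation R_i and centre t_i, and the pairwise fundamental matrix is F_ij = K_i^{-T} R_i^T ([t_i]_x - [t_j]_x) R_j K_j^{-1}, which fixes consistent scale factors by construction. Function names and the random setup are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def random_rotation(rng):
    """Random 3x3 rotation from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column so that det(q) = +1
    return q

def stacked_fundamental_matrix(Ks, Rs, ts):
    """3n x 3n matrix whose (i, j) block is F_ij; diagonal blocks are zero."""
    n = len(Ks)
    F = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            if i != j:
                F[3 * i:3 * i + 3, 3 * j:3 * j + 3] = (
                    np.linalg.inv(Ks[i]).T @ Rs[i].T
                    @ (skew(ts[i]) - skew(ts[j]))
                    @ Rs[j] @ np.linalg.inv(Ks[j])
                )
    return F

rng = np.random.default_rng(0)
n = 5  # number of synthetic cameras (assumed)
Ks = [np.diag([f, f, 1.0]) for f in rng.uniform(1.0, 2.0, size=n)]
Rs = [random_rotation(rng) for _ in range(n)]
ts = [rng.standard_normal(3) for _ in range(n)]

F = stacked_fundamental_matrix(Ks, Rs, ts)
rank = np.linalg.matrix_rank(F, tol=1e-9)
print("shape:", F.shape, "rank:", rank)
```

With estimated fundamental matrices the scale of each block is arbitrary, which is why the abstract qualifies the statement with "the selection of proper scale factors".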
Interpretable Graph-Based Semi-Supervised Learning via Flows
Title | Interpretable Graph-Based Semi-Supervised Learning via Flows |
Authors | Raif M. Rustamov, James T. Klosowski |
Abstract | In this paper, we consider the interpretability of the foundational Laplacian-based semi-supervised learning approaches on graphs. We introduce a novel flow-based learning framework that subsumes the foundational approaches and additionally provides a detailed, transparent, and easily understood expression of the learning process in terms of graph flows. As a result, one can visualize and interactively explore the precise subgraph along which the information from labeled nodes flows to an unlabeled node of interest. Surprisingly, the proposed framework avoids trading accuracy for interpretability, but in fact leads to improved prediction accuracy, which is supported both by theoretical considerations and empirical results. The flow-based framework guarantees the maximum principle by construction and can handle directed graphs in an out-of-the-box manner. |
Tasks | |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04764v1 |
http://arxiv.org/pdf/1709.04764v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-graph-based-semi-supervised |
Repo | |
Framework | |
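For context on the "foundational Laplacian-based" approaches the abstract above refers to, here is a minimal sketch of one of them, harmonic label propagation, in which unlabeled values solve a linear system built from the graph Laplacian. This is an assumed baseline for illustration, not the flow-based framework proposed in the paper; the function name and the toy graph are hypothetical.

```python
import numpy as np

def harmonic_labels(W, labeled_idx, labeled_values):
    """Solve L_uu f_u = -L_ul f_l for the unlabeled nodes of a weighted graph.

    W:              symmetric (n, n) weight matrix.
    labeled_idx:    list of labeled node indices.
    labeled_values: known values at those nodes.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # combinatorial graph Laplacian
    labeled = set(labeled_idx)
    unlabeled_idx = [i for i in range(n) if i not in labeled]
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    f_l = np.asarray(labeled_values, dtype=float)
    f_u = np.linalg.solve(L_uu, -L_ul @ f_l)
    return unlabeled_idx, f_u

# Path graph 0 - 1 - 2 - 3 - 4 with the two end nodes labeled 0 and 1.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
unlabeled_idx, f_u = harmonic_labels(W, [0, 4], [0.0, 1.0])
print(dict(zip(unlabeled_idx, f_u)))
```

Each unlabeled value is a weighted average of its neighbours, so every prediction lies between the smallest and largest label. That property is the maximum principle mentioned in the abstract.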
Reliable Decision Support using Counterfactual Models
Title | Reliable Decision Support using Counterfactual Models |
Authors | Peter Schulam, Suchi Saria |
Abstract | Decision-makers are faced with the challenge of estimating what is likely to happen when they take an action. For instance, if I choose not to treat this patient, are they likely to die? Practitioners commonly use supervised learning algorithms to fit predictive models that help decision-makers reason about likely future outcomes, but we show that this approach is unreliable, and sometimes even dangerous. The key issue is that supervised learning algorithms are highly sensitive to the policy used to choose actions in the training data, which causes the model to capture relationships that do not generalize. We propose using a different learning objective that predicts counterfactuals instead of predicting outcomes under an existing action policy as in supervised learning. To support decision-making in temporal settings, we introduce the Counterfactual Gaussian Process (CGP) to predict the counterfactual future progression of continuous-time trajectories under sequences of future actions. We demonstrate the benefits of the CGP on two important decision-support tasks: risk prediction and “what if?” reasoning for individualized treatment planning. |
Tasks | Decision Making |
Published | 2017-03-30 |
URL | http://arxiv.org/abs/1703.10651v4 |
http://arxiv.org/pdf/1703.10651v4.pdf | |
PWC | https://paperswithcode.com/paper/reliable-decision-support-using |
Repo | |
Framework | |
Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model
Title | Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model |
Authors | Priyadarshini Panda, Narayan Srinivasa |
Abstract | A fundamental challenge in machine learning today is to build a model that can learn from few examples. Here, we describe a reservoir based spiking neural model for learning to recognize actions with a limited number of labeled videos. First, we propose a novel encoding, inspired by how microsaccades influence visual perception, to extract spike information from raw video data while preserving the temporal correlation across different frames. Using this encoding, we show that the reservoir generalizes its rich dynamical activity toward signature action/movements enabling it to learn from few training examples. We evaluate our approach on the UCF-101 dataset. Our experiments demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5 accuracy, respectively, on the 101-class data while requiring just 8 video examples per class for training. Our results establish a new benchmark for action recognition from limited video examples for spiking neural models while yielding competitive accuracy with respect to state-of-the-art non-spiking neural models. |
Tasks | Temporal Action Localization |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07354v1 |
http://arxiv.org/pdf/1710.07354v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-recognize-actions-from-limited |
Repo | |
Framework | |
Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder
Title | Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder |
Authors | Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio |
Abstract | We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation. We develop a model that plans ahead when it computes alignments between the source and target sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model. Our proposed model is end-to-end trainable with fully differentiable operations. We show that it outperforms a strong baseline on three character-level decoder neural machine translation tasks on the WMT’15 corpus. Our analysis demonstrates that our model can compute qualitatively intuitive alignments and achieves superior performance with fewer parameters. |
Tasks | Machine Translation |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1706.05087v2 |
http://arxiv.org/pdf/1706.05087v2.pdf | |
PWC | https://paperswithcode.com/paper/plan-attend-generate-character-level-neural-1 |
Repo | |
Framework | |
Joint Cuts and Matching of Partitions in One Graph
Title | Joint Cuts and Matching of Partitions in One Graph |
Authors | Tianshu Yu, Junchi Yan, Jieyi Zhao, Baoxin Li |
Abstract | As two fundamental problems, graph cuts and graph matching have been investigated over decades, resulting in a vast literature on each topic. However, jointly applying and solving graph cuts and matching has received little attention. In this paper, we first formalize the problem of simultaneously cutting a graph into two partitions, i.e. graph cuts, and establishing their correspondence, i.e. graph matching. Then we develop an optimization algorithm that updates matching and cutting alternately, supported by theoretical analysis. The efficacy of our algorithm is verified on both a synthetic dataset and real-world images containing similar regions or structures. |
Tasks | Graph Matching |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09584v1 |
http://arxiv.org/pdf/1711.09584v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-cuts-and-matching-of-partitions-in-one |
Repo | |
Framework | |
Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems
Title | Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems |
Authors | Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana |
Abstract | Due to the increasing usage of machine learning (ML) techniques in security- and safety-critical domains, such as autonomous systems and medical diagnosis, ensuring correct behavior of ML systems, especially for different corner cases, is of growing importance. In this paper, we propose a generic framework for evaluating security and robustness of ML systems using different real-world safety properties. We further design, implement and evaluate VeriVis, a scalable methodology that can verify a diverse set of safety properties for state-of-the-art computer vision systems with only blackbox access. VeriVis leverages different input space reduction techniques for efficient verification of different safety properties. VeriVis is able to find thousands of safety violations in fifteen state-of-the-art computer vision systems including ten Deep Neural Networks (DNNs) such as Inception-v3 and Nvidia’s Dave self-driving system with thousands of neurons as well as five commercial third-party vision APIs including Google vision and Clarifai for twelve different safety properties. Furthermore, VeriVis can successfully verify local safety properties, on average, for around 31.7% of the test images. VeriVis finds up to 64.8x more violations than existing gradient-based methods that, unlike VeriVis, cannot ensure non-existence of any violations. Finally, we show that retraining using the safety violations detected by VeriVis can reduce the average number of violations by up to 60.2%. |
Tasks | Medical Diagnosis |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01785v3 |
http://arxiv.org/pdf/1712.01785v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-practical-verification-of-machine |
Repo | |
Framework | |
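The idea of verifying a safety property with only blackbox access can be illustrated by exhaustively enumerating a finite set of input transformations and querying the model on each one. The sketch below is not VeriVis itself: the transformation (integer brightness offsets on an 8-bit image), the toy classifier, and all names are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def find_violations(classify, image, offsets):
    """Return every brightness offset whose output differs from the original prediction.

    classify: blackbox function mapping an image to a label.
    image:    uint8 array.
    offsets:  iterable of integer brightness shifts to enumerate.
    """
    reference = classify(image)
    violations = []
    for beta in offsets:
        # Apply the shift in a wider integer type, then clip back to the 8-bit range.
        shifted = np.clip(image.astype(np.int16) + beta, 0, 255).astype(np.uint8)
        if classify(shifted) != reference:
            violations.append(beta)
    return violations

def toy_classifier(image):
    """Hypothetical stand-in for a vision model: thresholds the mean intensity."""
    return "bright" if image.mean() >= 128 else "dark"

image = np.full((4, 4), 120, dtype=np.uint8)
violations = find_violations(toy_classifier, image, range(-10, 11))
print(violations)
```

Because the set of offsets is finite and fully enumerated, an empty result means the property holds for this image over the whole range, which is the kind of guarantee a gradient-based search cannot give.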