Paper Group ANR 138
Embedding Syntax and Semantics of Prepositions via Tensor Decomposition
Title | Embedding Syntax and Semantics of Prepositions via Tensor Decomposition |
Authors | Hongyu Gong, Suma Bhat, Pramod Viswanath |
Abstract | Prepositions are among the most frequent words in English and play complex roles in the syntax and semantics of sentences. Not surprisingly, they pose well-known difficulties in automatic processing of sentences (prepositional attachment ambiguities and idiosyncratic uses in phrases). Existing methods on preposition representation treat prepositions no different from content words (e.g., word2vec and GloVe). In addition, recent studies aiming at solving prepositional attachment and preposition selection problems depend heavily on external linguistic resources and use dataset-specific word representations. In this paper we use word-triple counts (one of the triples being a preposition) to capture a preposition’s interaction with its attachment and complement. We then derive preposition embeddings via tensor decomposition on a large unlabeled corpus. We reveal a new geometry involving Hadamard products and empirically demonstrate its utility in paraphrasing phrasal verbs. Furthermore, our preposition embeddings are used as simple features in two challenging downstream tasks: preposition selection and prepositional attachment disambiguation. We achieve results comparable to or better than the state-of-the-art on multiple standardized datasets. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09389v1 |
http://arxiv.org/pdf/1805.09389v1.pdf | |
PWC | https://paperswithcode.com/paper/embedding-syntax-and-semantics-of |
Repo | |
Framework | |
Stable safe screening and structured dictionaries for faster L1 regularization
Title | Stable safe screening and structured dictionaries for faster L1 regularization |
Authors | Cassio Fraga Dantas, Rémi Gribonval |
Abstract | In this paper, we propose a way to combine two acceleration techniques for the $\ell_{1}$-regularized least squares problem: safe screening tests, which allow useless dictionary atoms to be eliminated; and the use of fast structured approximations of the dictionary matrix. To do so, we introduce a new family of screening tests, termed stable screening, which can cope with approximation errors on the dictionary atoms while keeping the safety of the test (i.e. zero risk of rejecting atoms belonging to the solution support). Some of the main existing screening tests are extended to this new framework. The proposed algorithm consists of using a coarser (but faster) approximation of the dictionary at the initial iterations and then switching to better approximations until eventually adopting the original dictionary. A systematic switching criterion based on the duality gap saturation and the screening ratio is derived. Simulation results show significant reductions in both computational complexity and execution times for a wide range of tested scenarios. |
Tasks | |
Published | 2018-12-17 |
URL | https://arxiv.org/abs/1812.06635v3 |
https://arxiv.org/pdf/1812.06635v3.pdf | |
PWC | https://paperswithcode.com/paper/stable-safe-screening-and-structured |
Repo | |
Framework | |
Modeling Attention Flow on Graphs
Title | Modeling Attention Flow on Graphs |
Authors | Xiaoran Xu, Songpeng Zu, Chengliang Gao, Yuan Zhang, Wei Feng |
Abstract | Real-world scenarios demand reasoning about process, more than final outcome prediction, to discover latent causal chains and better understand complex systems. It requires the learning algorithms to offer both accurate predictions and clear interpretations. We design a set of trajectory reasoning tasks on graphs with only the source and the destination observed. We present the attention flow mechanism to explicitly model the reasoning process, leveraging the relational inductive biases by basing our models on graph networks. We study the way attention flow can effectively act on the underlying information flow implemented by message passing. Experiments demonstrate that the attention flow driven by and interacting with graph networks can provide higher accuracy in prediction and better interpretation for trajectory reasoning. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00497v2 |
http://arxiv.org/pdf/1811.00497v2.pdf | |
PWC | https://paperswithcode.com/paper/modeling-attention-flow-on-graphs |
Repo | |
Framework | |
Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds
Title | Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds |
Authors | Santosh Vempala, John Wilmes |
Abstract | We study the complexity of training neural network models with one hidden nonlinear activation layer and an output weighted sum layer. We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs. We give an agnostic learning guarantee for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error (in $2$-norm) of the best approximation of the target function using a polynomial of degree at most $k$. Moreover, for any $k$, the size of the network and number of iterations needed are both bounded by $n^{O(k)}\log(1/\epsilon)$. In particular, this applies to training networks of unbiased sigmoids and ReLUs. We also rigorously explain the empirical finding that gradient descent discovers lower frequency Fourier components before higher frequency components. We complement this result with nearly matching lower bounds in the Statistical Query model. GD fits well in the SQ framework since each training step is determined by an expectation over the input distribution. We show that any SQ algorithm that achieves significant improvement over a constant function with queries of tolerance some inverse polynomial in the input dimensionality $n$ must use $n^{\Omega(k)}$ queries even when the target functions are restricted to a set of $n^{O(k)}$ degree-$k$ polynomials, and the input distribution is uniform over the unit sphere; for this class the information-theoretic lower bound is only $\Theta(k \log n)$. Our approach for both parts is based on spherical harmonics. We view gradient descent as an operator on the space of functions, and study its dynamics. An essential tool is the Funk-Hecke theorem, which explains the eigenfunctions of this operator in the case of the mean squared loss. |
Tasks | |
Published | 2018-05-07 |
URL | https://arxiv.org/abs/1805.02677v3 |
https://arxiv.org/pdf/1805.02677v3.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-for-one-hidden-layer-neural |
Repo | |
Framework | |
Order Effects for Queries in Intelligent Systems
Title | Order Effects for Queries in Intelligent Systems |
Authors | Subhash Kak |
Abstract | This paper examines common assumptions regarding the decision-making internal environment for intelligent agents and investigates issues related to processing of memory and belief states to help obtain better understanding of the responses. Specifically, we consider order effects and discuss both classical and non-classical explanations for them. We also consider implicit cognition and explore whether certain inaccessible states may be best modeled as quantum states. We propose that the hypothesis that quantum states are at the basis of order effects be tested on large databases such as those related to medical treatment and drug efficacy. A problem involving a maze network is considered and comparisons made between classical and quantum decision scenarios for it. |
Tasks | Decision Making |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02759v1 |
http://arxiv.org/pdf/1804.02759v1.pdf | |
PWC | https://paperswithcode.com/paper/order-effects-for-queries-in-intelligent |
Repo | |
Framework | |
Matrix Co-completion for Multi-label Classification with Missing Features and Labels
Title | Matrix Co-completion for Multi-label Classification with Missing Features and Labels |
Authors | Miao Xu, Gang Niu, Bo Han, Ivor W. Tsang, Zhi-Hua Zhou, Masashi Sugiyama |
Abstract | We consider a challenging multi-label classification problem where both feature matrix $\mathbf{X}$ and label matrix $\mathbf{Y}$ have missing entries. An existing method concatenated $\mathbf{X}$ and $\mathbf{Y}$ as $[\mathbf{X}; \mathbf{Y}]$ and applied a matrix completion (MC) method to fill the missing entries, under the assumption that $[\mathbf{X}; \mathbf{Y}]$ is of low-rank. However, since entries of $\mathbf{Y}$ take binary values in the multi-label setting, it is unlikely that $\mathbf{Y}$ is of low-rank. Moreover, such an assumption implies a linear relationship between $\mathbf{X}$ and $\mathbf{Y}$ which may not hold in practice. In this paper, we consider a latent matrix $\mathbf{Z}$ that produces the probability $\sigma(Z_{ij})$ of generating label $Y_{ij}$, where $\sigma(\cdot)$ is nonlinear. Considering label correlation, we assume $[\mathbf{X}; \mathbf{Z}]$ is of low-rank, and propose an MC algorithm based on subgradient descent named co-completion (COCO) motivated by elastic net and one-bit MC. We give a theoretical bound on the recovery effect of COCO and demonstrate its practical usefulness through experiments. |
Tasks | Matrix Completion, Multi-Label Classification |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09156v1 |
http://arxiv.org/pdf/1805.09156v1.pdf | |
PWC | https://paperswithcode.com/paper/matrix-co-completion-for-multi-label |
Repo | |
Framework | |
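The one-bit link between the latent matrix and the labels can be illustrated with a short sketch. It assumes labels coded as 0/1 and a logistic $\sigma$, and it covers only the label likelihood over observed entries; the low-rank treatment of the feature and latent matrices and the COCO optimizer itself are not reproduced. The names `observed_label_nll` and `mask` are hypothetical.

```python
import numpy as np

def observed_label_nll(Z, Y, mask):
    """Negative log-likelihood of the observed labels under P(Y_ij = 1) = sigmoid(Z_ij).

    Z    : latent real-valued matrix
    Y    : label matrix with entries in {0, 1} (assumed coding)
    mask : 1 where Y_ij is observed, 0 where it is missing
    """
    # log(1 + exp(Z)) - Y * Z equals -log sigmoid(Z) when Y = 1
    # and -log(1 - sigmoid(Z)) when Y = 0; logaddexp keeps it stable.
    per_entry = np.logaddexp(0.0, Z) - Y * Z
    return float((per_entry * mask).sum())

# Toy example: with Z = 0 every observed label has probability 1/2.
Z = np.zeros((2, 3))
Y = np.array([[1, 0, 1], [0, 0, 1]])
mask = np.array([[1, 1, 0], [1, 0, 1]])
nll = observed_label_nll(Z, Y, mask)
```

With four observed entries and all probabilities equal to one half, the value is $4 \log 2$. Missing entries contribute nothing, which is what lets the latent matrix be fitted from partially observed labels.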
Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising
Title | Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising |
Authors | Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, Ge Wang |
Abstract | Computed tomography (CT) is a popular medical imaging modality in clinical applications. At the same time, the x-ray radiation dose associated with CT scans raises public concerns due to its potential risks to the patients. Over the past years, major efforts have been dedicated to the development of Low-Dose CT (LDCT) methods. However, the radiation dose reduction compromises the signal-to-noise ratio (SNR), leading to strong noise and artifacts that down-grade CT image quality. In this paper, we propose a novel 3D noise reduction method, called Structure-sensitive Multi-scale Generative Adversarial Net (SMGAN), to improve the LDCT image quality. Specifically, we incorporate three-dimensional (3D) volumetric information to improve the image quality. Also, different loss functions for training denoising models are investigated. Experiments show that the proposed method can effectively preserve structural and texture information from normal-dose CT (NDCT) images, and significantly suppress noise and artifacts. Qualitative visual assessments by three experienced radiologists demonstrate that the proposed method retrieves more detailed information, and outperforms competing methods. |
Tasks | Computed Tomography (CT), Denoising |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00587v3 |
http://arxiv.org/pdf/1805.00587v3.pdf | |
PWC | https://paperswithcode.com/paper/structure-sensitive-multi-scale-deep-neural |
Repo | |
Framework | |
Bayesian approach to model-based extrapolation of nuclear observables
Title | Bayesian approach to model-based extrapolation of nuclear observables |
Authors | Léo Neufcourt, Yuchen Cao, Witold Nazarewicz, Frederi Viens |
Abstract | The mass, or binding energy, is the basic property of the atomic nucleus. It determines its stability, and reaction and decay rates. Quantifying the nuclear binding is important for understanding the origin of elements in the universe. The astrophysical processes responsible for the nucleosynthesis in stars often take place far from the valley of stability, where experimental masses are not known. In such cases, missing nuclear information must be provided by theoretical predictions using extreme extrapolations. Bayesian machine learning techniques can be applied to improve predictions by taking full advantage of the information contained in the deviations between experimental and calculated masses. We consider 10 global models based on nuclear Density Functional Theory as well as two more phenomenological mass models. The emulators of S2n residuals and credibility intervals defining theoretical error bars are constructed using Bayesian Gaussian processes and Bayesian neural networks. We consider a large training dataset pertaining to nuclei whose masses were measured before 2003. For the testing datasets, we considered those exotic nuclei whose masses have been determined after 2003. We then carried out extrapolations towards the 2n dripline. While both Gaussian processes and Bayesian neural networks reduce the rms deviation from experiment significantly, GP offers a better and much more stable performance. The increase in the predictive power is quite astonishing: the resulting rms deviations from experiment on the testing dataset are similar to those of more phenomenological models. The empirical coverage probability curves we obtain match the reference values very well, which is highly desirable to ensure honesty of uncertainty quantification, and the estimated credibility intervals on predictions make it possible to evaluate predictive power of individual models. |
Tasks | Gaussian Processes |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00552v3 |
http://arxiv.org/pdf/1806.00552v3.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-approach-to-model-based |
Repo | |
Framework | |
Chinese Herbal Recognition based on Competitive Attentional Fusion of Multi-hierarchies Pyramid Features
Title | Chinese Herbal Recognition based on Competitive Attentional Fusion of Multi-hierarchies Pyramid Features |
Authors | Yingxue Xu, Guihua Wen, Yang Hu, Mingnan Luo, Dan Dai, Yishan Zhuang |
Abstract | Convolutional neural networks (CNNs) have been successfully applied to image recognition tasks. In this study, we explore automatic herbal recognition with CNNs and first build standard Chinese herb datasets. According to the characteristics of herbal images, we propose competitive attentional fusion pyramid networks to model the features of herbal images; the network models the relationship of feature maps from different levels and re-weights multi-level channels with a channel-wise attention mechanism. In this way, we can dynamically adjust the weight of feature maps from various layers, according to the visual characteristics of each herbal image. Moreover, we also introduce spatial attention to recalibrate the misaligned features caused by sampling in feature amalgamation. Extensive experiments are conducted on our proposed datasets and validate the superior performance of our proposed models. The Chinese herb datasets will be released upon acceptance to facilitate research on Chinese herbal recognition. |
Tasks | |
Published | 2018-12-23 |
URL | http://arxiv.org/abs/1812.09648v1 |
http://arxiv.org/pdf/1812.09648v1.pdf | |
PWC | https://paperswithcode.com/paper/chinese-herbal-recognition-based-on |
Repo | |
Framework | |
Automating Personnel Rostering by Learning Constraints Using Tensors
Title | Automating Personnel Rostering by Learning Constraints Using Tensors |
Authors | Mohit Kumar, Stefano Teso, Luc De Raedt |
Abstract | Many problems in operations research require that constraints be specified in the model. Determining the right constraints is a hard and laborious task. We propose an approach to automate this process using artificial intelligence and machine learning principles. So far there has been little work on learning constraints within the operations research community. We focus on personnel rostering and scheduling problems in which there are often past schedules available and show that it is possible to automatically learn constraints from such examples. To realize this, we adapted some techniques from the constraint programming community and extended them in order to cope with multidimensional examples. The method uses a tensor representation of the example, which helps in capturing the dimensionality as well as the structure of the example, and applies tensor operations to find the constraints that are satisfied by the example. To evaluate the proposed algorithm, we used constraints from the Nurse Rostering Competition and generated solutions that satisfy these constraints; these solutions were then used as examples to learn constraints. Experiments demonstrate that the proposed algorithm is capable of producing human readable constraints that capture the underlying characteristics of the examples. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11375v1 |
http://arxiv.org/pdf/1805.11375v1.pdf | |
PWC | https://paperswithcode.com/paper/automating-personnel-rostering-by-learning |
Repo | |
Framework | |
HierLPR: Decision making in hierarchical multi-label classification with local precision rates
Title | HierLPR: Decision making in hierarchical multi-label classification with local precision rates |
Authors | Christine Ho, Yuting Ye, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang |
Abstract | In this article we propose a novel ranking algorithm, referred to as HierLPR, for the multi-label classification problem when the candidate labels follow a known hierarchical structure. HierLPR is motivated by a new metric called eAUC that we design to assess the ranking of classification decisions. This metric, associated with the hit curve and local precision rate, emphasizes the accuracy of the first calls. We show that HierLPR optimizes eAUC under the tree constraint and some light assumptions on the dependency between the nodes in the hierarchy. We also provide a strategy to make calls for each node based on the ordering produced by HierLPR, with the intent of controlling FDR or maximizing F-score. The performance of our proposed methods is demonstrated on synthetic datasets as well as a real example of disease diagnosis using NCBI GEO datasets. In these cases, HierLPR shows a favorable result over competing methods in the early part of the precision-recall curve. |
Tasks | Decision Making, Multi-Label Classification |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.07954v1 |
http://arxiv.org/pdf/1810.07954v1.pdf | |
PWC | https://paperswithcode.com/paper/hierlpr-decision-making-in-hierarchical-multi |
Repo | |
Framework | |
Constructing Category-Specific Models for Monocular Object-SLAM
Title | Constructing Category-Specific Models for Monocular Object-SLAM |
Authors | Parv Parkhiya, Rishabh Khawad, J. Krishna Murthy, Brojeshwar Bhowmick, K. Madhava Krishna |
Abstract | We present a new paradigm for real-time object-oriented SLAM with a monocular camera. In contrast to previous approaches that rely on object-level models, we construct category-level models from CAD collections which are now widely available. To alleviate the need for huge amounts of labeled data, we develop a rendering pipeline that enables synthesis of large datasets from a limited amount of manually labeled data. Using data thus synthesized, we learn category-level models for object deformations in 3D, as well as discriminative object features in 2D. These category models are instance-independent and aid in the design of object landmark observations that can be incorporated into a generic monocular SLAM framework. Where typical object-SLAM approaches usually solve only for object and camera poses, we also estimate object shape on-the-fly, allowing for a wide range of objects from the category to be present in the scene. Moreover, since our 2D object features are learned discriminatively, the proposed object-SLAM system succeeds in several scenarios where sparse feature-based monocular SLAM fails due to insufficient features or parallax. Also, the proposed category-models help in object instance retrieval, useful for Augmented Reality (AR) applications. We evaluate the proposed framework on multiple challenging real-world scenes and show, to the best of our knowledge, the first results of an instance-independent monocular object-SLAM system and the benefits it enjoys over feature-based SLAM methods. |
Tasks | |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09292v1 |
http://arxiv.org/pdf/1802.09292v1.pdf | |
PWC | https://paperswithcode.com/paper/constructing-category-specific-models-for |
Repo | |
Framework | |
Fast Conditional Independence Test for Vector Variables with Large Sample Sizes
Title | Fast Conditional Independence Test for Vector Variables with Large Sample Sizes |
Authors | Krzysztof Chalupka, Pietro Perona, Frederick Eberhardt |
Abstract | We present and evaluate the Fast (conditional) Independence Test (FIT) – a nonparametric conditional independence test. The test is based on the idea that when $P(X \mid Y, Z) = P(X \mid Y)$, $Z$ is not useful as a feature to predict $X$, as long as $Y$ is also a regressor. On the contrary, if $P(X \mid Y, Z) \neq P(X \mid Y)$, $Z$ might improve prediction results. FIT applies to thousand-dimensional random variables with a hundred thousand samples in a fraction of the time required by alternative methods. We provide an extensive evaluation that compares FIT to six extant nonparametric independence tests. The evaluation shows that FIT has low probability of making both Type I and Type II errors compared to other tests, especially as the number of available samples grows. Our implementation of FIT is publicly available. |
Tasks | |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02747v1 |
http://arxiv.org/pdf/1804.02747v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-conditional-independence-test-for-vector |
Repo | |
Framework | |
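The intuition behind the test, that $Z$ helps predict $X$ beyond $Y$ only when conditional independence fails, can be sketched as a comparison of held-out prediction errors. The sketch below swaps in ordinary least squares as the regressor and omits the significance test, so it is not the authors' implementation; `heldout_mse`, `mse_gain_from_z`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def heldout_mse(features, target, n_train):
    """Fit least squares on the first n_train rows and return MSE on the rest."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])  # add intercept
    coef, *_ = np.linalg.lstsq(design[:n_train], target[:n_train], rcond=None)
    resid = target[n_train:] - design[n_train:] @ coef
    return float(np.mean(resid ** 2))

def mse_gain_from_z(x, y, z, n_train):
    """Error using Y alone minus error using (Y, Z); a large gain suggests dependence."""
    return heldout_mse(y, x, n_train) - heldout_mse(np.hstack([y, z]), x, n_train)

# Synthetic data (hypothetical): one target ignores Z, the other uses it.
rng = np.random.default_rng(0)
n = 2000
y = rng.normal(size=(n, 3))
z = rng.normal(size=(n, 2))
noise = 0.1 * rng.normal(size=(n, 1))
x_indep = y @ np.ones((3, 1)) + noise       # X depends on Y only
x_dep = x_indep + z @ np.ones((2, 1))       # X also depends on Z

gain_indep = mse_gain_from_z(x_indep, y, z, n_train=1000)
gain_dep = mse_gain_from_z(x_dep, y, z, n_train=1000)
```

When $X$ depends only on $Y$, adding $Z$ changes the held-out error very little, while in the dependent case the error drops sharply. A usable test would turn this gap into a p-value, for example by comparing per-sample errors across the two fits.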
Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks
Title | Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks |
Authors | Raphael Tang, Gefei Yang, Hong Wei, Yajie Mao, Ferhan Ture, Jimmy Lin |
Abstract | Voice-enabled commercial products are ubiquitous, typically enabled by lightweight on-device keyword spotting (KWS) and full automatic speech recognition (ASR) in the cloud. ASR systems require significant computational resources in training and for inference, not to mention copious amounts of annotated speech data. KWS systems, on the other hand, are less resource-intensive but have limited capabilities. On the Comcast Xfinity X1 entertainment platform, we explore a middle ground between ASR and KWS: We introduce a novel, resource-efficient neural network for voice query recognition that is much more accurate than state-of-the-art CNNs for KWS, yet can be easily trained and deployed with limited resources. On an evaluation dataset representing the top 200 voice queries, we achieve a low false alarm rate of 1% and a query error rate of 6%. Our model performs inference 8.24x faster than the current ASR system. |
Tasks | Keyword Spotting, Speech Recognition, Voice Query Recognition |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.07754v1 |
http://arxiv.org/pdf/1812.07754v1.pdf | |
PWC | https://paperswithcode.com/paper/streaming-voice-query-recognition-using |
Repo | |
Framework | |
Weighted Community Detection and Data Clustering Using Message Passing
Title | Weighted Community Detection and Data Clustering Using Message Passing |
Authors | Cheng Shi, Yanchen Liu, Pan Zhang |
Abstract | Grouping objects into clusters based on similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message passing algorithms and spectral algorithms proposed for the unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of the spin glass transition and applying belief propagation to solve for the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks and use extensive numerical experiments to illustrate the advantage of our method over existing algorithms. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, when the data was generated by mixture models in the sparse regime, we show that our method works to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computational complexity in dense networks while giving almost the same performance as belief propagation. |
Tasks | Bayesian Inference, Community Detection |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.09829v1 |
http://arxiv.org/pdf/1801.09829v1.pdf | |
PWC | https://paperswithcode.com/paper/weighted-community-detection-and-data |
Repo | |
Framework | |