Paper Group AWR 243
Fashion Image Retrieval with Capsule Networks
Title | Fashion Image Retrieval with Capsule Networks |
Authors | Furkan Kınlı, Barış Özcan, Furkan Kıraç |
Abstract | In this study, we investigate in-shop clothing retrieval performance of densely-connected Capsule Networks with dynamic routing. To achieve this, we propose a Triplet-based design of Capsule Network architecture with two different feature extraction methods. In our design, Stacked-convolutional (SC) and Residual-connected (RC) blocks are used to form the input of capsule layers. Experimental results show that both of our designs outperform all variants of the baseline study, namely FashionNet, without relying on the landmark information. Moreover, when compared to the SOTA architectures on clothing retrieval, our proposed Triplet Capsule Networks achieve comparable recall rates with only half of the parameters used in the SOTA architectures. |
Tasks | Image Retrieval |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09943v1 |
https://arxiv.org/pdf/1908.09943v1.pdf | |
PWC | https://paperswithcode.com/paper/fashion-image-retrieval-with-capsule-networks |
Repo | https://github.com/birdortyedi/image-retrieval-with-capsules |
Framework | tf |
Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation
Title | Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation |
Authors | Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang |
Abstract | We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard. |
Tasks | Text-To-Sql |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08205v2 |
https://arxiv.org/pdf/1905.08205v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-complex-text-to-sql-in-cross-domain |
Repo | https://github.com/tin-chata/IRnet |
Framework | none |
Efficient Covariance Estimation from Temporal Data
Title | Efficient Covariance Estimation from Temporal Data |
Authors | Hrayr Harutyunyan, Daniel Moyer, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan |
Abstract | Estimating the covariance structure of multivariate time series is a fundamental problem with a wide range of real-world applications – from financial modeling to fMRI analysis. Despite significant recent advances, current state-of-the-art methods are still severely limited in terms of scalability, and do not work well in high-dimensional undersampled regimes. In this work we propose a novel method called Temporal Correlation Explanation, or T-CorEx, that (a) has linear time and memory complexity with respect to the number of variables, and can scale to very large temporal datasets that are not tractable with existing methods; (b) gives state-of-the-art results in highly undersampled regimes on both synthetic and real-world datasets; and (c) makes minimal assumptions about the character of the dynamics of the system. T-CorEx optimizes an information-theoretic objective function to learn a latent factor graphical model for each time period and applies two regularization techniques to induce temporal consistency of estimates. We perform extensive evaluation of T-CorEx using both synthetic and real-world data and demonstrate that it can be used for detecting sudden changes in the underlying covariance matrix, capturing transient correlations and analyzing extremely high-dimensional complex multivariate time series such as high-resolution fMRI data. |
Tasks | Time Series |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13276v1 |
https://arxiv.org/pdf/1905.13276v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-covariance-estimation-from-temporal |
Repo | https://github.com/harhro94/T-CorEx |
Framework | pytorch |
Neumann Networks for Inverse Problems in Imaging
Title | Neumann Networks for Inverse Problems in Imaging |
Authors | Davis Gilton, Greg Ongie, Rebecca Willett |
Abstract | Many challenging image processing tasks can be described by an ill-posed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Traditional inverse problem solvers minimize a cost function consisting of a data-fit term, which measures how well an image matches the observations, and a regularizer, which reflects prior knowledge and promotes images with desirable properties like smoothness. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional regularizers. We present an end-to-end, data-driven method of solving inverse problems inspired by the Neumann series, which we call a Neumann network. Rather than unroll an iterative optimization algorithm, we truncate a Neumann series which directly solves the linear inverse problem with a data-driven nonlinear regularizer. The Neumann network architecture outperforms traditional inverse problem solution methods, model-free deep learning approaches, and state-of-the-art unrolled iterative methods on standard datasets. Finally, when the images belong to a union of subspaces and under appropriate assumptions on the forward model, we prove there exists a Neumann network configuration that well-approximates the optimal oracle estimator for the inverse problem and demonstrate empirically that the trained Neumann network has the form predicted by theory. |
Tasks | Deblurring |
Published | 2019-01-13 |
URL | https://arxiv.org/abs/1901.03707v2 |
https://arxiv.org/pdf/1901.03707v2.pdf | |
PWC | https://paperswithcode.com/paper/neumann-networks-for-inverse-problems-in |
Repo | https://github.com/dgilton/neumann_networks_code |
Framework | tf |
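The Neumann series idea in the abstract above can be illustrated with a small sketch. It is not the authors' code: it assumes a plain linear system A x = b (for example, A standing in for a data-fit operator plus a fixed linear regularizer), whereas the paper uses a learned nonlinear regularizer. The names `matvec`, `neumann_solve`, `eta`, and `terms` are hypothetical. The sketch relies only on the identity A^{-1} b = sum_{k>=0} (I - eta*A)^k (eta*b), which holds when the spectral norm of I - eta*A is below 1.

```python
def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def neumann_solve(A, b, eta, terms):
    """Approximate A^{-1} b with a truncated Neumann series.

    Computes sum_{k=0}^{terms-1} (I - eta*A)^k (eta*b).
    Assumption: the spectral norm of (I - eta*A) is below 1,
    otherwise the series does not converge.
    """
    term = [eta * x for x in b]          # k = 0 term: eta * b
    total = list(term)
    for _ in range(terms - 1):
        a_term = matvec(A, term)
        # Next term: (I - eta*A) applied to the previous term.
        term = [t - eta * at for t, at in zip(term, a_term)]
        total = [s + t for s, t in zip(total, term)]
    return total


# Toy symmetric positive definite system; the exact solution is [0.2, 0.6].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_hat = neumann_solve(A, b, eta=0.25, terms=60)
```

Truncating after a fixed number of terms gives a fixed-depth computation, which is what makes the series usable as a network architecture once the linear regularizer is replaced by a trainable one.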
The Regression Tsetlin Machine: A Tsetlin Machine for Continuous Output Problems
Title | The Regression Tsetlin Machine: A Tsetlin Machine for Continuous Output Problems |
Authors | K. Darshana Abeyrathna, Ole-Christoffer Granmo, Lei Jiao, Morten Goodwin |
Abstract | The recently introduced Tsetlin Machine (TM) has provided competitive pattern classification accuracy in several benchmarks, composing patterns with easy-to-interpret conjunctive clauses in propositional logic. In this paper, we go beyond pattern classification by introducing a new type of TMs, namely, the Regression Tsetlin Machine (RTM). In all brevity, we modify the inner inference mechanism of the TM so that input patterns are transformed into a single continuous output, rather than to distinct categories. We achieve this by: (1) using the conjunctive clauses of the TM to capture arbitrarily complex patterns; (2) mapping these patterns to a continuous output through a novel voting and normalization mechanism; and (3) employing a feedback scheme that updates the TM clauses to minimize the regression error. The feedback scheme uses a new activation probability function that stabilizes the updating of clauses, while the overall system converges towards an accurate input-output mapping. The performance of the RTM is evaluated using six different artificial datasets with and without noise, in comparison with the Classic Tsetlin Machine (CTM) and the Multiclass Tsetlin Machine (MTM). Our empirical results indicate that the RTM obtains the best training and testing results for both noisy and noise-free datasets, with a smaller number of clauses. This, in turn, translates to higher regression accuracy, using significantly less computational resources. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04206v2 |
https://arxiv.org/pdf/1905.04206v2.pdf | |
PWC | https://paperswithcode.com/paper/the-regression-tsetlin-machine-a-tsetlin |
Repo | https://github.com/cair/regression-tsetlin-machine |
Framework | none |
Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
Title | Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs |
Authors | Angela Fan, Claire Gardent, Chloe Braud, Antoine Bordes |
Abstract | Query-based open-domain NLP tasks require information synthesis from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models using methods such as TF-IDF ranking. We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved text portions. |
Tasks | Document Summarization, graph construction, Multi-Document Summarization, Question Answering |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08435v1 |
https://arxiv.org/pdf/1910.08435v1.pdf | |
PWC | https://paperswithcode.com/paper/using-local-knowledge-graph-construction-to |
Repo | https://github.com/denisewong1/ASX300 |
Framework | tf |
Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes
Title | Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes |
Authors | Amit Moscovich, Amit Halevi, Joakim Andén, Amit Singer |
Abstract | Single-particle electron cryomicroscopy is an essential tool for high-resolution 3D reconstruction of proteins and other biological macromolecules. An important challenge in cryo-EM is the reconstruction of non-rigid molecules with parts that move and deform. Traditional reconstruction methods fail in these cases, resulting in smeared reconstructions of the moving parts. This poses a major obstacle for structural biologists, who need high-resolution reconstructions of entire macromolecules, moving parts included. To address this challenge, we present a new method for the reconstruction of macromolecules exhibiting continuous heterogeneity. The proposed method uses projection images from multiple viewing directions to construct a graph Laplacian through which the manifold of three-dimensional conformations is analyzed. The 3D molecular structures are then expanded in a basis of Laplacian eigenvectors, using a novel generalized tomographic reconstruction algorithm to compute the expansion coefficients. These coefficients, which we name spectral volumes, provide a high-resolution visualization of the molecular dynamics. We provide a theoretical analysis and evaluate the method empirically on several simulated data sets. |
Tasks | 3D Reconstruction |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.01898v2 |
https://arxiv.org/pdf/1907.01898v2.pdf | |
PWC | https://paperswithcode.com/paper/cryo-em-reconstruction-of-continuous |
Repo | https://github.com/PrincetonUniversity/specvols |
Framework | none |
Bounding Singular Values of Convolution Layers
Title | Bounding Singular Values of Convolution Layers |
Authors | Sahil Singla, Soheil Feizi |
Abstract | In deep neural networks, the spectral norm of the Jacobian of a layer bounds the factor by which the norm of a signal changes during forward or backward propagation. Spectral norm regularization has also been shown to improve the generalization and robustness of deep networks. However, existing methods to compute the spectral norm of the Jacobian of convolution layers either rely on heuristics (but are efficient in computation) or are exact (but computationally expensive to be used during training). In this work, we resolve these issues by deriving an upper bound on the spectral norm of a standard 2D multi-channel convolution layer. Our method provides a provable bound that is differentiable and can be computed efficiently during training with negligible overhead. We show that our spectral bound is an effective regularizer and can be used to bound the Lipschitz constant and the curvature (eigenvalues of the Hessian) of neural networks. Through experiments on MNIST and CIFAR-10, we demonstrate the effectiveness of our spectral bound in improving the generalization and provable robustness of deep networks against adversarial examples. Our code is available at https://github.com/singlasahil14/CONV-SV. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10258v1 |
https://arxiv.org/pdf/1911.10258v1.pdf | |
PWC | https://paperswithcode.com/paper/bounding-singular-values-of-convolution |
Repo | https://github.com/singlasahil14/CONV-SV |
Framework | pytorch |
Pragmatically Informative Text Generation
Title | Pragmatically Informative Text Generation |
Authors | Sheng Shen, Daniel Fried, Jacob Andreas, Dan Klein |
Abstract | We improve the informativeness of models for conditional text generation using techniques from computational pragmatics. These techniques formulate language production as a game between speakers and listeners, in which a speaker should generate output text that a listener can use to correctly identify the original input that the text describes. While such approaches are widely used in cognitive science and grounded language learning, they have received less attention for more standard language generation tasks. We consider two pragmatic modeling methods for text generation: one where pragmatics is imposed by information preservation, and another where pragmatics is imposed by explicit modeling of distractors. We find that these methods improve the performance of strong existing systems for abstractive summarization and generation from structured meaning representations. |
Tasks | Abstractive Text Summarization, Data-to-Text Generation, Text Generation |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01301v2 |
http://arxiv.org/pdf/1904.01301v2.pdf | |
PWC | https://paperswithcode.com/paper/pragmatically-informative-text-generation |
Repo | https://github.com/reallygooday/60daysofudacity |
Framework | pytorch |
Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery
Title | Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery |
Authors | Jiachen Yang, Igor Borovikov, Hongyuan Zha |
Abstract | Human players in professional team sports achieve high level coordination by dynamically choosing complementary skills and executing primitive actions to perform these skills. As a step toward creating intelligent agents with this capability for fully cooperative multi-agent settings, we propose a two-level hierarchical multi-agent reinforcement learning (MARL) algorithm with unsupervised skill discovery. Agents learn useful and distinct skills at the low level via independent Q-learning, while they learn to select complementary latent skill variables at the high level via centralized multi-agent training with an extrinsic team reward. The set of low-level skills emerges from an intrinsic reward that solely promotes the decodability of latent skill variables from the trajectory of a low-level skill, without the need for hand-crafted rewards for each skill. For scalable decentralized execution, each agent independently chooses latent skill variables and primitive actions based on local observations. Our overall method enables the use of general cooperative MARL algorithms for training high level policies and single-agent RL for training low level skills. Experiments on a stochastic high dimensional team game show the emergence of useful skills and cooperative team play. The interpretability of the learned skills shows the promise of the proposed method for achieving human-AI cooperation in team sports games. |
Tasks | Multi-agent Reinforcement Learning, Q-Learning |
Published | 2019-12-07 |
URL | https://arxiv.org/abs/1912.03558v2 |
https://arxiv.org/pdf/1912.03558v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-cooperative-multi-agent |
Repo | https://github.com/011235813/hierarchical-marl |
Framework | tf |
Sparse and noisy LiDAR completion with RGB guidance and uncertainty
Title | Sparse and noisy LiDAR completion with RGB guidance and uncertainty |
Authors | Wouter Van Gansbeke, Davy Neven, Bert De Brabandere, Luc Van Gool |
Abstract | This work proposes a new method to accurately complete sparse LiDAR maps guided by RGB images. For autonomous vehicles and robotics the use of LiDAR is indispensable in order to achieve precise depth predictions. A multitude of applications depend on the awareness of their surroundings, and use depth cues to reason and react accordingly. On the one hand, monocular depth prediction methods fail to generate absolute and precise depth maps. On the other hand, stereoscopic approaches are still significantly outperformed by LiDAR based approaches. The goal of the depth completion task is to generate dense depth predictions from sparse and irregular point clouds which are mapped to a 2D plane. We propose a new framework which extracts both global and local information in order to produce proper depth maps. We argue that simple depth completion does not require a deep network. However, we additionally propose a fusion method with RGB guidance from a monocular camera in order to leverage object information and to correct mistakes in the sparse input. This improves the accuracy significantly. Moreover, confidence masks are exploited in order to take into account the uncertainty in the depth predictions from each modality. This fusion method outperforms the state-of-the-art and ranks first on the KITTI depth completion benchmark. Our code with visualizations is available. |
Tasks | Autonomous Vehicles, Depth Completion, Depth Estimation |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05356v1 |
http://arxiv.org/pdf/1902.05356v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-and-noisy-lidar-completion-with-rgb |
Repo | https://github.com/wvangansbeke/Sparse-Depth-Completion |
Framework | pytorch |
COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning
Title | COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning |
Authors | Wenxiao Wang, Cong Fu, Jishun Guo, Deng Cai, Xiaofei He |
Abstract | Neural network compression empowers the effective yet unwieldy deep convolutional neural networks (CNN) to be deployed in resource-constrained scenarios. Most state-of-the-art approaches prune the model at the filter level according to the “importance” of filters. Despite their success, we notice they suffer from at least two of the following problems: 1) The redundancy among filters is not considered because the importance is evaluated independently. 2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer. Consequently, we must manually specify layer-wise pruning ratios. 3) They are prone to generate sub-optimal solutions because they neglect the inequality between reducing parameters and reducing computational cost. Reducing the same number of parameters in different positions in the network may reduce the computational cost by different amounts. To address the above problems, we develop a novel algorithm named COP (correlation-based pruning), which can detect the redundant filters efficiently. We enable the cross-layer filter comparison through global normalization. We add parameter-quantity and computational-cost regularization terms to the importance, which enables the users to customize the compression according to their preference (smaller or faster). Extensive experiments have shown COP outperforms the others significantly. The code is released at https://github.com/ZJULearning/COP. |
Tasks | Model Compression, Neural Network Compression |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10337v1 |
https://arxiv.org/pdf/1906.10337v1.pdf | |
PWC | https://paperswithcode.com/paper/cop-customized-deep-model-compression-via |
Repo | https://github.com/ZJULearning/COP |
Framework | tf |
A Dilated Inception Network for Visual Saliency Prediction
Title | A Dilated Inception Network for Visual Saliency Prediction |
Authors | Sheng Yang, Guosheng Lin, Qiuping Jiang, Weisi Lin |
Abstract | Recently, with the advent of deep convolutional neural networks (DCNN), the improvements in visual saliency prediction research are impressive. One possible direction to approach the next improvement is to fully characterize the multi-scale saliency-influential factors with a computationally-friendly module in DCNN architectures. In this work, we propose an end-to-end dilated inception network (DINet) for visual saliency prediction. It captures multi-scale contextual features effectively with very limited extra parameters. Instead of utilizing parallel standard convolutions with different kernel sizes as in the existing inception module, our proposed dilated inception module (DIM) uses parallel dilated convolutions with different dilation rates, which can significantly reduce the computation load while enriching the diversity of receptive fields in feature maps. Moreover, the performance of our saliency model is further improved by using a set of linear normalization-based probability distribution distance metrics as loss functions. As such, we can formulate saliency prediction as a probability distribution prediction task for global saliency inference instead of a typical pixel-wise regression problem. Experimental results on several challenging saliency benchmark datasets demonstrate that our DINet with proposed loss functions can achieve state-of-the-art performance with shorter inference time. |
Tasks | Saliency Prediction |
Published | 2019-04-07 |
URL | https://arxiv.org/abs/1904.03571v2 |
https://arxiv.org/pdf/1904.03571v2.pdf | |
PWC | https://paperswithcode.com/paper/a-dilated-inception-network-for-visual |
Repo | https://github.com/ysyscool/DINet |
Framework | tf |
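To make the dilation argument in the abstract above concrete, here is a minimal 1D sketch. It is an illustration only, not the DINet implementation: the function names `effective_kernel_size` and `dilated_conv1d` are hypothetical, and the example uses a single channel with no padding. It shows that a kernel with a fixed number of weights covers a wider input span as the dilation rate grows, following the standard relation span = dilation * (kernel_size - 1) + 1.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one application of a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1D dilated cross-correlation on plain lists."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


# Three weights in every case, but a growing receptive field.
spans = [effective_kernel_size(3, d) for d in (1, 2, 4)]
response = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2)
```

Running several such kernels in parallel with different dilation rates gives receptive fields of different sizes at the parameter cost of the smallest kernel, which is the trade-off the abstract describes.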
Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition
Title | Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition |
Authors | Devraj Mandal, Sanath Narayan, Saikumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao |
Abstract | Generalized zero-shot action recognition is a challenging problem, where the task is to recognize new action categories that are unavailable during the training stage, in addition to the seen action categories. Existing approaches suffer from the inherent bias of the learned classifier towards the seen action categories. As a consequence, unseen category samples are incorrectly classified as belonging to one of the seen action categories. In this paper, we set out to tackle this issue by arguing for a separate treatment of seen and unseen action categories in generalized zero-shot action recognition. We introduce an out-of-distribution detector that determines whether the video features belong to a seen or unseen action category. To train our out-of-distribution detector, video features for unseen action categories are synthesized using generative adversarial networks trained on seen action category features. To the best of our knowledge, we are the first to propose an out-of-distribution detector based GZSL framework for action recognition in videos. Experiments are performed on three action recognition datasets: Olympic Sports, HMDB51 and UCF101. For generalized zero-shot action recognition, our proposed approach outperforms the baseline (f-CLSWGAN) with absolute gains (in classification accuracy) of 7.0%, 3.4%, and 4.9%, respectively, on these datasets. |
Tasks | Action Recognition In Videos, Out-of-Distribution Detection, Temporal Action Localization |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08703v2 |
https://arxiv.org/pdf/1904.08703v2.pdf | |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-for-generalized |
Repo | https://github.com/naraysa/gzsl-od |
Framework | pytorch |
Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision
Title | Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision |
Authors | Fredrik K. Gustafsson, Martin Danelljan, Thomas B. Schön |
Abstract | While Deep Neural Networks (DNNs) have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, e.g. in automotive applications. In Bayesian deep learning, predictive uncertainty is often decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a DNN output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl. |
Tasks | Depth Completion, Semantic Segmentation |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01620v2 |
https://arxiv.org/pdf/1906.01620v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-scalable-bayesian-deep-learning |
Repo | https://github.com/fregu856/evaluating_bdl |
Framework | pytorch |
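The split into aleatoric and epistemic uncertainty mentioned in the abstract above can be sketched for an ensemble as follows. This is a commonly used decomposition based on the law of total variance, not code from the linked repository. It assumes each ensemble member outputs a Gaussian mean and variance for the same input; the function name `decompose_uncertainty` and the sample numbers are hypothetical.

```python
def decompose_uncertainty(means, variances):
    """Combine per-member Gaussian predictions for one input.

    Assumption: each ensemble member i predicts a mean means[i]
    and a variance variances[i].
    Returns (predictive mean, aleatoric variance, epistemic variance).
    """
    m = len(means)
    mean = sum(means) / m
    # Aleatoric part: average of the noise variances the members predict.
    aleatoric = sum(variances) / m
    # Epistemic part: disagreement between the members' means.
    epistemic = sum((mu - mean) ** 2 for mu in means) / m
    return mean, aleatoric, epistemic


# Three members that agree on the noise level but disagree on the mean.
mean, aleatoric, epistemic = decompose_uncertainty(
    means=[1.0, 2.0, 3.0], variances=[0.5, 0.5, 0.5]
)
total_variance = aleatoric + epistemic
```

The same function applies to MC-dropout if the list entries come from repeated stochastic forward passes of one network instead of separately trained members.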