February 1, 2020

3099 words 15 mins read

Paper Group AWR 318


Joint Super-Resolution and Alignment of Tiny Faces

Title Joint Super-Resolution and Alignment of Tiny Faces
Authors Yu Yin, Joseph P. Robinson, Yulun Zhang, Yun Fu
Abstract Super-resolution (SR) and landmark localization of tiny faces are highly correlated tasks. On the one hand, landmark localization achieves higher accuracy on high-resolution (HR) faces. On the other hand, face SR benefits from prior knowledge of facial attributes such as landmarks. Thus, we propose a joint alignment and SR network to simultaneously detect facial landmarks and super-resolve tiny faces. More specifically, a shared deep encoder is applied to extract features for both tasks by leveraging complementary information. To exploit the representative power of the hierarchical encoder, intermediate layers of a shared feature extraction module are fused to form efficient feature representations. The fused features are then fed to task-specific modules to detect landmarks and super-resolve face images in parallel. Extensive experiments demonstrate that the proposed model significantly outperforms the state of the art in both landmark localization and SR of faces. We show a large improvement for landmark localization of tiny faces (i.e., 16×16). Furthermore, the proposed framework yields results for landmark localization on low-resolution (LR) faces (i.e., 64×64) comparable to those of existing methods on HR faces (i.e., 256×256). As for SR, the proposed method recovers sharper edges and more details from LR face images than other state-of-the-art methods, which we demonstrate qualitatively and quantitatively.
Tasks Super-Resolution
Published 2019-11-19
URL https://arxiv.org/abs/1911.08566v1
PDF https://arxiv.org/pdf/1911.08566v1.pdf
PWC https://paperswithcode.com/paper/joint-super-resolution-and-alignment-of-tiny
Repo https://github.com/YuYin1/JASRNet
Framework none
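
A minimal PyTorch sketch of the shared-encoder, dual-head idea the abstract describes; the layer sizes, concatenation-based fusion, and PixelShuffle upsampler are illustrative assumptions, not the authors' exact JASRNet architecture (see the repo above).

```python
# Sketch of a joint SR + alignment network: one shared hierarchical
# encoder, fused intermediate features, two task-specific heads.
import torch
import torch.nn as nn

class JointSRAlignNet(nn.Module):
    def __init__(self, n_landmarks=68, scale=4):
        super().__init__()
        # Shared hierarchical encoder; intermediate outputs are kept for fusion.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, 64, 3, padding=1), nn.ReLU())
            for c_in in (3, 64, 64)
        ])
        self.fuse = nn.Conv2d(64 * 3, 64, 1)   # fuse hierarchy: concat + 1x1 conv
        # Task-specific heads run on the same fused features.
        self.sr_head = nn.Sequential(
            nn.Conv2d(64, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale))
        self.landmark_head = nn.Conv2d(64, n_landmarks, 3, padding=1)

    def forward(self, x):
        feats, h = [], x
        for stage in self.stages:
            h = stage(h)
            feats.append(h)
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.sr_head(fused), self.landmark_head(fused)

lr_face = torch.randn(1, 3, 16, 16)            # a 16x16 tiny face
sr, heatmaps = JointSRAlignNet()(lr_face)
print(sr.shape, heatmaps.shape)                # (1, 3, 64, 64), (1, 68, 16, 16)
```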

FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints

Title FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints
Authors Anastasios Kyrillidis, Anshumali Shrivastava, Moshe Y. Vardi, Zhiwei Zhang
Abstract The Boolean SATisfiability problem (SAT) is of central importance in computer science. Although SAT is known to be NP-complete, progress on the engineering side, especially that of Conflict-Driven Clause Learning (CDCL) and Local Search SAT solvers, has been remarkable. Yet, while SAT solvers aimed at solving industrial-scale benchmarks in Conjunctive Normal Form (CNF) have become quite mature, SAT solvers that are effective on other types of constraints, e.g., cardinality constraints and XORs, are less well studied; a general approach to handling non-CNF constraints is still lacking. In addition, previous work indicated that for specific classes of benchmarks, the running time of extant SAT solvers depends heavily on properties of the formula and details of encoding, instead of the scale of the benchmarks, which adds uncertainty to expectations of running time. To address the issues above, we design FourierSAT, an incomplete SAT solver based on Fourier analysis of Boolean functions, a technique to represent Boolean functions by multilinear polynomials. By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints. The idea is to leverage gradient information to guide the search process in the direction of local improvements. Empirical results demonstrate that FourierSAT is more robust than other solvers on certain classes of benchmarks.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.01032v2
PDF https://arxiv.org/pdf/1912.01032v2.pdf
PWC https://paperswithcode.com/paper/fouriersat-a-fourier-expansion-based
Repo https://github.com/vardigroup/FourierSAT
Framework none
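
The core reduction is easy to illustrate: each Boolean constraint has a multilinear (Fourier) expansion over {-1, +1}-valued variables, which relaxes to the cube [-1, 1]^n and can be minimized by gradient descent. The tiny instance and plain Adam loop below are illustrative assumptions, not the authors' solver.

```python
# Toy FourierSAT-style search over one XOR and one CNF clause.
# Convention: x_i = +1 means False, x_i = -1 means True.
import torch

def cost(x):
    # XOR(v1, v2, v3) holds iff x1*x2*x3 == -1; its Fourier expansion
    # is the monomial x1*x2*x3, so (1 + monomial)/2 is 0 when satisfied.
    xor_term = (1 + x[0] * x[1] * x[2]) / 2
    # Clause (v1 OR v2) is falsified only when both are +1 (False);
    # ((1+x1)/2)*((1+x2)/2) is 1 exactly on that falsifying point.
    clause_term = (1 + x[0]) * (1 + x[1]) / 4
    return xor_term + clause_term

x = torch.zeros(3, requires_grad=True)         # start at the cube's centre
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = cost(torch.tanh(x))                 # tanh keeps the relaxation in (-1, 1)
    loss.backward()                            # gradient guides local improvement
    opt.step()

assignment = torch.sign(torch.tanh(x)).detach()  # round back to {-1, +1}
print(assignment, cost(assignment).item())       # cost 0.0 => both constraints hold
```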

Improving Polyphonic Music Models with Feature-Rich Encoding

Title Improving Polyphonic Music Models with Feature-Rich Encoding
Authors Omar Peracha
Abstract This paper explores sequential modeling of polyphonic music with deep neural networks. While recent breakthroughs have focused on network architecture, we demonstrate that the representation of the sequence can make an equally significant contribution to the performance of the model as measured by validation set loss. By extracting salient features inherent to the dataset, the model can either be conditioned on these features or trained to predict said features as extra components of the sequences being modeled. We show that training a neural network to predict a seemingly more complex sequence, with extra features included in the series being modeled, can improve overall model performance significantly. We first introduce TonicNet, a GRU-based model trained to initially predict the chord at a given time-step before then predicting the notes of each voice at that time-step, in contrast with the typical approach of predicting only the notes. We then evaluate TonicNet on the canonical JSB Chorales dataset and obtain state-of-the-art results.
Tasks Music Generation, Music Modeling
Published 2019-11-26
URL https://arxiv.org/abs/1911.11775v1
PDF https://arxiv.org/pdf/1911.11775v1.pdf
PWC https://paperswithcode.com/paper/improving-polyphonic-music-models-with
Repo https://github.com/omarperacha/TonicNet
Framework pytorch
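
A small sketch of the feature-rich encoding idea: serialize each time-step chord-first, so a plain sequence model must predict the chord before the notes of each voice. The token names and toy chorale below are hypothetical; the repo defines the actual vocabulary.

```python
# Chord-first serialization of a polyphonic score into one token stream.
def encode_timestep(chord, satb_notes):
    """Flatten one chorale time-step into a chord-then-notes token list."""
    return [f"CHORD_{chord}"] + [f"NOTE_{n}" for n in satb_notes]

# Two time-steps of a toy chorale: (chord, [soprano, alto, tenor, bass]).
steps = [("C", [72, 67, 64, 48]), ("G7", [71, 67, 62, 43])]
sequence = [tok for chord, notes in steps for tok in encode_timestep(chord, notes)]
print(sequence)
# ['CHORD_C', 'NOTE_72', ..., 'CHORD_G7', 'NOTE_71', ...]
# A GRU language model trained on this stream learns to condition each
# voice's note on the chord it has just predicted.
```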

Neural-encoding Human Experts’ Domain Knowledge to Warm Start Reinforcement Learning

Title Neural-encoding Human Experts’ Domain Knowledge to Warm Start Reinforcement Learning
Authors Andrew Silva, Matthew Gombolay
Abstract Deep reinforcement learning has seen great success across a breadth of tasks, such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily available knowledge from domain experts that could help “warm start” the learning process. Further, learning from demonstration techniques are not yet efficient enough to infer this knowledge through sampling-based mechanisms in large state and action spaces. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions amenable to gradient descent and show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of reinforcement learning challenges. We further conduct a user study to solicit expert policies from a variety of humans and find that humans are able to specify policies that provide a higher quality reward both before and after training relative to baseline methods, demonstrating the utility of our approach.
Tasks Imitation Learning
Published 2019-02-15
URL https://arxiv.org/abs/1902.06007v3
PDF https://arxiv.org/pdf/1902.06007v3.pdf
PWC https://paperswithcode.com/paper/prolonets-neural-encoding-human-experts
Repo https://github.com/andrewsilva9/ProLoNets
Framework pytorch
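
A minimal sketch of how a propositional rule can be encoded as a differentiable node and warm-started from expert weights; the rule form, sharpness parameter, and toy cart-pole rule are illustrative assumptions rather than the paper's exact architecture.

```python
# A fuzzy proposition: soft version of "IF w.x > c THEN ... ELSE ...",
# initialized from an expert rule and refinable by gradient descent.
import torch
import torch.nn as nn

class FuzzyRuleNode(nn.Module):
    def __init__(self, weights, comparator, alpha=1.0):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(weights, dtype=torch.float32))
        self.c = nn.Parameter(torch.tensor(comparator, dtype=torch.float32))
        self.alpha = alpha  # sharpness: large alpha ~ crisp Boolean logic

    def forward(self, x):
        # Probability the proposition holds; differentiable, so the
        # warm-started rule can later be refined by RL gradients.
        return torch.sigmoid(self.alpha * (x @ self.w - self.c))

# Expert rule for a toy cart-pole policy: "IF pole_angle > 0 THEN push right".
node = FuzzyRuleNode(weights=[0.0, 0.0, 1.0, 0.0], comparator=0.0)
state = torch.tensor([0.0, 0.1, 0.05, -0.2])   # [pos, vel, angle, ang_vel]
print(float(node(state)))                      # > 0.5: warm start prefers "right"
```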

Lightweight Feature Fusion Network for Single Image Super-Resolution

Title Lightweight Feature Fusion Network for Single Image Super-Resolution
Authors Wenming Yang, Wei Wang, Xuechen Zhang, Shuifa Sun, Qingmin Liao
Abstract Single image super-resolution (SISR) has witnessed great progress as convolutional neural networks (CNNs) get deeper and wider. However, their enormous parameter counts hinder application to real-world problems. In this letter, we propose a lightweight feature fusion network (LFFN) that can fully explore multi-scale contextual information and greatly reduce network parameters while maximizing SISR results. LFFN is built on spindle blocks and a softmax feature fusion module (SFFM). Specifically, a spindle block is composed of a dimension extension unit, a feature exploration unit and a feature refinement unit. The dimension extension layer expands low dimensions to high dimensions and implicitly learns feature maps suitable for the next unit. The feature exploration unit performs linear and nonlinear feature exploration aimed at different feature maps. The feature refinement layer is used to fuse and refine features. The SFFM fuses the features from different modules in a self-adaptive learning manner with a softmax function, making full use of hierarchical information at a small parameter cost. Both qualitative and quantitative experiments on benchmark datasets show that LFFN achieves favorable performance against state-of-the-art methods with similar parameter counts.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-02-15
URL http://arxiv.org/abs/1902.05694v2
PDF http://arxiv.org/pdf/1902.05694v2.pdf
PWC https://paperswithcode.com/paper/lightweight-feature-fusion-network-for-single
Repo https://github.com/qibao77/LFFN
Framework tf
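
A compact sketch of a softmax feature fusion step in the spirit of the SFFM: features from several branches are combined with learned, softmax-normalized weights. The scalar-per-branch weighting below is one illustrative reading of "self-adaptive fusion with a softmax function".

```python
# Softmax-weighted fusion of feature maps from multiple modules.
import torch
import torch.nn as nn

class SoftmaxFeatureFusion(nn.Module):
    def __init__(self, n_branches):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_branches))

    def forward(self, feats):                  # feats: list of (B, C, H, W)
        w = torch.softmax(self.logits, dim=0)  # learned weights summing to 1
        return sum(wi * f for wi, f in zip(w, feats))

fusion = SoftmaxFeatureFusion(n_branches=3)
feats = [torch.randn(1, 64, 32, 32) for _ in range(3)]
print(fusion(feats).shape)                     # torch.Size([1, 64, 32, 32])
```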

Using Multi-Sense Vector Embeddings for Reverse Dictionaries

Title Using Multi-Sense Vector Embeddings for Reverse Dictionaries
Authors Michael A. Hedderich, Andrew Yates, Dietrich Klakow, Gerard de Melo
Abstract Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01451v1
PDF http://arxiv.org/pdf/1904.01451v1.pdf
PWC https://paperswithcode.com/paper/using-multi-sense-vector-embeddings-for
Repo https://github.com/uds-lsv/Multi-Sense-Embeddings-Reverse-Dictionaries
Framework none
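
The attention-based integration lends itself to a short sketch: score each sense vector against a context vector (here, the encoded definition) and pool with the softmax weights. Dimensions and the dot-product scorer are illustrative assumptions.

```python
# Attention over multiple sense vectors of one word.
import torch

def attend_over_senses(sense_vectors, context):
    """sense_vectors: (n_senses, d); context: (d,) -> pooled (d,) vector."""
    scores = sense_vectors @ context            # one score per sense
    weights = torch.softmax(scores, dim=0)      # soft sense selection
    return weights @ sense_vectors

senses = torch.randn(3, 300)      # e.g. three senses of "bank"
definition = torch.randn(300)     # encoded dictionary definition
pooled = attend_over_senses(senses, definition)
print(pooled.shape)               # torch.Size([300])
```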

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future

Title Improving Neural Language Models by Segmenting, Attending, and Predicting the Future
Authors Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass
Abstract Common language models typically predict the next word given the context. In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase. The model does not require any linguistic annotation of phrase segmentation. Instead, we define syntactic heights and phrase segmentation rules, enabling the model to automatically induce phrases, recognize their task-specific heads, and generate phrase embeddings in an unsupervised learning manner. Our method can easily be applied to language models with different network architectures since an independent module is used for phrase induction and context-phrase alignment, and no change is required in the underlying language modeling network. Experiments have shown that our model outperformed several strong baseline models on different data sets. We achieved a new state-of-the-art performance of 17.4 perplexity on the Wikitext-103 dataset. Additionally, visualizing the outputs of the phrase induction module showed that our model is able to learn approximate phrase-level structural knowledge without any annotation.
Tasks Language Modelling
Published 2019-06-04
URL https://arxiv.org/abs/1906.01702v1
PDF https://arxiv.org/pdf/1906.01702v1.pdf
PWC https://paperswithcode.com/paper/improving-neural-language-models-by
Repo https://github.com/luohongyin/PILM
Framework pytorch

Multi-Task Attention-Based Semi-Supervised Learning for Medical Image Segmentation

Title Multi-Task Attention-Based Semi-Supervised Learning for Medical Image Segmentation
Authors Shuai Chen, Gerda Bortsova, Antonio Garcia-Uceda Juarez, Gijs van Tulder, Marleen de Bruijne
Abstract We propose a novel semi-supervised image segmentation method that simultaneously optimizes a supervised segmentation objective and an unsupervised reconstruction objective. The reconstruction objective uses an attention mechanism that separates the reconstruction of image areas corresponding to different classes. The proposed approach was evaluated on two applications: brain tumor and white matter hyperintensities segmentation. Our method, trained on unlabeled images and a small number of labeled images, outperformed supervised CNNs trained with the same number of images and CNNs pre-trained on unlabeled data. In ablation experiments, we observed that the proposed attention mechanism substantially improves segmentation performance. We explore two multi-task training strategies: joint training and alternating training. Alternating training requires fewer hyperparameters and achieves better, more stable performance than joint training. Finally, we analyze the features learned by different methods and find that the attention mechanism helps to learn more discriminative features in the deeper layers of encoders.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-07-29
URL https://arxiv.org/abs/1907.12303v1
PDF https://arxiv.org/pdf/1907.12303v1.pdf
PWC https://paperswithcode.com/paper/multi-task-attention-based-semi-supervised
Repo https://github.com/ShuaiChenBIGR/MASSL-segmentation-framework
Framework pytorch
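
A sketch of the attention-separated reconstruction objective: the predicted class masks gate one reconstruction channel per class, so each class's image region is reconstructed separately. Shapes and the MSE form are illustrative assumptions; the paper's decoder design may differ.

```python
# Class-attention-gated reconstruction loss for semi-supervised training.
import torch

def masked_reconstruction_loss(image, seg_probs, recon_per_class):
    """image: (B,1,H,W); seg_probs: (B,K,H,W) softmax over K classes;
    recon_per_class: (B,K,H,W), one reconstruction channel per class."""
    # Each class channel only answers for the pixels the segmentation
    # head assigns to that class.
    recon = (seg_probs * recon_per_class).sum(dim=1, keepdim=True)
    return torch.mean((recon - image) ** 2)

B, K, H, W = 2, 3, 64, 64
image = torch.randn(B, 1, H, W)
seg_probs = torch.softmax(torch.randn(B, K, H, W), dim=1)
recon = torch.randn(B, K, H, W)
print(masked_reconstruction_loss(image, seg_probs, recon).item())
```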

Data-to-text Generation with Entity Modeling

Title Data-to-text Generation with Entity Modeling
Authors Ratish Puduppully, Li Dong, Mirella Lapata
Abstract Recent approaches to data-to-text generation have shown great promise thanks to the use of large-scale datasets and the application of neural network architectures which are trained end-to-end. These models rely on representation learning to select content appropriately, structure it coherently, and verbalize it grammatically, treating entities as nothing more than vocabulary tokens. In this work we propose an entity-centric neural architecture for data-to-text generation. Our model creates entity-specific representations which are dynamically updated. Text is generated conditioned on the data input and entity memory representations using hierarchical attention at each time step. We present experiments on the RotoWire benchmark and a (five times larger) new dataset on the baseball domain which we create. Our results show that the proposed model outperforms competitive baselines in automatic and human evaluation.
Tasks Data-to-Text Generation, Representation Learning, Text Generation
Published 2019-06-07
URL https://arxiv.org/abs/1906.03221v1
PDF https://arxiv.org/pdf/1906.03221v1.pdf
PWC https://paperswithcode.com/paper/data-to-text-generation-with-entity-modeling
Repo https://github.com/ratishsp/data2text-entity-py
Framework pytorch
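
A minimal sketch of a dynamically updated entity memory: each entity keeps a vector that a learned gate blends with the current decoder state whenever the entity is mentioned. The gate form and sizes are illustrative assumptions; see the repo for the exact update.

```python
# Gated, entity-specific memory update.
import torch
import torch.nn as nn

d = 8
gate = nn.Linear(2 * d, d)

def update_entity(memory, entity_id, hidden):
    """memory: (n_entities, d); hidden: (d,) current decoder state."""
    g = torch.sigmoid(gate(torch.cat([memory[entity_id], hidden])))
    updated = (1 - g) * memory[entity_id] + g * hidden
    mask = torch.zeros(memory.size(0), 1)
    mask[entity_id] = 1.0                   # touch only the mentioned entity
    return memory * (1 - mask) + mask * updated

memory = torch.zeros(4, d)                  # e.g. four players in a game report
memory = update_entity(memory, entity_id=2, hidden=torch.randn(d))
print(memory[2])                            # entity 2 moved toward the context
```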

Conformalized Quantile Regression

Title Conformalized Quantile Regression
Authors Yaniv Romano, Evan Patterson, Emmanuel J. Candès
Abstract Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to heteroscedasticity. It combines conformal prediction with classical quantile regression, inheriting the advantages of both. We establish a theoretical guarantee of valid coverage, supplemented by extensive experiments on popular regression datasets. We compare the efficiency of conformalized quantile regression to other conformal methods, showing that our method tends to produce shorter intervals.
Tasks
Published 2019-05-08
URL https://arxiv.org/abs/1905.03222v1
PDF https://arxiv.org/pdf/1905.03222v1.pdf
PWC https://paperswithcode.com/paper/conformalized-quantile-regression
Repo https://github.com/yromano/cqr
Framework pytorch
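
The method itself fits in a few lines; below is a sketch on synthetic heteroscedastic data using scikit-learn's gradient boosting as the quantile learner. The model choice and 90% target coverage are assumptions; the repo contains the full method.

```python
# Conformalized quantile regression: fit lower/upper quantile models,
# then widen (or shrink) the band by a correction computed on a
# held-out calibration set so that finite-sample coverage holds.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1 + 0.1 * X[:, 0])  # heteroscedastic noise
X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:1500], y[1000:1500]                 # calibration split
X_test = X[1500:]

alpha = 0.1                                               # aim for 90% coverage
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores: how far calibration points fall outside the band.
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
n = len(y_cal)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

lower, upper = lo.predict(X_test) - q, hi.predict(X_test) + q
print(f"mean interval length: {(upper - lower).mean():.3f}")
```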

MimickNet, Matching Clinical Post-Processing Under Realistic Black-Box Constraints

Title MimickNet, Matching Clinical Post-Processing Under Realistic Black-Box Constraints
Authors Ouwen Huang, Will Long, Nick Bottenus, Gregg E. Trahey, Sina Farsiu, Mark L. Palmeri
Abstract Image post-processing is used in clinical-grade ultrasound scanners to improve image quality (e.g., reduce speckle noise and enhance contrast). These post-processing techniques vary across manufacturers and are generally kept proprietary, which presents a challenge for researchers looking to match current clinical-grade workflows. We introduce a deep learning framework, MimickNet, that transforms raw conventional delay-and-summed (DAS) beams into the approximate post-processed images found on clinical-grade scanners. Training MimickNet only requires post-processed image samples from a scanner of interest, without explicit pairing to raw DAS data. This flexibility allows it to hypothetically approximate any manufacturer’s post-processing without access to the pre-processed data. MimickNet generates images with an average structural similarity index (SSIM) of 0.930±0.0892 on a 300-cineloop test set, and it generalizes to cardiac cineloops outside of our train-test distribution, achieving an SSIM of 0.967±0.002. We also explore the theoretical SSIM achievable by evaluating MimickNet performance when trained under gray-box constraints (i.e., when both pre-processed and post-processed images are available). To our knowledge, this is the first work to establish deep learning models that closely approximate current clinical-grade ultrasound post-processing under realistic black-box constraints, where paired before-and-after post-processing data is unavailable. MimickNet serves as a clinical post-processing baseline for future work in ultrasound image formation to compare against. To this end, we have made the MimickNet software open source.
Tasks
Published 2019-08-15
URL https://arxiv.org/abs/1908.05782v1
PDF https://arxiv.org/pdf/1908.05782v1.pdf
PWC https://paperswithcode.com/paper/mimicknet-matching-clinical-post-processing
Repo https://github.com/ouwen/mimicknet
Framework tf
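
The SSIM figures above can be reproduced for any image pair with scikit-image; a small sketch of the evaluation metric follows, with random arrays standing in for ultrasound frames.

```python
# Structural similarity between a reference frame and an approximation.
import numpy as np
from skimage.metrics import structural_similarity

reference = np.random.rand(256, 256)                 # stand-in for a target frame
approx = reference + 0.05 * np.random.rand(256, 256) # stand-in for a model output
print(structural_similarity(reference, approx, data_range=1.0))
```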

Machine learning method for single trajectory characterization

Title Machine learning method for single trajectory characterization
Authors Gorka Muñoz-Gil, Miguel Angel Garcia-March, Carlo Manzo, José D. Martín-Guerrero, Maciej Lewenstein
Abstract In order to study transport in complex environments, it is extremely important to determine the physical mechanism underlying diffusion, and precisely characterize its nature and parameters. Often, this task is strongly impacted by data consisting of trajectories with short length and limited localization precision. In this paper, we propose a machine learning method based on a random forest architecture, which is able to associate even very short trajectories to the underlying diffusion mechanism with a high accuracy. In addition, the method is able to classify the motion according to normal or anomalous diffusion, and determine its anomalous exponent with a small error. The method provides highly accurate outputs even when working with very short trajectories and in the presence of experimental noise. We further demonstrate the application of transfer learning to experimental and simulated data not included in the training/testing dataset. This allows for a full, high-accuracy characterization of experimental trajectories without the need of any prior information.
Tasks Transfer Learning
Published 2019-03-07
URL https://arxiv.org/abs/1903.02850v2
PDF https://arxiv.org/pdf/1903.02850v2.pdf
PWC https://paperswithcode.com/paper/machine-learning-method-for-single-trajectory
Repo https://github.com/gorkamunoz/RF-Single-Trajectory-Characterization
Framework none
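
A toy sketch of the pipeline: simulate short trajectories, featurize their displacements, and train a random forest to classify the diffusion type. The two simulated classes and hand-picked features below are far simpler, illustrative stand-ins for the paper's setup.

```python
# Random forest on displacement features of short trajectories.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def simulate(alpha, n_steps=50):
    """Scaled-Brownian toy walk whose MSD grows roughly like t**alpha."""
    t = np.arange(1, n_steps + 1)
    steps = rng.normal(size=n_steps) * t ** ((alpha - 1) / 2)
    return np.cumsum(steps)

def features(traj):
    d = np.diff(traj)
    return [d.std(), np.abs(d).mean(), traj.std(), np.abs(traj[-1])]

X, y = [], []
for _ in range(500):
    for label, alpha in enumerate([1.0, 1.6]):   # normal vs. superdiffusive
        X.append(features(simulate(alpha)))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100).fit(X[:800], y[:800])
print("held-out accuracy:", clf.score(X[800:], y[800:]))
```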

Deep Sparse Representation-based Classification

Title Deep Sparse Representation-based Classification
Authors Mahdi Abavisani, Vishal M. Patel
Abstract We present a transductive deep learning-based formulation for the sparse representation-based classification (SRC) method. The proposed network consists of a convolutional autoencoder along with a fully-connected layer. The role of the autoencoder network is to learn robust deep features for classification. On the other hand, the fully-connected layer, which is placed in between the encoder and the decoder networks, is responsible for finding the sparse representation. The estimated sparse codes are then used for classification. Various experiments on three different datasets show that the proposed network leads to sparse representations that give better classification results than state-of-the-art SRC methods. The source code is available at: github.com/mahdiabavisani/DSRC.
Tasks Image Classification, Semi-Supervised Image Classification, Sparse Representation-based Classification
Published 2019-04-24
URL http://arxiv.org/abs/1904.11093v1
PDF http://arxiv.org/pdf/1904.11093v1.pdf
PWC https://paperswithcode.com/paper/190411093
Repo https://github.com/mahdiabavisani/DSRC
Framework tf
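
A minimal sketch of the layout described in the abstract: a convolutional autoencoder with a fully-connected layer between encoder and decoder whose activations act as the codes. Note the paper finds sparse codes with respect to training samples; the L1-regularized bottleneck below is a simplified, illustrative stand-in for that sparse-coding step.

```python
# Convolutional autoencoder with an L1-penalized fully-connected bottleneck.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, 2, 1), nn.ReLU(), nn.Flatten())
        self.sparse_fc = nn.Linear(16 * 7 * 7, 256)   # code layer in the middle
        self.decoder = nn.Sequential(
            nn.Linear(256, 16 * 7 * 7), nn.ReLU(), nn.Unflatten(1, (16, 7, 7)),
            nn.ConvTranspose2d(16, 8, 3, 2, 1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, 2, 1, output_padding=1))

    def forward(self, x):
        code = torch.relu(self.sparse_fc(self.encoder(x)))
        return self.decoder(code), code

model = SparseAE()
x = torch.randn(4, 1, 28, 28)
recon, code = model(x)
# Reconstruction plus an L1 penalty that pushes the codes toward sparsity.
loss = ((recon - x) ** 2).mean() + 1e-3 * code.abs().mean()
print(recon.shape, code.shape, loss.item())
```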

ContextDesc: Local Descriptor Augmentation with Cross-Modality Context

Title ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
Authors Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan
Abstract Most existing studies on learning local features focus on patch-based descriptions of individual keypoints, while neglecting the spatial relations established by their keypoint locations. In this paper, we go beyond local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates cross-modality contextual information, including (i) visual context from high-level image representations, and (ii) geometric context from 2D keypoint distributions. Moreover, we propose an effective N-pair loss that eschews empirical hyper-parameter search and improves convergence. The proposed augmentation scheme is lightweight compared with raw local feature description, while improving results remarkably on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.
Tasks
Published 2019-04-08
URL http://arxiv.org/abs/1904.04084v1
PDF http://arxiv.org/pdf/1904.04084v1.pdf
PWC https://paperswithcode.com/paper/contextdesc-local-descriptor-augmentation
Repo https://github.com/lzx551402/contextdesc
Framework tf
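
An N-pair loss of the kind the paper builds on is short to sketch: each anchor descriptor is pulled toward its matching positive and pushed from every other pair in the batch via a softmax over similarities. The dot-product, temperature-free form is an illustrative assumption.

```python
# N-pair loss for descriptor learning: row i of the similarity matrix
# should peak at column i (its true match).
import torch
import torch.nn.functional as F

def n_pair_loss(anchors, positives):
    """anchors, positives: (N, d) L2-normalized descriptors; pair i matches i."""
    sim = anchors @ positives.t()              # (N, N) similarity matrix
    targets = torch.arange(len(anchors))
    return F.cross_entropy(sim, targets)       # softmax over each row

a = F.normalize(torch.randn(8, 128), dim=1)
p = F.normalize(a + 0.1 * torch.randn(8, 128), dim=1)
print(n_pair_loss(a, p).item())
```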

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

Title Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly
Authors Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song
Abstract Is it possible to learn policies for robotic assembly that can generalize to new objects? We explore this idea in the context of the kit assembly task. Since classic methods rely heavily on object pose estimation, they often struggle to generalize to new objects without 3D CAD models or task-specific training data. In this work, we propose to formulate the kit assembly task as a shape matching problem, where the goal is to learn a shape descriptor that establishes geometric correspondences between object surfaces and their target placement locations from visual input. This formulation enables the model to acquire a broader understanding of how shapes and surfaces fit together for assembly – allowing it to generalize to new objects and kits. To obtain training data for our model, we present a self-supervised data-collection pipeline that obtains ground truth object-to-placement correspondences by disassembling complete kits. Our resulting real-world system, Form2Fit, learns effective pick and place strategies for assembling objects into a variety of kits – achieving 90% average success rates under different initial conditions (e.g. varying object and kit poses), 94% success under new configurations of multiple kits, and over 86% success with completely new objects and kits.
Tasks Pose Estimation
Published 2019-10-30
URL https://arxiv.org/abs/1910.13675v1
PDF https://arxiv.org/pdf/1910.13675v1.pdf
PWC https://paperswithcode.com/paper/form2fit-learning-shape-priors-for
Repo https://github.com/kevinzakka/form2fit
Framework pytorch
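
At inference time the shape-matching formulation reduces to descriptor matching; a toy sketch follows, with random arrays standing in for the learned object and kit descriptors (the real system learns them from disassembly).

```python
# Nearest-neighbor matching between object and kit placement descriptors.
import torch

obj_desc = torch.randn(500, 64)     # stand-in descriptors for object pixels
kit_desc = torch.randn(800, 64)     # stand-in descriptors for kit placement pixels

dists = torch.cdist(obj_desc, kit_desc)   # (500, 800) pairwise distances
match = dists.argmin(dim=1)               # best kit location per object pixel
print(match[:5])
```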