Paper Group ANR 666
Cross-task pre-training for acoustic scene classification
Title | Cross-task pre-training for acoustic scene classification |
Authors | Ruixiong Zhang, Wei Zou, Xiangang Li |
Abstract | Acoustic scene classification (ASC) and acoustic event detection (AED) are different but related tasks. Acoustic scenes are shaped by the acoustic events that occur in them, and these events can provide useful information when training ASC models. However, most datasets are provided without either the acoustic event labels or the scene labels. We therefore explore a cross-task pre-training mechanism that uses acoustic event information extracted from a pre-trained model to optimize the ASC task. We present three cross-task pre-training architectures and evaluate them with feature-based and fine-tuning strategies on two datasets: TAU Urban Acoustic Scenes 2019 and TUT Acoustic Scenes 2017. The results show that cross-task pre-training can significantly improve ASC performance: compared with the official baseline, our best model improves by a relative 9.5% on the TAU Urban Acoustic Scenes 2019 dataset and by 10% on the TUT Acoustic Scenes 2017 dataset. |
Tasks | Acoustic Scene Classification, Scene Classification |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09935v1 |
https://arxiv.org/pdf/1910.09935v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-task-pre-training-for-acoustic-scene |
Repo | |
Framework | |
Connection Sensitive Attention U-NET for Accurate Retinal Vessel Segmentation
Title | Connection Sensitive Attention U-NET for Accurate Retinal Vessel Segmentation |
Authors | Ruirui Li, Mingming Li, Jiacheng Li, Yating Zhou |
Abstract | We develop a connection sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key improvements: (1) a connection sensitive loss that models the structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and a concatenating DOWN-Link to effectively learn better attention weights on fine vessels; (3) integration of the connection sensitive loss and the attention gate to further improve the accuracy on detailed vessels by additionally concatenating attention weights to features before output; (4) connection sensitive accuracy metrics that reflect the segmentation performance on boundaries and thin vessels. Our method can effectively improve state-of-the-art vessel segmentation methods that struggle in the presence of abnormalities, bifurcations, and microvasculature. The connection sensitive loss integrates tightly with the proposed attention U-Net to accurately (i) segment retinal vessels and (ii) preserve the connectivity of thin vessels by modeling the structural properties. Our method achieves the leading position on the DRIVE, STARE and HRF datasets among state-of-the-art methods. |
Tasks | Retinal Vessel Segmentation, Semantic Segmentation |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05558v2 |
http://arxiv.org/pdf/1903.05558v2.pdf | |
PWC | https://paperswithcode.com/paper/connection-sensitive-attention-u-net-for |
Repo | |
Framework | |
Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation
Title | Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation |
Authors | Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann |
Abstract | The rapidly decreasing computation and memory cost has recently driven the success of many applications in the field of deep learning. Practical applications of deep learning in resource-limited hardware, such as embedded devices and smart phones, however, remain challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between the 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability in our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to the state-of-the-art such as XNOR, CBCNs can achieve up to 10% higher top-1 accuracy with more powerful representational ability. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.10853v1 |
https://arxiv.org/pdf/1910.10853v1.pdf | |
PWC | https://paperswithcode.com/paper/circulant-binary-convolutional-networks-1 |
Repo | |
Framework | |
Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation
Title | Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation |
Authors | Alberto Poncelas, Gideon Maillette de Buy Wenniger, Andy Way |
Abstract | Machine Translation models are trained to translate a variety of documents from one language into another. However, models trained specifically for particular characteristics of the documents tend to perform better. Fine-tuning is a technique for adapting an NMT model to some domain. In this work, we use this technique to adapt the model to a given test set. In particular, we use transductive data selection algorithms, which take advantage of the information in the test set to retrieve sentences from a larger parallel set. In cases where the model is available at translation time (when the test set is provided), it can be adapted with a small subset of data, thereby achieving better performance than a generic model or a domain-adapted model. |
Tasks | Machine Translation |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09532v3 |
https://arxiv.org/pdf/1908.09532v3.pdf | |
PWC | https://paperswithcode.com/paper/transductive-data-selection-algorithms-for |
Repo | |
Framework | |
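As a rough illustration of the idea in the abstract (using the test set to retrieve sentences from a larger parallel set), the sketch below scores each source-side sentence by the fraction of its n-grams that also occur in the test set and keeps the top-scoring pairs. The scoring rule, the function names `ngrams` and `select_for_test_set`, and the toy data are all assumptions made for illustration; the selection algorithms used in the paper may differ.

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all n-grams of length 1..max_n as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def select_for_test_set(parallel_pairs, test_sentences, k, max_n=2):
    """Pick the k (source, target) pairs whose source side overlaps most with the test set."""
    test_ngrams = Counter()
    for sentence in test_sentences:
        test_ngrams.update(ngrams(sentence.split(), max_n))

    def score(pair):
        source_ngrams = ngrams(pair[0].split(), max_n)
        if not source_ngrams:
            return 0.0
        overlap = sum(1 for gram in source_ngrams if gram in test_ngrams)
        return overlap / len(source_ngrams)  # length-normalised overlap (illustrative choice)

    return sorted(parallel_pairs, key=score, reverse=True)[:k]


# Hypothetical toy corpus: (source, target) sentence pairs.
corpus = [
    ("the patient received a dose", "der Patient erhielt eine Dosis"),
    ("stock prices rose sharply", "die Aktienkurse stiegen stark"),
    ("a dose was given to the patient", "dem Patienten wurde eine Dosis gegeben"),
]
test_set = ["the patient received treatment"]
selected = select_for_test_set(corpus, test_set, k=2)
```

The selected subset would then be used to fine-tune the model before translating the test set; that training step is outside the scope of this sketch.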
Clustering with Jointly Learned Nonlinear Transforms Over Discriminating Min-Max Similarity/Dissimilarity Assignment
Title | Clustering with Jointly Learned Nonlinear Transforms Over Discriminating Min-Max Similarity/Dissimilarity Assignment |
Authors | Dimche Kostadinov, Behrooz Razeghi, Taras Holotyak, Slava Voloshynovskiy |
Abstract | This paper presents a novel clustering concept that is based on jointly learned nonlinear transforms (NTs) with priors on the information loss and the discrimination. We introduce a clustering principle that is based on the evaluation of a parametric min-max measure for the discriminative prior. The decomposition of the prior measure allows us to break down the assignment into two steps. In the first step, we apply NTs to a data point in order to produce candidate NT representations. In the second step, we perform the actual assignment by evaluating the parametric measure over the candidate NT representations. Numerical experiments on an image clustering task validate the potential of the proposed approach. The evaluation shows advantages in comparison to state-of-the-art clustering methods. |
Tasks | Image Clustering |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10760v1 |
http://arxiv.org/pdf/1901.10760v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-with-jointly-learned-nonlinear |
Repo | |
Framework | |
Computing Committor Functions for the Study of Rare Events Using Deep Learning
Title | Computing Committor Functions for the Study of Rare Events Using Deep Learning |
Authors | Qianxiao Li, Bo Lin, Weiqing Ren |
Abstract | The committor function is a central object of study in understanding transitions between metastable states in complex systems. However, computing the committor function for realistic systems at low temperatures is a challenging task, due to the curse of dimensionality and the scarcity of transition data. In this paper, we introduce a computational approach that overcomes these issues and achieves good performance on complex benchmark problems with rough energy landscapes. The new approach combines deep learning, data sampling and feature engineering techniques. This establishes an alternative practical method for studying rare transition events between metastable states in complex, high dimensional systems. |
Tasks | Feature Engineering |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06285v1 |
https://arxiv.org/pdf/1906.06285v1.pdf | |
PWC | https://paperswithcode.com/paper/computing-committor-functions-for-the-study-1 |
Repo | |
Framework | |
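For readers unfamiliar with the object named in the abstract, the committor can be written down as follows. This is the standard textbook definition, stated here under the assumption of overdamped Langevin dynamics; the symbols $V$ (potential energy), $\beta$ (inverse temperature), $\Omega$ (state space), and $\tau_A$, $\tau_B$ (first hitting times of the metastable sets $A$ and $B$) are introduced for illustration and do not appear in the abstract.

```latex
% Assumption: overdamped Langevin dynamics
%   dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t
q(x) = \mathbb{P}\left(\tau_B < \tau_A \mid X_0 = x\right),
\qquad
\begin{cases}
-\nabla V(x)\cdot\nabla q(x) + \beta^{-1}\,\Delta q(x) = 0, & x \in \Omega \setminus (A \cup B),\\
q(x) = 0, & x \in \partial A,\\
q(x) = 1, & x \in \partial B.
\end{cases}

% Equivalent variational form, which lends itself to a neural-network loss:
q = \arg\min_{f} \int_{\Omega \setminus (A \cup B)} \lvert \nabla f(x) \rvert^{2}\, e^{-\beta V(x)}\, dx
\quad \text{subject to } f|_{\partial A} = 0,\; f|_{\partial B} = 1.
```

Low temperature corresponds to large $\beta$, where the weight $e^{-\beta V}$ concentrates on the metastable states and samples from the transition region become scarce, which is the data-scarcity difficulty the abstract refers to.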
CRDN: Cascaded Residual Dense Networks for Dynamic MR Imaging with Edge-enhanced Loss Constraint
Title | CRDN: Cascaded Residual Dense Networks for Dynamic MR Imaging with Edge-enhanced Loss Constraint |
Authors | Ziwen Ke, Shanshan Wang, Huitao Cheng, Leslie Ying, Qiegen Liu, Hairong Zheng, Dong Liang |
Abstract | Dynamic magnetic resonance (MR) imaging has generated great research interest, as it can provide both spatial and temporal information for clinical diagnosis. However, slow imaging speed, or long scanning time, is still one of the challenges for dynamic MR imaging. Most existing methods reconstruct dynamic MR images from incomplete k-space data under the guidance of compressed sensing (CS) or low-rank theory, and they suffer from long iterative reconstruction times. Recently, deep learning has shown great potential in accelerating dynamic MR. Our previous work proposed a dynamic MR imaging method with both k-space and spatial prior knowledge integrated via multi-supervised network training. Nevertheless, there was still a certain degree of smoothing in the reconstructed images at high acceleration factors. In this work, we propose cascaded residual dense networks for dynamic MR imaging with an edge-enhanced loss constraint, dubbed CRDN. Specifically, the cascaded residual dense networks fully exploit the hierarchical features from all the convolutional layers with both local and global feature fusion. We further utilize the total variation (TV) loss function, which has edge enhancement properties, for training the networks. |
Tasks | |
Published | 2019-01-18 |
URL | http://arxiv.org/abs/1901.06111v1 |
http://arxiv.org/pdf/1901.06111v1.pdf | |
PWC | https://paperswithcode.com/paper/crdn-cascaded-residual-dense-networks-for |
Repo | |
Framework | |
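The total variation (TV) term mentioned in the abstract can be illustrated with a minimal sketch. The version below is the anisotropic form (sum of absolute differences between neighbouring pixels) on a plain 2-D list; the function name `total_variation` and the toy images are assumptions, and the abstract does not say which TV variant the paper uses or how it is weighted against the reconstruction loss.

```python
def total_variation(image):
    """Anisotropic total variation of a 2-D image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    # Sum of absolute differences between vertically adjacent pixels.
    vertical = sum(
        abs(image[r + 1][c] - image[r][c])
        for r in range(rows - 1)
        for c in range(cols)
    )
    # Sum of absolute differences between horizontally adjacent pixels.
    horizontal = sum(
        abs(image[r][c + 1] - image[r][c])
        for r in range(rows)
        for c in range(cols - 1)
    )
    return vertical + horizontal


flat = [[1.0, 1.0], [1.0, 1.0]]   # constant image
edge = [[0.0, 1.0], [0.0, 1.0]]   # one clean vertical edge
noisy = [[0.0, 1.0], [1.0, 0.0]]  # checkerboard pattern

tv_flat = total_variation(flat)
tv_edge = total_variation(edge)
tv_noisy = total_variation(noisy)
```

A single sharp edge costs less than oscillating noise of the same amplitude, which is why TV penalties are commonly described as edge preserving.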
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
Title | Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS |
Authors | Mutian He, Yan Deng, Lei He |
Abstract | Neural TTS has demonstrated strong capabilities to generate human-like speech with high quality and naturalness, but its generalization to out-of-domain texts remains challenging because of the design of attention-based sequence-to-sequence acoustic modeling. Various errors occur for inputs with unseen context, including attention collapse, skipping, and repeating, which limits broader application. In this paper, we propose a novel stepwise monotonic attention method for sequence-to-sequence acoustic modeling to improve robustness on out-of-domain inputs. The method exploits the strictly monotonic property of TTS by constraining monotonic hard attention so that the alignment between input and output sequences is not only monotonic but also allows no skipping of inputs. Soft attention can be used to avoid a mismatch between training and inference. The experimental results show that the proposed method achieves significant improvements in robustness in out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00672v3 |
https://arxiv.org/pdf/1906.00672v3.pdf | |
PWC | https://paperswithcode.com/paper/190600672 |
Repo | |
Framework | |
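The two constraints stated in the abstract (alignments are monotonic, and no input may be skipped) can be sketched as a simple recurrence over expected alignments: at each decoder step, the attention mass at input position j either stays at j or moves to j + 1, never further. The sketch below is one reading of those constraints and is not taken from the paper; the function name `stepwise_monotonic_step` and the stay probabilities are hypothetical.

```python
def stepwise_monotonic_step(prev_alignment, stay_prob):
    """One decoder step: mass at position j stays with prob stay_prob[j], else moves to j + 1."""
    n = len(prev_alignment)
    new_alignment = [0.0] * n
    for j in range(n):
        # Mass that was already at j and stays there.
        stayed = prev_alignment[j] * stay_prob[j]
        # Mass that was at j - 1 and advances by exactly one position.
        moved = prev_alignment[j - 1] * (1.0 - stay_prob[j - 1]) if j > 0 else 0.0
        new_alignment[j] = stayed + moved
    return new_alignment


# Attention starts on the first input token; the last token always "stays"
# so that no probability mass falls off the end of the input.
alignment = [1.0, 0.0, 0.0]
stay = [0.25, 0.5, 1.0]
step1 = stepwise_monotonic_step(alignment, stay)
step2 = stepwise_monotonic_step(step1, stay)
```

In a real model the stay probabilities would be produced by the attention energies at every decoder step rather than fixed as they are here.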
Bypass Enhancement RGB Stream Model for Pedestrian Action Recognition of Autonomous Vehicles
Title | Bypass Enhancement RGB Stream Model for Pedestrian Action Recognition of Autonomous Vehicles |
Authors | Dong Cao, Lisha Xu |
Abstract | Pedestrian action recognition and intention prediction is one of the core issues in the field of autonomous driving. In this research field, action recognition is one of the key technologies. Many scholars have worked to improve the accuracy of algorithms for this task. However, relatively few studies address the computational complexity of the algorithms and the real-time performance of the system. In the autonomous driving scenario, the real-time performance and ultra-low latency of the algorithm are extremely important evaluation indicators, which are directly related to the availability and safety of the autonomous driving system. To this end, we construct a bypass enhanced RGB flow model, which builds on the previous two-branch algorithm that extracts RGB feature information and optical flow feature information separately. In the training phase, the two branches are merged by a distillation method, and the bypass enhancement is combined in the inference phase to ensure accuracy. The real-time behavior of the action recognition algorithm is significantly improved without a decrease in accuracy. Experiments confirm the superiority and effectiveness of our algorithm. |
Tasks | Autonomous Driving, Autonomous Vehicles, Optical Flow Estimation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05674v2 |
https://arxiv.org/pdf/1908.05674v2.pdf | |
PWC | https://paperswithcode.com/paper/bypass-enhancement-rgb-stream-model-for |
Repo | |
Framework | |
mlVIRNET: Multilevel Variational Image Registration Network
Title | mlVIRNET: Multilevel Variational Image Registration Network |
Authors | Alessa Hering, Bram van Ginneken, Stefan Heldmann |
Abstract | We present a novel multilevel approach for deep learning based image registration. Recently published deep learning based registration methods have shown promising results for a wide range of tasks. However, these algorithms are still limited to relatively small deformations. Our method addresses this shortcoming by introducing a multilevel framework, which computes deformation fields on different scales, similar to conventional methods. Thereby, a coarse-level alignment is obtained first, which is subsequently improved on finer levels. We demonstrate our method on the complex task of inhale-to-exhale lung registration. We show that the use of a deep learning multilevel approach leads to significantly better registration results. |
Tasks | Image Registration |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.10084v1 |
https://arxiv.org/pdf/1909.10084v1.pdf | |
PWC | https://paperswithcode.com/paper/190910084 |
Repo | |
Framework | |
A simple approach to design quantum neural networks and its applications to kernel-learning methods
Title | A simple approach to design quantum neural networks and its applications to kernel-learning methods |
Authors | Changpeng Shao |
Abstract | We give an explicit, simple method to build quantum neural networks (QNNs) to solve classification problems. Besides the input (state preparation) and output (amplitude estimation), it has one hidden layer, which uses a tensor product of $\log M$ two-dimensional rotations to introduce $\log M$ weights. Here $M$ is the number of training samples. We also have an efficient method to prepare the quantum states of the training samples. By the quantum-classical hybrid method or the variational method, the training algorithm of this QNN is easy to accomplish on a quantum computer. The idea is inspired by kernel methods and radial basis function (RBF) networks. In turn, the construction of the QNN provides new findings in the design of RBF networks. As an application, we introduce a quantum-inspired RBF network, in which the number of weight parameters is $\log M$. Numerical tests indicate that the performance of this neural network in solving classification problems improves when $M$ increases. Because it uses exponentially fewer parameters, more advanced optimization methods (e.g. Newton’s method) can be used to train this network. Finally, for the convex optimization problem used to train support vector machines, we use a similar idea to reduce the number of variables from $M$ to $\log M$. |
Tasks | |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08798v1 |
https://arxiv.org/pdf/1910.08798v1.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-approach-to-design-quantum-neural |
Repo | |
Framework | |
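To make the parameter count in the abstract concrete, the sketch below builds the tensor (Kronecker) product of $\log M$ two-dimensional rotations and shows that it yields an $M \times M$ orthogonal matrix controlled by only $\log M$ angles. The logarithm is assumed to be base 2 (one rotation per qubit), and the helper names `rotation`, `kron`, and `hidden_layer` as well as the angle values are hypothetical; this is a classical illustration of the structure, not the paper's quantum circuit.

```python
import math


def rotation(theta):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


def hidden_layer(thetas):
    """Tensor product of one 2-D rotation per angle; size is 2**len(thetas)."""
    result = [[1.0]]
    for theta in thetas:
        result = kron(result, rotation(theta))
    return result


M = 8                      # number of training samples (assumed to be a power of two)
thetas = [0.1, 0.2, 0.3]   # log2(M) = 3 trainable weights
layer = hidden_layer(thetas)
```

With only three angles the layer acts on an 8-dimensional space, which is the sense in which the number of weights grows logarithmically in $M$.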
ApproxNet: Content and Contention Aware Video Analytics System for the Edge
Title | ApproxNet: Content and Contention Aware Video Analytics System for the Edge |
Authors | Ran Xu, Jinkyu Koo, Rakesh Kumar, Peter Bai, Subrata Mitra, Ganga Meghanath, Saurabh Bagchi |
Abstract | Videos take a lot of time to transport over the network, so running analytics on live video at the edge devices, right where it was captured, has become an important system driver. However, these edge devices (e.g., IoT devices, surveillance cameras, and AR/VR gadgets) are resource constrained. This makes it impossible to run state-of-the-art heavy Deep Neural Networks (DNNs) on them and still provide low and stable latency under various circumstances, such as changes in resource availability on the device, the content characteristics, or requirements from the user. In this paper, we introduce ApproxNet, a video analytics system for the edge. It enables novel dynamic approximation techniques to achieve the desired trade-off between inference latency and accuracy under different system conditions and resource contentions, variations in the complexity of the video contents, and user requirements. It achieves this by enabling two approximation knobs within a single DNN model, rather than creating and maintaining an ensemble of models (such as in MCDNN [Mobisys-16]). Ensemble models run into memory issues on lightweight devices and incur large switching penalties among the models in response to runtime changes. We show that ApproxNet can adapt seamlessly at runtime to video content changes and changes in system dynamics to provide low and stable latency for object detection on a video stream. We compare the accuracy and the latency to ResNet [2015], MCDNN, and MobileNets [Google-2017]. |
Tasks | Object Detection |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1909.02068v2 |
https://arxiv.org/pdf/1909.02068v2.pdf | |
PWC | https://paperswithcode.com/paper/approxnet-content-and-contention-aware-video |
Repo | |
Framework | |
Astroalign: A Python module for astronomical image registration
Title | Astroalign: A Python module for astronomical image registration |
Authors | Martin Beroiz, Juan B. Cabral, Bruno Sanchez |
Abstract | We present an algorithm implemented in the astroalign Python module for image registration in astronomy. Our module does not rely on WCS information and instead matches 3-point asterisms (triangles) on the images to find the most accurate linear transformation between the two. It is especially useful in the context of aligning images prior to stacking or performing difference image analysis. Astroalign can match images of different point-spread functions, seeing, and atmospheric conditions. |
Tasks | Image Registration |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02946v1 |
https://arxiv.org/pdf/1909.02946v1.pdf | |
PWC | https://paperswithcode.com/paper/astroalign-a-python-module-for-astronomical |
Repo | |
Framework | |
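The abstract says the module matches 3-point asterisms (triangles) instead of relying on WCS information. The sketch below illustrates why triangles are convenient for this: ratios of a triangle's sorted side lengths do not change under translation, rotation, or uniform scaling, so the same three stars produce the same descriptor in both images. The particular descriptor, the helper names `triangle_invariant` and `similarity`, and the star coordinates are assumptions for illustration and are not taken from the astroalign source code.

```python
import math
from itertools import combinations


def triangle_invariant(p, q, r):
    """Ratios of sorted side lengths; unchanged by translation, rotation and uniform scaling."""
    sides = sorted(math.dist(a, b) for a, b in combinations((p, q, r), 2))
    return (sides[2] / sides[1], sides[1] / sides[0])


def similarity(point, scale, angle, shift):
    """Apply a rotation, uniform scaling and translation to a 2-D point."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])


# Three hypothetical star positions in the source image (pixel coordinates).
stars = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
# The same stars as they might appear in a rotated, rescaled, shifted target image.
moved = [similarity(p, scale=2.5, angle=0.7, shift=(10.0, -3.0)) for p in stars]

original_inv = triangle_invariant(*stars)
moved_inv = triangle_invariant(*moved)
```

Triangles whose descriptors agree in both images give candidate point correspondences, from which a transformation between the two images can then be estimated.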
Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management
Title | Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management |
Authors | Shipra Agrawal, Randy Jia |
Abstract | We consider a stochastic inventory control problem under censored demands, lost sales, and positive lead times. This is a fundamental problem in inventory management, with significant literature establishing near-optimality of a simple class of policies called "base-stock policies" for the underlying Markov Decision Process (MDP), as well as convexity of long run average-cost under those policies. We consider the relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown. The goal is to bound regret of the algorithm when compared to the best base-stock policy. We utilize the convexity properties and a newly derived bound on bias of base-stock policies to establish a connection to stochastic convex bandit optimization. Our main contribution is a learning algorithm with a regret bound of $\tilde{O}(L\sqrt{T}+D)$ for the inventory control problem. Here $L$ is the fixed and known lead time, and $D$ is an unknown parameter of the demand distribution described roughly as the number of time steps needed to generate enough demand for depleting one unit of inventory. Notably, even though the state space of the underlying MDP is continuous and $L$-dimensional, our regret bounds depend linearly on $L$. Our results significantly improve the previously best known regret bounds for this problem where the dependence on $L$ was exponential and many further assumptions on demand distribution were required. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex cost functions. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04337v1 |
https://arxiv.org/pdf/1905.04337v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-in-structured-mdps-with-convex-cost |
Repo | |
Framework | |
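A base-stock policy, the benchmark class named in the abstract, can be simulated in a few lines. The sketch below assumes a particular order of events within each period (receive delivery, place order, serve demand), linear holding and lost-sale costs, and exponentially distributed demand; these choices, the cost values, and the function name `simulate_base_stock` are illustrative assumptions rather than details from the paper.

```python
import random


def simulate_base_stock(level, lead_time, demands, holding_cost=1.0, lost_sale_cost=4.0):
    """Average per-period cost of a base-stock policy with lost sales and lead_time >= 1."""
    on_hand = float(level)
    pipeline = [0.0] * lead_time  # outstanding orders; pipeline[0] arrives next
    total_cost = 0.0
    for demand in demands:
        # Receive the order placed lead_time periods ago.
        on_hand += pipeline.pop(0)
        # Order up to the base-stock level (inventory position = on hand + on order).
        position = on_hand + sum(pipeline)
        pipeline.append(max(0.0, level - position))
        # Serve demand; unmet demand is lost rather than backlogged.
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total_cost += holding_cost * on_hand + lost_sale_cost * lost
    return total_cost / len(demands)


# Sweep a few base-stock levels on one simulated demand path (hypothetical parameters).
rng = random.Random(0)
demands = [rng.expovariate(1.0) for _ in range(5000)]
costs = {s: simulate_base_stock(s, lead_time=2, demands=demands) for s in range(0, 9)}
best_level = min(costs, key=costs.get)
```

In the learning setting described in the abstract the demand distribution is unknown and only sales are observed (demand is censored), so an algorithm cannot evaluate these costs offline on a known demand path as the simulation does; it has to search over base-stock levels from observed outcomes, relying on the convexity of the long-run average cost.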
The Referential Reader: A Recurrent Entity Network for Anaphora Resolution
Title | The Referential Reader: A Recurrent Entity Network for Anaphora Resolution |
Authors | Fei Liu, Luke Zettlemoyer, Jacob Eisenstein |
Abstract | We present a new architecture for storing and accessing entity mentions during online text processing. While reading the text, entity references are identified, and may be stored by either updating or overwriting a cell in a fixed-length memory. The update operation implies coreference with the other mentions that are stored in the same cell; the overwrite operation causes these mentions to be forgotten. By encoding the memory operations as differentiable gates, it is possible to train the model end-to-end, using both a supervised anaphora resolution objective as well as a supplementary language modeling objective. Evaluation on a dataset of pronoun-name anaphora demonstrates strong performance with purely incremental text processing. |
Tasks | Language Modelling |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01541v2 |
https://arxiv.org/pdf/1902.01541v2.pdf | |
PWC | https://paperswithcode.com/paper/the-referential-reader-a-recurrent-entity |
Repo | |
Framework | |