October 18, 2019


Paper Group ANR 425


A Multiscale Image Denoising Algorithm Based On Dilated Residual Convolution Network


Title	A Multiscale Image Denoising Algorithm Based On Dilated Residual Convolution Network
Authors	Chang Liu, Zhaowei Shang, Anyong Qin
Abstract	Image denoising is a classical problem in low level computer vision. Model-based optimization methods and deep learning approaches have been the two main strategies for solving the problem. Model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming. In contrast, deep learning methods have fast testing speed but the performance of these CNNs is still inferior. To address this issue, here we propose a novel deep residual learning model that combines the dilated residual convolution and multi-scale convolution groups. Because of the complex patterns and structures inside an image, the multiscale convolution group is utilized to learn those patterns and enlarge the receptive field. Specifically, the residual connection and batch normalization are utilized to speed up the training process and maintain the denoising performance. In order to decrease the gridding artifacts, we integrate the hybrid dilated convolution design into our model. To this end, this paper aims to train a lightweight and effective denoiser based on the multiscale convolution group. Experimental results have demonstrated that the enhanced denoiser can not only achieve promising denoising results, but also become a strong competitor in practical applications.
Tasks	Denoising, Image Denoising
Published	2018-12-21
URL	http://arxiv.org/abs/1812.09131v1
PDF	http://arxiv.org/pdf/1812.09131v1.pdf
PWC	https://paperswithcode.com/paper/a-multiscale-image-denoising-algorithm-based
Repo
Framework
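
The abstract above mentions enlarging the receptive field with dilated convolutions and using a hybrid dilated convolution design to reduce gridding artifacts. The following is a minimal sketch of that idea in one dimension, not the authors' network: it assumes stride-1 layers with an odd kernel size, and the dilation rates `[2, 2, 2]` and `[1, 2, 3]` are illustrative values that the abstract does not state. The function names are hypothetical.

```python
from itertools import product


def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, 1D) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def covered_offsets(kernel_size, dilations):
    """Input offsets that can influence one output pixel (odd kernel_size assumed)."""
    half = kernel_size // 2
    # Each layer taps the positions t * d for t in -half..half.
    taps_per_layer = [[t * d for t in range(-half, half + 1)] for d in dilations]
    # An input offset is reachable if it is a sum of one tap per layer.
    return sorted({sum(combo) for combo in product(*taps_per_layer)})


# Illustrative dilation rates (assumptions, not taken from the paper).
same_rate = [2, 2, 2]
hybrid = [1, 2, 3]

for rates in (same_rate, hybrid):
    field = receptive_field(3, rates)
    used = len(covered_offsets(3, rates))
    print(f"dilations={rates}: receptive field {field}, pixels actually used {used}")
```

Both stacks span the same receptive field, but the repeated rate only reaches every other input position, which is the gridding pattern that mixed rates are meant to avoid.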

Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space


Title	Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space
Authors	Ermo Wei, Drew Wicke, Sean Luke
Abstract	We explore Deep Reinforcement Learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for the tasks where the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG) to train such an architecture. We demonstrate that these methods outperform the state of the art method, Parameterized Action DDPG, on test domains.
Tasks
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09656v1
PDF	http://arxiv.org/pdf/1810.09656v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-approaches-for-reinforcement
Repo
Framework


Real Time Monitoring of Social Media and Digital Press


Title	Real Time Monitoring of Social Media and Digital Press
Authors	Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri
Abstract	Talaia is a platform for monitoring social media and digital press. A configurable crawler gathers content with respect to user defined domains or topics. Crawled data is processed by means of the EliXa Sentiment Analysis system. A Django powered interface provides data visualization for a user-based analysis of the data. This paper presents the architecture of the system and describes in detail its different components. To prove the validity of the approach, two real use cases are accounted for: one in the cultural domain and one in the political domain. Evaluation for the sentiment analysis task in both scenarios is also provided, showing the capacity for domain adaptation.
Tasks	Domain Adaptation, Sentiment Analysis
Published	2018-09-28
URL	http://arxiv.org/abs/1810.00647v2
PDF	http://arxiv.org/pdf/1810.00647v2.pdf
PWC	https://paperswithcode.com/paper/real-time-monitoring-of-social-media-and
Repo
Framework

Reconstruction and Registration of Large-Scale Medical Scene Using Point Clouds Data from Different Modalities


Title	Reconstruction and Registration of Large-Scale Medical Scene Using Point Clouds Data from Different Modalities
Authors	Ke Wang, Han Song, Jiahui Zhang, Xinran Zhang, Hongen Liao
Abstract	Sensing the medical scene can help ensure safety during surgical operations. A monitoring platform that can obtain accurate location information for the operating room is therefore needed. Compared to a 2D camera image, 3D data contains more information about distance and direction, so 3D sensors are better suited to surgical scene monitoring. However, each 3D sensor has its own limitations. For example, Lidar (Light Detection and Ranging) can sense a large-scale environment with high precision, but its point clouds or depth maps are very sparse. Commodity RGBD sensors, such as Kinect, can accurately capture denser data but are limited to a small range, from 0.5 to 4.5 m. A method for fusing data from different modalities that addresses these problems is therefore important. In this paper, we propose a method that fuses 3D data from different modalities to obtain a large-scale, dense point cloud. The key contributions of our work are as follows. First, we propose a 3D data collection system to reconstruct medical scenes. By fusing the Lidar and Kinect data, a large-scale medical scene with more detail can be reconstructed. Second, we propose a location-based fast point cloud registration algorithm to handle datasets from different modalities.
Tasks
Published	2018-09-05
URL	http://arxiv.org/abs/1809.01318v1
PDF	http://arxiv.org/pdf/1809.01318v1.pdf
PWC	https://paperswithcode.com/paper/reconstruction-and-registration-of-large
Repo
Framework

Meteorologists and Students: A resource for language grounding of geographical descriptors


Title	Meteorologists and Students: A resource for language grounding of geographical descriptors
Authors	Alejandro Ramos-Soto, Ehud Reiter, Kees van Deemter, Jose M. Alonso, Albert Gatt
Abstract	We present a data resource which can be useful for research purposes on language grounding tasks in the context of geographical referring expression generation. The resource is composed of two data sets that encompass 25 different geographical descriptors and a set of associated graphical representations, drawn as polygons on a map by two groups of human subjects: teenage students and expert meteorologists.
Tasks
Published	2018-09-07
URL	http://arxiv.org/abs/1809.02494v1
PDF	http://arxiv.org/pdf/1809.02494v1.pdf
PWC	https://paperswithcode.com/paper/meteorologists-and-students-a-resource-for
Repo
Framework

Convolutional Neural Networks Deceived by Visual Illusions


Title	Convolutional Neural Networks Deceived by Visual Illusions
Authors	Alexander Gomez-Villa, Adrián Martín, Javier Vazquez-Corral, Marcelo Bertalmío
Abstract	Visual illusions teach us that what we see is not always what is represented in the physical world. Their special nature makes them a fascinating tool to test and validate any new vision model proposed. In general, current vision models are based on the concatenation of linear convolutions and non-linear operations. In this paper we get inspiration from the similarity of this structure with the operations present in Convolutional Neural Networks (CNNs). This motivated us to study if CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size. We believe that this CNN behaviour appears as a by-product of the training for the low level vision tasks of denoising, color constancy or deblurring. Our work opens a new bridge between human perception and CNNs: in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions.
Tasks	Color Constancy, Deblurring, Denoising, Image Denoising
Published	2018-11-26
URL	http://arxiv.org/abs/1811.10565v1
PDF	http://arxiv.org/pdf/1811.10565v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-neural-networks-deceived-by
Repo
Framework

Three-dimensional Optical Coherence Tomography Image Denoising through Multi-input Fully-Convolutional Networks


Title	Three-dimensional Optical Coherence Tomography Image Denoising through Multi-input Fully-Convolutional Networks
Authors	Ashkan Abbasi, Amirhassan Monadjemi, Leyuan Fang, Hossein Rabbani, Yi Zhang
Abstract	In recent years, there has been a growing interest in applying convolutional neural networks (CNNs) to low-level vision tasks such as denoising and super-resolution. Due to the coherent nature of the image formation process, optical coherence tomography (OCT) images are inevitably affected by noise. This paper proposes a new method named the multi-input fully-convolutional networks (MIFCN) for denoising of OCT images. In contrast to recently proposed natural image denoising CNNs, the proposed architecture allows the exploitation of high degrees of correlation and complementary information among neighboring OCT images through pixel by pixel fusion of multiple FCNs. The parameters of the proposed multi-input architecture are learned by considering the consistency between the overall output and the contribution of each input image. The proposed MIFCN method is compared with the state-of-the-art denoising methods adopted on OCT images of normal and age-related macular degeneration eyes in a quantitative and qualitative manner.
Tasks	Denoising, Image Denoising, Super-Resolution
Published	2018-11-22
URL	http://arxiv.org/abs/1811.09022v2
PDF	http://arxiv.org/pdf/1811.09022v2.pdf
PWC	https://paperswithcode.com/paper/three-dimensional-optical-coherence
Repo
Framework

Lossless (and Lossy) Compression of Random Forests


Title	Lossless (and Lossy) Compression of Random Forests
Authors	Amichai Painsky, Saharon Rosset
Abstract	Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble’s trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.
Tasks
Published	2018-10-26
URL	http://arxiv.org/abs/1810.11197v1
PDF	http://arxiv.org/pdf/1810.11197v1.pdf
PWC	https://paperswithcode.com/paper/lossless-and-lossy-compression-of-random
Repo
Framework

FactSheets: Increasing Trust in AI Services through Supplier’s Declarations of Conformity


Title	FactSheets: Increasing Trust in AI Services through Supplier’s Declarations of Conformity
Authors	Matthew Arnold, Rachel K. E. Bellamy, Michael Hind, Stephanie Houde, Sameep Mehta, Aleksandra Mojsilovic, Ravi Nair, Karthikeyan Natesan Ramamurthy, Darrell Reimer, Alexandra Olteanu, David Piorkowski, Jason Tsay, Kush R. Varshney
Abstract	Accuracy is an important concern for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety (which includes fairness and explainability), security, and provenance, are also critical elements to engender consumers’ trust in a service. Many industries use transparent, standardized, but often not legally required documents called supplier’s declarations of conformity (SDoCs) to describe the lineage of a product along with the safety and performance testing it has undergone. SDoCs may be considered multi-dimensional fact sheets that capture and quantify various aspects of the product and its development to make it worthy of consumers’ trust. Inspired by this practice, we propose FactSheets to help increase trust in AI services. We envision such documents to contain purpose, performance, safety, security, and provenance information to be completed by AI service providers for examination by consumers. We suggest a comprehensive set of declaration items tailored to AI and provide examples for two fictitious AI services in the appendix of the paper.
Tasks
Published	2018-08-22
URL	http://arxiv.org/abs/1808.07261v2
PDF	http://arxiv.org/pdf/1808.07261v2.pdf
PWC	https://paperswithcode.com/paper/factsheets-increasing-trust-in-ai-services
Repo
Framework

A Fast Algorithm for Clustering High Dimensional Feature Vectors


Title	A Fast Algorithm for Clustering High Dimensional Feature Vectors
Authors	Shahina Rahman, Valen E. Johnson
Abstract	We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the $N\times N$ matrix ${\bf XX}^T$. Computational burden thus depends primarily on $N$, the number of objects to be clustered, rather than $P$, the number of features that are measured. This makes the method particularly useful in high dimensional settings, where it is substantially faster than a number of other popular clustering algorithms. Aside from an upper bound on the number of potential clusters, the method is independent of tuning parameters. When compared to $16$ other clustering algorithms on $32$ genomic datasets with gold standards, we show that it provides the most accurate cluster configuration more than twice as often as its closest competitors. We illustrate the method on data taken from highly cited genomic studies.
Tasks
Published	2018-11-02
URL	http://arxiv.org/abs/1811.00956v1
PDF	http://arxiv.org/pdf/1811.00956v1.pdf
PWC	https://paperswithcode.com/paper/a-fast-algorithm-for-clustering-high
Repo
Framework
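
To illustrate why working with the $N\times N$ matrix ${\bf XX}^T$ keeps the cost tied to $N$ rather than $P$, here is a small sketch. It is not the authors' algorithm: it only forms the Gram matrix and splits the objects by the sign of its leading eigenvector, found by power iteration. The synthetic data, the two-group setup, and all function names are assumptions made for illustration.

```python
import random


def gram_matrix(rows):
    """Return the N x N matrix of inner products between the N feature rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def leading_eigenvector(matrix, iterations=100):
    """Power iteration on a symmetric positive semi-definite matrix."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def two_way_split(rows):
    """Label each object by the sign of its entry in the leading eigenvector."""
    v = leading_eigenvector(gram_matrix(rows))
    return [1 if x >= 0 else 0 for x in v]


# Synthetic example: N = 6 objects, P = 200 features, two well-separated groups.
rng = random.Random(0)
P = 200
group_a = [[1.0 + rng.gauss(0, 1) for _ in range(P)] for _ in range(3)]
group_b = [[-1.0 + rng.gauss(0, 1) for _ in range(P)] for _ in range(3)]
labels = two_way_split(group_a + group_b)
print(labels)
```

Once the Gram matrix is built, every later step touches only $N \times N$ numbers, so adding more features increases the cost of a single pass over the data and nothing else.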

A Generalization Method of Partitioned Activation Function for Complex Number


Title	A Generalization Method of Partitioned Activation Function for Complex Number
Authors	HyeonSeok Lee, Hyo Seon Park
Abstract	A method to convert a real-number partitioned activation function into a complex-number one is provided. The method has 4 variations: 1 has the potential to yield a holomorphic activation, 2 have the potential to conserve the complex angle, and the last 1 guarantees interaction between the real and imaginary parts. The method has been applied to LReLU and SELU as examples. The complex-number activation function is a building block of complex-number ANNs, which have the potential to properly handle complex-number problems. However, complex activation is not yet well established. Therefore, we propose a way to extend partitioned real activations to complex numbers.
Tasks
Published	2018-02-08
URL	http://arxiv.org/abs/1802.02987v1
PDF	http://arxiv.org/pdf/1802.02987v1.pdf
PWC	https://paperswithcode.com/paper/a-generalization-method-of-partitioned
Repo
Framework

What have we learned from deep representations for action recognition?


Title	What have we learned from deep representations for action recognition?
Authors	Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, Andrew Zisserman
Abstract	As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video. We show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following. First, cross-stream fusion enables the learning of true spatiotemporal features rather than simply separate appearance and motion features. Second, the networks can learn local representations that are highly class specific, but also generic representations that can serve a range of classes. Third, throughout the hierarchy of the network, features become more abstract and show increasing invariance to aspects of the data that are unimportant to desired distinctions (e.g. motion patterns across various speeds). Fourth, visualizations can be used not only to shed light on learned representations, but also to reveal idiosyncrasies of training data and to explain failure cases of the system.
Tasks	Temporal Action Localization
Published	2018-01-04
URL	http://arxiv.org/abs/1801.01415v1
PDF	http://arxiv.org/pdf/1801.01415v1.pdf
PWC	https://paperswithcode.com/paper/what-have-we-learned-from-deep
Repo
Framework

Improving Confidence Estimates for Unfamiliar Examples


Title	Improving Confidence Estimates for Unfamiliar Examples
Authors	Zhizhong Li, Derek Hoiem
Abstract	Intuitively, unfamiliarity should lead to lack of confidence. In reality, current algorithms often make highly confident yet wrong predictions when faced with unfamiliar examples that are relevant but not from the training distribution. A classifier we trained to recognize gender is 12 times more likely to be wrong in a 99% confident prediction if presented with a subject from a different age group than those seen during training. In this paper, we compare and evaluate several methods to improve confidence estimates for novel and familiar samples. We propose a testing methodology of splitting novel and familiar samples by attribute (age, breed, subcategory) or sampling (similar datasets collected by different people at different times). We evaluate methods including confidence calibration, ensembles, distillation, and a Bayesian model and use several metrics to analyze label, likelihood, and calibration error. While all methods reduce over-confident errors, the ensemble of calibrated models performs best overall, and T-scaling performs best among the approaches with fastest inference.
Tasks	Calibration, Domain Adaptation
Published	2018-04-09
URL	https://arxiv.org/abs/1804.03166v4
PDF	https://arxiv.org/pdf/1804.03166v4.pdf
PWC	https://paperswithcode.com/paper/reducing-over-confident-errors-outside-the
Repo
Framework
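
The abstract above reports that T-scaling performs best among the fastest approaches. Assuming "T-scaling" refers to temperature scaling (dividing the logits by a scalar T before the softmax), the sketch below shows the mechanism. The grid search, the toy logits, and the function names are illustrative assumptions, not the authors' setup; in practice T is usually fitted on held-out validation data with a numerical optimizer.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy validation set: the model is very confident in class 0 every time,
# but it is wrong on one example out of four.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [4.0, 0.0]]
val_labels = [0, 0, 0, 1]

best_t = fit_temperature(val_logits, val_labels, [0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
print("chosen temperature:", best_t)
print("confidence before:", max(softmax(val_logits[0])))
print("confidence after: ", max(softmax(val_logits[0], best_t)))
```

Because dividing by a positive T never changes which logit is largest, the predicted labels stay the same and only the reported confidence moves.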

3D Feature Pyramid Attention Module for Robust Visual Speech Recognition


Title	3D Feature Pyramid Attention Module for Robust Visual Speech Recognition
Authors	Jingyun Xiao
Abstract	Visual speech recognition is the task of decoding speech content from a video based on visual information, especially the movements of the lips. It is also referred to as lipreading. Motivated by two problems in lipreading, words with similar pronunciation and the variation of word duration, we propose a novel 3D Feature Pyramid Attention (3D-FPA) module to jointly improve the representation power of features in both the spatial and temporal domains. Specifically, the input features are downsampled 3 times in both the spatial and temporal dimensions to construct spatiotemporal feature pyramids. Then high-level features are upsampled and combined with low-level features, finally generating a pixel-level soft attention mask to be multiplied with the input features. It enhances the discriminative power of features and exploits temporal multi-scale information while decoding visual speech. This module also provides a new method to construct and utilize temporal pyramid structures in video analysis tasks. Temporal feature pyramids are still underexplored compared to the plentiful work on spatial feature pyramids for image analysis tasks. To validate the effectiveness and adaptability of our proposed module, we embed the module in a sentence-level lipreading model, LipNet, resulting in a 3.6% absolute decrease in word error rate, and in a word-level model, resulting in a 1.4% absolute improvement in accuracy.
Tasks	Lipreading, Speech Recognition, Visual Speech Recognition
Published	2018-10-15
URL	http://arxiv.org/abs/1810.06178v4
PDF	http://arxiv.org/pdf/1810.06178v4.pdf
PWC	https://paperswithcode.com/paper/3d-feature-pyramid-attention-module-for
Repo
Framework

Understanding and Improving Multi-Sense Word Embeddings via Extended Robust Principal Component Analysis


Title	Understanding and Improving Multi-Sense Word Embeddings via Extended Robust Principal Component Analysis
Authors	Haoyue Shi, Yuqi Sun, Junfeng Hu
Abstract	Unsupervised learned representations of polysemous words generate a large number of pseudo multi senses, since unsupervised methods are overly sensitive to contextual variations. In this paper, we address pseudo multi-sense detection for word embeddings by dimensionality reduction of sense pairs. We propose a novel principal analysis method, termed Ex-RPCA, designed to detect both pseudo multi senses and real multi senses. With Ex-RPCA, we empirically show that pseudo multi senses are generated systematically by unsupervised methods. Moreover, multi-sense word embeddings can be improved by a simple linear transformation based on Ex-RPCA. Our improved word embeddings outperform the original ones by 5.6 points on the Stanford contextual word similarity (SCWS) dataset. We hope our simple yet effective approach will help the linguistic analysis of multi-sense word embeddings in the future.
Tasks	Dimensionality Reduction, Word Embeddings
Published	2018-03-03
URL	http://arxiv.org/abs/1803.01255v1
PDF	http://arxiv.org/pdf/1803.01255v1.pdf
PWC	https://paperswithcode.com/paper/understanding-and-improving-multi-sense-word
Repo
Framework