Paper Group ANR 812
Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion
Title | Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion |
Authors | Christopher Syben, Bernhard Stimpel, Jonathan Lommen, Tobias Würfl, Arnd Dörfler, Andreas Maier |
Abstract | In this paper, we derive a neural network architecture based on an analytical formulation of the parallel-to-fan beam conversion problem, following the concept of precision learning. The network makes it possible to learn the unknown operators in this conversion in a data-driven manner, avoiding interpolation and potential loss of resolution. Integration of known operators results in a small number of trainable parameters that can be estimated from synthetic data only. The concept is evaluated in the context of hybrid MRI/X-ray imaging, where the transformation of parallel-beam MRI projections to fan-beam X-ray projections is required. The proposed method is compared to a traditional rebinning method. The results demonstrate that the proposed method is superior to ray-by-ray interpolation and is able to deliver sharper images using the same number of parallel-beam input projections, which is crucial for interventional applications. We believe that this approach forms a basis for further work uniting deep learning, signal processing, physics, and traditional pattern recognition. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03057v2 |
http://arxiv.org/pdf/1807.03057v2.pdf | |
PWC | https://paperswithcode.com/paper/deriving-neural-network-architectures-using |
Repo | |
Framework | |
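
A minimal sketch of the precision-learning idea described above: analytically known operators are frozen, non-trainable layers, and only the unknown operator carries trainable weights, so very few parameters must be learned from synthetic data. The operator, shapes, and training data below are illustrative placeholders, not the paper's actual parallel-to-fan-beam pipeline.

```python
import torch
import torch.nn as nn

class KnownOperatorNet(nn.Module):
    def __init__(self, n, known_operator):
        super().__init__()
        # Known operator from the analytical model: fixed, never trained.
        self.register_buffer("A", known_operator)
        # Unknown operator: the only trainable parameters in the network.
        self.W = nn.Parameter(torch.eye(n))

    def forward(self, x):
        # y = A (W x): apply the learned operator, then the known one.
        return (x @ self.W.T) @ self.A.T

n = 64
A = torch.randn(n, n)            # stand-in for a known analytic operator
net = KnownOperatorNet(n, A)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Trained on synthetic data only, as in the paper's setting.
x = torch.randn(128, n)
target_op = torch.randn(n, n)    # hypothetical ground-truth unknown operator
y = x @ (A @ target_op).T
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
print(loss.item())
```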
Learning to Infer the Depth Map of a Hand from its Color Image
Title | Learning to Infer the Depth Map of a Hand from its Color Image |
Authors | Vassilis C. Nicodemou, Iason Oikonomidis, Georgios Tzimiropoulos, Antonis Argyros |
Abstract | We propose the first approach to the problem of inferring the depth map of a human hand based on a single RGB image. We achieve this with a Convolutional Neural Network (CNN) that employs a stacked hourglass model as its main building block. Intermediate supervision is used on several outputs of the proposed architecture in a staged approach. To aid the process of training and inference, hand segmentation masks are also estimated in such an intermediate supervision step, and used to guide the subsequent depth estimation process. In order to train and evaluate the proposed method, we compile and make publicly available HandRGBD, a new dataset of 20,601 views of hands, each consisting of an RGB image and an aligned depth map. Based on HandRGBD, we explore variants of the proposed approach in an ablative study and determine the best performing one. The results of an extensive experimental evaluation demonstrate that hand depth estimation from a single RGB frame can be achieved with an accuracy of 22 mm, which is comparable to the accuracy achieved by contemporary low-cost depth cameras. Such a 3D reconstruction of hands based on RGB information is valuable as a final result in its own right, but also as an input to several other hand analysis and perception algorithms that require depth input. Essentially, in such a context, the proposed approach bridges the gap between RGB and RGBD, making all existing RGBD-based methods applicable to RGB input. |
Tasks | 3D Reconstruction, Depth Estimation, Hand Segmentation |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02486v1 |
http://arxiv.org/pdf/1812.02486v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-infer-the-depth-map-of-a-hand |
Repo | |
Framework | |
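
A hedged sketch of the staged intermediate supervision described in the abstract: each hourglass stage emits a segmentation mask and a depth map, the mask guides the depth loss, and the per-stage losses are summed. The loss weighting, masking scheme, and toy shapes are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def staged_loss(stage_outputs, gt_mask, gt_depth, mask_weight=0.5):
    """stage_outputs: list of (mask_logits, depth_pred) per hourglass stage."""
    total = 0.0
    for mask_logits, depth_pred in stage_outputs:
        mask_loss = F.binary_cross_entropy_with_logits(mask_logits, gt_mask)
        # Supervise depth only inside the hand region, guided by the GT mask.
        depth_loss = F.l1_loss(depth_pred * gt_mask, gt_depth * gt_mask)
        total = total + depth_loss + mask_weight * mask_loss
    return total

# Toy example with two stages on 64x64 maps.
gt_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
gt_depth = torch.rand(1, 1, 64, 64)
stages = [(torch.randn(1, 1, 64, 64), torch.rand(1, 1, 64, 64)) for _ in range(2)]
print(staged_loss(stages, gt_mask, gt_depth))
```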
Towards improved lossy image compression: Human image reconstruction with public-domain images
Title | Towards improved lossy image compression: Human image reconstruction with public-domain images |
Authors | Ashutosh Bhown, Soham Mukherjee, Sean Yang, Shubham Chandak, Irena Fischer-Hwang, Kedar Tatwawadi, Judith Fan, Tsachy Weissman |
Abstract | Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, compression at low bitrates generally produces unsatisfying results. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. Here, we present a paradigm for eliciting human image reconstruction in order to perform lossy image compression. In this paradigm, one human describes images to a second human, whose task is to reconstruct the target image using publicly available images and text instructions. The resulting reconstructions are then evaluated by human raters on the Amazon Mechanical Turk platform and compared to reconstructions obtained using the state-of-the-art WebP compressor. Our results suggest that prioritizing semantic visual elements may be key to achieving significant improvements in image compression, and that our paradigm can be used to develop a more human-centric loss function. The images, results, and additional data are available at https://compression.stanford.edu/human-compression |
Tasks | Image Compression, Image Reconstruction |
Published | 2018-10-25 |
URL | https://arxiv.org/abs/1810.11137v3 |
https://arxiv.org/pdf/1810.11137v3.pdf | |
PWC | https://paperswithcode.com/paper/humans-are-still-the-best-lossy-image |
Repo | |
Framework | |
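
The paper's machine baseline is WebP at very low bitrates. A minimal sketch of producing such a baseline with Pillow; the synthetic input, file name, and quality setting are illustrative, not the study's configuration.

```python
import os
import numpy as np
from PIL import Image

# A synthetic photo stand-in; replace with a real image to reproduce the setup.
rng = np.random.default_rng(0)
img = Image.fromarray(rng.integers(0, 256, (512, 512, 3), dtype=np.uint8))

img.save("baseline.webp", "WEBP", quality=5)   # aggressively lossy setting
bits = os.path.getsize("baseline.webp") * 8
print(f"{bits / (img.width * img.height):.3f} bits per pixel")
```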
Fingertip Detection and Tracking for Recognition of Air-Writing in Videos
Title | Fingertip Detection and Tracking for Recognition of Air-Writing in Videos |
Authors | Sohom Mukherjee, Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy |
Abstract | Air-writing is the process of writing characters or words in free space using finger or hand movements, without the aid of any hand-held device. In this work, we address the problem of mid-air finger writing using webcam video as input. In spite of recent advances in object detection and tracking, accurate and robust detection and tracking of the fingertip remain challenging, primarily due to the small size of the fingertip. Moreover, the initialization and termination of mid-air finger writing are also challenging due to the absence of any standard delimiting criterion. To solve these problems, we propose a new writing-hand pose detection algorithm for the initialization of air-writing, which uses the Faster R-CNN framework for accurate hand detection, followed by hand segmentation and, finally, counting the number of raised fingers based on geometrical properties of the hand. Further, we propose a robust fingertip detection and tracking approach using a new signature function called distance-weighted curvature entropy. Finally, a fingertip velocity-based termination criterion is used as a delimiter to mark the completion of the air-writing gesture. Experiments show the superiority of the proposed fingertip detection and tracking algorithm over state-of-the-art approaches, giving a mean precision of 73.1% while achieving real-time performance at 18.5 fps, a condition which is of vital importance to air-writing. Character recognition experiments give a mean accuracy of 96.11% using the proposed air-writing system, a result which is comparable to that of existing handwritten character recognition systems. |
Tasks | Hand Segmentation, Object Detection |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.03016v1 |
http://arxiv.org/pdf/1809.03016v1.pdf | |
PWC | https://paperswithcode.com/paper/fingertip-detection-and-tracking-for |
Repo | |
Framework | |
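
A hedged sketch of a fingertip-velocity termination criterion like the delimiter described above: when the tracked fingertip's speed stays below a threshold for several consecutive frames, the gesture is marked complete. The thresholds and hold duration are assumptions, not the paper's tuned values.

```python
import numpy as np

def gesture_ended(trajectory, fps=18.5, speed_thresh=25.0, hold_frames=5):
    """trajectory: (N, 2) array of fingertip pixel positions, one per frame."""
    traj = np.asarray(trajectory, dtype=float)
    if len(traj) <= hold_frames:
        return False
    # Per-frame speed in pixels/second over the last `hold_frames` steps.
    deltas = np.diff(traj[-(hold_frames + 1):], axis=0)
    speeds = np.linalg.norm(deltas, axis=1) * fps
    return bool(np.all(speeds < speed_thresh))

# A fingertip that moves, then stalls: the delimiter fires.
track = [(100 + i, 200) for i in range(30)] + [(130, 200)] * 6
print(gesture_ended(track))  # True
```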
Detecting Adversarial Examples in Convolutional Neural Networks
Title | Detecting Adversarial Examples in Convolutional Neural Networks |
Authors | Stefanos Pertigkiozoglou, Petros Maragos |
Abstract | The great success of convolutional neural networks has caused a massive spread of such models across a large variety of Computer Vision applications. However, these models are vulnerable to certain inputs, adversarial examples, which, although not easily perceived by humans, can lead a neural network to produce faulty results. This paper focuses on the detection of adversarial examples that are created for convolutional neural networks performing image classification. We propose three methods for detecting possible adversarial examples and, after analyzing and comparing their performance, we combine their best aspects to develop an even more robust approach. The first proposed method is based on the regularization of the feature vector that the neural network produces as output. The second method detects adversarial examples by using histograms created from the outputs of the hidden layers of the neural network. These histograms form a feature vector which is used as the input of an SVM classifier that classifies the original input either as adversarial or as real. Finally, for the third method we introduce the concept of the residual image, which contains information about the parts of the input pattern that are ignored by the neural network. This method aims at the detection of possible adversarial examples by using the residual image and reinforcing the parts of the input pattern that the neural network ignores. Each of these methods introduces some novelty, and combining them further improves the detection results. For the proposed methods and their combination, we present the results of detecting adversarial examples on the MNIST dataset. The combination of the proposed methods offers some improvements over similar state-of-the-art approaches. |
Tasks | Image Classification |
Published | 2018-12-08 |
URL | http://arxiv.org/abs/1812.03303v1 |
http://arxiv.org/pdf/1812.03303v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-adversarial-examples-in |
Repo | |
Framework | |
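
A hedged sketch of the second detection method: histograms of hidden-layer activations form a feature vector for an SVM that separates real from adversarial inputs. The toy activations, bin count, and value range are assumptions standing in for a real network's hidden layers.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_features(activations, bins=20, value_range=(-5.0, 5.0)):
    """activations: list of per-layer activation arrays for one input."""
    feats = [np.histogram(a.ravel(), bins=bins, range=value_range, density=True)[0]
             for a in activations]
    return np.concatenate(feats)

# Toy stand-ins: "clean" activations vs. slightly shifted "adversarial" ones.
rng = np.random.default_rng(0)
clean = [histogram_features([rng.normal(0.0, 1.0, 500)]) for _ in range(100)]
adv = [histogram_features([rng.normal(0.8, 1.3, 500)]) for _ in range(100)]

X = np.vstack(clean + adv)
y = np.array([0] * 100 + [1] * 100)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy of the detector
```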
Power Networks: A Novel Neural Architecture to Predict Power Relations
Title | Power Networks: A Novel Neural Architecture to Predict Power Relations |
Authors | Michelle Lam, Catherina Xu, Angela Kong, Vinodkumar Prabhakaran |
Abstract | Can language analysis reveal the underlying social power relations that exist between participants of an interaction? Prior work within NLP has shown promise in this area, but the performance of automatically predicting power relations using NLP analysis of social interactions remains wanting. In this paper, we present a novel neural architecture that captures manifestations of power within individual emails, which are then aggregated in an order-preserving way to infer the direction of power between pairs of participants in an email thread. We obtain an accuracy of 80.4%, a 10.1% improvement over state-of-the-art methods, on this task. We further apply our model to the task of predicting power relations between individuals based on the entire set of messages exchanged between them; here also, our model significantly outperforms the 70.0% accuracy obtained using prior state-of-the-art techniques, achieving an accuracy of 83.0%. |
Tasks | |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06557v1 |
http://arxiv.org/pdf/1807.06557v1.pdf | |
PWC | https://paperswithcode.com/paper/power-networks-a-novel-neural-architecture-to |
Repo | |
Framework | |
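
A hedged sketch of the architecture's core idea: encode each email as a vector, aggregate the thread in an order-preserving way with an LSTM, and classify the direction of power between the pair. The email encoder, dimensions, and classifier head are placeholders, not the paper's exact components.

```python
import torch
import torch.nn as nn

class PowerNet(nn.Module):
    def __init__(self, email_dim=64, hidden=32):
        super().__init__()
        # Order-preserving aggregation over the sequence of emails.
        self.lstm = nn.LSTM(email_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)  # who holds power: A or B

    def forward(self, email_vectors):
        # email_vectors: (batch, n_emails, email_dim), one vector per email
        _, (h, _) = self.lstm(email_vectors)
        return self.classifier(h[-1])

net = PowerNet()
threads = torch.randn(4, 10, 64)  # 4 threads of 10 pre-encoded emails each
print(net(threads).shape)         # (4, 2) logits over the power direction
```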
Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition
Title | Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition |
Authors | Feng Liu, Ronghang Zhu, Dan Zeng, Qijun Zhao, Xiaoming Liu |
Abstract | This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously. Unlike existing 3D face reconstruction methods, our proposed method directly regresses dense 3D face shapes from single 2D images, and tackles identity and residual (i.e., non-identity) components in 3D face shapes explicitly and separately, based on a composite 3D face shape model with latent representations. We devise a training process for the proposed network with a joint loss measuring both face identification error and 3D face shape reconstruction error. To construct training data, we develop a method for fitting a 3D morphable model (3DMM) to multiple 2D images of a subject. Comprehensive experiments have been conducted on the MICC, BU3DFE, LFW and YTF databases. The results show that our method expands the capacity of 3DMM for capturing discriminative shape features and facial detail, and thus outperforms existing methods both in 3D face reconstruction accuracy and in face recognition accuracy. |
Tasks | 3D Face Reconstruction, Face Identification, Face Recognition, Face Reconstruction |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11366v1 |
http://arxiv.org/pdf/1803.11366v1.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-features-in-3d-face-shapes-for |
Repo | |
Framework | |
Hard-Aware Point-to-Set Deep Metric for Person Re-identification
Title | Hard-Aware Point-to-Set Deep Metric for Person Re-identification |
Authors | Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, Xiang Bai |
Abstract | Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion. Deep metric learning provides a satisfactory solution to person re-ID by training a deep network under the supervision of a metric loss, e.g., the triplet loss. However, the performance of deep metric learning is greatly limited by traditional sampling methods. To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S) loss with a soft hard-mining scheme. Based on the point-to-set triplet loss framework, the HAP2S loss adaptively assigns greater weights to harder samples. Several advantageous properties are observed when compared with other state-of-the-art loss functions: 1) Accuracy: HAP2S loss consistently achieves higher re-ID accuracies than other alternatives on three large-scale benchmark datasets; 2) Robustness: HAP2S loss is more robust to outliers than other losses; 3) Flexibility: HAP2S loss does not rely on a specific weight function, i.e., different instantiations of HAP2S loss are equally effective; 4) Generality: In addition to person re-ID, we apply the proposed method to generic deep metric learning benchmarks including CUB-200-2011 and Cars196, and also achieve state-of-the-art results. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11206v1 |
http://arxiv.org/pdf/1807.11206v1.pdf | |
PWC | https://paperswithcode.com/paper/hard-aware-point-to-set-deep-metric-for |
Repo | |
Framework | |
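
A hedged sketch of the hard-aware point-to-set idea: distances from an anchor to its positive and negative sets are aggregated with soft weights that emphasize hard samples (far positives, near negatives). The exponential weighting and margin below are one plausible instantiation; as the abstract notes, the loss does not rely on a specific weight function.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hap2s_loss(d_pos, d_neg, margin=1.0, sigma=1.0):
    """d_pos/d_neg: distances from the anchor to positive/negative samples."""
    d_pos, d_neg = np.asarray(d_pos, float), np.asarray(d_neg, float)
    # Harder positives (larger distance) receive larger weights...
    pos = softmax(d_pos / sigma) @ d_pos
    # ...and harder negatives (smaller distance) likewise.
    neg = softmax(-d_neg / sigma) @ d_neg
    return max(0.0, margin + pos - neg)

print(hap2s_loss([0.5, 2.0], [1.0, 3.0]))  # soft-weighted triplet-style loss
```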
Norms, Institutions, and Robots
Title | Norms, Institutions, and Robots |
Authors | Stevan Tomic, Federico Pecora, Alessandro Saffiotti |
Abstract | Interactions within human societies are usually regulated by social norms. If robots are to be accepted into human society, it is essential that they are aware of and capable of reasoning about social norms. In this paper, we focus on how to represent social norms in societies with humans and robots, and how artificial agents such as robots can reason about social norms in order to plan appropriate behavior. We use the notion of institution as a way to formally define and encapsulate norms. We provide a formal framework built around the notion of institution. The framework distinguishes between abstract norms and their semantics in a concrete domain, hence allowing the use of the same institution across physical domains and agent types. It also provides a formal computational framework for norm verification, planning, and plan execution in a domain. |
Tasks | |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11456v1 |
http://arxiv.org/pdf/1807.11456v1.pdf | |
PWC | https://paperswithcode.com/paper/norms-institutions-and-robots |
Repo | |
Framework | |
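
A minimal sketch of the abstract-norm/grounding separation the framework proposes: an institution states norms over abstract roles, and a domain-specific grounding maps those roles to concrete agents, so the same institution can be verified across physical domains and agent types. All names and the verification logic are illustrative, not the paper's formal notation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Norm:
    description: str
    holds: Callable[[dict], bool]  # predicate over an abstract state

@dataclass
class Institution:
    norms: List[Norm]
    def verify(self, abstract_state: dict) -> List[str]:
        # Return the descriptions of all violated norms.
        return [n.description for n in self.norms if not n.holds(abstract_state)]

# Abstract norm: the "visitor" role must not enter the "restricted" zone.
inst = Institution(norms=[
    Norm("visitor must stay out of restricted zone",
         lambda s: s["zone"]["visitor"] != "restricted"),
])

# Grounding: in this domain the visitor role happens to be a robot.
grounding: Dict[str, str] = {"visitor": "robot_1"}
concrete_state = {"robot_1": "restricted"}
abstract_state = {"zone": {"visitor": concrete_state[grounding["visitor"]]}}
print(inst.verify(abstract_state))  # lists violated norms, if any
```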
What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text
Title | What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text |
Authors | Yosi Mass, Lili Kotlerman, Shachar Mirkin, Elad Venezian, Gera Witzling, Noam Slonim |
Abstract | We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities and other types of entities, annotated on different types of text ranging from clean Wikipedia text to noisy spoken data. The benchmark was built through a highly controlled crowdsourcing process to ensure its quality. We describe the benchmark, the process, and the guidelines that were used to build it. We then demonstrate the results of a state-of-the-art system running on that benchmark. |
Tasks | |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07507v3 |
http://arxiv.org/pdf/1801.07507v3.pdf | |
PWC | https://paperswithcode.com/paper/what-did-you-mention-a-large-scale-mention |
Repo | |
Framework | |
Rotation Invariant Descriptors for Galaxy Morphological Classification
Title | Rotation Invariant Descriptors for Galaxy Morphological Classification |
Authors | Hubert Cecotti |
Abstract | The detection of objects that are multi-oriented is a difficult pattern recognition problem. In this paper, we evaluate the performance of different families of descriptors for the classification of galaxy morphologies. We investigate the performance of Hu moments, Flusser moments, Zernike moments, Fourier-Mellin moments, and ring projection techniques based on 1D moments and the Fourier transform. We consider two main datasets for the performance evaluation. The first is an artificial dataset based on representative templates from 11 types of galaxies, evaluated under different transformations (noise, smoothing), alone or combined. The evaluation is based on image retrieval performance to estimate the robustness of the rotation invariant descriptors on this type of image. The second dataset is composed of real images extracted from the Galaxy Zoo 2 project. The binary classification of elliptical and spiral galaxies is achieved with pre-processing steps including morphological filtering and a Laplacian pyramid. For the binary classification, we compare the different sets of features with Support Vector Machines (SVM), Extreme Learning Machines, and different types of linear discriminant analysis techniques. The results support the conclusion that the proposed framework for the binary classification of elliptical and spiral galaxies achieves an area under the ROC curve reaching 99.54%, proving the robustness of the approach for helping astronomers study galaxies. |
Tasks | Image Retrieval |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04706v2 |
https://arxiv.org/pdf/1812.04706v2.pdf | |
PWC | https://paperswithcode.com/paper/rotation-invariant-descriptors-for-galaxy |
Repo | |
Framework | |
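
Hu moments, one of the descriptor families evaluated here, are rotation invariant by construction. A short sketch with OpenCV; the synthetic elliptical blob stands in for a galaxy cutout.

```python
import cv2
import numpy as np

img = np.zeros((128, 128), dtype=np.uint8)
cv2.ellipse(img, (64, 64), (40, 15), 30, 0, 360, 255, -1)  # elliptical blob

hu = cv2.HuMoments(cv2.moments(img)).flatten()
# Log-scaling is common because raw Hu moments span many orders of magnitude.
print(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))

# Rotating the image leaves the (log) Hu moments nearly unchanged.
M = cv2.getRotationMatrix2D((64, 64), 45, 1.0)
rot = cv2.warpAffine(img, M, (128, 128))
hu_rot = cv2.HuMoments(cv2.moments(rot)).flatten()
print(-np.sign(hu_rot) * np.log10(np.abs(hu_rot) + 1e-30))
```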
Effective Use of Synthetic Data for Urban Scene Semantic Segmentation
Title | Effective Use of Synthetic Data for Urban Scene Semantic Segmentation |
Authors | Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez |
Abstract | Training a deep network to perform semantic segmentation requires large amounts of labeled data. To alleviate the manual effort of annotating real images, researchers have investigated the use of synthetic data, which can be labeled automatically. Unfortunately, a network trained on synthetic data performs relatively poorly on real images. While this can be addressed by domain adaptation, existing methods all require having access to real images during training. In this paper, we introduce a drastically different way to handle synthetic images that does not require seeing any real images at training time. Our approach builds on the observation that foreground and background classes are not affected in the same manner by the domain shift, and thus should be treated differently. In particular, the former should be handled in a detection-based manner to better account for the fact that, while their texture in synthetic images is not photo-realistic, their shape looks natural. Our experiments evidence the effectiveness of our approach on Cityscapes and CamVid with models trained on synthetic data only. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.06132v1 |
http://arxiv.org/pdf/1807.06132v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-use-of-synthetic-data-for-urban |
Repo | |
Framework | |
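
A hedged sketch of treating foreground and background differently: per-pixel background labels come from a segmentation branch, while foreground objects are pasted in from detection-style instance masks, which the paper argues transfer better from synthetic data. The scores, masks, and class ids below are toy stand-ins.

```python
import numpy as np

H, W = 4, 6
BACKGROUND_CLASSES = 3  # e.g., road, building, sky

# Per-pixel background scores from the segmentation branch.
bg_scores = np.random.default_rng(0).random((BACKGROUND_CLASSES, H, W))
label_map = bg_scores.argmax(axis=0)

# Foreground instance from the detection branch: (class id, binary mask).
CAR_ID = 3
car_mask = np.zeros((H, W), dtype=bool)
car_mask[1:3, 2:5] = True

# Foreground overrides background wherever an instance mask fires.
label_map[car_mask] = CAR_ID
print(label_map)
```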
Learning Finite State Representations of Recurrent Policy Networks
Title | Learning Finite State Representations of Recurrent Policy Networks |
Authors | Anurag Koul, Sam Greydanus, Alan Fern |
Abstract | Recurrent neural networks (RNNs) are an effective representation of control policies for a wide range of reinforcement and imitation learning problems. RNN policies, however, are particularly difficult to explain, understand, and analyze due to their use of continuous-valued memory vectors and observation features. In this paper, we introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. The result is a quantized representation of the RNN that can be analyzed to improve our understanding of memory use and general behavior. We present results of this approach on synthetic environments and six Atari games. The resulting finite representations are surprisingly small in some cases, using as few as 3 discrete memory states and 10 observations for a perfect Pong policy. We also show that these finite policy representations lead to improved interpretability. |
Tasks | Atari Games, Imitation Learning |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12530v1 |
http://arxiv.org/pdf/1811.12530v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-finite-state-representations-of |
Repo | |
Framework | |
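
A hedged sketch of a quantized bottleneck: a tanh code is rounded to three levels, with a straight-through estimator so gradients flow through the non-differentiable rounding. Layer sizes and the autoencoder training setup are illustrative; the paper inserts such bottlenecks into trained recurrent policies to discretize their memory and observation vectors.

```python
import torch
import torch.nn as nn

class QuantizedBottleneck(nn.Module):
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encode = nn.Linear(in_dim, code_dim)
        self.decode = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        h = torch.tanh(self.encode(x))
        q = torch.round(h)  # 3-level code: -1, 0, or 1
        # Straight-through: forward uses q, backward uses the gradient of h.
        q = h + (q - h).detach()
        return self.decode(q), q

qbn = QuantizedBottleneck(in_dim=16, code_dim=4)
x = torch.randn(8, 16)
recon, code = qbn(x)
loss = nn.functional.mse_loss(recon, x)  # train as an autoencoder
loss.backward()
print(code[0])  # discrete memory code for one input
```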
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
Title | ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks |
Authors | Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann |
Abstract | In this paper, we introduce an approach to training a given compact network. To this end, we leverage over-parameterization, which typically improves both optimization and generalization in neural network training, while being unnecessary at inference time. We propose to expand each linear layer, both fully-connected and convolutional, of the compact network into multiple linear layers, without adding any nonlinearity. As such, the resulting expanded network can benefit from over-parameterization during training but can be compressed back to the compact one algebraically at inference. We introduce several expansion strategies, together with an initialization scheme, and demonstrate the benefits of our ExpandNets on several tasks, including image classification, object detection, and semantic segmentation. As evidenced by our experiments, our approach outperforms both training the compact network from scratch and performing knowledge distillation from a teacher. |
Tasks | Image Classification, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10495v4 |
https://arxiv.org/pdf/1811.10495v4.pdf | |
PWC | https://paperswithcode.com/paper/expandnets-exploiting-linear-redundancy-to |
Repo | |
Framework | |
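
Because the expansion adds no nonlinearity, it can be collapsed exactly at inference time. The sketch below expands one fully-connected layer into two and verifies the algebraic compression W = W2 W1, b = W2 b1 + b2; it is a toy instance of the idea, not the paper's full convolutional expansion scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 32, 4  # over-parameterized: d_hidden > d_in

# Expanded network: two linear layers with no nonlinearity in between.
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

# Compact network recovered algebraically for inference.
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=d_in)
expanded = W2 @ (W1 @ x + b1) + b2
compact = W @ x + b
print(np.allclose(expanded, compact))  # True: identical function
```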
Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method
Title | Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method |
Authors | Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová |
Abstract | Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes-optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in a few specific cases by a variety of methods. Here, we use the spatial coupling methodology developed in the framework of error-correcting codes to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems, where we show that there exists a gap between what currently known polynomial algorithms (in particular, spectral methods and approximate message-passing) can do and what is expected information-theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available. |
Tasks | Community Detection, Compressive Sensing |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02537v1 |
http://arxiv.org/pdf/1812.02537v1.pdf | |
PWC | https://paperswithcode.com/paper/rank-one-matrix-estimation-analysis-of |
Repo | |
Framework | |
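
A hedged sketch of the symmetric rank-one (spiked Wigner) model analyzed here, illustrating the spectral side of the abstract's algorithmic story: the top eigenvector starts correlating with the planted spike only above the transition at lambda = 1. The Rademacher prior and problem size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.choice([-1.0, 1.0], size=n)  # Rademacher spike

for lam in [0.5, 1.5, 4.0]:
    G = rng.normal(size=(n, n))
    W = (G + G.T) / np.sqrt(2)                     # symmetric Gaussian noise
    Y = np.sqrt(lam / n) * np.outer(x, x) + W
    _, vecs = np.linalg.eigh(Y / np.sqrt(n))       # noise spectrum in [-2, 2]
    v = vecs[:, -1]                                # top eigenvector
    overlap = abs(v @ x) / np.sqrt(n)
    print(f"lambda={lam}: overlap = {overlap:.2f}")  # ~0 below lambda = 1
```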