Paper Group ANR 812
Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion
Title | Deriving Neural Network Architectures using Precision Learning: Parallel-to-fan beam Conversion |
Authors | Christopher Syben, Bernhard Stimpel, Jonathan Lommen, Tobias Würfl, Arnd Dörfler, Andreas Maier |
Abstract | In this paper, we derive a neural network architecture based on an analytical formulation of the parallel-to-fan beam conversion problem, following the concept of precision learning. The network makes it possible to learn the unknown operators in this conversion in a data-driven manner, avoiding interpolation and potential loss of resolution. Integration of known operators results in a small number of trainable parameters that can be estimated from synthetic data only. The concept is evaluated in the context of hybrid MRI/X-ray imaging, where the transformation of parallel-beam MRI projections to fan-beam X-ray projections is required. The proposed method is compared to a traditional rebinning method. The results demonstrate that the proposed method is superior to ray-by-ray interpolation and is able to deliver sharper images using the same number of parallel-beam input projections, which is crucial for interventional applications. We believe that this approach forms a basis for further work uniting deep learning, signal processing, physics, and traditional pattern recognition. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03057v2 |
http://arxiv.org/pdf/1807.03057v2.pdf | |
PWC | https://paperswithcode.com/paper/deriving-neural-network-architectures-using |
Repo | |
Framework | |
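
A minimal sketch of the precision-learning idea described above: analytically known operators are frozen, non-trainable layers, and only the unknown operator carries trainable weights, so very few parameters must be learned from synthetic data. The operator, shapes, and training data below are illustrative placeholders, not the paper's actual parallel-to-fan-beam pipeline.

```python
import torch
import torch.nn as nn

class KnownOperatorNet(nn.Module):
    def __init__(self, n, known_operator):
        super().__init__()
        # Known operator from the analytical model: fixed, never trained.
        self.register_buffer("A", known_operator)
        # Unknown operator: the only trainable parameters in the network.
        self.W = nn.Parameter(torch.eye(n))

    def forward(self, x):
        # y = A (W x): apply the learned operator, then the known one.
        return (x @ self.W.T) @ self.A.T

n = 64
A = torch.randn(n, n)            # stand-in for a known analytic operator
net = KnownOperatorNet(n, A)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Trained on synthetic data only, as in the paper's setting.
x = torch.randn(128, n)
target_op = torch.randn(n, n)    # hypothetical ground-truth unknown operator
y = x @ (A @ target_op).T
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
print(loss.item())
```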
Learning to Infer the Depth Map of a Hand from its Color Image
Title | Learning to Infer the Depth Map of a Hand from its Color Image |
Authors | Vassilis C. Nicodemou, Iason Oikonomidis, Georgios Tzimiropoulos, Antonis Argyros |
Abstract | We propose the first approach to the problem of inferring the depth map of a human hand based on a single RGB image. We achieve this with a Convolutional Neural Network (CNN) that employs a stacked hourglass model as its main building block. Intermediate supervision is used on several outputs of the proposed architecture in a staged approach. To aid the process of training and inference, hand segmentation masks are also estimated in such an intermediate supervision step, and used to guide the subsequent depth estimation process. In order to train and evaluate the proposed method, we compile and make publicly available HandRGBD, a new dataset of 20,601 views of hands, each consisting of an RGB image and an aligned depth map. Based on HandRGBD, we explore variants of the proposed approach in an ablative study and determine the best performing one. The results of an extensive experimental evaluation demonstrate that hand depth estimation from a single RGB frame can be achieved with an accuracy of 22 mm, which is comparable to the accuracy achieved by contemporary low-cost depth cameras. Such a 3D reconstruction of hands based on RGB information is valuable as a final result in its own right, but also as an input to several other hand analysis and perception algorithms that require depth input. Essentially, in such a context, the proposed approach bridges the gap between RGB and RGBD, making all existing RGBD-based methods applicable to RGB input. |
Tasks | 3D Reconstruction, Depth Estimation, Hand Segmentation |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02486v1 |
http://arxiv.org/pdf/1812.02486v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-infer-the-depth-map-of-a-hand |
Repo | |
Framework | |
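
A hedged sketch of the staged intermediate supervision described in the abstract: each hourglass stage emits a segmentation mask and a depth map, the mask guides the depth loss, and the per-stage losses are summed. The loss weighting, masking scheme, and toy shapes are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def staged_loss(stage_outputs, gt_mask, gt_depth, mask_weight=0.5):
    """stage_outputs: list of (mask_logits, depth_pred) per hourglass stage."""
    total = 0.0
    for mask_logits, depth_pred in stage_outputs:
        mask_loss = F.binary_cross_entropy_with_logits(mask_logits, gt_mask)
        # Supervise depth only inside the hand region, guided by the GT mask.
        depth_loss = F.l1_loss(depth_pred * gt_mask, gt_depth * gt_mask)
        total = total + depth_loss + mask_weight * mask_loss
    return total

# Toy example with two stages on 64x64 maps.
gt_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
gt_depth = torch.rand(1, 1, 64, 64)
stages = [(torch.randn(1, 1, 64, 64), torch.rand(1, 1, 64, 64)) for _ in range(2)]
print(staged_loss(stages, gt_mask, gt_depth))
```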
Towards improved lossy image compression: Human image reconstruction with public-domain images
Title | Towards improved lossy image compression: Human image reconstruction with public-domain images |
Authors | Ashutosh Bhown, Soham Mukherjee, Sean Yang, Shubham Chandak, Irena Fischer-Hwang, Kedar Tatwawadi, Judith Fan, Tsachy Weissman |
Abstract | Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, compression at low bitrates generally produces unsatisfying results. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. Here, we present a paradigm for eliciting human image reconstruction in order to perform lossy image compression. In this paradigm, one human describes images to a second human, whose task is to reconstruct the target image using publicly available images and text instructions. The resulting reconstructions are then evaluated by human raters on the Amazon Mechanical Turk platform and compared to reconstructions obtained using the state-of-the-art WebP compressor. Our results suggest that prioritizing semantic visual elements may be key to achieving significant improvements in image compression, and that our paradigm can be used to develop a more human-centric loss function. The images, results, and additional data are available at https://compression.stanford.edu/human-compression |
Tasks | Image Compression, Image Reconstruction |
Published | 2018-10-25 |
URL | https://arxiv.org/abs/1810.11137v3 |
https://arxiv.org/pdf/1810.11137v3.pdf | |
PWC | https://paperswithcode.com/paper/humans-are-still-the-best-lossy-image |
Repo | |
Framework | |
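
The paper's machine baseline is WebP at very low bitrates. A minimal sketch of producing such a baseline with Pillow; the synthetic input, file name, and quality setting are illustrative, not the study's configuration.

```python
import os
import numpy as np
from PIL import Image

# A synthetic photo stand-in; replace with a real image to reproduce the setup.
rng = np.random.default_rng(0)
img = Image.fromarray(rng.integers(0, 256, (512, 512, 3), dtype=np.uint8))

img.save("baseline.webp", "WEBP", quality=5)   # aggressively lossy setting
bits = os.path.getsize("baseline.webp") * 8
print(f"{bits / (img.width * img.height):.3f} bits per pixel")
```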
Fingertip Detection and Tracking for Recognition of Air-Writing in Videos
Title | Fingertip Detection and Tracking for Recognition of Air-Writing in Videos |
Authors | Sohom Mukherjee, Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy |
Abstract | Air-writing is the process of writing characters or words in free space using finger or hand movements, without the aid of any hand-held device. In this work, we address the problem of mid-air finger writing using webcam video as input. In spite of recent advances in object detection and tracking, accurate and robust detection and tracking of the fingertip remain challenging, primarily due to the small size of the fingertip. Moreover, the initialization and termination of mid-air finger writing are also challenging due to the absence of any standard delimiting criterion. To solve these problems, we propose a new writing-hand pose detection algorithm for the initialization of air-writing, which uses the Faster R-CNN framework for accurate hand detection, followed by hand segmentation and, finally, counting the number of raised fingers based on geometrical properties of the hand. Further, we propose a robust fingertip detection and tracking approach using a new signature function called distance-weighted curvature entropy. Finally, a fingertip velocity-based termination criterion is used as a delimiter to mark the completion of the air-writing gesture. Experiments show the superiority of the proposed fingertip detection and tracking algorithm over state-of-the-art approaches, giving a mean precision of 73.1% while achieving real-time performance at 18.5 fps, a condition which is of vital importance to air-writing. Character recognition experiments give a mean accuracy of 96.11% using the proposed air-writing system, a result which is comparable to that of existing handwritten character recognition systems. |
Tasks | Hand Segmentation, Object Detection |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.03016v1 |
http://arxiv.org/pdf/1809.03016v1.pdf | |
PWC | https://paperswithcode.com/paper/fingertip-detection-and-tracking-for |
Repo | |
Framework | |
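
A hedged sketch of a fingertip-velocity termination criterion like the delimiter described above: when the tracked fingertip's speed stays below a threshold for several consecutive frames, the gesture is marked complete. The thresholds and hold duration are assumptions, not the paper's tuned values.

```python
import numpy as np

def gesture_ended(trajectory, fps=18.5, speed_thresh=25.0, hold_frames=5):
    """trajectory: (N, 2) array of fingertip pixel positions, one per frame."""
    traj = np.asarray(trajectory, dtype=float)
    if len(traj) <= hold_frames:
        return False
    # Per-frame speed in pixels/second over the last `hold_frames` steps.
    deltas = np.diff(traj[-(hold_frames + 1):], axis=0)
    speeds = np.linalg.norm(deltas, axis=1) * fps
    return bool(np.all(speeds < speed_thresh))

# A fingertip that moves, then stalls: the delimiter fires.
track = [(100 + i, 200) for i in range(30)] + [(130, 200)] * 6
print(gesture_ended(track))  # True
```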
Detecting Adversarial Examples in Convolutional Neural Networks
Title | Detecting Adversarial Examples in Convolutional Neural Networks |
Authors | Stefanos Pertigkiozoglou, Petros Maragos |
Abstract | The great success of convolutional neural networks has caused a massive spread of such models across a large variety of Computer Vision applications. However, these models are vulnerable to certain inputs, adversarial examples, which, although not easily perceived by humans, can lead a neural network to produce faulty results. This paper focuses on the detection of adversarial examples that are created for convolutional neural networks performing image classification. We propose three methods for detecting possible adversarial examples and, after analyzing and comparing their performance, we combine their best aspects to develop an even more robust approach. The first proposed method is based on the regularization of the feature vector that the neural network produces as output. The second method detects adversarial examples by using histograms created from the outputs of the hidden layers of the neural network. These histograms form a feature vector which is used as the input of an SVM classifier that classifies the original input either as adversarial or as real. Finally, for the third method we introduce the concept of the residual image, which contains information about the parts of the input pattern that are ignored by the neural network. This method aims at the detection of possible adversarial examples by using the residual image and reinforcing the parts of the input pattern that the neural network ignores. Each of these methods introduces some novelty, and combining them further improves the detection results. For the proposed methods and their combination, we present the results of detecting adversarial examples on the MNIST dataset. The combination of the proposed methods offers some improvements over similar state-of-the-art approaches. |
Tasks | Image Classification |
Published | 2018-12-08 |
URL | http://arxiv.org/abs/1812.03303v1 |
http://arxiv.org/pdf/1812.03303v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-adversarial-examples-in |
Repo | |
Framework | |
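
A hedged sketch of the second detection method: histograms of hidden-layer activations form a feature vector for an SVM that separates real from adversarial inputs. The toy activations, bin count, and value range are assumptions standing in for a real network's hidden layers.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_features(activations, bins=20, value_range=(-5.0, 5.0)):
    """activations: list of per-layer activation arrays for one input."""
    feats = [np.histogram(a.ravel(), bins=bins, range=value_range, density=True)[0]
             for a in activations]
    return np.concatenate(feats)

# Toy stand-ins: "clean" activations vs. slightly shifted "adversarial" ones.
rng = np.random.default_rng(0)
clean = [histogram_features([rng.normal(0.0, 1.0, 500)]) for _ in range(100)]
adv = [histogram_features([rng.normal(0.8, 1.3, 500)]) for _ in range(100)]

X = np.vstack(clean + adv)
y = np.array([0] * 100 + [1] * 100)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy of the detector
```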
Power Networks: A Novel Neural Architecture to Predict Power Relations
Title | Power Networks: A Novel Neural Architecture to Predict Power Relations |
Authors | Michelle Lam, Catherina Xu, Angela Kong, Vinodkumar Prabhakaran |
Abstract | Can language analysis reveal the underlying social power relations that exist between participants of an interaction? Prior work within NLP has shown promise in this area, but the performance of automatically predicting power relations using NLP analysis of social interactions remains wanting. In this paper, we present a novel neural architecture that captures manifestations of power within individual emails, which are then aggregated in an order-preserving way to infer the direction of power between pairs of participants in an email thread. We obtain an accuracy of 80.4%, a 10.1% improvement over state-of-the-art methods, on this task. We further apply our model to the task of predicting power relations between individuals based on the entire set of messages exchanged between them; here also, our model significantly outperforms the 70.0% accuracy obtained using prior state-of-the-art techniques, achieving an accuracy of 83.0%. |
Tasks | |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06557v1 |
http://arxiv.org/pdf/1807.06557v1.pdf | |
PWC | https://paperswithcode.com/paper/power-networks-a-novel-neural-architecture-to |
Repo | |
Framework | |
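
A hedged sketch of the architecture's core idea: encode each email as a vector, aggregate the thread in an order-preserving way with an LSTM, and classify the direction of power between the pair. The email encoder, dimensions, and classifier head are placeholders, not the paper's exact components.

```python
import torch
import torch.nn as nn

class PowerNet(nn.Module):
    def __init__(self, email_dim=64, hidden=32):
        super().__init__()
        # Order-preserving aggregation over the sequence of emails.
        self.lstm = nn.LSTM(email_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)  # who holds power: A or B

    def forward(self, email_vectors):
        # email_vectors: (batch, n_emails, email_dim), one vector per email
        _, (h, _) = self.lstm(email_vectors)
        return self.classifier(h[-1])

net = PowerNet()
threads = torch.randn(4, 10, 64)  # 4 threads of 10 pre-encoded emails each
print(net(threads).shape)         # (4, 2) logits over the power direction
```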
Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition
Title | Disentangling Features in 3D Face Shapes for Joint Face Reconstruction and Recognition |
Authors | Feng Liu, Ronghang Zhu, Dan Zeng, Qijun Zhao, Xiaoming Liu |
Abstract | This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously. Unlike existing 3D face reconstruction methods, our proposed method directly regresses dense 3D face shapes from single 2D images, and tackles identity and residual (i.e., non-identity) components in 3D face shapes explicitly and separately, based on a composite 3D face shape model with latent representations. We devise a training process for the proposed network with a joint loss measuring both face identification error and 3D face shape reconstruction error. To construct training data, we develop a method for fitting a 3D morphable model (3DMM) to multiple 2D images of a subject. Comprehensive experiments have been conducted on the MICC, BU3DFE, LFW and YTF databases. The results show that our method expands the capacity of 3DMM for capturing discriminative shape features and facial detail, and thus outperforms existing methods both in 3D face reconstruction accuracy and in face recognition accuracy. |
Tasks | 3D Face Reconstruction, Face Identification, Face Recognition, Face Reconstruction |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11366v1 |
http://arxiv.org/pdf/1803.11366v1.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-features-in-3d-face-shapes-for |
Repo | |
Framework | |
Hard-Aware Point-to-Set Deep Metric for Person Re-identification
Title | Hard-Aware Point-to-Set Deep Metric for Person Re-identification |
Authors | Rui Yu, Zhiyong Dou, Song Bai, Zhaoxiang Zhang, Yongchao Xu, Xiang Bai |
Abstract | Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion. Deep metric learning provides a satisfactory solution to person re-ID by training a deep network under the supervision of a metric loss, e.g., the triplet loss. However, the performance of deep metric learning is greatly limited by traditional sampling methods. To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S) loss with a soft hard-mining scheme. Based on the point-to-set triplet loss framework, the HAP2S loss adaptively assigns greater weights to harder samples. Several advantageous properties are observed when compared with other state-of-the-art loss functions: 1) Accuracy: HAP2S loss consistently achieves higher re-ID accuracies than other alternatives on three large-scale benchmark datasets; 2) Robustness: HAP2S loss is more robust to outliers than other losses; 3) Flexibility: HAP2S loss does not rely on a specific weight function, i.e., different instantiations of HAP2S loss are equally effective; 4) Generality: In addition to person re-ID, we apply the proposed method to generic deep metric learning benchmarks including CUB-200-2011 and Cars196, and also achieve state-of-the-art results. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11206v1 |
http://arxiv.org/pdf/1807.11206v1.pdf | |
PWC | https://paperswithcode.com/paper/hard-aware-point-to-set-deep-metric-for |
Repo | |
Framework | |
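
A hedged sketch of the hard-aware point-to-set idea: distances from an anchor to its positive and negative sets are aggregated with soft weights that emphasize hard samples (far positives, near negatives). The exponential weighting and margin below are one plausible instantiation; as the abstract notes, the loss does not rely on a specific weight function.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hap2s_loss(d_pos, d_neg, margin=1.0, sigma=1.0):
    """d_pos/d_neg: distances from the anchor to positive/negative samples."""
    d_pos, d_neg = np.asarray(d_pos, float), np.asarray(d_neg, float)
    # Harder positives (larger distance) receive larger weights...
    pos = softmax(d_pos / sigma) @ d_pos
    # ...and harder negatives (smaller distance) likewise.
    neg = softmax(-d_neg / sigma) @ d_neg
    return max(0.0, margin + pos - neg)

print(hap2s_loss([0.5, 2.0], [1.0, 3.0]))  # soft-weighted triplet-style loss
```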
Norms, Institutions, and Robots
Title | Norms, Institutions, and Robots |
Authors | Stevan Tomic, Federico Pecora, Alessandro Saffiotti |
Abstract | Interactions within human societies are usually regulated by social norms. If robots are to be accepted into human society, it is essential that they are aware of and capable of reasoning about social norms. In this paper, we focus on how to represent social norms in societies with humans and robots, and how artificial agents such as robots can reason about social norms in order to plan appropriate behavior. We use the notion of institution as a way to formally define and encapsulate norms. We provide a formal framework built around the notion of institution. The framework distinguishes between abstract norms and their semantics in a concrete domain, hence allowing the use of the same institution across physical domains and agent types. It also provides a formal computational framework for norm verification, planning, and plan execution in a domain. |
Tasks | |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11456v1 |
http://arxiv.org/pdf/1807.11456v1.pdf | |
PWC | https://paperswithcode.com/paper/norms-institutions-and-robots |
Repo | |
Framework | |
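
A minimal sketch of the abstract-norm/grounding separation the framework proposes: an institution states norms over abstract roles, and a domain-specific grounding maps those roles to concrete agents, so the same institution can be verified across physical domains and agent types. All names and the verification logic are illustrative, not the paper's formal notation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Norm:
    description: str
    holds: Callable[[dict], bool]  # predicate over an abstract state

@dataclass
class Institution:
    norms: List[Norm]
    def verify(self, abstract_state: dict) -> List[str]:
        # Return the descriptions of all violated norms.
        return [n.description for n in self.norms if not n.holds(abstract_state)]

# Abstract norm: the "visitor" role must not enter the "restricted" zone.
inst = Institution(norms=[
    Norm("visitor must stay out of restricted zone",
         lambda s: s["zone"]["visitor"] != "restricted"),
])

# Grounding: in this domain the visitor role happens to be a robot.
grounding: Dict[str, str] = {"visitor": "robot_1"}
concrete_state = {"robot_1": "restricted"}
abstract_state = {"zone": {"visitor": concrete_state[grounding["visitor"]]}}
print(inst.verify(abstract_state))  # lists violated norms, if any
```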
What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text
Title | What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text |
Authors | Yosi Mass, Lili Kotlerman, Shachar Mirkin, Elad Venezian, Gera Witzling, Noam Slonim |
Abstract | We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities and other types of entities, annotated on different types of text ranging from clean Wikipedia text to noisy spoken data. The benchmark was built through a highly controlled crowdsourcing process to ensure its quality. We describe the benchmark, the process, and the guidelines that were used to build it. We then demonstrate the results of a state-of-the-art system running on that benchmark. |
Tasks | |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07507v3 |
http://arxiv.org/pdf/1801.07507v3.pdf | |
PWC | https://paperswithcode.com/paper/what-did-you-mention-a-large-scale-mention |
Repo | |
Framework | |
Rotation Invariant Descriptors for Galaxy Morphological Classification
Title | Rotation Invariant Descriptors for Galaxy Morphological Classification |
Authors | Hubert Cecotti |
Abstract | The detection of objects that are multi-oriented is a difficult pattern recognition problem. In this paper, we evaluate the performance of different families of descriptors for the classification of galaxy morphologies. We investigate the performance of Hu moments, Flusser moments, Zernike moments, Fourier-Mellin moments, and ring projection techniques based on 1D moments and the Fourier transform. We consider two main datasets for the performance evaluation. The first is an artificial dataset based on representative templates from 11 types of galaxies, evaluated under different transformations (noise, smoothing), alone or combined. The evaluation is based on image retrieval performance to estimate the robustness of the rotation invariant descriptors on this type of image. The second dataset is composed of real images extracted from the Galaxy Zoo 2 project. The binary classification of elliptical and spiral galaxies is achieved with pre-processing steps including morphological filtering and a Laplacian pyramid. For the binary classification, we compare the different sets of features with Support Vector Machines (SVM), Extreme Learning Machines, and different types of linear discriminant analysis techniques. The results support the conclusion that the proposed framework for the binary classification of elliptical and spiral galaxies achieves an area under the ROC curve reaching 99.54%, proving the robustness of the approach for helping astronomers study galaxies. |
Tasks | Image Retrieval |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04706v2 |
https://arxiv.org/pdf/1812.04706v2.pdf | |
PWC | https://paperswithcode.com/paper/rotation-invariant-descriptors-for-galaxy |
Repo | |
Framework | |
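
Hu moments, one of the descriptor families evaluated here, are rotation invariant by construction. A short sketch with OpenCV; the synthetic elliptical blob stands in for a galaxy cutout.

```python
import cv2
import numpy as np

img = np.zeros((128, 128), dtype=np.uint8)
cv2.ellipse(img, (64, 64), (40, 15), 30, 0, 360, 255, -1)  # elliptical blob

hu = cv2.HuMoments(cv2.moments(img)).flatten()
# Log-scaling is common because raw Hu moments span many orders of magnitude.
print(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))

# Rotating the image leaves the (log) Hu moments nearly unchanged.
M = cv2.getRotationMatrix2D((64, 64), 45, 1.0)
rot = cv2.warpAffine(img, M, (128, 128))
hu_rot = cv2.HuMoments(cv2.moments(rot)).flatten()
print(-np.sign(hu_rot) * np.log10(np.abs(hu_rot) + 1e-30))
```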
Effective Use of Synthetic Data for Urban Scene Semantic Segmentation
Title | Effective Use of Synthetic Data for Urban Scene Semantic Segmentation |
Authors | Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez |
Abstract | Training a deep network to perform semantic segmentation requires large amounts of labeled data. To alleviate the manual effort of annotating real images, researchers have investigated the use of synthetic data, which can be labeled automatically. Unfortunately, a network trained on synthetic data performs relatively poorly on real images. While this can be addressed by domain adaptation, existing methods all require having access to real images during training. In this paper, we introduce a drastically different way to handle synthetic images that does not require seeing any real images at training time. Our approach builds on the observation that foreground and background classes are not affected in the same manner by the domain shift, and thus should be treated differently. In particular, the former should be handled in a detection-based manner to better account for the fact that, while their texture in synthetic images is not photo-realistic, their shape looks natural. Our experiments evidence the effectiveness of our approach on Cityscapes and CamVid with models trained on synthetic data only. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.06132v1 |
http://arxiv.org/pdf/1807.06132v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-use-of-synthetic-data-for-urban |
Repo | |
Framework | |
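
A hedged sketch of treating foreground and background differently: per-pixel background labels come from a segmentation branch, while foreground objects are pasted in from detection-style instance masks, which the paper argues transfer better from synthetic data. The scores, masks, and class ids below are toy stand-ins.

```python
import numpy as np

H, W = 4, 6
BACKGROUND_CLASSES = 3  # e.g., road, building, sky

# Per-pixel background scores from the segmentation branch.
bg_scores = np.random.default_rng(0).random((BACKGROUND_CLASSES, H, W))
label_map = bg_scores.argmax(axis=0)

# Foreground instance from the detection branch: (class id, binary mask).
CAR_ID = 3
car_mask = np.zeros((H, W), dtype=bool)
car_mask[1:3, 2:5] = True

# Foreground overrides background wherever an instance mask fires.
label_map[car_mask] = CAR_ID
print(label_map)
```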
Learning Finite State Representations of Recurrent Policy Networks
Title | Learning Finite State Representations of Recurrent Policy Networks |
Authors | Anurag Koul, Sam Greydanus, Alan Fern |
Abstract | Recurrent neural networks (RNNs) are an effective representation of control policies for a wide range of reinforcement and imitation learning problems. RNN policies, however, are particularly difficult to explain, understand, and analyze due to their use of continuous-valued memory vectors and observation features. In this paper, we introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. The result is a quantized representation of the RNN that can be analyzed to improve our understanding of memory use and general behavior. We present results of this approach on synthetic environments and six Atari games. The resulting finite representations are surprisingly small in some cases, using as few as 3 discrete memory states and 10 observations for a perfect Pong policy. We also show that these finite policy representations lead to improved interpretability. |
Tasks | Atari Games, Imitation Learning |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12530v1 |
http://arxiv.org/pdf/1811.12530v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-finite-state-representations-of |
Repo | |
Framework | |
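
A hedged sketch of a quantized bottleneck: a tanh code is rounded to three levels, with a straight-through estimator so gradients flow through the non-differentiable rounding. Layer sizes and the autoencoder training setup are illustrative; the paper inserts such bottlenecks into trained recurrent policies to discretize their memory and observation vectors.

```python
import torch
import torch.nn as nn

class QuantizedBottleneck(nn.Module):
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encode = nn.Linear(in_dim, code_dim)
        self.decode = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        h = torch.tanh(self.encode(x))
        q = torch.round(h)  # 3-level code: -1, 0, or 1
        # Straight-through: forward uses q, backward uses the gradient of h.
        q = h + (q - h).detach()
        return self.decode(q), q

qbn = QuantizedBottleneck(in_dim=16, code_dim=4)
x = torch.randn(8, 16)
recon, code = qbn(x)
loss = nn.functional.mse_loss(recon, x)  # train as an autoencoder
loss.backward()
print(code[0])  # discrete memory code for one input
```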
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
Title | ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks |
Authors | Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann |
Abstract | In this paper, we introduce an approach to training a given compact network. To this end, we leverage over-parameterization, which typically improves both optimization and generalization in neural network training, while being unnecessary at inference time. We propose to expand each linear layer, both fully-connected and convolutional, of the compact network into multiple linear layers, without adding any nonlinearity. As such, the resulting expanded network can benefit from over-parameterization during training but can be compressed back to the compact one algebraically at inference. We introduce several expansion strategies, together with an initialization scheme, and demonstrate the benefits of our ExpandNets on several tasks, including image classification, object detection, and semantic segmentation. As evidenced by our experiments, our approach outperforms both training the compact network from scratch and performing knowledge distillation from a teacher. |
Tasks | Image Classification, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10495v4 |
https://arxiv.org/pdf/1811.10495v4.pdf | |
PWC | https://paperswithcode.com/paper/expandnets-exploiting-linear-redundancy-to |
Repo | |
Framework | |
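
Because the expansion adds no nonlinearity, it can be collapsed exactly at inference time. The sketch below expands one fully-connected layer into two and verifies the algebraic compression W = W2 W1, b = W2 b1 + b2; it is a toy instance of the idea, not the paper's full convolutional expansion scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 32, 4  # over-parameterized: d_hidden > d_in

# Expanded network: two linear layers with no nonlinearity in between.
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

# Compact network recovered algebraically for inference.
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=d_in)
expanded = W2 @ (W1 @ x + b1) + b2
compact = W @ x + b
print(np.allclose(expanded, compact))  # True: identical function
```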
Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method
Title | Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method |
Authors | Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová |
Abstract | Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes-optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in a few specific cases by a variety of methods. Here, we use the spatial coupling methodology developed in the framework of error-correcting codes to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems, where we show that there exists a gap between what currently known polynomial algorithms (in particular, spectral methods and approximate message-passing) can do and what is expected information-theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available. |
Tasks | Community Detection, Compressive Sensing |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02537v1 |
http://arxiv.org/pdf/1812.02537v1.pdf | |
PWC | https://paperswithcode.com/paper/rank-one-matrix-estimation-analysis-of |
Repo | |
Framework | |
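
A hedged sketch of the symmetric rank-one (spiked Wigner) model analyzed here, illustrating the spectral side of the abstract's algorithmic story: the top eigenvector starts correlating with the planted spike only above the transition at lambda = 1. The Rademacher prior and problem size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.choice([-1.0, 1.0], size=n)  # Rademacher spike

for lam in [0.5, 1.5, 4.0]:
    G = rng.normal(size=(n, n))
    W = (G + G.T) / np.sqrt(2)                     # symmetric Gaussian noise
    Y = np.sqrt(lam / n) * np.outer(x, x) + W
    _, vecs = np.linalg.eigh(Y / np.sqrt(n))       # noise spectrum in [-2, 2]
    v = vecs[:, -1]                                # top eigenvector
    overlap = abs(v @ x) / np.sqrt(n)
    print(f"lambda={lam}: overlap = {overlap:.2f}")  # ~0 below lambda = 1
```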