Paper Group ANR 254
Transfer Regression via Pairwise Similarity Regularization
Title | Transfer Regression via Pairwise Similarity Regularization |
Authors | Aubrey Gress, Ian Davidson |
Abstract | Transfer learning methods address the situation where little labeled training data from the “target” problem exists, but much training data from a related “source” domain is available. However, the overwhelming majority of transfer learning methods are designed for simple settings where the source and target predictive functions are almost identical, limiting the applicability of transfer learning methods to real world data. We propose a novel, weaker, property of the source domain that can be transferred even when the source and target predictive functions diverge. Our method assumes the source and target functions share a Pairwise Similarity property, where if the source function makes similar predictions on a pair of instances, then so will the target function. We propose Pairwise Similarity Regularization Transfer, a flexible graph-based regularization framework which can incorporate this modeling assumption into standard supervised learning algorithms. We show how users can encode domain knowledge into our regularizer in the form of spatial continuity, pairwise “similarity constraints” and how our method can be scaled to large data sets using the Nystrom approximation. Finally, we present positive and negative results on real and synthetic data sets and discuss when our Pairwise Similarity transfer assumption seems to hold in practice. |
Tasks | Transfer Learning |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.08855v1 |
http://arxiv.org/pdf/1712.08855v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-regression-via-pairwise-similarity |
Repo | |
Framework | |
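The Pairwise Similarity assumption in the abstract above can be illustrated with a small penalty function: pairs of instances on which the source function agrees are given a large weight, and the target predictions are penalized for differing on those pairs. This is only a sketch. The Gaussian weighting, the `bandwidth` parameter, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def similarity_weights(source_preds, bandwidth=1.0):
    """Weight w_ij is close to 1 when the source predictions for i and j agree."""
    n = len(source_preds)
    return [
        [math.exp(-((source_preds[i] - source_preds[j]) ** 2) / bandwidth)
         for j in range(n)]
        for i in range(n)
    ]


def pairwise_penalty(target_preds, weights):
    """Graph-style regularizer: sum over all pairs of w_ij * (f(x_i) - f(x_j))^2."""
    n = len(target_preds)
    return sum(
        weights[i][j] * (target_preds[i] - target_preds[j]) ** 2
        for i in range(n)
        for j in range(n)
    )


# Hypothetical source predictions: instances 0,1 are similar, and so are 2,3.
source_preds = [0.0, 0.1, 5.0, 5.1]
weights = similarity_weights(source_preds)

# Target values may differ greatly from the source values; only the
# pairwise similarity structure is expected to carry over.
consistent = [10.0, 10.2, -3.0, -3.1]
inconsistent = [10.0, -3.0, 10.2, -3.1]
print(pairwise_penalty(consistent, weights))
print(pairwise_penalty(inconsistent, weights))
```

In a full method such a penalty would be added to a supervised loss on the few labeled target instances; that step is omitted here.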
Phase recovery and holographic image reconstruction using deep learning in neural networks
Title | Phase recovery and holographic image reconstruction using deep learning in neural networks |
Authors | Yair Rivenson, Yibo Zhang, Harun Gunaydin, Da Teng, Aydogan Ozcan |
Abstract | Phase recovery from intensity-only measurements forms the heart of coherent imaging techniques and holography. Here we demonstrate that a neural network can learn to perform phase recovery and holographic image reconstruction after appropriate training. This deep learning-based approach provides an entirely new framework to conduct holographic imaging by rapidly eliminating twin-image and self-interference related spatial artifacts. Compared to existing approaches, this neural network-based method is significantly faster to compute, and reconstructs improved phase and amplitude images of the objects using only one hologram, i.e., it requires fewer measurements in addition to being computationally faster. We validated this method by reconstructing phase and amplitude images of various samples, including blood and Pap smears, and tissue sections. These results are broadly applicable to any phase recovery problem, and highlight that through machine learning challenging problems in imaging science can be overcome, providing new avenues to design powerful computational imaging systems. |
Tasks | Image Reconstruction |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.04286v1 |
http://arxiv.org/pdf/1705.04286v1.pdf | |
PWC | https://paperswithcode.com/paper/phase-recovery-and-holographic-image |
Repo | |
Framework | |
Light Field Segmentation From Super-pixel Graph Representation
Title | Light Field Segmentation From Super-pixel Graph Representation |
Authors | Xianqiang Lv, Hao Zhu, Qing Wang |
Abstract | Efficient and accurate segmentation of light fields is an important task in computer vision and graphics. The large volume of input data and the redundancy of light fields make it an open challenge. In this paper, we propose a novel graph representation for interactive light field segmentation based on the light field super-pixel (LFSP). The LFSP not only maintains light field redundancy, but also greatly reduces the graph size. These advantages make the LFSP useful for improving segmentation efficiency. Based on the LFSP graph structure, we present an efficient light field segmentation algorithm using graph-cuts. Experimental results on both synthetic and real datasets demonstrate that our method is superior to previous light field segmentation algorithms with respect to accuracy and efficiency. |
Tasks | |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07394v1 |
http://arxiv.org/pdf/1712.07394v1.pdf | |
PWC | https://paperswithcode.com/paper/light-field-segmentation-from-super-pixel |
Repo | |
Framework | |
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
Title | Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation |
Authors | Jean-Benoit Delbrouck, Stephane Dupont |
Abstract | In state-of-the-art Neural Machine Translation, an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, where it becomes possible to focus both on sentence parts and image regions. Approaches to pool two modalities usually include element-wise product, sum or concatenation. In this paper, we evaluate the more advanced Multimodal Compact Bilinear pooling method, which takes the outer product of two vectors to combine the attention features for the two modalities. This has been previously investigated for visual question answering. We try out this approach for multimodal image caption translation and show improvements compared to basic combination methods. |
Tasks | Machine Translation, Question Answering, Visual Question Answering |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08084v1 |
http://arxiv.org/pdf/1703.08084v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-compact-bilinear-pooling-for-1 |
Repo | |
Framework | |
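Compact bilinear pooling, as commonly described, avoids forming the full outer product of two feature vectors by projecting each vector with a count sketch and combining the two sketches by circular convolution. The sketch below assumes that formulation. It uses a direct circular convolution instead of the usual FFT to stay dependency-free, and all function names are hypothetical. The helper `sketched_outer_product` exists only to show that the result equals a count sketch of the explicit outer product.

```python
import random


def make_hash(n, d, rng):
    """Random bucket index in [0, d) and random sign in {-1, +1} per input dimension."""
    buckets = [rng.randrange(d) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def count_sketch(x, buckets, signs, d):
    """Project x into d dimensions by adding signed entries into hashed buckets."""
    out = [0.0] * d
    for i, value in enumerate(x):
        out[buckets[i]] += signs[i] * value
    return out


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution (an FFT would normally be used)."""
    d = len(a)
    return [sum(a[k] * b[(t - k) % d] for k in range(d)) for t in range(d)]


def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Combine two modalities without materializing their outer product."""
    return circular_convolution(count_sketch(x, hx, sx, d),
                                count_sketch(y, hy, sy, d))


def sketched_outer_product(x, y, hx, sx, hy, sy, d):
    """Reference: count sketch of the explicit outer product of x and y."""
    out = [0.0] * d
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[(hx[i] + hy[j]) % d] += sx[i] * sy[j] * xv * yv
    return out


rng = random.Random(0)
d = 8
text_features = [1.0, -2.0, 3.0]        # hypothetical attention features
image_features = [0.5, 4.0, -1.0, 2.0]  # hypothetical attention features
hx, sx = make_hash(len(text_features), d, rng)
hy, sy = make_hash(len(image_features), d, rng)
print(compact_bilinear(text_features, image_features, hx, sx, hy, sy, d))
```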
Online Forecasting Matrix Factorization
Title | Online Forecasting Matrix Factorization |
Authors | San Gultekin, John Paisley |
Abstract | In this paper the problem of forecasting high dimensional time series is considered. Such time series can be modeled as matrices where each column denotes a measurement. In addition, when missing values are present, low rank matrix factorization approaches are suitable for predicting future values. This paper formally defines and analyzes the forecasting problem in the online setting, i.e. where the data arrives as a stream and only a single pass is allowed. We present and analyze novel matrix factorization techniques which can learn low-dimensional embeddings effectively in an online manner. Based on these embeddings a recursive minimum mean square error estimator is derived, which learns an autoregressive model on them. Experiments with two real datasets with tens of millions of measurements show the benefits of the proposed approach. |
Tasks | Time Series |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.08734v1 |
http://arxiv.org/pdf/1712.08734v1.pdf | |
PWC | https://paperswithcode.com/paper/online-forecasting-matrix-factorization |
Repo | |
Framework | |
Penalizing Unfairness in Binary Classification
Title | Penalizing Unfairness in Binary Classification |
Authors | Yahav Bechavod, Katrina Ligett |
Abstract | We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates in both populations, and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions. |
Tasks | |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1707.00044v3 |
http://arxiv.org/pdf/1707.00044v3.pdf | |
PWC | https://paperswithcode.com/paper/penalizing-unfairness-in-binary |
Repo | |
Framework | |
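The fairness criterion described in the abstract above (similar false positive rates and similar false negative rates in the two populations) can be measured as a pair of gaps. The sketch below only evaluates those gaps for a fixed set of predictions. It does not reproduce the penalty the authors use during training, and the function names and toy data are assumptions.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def unfairness_gap(y_true, y_pred, group):
    """Absolute FPR and FNR differences between group 0 and group 1."""
    rates = {}
    for g in (0, 1):
        idx = [i for i, member in enumerate(group) if member == g]
        rates[g] = error_rates([y_true[i] for i in idx],
                               [y_pred[i] for i in idx])
    return (abs(rates[0][0] - rates[1][0]),
            abs(rates[0][1] - rates[1][1]))


# Toy data: four individuals in each of the two populations.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
fpr_gap, fnr_gap = unfairness_gap(y_true, y_pred, group)
print(fpr_gap, fnr_gap)
```

A classifier that meets the criterion exactly would have both gaps equal to zero.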
New Ideas for Brain Modelling 4
Title | New Ideas for Brain Modelling 4 |
Authors | Kieran Greer |
Abstract | This paper continues the research that considers a new cognitive model based strongly on the human brain. In particular, it considers the neural binding structure of an earlier paper. It also describes some new methods in the areas of image processing and behaviour simulation. The work is all based on earlier research by the author and the new additions are intended to fit in with the overall design. For image processing, a grid-like structure is used with ‘full linking’. Each cell in the classifier grid stores a list of all other cells it gets associated with and this is used as the learned image that new input is compared to. For the behaviour metric, a new prediction equation is suggested, as part of a simulation, that uses feedback and history to dynamically determine its course of action. While the new methods are from widely different topics, both can be compared with the binary-analog type of interface that is the main focus of the paper. It is suggested that the simplest of linking between a tree and ensemble can explain neural binding and variable signal strengths. |
Tasks | |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04806v4 |
http://arxiv.org/pdf/1708.04806v4.pdf | |
PWC | https://paperswithcode.com/paper/new-ideas-for-brain-modelling-4 |
Repo | |
Framework | |
Learning Modality-Invariant Representations for Speech and Images
Title | Learning Modality-Invariant Representations for Speech and Images |
Authors | Kenneth Leidal, David Harwath, James Glass |
Abstract | In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs. Specifically, we focus on the task of learning a semantic vector space for both spoken and handwritten digits using the TIDIGITS and MNIST datasets. Current techniques encode image and audio/textual inputs directly to semantic embeddings. In contrast, our technique maps an input to the mean and log variance vectors of a diagonal Gaussian from which sample semantic embeddings are drawn. In addition to encouraging semantic similarity between co-occurring inputs, our loss function includes a regularization term borrowed from variational autoencoders (VAEs) which drives the posterior distributions over embeddings to be unit Gaussian. We can use this regularization term to filter out modality information while preserving semantic information. We speculate this technique may be more broadly applicable to other areas of cross-modality/domain information retrieval and transfer learning. |
Tasks | Information Retrieval, Semantic Similarity, Semantic Textual Similarity, Transfer Learning |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03897v1 |
http://arxiv.org/pdf/1712.03897v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-modality-invariant-representations |
Repo | |
Framework | |
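The regularization term mentioned in the abstract above is, in standard VAEs, the KL divergence between a diagonal Gaussian and a unit Gaussian, which has a closed form in the mean and log variance. The sketch below shows that closed form together with a reparameterized sample. The function names are hypothetical, and how the term is weighted against the similarity loss in the paper is not shown.

```python
import math
import random


def kl_to_unit_gaussian(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mean, log_var)
    )


def sample_embedding(mean, log_var, rng):
    """Draw one embedding: mean + standard deviation * standard normal noise."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


rng = random.Random(0)
mean = [1.0, 0.0]      # hypothetical encoder output
log_var = [0.0, 0.0]   # hypothetical encoder output
print(kl_to_unit_gaussian(mean, log_var))
print(sample_embedding(mean, log_var, rng))
```

The divergence is zero only when the mean is zero and the log variance is zero in every dimension, which is what drives the posteriors toward a unit Gaussian.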
Automatic Recognition of Facial Displays of Unfelt Emotions
Title | Automatic Recognition of Facial Displays of Unfelt Emotions |
Authors | Kaustubh Kulkarni, Ciprian Adrian Corneanu, Ikechukwu Ofodile, Sergio Escalera, Xavier Baro, Sylwia Hyniewska, Juri Allik, Gholamreza Anbarjafari |
Abstract | Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state-of-the-art results on the CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on the BP4D dataset. |
Tasks | Emotion Recognition, Video Emotion Recognition |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.04061v2 |
http://arxiv.org/pdf/1707.04061v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-recognition-of-facial-displays-of |
Repo | |
Framework | |
Interactive 3D Modeling with a Generative Adversarial Network
Title | Interactive 3D Modeling with a Generative Adversarial Network |
Authors | Jerry Liu, Fisher Yu, Thomas Funkhouser |
Abstract | This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface. The user edits a voxel grid with a painting interface (like Minecraft). Yet, at any time, he/she can execute a SNAP command, which projects the current voxel grid onto a latent shape manifold with a learned projection operator and then generates a similar, but more realistic, shape using a learned generator network. Then the user can edit the resulting shape and snap again until he/she is satisfied with the result. The main advantage of this approach is that the projection and generation operators assist novice users to create 3D models characteristic of a background distribution of object shapes, but without having to specify all the details. The core new research idea is to use a GAN to support this application. 3D GANs have previously been used for shape generation, interpolation, and completion, but never for interactive modeling. The new challenge for this application is to learn a projection operator that takes an arbitrary 3D voxel model and produces a latent vector on the shape manifold from which a similar and realistic shape can be generated. We develop algorithms for this and other steps of the SNAP processing pipeline and integrate them into a simple modeling tool. Experiments with these algorithms and tool suggest that GANs provide a promising approach to computer-assisted interactive modeling. |
Tasks | |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05170v2 |
http://arxiv.org/pdf/1706.05170v2.pdf | |
PWC | https://paperswithcode.com/paper/interactive-3d-modeling-with-a-generative |
Repo | |
Framework | |
A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization
Title | A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization |
Authors | Vineet Gupta, Tomer Koren, Yoram Singer |
Abstract | We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by reshaping the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, including the AdaGrad and Online Newton Step algorithms as well as their diagonal versions. As a result, we obtain new convergence proofs for these algorithms that are substantially simpler than previous analyses. Our framework also exposes the rationale for the different preconditioned updates used in common stochastic optimization methods. |
Tasks | Stochastic Optimization |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06569v1 |
http://arxiv.org/pdf/1706.06569v1.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-approach-to-adaptive-regularization |
Repo | |
Framework | |
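As a concrete instance of the adaptive, data-dependent preconditioning discussed in the abstract above, the diagonal version of AdaGrad divides each coordinate of the gradient by the square root of that coordinate's accumulated squared gradients. The sketch below is a plain textbook-style rendering and not the unified framework of the paper. The step size, iteration count, and test function are arbitrary assumptions.

```python
import math


def adagrad_minimize(grad, x0, lr=0.5, steps=200, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate step sizes from accumulated squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Coordinates with large past gradients receive smaller steps.
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x


def quadratic_grad(x):
    """Gradient of the badly scaled quadratic 0.5 * (100 * x0^2 + x1^2)."""
    return [100.0 * x[0], x[1]]


print(adagrad_minimize(quadratic_grad, [1.0, 1.0]))
```

Because each coordinate is rescaled by its own gradient history, the two coordinates of this example progress at nearly the same rate despite the factor of 100 between their curvatures.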
Multi-Person Pose Estimation via Column Generation
Title | Multi-Person Pose Estimation via Column Generation |
Authors | Shaofei Wang, Chong Zhang, Miguel A. Gonzalez-Ballester, Alexander Ihler, Julian Yarkony |
Abstract | We study the problem of multi-person pose estimation in natural images. A pose estimate describes the spatial position and identity (head, foot, knee, etc.) of every non-occluded body part of a person. Pose estimation is difficult due to issues such as deformation and variation in body configurations and occlusion of parts, while multi-person settings add complications such as an unknown number of people, with unknown appearance and possible interactions in their poses and part locations. We give a novel integer program formulation of the multi-person pose estimation problem, in which variables correspond to assignments of parts in the image to poses in a two-tier, hierarchical way. This enables us to develop an efficient custom optimization procedure based on column generation, where columns are produced by exact optimization of very small scale integer programs. We demonstrate improved accuracy and speed for our method on the MPII multi-person pose estimation benchmark. |
Tasks | Multi-Person Pose Estimation, Pose Estimation |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.05982v1 |
http://arxiv.org/pdf/1709.05982v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-person-pose-estimation-via-column |
Repo | |
Framework | |
Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN
Title | Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN |
Authors | Teik Koon Cheang, Yong Shean Chong, Yong Haur Tay |
Abstract | While vehicle license plate recognition (VLPR) is usually done with a sliding window approach, it can have limited performance on datasets with characters that are of variable width. This can be solved by hand-crafting algorithms to prescale the characters. While this approach can work fairly well, the recognizer is only aware of the pixels within each detector window, and fails to account for other contextual information that might be present in other parts of the image. A sliding window approach also requires training data in the form of presegmented characters, which can be more difficult to obtain. In this paper, we propose a unified ConvNet-RNN model to recognize real-world captured license plate photographs. By using a Convolutional Neural Network (ConvNet) to perform feature extraction and using a Recurrent Neural Network (RNN) for sequencing, we address the problem of sliding window approaches being unable to access the context of the entire image by feeding the entire image as input to the ConvNet. This has the added benefit of being able to perform end-to-end training of the entire model on labelled, full license plate images. Experimental results comparing the ConvNet-RNN architecture to a sliding window-based approach show that the ConvNet-RNN architecture performs significantly better. |
Tasks | License Plate Recognition |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06439v1 |
http://arxiv.org/pdf/1701.06439v1.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-free-vehicle-license-plate |
Repo | |
Framework | |
On Study of the Reliable Fully Convolutional Networks with Tree Arranged Outputs (TAO-FCN) for Handwritten String Recognition
Title | On Study of the Reliable Fully Convolutional Networks with Tree Arranged Outputs (TAO-FCN) for Handwritten String Recognition |
Authors | Song Wang, Jun Sun, Satoshi Naoi |
Abstract | Handwritten string recognition is still a challenging task, even though powerful deep learning tools have been introduced. In this paper, based on TAO-FCN, we propose an end-to-end system for handwritten string recognition. Compared with conventional methods, neither preprocessing nor manually designed rules are employed. With enough labelled data, it is easy to apply the proposed method to different applications. Although the performance of the proposed method may not be comparable with the state-of-the-art approaches, its usability and robustness are more meaningful for practical applications. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02975v1 |
http://arxiv.org/pdf/1707.02975v1.pdf | |
PWC | https://paperswithcode.com/paper/on-study-of-the-reliable-fully-convolutional |
Repo | |
Framework | |
Non-linear Convolution Filters for CNN-based Learning
Title | Non-linear Convolution Filters for CNN-based Learning |
Authors | Georgios Zoumpourlis, Alexandros Doumanoglou, Nicholas Vretos, Petros Daras |
Abstract | In recent years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image classification. Their architectures have largely drawn inspiration from models of the primate visual system. However, while recent research results of neuroscience prove the existence of non-linear operations in the response of complex visual cells, little effort has been devoted to extending the convolution technique to non-linear forms. Typical convolutional layers are linear systems, hence their expressiveness is limited. To overcome this, various non-linearities have been used as activation functions inside CNNs, while also many pooling strategies have been applied. We address the issue of developing a convolution method in the context of a computational model of the visual cortex, exploring quadratic forms through the Volterra kernels. Such forms, constituting a richer function space, are used as approximations of the response profile of visual cells. Our proposed second-order convolution is tested on CIFAR-10 and CIFAR-100. We show that a network which combines linear and non-linear filters in its convolutional layers can outperform networks that use standard linear filters with the same architecture, yielding results competitive with the state-of-the-art on these datasets. |
Tasks | Image Classification |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.07038v1 |
http://arxiv.org/pdf/1708.07038v1.pdf | |
PWC | https://paperswithcode.com/paper/non-linear-convolution-filters-for-cnn-based |
Repo | |
Framework | |
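A second-order Volterra filter, as referred to in the abstract above, extends the usual linear filter response with a quadratic term over pairs of inputs in the same receptive field. The sketch below computes that response for one flattened patch. Restricting the quadratic weights to the upper triangle (because x_i * x_j is symmetric) and all names are assumptions made for illustration, and sliding the filter over an image is omitted.

```python
def volterra_response(patch, bias, linear, quadratic):
    """Second-order response: bias + sum_i w1[i]*x[i] + sum_{i<=j} w2[i][j]*x[i]*x[j]."""
    n = len(patch)
    out = bias
    # First-order (ordinary linear filter) term.
    for i in range(n):
        out += linear[i] * patch[i]
    # Second-order term; only entries with j >= i are read.
    for i in range(n):
        for j in range(i, n):
            out += quadratic[i][j] * patch[i] * patch[j]
    return out


patch = [1.0, 2.0]                    # hypothetical flattened receptive field
linear = [1.0, -1.0]                  # hypothetical first-order weights
quadratic = [[0.5, 2.0], [0.0, 1.0]]  # hypothetical second-order weights
print(volterra_response(patch, 0.5, linear, quadratic))
```

Setting every quadratic weight to zero recovers the standard linear filter, which is how linear and non-linear filters can coexist in the same layer.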