July 27, 2019

3074 words 15 mins read

Paper Group ANR 592

An Epipolar Line from a Single Pixel. A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market. Targeted matrix completion. Vid2speech: Speech Reconstruction from Silent Video. Clustering with feature selection using alternating minimization, Application to computational biology. …

An Epipolar Line from a Single Pixel

Title An Epipolar Line from a Single Pixel
Authors Tavi Halperin, Michael Werman
Abstract Computing the epipolar geometry from feature points between cameras with very different viewpoints is often error prone, as an object’s appearance can vary greatly between images. For such cases, it has been shown that using motion extracted from video can achieve much better results than using a static image. This paper extends these earlier works based on the scene dynamics. In this paper we propose a new method to compute the epipolar geometry from a video stream, by exploiting the following observation: For a pixel p in Image A, all pixels corresponding to p in Image B are on the same epipolar line. Equivalently, the image of the line going through camera A’s center and p is an epipolar line in B. Therefore, when cameras A and B are synchronized, the momentary images of two objects projecting to the same pixel, p, in camera A at times t1 and t2, lie on an epipolar line in camera B. Based on this observation we achieve fast and precise computation of epipolar lines. Calibrating cameras based on our method of finding epipolar lines is much faster and more robust than previous methods.
Tasks
Published 2017-03-28
URL http://arxiv.org/abs/1703.09725v3
PDF http://arxiv.org/pdf/1703.09725v3.pdf
PWC https://paperswithcode.com/paper/an-epipolar-line-from-a-single-pixel
Repo
Framework
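
Below is a minimal NumPy sketch of the paper's central observation (not the authors' implementation): points in camera B that correspond, at different times, to one fixed pixel in camera A must all lie on a single epipolar line, so that line can be recovered by a homogeneous least-squares fit. The point values are made up for illustration.

```python
import numpy as np

def fit_epipolar_line(points_B):
    """points_B: (N, 2) pixel coordinates in camera B, observed at times t1..tN,
    all corresponding to the same pixel p in camera A. Returns line (a, b, c)
    with a*x + b*y + c = 0, up to scale."""
    pts = np.hstack([points_B, np.ones((len(points_B), 1))])  # homogeneous (N, 3)
    # The line l minimises |pts @ l|: the right singular vector with smallest
    # singular value.
    _, _, vt = np.linalg.svd(pts)
    return vt[-1]

# Toy usage: three noisy observations in camera B that are roughly collinear.
obs = np.array([[100.0, 200.0], [150.0, 251.0], [200.0, 299.0]])
line = fit_epipolar_line(obs)
print(line / np.linalg.norm(line[:2]))  # normalised line coefficients
```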

A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market

Title A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market
Authors Piero Mazzarisi, Paolo Barucca, Fabrizio Lillo, Daniele Tantari
Abstract We propose a dynamic network model where two mechanisms control the probability of a link between two nodes: (i) the existence or absence of this link in the past, and (ii) node-specific latent variables (dynamic fitnesses) describing the propensity of each node to create links. Assuming a Markov dynamics for both mechanisms, we propose an Expectation-Maximization algorithm for model estimation and inference of the latent variables. The estimated parameters and fitnesses can be used to forecast the presence of a link in the future. We apply our methodology to the e-MID interbank network, for which the two linkage mechanisms are associated with two different trading behaviors in the process of network formation, namely preferential trading and trading driven by node-specific characteristics. The empirical results make it possible to recognise preferential lending in the interbank market and indicate how a method that does not account for time-varying network topologies tends to overestimate preferential linkage.
Tasks
Published 2017-12-30
URL http://arxiv.org/abs/1801.00185v1
PDF http://arxiv.org/pdf/1801.00185v1.pdf
PWC https://paperswithcode.com/paper/a-dynamic-network-model-with-persistent-links
Repo
Framework
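
The following toy simulation illustrates the kind of two-mechanism dynamics described in the abstract. The mixing weight `alpha` and the logistic fitness term are illustrative assumptions; the paper's exact specification and its Expectation-Maximization estimator are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, alpha = 20, 50, 0.6           # nodes, time steps, persistence weight
theta = rng.normal(0, 1, size=n)    # node-specific latent fitnesses

def fitness_prob(i, j):
    # Link probability driven purely by the two nodes' fitnesses.
    return 1.0 / (1.0 + np.exp(-(theta[i] + theta[j])))

A = rng.random((n, n)) < 0.1        # initial adjacency matrix
np.fill_diagonal(A, False)
snapshots = [A.copy()]
for t in range(T):
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # (i) persistence of the previous link, (ii) fitness-driven creation.
            P[i, j] = alpha * A[i, j] + (1 - alpha) * fitness_prob(i, j)
    A = rng.random((n, n)) < P
    np.fill_diagonal(A, False)
    snapshots.append(A.copy())
print("mean network density over time:", np.mean([S.mean() for S in snapshots]))
```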

Targeted matrix completion

Title Targeted matrix completion
Authors Natali Ruchansky, Mark Crovella, Evimaria Terzi
Abstract Matrix completion is a problem that arises in many data-analysis settings where the input consists of a partially-observed matrix (e.g., recommender systems, traffic matrix analysis etc.). Classical approaches to matrix completion assume that the input partially-observed matrix is low rank. The success of these methods depends on the number of observed entries and the rank of the matrix; the larger the rank, the more entries need to be observed in order to accurately complete the matrix. In this paper, we deal with matrices that are not necessarily low rank themselves, but rather they contain low-rank submatrices. We propose Targeted, which is a general framework for completing such matrices. In this framework, we first extract the low-rank submatrices and then apply a matrix-completion algorithm to these low-rank submatrices as well as the remainder matrix separately. Although for the completion itself we use state-of-the-art completion methods, our results demonstrate that Targeted achieves significantly smaller reconstruction errors than other classical matrix-completion methods. One of the key technical contributions of the paper lies in the identification of the low-rank submatrices from the input partially-observed matrices.
Tasks Matrix Completion, Recommendation Systems
Published 2017-04-30
URL http://arxiv.org/abs/1705.00375v1
PDF http://arxiv.org/pdf/1705.00375v1.pdf
PWC https://paperswithcode.com/paper/targeted-matrix-completion
Repo
Framework
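
A rough sketch of the Targeted pipeline, under two simplifying assumptions: the low-rank submatrices are taken as given (identifying them is the paper's key contribution), and a basic SoftImpute-style SVD completion stands in for the state-of-the-art completion routine the authors plug in.

```python
import numpy as np

def soft_impute(M, observed, rank=5, iters=100):
    """Fill missing entries (observed == False) of M by iterative truncated SVD."""
    X = np.where(observed, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(observed, M, low_rank)   # keep observed entries, refresh the rest
    return X

def targeted_complete(M, observed, blocks, rank=5):
    """blocks: list of (row_indices, col_indices) pairs marking low-rank submatrices."""
    out = np.array(M, dtype=float)
    covered = np.zeros_like(observed, dtype=bool)
    for rows, cols in blocks:
        # Complete each identified low-rank submatrix on its own.
        sub = soft_impute(M[np.ix_(rows, cols)], observed[np.ix_(rows, cols)], rank)
        out[np.ix_(rows, cols)] = sub
        covered[np.ix_(rows, cols)] = True
    # Complete the remainder matrix separately and fill the uncovered missing entries.
    rest = soft_impute(M, observed, rank)
    fill = ~covered & ~observed
    out[fill] = rest[fill]
    return out
```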

Vid2speech: Speech Reconstruction from Silent Video

Title Vid2speech: Speech Reconstruction from Silent Video
Authors Ariel Ephrat, Shmuel Peleg
Abstract Speechreading is a notoriously difficult task for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible acoustic speech signal from silent video frames of a speaking person. The proposed CNN generates sound features for each frame based on its neighboring frames. Waveforms are then synthesized from the learned speech features to produce intelligible speech. We show that by leveraging the automatic feature learning capabilities of a CNN, we can obtain state-of-the-art word intelligibility on the GRID dataset, and show promising results for learning out-of-vocabulary (OOV) words.
Tasks
Published 2017-01-02
URL http://arxiv.org/abs/1701.00495v2
PDF http://arxiv.org/pdf/1701.00495v2.pdf
PWC https://paperswithcode.com/paper/vid2speech-speech-reconstruction-from-silent
Repo
Framework
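
A minimal PyTorch sketch of the idea: a CNN reads a short window of silent video frames and regresses acoustic features for the central frame. Layer sizes, the window length, and the feature dimension are assumptions rather than the authors' architecture, and the waveform synthesis stage is omitted.

```python
import torch
import torch.nn as nn

class FramesToSpeechFeatures(nn.Module):
    def __init__(self, n_frames=5, feat_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 4 * 4, 256),
                                  nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, x):            # x: (batch, n_frames, H, W) grayscale window
        return self.head(self.conv(x))

model = FramesToSpeechFeatures()
window = torch.randn(8, 5, 64, 64)   # 8 training windows of 5 frames each
print(model(window).shape)           # -> torch.Size([8, 32]) sound features per window
```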

Clustering with feature selection using alternating minimization, Application to computational biology

Title Clustering with feature selection using alternating minimization, Application to computational biology
Authors Cyprien Gilet, Marie Deprez, Jean-Baptiste Caillau, Michel Barlaud
Abstract This paper deals with unsupervised clustering with feature selection. The problem is to estimate both labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem maintaining a strict control on the sparsity of the matrix of weights, we propose an alternating minimization of the Frobenius norm criterion. We provide a new efficient algorithm named K-sparse which alternates k-means with projection-gradient minimization. The projection-gradient step is a method of splitting type, with exact projection on the $\ell^1$ ball to promote sparsity. The convergence of the gradient-projection step is addressed, and a preliminary analysis of the alternating minimization is made. The Frobenius norm criterion converges as the number of iterates in Algorithm K-sparse goes to infinity. Experiments on Single Cell RNA sequencing datasets show that our method significantly improves the results of PCA k-means, spectral clustering, SIMLR, and Sparcl methods, and achieves a relevant selection of genes. The complexity of K-sparse is linear in the number of samples (cells), so that the method scales up to large datasets.
Tasks Feature Selection
Published 2017-11-08
URL https://arxiv.org/abs/1711.02974v4
PDF https://arxiv.org/pdf/1711.02974v4.pdf
PWC https://paperswithcode.com/paper/clustering-with-feature-selection-using
Repo
Framework
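
A hedged sketch of the alternating scheme described in the abstract: k-means on the projected data alternates with a projected-gradient step on the weight matrix, using an exact projection onto the $\ell^1$ ball to enforce sparsity. The objective, step size, and ball radius below are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def project_l1_ball(v, radius):
    """Exact Euclidean projection of a flat vector onto the l1 ball (Duchi et al. 2008)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def k_sparse(X, k, radius=5.0, step=1e-3, outer=20):
    n, d = X.shape
    W = np.eye(d)[:, :k] if d >= k else np.random.randn(d, k)   # d x k projection weights
    for _ in range(outer):
        Z = X @ W
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z)  # k-means step
        centers = np.vstack([Z[labels == c].mean(axis=0) for c in range(k)])
        # Gradient step on ||X W - M||_F^2, where M holds each sample's cluster center.
        M = centers[labels]
        grad = 2 * X.T @ (X @ W - M)
        W = W - step * grad
        W = project_l1_ball(W.ravel(), radius).reshape(W.shape)  # sparsity via l1 projection
    return labels, W
```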

Document Decomposition of Bangla Printed Text

Title Document Decomposition of Bangla Printed Text
Authors Md. Fahad Hasan, Tasmin Afroz, Sabir Ismail, Md. Saiful Islam
Abstract Today, all kinds of information are being digitized, and along with this digitization, huge archives of documents are being digitized as well. Optical Character Recognition is the method through which newspapers and other paper documents are converted into digital resources, but it works on text only. As a result, if we try to process a document that contains non-textual zones, we get garbage text as output. That is why documents should be carefully preprocessed before digitization, and while preprocessing, properly segmenting the document into regions according to their categories is the most important step. However, the Optical Character Recognition processes available for the Bangla language have no algorithm that can fully categorize a newspaper or book page. We therefore worked on decomposing a document into its several parts, such as headlines, sub-headlines, columns, and images; if the input is skewed or rotated, it is also deskewed and de-rotated. To decompose a Bangla document, we find the edges of the input image and then determine the horizontal and vertical area in which every pixel lies. The input image is then cut according to these areas. We then take each sub-image and compute its height-width ratio and line height, and categorize the sub-images according to these values. To deskew the image, we estimate the skew angle and rotate the image back by this angle. To de-rotate the image, we use the line height, the matra line, and the pixel ratio of the matra line.
Tasks Optical Character Recognition
Published 2017-01-27
URL http://arxiv.org/abs/1701.08706v1
PDF http://arxiv.org/pdf/1701.08706v1.pdf
PWC https://paperswithcode.com/paper/document-decomposition-of-bangla-printed-text
Repo
Framework
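
A simplified OpenCV sketch of two of the preprocessing steps mentioned above: estimating a skew angle and splitting a page into horizontal bands with projection profiles. The thresholds are illustrative, and the paper's matra-line based de-rotation and height-width-ratio categorization rules are not reproduced.

```python
import cv2
import numpy as np

def estimate_skew_angle(gray):
    """Rough skew estimate from the minimum-area rectangle around the ink pixels.
    (OpenCV's rectangle-angle convention varies across versions; treat as a sketch.)"""
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    return angle - 90 if angle > 45 else angle

def deskew(gray):
    angle = estimate_skew_angle(gray)
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

def horizontal_bands(gray, min_height=5):
    """Return (top, bottom) row ranges separated by empty horizontal strips."""
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    has_ink = binary.sum(axis=1) > 0      # ink per row
    bands, start = [], None
    for r, ink in enumerate(has_ink):
        if ink and start is None:
            start = r
        elif not ink and start is not None:
            bands.append((start, r))
            start = None
    if start is not None:
        bands.append((start, len(has_ink)))
    return [b for b in bands if b[1] - b[0] >= min_height]
```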

Video Summarization with Attention-Based Encoder-Decoder Networks

Title Video Summarization with Attention-Based Encoder-Decoder Networks
Authors Zhong Ji, Kailin Xiong, Yanwei Pang, Xuelong Li
Abstract This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism to mimic the way humans select keyshots. To this end, we propose a novel video summarization framework named Attentive encoder-decoder networks for Video Summarization (AVS), in which the encoder uses a Bidirectional Long Short-Term Memory (BiLSTM) to encode the contextual information among the input video frames. As for the decoder, two attention-based LSTM networks are explored, using additive and multiplicative objective functions, respectively. Extensive experiments are conducted on two video summarization benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the superiority of the proposed AVS-based approaches against the state-of-the-art approaches, with remarkable improvements from 0.8% to 3% on the two datasets, respectively.
Tasks Supervised Video Summarization, Video Summarization
Published 2017-08-31
URL http://arxiv.org/abs/1708.09545v2
PDF http://arxiv.org/pdf/1708.09545v2.pdf
PWC https://paperswithcode.com/paper/video-summarization-with-attention-based
Repo
Framework
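
A compact PyTorch sketch of the two attention variants contrasted in the paper: an additive (Bahdanau-style) score and a multiplicative (dot-product style) score computed over BiLSTM encoder states. Dimensions are illustrative; the full AVS encoder-decoder and keyshot selection are not shown.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W, self.U, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, 1)

    def forward(self, query, keys):          # query: (B, D), keys: (B, T, D)
        scores = self.v(torch.tanh(self.W(keys) + self.U(query).unsqueeze(1)))
        return torch.softmax(scores.squeeze(-1), dim=1)          # attention weights (B, T)

class MultiplicativeAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)

    def forward(self, query, keys):
        scores = torch.bmm(keys, self.W(query).unsqueeze(-1)).squeeze(-1)
        return torch.softmax(scores, dim=1)

encoder = nn.LSTM(input_size=512, hidden_size=128, bidirectional=True, batch_first=True)
frames = torch.randn(2, 40, 512)              # 2 videos, 40 frame features each
states, _ = encoder(frames)                   # BiLSTM encoder states: (2, 40, 256)
attn = AdditiveAttention(256)
weights = attn(states[:, -1], states)         # attend with the last state as the query
print(weights.shape)                          # torch.Size([2, 40])
```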

Forecasting Hands and Objects in Future Frames

Title Forecasting Hands and Objects in Future Frames
Authors Chenyou Fan, Jangwon Lee, Michael S. Ryoo
Abstract This paper presents an approach to forecast future presence and location of human hands and objects. Given an image frame, the goal is to predict what objects will appear in the future frame (e.g., 5 seconds later) and where they will be located, even when they are not visible in the current frame. The key idea is that (1) an intermediate representation of a convolutional object recognition model abstracts scene information in its frame and that (2) we can predict (i.e., regress) such representations corresponding to the future frames based on that of the current frame. We design a new two-stream convolutional neural network (CNN) architecture for videos by extending the state-of-the-art convolutional object detection network, and present a new fully convolutional regression network for predicting future scene representations. Our experiments confirm that combining the regressed future representation with our detection network allows reliable estimation of future hands and objects in videos. We obtain much higher accuracy compared to the state-of-the-art future object presence forecast method on a public dataset.
Tasks Object Detection, Object Recognition
Published 2017-05-20
URL http://arxiv.org/abs/1705.07328v3
PDF http://arxiv.org/pdf/1705.07328v3.pdf
PWC https://paperswithcode.com/paper/forecasting-hands-and-objects-in-future
Repo
Framework
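
A hedged PyTorch sketch of the regression step: a small fully convolutional network maps the current intermediate feature map to a prediction of the future one. Channel counts and depth are assumptions; in the paper this regressor is combined with a two-stream detection network to localise future hands and objects.

```python
import torch
import torch.nn as nn

class FutureRepresentationRegressor(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1),            # predicted future feature map
        )

    def forward(self, current_feats):
        return self.net(current_feats)

regressor = FutureRepresentationRegressor()
current = torch.randn(1, 256, 32, 32)        # intermediate CNN features of frame t
future_pred = regressor(current)             # estimate of the features at frame t + k
# During training the target would be the representation of the actual future frame.
loss = nn.functional.mse_loss(future_pred, torch.randn_like(future_pred))
print(future_pred.shape, float(loss))
```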

The Devil is in the Decoder: Classification, Regression and GANs

Title The Devil is in the Decoder: Classification, Regression and GANs
Authors Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings
Abstract Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders that recover the original input resolution and produce low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise tasks ranging from classification and regression to synthesis. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce new residual-like connections for decoders. (3) We introduce a novel decoder: bilinear additive upsampling. (4) We explore prediction artifacts.
Tasks Boundary Detection, Depth Estimation, Semantic Segmentation
Published 2017-07-18
URL http://arxiv.org/abs/1707.05847v3
PDF http://arxiv.org/pdf/1707.05847v3.pdf
PWC https://paperswithcode.com/paper/the-devil-is-in-the-decoder-classification
Repo
Framework
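
A sketch of the bilinear additive upsampling operation named in contribution (3), as I read it from the abstract: bilinearly upsample the feature map, then reduce the channel count by summing fixed-size groups of channels, giving a parameter-free upsampling layer. The group size of 4 is an assumption for the example.

```python
import torch
import torch.nn.functional as F

def bilinear_additive_upsample(x, scale=2, group=4):
    """x: (B, C, H, W) with C divisible by `group`. Returns (B, C // group, sH, sW)."""
    b, c, h, w = x.shape
    assert c % group == 0, "channel count must be divisible by the group size"
    up = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
    up = up.view(b, c // group, group, h * scale, w * scale)
    return up.sum(dim=2)                      # add the channels within each group

feats = torch.randn(1, 64, 16, 16)
print(bilinear_additive_upsample(feats).shape)   # torch.Size([1, 16, 32, 32])
```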

MR fingerprinting Deep RecOnstruction NEtwork (DRONE)

Title MR fingerprinting Deep RecOnstruction NEtwork (DRONE)
Authors Ouri Cohen, Bo Zhu, Matthew S. Rosen
Abstract PURPOSE: Demonstrate a novel fast method for reconstruction of multi-dimensional MR Fingerprinting (MRF) data using Deep Learning methods. METHODS: A neural network (NN) is defined using the TensorFlow framework and trained on simulated MRF data computed using the Bloch equations. The accuracy of the NN reconstruction of noisy data is compared to conventional MRF template matching as a function of training data size, and quantified both in simulated numerical brain phantom data and in acquired data from the ISMRM/NIST phantom. The utility of the method is demonstrated in a healthy subject in vivo at 1.5 T. RESULTS: Network training required 10 minutes and, once trained, data reconstruction required approximately 10 ms. Reconstruction of simulated brain data using the NN resulted in a root-mean-square error (RMSE) of 3.5 ms for T1 and 7.8 ms for T2. The RMSE for the NN trained on sparse dictionaries was approximately 6-fold lower for T1 and 2-fold lower for T2 than conventional MRF dot-product dictionary matching on the same dictionaries. Phantom measurements yielded good agreement (R2=0.99) between the T1 and T2 estimated by the NN and reference values from the ISMRM/NIST phantom. CONCLUSION: Reconstruction of MRF data with a NN is accurate, 300-fold faster, and more robust to noise and undersampling than conventional MRF dictionary matching.
Tasks
Published 2017-10-15
URL http://arxiv.org/abs/1710.05267v3
PDF http://arxiv.org/pdf/1710.05267v3.pdf
PWC https://paperswithcode.com/paper/mr-fingerprinting-deep-reconstruction-network
Repo
Framework
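
A toy stand-in for the DRONE mapping, written in PyTorch for consistency with the other sketches on this page (the paper uses TensorFlow): a small fully connected network maps an MRF signal evolution to the (T1, T2) pair. The fingerprint length and layer sizes are assumptions; real training data would come from Bloch-equation simulations.

```python
import torch
import torch.nn as nn

fingerprint_len = 500                   # assumed number of time points per fingerprint
model = nn.Sequential(
    nn.Linear(fingerprint_len, 300), nn.Tanh(),
    nn.Linear(300, 300), nn.Tanh(),
    nn.Linear(300, 2),                  # outputs: (T1, T2)
)

# One gradient step on random stand-in data (real training uses simulated dictionaries).
signals = torch.randn(64, fingerprint_len)
targets = torch.rand(64, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(signals), targets)
loss.backward()
optimizer.step()
print(float(loss))
```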

Restricted Isometry Property of Gaussian Random Projection for Finite Set of Subspaces

Title Restricted Isometry Property of Gaussian Random Projection for Finite Set of Subspaces
Authors Gen Li, Yuantao Gu
Abstract Dimension reduction plays an essential role when decreasing the complexity of solving large-scale problems. The well-known Johnson-Lindenstrauss (JL) Lemma and Restricted Isometry Property (RIP) admit the use of random projection to reduce the dimension while keeping the Euclidean distance, which leads to the boom of Compressed Sensing and the field of sparsity related signal processing. Recently, successful applications of sparse models in computer vision and machine learning have increasingly hinted that the underlying structure of high dimensional data looks more like a union of subspaces (UoS). In this paper, motivated by the JL Lemma and the emerging field of Compressed Subspace Clustering (CSC), we study for the first time the RIP of Gaussian random matrices for the compression of two subspaces based on the generalized projection $F$-norm distance. We theoretically prove that with high probability the affinity or distance between two projected subspaces is concentrated around its estimate. When the ambient dimension after projection is sufficiently large, the affinity and distance between two subspaces almost remain unchanged after random projection. Numerical experiments verify the theoretical work.
Tasks Dimensionality Reduction
Published 2017-04-07
URL http://arxiv.org/abs/1704.02109v3
PDF http://arxiv.org/pdf/1704.02109v3.pdf
PWC https://paperswithcode.com/paper/restricted-isometry-property-of-gaussian
Repo
Framework
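
A small numerical check in the spirit of the experiments described above: draw two random subspaces, compress them with a Gaussian random matrix, and compare a subspace affinity before and after projection. The dimensions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def orthonormal_basis(A):
    q, _ = np.linalg.qr(A)
    return q

def affinity(U1, U2):
    # Affinity between subspaces with orthonormal bases U1, U2.
    return np.linalg.norm(U1.T @ U2) / np.sqrt(min(U1.shape[1], U2.shape[1]))

n, m, d = 1000, 200, 10                      # ambient dim, projected dim, subspace dim
U1 = orthonormal_basis(rng.normal(size=(n, d)))
U2 = orthonormal_basis(rng.normal(size=(n, d)))
Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian random projection
V1 = orthonormal_basis(Phi @ U1)
V2 = orthonormal_basis(Phi @ U2)
print("affinity before projection:", affinity(U1, U2))
print("affinity after projection :", affinity(V1, V2))   # concentrates near the value above
```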

Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

Title Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm
Authors Chelsea Finn, Sergey Levine
Abstract Learning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test inputs. Alternatively, a more recent approach to meta-learning aims to acquire deep representations that can be effectively fine-tuned, via standard gradient descent, to new tasks. In this paper, we consider the meta-learning problem from the perspective of universality, formalizing the notion of learning algorithm approximation and comparing the expressive power of the aforementioned recurrent models to the more recent approaches that embed gradient descent into the meta-learner. In particular, we seek to answer the following question: does deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm? We find that this is indeed true, and further find, in our experiments, that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.
Tasks Meta-Learning
Published 2017-10-31
URL http://arxiv.org/abs/1710.11622v3
PDF http://arxiv.org/pdf/1710.11622v3.pdf
PWC https://paperswithcode.com/paper/meta-learning-and-universality-deep
Repo
Framework
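
A minimal sketch of the gradient-based meta-learner the paper analyses: the learned "algorithm" is a gradient-descent step on a task's training data starting from shared initial parameters (a MAML-style setup). The linear model and single inner step are simplifications; the outer meta-update across tasks is omitted.

```python
import torch

def inner_adapt(w, x_train, y_train, lr=0.1):
    """One gradient-descent step on task data, starting from the shared initialisation w."""
    pred = x_train @ w                         # a linear model, for illustration only
    loss = ((pred - y_train) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w, create_graph=True)  # keep graph for the outer update
    return w - lr * grad                       # task-adapted parameters

w0 = torch.zeros(5, 1, requires_grad=True)     # meta-learned initialisation
x_train, y_train = torch.randn(10, 5), torch.randn(10, 1)
w_task = inner_adapt(w0, x_train, y_train)
# The test loss of the adapted model is what the outer meta-objective would minimise.
x_test, y_test = torch.randn(4, 5), torch.randn(4, 1)
test_loss = ((x_test @ w_task - y_test) ** 2).mean()
print(float(test_loss))
```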

Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

Title Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions
Authors Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu
Abstract In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representation of an interaction, with long-term joint sub-tasks of both agents and short-term atomic actions of individual agents. Based on a new RGB-D video dataset with rich instances of human interactions, our experiments on Baxter simulation, human evaluation, and real Baxter tests demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00503v1
PDF http://arxiv.org/pdf/1703.00503v1.pdf
PWC https://paperswithcode.com/paper/learning-social-affordance-grammar-from
Repo
Framework

Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

Title Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications
Authors Matthias Müller, Vincent Casser, Jean Lahoud, Neil Smith, Bernard Ghanem
Abstract We present a photo-realistic training and evaluation simulator (Sim4CV) with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator fully integrates several state-of-the-art tracking algorithms with a benchmark evaluation tool, as well as a deep neural network (DNN) architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground truth annotations to easily extend existing real-world datasets and provides extensive synthetic data variety through its ability to reconfigure synthetic worlds on the fly using an automatic world generation tool. The supplementary video can be viewed at https://youtu.be/SqAxzsQ7qUU
Tasks Autonomous Driving
Published 2017-08-19
URL http://arxiv.org/abs/1708.05869v2
PDF http://arxiv.org/pdf/1708.05869v2.pdf
PWC https://paperswithcode.com/paper/sim4cv-a-photo-realistic-simulator-for
Repo
Framework

Identifying Condition-Action Statements in Medical Guidelines Using Domain-Independent Features

Title Identifying Condition-Action Statements in Medical Guidelines Using Domain-Independent Features
Authors Hossein Hematialam, Wlodek Zadrozny
Abstract This paper advances the state of the art in text understanding of medical guidelines by releasing two new annotated clinical guidelines datasets, and establishing baselines for using machine learning to extract condition-action pairs. In contrast to prior work that relies on manually created rules, we report experiments with several supervised machine learning techniques to classify sentences as to whether they express conditions and actions. We show the limitations and possible extensions of this work on text mining of medical guidelines.
Tasks
Published 2017-06-13
URL http://arxiv.org/abs/1706.04206v2
PDF http://arxiv.org/pdf/1706.04206v2.pdf
PWC https://paperswithcode.com/paper/identifying-condition-action-statements-in
Repo
Framework
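
A hedged baseline for the sentence-classification task described above: a TF-IDF plus logistic-regression pipeline that labels guideline sentences as expressing a condition-action pair or not. The paper's features are domain-independent rather than plain bag-of-words, and the example sentences below are invented placeholders, not items from the released datasets.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy sentences; 1 = expresses a condition-action pair, 0 = does not.
sentences = [
    "If the patient is febrile, administer antibiotics.",
    "Diabetes is a chronic condition.",
    "When blood pressure exceeds 140/90, consider treatment.",
    "The committee reviewed the available evidence.",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)
print(clf.predict(["If fever persists, order a chest x-ray."]))
```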