January 28, 2020


Paper Group ANR 909

Optimal low rank tensor recovery. AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers. Decoding of visual-related information from the human EEG using an end-to-end deep learning approach. Joint Spatial and Angular Super-Resolution from a Single Image. Triangulation: Why Optimize?. Generating Highly Relevant Questions. Basis Prediction Ne …

Optimal low rank tensor recovery


Title	Optimal low rank tensor recovery
Authors	Jian-Feng Cai, Lizhang Miao, Yang Wang, Yin Xian
Abstract	We investigate the sample size requirement for exact recovery of a high-order tensor of low rank from a subset of its entries. In the Tucker decomposition framework, we show that the Riemannian optimization algorithm with initial value obtained from a spectral method can reconstruct a tensor of size $n\times n \times\cdots \times n$ and ranks $(r,\cdots,r)$ with high probability from as few as $O((r^d+dnr)\log(d))$ entries. For an order-3 tensor, the number of entries can be asymptotically as few as $O(nr)$ for a large low-rank tensor. We establish the condition under which recovery is theoretically guaranteed. The analysis relies on the tensor restricted isometry property (tensor RIP) and the curvature of the low-rank tensor manifold. Our algorithm is computationally efficient and easy to implement. Numerical results verify that the algorithm is able to recover a low-rank tensor from a minimal number of measurements. Experiments on hyperspectral image recovery also show that our algorithm can handle real-world signal processing problems.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05346v3
PDF	https://arxiv.org/pdf/1906.05346v3.pdf
PWC	https://paperswithcode.com/paper/optimal-low-rank-tensor-recovery
Repo
Framework
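
The $r^d+dnr$ term in the bound above equals the size of a Tucker representation: an $r^d$ core plus $d$ factor matrices of size $n \times r$. The sketch below compares that count with the $n^d$ entries of the full tensor. The function names and the example sizes are illustrative assumptions, not taken from the paper.

```python
def tucker_parameter_count(n: int, r: int, d: int) -> int:
    """Size of a Tucker representation: r^d core entries plus d factor matrices (n x r)."""
    return r ** d + d * n * r


def full_entry_count(n: int, d: int) -> int:
    """Number of entries in a dense order-d tensor with n entries per mode."""
    return n ** d


if __name__ == "__main__":
    n, r, d = 100, 5, 3  # hypothetical sizes chosen for illustration
    params = tucker_parameter_count(n, r, d)
    total = full_entry_count(n, d)
    print(f"Tucker parameters: {params}, full entries: {total}, ratio: {params / total:.6f}")
```

For a low-rank tensor the parameter count is a small fraction of the full tensor, which is why recovery from few entries is plausible at all.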

AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers


Title	AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers
Authors	Julian Faraone, Martin Kumm, Martin Hardieck, Peter Zipf, Xueyuan Liu, David Boland, Philip H. W. Leong
Abstract	Low-precision arithmetic operations to accelerate deep-learning applications on field-programmable gate arrays (FPGAs) have been studied extensively, because they offer the potential to save silicon area or increase throughput. However, these benefits come at the cost of a decrease in accuracy. In this article, we demonstrate that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than utilizing low-precision arithmetic. RCCMs multiply input values by a restricted choice of coefficients using only adders, subtractors, bit shifts, and multiplexers (MUXes), meaning that they can be heavily optimized for FPGAs. We propose a family of RCCMs tailored to FPGA logic elements to ensure their efficient utilization. To minimize information loss from quantization, we then develop novel training techniques that map the possible coefficient representations of the RCCMs to neural network weight parameter distributions. This enables the usage of the RCCMs in hardware, while maintaining high accuracy. We demonstrate the benefits of these techniques using AlexNet, ResNet-18, and ResNet-50 networks. The resulting implementations achieve up to 50% resource savings over traditional 8-bit quantized networks, translating to significant speedups and power savings. Our RCCM with the lowest resource requirements exceeds 6-bit fixed-point accuracy, and all other RCCM implementations achieve accuracy at least similar to an 8-bit uniformly quantized design while delivering significant resource savings.
Tasks	Quantization
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08097v1
PDF	https://arxiv.org/pdf/1911.08097v1.pdf
PWC	https://paperswithcode.com/paper/addnet-deep-neural-networks-using-fpga
Repo
Framework
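
A minimal software sketch of the idea behind an RCCM: a select signal picks one of a few fixed coefficients, and each product is formed from shifts, additions, and subtractions only. The coefficient set (5, 3, 6, 7) and the nearest-coefficient rounding are hypothetical choices for illustration; the paper's RCCM family and training techniques are not reproduced here.

```python
# Hypothetical coefficient set; the real RCCM designs in the paper may differ.
COEFFICIENTS = (5, 3, 6, 7)


def rccm_multiply(x: int, select: int) -> int:
    """Multiply x by COEFFICIENTS[select] using only shifts, adds, and subtracts."""
    options = (
        (x << 2) + x,         # 5x = 4x + x
        (x << 2) - x,         # 3x = 4x - x
        (x << 2) + (x << 1),  # 6x = 4x + 2x
        (x << 3) - x,         # 7x = 8x - x
    )
    return options[select]    # plays the role of the multiplexer


def nearest_coefficient(weight: float) -> int:
    """Snap a weight to the closest representable coefficient (plain rounding, an assumption)."""
    return min(COEFFICIENTS, key=lambda c: abs(c - weight))


if __name__ == "__main__":
    for select, coeff in enumerate(COEFFICIENTS):
        print(coeff, rccm_multiply(9, select))
```

Because only a restricted set of coefficients is reachable, weights have to be mapped onto that set, which is the quantization problem the abstract's training techniques address.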

Decoding of visual-related information from the human EEG using an end-to-end deep learning approach


Title	Decoding of visual-related information from the human EEG using an end-to-end deep learning approach
Authors	Lingling Yang, Leanne Lai Hang Chan, Yao Lu
Abstract	There is increasing interest in using deep learning approaches for EEG analysis, as there is still room for improvement in its accuracy. Convolutional long short-term memory (CNNLSTM) has been successfully applied to time series data with spatial structure through end-to-end learning. Here, we propose a CNNLSTM-based neural network architecture termed EEG_CNNLSTMNet for the classification of EEG signals in response to grating stimuli with different spatial frequencies. EEG_CNNLSTMNet comprises two convolutional layers and one bidirectional long short-term memory (LSTM) layer. The convolutional layers capture local temporal characteristics of the EEG signal at each channel as well as global spatial characteristics across channels, while the LSTM layer extracts long-term temporal dependency of EEG signals. Our experiment showed that EEG_CNNLSTMNet performed much better at EEG classification than a traditional machine learning approach, i.e. a support vector machine (SVM) with features. Additionally, EEG_CNNLSTMNet outperformed EEGNet, a state-of-the-art neural network architecture, in the intra-subject case. We infer that the underperformance when using an LSTM layer in the inter-subject case is due to long-term dependency characteristics in the EEG signal that vary greatly across subjects. Moreover, the inter-subject model fine-tuned on very little data from the new subject achieved much higher accuracy than the model trained only on data from the other subjects. Our study suggests that the fine-tuned inter-subject model can be a potential end-to-end EEG analysis method considering both the accuracy and the required training data of the new subject.
Tasks	EEG, Time Series
Published	2019-11-01
URL	https://arxiv.org/abs/1911.00550v3
PDF	https://arxiv.org/pdf/1911.00550v3.pdf
PWC	https://paperswithcode.com/paper/decoding-of-visual-related-information-from
Repo
Framework

Joint Spatial and Angular Super-Resolution from a Single Image


Title	Joint Spatial and Angular Super-Resolution from a Single Image
Authors	Andre Ivan, Williem, In Kyu Park
Abstract	Synthesizing a densely sampled light field from a single image is highly beneficial for many applications. Moreover, jointly solving both the angular and spatial super-resolution problems also introduces new possibilities in light field imaging. The conventional method relies on physical-based rendering and a secondary network to solve the angular super-resolution problem. In addition, pixel-based loss limits the network's capability to infer scene geometry globally. In this paper, we show that both super-resolution problems can be solved jointly from a single image by proposing a single end-to-end deep neural network that does not require a physical-based approach. Two novel loss functions based on known light field domain knowledge are proposed to enable the network to preserve the spatio-angular consistency between sub-aperture images. Experimental results show that the proposed model successfully synthesizes a dense high-resolution light field and outperforms the state-of-the-art method in both quantitative and qualitative criteria. The method can be generalized to arbitrary scenes, rather than focusing on a particular subject. The synthesized light field can be used for various applications, such as depth estimation and refocusing.
Tasks	Depth Estimation, Super-Resolution
Published	2019-11-23
URL	https://arxiv.org/abs/1911.11619v2
PDF	https://arxiv.org/pdf/1911.11619v2.pdf
PWC	https://paperswithcode.com/paper/joint-spatial-and-angular-super-resolution
Repo
Framework

Triangulation: Why Optimize?


Title	Triangulation: Why Optimize?
Authors	Seong Hun Lee, Javier Civera
Abstract	For decades, it has been widely accepted that the gold standard for two-view triangulation is to minimize the cost based on reprojection errors. In this work, we challenge this idea. We propose a novel alternative to the classic midpoint method that leads to significantly lower 2D errors and parallax errors. It provides a numerically stable closed-form solution based solely on a pair of backprojected rays. Since our solution is rotationally invariant, it can also be applied for fisheye and omnidirectional cameras. We show that for small parallax angles, our method outperforms the state-of-the-art in terms of combined 2D, 3D and parallax accuracy, while achieving comparable speed.
Tasks
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11917v2
PDF	https://arxiv.org/pdf/1907.11917v2.pdf
PWC	https://paperswithcode.com/paper/triangulation-why-optimize
Repo
Framework
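
For context, the classic midpoint method that the abstract uses as its point of comparison can be sketched as follows: given two backprojected rays, find the closest point on each ray and return their midpoint. This is the baseline, not the authors' proposed alternative, and the function and argument names are illustrative assumptions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def midpoint_triangulate(c1, d1, c2, d2, eps=1e-12):
    """Classic midpoint of rays c1 + s*d1 and c2 + t*d2 (camera centers c, directions d)."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; midpoint is not well defined")
    # Closed-form minimizers of the squared distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


if __name__ == "__main__":
    # Two cameras one unit either side of the origin, both looking at (0, 0, 5).
    print(midpoint_triangulate([-1, 0, 0], [1, 0, 5], [1, 0, 0], [-1, 0, 5]))
```

The denominator shrinks as the rays become parallel, which is one way to see why small parallax angles are the hard case the abstract singles out.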

Generating Highly Relevant Questions


Title	Generating Highly Relevant Questions
Authors	Jiazuo Qiu, Deyi Xiong
Abstract	Neural seq2seq-based question generation (QG) is prone to generating generic and undiversified questions that are poorly relevant to the given passage and target answer. In this paper, we propose two methods to address the issue. (1) With a partial copy mechanism, we prioritize words that are morphologically close to words in the input passage when generating questions; (2) with a QA-based reranker, we select from the n-best list of question candidates those questions that are preferred by both the QA and QG models. Experiments and analyses demonstrate that the two proposed methods substantially improve the relevance of generated questions to passages and answers.
Tasks	Question Generation
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03401v1
PDF	https://arxiv.org/pdf/1910.03401v1.pdf
PWC	https://paperswithcode.com/paper/generating-highly-relevant-questions
Repo
Framework
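
One way to picture reranking an n-best list with two models is a weighted combination of their scores. The sketch below assumes each candidate already carries a QG score and a QA score (for example log-probabilities); the scoring values, the linear combination, and the `weight` parameter are all hypothetical and are not claimed to be the paper's reranker.

```python
def rerank(candidates, weight=0.5):
    """Sort (question, qg_score, qa_score) tuples by a weighted sum of both scores.

    The linear combination is an assumption made for illustration only.
    """
    def combined(candidate):
        _, qg_score, qa_score = candidate
        return weight * qg_score + (1.0 - weight) * qa_score

    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    # Hypothetical n-best list: the generic question is fluent (high QG score)
    # but a QA model cannot recover the target answer from it (low QA score).
    n_best = [
        ("What is it?", -1.0, -5.0),
        ("Which method selects questions from the n-best list?", -2.0, -1.0),
    ]
    for question, _, _ in rerank(n_best):
        print(question)
```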

Basis Prediction Networks for Effective Burst Denoising with Large Kernels


Title	Basis Prediction Networks for Effective Burst Denoising with Large Kernels
Authors	Zhihao Xia, Federico Perazzi, Michaël Gharbi, Kalyan Sunkavalli, Ayan Chakrabarti
Abstract	Bursts of images exhibit significant self-similarity across both time and space. This motivates a representation of the kernels as linear combinations of a small set of basis elements. To this end, we introduce a novel basis prediction network that, given an input burst, predicts a set of global basis kernels — shared within the image — and the corresponding mixing coefficients — which are specific to individual pixels. Compared to other state-of-the-art deep learning techniques that output a large tensor of per-pixel spatiotemporal kernels, our formulation substantially reduces the dimensionality of the network output. This allows us to effectively exploit larger denoising kernels and achieve significant quality improvements (over 1dB PSNR) at reduced run-times compared to state-of-the-art methods.
Tasks	Denoising
Published	2019-12-09
URL	https://arxiv.org/abs/1912.04421v1
PDF	https://arxiv.org/pdf/1912.04421v1.pdf
PWC	https://paperswithcode.com/paper/basis-prediction-networks-for-effective-burst
Repo
Framework
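
The dimensionality argument in the abstract can be made concrete with a small sketch: each pixel's kernel is a linear combination of shared basis kernels, so the network only needs to output the basis plus per-pixel mixing coefficients. The function names and the example sizes below are hypothetical, chosen only to illustrate the counting.

```python
def mix_kernel(basis, coeffs):
    """Per-pixel kernel as a linear combination of flattened basis kernels."""
    length = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(length)]


def output_sizes(height, width, kernel_size, frames, num_basis):
    """Output values needed for direct per-pixel kernels vs. the basis formulation."""
    kernel_values = kernel_size * kernel_size * frames
    per_pixel = height * width * kernel_values
    basis_form = num_basis * kernel_values + height * width * num_basis
    return per_pixel, basis_form


if __name__ == "__main__":
    # Hypothetical sizes for illustration, not the paper's configuration.
    direct, with_basis = output_sizes(height=128, width=128, kernel_size=15, frames=8, num_basis=90)
    print(direct, with_basis, direct / with_basis)
```

The direct formulation grows with the kernel area at every pixel, while the basis formulation pays for the kernel area only once per basis element, which is what makes larger kernels affordable.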

Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes


Title	Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes
Authors	Li-Fang Cheng, Bianca Dumitrascu, Michael Zhang, Corey Chivers, Michael Draugelis, Kai Li, Barbara E. Engelhardt
Abstract	Multi-output Gaussian processes (GPs) are a flexible Bayesian nonparametric framework that has proven useful in jointly modeling the physiological states of patients in medical time series data. However, capturing the short-term effects of drugs and therapeutic interventions on patient physiological state remains challenging. We propose a novel approach that models the effect of interventions as a hybrid Gaussian process composed of a GP capturing patient physiology convolved with a latent force model capturing effects of treatments on specific physiological features. This convolution of a multi-output GP with a GP including a causal time-marked kernel leads to a well-characterized model of the patients’ physiological state responding to interventions. We show that our model leads to analytically tractable cross-covariance functions, allowing scalable inference. Our hierarchical model includes estimates of patient-specific effects but allows sharing of support across patients. Our approach achieves competitive predictive performance on challenging hospital data, where we recover patient-specific response to the administration of three common drugs: one antihypertensive drug and two anticoagulants.
Tasks	Gaussian Processes, Time Series
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00226v1
PDF	https://arxiv.org/pdf/1906.00226v1.pdf
PWC	https://paperswithcode.com/paper/190600226
Repo
Framework

Structure from Motion for Panorama-Style Videos


Title	Structure from Motion for Panorama-Style Videos
Authors	Chris Sweeney, Aleksander Holynski, Brian Curless, Steve M Seitz
Abstract	We present a novel Structure from Motion pipeline that is capable of reconstructing accurate camera poses for panorama-style video capture without prior camera intrinsic calibration. While panorama-style capture is common and convenient, previous reconstruction methods fail to obtain accurate reconstructions due to the rotation-dominant motion and small baseline between views. Our method is built on the assumption that the camera motion approximately corresponds to motion on a sphere, and we introduce three novel relative pose methods to estimate the fundamental matrix and camera distortion for spherical motion. These solvers are efficient and robust, and provide an excellent initialization for bundle adjustment. A soft prior on the camera poses is used to discourage large deviations from the spherical motion assumption when performing bundle adjustment, which allows cameras to remain properly constrained for optimization in the absence of well-triangulated 3D points. To validate the effectiveness of the proposed method we evaluate our approach on both synthetic and real-world data, and demonstrate that camera poses are accurate enough for multiview stereo.
Tasks	Calibration
Published	2019-06-08
URL	https://arxiv.org/abs/1906.03539v1
PDF	https://arxiv.org/pdf/1906.03539v1.pdf
PWC	https://paperswithcode.com/paper/structure-from-motion-for-panorama-style
Repo
Framework

Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning


Title	Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning
Authors	Woojun Kim, Myungsik Cho, Youngchul Sung
Abstract	In this paper, we propose a new learning technique named message-dropout to improve the performance for multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In the first application scenario of multi-agent systems in which direct message communication among agents is allowed, the message-dropout technique drops out the received messages from other agents in a block-wise manner with a certain probability in the training phase and compensates for this effect by multiplying the weights of the dropped-out block units with a correction probability. The applied message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication and makes learning robust against communication errors in the execution phase. In the second application scenario of centralized training with decentralized execution, we particularly consider the application of the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique for several games, and numerical results show that the proposed message-dropout technique with proper dropout rate improves the reinforcement learning performance significantly in terms of the training speed and the steady-state performance in the execution phase.
Tasks	Multi-agent Reinforcement Learning
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06527v1
PDF	http://arxiv.org/pdf/1902.06527v1.pdf
PWC	https://paperswithcode.com/paper/message-dropout-an-efficient-training-method
Repo
Framework
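
A minimal sketch of block-wise message dropout as described above: during training, each received message is dropped as a whole block with some probability; at execution, nothing is dropped and the messages are scaled instead. Scaling the message inputs by the keep probability `1 - drop_prob` is an assumption borrowed from standard dropout (for a linear first layer it is equivalent to scaling the corresponding weights); the paper's exact correction is not reproduced here, and all names are illustrative.

```python
import random


def message_dropout(messages, drop_prob, training, rng=random):
    """Apply block-wise dropout to a list of per-agent message vectors.

    training=True: zero out each whole message with probability drop_prob.
    training=False: keep every message and scale it by (1 - drop_prob).
    """
    if training:
        return [
            [0.0] * len(m) if rng.random() < drop_prob else list(m)
            for m in messages
        ]
    keep = 1.0 - drop_prob
    return [[keep * v for v in m] for m in messages]


if __name__ == "__main__":
    received = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # messages from three other agents
    print(message_dropout(received, 0.5, training=True, rng=random.Random(0)))
    print(message_dropout(received, 0.5, training=False))
```

Dropping whole blocks rather than individual units means the agent regularly trains with a message entirely missing, which matches the abstract's claim of robustness to communication errors at execution time.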

Multi-modal Sentiment Analysis using Deep Canonical Correlation Analysis


Title	Multi-modal Sentiment Analysis using Deep Canonical Correlation Analysis
Authors	Zhongkai Sun, Prathusha K Sarma, William Sethares, Erik P. Bucy
Abstract	This paper learns multi-modal embeddings from text, audio, and video views/modes of data in order to improve upon down-stream sentiment classification. The experimental framework also allows investigation of the relative contributions of the individual views in the final multi-modal embedding. Individual features derived from the three views are combined into a multi-modal embedding using Deep Canonical Correlation Analysis (DCCA) in two ways: i) One-Step DCCA and ii) Two-Step DCCA. This paper learns text embeddings using BERT, the current state-of-the-art in text encoders. We posit that this highly optimized algorithm dominates over the contribution of other views, though each view does contribute to the final result. Classification tasks are carried out on two benchmark datasets and on a new Debate Emotion dataset, and together these demonstrate that One-Step DCCA outperforms the current state-of-the-art in learning multi-modal embeddings.
Tasks	Sentiment Analysis
Published	2019-07-15
URL	https://arxiv.org/abs/1907.08696v1
PDF	https://arxiv.org/pdf/1907.08696v1.pdf
PWC	https://paperswithcode.com/paper/multi-modal-sentiment-analysis-using-deep
Repo
Framework

Multi-View Intact Space Learning


Title	Multi-View Intact Space Learning
Authors	Chang Xu, Dacheng Tao, Chao Xu
Abstract	It is practical to assume that an individual view is unlikely to be sufficient for effective multi-view learning. Therefore, integration of multi-view information is both valuable and necessary. In this paper, we propose the Multi-view Intact Space Learning (MISL) algorithm, which integrates the encoded complementary information in multiple views to discover a latent intact representation of the data. Even though each view on its own is insufficient, we show theoretically that by combining multiple views we can obtain abundant information for latent intact space learning. Employing the Cauchy loss (a technique used in statistical learning) as the error measurement strengthens robustness to outliers. We propose a new definition of multi-view stability and then derive the generalization error bound based on multi-view stability and Rademacher complexity, and show that the complementarity between multiple views is beneficial for the stability and generalization. MISL is efficiently optimized using a novel Iteratively Reweight Residuals (IRR) technique, whose convergence is theoretically analyzed. Experiments on synthetic data and real-world datasets demonstrate that MISL is an effective and promising algorithm for practical applications.
Tasks	Multi-View Learning
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02340v1
PDF	http://arxiv.org/pdf/1904.02340v1.pdf
PWC	https://paperswithcode.com/paper/multi-view-intact-space-learning
Repo
Framework
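
The robustness claim for the Cauchy loss can be seen by comparing it with the squared loss on a large residual. The sketch below uses one common scalar form, log(1 + (r/c)^2); the exact form and scale constant used in MISL are not given in the abstract, so both are assumptions here.

```python
import math


def cauchy_loss(residual: float, c: float = 1.0) -> float:
    """One common form of the Cauchy loss: grows logarithmically in the residual."""
    return math.log(1.0 + (residual / c) ** 2)


def squared_loss(residual: float) -> float:
    """Ordinary squared error, shown for comparison."""
    return residual ** 2


if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0, 100.0):
        print(f"residual={r:>6}: cauchy={cauchy_loss(r):.4f}, squared={squared_loss(r):.4f}")
```

A single outlier with a large residual dominates a sum of squared errors but contributes only logarithmically under the Cauchy loss, so it has far less pull on the fit.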

A Large-scale Dataset for Argument Quality Ranking: Construction and Analysis


Title	A Large-scale Dataset for Argument Quality Ranking: Construction and Analysis
Authors	Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, Noam Slonim
Abstract	Identifying the quality of free-text arguments has become an important task in the rapidly expanding field of computational argumentation. In this work, we explore the challenging task of argument quality ranking. To this end, we created a corpus of 30,497 arguments carefully annotated for point-wise quality, released as part of this work. To the best of our knowledge, this is the largest dataset annotated for point-wise argument quality, larger by a factor of five than previously released datasets. Moreover, we address the core issue of inducing a labeled score from crowd annotations by performing a comprehensive evaluation of different approaches to this problem. In addition, we analyze the quality dimensions that characterize this dataset. Finally, we present a neural method for argument quality ranking, which outperforms several baselines on our own dataset, as well as previous methods published for another dataset.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11408v1
PDF	https://arxiv.org/pdf/1911.11408v1.pdf
PWC	https://paperswithcode.com/paper/a-large-scale-dataset-for-argument-quality
Repo
Framework

Toward XAI for Intelligent Tutoring Systems: A Case Study


Title	Toward XAI for Intelligent Tutoring Systems: A Case Study
Authors	Vanessa Putnam, Lea Riegel, Cristina Conati
Abstract	Our research is a step toward understanding when explanations of AI-driven hints and feedback are useful in Intelligent Tutoring Systems (ITS).
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04464v1
PDF	https://arxiv.org/pdf/1912.04464v1.pdf
PWC	https://paperswithcode.com/paper/toward-xai-for-intelligent-tutoring-systems-a
Repo
Framework

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection


Title	TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection
Authors	Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun
Abstract	Current state-of-the-art approaches for spatio-temporal action detection have achieved impressive results but remain unsatisfactory for temporal extent detection. The main reason is that some ambiguous states resemble real actions and may be treated as target actions even by a well-trained network. In this paper, we define these ambiguous samples as “transitional states”, and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states. The proposed TACNet includes two main components, i.e., a temporal context detector and a transition-aware classifier. The temporal context detector can extract long-term context information with constant time complexity by constructing a recurrent network. The transition-aware classifier can further distinguish transitional states by classifying action and transitional states simultaneously. Therefore, the proposed TACNet can substantially improve the performance of spatio-temporal action detection. We extensively evaluate the proposed TACNet on the UCF101-24 and J-HMDB datasets. The experimental results demonstrate that TACNet obtains competitive performance on J-HMDB and significantly outperforms the state-of-the-art methods on the untrimmed UCF101-24 in terms of both frame-mAP and video-mAP.
Tasks	Action Detection
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13417v1
PDF	https://arxiv.org/pdf/1905.13417v1.pdf
PWC	https://paperswithcode.com/paper/tacnet-transition-aware-context-network-for-1
Repo
Framework