January 30, 2020

Paper Group ANR 376

Logic could be learned from images


Title	Logic could be learned from images
Authors	Qian Guo, Yuhua Qian, Xinyan Liang, Yanhong She, Deyu Li, Jiye Liang
Abstract	Logic reasoning is a significant ability of human intelligence and also an important task in artificial intelligence. Existing logic reasoning methods often require reasoning patterns to be designed beforehand. This raises an interesting question: can logic reasoning patterns be learned directly from given data? The problem is termed data concept logic (DCL). In this study, a task of learning logic from images, called the LiLi task, is proposed for the first time. The task is to learn and reason about the relation between two input images and one output image, without presetting any reasoning patterns. As a preliminary exploration, we design six LiLi data sets (Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication), in which each image is embedded with an n-digit number. Note that a learning model does not know beforehand the meaning of the n-digit number embedded in the images or the relation between the input images and the output image. To tackle the task, we apply many typical neural network models and obtain fruitful results. However, these models perform poorly on the difficult logic tasks. To address this further, a novel network framework called the divide and conquer model (DCM), which adds some prior information, is designed and achieves high testing accuracy.
Tasks
Published	2019-08-06
URL	https://arxiv.org/abs/1908.01931v1
PDF	https://arxiv.org/pdf/1908.01931v1.pdf
PWC	https://paperswithcode.com/paper/logic-could-be-learned-from-images
Repo
Framework
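
The six LiLi relations are ordinary integer operations, so the target number for each training triplet can be computed directly. The sketch below generates such (input, input, output) number triplets; the dictionary, the function name, the uniform sampling, and the choice to order operands so that Subtraction stays non-negative are all hypothetical, and rendering the numbers into images is left out.

```python
import operator
import random

# Hypothetical mapping from the six LiLi data-set names to Python integer operators.
RELATIONS = {
    "Bitwise And": operator.and_,
    "Bitwise Or": operator.or_,
    "Bitwise Xor": operator.xor,
    "Addition": operator.add,
    "Subtraction": operator.sub,
    "Multiplication": operator.mul,
}


def make_triplet(relation, n_digits, rng=random):
    """Return (a, b, c) with c = relation(a, b); a and b have at most n_digits digits.

    Assumption: operands are ordered so that a >= b, keeping Subtraction non-negative.
    """
    a = rng.randrange(10 ** n_digits)
    b = rng.randrange(10 ** n_digits)
    a, b = max(a, b), min(a, b)
    return a, b, RELATIONS[relation](a, b)
```

For example, `make_triplet("Bitwise Xor", 4, random.Random(7))` yields one reproducible sample; a learner in the LiLi setting would see only the rendered images, never the relation name.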

Improving the Gating Mechanism of Recurrent Neural Networks


Title	Improving the Gating Mechanism of Recurrent Neural Networks
Authors	Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu
Abstract	Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono-initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
Tasks	Image Classification, Language Modelling, Sequential Image Classification
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09890v1
PDF	https://arxiv.org/pdf/1910.09890v1.pdf
PWC	https://paperswithcode.com/paper/improving-the-gating-mechanism-of-recurrent-1
Repo
Framework
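
The saturation argument in this abstract can be checked with a little arithmetic: a constant gate value f keeps a fraction f^T of a memory after T steps, and the sigmoid's slope at output f is f(1 - f). This sketch only quantifies that reasoning; it does not implement the paper's two proposed gate modifications, which the abstract does not spell out, and the function names are illustrative.

```python
import math


def forget_gate_for_retention(retention, steps):
    """Constant gate value f such that f ** steps == retention."""
    return retention ** (1.0 / steps)


def sigmoid_grad_at(f):
    """Slope of the sigmoid w.r.t. its pre-activation, evaluated where its output is f."""
    return f * (1.0 - f)


def logit(f):
    """Pre-activation z such that sigmoid(z) == f."""
    return math.log(f / (1.0 - f))


for steps in (10, 100, 1000):
    f = forget_gate_for_retention(0.5, steps)
    print(f"T={steps:5d}  f={f:.6f}  z={logit(f):6.2f}  dsigma/dz={sigmoid_grad_at(f):.2e}")
```

Keeping half of a memory over T steps needs 1 - f of roughly ln 2 / T, so the gate's gradient shrinks about in proportion to 1/T while its pre-activation has to grow, which is the saturation regime the abstract describes.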

A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis


Title	A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis
Authors	Hideaki Hayashi, Taro Shibanoki, Keisuke Shima, Yuichi Kurita, Toshio Tsuji
Abstract	This paper proposes a probabilistic neural network developed on the basis of time-series discriminant component analysis (TSDCA) that can be used to classify high-dimensional time-series patterns. TSDCA involves the compression of high-dimensional time series into a lower-dimensional space using a set of orthogonal transformations and the calculation of posterior probabilities based on a continuous-density hidden Markov model with a Gaussian mixture model expressed in the reduced-dimensional space. The analysis can be incorporated into a neural network, named a time-series discriminant component network (TSDCN), so that the parameters of dimensionality reduction and classification can be obtained simultaneously as network coefficients by a learning algorithm based on backpropagation through time combined with the Lagrange multiplier method. The TSDCN is expected to enable high-accuracy classification of high-dimensional time-series patterns and to reduce the computation time needed for network training. The validity of the TSDCN is demonstrated in experiments on high-dimensional artificial data and EEG signals.
Tasks	Dimensionality Reduction, EEG, Time Series
Published	2019-11-14
URL	https://arxiv.org/abs/1911.06009v1
PDF	https://arxiv.org/pdf/1911.06009v1.pdf
PWC	https://paperswithcode.com/paper/a-recurrent-probabilistic-neural-network-with
Repo
Framework
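
The inference path described above (orthogonal projection, then class posteriors from continuous-density HMMs in the reduced space) can be sketched in plain Python. This is a simplified illustration, not the TSDCN: each HMM state uses one diagonal Gaussian instead of a Gaussian mixture, transition probabilities are assumed strictly positive, and nothing is trained (the paper learns the projection and HMM parameters jointly). All names are hypothetical.

```python
import math


def project(x, W):
    """Reduce one D-dim frame x to M dims with W (M rows of length D, assumed orthonormal)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]


def log_gauss_diag(z, mean, var):
    """Log density of a diagonal Gaussian (one per state, a simplification of a GMM)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (zi - m) ** 2 / v)
               for zi, m, v in zip(z, mean, var))


def logsumexp(vals):
    hi = max(vals)
    return hi + math.log(sum(math.exp(v - hi) for v in vals))


def hmm_loglik(frames, pi, A, means, vars_):
    """Forward algorithm in log space for one class-specific HMM (A entries must be > 0)."""
    K = len(pi)
    alpha = [math.log(pi[k]) + log_gauss_diag(frames[0], means[k], vars_[k]) for k in range(K)]
    for z in frames[1:]:
        alpha = [logsumexp([alpha[j] + math.log(A[j][k]) for j in range(K)])
                 + log_gauss_diag(z, means[k], vars_[k]) for k in range(K)]
    return logsumexp(alpha)


def class_posteriors(series, W, models, priors):
    """Project every frame, score each class HMM, and normalise with Bayes' rule."""
    frames = [project(x, W) for x in series]
    scores = [math.log(p) + hmm_loglik(frames, *m) for m, p in zip(models, priors)]
    total = logsumexp(scores)
    return [math.exp(s - total) for s in scores]
```

Each entry of `models` is a tuple `(pi, A, means, vars_)`; working in log space keeps the forward recursion from underflowing on long series.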

Speed estimation evaluation on the KITTI benchmark based on motion and monocular depth information


Title	Speed estimation evaluation on the KITTI benchmark based on motion and monocular depth information
Authors	Róbert-Adrian Rill
Abstract	In this technical report we investigate speed estimation of the ego-vehicle on the KITTI benchmark using state-of-the-art deep neural network based optical flow and single-view depth prediction methods. Using a straightforward intuitive approach and approximating a single scale factor, we evaluate several application schemes of the deep networks and formulate meaningful conclusions such as: combining depth information with optical flow improves speed estimation accuracy as opposed to using optical flow alone; the quality of the deep neural network methods influences speed estimation performance; using the depth and optical flow results from smaller crops of wide images degrades performance. With these observations in mind, we achieve an RMSE of less than 1 m/s for vehicle speed estimation using monocular images as input from recordings of the KITTI benchmark. Limitations and possible future directions are discussed as well.
Tasks	Depth Estimation, Optical Flow Estimation
Published	2019-07-16
URL	https://arxiv.org/abs/1907.06989v1
PDF	https://arxiv.org/pdf/1907.06989v1.pdf
PWC	https://paperswithcode.com/paper/speed-estimation-evaluation-on-the-kitti
Repo
Framework
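
The abstract mentions approximating a single scale factor and reports accuracy as RMSE. One plausible way to do both, assuming the scale is fit by least squares against ground-truth speeds, is shown below; the paper's actual procedure may differ, the function names are illustrative, and the sample numbers are made up.

```python
import math


def fit_scale(raw, target):
    """Closed-form least-squares scale s minimising sum((s * raw_i - target_i) ** 2)."""
    num = sum(r * t for r, t in zip(raw, target))
    den = sum(r * r for r in raw)
    return num / den


def rmse(pred, target):
    """Root-mean-square error between predicted and true speeds."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


# Hypothetical unscaled per-frame speeds (e.g. from flow + depth) and ground truth in m/s.
raw_speed = [2.1, 4.0, 5.9, 8.2]
true_speed = [4.0, 8.1, 12.0, 16.3]
s = fit_scale(raw_speed, true_speed)
print(f"scale={s:.3f}  RMSE={rmse([s * r for r in raw_speed], true_speed):.3f} m/s")
```

A single global factor is the simplest fix for the unknown scale of monocular depth; fitting it on one split and reporting RMSE on another avoids an optimistic error estimate.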

Automatic Short Answer Grading via Multiway Attention Networks


Title	Automatic Short Answer Grading via Multiway Attention Networks
Authors	Tiaoqiao Liu, Wenbiao Ding, Zhiwei Wang, Jiliang Tang, Gale Yan Huang, Zitao Liu
Abstract	Automatic short answer grading (ASAG), which autonomously scores student answers according to reference answers, provides a cost-effective and consistent approach for teaching professionals and can reduce their monotonous and tedious grading workloads. However, ASAG is a very challenging task for two reasons: (1) student answers are made up of free text, which requires deep semantic understanding; and (2) the questions are usually open-ended and span many domains in K-12 scenarios. In this paper, we propose a generalized end-to-end ASAG learning framework that aims to (1) autonomously extract linguistic information from both student and reference answers; and (2) accurately model the semantic relations between free-text student and reference answers in open-ended domains. The proposed ASAG model is evaluated on a large real-world K-12 dataset and can outperform the state-of-the-art baselines in terms of various evaluation metrics.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10166v1
PDF	https://arxiv.org/pdf/1909.10166v1.pdf
PWC	https://paperswithcode.com/paper/190910166
Repo
Framework
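
Modelling the relation between a student answer and a reference answer typically rests on attention between their token representations. The abstract does not list the attention variants the model combines, so this is only a hedged illustration of the basic building block: plain scaled dot-product attention from student tokens to reference tokens, assuming token vectors come from some embedding layer. It is not the paper's multiway architecture, and the names are illustrative.

```python
import math


def softmax(xs):
    hi = max(xs)
    exps = [math.exp(x - hi) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(student, reference):
    """For each student token vector, return a summary of the reference answer
    weighted by scaled dot-product attention (a single attention direction only)."""
    d = len(student[0])
    out = []
    for s in student:
        scores = [sum(a * b for a, b in zip(s, r)) / math.sqrt(d) for r in reference]
        weights = softmax(scores)
        out.append([sum(w * r[k] for w, r in zip(weights, reference)) for k in range(d)])
    return out
```

A grading model would compare each student token with its attended reference summary (for instance by concatenation or difference) before pooling and scoring; that part is not sketched here.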

JSCN: Joint Spectral Convolutional Network for Cross Domain Recommendation


Title	JSCN: Joint Spectral Convolutional Network for Cross Domain Recommendation
Authors	Zhiwei Liu, Lei Zheng, Jiawei Zhang, Jiayu Han, Philip S. Yu
Abstract	Cross-domain recommendation can alleviate the data sparsity problem in recommender systems. To transfer knowledge from one domain to another, one can either utilize neighborhood information or learn a direct mapping function. However, all existing methods ignore the high-order connectivity information in cross-domain recommendation and suffer from the domain-incompatibility problem. In this paper, we propose a Joint Spectral Convolutional Network (JSCN) for cross-domain recommendation. JSCN simultaneously performs multi-layer spectral convolutions on different graphs and jointly learns a domain-invariant user representation with a domain adaptive user mapping module. As a result, high-order comprehensive connectivity information can be extracted by the spectral convolutions, and the information can be transferred across domains with the domain-invariant user mapping. The domain adaptive user mapping module helps incompatible domains transfer knowledge to each other. Extensive experiments on 24 Amazon rating datasets show the effectiveness of JSCN in cross-domain recommendation, with a 9.2% improvement in recall and a 36.4% improvement in MAP compared with state-of-the-art methods. Our code is available online at https://github.com/JimLiu96/JSCN.
Tasks	Recommendation Systems
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08219v1
PDF	https://arxiv.org/pdf/1910.08219v1.pdf
PWC	https://paperswithcode.com/paper/jscn-joint-spectral-convolutional-network-for
Repo
Framework
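
A common, inexpensive instance of a spectral graph convolution is the first-order approximation with a renormalised adjacency matrix, A_hat = D^{-1/2}(A + I)D^{-1/2}, as used in graph convolutional networks. The sketch below uses that form as a stand-in; it is an assumption, since the abstract does not give JSCN's filter, its joint learning over several graphs, or its domain adaptive user mapping. Function names are illustrative.

```python
import math


def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for an undirected graph given as a 0/1 matrix."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def graph_conv(A, H, W):
    """One first-order spectral convolution layer: ReLU(A_hat @ H @ W)."""
    Z = matmul(matmul(normalized_adjacency(A), H), W)
    return [[max(0.0, z) for z in row] for row in Z]
```

Stacking k such layers mixes information from k-hop neighbourhoods, which is one way to capture the high-order connectivity the abstract refers to; on a user-item bipartite graph the node features H would be user and item embeddings.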

RNN Architecture Learning with Sparse Regularization


Title	RNN Architecture Learning with Sparse Regularization
Authors	Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith
Abstract	Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive memory usage and increased runtime. We present a structure learning method for learning sparse, parameter-efficient NLP models. Our method applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state automata (WFSAs). We take advantage of rational RNNs’ natural grouping of the weights, so the group lasso penalty directly removes WFSA states, substantially reducing the number of parameters in the model. Our experiments on a number of sentiment analysis datasets, using both GloVe and BERT embeddings, show that our approach learns neural structures which have fewer parameters without sacrificing performance relative to parameter-rich baselines. Our method also highlights the interpretable properties of rational RNNs. We show that sparsifying such models makes them easier to visualize, and we present models that rely exclusively on as few as three WFSAs after pruning more than 90% of the weights. We publicly release our code.
Tasks	Sentiment Analysis
Published	2019-09-06
URL	https://arxiv.org/abs/1909.03011v1
PDF	https://arxiv.org/pdf/1909.03011v1.pdf
PWC	https://paperswithcode.com/paper/rnn-architecture-learning-with-sparse
Repo
Framework
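
Group lasso adds a penalty lambda * sum_g ||w_g||_2 to the training loss. Because each group's norm is not squared, one standard way to optimise it, the proximal (group soft-thresholding) step, can set an entire group to zero at once, which is how pruning a group can remove a whole WFSA state. The sketch works on plain lists; how the paper groups rational-RNN weights and which optimiser it uses are not specified here, and the names are illustrative.

```python
import math


def group_lasso_penalty(groups, lam):
    """lam * sum of L2 norms, one norm per parameter group (e.g. one WFSA state)."""
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)


def prox_group_lasso(groups, step):
    """Proximal step: shrink each group's norm by `step`, zeroing the whole group
    when its norm is at most `step`."""
    out = []
    for g in groups:
        norm = math.sqrt(sum(w * w for w in g))
        scale = max(0.0, 1.0 - step / norm) if norm > 0 else 0.0
        out.append([scale * w for w in g])
    return out
```

In practice `step` would be the learning rate times lambda, applied after each gradient update; groups that end up exactly zero can be deleted from the model.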

An Underwater Image Enhancement Benchmark Dataset and Beyond


Title	An Underwater Image Enhancement Benchmark Dataset and Beyond
Authors	Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, Dacheng Tao
Abstract	Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image enhancement algorithms have been proposed in the last few years. However, these algorithms are mainly evaluated using either synthetic datasets or a few selected real-world images. It is thus unclear how these algorithms would perform on images acquired in the wild and how we could gauge progress in the field. To bridge this gap, we present the first comprehensive perceptual study and analysis of underwater image enhancement using large-scale real-world images. In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have corresponding reference images. We treat the remaining 60 underwater images, for which satisfactory reference images could not be obtained, as challenging data. Using this dataset, we conduct a comprehensive study of state-of-the-art underwater image enhancement algorithms, both qualitatively and quantitatively. In addition, we propose an underwater image enhancement network (called Water-Net) trained on this benchmark as a baseline, which indicates the generalizability of the proposed UIEB for training Convolutional Neural Networks (CNNs). The benchmark evaluations and the proposed Water-Net demonstrate the performance and limitations of state-of-the-art algorithms, which shed light on future research in underwater image enhancement. The dataset and code are available at https://li-chongyi.github.io/proj_benchmark.html.
Tasks	Image Enhancement
Published	2019-01-11
URL	https://arxiv.org/abs/1901.05495v2
PDF	https://arxiv.org/pdf/1901.05495v2.pdf
PWC	https://paperswithcode.com/paper/an-underwater-image-enhancement-benchmark
Repo
Framework
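
Because 890 of the UIEB images have reference images, full-reference quality metrics can be computed for them, whereas the 60 challenging images cannot be scored that way. PSNR is one common full-reference metric; whether and how the benchmark uses it is not stated in this abstract, so this is a hedged example that treats images as flat sequences of 8-bit values.

```python
import math


def psnr(output, reference, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((o - r) ** 2 for o, r in zip(output, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher is better; a real evaluation would flatten all colour channels of the enhanced image and its reference in the same order before calling `psnr`.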

Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders


Title	Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders
Authors	Taichi Sumi, Brian Kenji Iwana, Hideaki Hayashi, Seiichi Uchida
Abstract	This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously. In this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third, space sharing loss is used to encourage the modalities to share the same latent space by calculating the distance between the latent variables. Through the proposed method, mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06142v1
PDF	https://arxiv.org/pdf/1906.06142v1.pdf
PWC	https://paperswithcode.com/paper/modality-conversion-of-handwritten-patterns
Repo
Framework
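
The three training terms can be combined into one objective. In this sketch the reconstruction loss is mean squared error, the distribution loss is the usual KL divergence of each encoder's diagonal Gaussian from a standard normal prior, the space sharing loss is the squared Euclidean distance between the two latent vectors, and all terms are weighted equally; every one of those choices is an assumption, since the abstract only names the terms. Function names are illustrative.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_vae_loss(x_on, rec_on, x_off, rec_off, enc_on, enc_off, z_on, z_off):
    """Reconstruction + distribution + space-sharing terms, equally weighted (assumption).

    enc_* are (mu, logvar) pairs from each encoder; z_* are the sampled latent vectors.
    """
    reconstruction = mse(x_on, rec_on) + mse(x_off, rec_off)
    distribution = kl_to_standard_normal(*enc_on) + kl_to_standard_normal(*enc_off)
    space_sharing = sum((a - b) ** 2 for a, b in zip(z_on, z_off))
    return reconstruction + distribution + space_sharing
```

The space sharing term is zero only when the online and offline encoders place a character at the same latent point, which is what allows decoding one modality from the other's latent code.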

A Method for Arbitrary Instance Style Transfer


Title	A Method for Arbitrary Instance Style Transfer
Authors	Zhifeng Yu, Yusheng Wu, Tianyou Wang
Abstract	The ability to synthesize style and content of different images to form a visually coherent image holds great promise in various applications such as stylistic painting, design prototyping, image editing, and augmented reality. However, the majority of works in image style transfer have focused on transferring the style of an image to the entirety of another image, and only a very small number of works have experimented with methods to transfer style to an instance of another image. Researchers have proposed methods to circumvent the difficulty of transferring style to an instance in an arbitrary shape. In this paper, we propose a topologically inspired algorithm called Forward Stretching to tackle this problem by transforming an instance into a tensor representation, which allows us to transfer style to this instance itself directly. Forward Stretching maps pixels to specific positions and interpolates values between pixels to transform an instance into a tensor. This algorithm allows us to introduce a method to transfer arbitrary style to an instance in an arbitrary shape. We showcase the results of our method in this paper.
Tasks	Style Transfer
Published	2019-12-13
URL	https://arxiv.org/abs/1912.06347v1
PDF	https://arxiv.org/pdf/1912.06347v1.pdf
PWC	https://paperswithcode.com/paper/a-method-for-arbitrary-instance-style
Repo
Framework
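
The abstract gives only the outline of Forward Stretching: map pixels to positions, interpolate between them, obtain a tensor. The following is a hypothetical and much simpler illustration of that general idea, not the paper's topologically inspired algorithm: each row of an arbitrarily shaped instance mask is resampled by linear interpolation to a fixed width, producing a rectangular array that an ordinary style-transfer network could take as input. Function names are illustrative.

```python
def stretch_row(values, width):
    """Linearly resample a 1-D run of instance pixels to `width` samples."""
    if width == 1 or len(values) == 1:
        return [float(values[0])] * width
    out = []
    for i in range(width):
        pos = i * (len(values) - 1) / (width - 1)  # output index mapped back into the run
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def stretch_instance(image, mask, width):
    """Hypothetical row-wise stretch: keep each row's masked pixels and resample them
    to a fixed width. Rows without instance pixels are skipped, giving a dense rectangle."""
    rows = []
    for img_row, mask_row in zip(image, mask):
        run = [p for p, m in zip(img_row, mask_row) if m]
        if run:
            rows.append(stretch_row(run, width))
    return rows
```

Pasting the stylised result back would need the inverse mapping (resampling each row back to its original run length), which is omitted here.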

Galaxy Learning – A Position Paper


Title	Galaxy Learning – A Position Paper
Authors	Chao Wu, Jun Xiao, Gang Huang, Fei Wu
Abstract	The recent rapid development of artificial intelligence (AI, mainly driven by machine learning research, especially deep learning) has achieved phenomenal success in various applications. However, to further apply AI technologies in real-world contexts, several significant issues regarding the AI ecosystem should be addressed. We identify the main issues as data privacy, ownership, and exchange, which are difficult to solve with the current centralized paradigm of machine learning training methodology. As a result, we propose a novel model training paradigm based on blockchain, named Galaxy Learning, which aims to train a model with distributed data and to preserve data ownership for the data owners. In this new paradigm, encrypted models are moved around instead of data, and are federated once trained. Model training, as well as communication, is achieved with blockchain and its smart contracts. Pricing of training data is determined by its contribution, and therefore it does not involve the exchange of data ownership. In this position paper, we describe the motivation, paradigm, design, and challenges as well as opportunities of Galaxy Learning.
Tasks
Published	2019-04-22
URL	https://arxiv.org/abs/1905.00753v1
PDF	https://arxiv.org/pdf/1905.00753v1.pdf
PWC	https://paperswithcode.com/paper/190500753
Repo
Framework
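
The abstract says trained models are "federated" and that data is priced by its contribution, without saying how. For the first step, one common choice is sample-weighted parameter averaging as in federated averaging (FedAvg), swapped in here as an assumption; the proportional pricing rule is purely hypothetical, and encryption and the blockchain and smart-contract layer are left out entirely.

```python
def federated_average(models, sizes):
    """Average parameter vectors weighted by each owner's sample count (FedAvg-style)."""
    total = sum(sizes)
    dim = len(models[0])
    return [sum(m[k] * n for m, n in zip(models, sizes)) / total for k in range(dim)]


def contribution_prices(contributions, budget):
    """Hypothetical pricing rule: split a budget in proportion to measured contributions."""
    total = sum(contributions)
    return [budget * c / total for c in contributions]
```

How a contribution would be measured (for example, the change in validation accuracy when a data owner's update is included) is an open design choice that this sketch does not address.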

Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN


Title	Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN
Authors	Duy Tin Vo, Richard Khoury
Abstract	Language Identification (LID) is a challenging task, especially when the input texts are short and noisy, such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled either by designing a feature set for a traditional classifier (e.g., Naive Bayes) or by applying a deep neural network classifier (e.g., Bi-directional Gated Recurrent Unit, Encoder-Decoder). These methods are usually trained and tested on a huge amount of private data, then used and evaluated as off-the-shelf packages by other researchers using their own datasets, and consequently the various results published are not directly comparable. In this paper, we first create a new massive labelled dataset based on one year of Twitter data. We use this dataset to test several existing language identification systems, in order to obtain a set of coherent benchmarks, and we make our dataset publicly available so that others can add to this set of benchmarks. Finally, we propose a shallow but efficient neural LID system, which is an n-gram-regional convolutional neural network enhanced with an attention mechanism. Experimental results show that our architecture is able to predict tens of thousands of samples per second and surpasses all state-of-the-art systems with an improvement of 5%.
Tasks	Language Identification
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06748v1
PDF	https://arxiv.org/pdf/1910.06748v1.pdf
PWC	https://paperswithcode.com/paper/language-identification-on-massive-datasets
Repo
Framework
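
The proposed system combines n-gram regions of short messages with an attention mechanism. As a hedged illustration of those two ingredients, assuming character-level n-grams, this sketch extracts n-grams from a message and pools per-n-gram feature vectors with dot-product attention. The n-gram size, the space padding, and the `query` vector (standing in for a learned parameter) are assumptions, and the paper's convolutional layers are not reproduced.

```python
import math


def char_ngrams(text, n):
    """Overlapping character n-grams of a short message, padded with one space each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def attention_pool(vectors, query):
    """Softmax-weighted average of feature vectors, scored by dot product with `query`."""
    scores = [sum(v_k * q_k for v_k, q_k in zip(v, query)) for v in vectors]
    hi = max(scores)
    weights = [math.exp(s - hi) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(query))]
```

For example, `char_ngrams("hola", 3)` gives four trigrams including the word-boundary grams " ho" and "la ", which carry much of the language signal in very short texts.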

Predicting video saliency using crowdsourced mouse-tracking data


Title	Predicting video saliency using crowdsourced mouse-tracking data
Authors	Vitaliy Lyudvichenko, Dmitriy Vatolin
Abstract	This paper presents a new way of getting high-quality saliency maps for video, using a cheaper alternative to eye-tracking data. We designed a mouse-contingent video viewing system which simulates the viewers’ peripheral vision based on the position of the mouse cursor. The system enables the use of mouse-tracking data recorded from an ordinary computer mouse as an alternative to real gaze fixations recorded by a more expensive eye-tracker. We developed a crowdsourcing system that enables the collection of such mouse-tracking data at large scale. Using the collected mouse-tracking data, we showed that it can serve as an approximation of eye-tracking data. Moreover, to increase the efficiency of the collected mouse-tracking data, we proposed a novel deep neural network algorithm that improves the quality of mouse-tracking saliency maps.
Tasks	Eye Tracking
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00480v1
PDF	https://arxiv.org/pdf/1907.00480v1.pdf
PWC	https://paperswithcode.com/paper/predicting-video-saliency-using-crowdsourced
Repo
Framework
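
A common way to turn fixation samples into a saliency map is to place a Gaussian at each sample and normalise the result. The sketch does this for one video frame, treating cursor positions exactly like gaze fixations; that equivalence, the Gaussian width `sigma` (in pixels), and the omission of the paper's peripheral-vision simulation and neural refinement are all assumptions. The function name is illustrative.

```python
import math


def mouse_saliency(points, height, width, sigma=2.0):
    """Accumulate Gaussian blobs at cursor positions (x, y) and normalise the map to sum to 1."""
    grid = [[0.0] * width for _ in range(height)]
    for (px, py) in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]
```

Pooling cursor samples from many crowdsourced viewers of the same frame into `points` before blurring is what lets the map approximate a group-level saliency distribution.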

Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius


Title	Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius
Authors	Junjie Jiang, Ying-Cheng Lai
Abstract	A common difficulty in applications of machine learning is the lack of any general principle for guiding the choices of key parameters of the underlying neural network. Focusing on a class of recurrent neural networks - reservoir computing systems that have recently been exploited for model-free prediction of nonlinear dynamical systems, we uncover a surprising phenomenon: the emergence of an interval in the spectral radius of the neural network in which the prediction error is minimized. In a three-dimensional representation of the error versus time and spectral radius, the interval corresponds to the bottom region of a “valley.” Such a valley arises for a variety of spatiotemporal dynamical systems described by nonlinear partial differential equations, regardless of the structure and the edge-weight distribution of the underlying reservoir network. We also find that, while the particular location and size of the valley would depend on the details of the target system to be predicted, the interval tends to be larger for undirected than for directed networks. The valley phenomenon can be beneficial to the design of optimal reservoir computing, representing a small step forward in understanding these machine-learning systems.
Tasks
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04426v1
PDF	https://arxiv.org/pdf/1910.04426v1.pdf
PWC	https://paperswithcode.com/paper/model-free-prediction-of-spatiotemporal
Repo
Framework
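
The spectral radius studied here is usually controlled by rescaling a random reservoir matrix by the ratio of the target radius to its current largest eigenvalue magnitude. This is a generic echo-state-network sketch assuming NumPy, a sparse uniform random reservoir, and tanh units; the paper's network construction, input coupling, and readout training are not reproduced, and function names are illustrative. The `undirected` flag mirrors the directed versus undirected comparison in the abstract by symmetrising the weights.

```python
import numpy as np


def scaled_reservoir(n, target_radius, density=0.1, undirected=False, seed=0):
    """Random sparse reservoir matrix rescaled to a chosen spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n, n)) * (rng.random((n, n)) < density)
    if undirected:
        W = np.triu(W) + np.triu(W, 1).T  # symmetric weights = undirected network
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (target_radius / radius)


def run_reservoir(W, W_in, inputs):
    """Echo-state update r_{t+1} = tanh(W r_t + W_in u_t); returns all states."""
    r = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        r = np.tanh(W @ r + W_in @ u)
        states.append(r)
    return np.array(states)
```

Sweeping `target_radius`, training a readout on the collected states, and recording prediction error over time is one way to look for the error "valley" described in the abstract.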

Deep Eyedentification: Biometric Identification using Micro-Movements of the Eye


Title	Deep Eyedentification: Biometric Identification using Micro-Movements of the Eye
Authors	Lena A. Jäger, Silvia Makowski, Paul Prasse, Sascha Liehr, Maximilian Seidler, Tobias Scheffer
Abstract	We study involuntary micro-movements of the eye for biometric identification. While prior studies extract lower-frequency macro-movements from the output of video-based eye-tracking systems and engineer explicit features of these macro-movements, we develop a deep convolutional architecture that processes the raw eye-tracking signal. Compared to prior work, the network attains a lower error rate by one order of magnitude and is faster by two orders of magnitude: it identifies users accurately within seconds.
Tasks	Eye Tracking
Published	2019-06-20
URL	https://arxiv.org/abs/1906.11889v3
PDF	https://arxiv.org/pdf/1906.11889v3.pdf
PWC	https://paperswithcode.com/paper/deep-eyedentification-biometric
Repo
Framework
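
The network consumes the raw eye-tracking signal rather than engineered macro-movement features. A hedged sketch of the first steps such a pipeline might take: converting raw gaze positions (in degrees of visual angle) into velocities by finite differences, then applying a 1-D convolution over the sequence. The use of velocities, the sampling rate, and the kernel are assumptions; the paper's architecture is not reproduced, and function names are illustrative.

```python
def gaze_velocity(positions_deg, sampling_hz):
    """Finite-difference velocity (deg/s) from raw gaze positions (x, y) in degrees."""
    return [((x1 - x0) * sampling_hz, (y1 - y0) * sampling_hz)
            for (x0, y0), (x1, y1) in zip(positions_deg, positions_deg[1:])]


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the basic operation of a convolutional layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

Micro-movements are tiny and fast, so a high sampling rate matters: at 1000 Hz a 0.01 degree step between samples already corresponds to 10 deg/s.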