Paper Group ANR 376
Logic could be learned from images
Title | Logic could be learned from images |
Authors | Qian Guo, Yuhua Qian, Xinyan Liang, Yanhong She, Deyu Li, Jiye Liang |
Abstract | Logic reasoning is a significant ability of human intelligence and an important task in artificial intelligence. Existing logic reasoning methods often require reasoning patterns to be designed beforehand. This raises an interesting question: can logic reasoning patterns be learned directly from given data? The problem is termed data concept logic (DCL). In this study, we propose the task of learning logic from images, called the LiLi task. The task is to learn and reason about the relation between two input images and one output image without presetting any reasoning patterns. As a preliminary exploration, we design six LiLi data sets (Bitwise And, Bitwise Or, Bitwise Xor, Addition, Subtraction and Multiplication), in which each image is embedded with an n-digit number. Notably, the learning model does not know beforehand the meaning of the n-digit number embedded in the images or the relation between the input images and the output image. To tackle the task, we apply many typical neural network models and obtain fruitful results. However, these models perform poorly on the difficult logic tasks. To address this further, we design a novel network framework, the divide and conquer model (DCM), which adds some prior information and achieves a high testing accuracy. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.01931v1 |
https://arxiv.org/pdf/1908.01931v1.pdf | |
PWC | https://paperswithcode.com/paper/logic-could-be-learned-from-images |
Repo | |
Framework | |
Improving the Gating Mechanism of Recurrent Neural Networks
Title | Improving the Gating Mechanism of Recurrent Neural Networks |
Authors | Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu |
Abstract | Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono-initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved. |
Tasks | Image Classification, Language Modelling, Sequential Image Classification |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09890v1 |
https://arxiv.org/pdf/1910.09890v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-the-gating-mechanism-of-recurrent-1 |
Repo | |
Framework | |
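
The saturation problem described in the abstract above can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a plain sigmoid gate and a simplified recurrence in which the state is multiplied by a constant gate value at every step. The helper names `sigmoid_grad` and `retained_fraction` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid with respect to its pre-activation."""
    s = sigmoid(x)
    return s * (1.0 - s)


def retained_fraction(gate: float, steps: int) -> float:
    """Fraction of the state left after `steps` updates.

    Assumption (for illustration only): the state decays as
    c_t = gate * c_{t-1} with a constant gate value.
    """
    return gate ** steps


if __name__ == "__main__":
    # A gate must sit close to 1 to carry information over many steps...
    for gate in (0.9, 0.99, 0.999):
        print(f"gate={gate}: retained after 100 steps = {retained_fraction(gate, 100):.4f}")
    # ...but the gradient through the sigmoid shrinks as the gate saturates.
    for pre_activation in (0.0, 2.0, 6.0):
        print(f"x={pre_activation}: gate={sigmoid(pre_activation):.4f}, "
              f"gradient={sigmoid_grad(pre_activation):.4f}")
```

The two loops show the tension the abstract refers to: long memory needs gate values near 1, which is exactly where the sigmoid gradient is smallest.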
A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis
Title | A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis |
Authors | Hideaki Hayashi, Taro Shibanoki, Keisuke Shima, Yuichi Kurita, Toshio Tsuji |
Abstract | This paper proposes a probabilistic neural network developed on the basis of time-series discriminant component analysis (TSDCA) that can be used to classify high-dimensional time-series patterns. TSDCA involves the compression of high-dimensional time series into a lower-dimensional space using a set of orthogonal transformations and the calculation of posterior probabilities based on a continuous-density hidden Markov model with a Gaussian mixture model expressed in the reduced-dimensional space. The analysis can be incorporated into a neural network, which is named a time-series discriminant component network (TSDCN), so that parameters of dimensionality reduction and classification can be obtained simultaneously as network coefficients according to a backpropagation through time-based learning algorithm with the Lagrange multiplier method. The TSDCN is considered to enable high-accuracy classification of high-dimensional time-series patterns and to reduce the computation time taken for network training. The validity of the TSDCN is demonstrated for high-dimensional artificial data and EEG signals in the experiments conducted during the study. |
Tasks | Dimensionality Reduction, EEG, Time Series |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06009v1 |
https://arxiv.org/pdf/1911.06009v1.pdf | |
PWC | https://paperswithcode.com/paper/a-recurrent-probabilistic-neural-network-with |
Repo | |
Framework | |
Speed estimation evaluation on the KITTI benchmark based on motion and monocular depth information
Title | Speed estimation evaluation on the KITTI benchmark based on motion and monocular depth information |
Authors | Róbert-Adrian Rill |
Abstract | In this technical report we investigate speed estimation of the ego-vehicle on the KITTI benchmark using state-of-the-art deep neural network based optical flow and single-view depth prediction methods. Using a straightforward intuitive approach and approximating a single scale factor, we evaluate several application schemes of the deep networks and formulate meaningful conclusions such as: combining depth information with optical flow improves speed estimation accuracy as opposed to using optical flow alone; the quality of the deep neural network methods influences speed estimation performance; using the depth and optical flow results from smaller crops of wide images degrades performance. With these observations in mind, we achieve a RMSE of less than 1 m/s for vehicle speed estimation using monocular images as input from recordings of the KITTI benchmark. Limitations and possible future directions are discussed as well. |
Tasks | Depth Estimation, Optical Flow Estimation |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.06989v1 |
https://arxiv.org/pdf/1907.06989v1.pdf | |
PWC | https://paperswithcode.com/paper/speed-estimation-evaluation-on-the-kitti |
Repo | |
Framework | |
Automatic Short Answer Grading via Multiway Attention Networks
Title | Automatic Short Answer Grading via Multiway Attention Networks |
Authors | Tiaoqiao Liu, Wenbiao Ding, Zhiwei Wang, Jiliang Tang, Gale Yan Huang, Zitao Liu |
Abstract | Automatic short answer grading (ASAG), which autonomously scores student answers according to reference answers, provides a cost-effective and consistent approach to teaching professionals and can reduce their monotonous and tedious grading workloads. However, ASAG is a very challenging task for two reasons: (1) student answers are made up of free text, which requires a deep semantic understanding; and (2) the questions are usually open-ended and span many domains in K-12 scenarios. In this paper, we propose a generalized end-to-end ASAG learning framework which aims to (1) autonomously extract linguistic information from both student and reference answers; and (2) accurately model the semantic relations between free-text student and reference answers in open-ended domains. The proposed ASAG model is evaluated on a large real-world K-12 dataset and can outperform the state-of-the-art baselines in terms of various evaluation metrics. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10166v1 |
https://arxiv.org/pdf/1909.10166v1.pdf | |
PWC | https://paperswithcode.com/paper/190910166 |
Repo | |
Framework | |
JSCN: Joint Spectral Convolutional Network for Cross Domain Recommendation
Title | JSCN: Joint Spectral Convolutional Network for Cross Domain Recommendation |
Authors | Zhiwei Liu, Lei Zheng, Jiawei Zhang, Jiayu Han, Philip S. Yu |
Abstract | Cross-domain recommendation can alleviate the data sparsity problem in recommender systems. To transfer the knowledge from one domain to another, one can either utilize the neighborhood information or learn a direct mapping function. However, all existing methods ignore the high-order connectivity information in cross-domain recommendation area and suffer from the domain-incompatibility problem. In this paper, we propose a Joint Spectral Convolutional Network (JSCN) for cross-domain recommendation. JSCN will simultaneously operate multi-layer spectral convolutions on different graphs, and jointly learn a domain-invariant user representation with a domain adaptive user mapping module. As a result, the high-order comprehensive connectivity information can be extracted by the spectral convolutions and the information can be transferred across domains with the domain-invariant user mapping. The domain adaptive user mapping module can help the incompatible domains to transfer the knowledge across each other. Extensive experiments on 24 Amazon rating datasets show the effectiveness of JSCN in the cross-domain recommendation, with 9.2% improvement on recall and 36.4% improvement on MAP compared with state-of-the-art methods. Our code is available online at https://github.com/JimLiu96/JSCN. |
Tasks | Recommendation Systems |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08219v1 |
https://arxiv.org/pdf/1910.08219v1.pdf | |
PWC | https://paperswithcode.com/paper/jscn-joint-spectral-convolutional-network-for |
Repo | |
Framework | |
RNN Architecture Learning with Sparse Regularization
Title | RNN Architecture Learning with Sparse Regularization |
Authors | Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith |
Abstract | Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive memory usage and increased runtime. We present a structure learning method for learning sparse, parameter-efficient NLP models. Our method applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state automata (WFSAs). We take advantage of rational RNNs’ natural grouping of the weights, so the group lasso penalty directly removes WFSA states, substantially reducing the number of parameters in the model. Our experiments on a number of sentiment analysis datasets, using both GloVe and BERT embeddings, show that our approach learns neural structures which have fewer parameters without sacrificing performance relative to parameter-rich baselines. Our method also highlights the interpretable properties of rational RNNs. We show that sparsifying such models makes them easier to visualize, and we present models that rely exclusively on as few as three WFSAs after pruning more than 90% of the weights. We publicly release our code. |
Tasks | Sentiment Analysis |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03011v1 |
https://arxiv.org/pdf/1909.03011v1.pdf | |
PWC | https://paperswithcode.com/paper/rnn-architecture-learning-with-sparse |
Repo | |
Framework | |
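
The group lasso penalty mentioned in the abstract above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' released code: each parameter group is represented as a NumPy array (standing in for the weights tied to one WFSA state), the penalty is the sum of unsquared L2 norms, and the common per-group size weighting is omitted. The function names and the pruning tolerance `tol` are hypothetical.

```python
import numpy as np


def group_lasso_penalty(groups, lam=1.0):
    """Sum of (unsquared) L2 norms, one term per parameter group.

    Because the norm is not squared, the penalty tends to push whole
    groups to exactly zero rather than shrinking individual weights.
    """
    return lam * sum(float(np.linalg.norm(g)) for g in groups)


def surviving_groups(groups, tol=1e-3):
    """Indices of groups whose norm is still above `tol` (hypothetical threshold)."""
    return [i for i, g in enumerate(groups) if np.linalg.norm(g) > tol]


if __name__ == "__main__":
    # Two toy groups: one active, one already driven to zero.
    groups = [np.array([3.0, 4.0]), np.zeros(2)]
    print("penalty:", group_lasso_penalty(groups, lam=0.1))
    print("kept groups:", surviving_groups(groups))
```

In training, a penalty of this form would be added to the task loss; groups that end up with near-zero norm can then be removed, which is how a group-level regularizer reduces parameter count.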
An Underwater Image Enhancement Benchmark Dataset and Beyond
Title | An Underwater Image Enhancement Benchmark Dataset and Beyond |
Authors | Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, Dacheng Tao |
Abstract | Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image enhancement algorithms have been proposed in the last few years. However, these algorithms are mainly evaluated using either synthetic datasets or a few selected real-world images. It is thus unclear how these algorithms would perform on images acquired in the wild and how we could gauge the progress in the field. To bridge this gap, we present the first comprehensive perceptual study and analysis of underwater image enhancement using large-scale real-world images. In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have corresponding reference images. We treat the remaining 60 underwater images, for which satisfactory reference images cannot be obtained, as challenging data. Using this dataset, we conduct a comprehensive study of the state-of-the-art underwater image enhancement algorithms qualitatively and quantitatively. In addition, we propose an underwater image enhancement network (called Water-Net) trained on this benchmark as a baseline, which indicates the generalization of the proposed UIEB for training Convolutional Neural Networks (CNNs). The benchmark evaluations and the proposed Water-Net demonstrate the performance and limitations of state-of-the-art algorithms, which shed light on future research in underwater image enhancement. The dataset and code are available at https://li-chongyi.github.io/proj_benchmark.html. |
Tasks | Image Enhancement |
Published | 2019-01-11 |
URL | https://arxiv.org/abs/1901.05495v2 |
https://arxiv.org/pdf/1901.05495v2.pdf | |
PWC | https://paperswithcode.com/paper/an-underwater-image-enhancement-benchmark |
Repo | |
Framework | |
Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders
Title | Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders |
Authors | Taichi Sumi, Brian Kenji Iwana, Hideaki Hayashi, Seiichi Uchida |
Abstract | This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously. In this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third, space sharing loss is used to encourage the modalities to share the same latent space by calculating the distance between the latent variables. Through the proposed method mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06142v1 |
https://arxiv.org/pdf/1906.06142v1.pdf | |
PWC | https://paperswithcode.com/paper/modality-conversion-of-handwritten-patterns |
Repo | |
Framework | |
A Method for Arbitrary Instance Style Transfer
Title | A Method for Arbitrary Instance Style Transfer |
Authors | Zhifeng Yu, Yusheng Wu, Tianyou Wang |
Abstract | The ability to synthesize style and content of different images to form a visually coherent image holds great promise in various applications such as stylistic painting, design prototyping, image editing, and augmented reality. However, the majority of works in image style transfer have focused on transferring the style of an image to the entirety of another image, and only a very small number of works have experimented on methods to transfer style to an instance of another image. Researchers have proposed methods to circumvent the difficulty of transferring style to an instance in an arbitrary shape. In this paper, we propose a topologically inspired algorithm called Forward Stretching to tackle this problem by transforming an instance into a tensor representation, which allows us to transfer style to this instance itself directly. Forward Stretching maps pixels to specific positions and interpolates values between pixels to transform an instance to a tensor. This algorithm allows us to introduce a method to transfer arbitrary style to an instance in an arbitrary shape. We showcase the results of our method in this paper. |
Tasks | Style Transfer |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06347v1 |
https://arxiv.org/pdf/1912.06347v1.pdf | |
PWC | https://paperswithcode.com/paper/a-method-for-arbitrary-instance-style |
Repo | |
Framework | |
Galaxy Learning – A Position Paper
Title | Galaxy Learning – A Position Paper |
Authors | Chao Wu, Jun Xiao, Gang Huang, Fei Wu |
Abstract | The recent rapid development of artificial intelligence (AI, mainly driven by machine learning research, especially deep learning) has achieved phenomenal success in various applications. However, to further apply AI technologies in real-world contexts, several significant issues regarding the AI ecosystem should be addressed. We identify the main issues as data privacy, ownership, and exchange, which are difficult to solve with the current centralized paradigm of machine learning training methodology. As a result, we propose a novel model training paradigm based on blockchain, named Galaxy Learning, which aims to train a model with distributed data and to reserve the data ownership for their owners. In this new paradigm, encrypted models are moved around instead, and are federated once trained. Model training, as well as the communication, is achieved with blockchain and its smart contracts. Pricing of training data is determined by its contribution, and therefore it is not about the exchange of data ownership. In this position paper, we describe the motivation, paradigm, design, and challenges as well as opportunities of Galaxy Learning. |
Tasks | |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1905.00753v1 |
http://arxiv.org/pdf/1905.00753v1.pdf | |
PWC | https://paperswithcode.com/paper/190500753 |
Repo | |
Framework | |
Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN
Title | Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN |
Authors | Duy Tin Vo, Richard Khoury |
Abstract | Language Identification (LID) is a challenging task, especially when the input texts are short and noisy, such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled by either designing a feature set for a traditional classifier (e.g. Naive Bayes) or applying a deep neural network classifier (e.g. Bi-directional Gated Recurrent Unit, Encoder-Decoder). These methods are usually trained and tested on a huge amount of private data, then used and evaluated as off-the-shelf packages by other researchers using their own datasets, and consequently the various results published are not directly comparable. In this paper, we first create a new massive labelled dataset based on one year of Twitter data. We use this dataset to test several existing language identification systems, in order to obtain a set of coherent benchmarks, and we make our dataset publicly available so that others can add to this set of benchmarks. Finally, we propose a shallow but efficient neural LID system, which is an ngram-regional convolution neural network enhanced with an attention mechanism. Experimental results show that our architecture is able to predict tens of thousands of samples per second and surpasses all state-of-the-art systems with an improvement of 5%. |
Tasks | Language Identification |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06748v1 |
https://arxiv.org/pdf/1910.06748v1.pdf | |
PWC | https://paperswithcode.com/paper/language-identification-on-massive-datasets |
Repo | |
Framework | |
Predicting video saliency using crowdsourced mouse-tracking data
Title | Predicting video saliency using crowdsourced mouse-tracking data |
Authors | Vitaliy Lyudvichenko, Dmitriy Vatolin |
Abstract | This paper presents a new way of getting high-quality saliency maps for video, using a cheaper alternative to eye-tracking data. We designed a mouse-contingent video viewing system which simulates the viewers’ peripheral vision based on the position of the mouse cursor. The system enables the use of mouse-tracking data recorded from an ordinary computer mouse as an alternative to real gaze fixations recorded by a more expensive eye-tracker. We developed a crowdsourcing system that enables the collection of such mouse-tracking data at large scale. Using the collected mouse-tracking data we showed that it can serve as an approximation of eye-tracking data. Moreover, trying to increase the efficiency of collected mouse-tracking data we proposed a novel deep neural network algorithm that improves the quality of mouse-tracking saliency maps. |
Tasks | Eye Tracking |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00480v1 |
https://arxiv.org/pdf/1907.00480v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-video-saliency-using-crowdsourced |
Repo | |
Framework | |
Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius
Title | Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius |
Authors | Junjie Jiang, Ying-Cheng Lai |
Abstract | A common difficulty in applications of machine learning is the lack of any general principle for guiding the choices of key parameters of the underlying neural network. Focusing on a class of recurrent neural networks, namely reservoir computing systems that have recently been exploited for model-free prediction of nonlinear dynamical systems, we uncover a surprising phenomenon: the emergence of an interval in the spectral radius of the neural network in which the prediction error is minimized. In a three-dimensional representation of the error versus time and spectral radius, the interval corresponds to the bottom region of a “valley.” Such a valley arises for a variety of spatiotemporal dynamical systems described by nonlinear partial differential equations, regardless of the structure and the edge-weight distribution of the underlying reservoir network. We also find that, while the particular location and size of the valley depend on the details of the target system to be predicted, the interval tends to be larger for undirected than for directed networks. The valley phenomenon can be beneficial to the design of optimal reservoir computing, representing a small step forward in understanding these machine-learning systems. |
Tasks | |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04426v1 |
https://arxiv.org/pdf/1910.04426v1.pdf | |
PWC | https://paperswithcode.com/paper/model-free-prediction-of-spatiotemporal |
Repo | |
Framework | |
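
The spectral radius discussed in the abstract above is the largest absolute eigenvalue of the reservoir's weight matrix. A minimal sketch of how one might set it for a randomly generated reservoir is shown below; the dense Gaussian matrix, the reservoir size of 50, and the target value 0.9 are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def spectral_radius(W: np.ndarray) -> float:
    """Largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))


def scale_to_spectral_radius(W: np.ndarray, rho: float) -> np.ndarray:
    """Rescale W so that its spectral radius equals `rho`.

    Eigenvalues scale linearly with the matrix, so multiplying W by
    rho / current_radius gives the desired radius.
    """
    current = spectral_radius(W)
    if current == 0.0:
        raise ValueError("cannot rescale a matrix whose spectral radius is zero")
    return W * (rho / current)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(50, 50))             # hypothetical random reservoir
    W_scaled = scale_to_spectral_radius(W, 0.9)
    print("before:", spectral_radius(W))
    print("after: ", spectral_radius(W_scaled))
```

Sweeping `rho` over a range and recording the prediction error for each value is one way to look for the kind of low-error interval the abstract describes.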
Deep Eyedentification: Biometric Identification using Micro-Movements of the Eye
Title | Deep Eyedentification: Biometric Identification using Micro-Movements of the Eye |
Authors | Lena A. Jäger, Silvia Makowski, Paul Prasse, Sascha Liehr, Maximilian Seidler, Tobias Scheffer |
Abstract | We study involuntary micro-movements of the eye for biometric identification. While prior studies extract lower-frequency macro-movements from the output of video-based eye-tracking systems and engineer explicit features of these macro-movements, we develop a deep convolutional architecture that processes the raw eye-tracking signal. Compared to prior work, the network attains a lower error rate by one order of magnitude and is faster by two orders of magnitude: it identifies users accurately within seconds. |
Tasks | Eye Tracking |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.11889v3 |
https://arxiv.org/pdf/1906.11889v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-eyedentification-biometric |
Repo | |
Framework | |