May 7, 2019

3071 words 15 mins read

Paper Group AWR 47

The Option-Critic Architecture. Non-negative Factorization of the Occurrence Tensor from Financial Contracts. Detecting Vanishing Points using Global Image Context in a Non-Manhattan World. Top-N Recommendation on Graphs. A Diagram Is Worth A Dozen Images. STFCN: Spatio-Temporal FCN for Semantic Video Segmentation. Deep Pyramidal Residual Networks. …

The Option-Critic Architecture

Title The Option-Critic Architecture
Authors Pierre-Luc Bacon, Jean Harb, Doina Precup
Abstract Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.
Tasks
Published 2016-09-16
URL http://arxiv.org/abs/1609.05140v2
PDF http://arxiv.org/pdf/1609.05140v2.pdf
PWC https://paperswithcode.com/paper/the-option-critic-architecture
Repo https://github.com/UWaterloo-ASL/option-critic-architecture
Framework tf
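
For readers who want the mechanics, here is a minimal tabular sketch of the option-critic updates: intra-option Q-learning for the critic, the intra-option policy gradient for each option's softmax policy, and the termination gradient for each option's sigmoid termination function. The sizes, learning rates, and the `step_update` interface are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

n_states, n_actions, n_options = 20, 4, 2
gamma, lr = 0.99, 0.25

Q_omega = np.zeros((n_states, n_options))             # value of options
Q_u = np.zeros((n_states, n_options, n_actions))      # value of (option, action)
theta = np.zeros((n_states, n_options, n_actions))    # intra-option policy params
vartheta = np.zeros((n_states, n_options))            # termination params

def pi(s, o):
    z = np.exp(theta[s, o] - theta[s, o].max())       # softmax intra-option policy
    return z / z.sum()

def beta(s, o):
    return 1.0 / (1.0 + np.exp(-vartheta[s, o]))      # sigmoid termination prob.

def step_update(s, o, a, r, s2, done):
    # Intra-option Q-learning: continue with o unless it terminates at s2.
    u = 0.0 if done else ((1 - beta(s2, o)) * Q_omega[s2, o]
                          + beta(s2, o) * Q_omega[s2].max())
    Q_u[s, o, a] += lr * (r + gamma * u - Q_u[s, o, a])
    Q_omega[s, o] = pi(s, o) @ Q_u[s, o]

    # Intra-option policy gradient (softmax log-likelihood trick).
    grad_logpi = -pi(s, o); grad_logpi[a] += 1.0
    theta[s, o] += lr * grad_logpi * Q_u[s, o, a]

    # Termination gradient: lengthen the option where it still has advantage.
    adv = Q_omega[s2, o] - Q_omega[s2].max()
    vartheta[s2, o] -= lr * beta(s2, o) * (1 - beta(s2, o)) * adv

step_update(s=0, o=0, a=1, r=1.0, s2=1, done=False)   # one transition
```

A driver loop would pick options epsilon-greedily from `Q_omega`, sample primitive actions from `pi`, and call `step_update` on every transition.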

Non-negative Factorization of the Occurrence Tensor from Financial Contracts

Title Non-negative Factorization of the Occurrence Tensor from Financial Contracts
Authors Zheng Xu, Furong Huang, Louiqa Raschid, Tom Goldstein
Abstract We propose an algorithm for the non-negative factorization of an occurrence tensor built from heterogeneous networks. We use the l0 norm to model sparse errors over discrete values (occurrences), and use the decomposed factors to model the embedded groups of nodes. An efficient splitting method is developed to optimize the nonconvex and nonsmooth objective. We study both synthetic problems and a new dataset built from financial documents, resMBS.
Tasks
Published 2016-12-10
URL http://arxiv.org/abs/1612.03350v1
PDF http://arxiv.org/pdf/1612.03350v1.pdf
PWC https://paperswithcode.com/paper/non-negative-factorization-of-the-occurrence
Repo https://github.com/nightldj/tensor_notf
Framework none
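
The paper's l0-sparse splitting method is specialized, but the underlying non-negative CP factorization is easy to sketch. Below is a plain baseline with standard multiplicative updates on a dense 3-way tensor; it stands in for, and does not reproduce, the paper's algorithm or its error model.

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product, shape (I*J, R).
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def ncp(T, rank, iters=200, eps=1e-9):
    # Non-negative CP factorization of a 3-way tensor via
    # Lee-Seung-style multiplicative updates (a standard baseline).
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    T1 = T.reshape(I, -1)                     # mode-1 unfolding
    T2 = np.moveaxis(T, 1, 0).reshape(J, -1)  # mode-2 unfolding
    T3 = np.moveaxis(T, 2, 0).reshape(K, -1)  # mode-3 unfolding
    for _ in range(iters):
        A *= (T1 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= (T2 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= (T3 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

rng = np.random.default_rng(1)
T = np.einsum('ir,jr,kr->ijk', rng.random((10, 3)),
              rng.random((12, 3)), rng.random((8, 3)))
A, B, C = ncp(T, rank=3)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C))
      / np.linalg.norm(T))                    # small reconstruction error
```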

Detecting Vanishing Points using Global Image Context in a Non-Manhattan World

Title Detecting Vanishing Points using Global Image Context in a Non-Manhattan World
Authors Menghua Zhai, Scott Workman, Nathan Jacobs
Abstract We propose a novel method for detecting horizontal vanishing points and the zenith vanishing point in man-made environments. The dominant trend in existing methods is to first find candidate vanishing points, then remove outliers by enforcing mutual orthogonality. Our method reverses this process: we propose a set of horizon line candidates and score each based on the vanishing points it contains. A key element of our approach is the use of global image context, extracted with a deep convolutional network, to constrain the set of candidates under consideration. Our method does not make a Manhattan-world assumption and can operate effectively on scenes with only a single horizontal vanishing point. We evaluate our approach on three benchmark datasets and achieve state-of-the-art performance on each. In addition, our approach is significantly faster than the previous best method.
Tasks Horizon Line Estimation
Published 2016-08-19
URL http://arxiv.org/abs/1608.05684v1
PDF http://arxiv.org/pdf/1608.05684v1.pdf
PWC https://paperswithcode.com/paper/detecting-vanishing-points-using-global-image
Repo https://github.com/viibridges/gc-horizon-detector
Framework none
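
The candidate-scoring idea can be sketched independently of the CNN. Below, horizon candidates are enumerated on a small grid (standing in for sampling around the network's global-context prediction), and each is scored by how many line segments agree on a single intersection point with it, i.e., by the vanishing points it contains. The geometry, the 1.0 clustering tolerance, and the toy segments are all illustrative assumptions.

```python
import numpy as np

def line_through(p, q):
    # Homogeneous line through two 2-D points.
    return np.cross([*p, 1.0], [*q, 1.0])

def score_horizon(theta, rho, segments):
    # Candidate horizon in normal form: x*cos(theta) + y*sin(theta) = rho.
    horizon = np.array([np.cos(theta), np.sin(theta), -rho])
    pts = []
    for p, q in segments:
        v = np.cross(line_through(p, q), horizon)  # segment line meets horizon
        if abs(v[2]) > 1e-9:
            pts.append(v[:2] / v[2])
    if not pts:
        return 0
    pts = np.array(pts)
    # Many segments agreeing on one intersection point = a vanishing point
    # "contained" by this horizon; score by the largest such cluster.
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    return (d < 1.0).sum(axis=1).max()

# Two toy segments converge at (100, 0); the true horizon is y = 0.
segments = [((0, 10), (4, 9.6)), ((0, -10), (4, -9.6)), ((1, 6), (1.2, 2))]
cands = [(np.pi / 2 + t, r) for t in np.linspace(-0.2, 0.2, 9)
         for r in np.linspace(-2.0, 2.0, 9)]
theta, rho = max(cands, key=lambda c: score_horizon(*c, segments))
print(round(theta - np.pi / 2, 3), round(rho, 3))   # 0.0 0.0
```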

Top-N Recommendation on Graphs

Title Top-N Recommendation on Graphs
Authors Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng
Abstract Recommender systems play an increasingly important role in online applications to help users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and the intrinsic structural information of the user-item matrix. The proposed method constructs a new representation which preserves affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. A manifold learning idea is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, which requires only solving the well-known Sylvester equation. We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach.
Tasks Recommendation Systems
Published 2016-09-27
URL http://arxiv.org/abs/1609.08264v1
PDF http://arxiv.org/pdf/1609.08264v1.pdf
PWC https://paperswithcode.com/paper/top-n-recommendation-on-graphs
Repo https://github.com/sckangz/CIKM16
Framework none
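
The computational core, solving one Sylvester equation, is a few lines with SciPy. The sketch below uses a plausible objective, ||W − X||_F² + a·tr(WᵀL_u W) + b·tr(W L_i Wᵀ), whose stationarity condition is (I + a·L_u)W + W(b·L_i) = X; the co-occurrence similarity graphs and the weights a, b are assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.linalg import solve_sylvester
from scipy.sparse.csgraph import laplacian

def smooth_scores(X, user_sim, item_sim, a=0.1, b=0.1):
    # Stationarity of ||W - X||_F^2 + a*tr(W' Lu W) + b*tr(W Li W')
    # gives the Sylvester equation (I + a*Lu) W + W (b*Li) = X.
    Lu = laplacian(user_sim)                  # user-graph Laplacian
    Li = laplacian(item_sim)                  # item-graph Laplacian
    A = np.eye(X.shape[0]) + a * Lu
    return solve_sylvester(A, b * Li, X)      # W solving A W + W B = X

rng = np.random.default_rng(1)
X = (rng.random((50, 80)) < 0.05).astype(float)    # toy binary rating matrix
user_sim = X @ X.T; np.fill_diagonal(user_sim, 0)  # co-occurrence similarities
item_sim = X.T @ X; np.fill_diagonal(item_sim, 0)
W = smooth_scores(X, user_sim, item_sim)
top_n = np.argsort(-W, axis=1)[:, :10]             # top-10 items per user
print(top_n.shape)                                 # (50, 10)
```

Since the eigenvalues of I + a·L_u are at least 1 and those of b·L_i are non-negative, the two spectra cannot cancel, so the Sylvester equation has a unique solution.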

A Diagram Is Worth A Dozen Images

Title A Diagram Is Worth A Dozen Images
Authors Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi
Abstract Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.
Tasks Visual Question Answering
Published 2016-03-24
URL http://arxiv.org/abs/1603.07396v1
PDF http://arxiv.org/pdf/1603.07396v1.pdf
PWC https://paperswithcode.com/paper/a-diagram-is-worth-a-dozen-images
Repo https://github.com/allenai/dqa-net
Framework tf
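
A toy version of the DPG-based attention idea: encode the question with an LSTM, attend over embeddings of the parsed diagram's constituents and relations, and score candidate answers against the fused representation. All dimensions, the elementwise fusion, and the dot-product scoring head are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DiagramQA(nn.Module):
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.q_lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, question, relations, answers):
        # question: (B, Tq) token ids; relations: (B, R, dim) embeddings of
        # parsed DPG constituents; answers: (B, A, Ta) candidate token ids.
        q = self.q_lstm(self.embed(question))[0][:, -1]        # (B, dim)
        att = torch.softmax((relations @ q.unsqueeze(-1)).squeeze(-1), -1)
        ctx = (att.unsqueeze(-1) * relations).sum(1)           # attended DPG
        a = self.embed(answers).mean(2)                        # (B, A, dim)
        fused = q + ctx
        return (a * fused.unsqueeze(1)).sum(-1)                # (B, A) scores

m = DiagramQA()
scores = m(torch.randint(0, 5000, (2, 8)),     # questions
           torch.randn(2, 6, 128),             # 6 relation embeddings each
           torch.randint(0, 5000, (2, 4, 3)))  # 4 candidate answers each
print(scores.shape)                            # torch.Size([2, 4])
```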

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

Title STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
Authors Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang
Abstract This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also has a good effect on segmenting video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: first, the regional spatial features of frames are extracted using a CNN; then, using an LSTM, the temporal features are added; finally, by deconvolving the spatio-temporal features we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We adapted some well-known fully convolutional network architectures (such as FCN-AlexNet and FCN-VGG16), as well as dilated convolution, into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated on the CamVid and NYUDv2 datasets.
Tasks Semantic Segmentation, Video Semantic Segmentation
Published 2016-08-21
URL http://arxiv.org/abs/1608.05971v2
PDF http://arxiv.org/pdf/1608.05971v2.pdf
PWC https://paperswithcode.com/paper/stfcn-spatio-temporal-fcn-for-semantic-video
Repo https://github.com/MohsenFayyaz89/STFCN
Framework torch
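
The three-stage pipeline (CNN features, then an LSTM over time, then deconvolution to pixel-wise predictions) can be sketched compactly in PyTorch. This mini version runs one LSTM per spatial location of the downsampled feature map; layer sizes and depths are illustrative, not the paper's FCN-AlexNet/FCN-VGG16 configurations.

```python
import torch
import torch.nn as nn

class MiniSTFCN(nn.Module):
    def __init__(self, classes=11, feat=32):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame features
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(feat, feat, batch_first=True)
        self.deconv = nn.ConvTranspose2d(feat, classes, 4, stride=4)

    def forward(self, clip):                            # clip: (B, T, 3, H, W)
        B, T, C, H, W = clip.shape
        f = self.cnn(clip.reshape(B * T, C, H, W))      # (B*T, F, H/4, W/4)
        F_, h, w = f.shape[1:]
        # Run the LSTM over time independently at each spatial location.
        f = f.reshape(B, T, F_, h, w).permute(0, 3, 4, 1, 2)
        f = f.reshape(B * h * w, T, F_)
        f = self.lstm(f)[0][:, -1]                      # last time step
        f = f.reshape(B, h, w, F_).permute(0, 3, 1, 2)
        return self.deconv(f)                           # (B, classes, H, W)

x = torch.randn(2, 4, 3, 64, 64)                        # 2 clips, 4 frames
print(MiniSTFCN()(x).shape)                             # torch.Size([2, 11, 64, 64])
```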

Deep Pyramidal Residual Networks

Title Deep Pyramidal Residual Networks
Authors Dongyoon Han, Jiwhan Kim, Junmo Kim
Abstract Deep convolutional neural networks (DCNNs) have shown remarkable performance in image classification tasks in recent years. Generally, deep neural network architectures are stacks consisting of a large number of convolutional layers, and they perform downsampling along the spatial dimension via pooling to reduce memory usage. Concurrently, the feature map dimension (i.e., the number of channels) is sharply increased at downsampling locations, which is essential to ensure effective performance because it increases the diversity of high-level attributes. This also applies to residual networks and is very closely related to their performance. In this research, instead of sharply increasing the feature map dimension at units that perform downsampling, we gradually increase the feature map dimension at all units to involve as many locations as possible. This design, which is discussed in depth together with our new insights, has proven to be an effective means of improving generalization ability. Furthermore, we propose a novel residual unit capable of further improving the classification accuracy with our new network architecture. Experiments on benchmark CIFAR-10, CIFAR-100, and ImageNet datasets have shown that our network architecture has superior generalization ability compared to the original residual networks. Code is available at https://github.com/jhkim89/PyramidNet
Tasks Image Classification
Published 2016-10-10
URL http://arxiv.org/abs/1610.02915v4
PDF http://arxiv.org/pdf/1610.02915v4.pdf
PWC https://paperswithcode.com/paper/deep-pyramidal-residual-networks
Repo https://github.com/Stick-To/PyramidNet-TF
Framework tf
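
The widening rule is simple enough to state in code: rather than doubling the channel count at the two or three downsampling units, the additive PyramidNet grows it by roughly alpha/N at every one of the N residual units. The sketch below is a minimal illustration; the rounding scheme and the CIFAR-style numbers are assumptions, not the exact published configuration.

```python
# Additive widening: D_k = D0 + alpha * k / N for unit k of N,
# replacing the usual 16 -> 32 -> 64 jumps at downsampling units.
def pyramid_widths(d0=16, alpha=48, n_units=9):
    return [d0 + round(alpha * k / n_units) for k in range(1, n_units + 1)]

print(pyramid_widths())
# [21, 27, 32, 37, 43, 48, 53, 59, 64]: a gradual ramp up to 64 channels
```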

A Geometric Analysis of Phase Retrieval

Title A Geometric Analysis of Phase Retrieval
Authors Ju Sun, Qing Qu, John Wright
Abstract Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., a length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06664v3
PDF http://arxiv.org/pdf/1602.06664v3.pdf
PWC https://paperswithcode.com/paper/a-geometric-analysis-of-phase-retrieval
Repo https://github.com/sunju/pr_plain
Framework none
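
The benign-geometry claim is easy to poke at numerically: run plain gradient descent on the least-squares objective f(z) = (1/2m) Σ_k (y_k² − |a_k^* z|²)² from a random start, with no spectral initialization, and check convergence to the target up to global phase. The step size and problem sizes below are ad hoc; the trust-region method analyzed in the paper is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 320       # theory wants m >= C n log^3 n; m = 10n is an ad hoc choice
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # target signal
y2 = np.abs(A @ x) ** 2                  # squared measurement magnitudes

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # random init
step = 0.1 / np.mean(y2)                 # ~ mu / ||x||^2, Wirtinger-flow style
for _ in range(2000):
    r = np.abs(A @ z) ** 2 - y2                    # residuals
    z -= step * (A.conj().T @ (r * (A @ z))) / m   # Wirtinger gradient step

c = np.vdot(x, z); c /= abs(c)           # best global phase alignment
print(np.linalg.norm(z - c * x) / np.linalg.norm(x))   # small relative error
```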

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

Title T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
Authors Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang
Abstract The state-of-the-art performance for object detection has been significantly improved over the past two years. Besides the introduction of powerful deep neural networks such as GoogleNet and VGG, novel object detection frameworks such as R-CNN and its successors, Fast R-CNN and Faster R-CNN, play an essential role in improving the state-of-the-art. Despite their effectiveness on still images, those frameworks are not specifically designed for object detection from videos. Temporal and contextual information in videos is not fully investigated and utilized. In this work, we propose a deep learning framework that incorporates temporal and contextual information from tubelets obtained in videos, which dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos. It is called T-CNN, i.e. tubelets with convolutional neural networks. The proposed framework won the recently introduced object-detection-from-video (VID) task with provided data in the ImageNet Large-Scale Visual Recognition Challenge 2015 (ILSVRC2015).
Tasks Object Detection, Object Recognition
Published 2016-04-09
URL http://arxiv.org/abs/1604.02532v4
PDF http://arxiv.org/pdf/1604.02532v4.pdf
PWC https://paperswithcode.com/paper/t-cnn-tubelets-with-convolutional-neural
Repo https://github.com/myfavouritekk/T-CNN
Framework none
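
One ingredient of the framework, linking per-frame detections into tubelets so temporal consistency can rescore them, can be sketched with greedy IoU association. This stand-in omits the paper's tracking, multi-context suppression, and proposal propagation steps.

```python
import numpy as np

def iou(a, b):
    # Intersection over union of two boxes (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tubelets(frames, thr=0.4):
    # frames: list over time of [(box, score), ...] for one class.
    tubelets = [[d] for d in frames[0]]
    for dets in frames[1:]:
        used = set()
        for tube in tubelets:
            box = tube[-1][0]
            cand = [(iou(box, b), i) for i, (b, _) in enumerate(dets)
                    if i not in used]
            if cand:
                ov, i = max(cand)
                if ov >= thr:                 # extend tube with best match
                    tube.append(dets[i]); used.add(i)
        tubelets += [[d] for i, d in enumerate(dets) if i not in used]
    return tubelets

frames = [[((10, 10, 50, 50), 0.9)], [((12, 11, 52, 51), 0.4)]]
for t in link_tubelets(frames):
    print(len(t), np.mean([s for _, s in t]))  # tubelet length, mean confidence
```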

Nine Features in a Random Forest to Learn Taxonomical Semantic Relations

Title Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
Authors Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, Chu-Ren Huang
Abstract ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare the performance with the state-of-the-art, we have also evaluated ROOT9 on subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias.
Tasks
Published 2016-03-29
URL http://arxiv.org/abs/1603.08702v1
PDF http://arxiv.org/pdf/1603.08702v1.pdf
PWC https://paperswithcode.com/paper/nine-features-in-a-random-forest-to-learn
Repo https://github.com/esantus/ROOT9
Framework none
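
The experimental setup is conventional and easy to reproduce in shape: a Random Forest over nine features per word pair with 10-fold cross validation. The random feature matrix below is only a placeholder for the paper's corpus-based features (frequency, entropy, vector cosine, and so on), and the tree count is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((9600, 9))        # 9,600 pairs x 9 corpus features (placeholder)
y = rng.integers(0, 3, 9600)     # hypernym / co-hyponym / random labels
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=10, scoring='f1_macro').mean())
```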

Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures

Title Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures
Authors Seyyed Hossein Hasanpour, Mohammad Rouhani, Mohsen Fayyaz, Mohammad Sabokrou
Abstract Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet and GoogleNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization and memory efficiency. On the contrary, light-weight architectures proposed to address this issue mainly suffer from low accuracy. These inefficiencies mostly stem from following an ad hoc procedure. We propose a simple architecture, called SimpleNet, based on a set of design principles, with which we empirically show that a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures. SimpleNet provides a good tradeoff between computation/memory efficiency and accuracy. Our simple 13-layer architecture outperforms most of the deeper and more complex architectures to date, such as VGGNet, ResNet, and GoogleNet, on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. This makes it very handy for embedded systems or systems with computational and memory limitations. We achieved state-of-the-art results on CIFAR10, outperforming several heavier architectures, near state-of-the-art results on MNIST, and competitive results on CIFAR100 and SVHN. Models are made available at: https://github.com/Coderx7/SimpleNet
Tasks Image Classification
Published 2016-08-22
URL http://arxiv.org/abs/1608.06037v7
PDF http://arxiv.org/pdf/1608.06037v7.pdf
PWC https://paperswithcode.com/paper/lets-keep-it-simple-using-simple
Repo https://github.com/JavierAntoran/moby_dick_whale_audio_detection
Framework pytorch
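
The architecture's spirit, a plain homogeneous stack of 3x3 convolutions with no residual connections, fits in a few lines of PyTorch. The channel widths and pooling positions below are assumptions for illustration; see the authors' repository for the exact 13-layer configuration.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    # SimpleNet-style unit: 3x3 conv + BatchNorm + ReLU.
    return [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU()]

widths = [64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 256, 256, 256]
pool_after = {2, 5, 8, 10, 12}          # illustrative pooling positions
layers, cin = [], 3
for i, w in enumerate(widths, 1):       # 13 homogeneous conv layers
    layers += block(cin, w)
    if i in pool_after:
        layers.append(nn.MaxPool2d(2))
    cin = w
model = nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(widths[-1], 10))
print(model(torch.randn(2, 3, 32, 32)).shape)   # torch.Size([2, 10])
```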

R-FCN: Object Detection via Region-based Fully Convolutional Networks

Title R-FCN: Object Detection via Region-based Fully Convolutional Networks
Authors Jifeng Dai, Yi Li, Kaiming He, Jian Sun
Abstract We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn
Tasks Object Detection, Real-Time Object Detection
Published 2016-05-20
URL http://arxiv.org/abs/1605.06409v2
PDF http://arxiv.org/pdf/1605.06409v2.pdf
PWC https://paperswithcode.com/paper/r-fcn-object-detection-via-region-based-fully
Repo https://github.com/xiaoyongzhu/Deformable-ConvNets
Framework mxnet
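
The mechanism that makes the per-region head nearly free is position-sensitive RoI pooling: with C+1 classes and a k x k grid there are k²(C+1) score maps, and bin (i, j) of an RoI pools only from the maps dedicated to that bin, after which classification is just an average vote. A NumPy sketch follows; the bin-major map grouping and the bin rounding are simplified assumptions.

```python
import numpy as np

def ps_roi_pool(scores, roi, k, n_cls):
    # scores: (k*k*n_cls, H, W) score maps; roi: (x1, y1, x2, y2) in map coords.
    x1, y1, x2, y2 = roi
    bw, bh = (x2 - x1) / k, (y2 - y1) / k
    out = np.zeros((n_cls, k, k))
    for i in range(k):            # vertical bin index
        for j in range(k):        # horizontal bin index
            ys = slice(int(y1 + i * bh),
                       max(int(y1 + (i + 1) * bh), int(y1 + i * bh) + 1))
            xs = slice(int(x1 + j * bw),
                       max(int(x1 + (j + 1) * bw), int(x1 + j * bw) + 1))
            # Each bin reads only its own group of n_cls maps.
            maps = scores[(i * k + j) * n_cls:(i * k + j + 1) * n_cls]
            out[:, i, j] = maps[:, ys, xs].mean(axis=(1, 2))
    return out.mean(axis=(1, 2))  # vote: average bins -> per-class scores

scores = np.random.rand(3 * 3 * 21, 40, 40)   # k=3, 20 classes + background
print(ps_roi_pool(scores, (8, 8, 26, 29), 3, 21).shape)   # (21,)
```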

Predicting the direction of stock market prices using random forest

Title Predicting the direction of stock market prices using random forest
Authors Luckyson Khaidem, Snehanshu Saha, Sudeepa Roy Dey
Abstract Predicting trends in stock market prices has been an area of interest for researchers for many years due to its complex and dynamic nature. Intrinsic volatility in stock markets across the globe makes the task of prediction challenging. Forecasting and diffusion modeling, although effective, cannot be the panacea for the diverse range of problems encountered in prediction, short-term or otherwise. Market risk, strongly correlated with forecasting errors, needs to be minimized to ensure minimal risk in investment. The authors propose to minimize forecasting error by treating the forecasting problem as a classification problem. In this paper, we propose a novel way to minimize the risk of investment in the stock market by predicting the returns of a stock using a class of powerful machine learning algorithms known as ensemble learning. Technical indicators such as the Relative Strength Index (RSI) and the stochastic oscillator are used as inputs to train our model. The learning model used is an ensemble of multiple decision trees. The algorithm is shown to outperform existing algorithms found in the literature. Out-of-Bag (OOB) error estimates have been found to be encouraging. Keywords: Random Forest classifier, stock price forecasting, exponential smoothing, feature extraction, OOB error, convergence.
Tasks
Published 2016-04-29
URL http://arxiv.org/abs/1605.00003v1
PDF http://arxiv.org/pdf/1605.00003v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-direction-of-stock-market
Repo https://github.com/wpla/Khaidem.etal.2016_Analysis
Framework none
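
A hedged sketch of the pipeline: compute technical indicators such as RSI and the stochastic oscillator, label each day by the next day's direction, and read the Random Forest's built-in out-of-bag error. The indicator windows, the momentum feature, and the toy random-walk price series are assumptions for illustration, not the paper's data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rsi(close, n=14):
    # Relative Strength Index from average gains/losses over n days.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

def stochastic_k(close, n=14):
    # %K: position of the close within the last n days' range.
    lo, hi = close.rolling(n).min(), close.rolling(n).max()
    return 100 * (close - lo) / (hi - lo)

rng = np.random.default_rng(0)
close = pd.Series(100 + rng.standard_normal(1000).cumsum())  # toy prices
X = pd.DataFrame({'rsi': rsi(close), 'stoch_k': stochastic_k(close),
                  'mom': close.diff(10)}).dropna()
y = (close.shift(-1) > close).loc[X.index].astype(int)  # next-day direction
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X[:-1], y[:-1])                                  # drop unlabeled last day
print('OOB accuracy:', clf.oob_score_)
```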

Tutorial on Answering Questions about Images with Deep Learning

Title Tutorial on Answering Questions about Images with Deep Learning
Authors Mateusz Malinowski, Mario Fritz
Abstract Together with the development of more accurate methods in Computer Vision and Natural Language Understanding, holistic architectures that answer questions about the content of real-world images have emerged. In this tutorial, we build a neural-based approach to answer questions about images. We base our tutorial on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks the models that we present here can achieve a competitive performance on both datasets; in fact, they are among the best methods that use a combination of an LSTM with a global, full-frame CNN representation of an image. We hope that after reading this tutorial, the reader will be able to use Deep Learning frameworks, such as Keras and the introduced Kraino, to build various architectures that will lead to further performance improvements on this challenging task.
Tasks Visual Question Answering
Published 2016-10-04
URL http://arxiv.org/abs/1610.01076v1
PDF http://arxiv.org/pdf/1610.01076v1.pdf
PWC https://paperswithcode.com/paper/tutorial-on-answering-questions-about-images
Repo https://github.com/mateuszmalinowski/visual_turing_test-tutorial
Framework none
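
The tutorial's basic recipe, an LSTM question encoder fused with a global full-frame CNN feature, looks roughly like the following. It is written here in PyTorch rather than the tutorial's Keras/Kraino; the random image feature stands in for a pretrained CNN's pooled output, and all sizes and the multiplicative fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LstmCnnVqa(nn.Module):
    def __init__(self, vocab=8000, answers=1000, dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, dim)
        self.out = nn.Linear(dim, answers)

    def forward(self, question, img_feat):
        q = self.lstm(self.embed(question))[0][:, -1]    # last hidden state
        fused = q * torch.tanh(self.img_proj(img_feat))  # elementwise fusion
        return self.out(fused)                           # answer logits

logits = LstmCnnVqa()(torch.randint(0, 8000, (4, 12)),  # 4 questions, 12 tokens
                      torch.randn(4, 2048))              # global CNN features
print(logits.shape)                                      # torch.Size([4, 1000])
```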

Recurrent Neural Network for Text Classification with Multi-Task Learning

Title Recurrent Neural Network for Text Classification with Multi-Task Learning
Authors Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Abstract Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we use the multi-task learning framework to jointly learn across multiple related tasks. Based on recurrent neural network, we propose three different mechanisms of sharing information to model text with task-specific and shared layers. The entire network is trained jointly on all these tasks. Experiments on four benchmark text classification tasks show that our proposed models can improve the performance of a task with the help of other related tasks.
Tasks Multi-Task Learning, Text Classification
Published 2016-05-17
URL http://arxiv.org/abs/1605.05101v1
PDF http://arxiv.org/pdf/1605.05101v1.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-network-for-text
Repo https://github.com/baixl/text_classification
Framework pytorch
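
The simplest of the paper's three sharing mechanisms, one LSTM shared across all tasks with a task-specific classifier per task, can be sketched as follows; the two-task setup and the layer sizes are illustrative assumptions. Training would alternate batches across tasks, backpropagating through the shared LSTM each time.

```python
import torch
import torch.nn as nn

class SharedLstmMtl(nn.Module):
    def __init__(self, vocab=10000, dim=128, task_classes=(2, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.shared = nn.LSTM(dim, dim, batch_first=True)  # shared by all tasks
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in task_classes)

    def forward(self, tokens, task):
        h = self.shared(self.embed(tokens))[0][:, -1]  # shared sentence repr.
        return self.heads[task](h)                     # task-specific logits

model = SharedLstmMtl()
x = torch.randint(0, 10000, (8, 20))                   # a batch of token ids
print(model(x, 0).shape, model(x, 1).shape)            # (8, 2) (8, 5)
```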