Paper Group AWR 47
The Option-Critic Architecture
Title | The Option-Critic Architecture |
Authors | Pierre-Luc Bacon, Jean Harb, Doina Precup |
Abstract | Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework. |
Tasks | |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05140v2 |
http://arxiv.org/pdf/1609.05140v2.pdf | |
PWC | https://paperswithcode.com/paper/the-option-critic-architecture |
Repo | https://github.com/UWaterloo-ASL/option-critic-architecture |
Framework | tf |
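In a tabular setting, the updates the abstract describes fit in a few lines: an intra-option Q-learning critic, a likelihood-ratio gradient for the intra-option policies, and a termination gradient. The NumPy sketch below is reconstructed from the paper's published update rules rather than taken from the linked repo; the greedy value estimate (max over options) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

n_states, n_actions, n_options = 10, 4, 2
theta = np.zeros((n_options, n_states, n_actions))  # intra-option policy params
vartheta = np.zeros((n_options, n_states))          # termination params
Q_U = np.zeros((n_options, n_states, n_actions))    # option-action value critic
Q_Omega = np.zeros((n_options, n_states))           # option value critic
lr, gamma = 0.1, 0.99

def pi(w, s):
    """Softmax intra-option policy over primitive actions."""
    z = np.exp(theta[w, s] - theta[w, s].max())
    return z / z.sum()

def beta(w, s):
    """Sigmoid termination probability of option w in state s."""
    return 1.0 / (1.0 + np.exp(-vartheta[w, s]))

def option_critic_step(w, s, a, r, s_next):
    # Critic: one-step target mixes continuing the option and switching.
    U = (1 - beta(w, s_next)) * Q_Omega[w, s_next] \
        + beta(w, s_next) * Q_Omega[:, s_next].max()
    Q_U[w, s, a] += lr * (r + gamma * U - Q_U[w, s, a])
    Q_Omega[w, s] = pi(w, s) @ Q_U[w, s]
    # Actor 1: intra-option policy gradient (likelihood-ratio form).
    grad_log = -pi(w, s)
    grad_log[a] += 1.0
    theta[w, s] += lr * grad_log * Q_U[w, s, a]
    # Actor 2: termination gradient -- terminate less where the current
    # option still has an advantage over switching.
    adv = Q_Omega[w, s_next] - Q_Omega[:, s_next].max()
    vartheta[w, s_next] -= lr * beta(w, s_next) * (1 - beta(w, s_next)) * adv
```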
Non-negative Factorization of the Occurrence Tensor from Financial Contracts
Title | Non-negative Factorization of the Occurrence Tensor from Financial Contracts |
Authors | Zheng Xu, Furong Huang, Louiqa Raschid, Tom Goldstein |
Abstract | We propose an algorithm for the non-negative factorization of an occurrence tensor built from heterogeneous networks. We use the l0 norm to model sparse errors over discrete values (occurrences), and use the decomposed factors to model the embedded groups of nodes. An efficient splitting method is developed to optimize the nonconvex and nonsmooth objective. We study both synthetic problems and a new dataset built from financial documents, resMBS. |
Tasks | |
Published | 2016-12-10 |
URL | http://arxiv.org/abs/1612.03350v1 |
http://arxiv.org/pdf/1612.03350v1.pdf | |
PWC | https://paperswithcode.com/paper/non-negative-factorization-of-the-occurrence |
Repo | https://github.com/nightldj/tensor_notf |
Framework | none |
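To make the decomposition concrete, here is a generic non-negative CP factorization with multiplicative updates; this is a simpler stand-in for the paper's l0-based splitting method (which additionally handles sparse discrete errors), and every name below is illustrative.

```python
import numpy as np

def khatri_rao(U, V):
    """Columnwise Kronecker product."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def ntf_cp(T, rank, n_iters=200, eps=1e-9):
    """Factor a non-negative 3-way tensor as T ~= sum_r a_r (x) b_r (x) c_r."""
    rng = np.random.default_rng(0)
    factors = [rng.random((n, rank)) for n in T.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Unfold T along `mode` and update that factor, others held fixed.
            X = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
            others = [f for i, f in enumerate(factors) if i != mode]
            KR = khatri_rao(*others)
            F = factors[mode]
            F *= (X @ KR) / (F @ (KR.T @ KR) + eps)  # multiplicative update
    return factors

A, B, C = ntf_cp(np.random.default_rng(1).random((5, 6, 7)), rank=3)
```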
Detecting Vanishing Points using Global Image Context in a Non-Manhattan World
Title | Detecting Vanishing Points using Global Image Context in a Non-Manhattan World |
Authors | Menghua Zhai, Scott Workman, Nathan Jacobs |
Abstract | We propose a novel method for detecting horizontal vanishing points and the zenith vanishing point in man-made environments. The dominant trend in existing methods is to first find candidate vanishing points, then remove outliers by enforcing mutual orthogonality. Our method reverses this process: we propose a set of horizon line candidates and score each based on the vanishing points it contains. A key element of our approach is the use of global image context, extracted with a deep convolutional network, to constrain the set of candidates under consideration. Our method does not make a Manhattan-world assumption and can operate effectively on scenes with only a single horizontal vanishing point. We evaluate our approach on three benchmark datasets and achieve state-of-the-art performance on each. In addition, our approach is significantly faster than the previous best method. |
Tasks | Horizon Line Estimation |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05684v1 |
http://arxiv.org/pdf/1608.05684v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-vanishing-points-using-global-image |
Repo | https://github.com/viibridges/gc-horizon-detector |
Framework | none |
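The paper's key reversal (propose horizon candidates first, then score each by the vanishing points it contains) can be illustrated with a toy scorer: every line segment's extension crosses a horizon candidate at one point, and tightly clustered crossings suggest a vanishing point. The kernel-density clustering score below is an invented stand-in for the paper's consistency measure, and the CNN global-context prior that generates the candidates is omitted.

```python
import numpy as np

def seg_line(p, q):
    """Homogeneous line through two 2-D points."""
    return np.cross([*p, 1.0], [*q, 1.0])

def score_candidate(horizon, segments, sigma=1.0):
    """Score one horizon candidate (homogeneous line coefficients): each
    segment's extension meets the horizon at a point, and tightly clustered
    crossings suggest a vanishing point on that candidate."""
    xs = []
    for p, q in segments:
        pt = np.cross(seg_line(p, q), horizon)   # homogeneous intersection
        if abs(pt[2]) > 1e-8:
            xs.append(pt[0] / pt[2])
    xs = np.asarray(xs)
    if xs.size < 2:
        return 0.0
    # Kernel density at each crossing as a crude vote-clustering score.
    dens = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / (2 * sigma**2)).sum(1)
    return float(dens.max())

segs = [((0, 0), (1, 0.5)), ((0, 2), (1, 2.5)), ((0, 5), (1, 4.0))]
print(score_candidate(np.array([0.0, 1.0, -3.0]), segs))  # horizon y = 3
```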
Top-N Recommendation on Graphs
Title | Top-N Recommendation on Graphs |
Authors | Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng |
Abstract | Recommender systems play an increasingly important role in online applications by helping users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and the intrinsic structural information of the user-item matrix. The proposed method constructs a new representation that preserves the affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. The manifold learning idea is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, for which we only need to solve the well-known Sylvester equation. We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach. |
Tasks | Recommendation Systems |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08264v1 |
http://arxiv.org/pdf/1609.08264v1.pdf | |
PWC | https://paperswithcode.com/paper/top-n-recommendation-on-graphs |
Repo | https://github.com/sckangz/CIKM16 |
Framework | none |
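The computational heart of the model is a single Sylvester solve, which SciPy exposes directly. The objective in the sketch below (a Frobenius fidelity term plus Laplacian smoothing on user and item graphs) is one plausible reading of the abstract, not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def smooth_representation(R, W_u, W_i, alpha=1.0, beta=1.0):
    """min_X ||X - R||_F^2 + alpha tr(X^T L_u X) + beta tr(X L_i X^T);
    the first-order condition is (I + alpha L_u) X + X (beta L_i) = R."""
    L_u = np.diag(W_u.sum(1)) - W_u   # user graph Laplacian
    L_i = np.diag(W_i.sum(1)) - W_i   # item graph Laplacian
    A = np.eye(R.shape[0]) + alpha * L_u
    B = beta * L_i
    return solve_sylvester(A, B, R)   # solves A X + X B = R

rng = np.random.default_rng(0)
R = (rng.random((50, 40)) < 0.1) * rng.integers(1, 6, (50, 40))  # sparse ratings
W_u = rng.random((50, 50)); W_u = (W_u + W_u.T) / 2              # toy affinities
W_i = rng.random((40, 40)); W_i = (W_i + W_i.T) / 2
X = smooth_representation(R.astype(float), W_u, W_i)
top_n = np.argsort(-X, axis=1)[:, :10]   # top-10 item indices per user
```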
A Diagram Is Worth A Dozen Images
Title | A Diagram Is Worth A Dozen Images |
Authors | Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi |
Abstract | Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs. |
Tasks | Visual Question Answering |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07396v1 |
http://arxiv.org/pdf/1603.07396v1.pdf | |
PWC | https://paperswithcode.com/paper/a-diagram-is-worth-a-dozen-images |
Repo | https://github.com/allenai/dqa-net |
Framework | tf |
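As a hedged sketch of what a DPG-based attention model for diagram question answering could look like: an LSTM encodes the question, attention is computed over per-relation DPG embeddings, and answer choices are scored against the attended context. The module names, the bilinear scorer, and all dimensions are assumptions, not the dqa-net implementation.

```python
import torch, torch.nn as nn

class DPGAttentionQA(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.q_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)  # question-vs-relation attention

    def forward(self, question_ids, relation_vecs, answer_vecs):
        # question_ids: (B, Tq); relation_vecs: (B, R, dim), one vector per
        # DPG relation; answer_vecs: (B, A, dim), one vector per choice.
        _, (h, _) = self.q_lstm(self.embed(question_ids))
        q = h[-1]                                                 # (B, dim)
        att = self.score(q.unsqueeze(1).expand_as(relation_vecs).contiguous(),
                         relation_vecs.contiguous()).softmax(1)   # (B, R, 1)
        ctx = (att * relation_vecs).sum(1)        # attended DPG context
        logits = torch.bmm(answer_vecs, (q + ctx).unsqueeze(2)).squeeze(2)
        return logits                             # (B, A) choice scores

m = DPGAttentionQA(vocab=100)
out = m(torch.randint(0, 100, (2, 7)), torch.randn(2, 5, 128), torch.randn(2, 4, 128))
```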
STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
Title | STFCN: Spatio-Temporal FCN for Semantic Video Segmentation |
Authors | Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang |
Abstract | This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also benefits the segmentation of video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: first, the regional spatial features of frames are extracted using a CNN; then, using an LSTM, the temporal features are added; finally, by deconvolving the spatio-temporal features we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We adapted some well-known fully convolutional network architectures (such as FCN-AlexNet and FCN-VGG16) and dilated convolutions into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated on the CamVid and NYUDv2 datasets. |
Tasks | Semantic Segmentation, Video Semantic Segmentation |
Published | 2016-08-21 |
URL | http://arxiv.org/abs/1608.05971v2 |
http://arxiv.org/pdf/1608.05971v2.pdf | |
PWC | https://paperswithcode.com/paper/stfcn-spatio-temporal-fcn-for-semantic-video |
Repo | https://github.com/MohsenFayyaz89/STFCN |
Framework | torch |
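The three-component pipeline (per-frame CNN features, an LSTM over time, deconvolution back to pixel-wise predictions) can be sketched as a toy PyTorch module; the layer sizes and the per-spatial-cell LSTM arrangement below are simplifying assumptions, not the released Torch code.

```python
import torch, torch.nn as nn

class TinySTFCN(nn.Module):
    def __init__(self, n_classes, feat=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(feat, feat, batch_first=True)
        self.deconv = nn.ConvTranspose2d(feat, n_classes, 4, stride=4)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        B, T, C, H, W = clip.shape
        f = self.cnn(clip.reshape(B * T, C, H, W))  # (B*T, F, H/4, W/4)
        _, F_, h, w = f.shape
        # Run the LSTM over time independently at every spatial cell.
        seq = f.reshape(B, T, F_, h * w).permute(0, 3, 1, 2).reshape(B * h * w, T, F_)
        out, _ = self.lstm(seq)
        last = out[:, -1].reshape(B, h, w, F_).permute(0, 3, 1, 2)
        return self.deconv(last)                    # (B, n_classes, H, W)

logits = TinySTFCN(n_classes=11)(torch.randn(2, 4, 3, 64, 64))
```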
Deep Pyramidal Residual Networks
Title | Deep Pyramidal Residual Networks |
Authors | Dongyoon Han, Jiwhan Kim, Junmo Kim |
Abstract | Deep convolutional neural networks (DCNNs) have shown remarkable performance in image classification tasks in recent years. Generally, deep neural network architectures are stacks consisting of a large number of convolutional layers, and they perform downsampling along the spatial dimension via pooling to reduce memory usage. Concurrently, the feature map dimension (i.e., the number of channels) is sharply increased at downsampling locations, which is essential to ensure effective performance because it increases the diversity of high-level attributes. This also applies to residual networks and is very closely related to their performance. In this research, instead of sharply increasing the feature map dimension at units that perform downsampling, we gradually increase the feature map dimension at all units to involve as many locations as possible. This design, which is discussed in depth together with our new insights, has proven to be an effective means of improving generalization ability. Furthermore, we propose a novel residual unit capable of further improving the classification accuracy with our new network architecture. Experiments on benchmark CIFAR-10, CIFAR-100, and ImageNet datasets have shown that our network architecture has superior generalization ability compared to the original residual networks. Code is available at https://github.com/jhkim89/PyramidNet |
Tasks | Image Classification |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02915v4 |
http://arxiv.org/pdf/1610.02915v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-pyramidal-residual-networks |
Repo | https://github.com/Stick-To/PyramidNet-TF |
Framework | tf |
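The widening rule itself is simple to state: with a total widening factor alpha spread over N residual units, every unit grows the channel count by alpha/N instead of doubling it only at downsampling locations. The rounding convention in this sketch is an assumption.

```python
def pyramid_widths(n_units, base=16, alpha=48):
    """Additive PyramidNet-style channel schedule: grow a little at every
    residual unit instead of stepwise doubling at downsampling units."""
    widths, d = [], float(base)
    for _ in range(n_units):
        d += alpha / n_units        # gradual increase at every unit
        widths.append(int(round(d)))
    return widths

# Compare with the usual 16/32/64 staircase of a CIFAR ResNet.
print(pyramid_widths(9))  # [21, 27, 32, 37, 43, 48, 53, 59, 64]
```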
A Geometric Analysis of Phase Retrieval
Title | A Geometric Analysis of Phase Retrieval |
Authors | Ju Sun, Qing Qu, John Wright |
Abstract | Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., a length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06664v3 |
http://arxiv.org/pdf/1602.06664v3.pdf | |
PWC | https://paperswithcode.com/paper/a-geometric-analysis-of-phase-retrieval |
Repo | https://github.com/sunju/pr_plain |
Framework | none |
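The benign-landscape result motivates a simple experiment: with generic complex Gaussian measurements, even plain Wirtinger gradient descent from a random start tends to recover the signal up to a global phase. The authors analyze a second-order trust-region method; the untuned first-order loop below is a simpler stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 32 * 8          # the theory wants m >= C n log^3 n; toy sizes here
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y2 = np.abs(A @ x_true) ** 2              # squared-magnitude measurements

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # *random* init
step = 0.1 / np.mean(y2)                  # untuned heuristic step size
for _ in range(5000):
    Ax = A @ x
    grad = A.conj().T @ ((np.abs(Ax) ** 2 - y2) * Ax) / m  # Wirtinger gradient
    x -= step * grad

# Success is measured up to a global phase.
phase = np.vdot(x, x_true) / abs(np.vdot(x, x_true))
print(np.linalg.norm(x * phase - x_true) / np.linalg.norm(x_true))
```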
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
Title | T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos |
Authors | Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang |
Abstract | The state-of-the-art performance for object detection has been significantly improved over the past two years. Besides the introduction of powerful deep neural networks such as GoogLeNet and VGG, novel object detection frameworks such as R-CNN and its successors, Fast R-CNN and Faster R-CNN, play an essential role in improving the state-of-the-art. Despite their effectiveness on still images, those frameworks are not specifically designed for object detection from videos. The temporal and contextual information in videos is not fully investigated and utilized. In this work, we propose a deep learning framework that incorporates temporal and contextual information from tubelets obtained in videos, which dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos. It is called T-CNN, i.e. tubelets with convolutional neural networks. The proposed framework won the recently introduced object-detection-from-video (VID) task with provided data in the ImageNet Large-Scale Visual Recognition Challenge 2015 (ILSVRC2015). |
Tasks | Object Detection, Object Recognition |
Published | 2016-04-09 |
URL | http://arxiv.org/abs/1604.02532v4 |
http://arxiv.org/pdf/1604.02532v4.pdf | |
PWC | https://paperswithcode.com/paper/t-cnn-tubelets-with-convolutional-neural |
Repo | https://github.com/myfavouritekk/T-CNN |
Framework | none |
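One ingredient of such a pipeline, linking per-frame detections into tubelets, can be sketched as greedy IoU matching across adjacent frames. The real system adds motion-guided box propagation and tubelet re-scoring; the threshold and matching scheme here are illustrative only.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2-ax1)*(ay2-ay1) + (bx2-bx1)*(by2-by1) - inter
    return inter / union if union > 0 else 0.0

def link_tubelets(frames, thr=0.5):
    """frames: list over time of per-frame box lists; returns tubelets,
    each a list of (frame_index, box)."""
    tubelets = [[(0, b)] for b in frames[0]]
    for t, boxes in enumerate(frames[1:], start=1):
        free = list(boxes)
        for tube in tubelets:
            last_t, last_box = tube[-1]
            if last_t != t - 1 or not free:
                continue                       # tube already ended, or no boxes left
            best = max(free, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= thr:
                tube.append((t, best)); free.remove(best)
        tubelets += [[(t, b)] for b in free]   # unmatched boxes start new tubelets
    return tubelets

frames = [[(0, 0, 10, 10)], [(1, 0, 11, 10), (50, 50, 60, 60)], [(2, 1, 12, 11)]]
print(link_tubelets(frames))
```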
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
Title | Nine Features in a Random Forest to Learn Taxonomical Semantic Relations |
Authors | Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, Chu-Ren Huang |
Abstract | ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross-validation on 9,600 pairs, equally distributed among the three classes and involving several parts of speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare its performance with the state-of-the-art, we have also evaluated ROOT9 on subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08702v1 |
http://arxiv.org/pdf/1603.08702v1.pdf | |
PWC | https://paperswithcode.com/paper/nine-features-in-a-random-forest-to-learn |
Repo | https://github.com/esantus/ROOT9 |
Framework | none |
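The classifier setup is standard and easy to reproduce with scikit-learn; the nine corpus-based features are the actual contribution and are stubbed out below as a random placeholder matrix, so the printed score is meaningless except as a smoke test.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((9600, 9))       # placeholder for the nine features per word pair
y = rng.integers(0, 3, 9600)    # hypernym / co-hyponym / random labels
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y,
                         cv=10, scoring='f1_macro')   # 10-fold CV as in the paper
print(scores.mean())
```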
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures
Title | Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures |
Authors | Seyyed Hossein Hasanpour, Mohammad Rouhani, Mohsen Fayyaz, Mohammad Sabokrou |
Abstract | Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet and GoogLeNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization and memory efficiency. On the contrary, light-weight architectures proposed to address this issue mainly suffer from low accuracy. These inefficiencies mostly stem from following an ad hoc procedure. We propose a simple architecture, called SimpleNet, based on a set of design principles, with which we empirically show that a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures. SimpleNet provides a good tradeoff between computation/memory efficiency and accuracy. Our simple 13-layer architecture outperforms most of the deeper and more complex architectures to date, such as VGGNet, ResNet, and GoogLeNet, on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. This makes it very handy for embedded systems or systems with computational and memory limitations. We achieved state-of-the-art results on CIFAR10, outperforming several heavier architectures, near-state-of-the-art results on MNIST, and competitive results on CIFAR100 and SVHN. Models are made available at: https://github.com/Coderx7/SimpleNet |
Tasks | Image Classification |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06037v7 |
http://arxiv.org/pdf/1608.06037v7.pdf | |
PWC | https://paperswithcode.com/paper/lets-keep-it-simple-using-simple |
Repo | https://github.com/JavierAntoran/moby_dick_whale_audio_detection |
Framework | pytorch |
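The design thesis, a plain homogeneous stack of 3x3 convolution/BatchNorm/ReLU blocks with occasional pooling and a small head, is easy to sketch in PyTorch. The channel plan below is a placeholder and not the published SimpleNet configuration; see the Coderx7/SimpleNet repository linked in the abstract for the real one.

```python
import torch, torch.nn as nn

def block(cin, cout, pool=False):
    layers = [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class SimpleNetSketch(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        cfg = [(3, 64), (64, 64), (64, 64, True), (64, 128), (128, 128),
               (128, 128, True), (128, 256), (256, 256), (256, 256, True),
               (256, 512), (512, 512, True), (512, 512), (512, 512)]  # 13 convs
        layers = []
        for c in cfg:
            layers += block(*c)
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(512, n_classes)

    def forward(self, x):                       # x: (B, 3, 32, 32) for CIFAR
        f = self.features(x).mean(dim=(2, 3))   # global average pool
        return self.head(f)

print(SimpleNetSketch()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```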
R-FCN: Object Detection via Region-based Fully Convolutional Networks
Title | R-FCN: Object Detection via Region-based Fully Convolutional Networks |
Authors | Jifeng Dai, Yi Li, Kaiming He, Jian Sun |
Abstract | We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06409v2 |
http://arxiv.org/pdf/1605.06409v2.pdf | |
PWC | https://paperswithcode.com/paper/r-fcn-object-detection-via-region-based-fully |
Repo | https://github.com/xiaoyongzhu/Deformable-ConvNets |
Framework | mxnet |
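Position-sensitive RoI pooling is the core trick: each of the k x k RoI bins reads only its own dedicated group of C+1 score maps, and the bins are averaged ("voted") into per-class scores. A minimal NumPy version of that pooling step follows; the integer binning and random inputs are simplifications.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, n_classes):
    """score_maps: (k*k*n_classes, H, W); roi: (x1, y1, x2, y2) in map coords;
    n_classes counts background, i.e. C+1."""
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, k + 1).astype(int)
    ys = np.linspace(y1, y2, k + 1).astype(int)
    votes = np.zeros(n_classes)
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            group = (i * k + j) * n_classes    # this bin's own channel group
            cell = score_maps[group:group + n_classes,
                              ys[i]:max(ys[i + 1], ys[i] + 1),
                              xs[j]:max(xs[j + 1], xs[j] + 1)]
            votes += cell.mean(axis=(1, 2))    # average-pool this bin's maps
    return votes / (k * k)                     # average voting over bins

maps = np.random.default_rng(0).random((3 * 3 * 21, 40, 60))  # k=3, C+1=21
print(ps_roi_pool(maps, roi=(10, 5, 30, 25), k=3, n_classes=21).shape)  # (21,)
```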
Predicting the direction of stock market prices using random forest
Title | Predicting the direction of stock market prices using random forest |
Authors | Luckyson Khaidem, Snehanshu Saha, Sudeepa Roy Dey |
Abstract | Predicting trends in stock market prices has been an area of interest for researchers for many years due to its complex and dynamic nature. The intrinsic volatility of stock markets across the globe makes the task of prediction challenging. Forecasting and diffusion modeling, although effective, cannot be a panacea for the diverse range of problems encountered in prediction, short-term or otherwise. Market risk, strongly correlated with forecasting errors, needs to be minimized to ensure minimal risk in investment. The authors propose to minimize forecasting error by treating forecasting as a classification problem, a setting to which a popular suite of machine learning algorithms applies. In this paper, we propose a novel way to minimize the risk of investment in the stock market by predicting the returns of a stock using a class of powerful machine learning algorithms known as ensemble learning. Technical indicators such as the Relative Strength Index (RSI) and the stochastic oscillator are used as inputs to train our model. The learning model used is an ensemble of multiple decision trees. The algorithm is shown to outperform existing algorithms found in the literature. Out-of-Bag (OOB) error estimates have been found to be encouraging. Keywords: random forest classifier, stock price forecasting, exponential smoothing, feature extraction, OOB error, convergence. |
Tasks | |
Published | 2016-04-29 |
URL | http://arxiv.org/abs/1605.00003v1 |
http://arxiv.org/pdf/1605.00003v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-direction-of-stock-market |
Repo | https://github.com/wpla/Khaidem.etal.2016_Analysis |
Framework | none |
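A hedged sketch of the feature-and-forest recipe: compute one indicator (RSI) over closing prices, label each day by the sign of the return a few days ahead, and fit a random forest with out-of-bag scoring. The paper uses several indicators plus exponential smoothing; the single feature, window lengths, and synthetic prices below are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rsi(close, period=14):
    """Simple-moving-average RSI over closing prices."""
    delta = np.diff(close)
    gain = np.convolve(np.maximum(delta, 0), np.ones(period) / period, 'valid')
    loss = np.convolve(np.maximum(-delta, 0), np.ones(period) / period, 'valid')
    return 100 - 100 / (1 + gain / (loss + 1e-12))

rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(0.001 * rng.standard_normal(1000)))  # toy prices
period, horizon = 14, 5
f = rsi(close, period)             # f[i] is the RSI observed on day period + i
d = np.arange(period, len(close))  # day index each feature row describes
keep = d + horizon < len(close)
X = f[keep].reshape(-1, 1)
y = (close[d[keep] + horizon] > close[d[keep]]).astype(int)  # up vs. down
model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, y)
print(1 - model.oob_score_)        # the OOB error estimate the abstract cites
```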
Tutorial on Answering Questions about Images with Deep Learning
Title | Tutorial on Answering Questions about Images with Deep Learning |
Authors | Mateusz Malinowski, Mario Fritz |
Abstract | Together with the development of more accurate methods in Computer Vision and Natural Language Understanding, holistic architectures that answer questions about the content of real-world images have emerged. In this tutorial, we build a neural-based approach to answering questions about images. We base our tutorial on two datasets: (mostly) DAQUAR and (to a lesser extent) VQA. With small tweaks, the models we present here can achieve competitive performance on both datasets; in fact, they are among the best methods that use a combination of an LSTM with a global, full-frame CNN representation of an image. We hope that after reading this tutorial, the reader will be able to use Deep Learning frameworks, such as Keras and the Kraino framework introduced here, to build various architectures that lead to further performance improvements on this challenging task. |
Tasks | Visual Question Answering |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.01076v1 |
http://arxiv.org/pdf/1610.01076v1.pdf | |
PWC | https://paperswithcode.com/paper/tutorial-on-answering-questions-about-images |
Repo | https://github.com/mateuszmalinowski/visual_turing_test-tutorial |
Framework | none |
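The architecture the tutorial builds, an LSTM question encoder fused with a global full-frame CNN image feature, is shown below in PyTorch rather than the tutorial's Keras/Kraino; the class name, fusion rule, and dimensions are illustrative assumptions.

```python
import torch, torch.nn as nn

class LSTMGlobalCNNVQA(nn.Module):
    def __init__(self, vocab, n_answers, dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, dim)   # project the CNN global feature
        self.classify = nn.Linear(dim, n_answers)

    def forward(self, question_ids, img_feat):
        _, (h, _) = self.lstm(self.embed(question_ids))
        fused = h[-1] * torch.tanh(self.img_proj(img_feat))  # elementwise fusion
        return self.classify(fused)               # answer-class logits

logits = LSTMGlobalCNNVQA(5000, 1000)(torch.randint(0, 5000, (2, 8)),
                                      torch.randn(2, 2048))
```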
Recurrent Neural Network for Text Classification with Multi-Task Learning
Title | Recurrent Neural Network for Text Classification with Multi-Task Learning |
Authors | Pengfei Liu, Xipeng Qiu, Xuanjing Huang |
Abstract | Neural network based methods have achieved great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we use the multi-task learning framework to jointly learn across multiple related tasks. Based on recurrent neural networks, we propose three different mechanisms for sharing information to model text with task-specific and shared layers. The entire network is trained jointly on all these tasks. Experiments on four benchmark text classification tasks show that our proposed models can improve the performance of a task with the help of other related tasks. |
Tasks | Multi-Task Learning, Text Classification |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05101v1 |
http://arxiv.org/pdf/1605.05101v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-network-for-text |
Repo | https://github.com/baixl/text_classification |
Framework | pytorch |
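The simplest of the three sharing mechanisms (one LSTM shared by all tasks, with task-specific classifier layers) can be sketched as follows; layer sizes are placeholders, and joint training would alternate mini-batches across tasks.

```python
import torch, torch.nn as nn

class SharedLSTMMultiTask(nn.Module):
    def __init__(self, vocab, n_classes_per_task, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.shared = nn.LSTM(dim, dim, batch_first=True)  # shared across tasks
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in n_classes_per_task)

    def forward(self, token_ids, task):
        _, (h, _) = self.shared(self.embed(token_ids))
        return self.heads[task](h[-1])   # task-specific classifier

model = SharedLSTMMultiTask(vocab=10000, n_classes_per_task=[2, 5, 2, 4])
logits = model(torch.randint(0, 10000, (3, 20)), task=1)   # a batch from task 1
```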