Paper Group AWR 47
The Option-Critic Architecture
Title | The Option-Critic Architecture |
Authors | Pierre-Luc Bacon, Jean Harb, Doina Precup |
Abstract | Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework. |
Tasks | |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05140v2 |
http://arxiv.org/pdf/1609.05140v2.pdf | |
PWC | https://paperswithcode.com/paper/the-option-critic-architecture |
Repo | https://github.com/UWaterloo-ASL/option-critic-architecture |
Framework | tf |
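In a tabular setting, the updates the abstract describes fit in a few lines: an intra-option Q-learning critic, a likelihood-ratio gradient for the intra-option policies, and a termination gradient. The NumPy sketch below is reconstructed from the paper's published update rules rather than taken from the linked repo; the greedy value estimate (max over options) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

n_states, n_actions, n_options = 10, 4, 2
theta = np.zeros((n_options, n_states, n_actions))  # intra-option policy params
vartheta = np.zeros((n_options, n_states))          # termination params
Q_U = np.zeros((n_options, n_states, n_actions))    # option-action value critic
Q_Omega = np.zeros((n_options, n_states))           # option value critic
lr, gamma = 0.1, 0.99

def pi(w, s):
    """Softmax intra-option policy over primitive actions."""
    z = np.exp(theta[w, s] - theta[w, s].max())
    return z / z.sum()

def beta(w, s):
    """Sigmoid termination probability of option w in state s."""
    return 1.0 / (1.0 + np.exp(-vartheta[w, s]))

def option_critic_step(w, s, a, r, s_next):
    # Critic: one-step target mixes continuing the option and switching.
    U = (1 - beta(w, s_next)) * Q_Omega[w, s_next] \
        + beta(w, s_next) * Q_Omega[:, s_next].max()
    Q_U[w, s, a] += lr * (r + gamma * U - Q_U[w, s, a])
    Q_Omega[w, s] = pi(w, s) @ Q_U[w, s]
    # Actor 1: intra-option policy gradient (likelihood-ratio form).
    grad_log = -pi(w, s)
    grad_log[a] += 1.0
    theta[w, s] += lr * grad_log * Q_U[w, s, a]
    # Actor 2: termination gradient -- terminate less where the current
    # option still has an advantage over switching.
    adv = Q_Omega[w, s_next] - Q_Omega[:, s_next].max()
    vartheta[w, s_next] -= lr * beta(w, s_next) * (1 - beta(w, s_next)) * adv
```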
Non-negative Factorization of the Occurrence Tensor from Financial Contracts
Title | Non-negative Factorization of the Occurrence Tensor from Financial Contracts |
Authors | Zheng Xu, Furong Huang, Louiqa Raschid, Tom Goldstein |
Abstract | We propose an algorithm for the non-negative factorization of an occurrence tensor built from heterogeneous networks. We use the l0 norm to model sparse errors over discrete values (occurrences), and use the decomposed factors to model the embedded groups of nodes. An efficient splitting method is developed to optimize the nonconvex and nonsmooth objective. We study both synthetic problems and a new dataset built from financial documents, resMBS. |
Tasks | |
Published | 2016-12-10 |
URL | http://arxiv.org/abs/1612.03350v1 |
http://arxiv.org/pdf/1612.03350v1.pdf | |
PWC | https://paperswithcode.com/paper/non-negative-factorization-of-the-occurrence |
Repo | https://github.com/nightldj/tensor_notf |
Framework | none |
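To make the decomposition concrete, here is a generic non-negative CP factorization with multiplicative updates; this is a simpler stand-in for the paper's l0-based splitting method (which additionally handles sparse discrete errors), and every name below is illustrative.

```python
import numpy as np

def khatri_rao(U, V):
    """Columnwise Kronecker product."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def ntf_cp(T, rank, n_iters=200, eps=1e-9):
    """Factor a non-negative 3-way tensor as T ~= sum_r a_r (x) b_r (x) c_r."""
    rng = np.random.default_rng(0)
    factors = [rng.random((n, rank)) for n in T.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Unfold T along `mode` and update that factor, others held fixed.
            X = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
            others = [f for i, f in enumerate(factors) if i != mode]
            KR = khatri_rao(*others)
            F = factors[mode]
            F *= (X @ KR) / (F @ (KR.T @ KR) + eps)  # multiplicative update
    return factors

A, B, C = ntf_cp(np.random.default_rng(1).random((5, 6, 7)), rank=3)
```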
Detecting Vanishing Points using Global Image Context in a Non-Manhattan World
Title | Detecting Vanishing Points using Global Image Context in a Non-Manhattan World |
Authors | Menghua Zhai, Scott Workman, Nathan Jacobs |
Abstract | We propose a novel method for detecting horizontal vanishing points and the zenith vanishing point in man-made environments. The dominant trend in existing methods is to first find candidate vanishing points, then remove outliers by enforcing mutual orthogonality. Our method reverses this process: we propose a set of horizon line candidates and score each based on the vanishing points it contains. A key element of our approach is the use of global image context, extracted with a deep convolutional network, to constrain the set of candidates under consideration. Our method does not make a Manhattan-world assumption and can operate effectively on scenes with only a single horizontal vanishing point. We evaluate our approach on three benchmark datasets and achieve state-of-the-art performance on each. In addition, our approach is significantly faster than the previous best method. |
Tasks | Horizon Line Estimation |
Published | 2016-08-19 |
URL | http://arxiv.org/abs/1608.05684v1 |
http://arxiv.org/pdf/1608.05684v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-vanishing-points-using-global-image |
Repo | https://github.com/viibridges/gc-horizon-detector |
Framework | none |
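The paper's key reversal (propose horizon candidates first, then score each by the vanishing points it contains) can be illustrated with a toy scorer: every line segment's extension crosses a horizon candidate at one point, and tightly clustered crossings suggest a vanishing point. The kernel-density clustering score below is an invented stand-in for the paper's consistency measure, and the CNN global-context prior that generates the candidates is omitted.

```python
import numpy as np

def seg_line(p, q):
    """Homogeneous line through two 2-D points."""
    return np.cross([*p, 1.0], [*q, 1.0])

def score_candidate(horizon, segments, sigma=1.0):
    """Score one horizon candidate (homogeneous line coefficients): each
    segment's extension meets the horizon at a point, and tightly clustered
    crossings suggest a vanishing point on that candidate."""
    xs = []
    for p, q in segments:
        pt = np.cross(seg_line(p, q), horizon)   # homogeneous intersection
        if abs(pt[2]) > 1e-8:
            xs.append(pt[0] / pt[2])
    xs = np.asarray(xs)
    if xs.size < 2:
        return 0.0
    # Kernel density at each crossing as a crude vote-clustering score.
    dens = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / (2 * sigma**2)).sum(1)
    return float(dens.max())

segs = [((0, 0), (1, 0.5)), ((0, 2), (1, 2.5)), ((0, 5), (1, 4.0))]
print(score_candidate(np.array([0.0, 1.0, -3.0]), segs))  # horizon y = 3
```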
Top-N Recommendation on Graphs
Title | Top-N Recommendation on Graphs |
Authors | Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng |
Abstract | Recommender systems play an increasingly important role in online applications by helping users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and the intrinsic structural information of the user-item matrix. The proposed method constructs a new representation that preserves the affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. The manifold learning idea is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, for which we only need to solve the well-known Sylvester equation. We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach. |
Tasks | Recommendation Systems |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08264v1 |
http://arxiv.org/pdf/1609.08264v1.pdf | |
PWC | https://paperswithcode.com/paper/top-n-recommendation-on-graphs |
Repo | https://github.com/sckangz/CIKM16 |
Framework | none |
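The computational heart of the model is a single Sylvester solve, which SciPy exposes directly. The objective in the sketch below (a Frobenius fidelity term plus Laplacian smoothing on user and item graphs) is one plausible reading of the abstract, not necessarily the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def smooth_representation(R, W_u, W_i, alpha=1.0, beta=1.0):
    """min_X ||X - R||_F^2 + alpha tr(X^T L_u X) + beta tr(X L_i X^T);
    the first-order condition is (I + alpha L_u) X + X (beta L_i) = R."""
    L_u = np.diag(W_u.sum(1)) - W_u   # user graph Laplacian
    L_i = np.diag(W_i.sum(1)) - W_i   # item graph Laplacian
    A = np.eye(R.shape[0]) + alpha * L_u
    B = beta * L_i
    return solve_sylvester(A, B, R)   # solves A X + X B = R

rng = np.random.default_rng(0)
R = (rng.random((50, 40)) < 0.1) * rng.integers(1, 6, (50, 40))  # sparse ratings
W_u = rng.random((50, 50)); W_u = (W_u + W_u.T) / 2              # toy affinities
W_i = rng.random((40, 40)); W_i = (W_i + W_i.T) / 2
X = smooth_representation(R.astype(float), W_u, W_i)
top_n = np.argsort(-X, axis=1)[:, :10]   # top-10 item indices per user
```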
A Diagram Is Worth A Dozen Images
Title | A Diagram Is Worth A Dozen Images |
Authors | Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi |
Abstract | Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs. |
Tasks | Visual Question Answering |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07396v1 |
http://arxiv.org/pdf/1603.07396v1.pdf | |
PWC | https://paperswithcode.com/paper/a-diagram-is-worth-a-dozen-images |
Repo | https://github.com/allenai/dqa-net |
Framework | tf |
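As a hedged sketch of what a DPG-based attention model for diagram question answering could look like: an LSTM encodes the question, attention is computed over per-relation DPG embeddings, and answer choices are scored against the attended context. The module names, the bilinear scorer, and all dimensions are assumptions, not the dqa-net implementation.

```python
import torch, torch.nn as nn

class DPGAttentionQA(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.q_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)  # question-vs-relation attention

    def forward(self, question_ids, relation_vecs, answer_vecs):
        # question_ids: (B, Tq); relation_vecs: (B, R, dim), one vector per
        # DPG relation; answer_vecs: (B, A, dim), one vector per choice.
        _, (h, _) = self.q_lstm(self.embed(question_ids))
        q = h[-1]                                                 # (B, dim)
        att = self.score(q.unsqueeze(1).expand_as(relation_vecs).contiguous(),
                         relation_vecs.contiguous()).softmax(1)   # (B, R, 1)
        ctx = (att * relation_vecs).sum(1)        # attended DPG context
        logits = torch.bmm(answer_vecs, (q + ctx).unsqueeze(2)).squeeze(2)
        return logits                             # (B, A) choice scores

m = DPGAttentionQA(vocab=100)
out = m(torch.randint(0, 100, (2, 7)), torch.randn(2, 5, 128), torch.randn(2, 4, 128))
```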
STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
Title | STFCN: Spatio-Temporal FCN for Semantic Video Segmentation |
Authors | Mohsen Fayyaz, Mohammad Hajizadeh Saffar, Mohammad Sabokrou, Mahmood Fathy, Reinhard Klette, Fay Huang |
Abstract | This paper presents a novel method to involve both spatial and temporal features for semantic video segmentation. Current work on convolutional neural networks (CNNs) has shown that CNNs provide advanced spatial features supporting a very good performance of solutions for both image and video analysis, especially for the semantic segmentation task. We investigate how involving temporal features also benefits the segmentation of video data. We propose a module based on a long short-term memory (LSTM) architecture of a recurrent neural network for interpreting the temporal characteristics of video frames over time. Our system takes as input frames of a video and produces a correspondingly-sized output; for segmenting the video our method combines the use of three components: first, the regional spatial features of frames are extracted using a CNN; then, using an LSTM, the temporal features are added; finally, by deconvolving the spatio-temporal features we produce pixel-wise predictions. Our key insight is to build spatio-temporal convolutional networks (spatio-temporal CNNs) that have an end-to-end architecture for semantic video segmentation. We adapted some well-known fully convolutional network architectures (such as FCN-AlexNet and FCN-VGG16) and dilated convolutions into our spatio-temporal CNNs. Our spatio-temporal CNNs achieve state-of-the-art semantic segmentation, as demonstrated on the CamVid and NYUDv2 datasets. |
Tasks | Semantic Segmentation, Video Semantic Segmentation |
Published | 2016-08-21 |
URL | http://arxiv.org/abs/1608.05971v2 |
http://arxiv.org/pdf/1608.05971v2.pdf | |
PWC | https://paperswithcode.com/paper/stfcn-spatio-temporal-fcn-for-semantic-video |
Repo | https://github.com/MohsenFayyaz89/STFCN |
Framework | torch |
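The three-component pipeline (per-frame CNN features, an LSTM over time, deconvolution back to pixel-wise predictions) can be sketched as a toy PyTorch module; the layer sizes and the per-spatial-cell LSTM arrangement below are simplifying assumptions, not the released Torch code.

```python
import torch, torch.nn as nn

class TinySTFCN(nn.Module):
    def __init__(self, n_classes, feat=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(feat, feat, batch_first=True)
        self.deconv = nn.ConvTranspose2d(feat, n_classes, 4, stride=4)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        B, T, C, H, W = clip.shape
        f = self.cnn(clip.reshape(B * T, C, H, W))  # (B*T, F, H/4, W/4)
        _, F_, h, w = f.shape
        # Run the LSTM over time independently at every spatial cell.
        seq = f.reshape(B, T, F_, h * w).permute(0, 3, 1, 2).reshape(B * h * w, T, F_)
        out, _ = self.lstm(seq)
        last = out[:, -1].reshape(B, h, w, F_).permute(0, 3, 1, 2)
        return self.deconv(last)                    # (B, n_classes, H, W)

logits = TinySTFCN(n_classes=11)(torch.randn(2, 4, 3, 64, 64))
```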
Deep Pyramidal Residual Networks
Title | Deep Pyramidal Residual Networks |
Authors | Dongyoon Han, Jiwhan Kim, Junmo Kim |
Abstract | Deep convolutional neural networks (DCNNs) have shown remarkable performance in image classification tasks in recent years. Generally, deep neural network architectures are stacks consisting of a large number of convolutional layers, and they perform downsampling along the spatial dimension via pooling to reduce memory usage. Concurrently, the feature map dimension (i.e., the number of channels) is sharply increased at downsampling locations, which is essential to ensure effective performance because it increases the diversity of high-level attributes. This also applies to residual networks and is very closely related to their performance. In this research, instead of sharply increasing the feature map dimension at units that perform downsampling, we gradually increase the feature map dimension at all units to involve as many locations as possible. This design, which is discussed in depth together with our new insights, has proven to be an effective means of improving generalization ability. Furthermore, we propose a novel residual unit capable of further improving the classification accuracy with our new network architecture. Experiments on benchmark CIFAR-10, CIFAR-100, and ImageNet datasets have shown that our network architecture has superior generalization ability compared to the original residual networks. Code is available at https://github.com/jhkim89/PyramidNet |
Tasks | Image Classification |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02915v4 |
http://arxiv.org/pdf/1610.02915v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-pyramidal-residual-networks |
Repo | https://github.com/Stick-To/PyramidNet-TF |
Framework | tf |
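The widening rule itself is simple to state: with a total widening factor alpha spread over N residual units, every unit grows the channel count by alpha/N instead of doubling it only at downsampling locations. The rounding convention in this sketch is an assumption.

```python
def pyramid_widths(n_units, base=16, alpha=48):
    """Additive PyramidNet-style channel schedule: grow a little at every
    residual unit instead of stepwise doubling at downsampling units."""
    widths, d = [], float(base)
    for _ in range(n_units):
        d += alpha / n_units        # gradual increase at every unit
        widths.append(int(round(d)))
    return widths

# Compare with the usual 16/32/64 staircase of a CIFAR ResNet.
print(pyramid_widths(9))  # [21, 27, 32, 37, 43, 48, 53, 59, 64]
```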
A Geometric Analysis of Phase Retrieval
Title | A Geometric Analysis of Phase Retrieval |
Authors | Ju Sun, Qing Qu, John Wright |
Abstract | Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., a length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06664v3 |
http://arxiv.org/pdf/1602.06664v3.pdf | |
PWC | https://paperswithcode.com/paper/a-geometric-analysis-of-phase-retrieval |
Repo | https://github.com/sunju/pr_plain |
Framework | none |
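The benign-landscape result motivates a simple experiment: with generic complex Gaussian measurements, even plain Wirtinger gradient descent from a random start tends to recover the signal up to a global phase. The authors analyze a second-order trust-region method; the untuned first-order loop below is a simpler stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 32 * 8          # the theory wants m >= C n log^3 n; toy sizes here
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y2 = np.abs(A @ x_true) ** 2              # squared-magnitude measurements

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # *random* init
step = 0.1 / np.mean(y2)                  # untuned heuristic step size
for _ in range(5000):
    Ax = A @ x
    grad = A.conj().T @ ((np.abs(Ax) ** 2 - y2) * Ax) / m  # Wirtinger gradient
    x -= step * grad

# Success is measured up to a global phase.
phase = np.vdot(x, x_true) / abs(np.vdot(x, x_true))
print(np.linalg.norm(x * phase - x_true) / np.linalg.norm(x_true))
```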
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
Title | T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos |
Authors | Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang |
Abstract | The state-of-the-art performance for object detection has been significantly improved over the past two years. Besides the introduction of powerful deep neural networks such as GoogLeNet and VGG, novel object detection frameworks such as R-CNN and its successors, Fast R-CNN and Faster R-CNN, play an essential role in improving the state-of-the-art. Despite their effectiveness on still images, those frameworks are not specifically designed for object detection from videos. The temporal and contextual information in videos is not fully investigated and utilized. In this work, we propose a deep learning framework that incorporates temporal and contextual information from tubelets obtained in videos, which dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos. It is called T-CNN, i.e. tubelets with convolutional neural networks. The proposed framework won the recently introduced object-detection-from-video (VID) task with provided data in the ImageNet Large-Scale Visual Recognition Challenge 2015 (ILSVRC2015). |
Tasks | Object Detection, Object Recognition |
Published | 2016-04-09 |
URL | http://arxiv.org/abs/1604.02532v4 |
http://arxiv.org/pdf/1604.02532v4.pdf | |
PWC | https://paperswithcode.com/paper/t-cnn-tubelets-with-convolutional-neural |
Repo | https://github.com/myfavouritekk/T-CNN |
Framework | none |
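One ingredient of such a pipeline, linking per-frame detections into tubelets, can be sketched as greedy IoU matching across adjacent frames. The real system adds motion-guided box propagation and tubelet re-scoring; the threshold and matching scheme here are illustrative only.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2-ax1)*(ay2-ay1) + (bx2-bx1)*(by2-by1) - inter
    return inter / union if union > 0 else 0.0

def link_tubelets(frames, thr=0.5):
    """frames: list over time of per-frame box lists; returns tubelets,
    each a list of (frame_index, box)."""
    tubelets = [[(0, b)] for b in frames[0]]
    for t, boxes in enumerate(frames[1:], start=1):
        free = list(boxes)
        for tube in tubelets:
            last_t, last_box = tube[-1]
            if last_t != t - 1 or not free:
                continue                       # tube already ended, or no boxes left
            best = max(free, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= thr:
                tube.append((t, best)); free.remove(best)
        tubelets += [[(t, b)] for b in free]   # unmatched boxes start new tubelets
    return tubelets

frames = [[(0, 0, 10, 10)], [(1, 0, 11, 10), (50, 50, 60, 60)], [(2, 1, 12, 11)]]
print(link_tubelets(frames))
```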
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
Title | Nine Features in a Random Forest to Learn Taxonomical Semantic Relations |
Authors | Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, Chu-Ren Huang |
Abstract | ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross-validation on 9,600 pairs, equally distributed among the three classes and involving several parts of speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare its performance with the state-of-the-art, we have also evaluated ROOT9 on subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08702v1 |
http://arxiv.org/pdf/1603.08702v1.pdf | |
PWC | https://paperswithcode.com/paper/nine-features-in-a-random-forest-to-learn |
Repo | https://github.com/esantus/ROOT9 |
Framework | none |
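The classifier setup is standard and easy to reproduce with scikit-learn; the nine corpus-based features are the actual contribution and are stubbed out below as a random placeholder matrix, so the printed score is meaningless except as a smoke test.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((9600, 9))       # placeholder for the nine features per word pair
y = rng.integers(0, 3, 9600)    # hypernym / co-hyponym / random labels
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y,
                         cv=10, scoring='f1_macro')   # 10-fold CV as in the paper
print(scores.mean())
```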
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures
Title | Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures |
Authors | Seyyed Hossein Hasanpour, Mohammad Rouhani, Mohsen Fayyaz, Mohammad Sabokrou |
Abstract | Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet and GoogLeNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization and memory efficiency. On the contrary, light-weight architectures proposed to address this issue mainly suffer from low accuracy. These inefficiencies mostly stem from following an ad hoc procedure. We propose a simple architecture, called SimpleNet, based on a set of design principles, with which we empirically show that a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures. SimpleNet provides a good tradeoff between computation/memory efficiency and accuracy. Our simple 13-layer architecture outperforms most of the deeper and more complex architectures to date, such as VGGNet, ResNet, and GoogLeNet, on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. This makes it very handy for embedded systems or systems with computational and memory limitations. We achieved state-of-the-art results on CIFAR10, outperforming several heavier architectures, near-state-of-the-art results on MNIST, and competitive results on CIFAR100 and SVHN. Models are made available at: https://github.com/Coderx7/SimpleNet |
Tasks | Image Classification |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06037v7 |
http://arxiv.org/pdf/1608.06037v7.pdf | |
PWC | https://paperswithcode.com/paper/lets-keep-it-simple-using-simple |
Repo | https://github.com/JavierAntoran/moby_dick_whale_audio_detection |
Framework | pytorch |
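The design thesis, a plain homogeneous stack of 3x3 convolution/BatchNorm/ReLU blocks with occasional pooling and a small head, is easy to sketch in PyTorch. The channel plan below is a placeholder and not the published SimpleNet configuration; see the Coderx7/SimpleNet repository linked in the abstract for the real one.

```python
import torch, torch.nn as nn

def block(cin, cout, pool=False):
    layers = [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class SimpleNetSketch(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        cfg = [(3, 64), (64, 64), (64, 64, True), (64, 128), (128, 128),
               (128, 128, True), (128, 256), (256, 256), (256, 256, True),
               (256, 512), (512, 512, True), (512, 512), (512, 512)]  # 13 convs
        layers = []
        for c in cfg:
            layers += block(*c)
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(512, n_classes)

    def forward(self, x):                       # x: (B, 3, 32, 32) for CIFAR
        f = self.features(x).mean(dim=(2, 3))   # global average pool
        return self.head(f)

print(SimpleNetSketch()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```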
R-FCN: Object Detection via Region-based Fully Convolutional Networks
Title | R-FCN: Object Detection via Region-based Fully Convolutional Networks |
Authors | Jifeng Dai, Yi Li, Kaiming He, Jian Sun |
Abstract | We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06409v2 |
http://arxiv.org/pdf/1605.06409v2.pdf | |
PWC | https://paperswithcode.com/paper/r-fcn-object-detection-via-region-based-fully |
Repo | https://github.com/xiaoyongzhu/Deformable-ConvNets |
Framework | mxnet |
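Position-sensitive RoI pooling is the core trick: each of the k x k RoI bins reads only its own dedicated group of C+1 score maps, and the bins are averaged ("voted") into per-class scores. A minimal NumPy version of that pooling step follows; the integer binning and random inputs are simplifications.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, n_classes):
    """score_maps: (k*k*n_classes, H, W); roi: (x1, y1, x2, y2) in map coords;
    n_classes counts background, i.e. C+1."""
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, k + 1).astype(int)
    ys = np.linspace(y1, y2, k + 1).astype(int)
    votes = np.zeros(n_classes)
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            group = (i * k + j) * n_classes    # this bin's own channel group
            cell = score_maps[group:group + n_classes,
                              ys[i]:max(ys[i + 1], ys[i] + 1),
                              xs[j]:max(xs[j + 1], xs[j] + 1)]
            votes += cell.mean(axis=(1, 2))    # average-pool this bin's maps
    return votes / (k * k)                     # average voting over bins

maps = np.random.default_rng(0).random((3 * 3 * 21, 40, 60))  # k=3, C+1=21
print(ps_roi_pool(maps, roi=(10, 5, 30, 25), k=3, n_classes=21).shape)  # (21,)
```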
Predicting the direction of stock market prices using random forest
Title | Predicting the direction of stock market prices using random forest |
Authors | Luckyson Khaidem, Snehanshu Saha, Sudeepa Roy Dey |
Abstract | Predicting trends in stock market prices has been an area of interest for researchers for many years due to its complex and dynamic nature. The intrinsic volatility of stock markets across the globe makes the task of prediction challenging. Forecasting and diffusion modeling, although effective, cannot be a panacea for the diverse range of problems encountered in prediction, short-term or otherwise. Market risk, strongly correlated with forecasting errors, needs to be minimized to ensure minimal risk in investment. The authors propose to minimize forecasting error by treating forecasting as a classification problem, a setting to which a popular suite of machine learning algorithms applies. In this paper, we propose a novel way to minimize the risk of investment in the stock market by predicting the returns of a stock using a class of powerful machine learning algorithms known as ensemble learning. Technical indicators such as the Relative Strength Index (RSI) and the stochastic oscillator are used as inputs to train our model. The learning model used is an ensemble of multiple decision trees. The algorithm is shown to outperform existing algorithms found in the literature. Out-of-Bag (OOB) error estimates have been found to be encouraging. Keywords: random forest classifier, stock price forecasting, exponential smoothing, feature extraction, OOB error, convergence. |
Tasks | |
Published | 2016-04-29 |
URL | http://arxiv.org/abs/1605.00003v1 |
http://arxiv.org/pdf/1605.00003v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-direction-of-stock-market |
Repo | https://github.com/wpla/Khaidem.etal.2016_Analysis |
Framework | none |
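A hedged sketch of the feature-and-forest recipe: compute one indicator (RSI) over closing prices, label each day by the sign of the return a few days ahead, and fit a random forest with out-of-bag scoring. The paper uses several indicators plus exponential smoothing; the single feature, window lengths, and synthetic prices below are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rsi(close, period=14):
    """Simple-moving-average RSI over closing prices."""
    delta = np.diff(close)
    gain = np.convolve(np.maximum(delta, 0), np.ones(period) / period, 'valid')
    loss = np.convolve(np.maximum(-delta, 0), np.ones(period) / period, 'valid')
    return 100 - 100 / (1 + gain / (loss + 1e-12))

rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(0.001 * rng.standard_normal(1000)))  # toy prices
period, horizon = 14, 5
f = rsi(close, period)             # f[i] is the RSI observed on day period + i
d = np.arange(period, len(close))  # day index each feature row describes
keep = d + horizon < len(close)
X = f[keep].reshape(-1, 1)
y = (close[d[keep] + horizon] > close[d[keep]]).astype(int)  # up vs. down
model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, y)
print(1 - model.oob_score_)        # the OOB error estimate the abstract cites
```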
Tutorial on Answering Questions about Images with Deep Learning
Title | Tutorial on Answering Questions about Images with Deep Learning |
Authors | Mateusz Malinowski, Mario Fritz |
Abstract | Together with the development of more accurate methods in Computer Vision and Natural Language Understanding, holistic architectures that answer questions about the content of real-world images have emerged. In this tutorial, we build a neural-based approach to answering questions about images. We base our tutorial on two datasets: (mostly) DAQUAR and (to a lesser extent) VQA. With small tweaks, the models we present here can achieve competitive performance on both datasets; in fact, they are among the best methods that use a combination of an LSTM with a global, full-frame CNN representation of an image. We hope that after reading this tutorial, the reader will be able to use Deep Learning frameworks, such as Keras and the Kraino framework introduced here, to build various architectures that lead to further performance improvements on this challenging task. |
Tasks | Visual Question Answering |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.01076v1 |
http://arxiv.org/pdf/1610.01076v1.pdf | |
PWC | https://paperswithcode.com/paper/tutorial-on-answering-questions-about-images |
Repo | https://github.com/mateuszmalinowski/visual_turing_test-tutorial |
Framework | none |
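The architecture the tutorial builds, an LSTM question encoder fused with a global full-frame CNN image feature, is shown below in PyTorch rather than the tutorial's Keras/Kraino; the class name, fusion rule, and dimensions are illustrative assumptions.

```python
import torch, torch.nn as nn

class LSTMGlobalCNNVQA(nn.Module):
    def __init__(self, vocab, n_answers, dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, dim)   # project the CNN global feature
        self.classify = nn.Linear(dim, n_answers)

    def forward(self, question_ids, img_feat):
        _, (h, _) = self.lstm(self.embed(question_ids))
        fused = h[-1] * torch.tanh(self.img_proj(img_feat))  # elementwise fusion
        return self.classify(fused)               # answer-class logits

logits = LSTMGlobalCNNVQA(5000, 1000)(torch.randint(0, 5000, (2, 8)),
                                      torch.randn(2, 2048))
```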
Recurrent Neural Network for Text Classification with Multi-Task Learning
Title | Recurrent Neural Network for Text Classification with Multi-Task Learning |
Authors | Pengfei Liu, Xipeng Qiu, Xuanjing Huang |
Abstract | Neural network based methods have achieved great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we use the multi-task learning framework to jointly learn across multiple related tasks. Based on recurrent neural networks, we propose three different mechanisms for sharing information to model text with task-specific and shared layers. The entire network is trained jointly on all these tasks. Experiments on four benchmark text classification tasks show that our proposed models can improve the performance of a task with the help of other related tasks. |
Tasks | Multi-Task Learning, Text Classification |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05101v1 |
http://arxiv.org/pdf/1605.05101v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-network-for-text |
Repo | https://github.com/baixl/text_classification |
Framework | pytorch |
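The simplest of the three sharing mechanisms (one LSTM shared by all tasks, with task-specific classifier layers) can be sketched as follows; layer sizes are placeholders, and joint training would alternate mini-batches across tasks.

```python
import torch, torch.nn as nn

class SharedLSTMMultiTask(nn.Module):
    def __init__(self, vocab, n_classes_per_task, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.shared = nn.LSTM(dim, dim, batch_first=True)  # shared across tasks
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in n_classes_per_task)

    def forward(self, token_ids, task):
        _, (h, _) = self.shared(self.embed(token_ids))
        return self.heads[task](h[-1])   # task-specific classifier

model = SharedLSTMMultiTask(vocab=10000, n_classes_per_task=[2, 5, 2, 4])
logits = model(torch.randint(0, 10000, (3, 20)), task=1)   # a batch from task 1
```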