July 29, 2019


Paper Group AWR 125


Event Representations with Tensor-based Compositions


Title	Event Representations with Tensor-based Compositions
Authors	Noah Weber, Niranjan Balasubramanian, Nathanael Chambers
Abstract	Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and have recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging. It requires representations that can capture event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas compared to a prior discrete representation based method. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.
Tasks
Published	2017-11-21
URL	http://arxiv.org/abs/1711.07611v1
PDF	http://arxiv.org/pdf/1711.07611v1.pdf
PWC	https://paperswithcode.com/paper/event-representations-with-tensor-based
Repo	https://github.com/stonybrooknlp/event-tensors
Framework	tf
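
The tensor-based composition described in the abstract above can be illustrated with a generic bilinear contraction. This is a minimal sketch, assuming a third-order tensor `T` contracted with a subject vector and an object vector; the names `compose_event`, `T`, `subject`, and `obj` are hypothetical and not taken from the paper's code, and the paper's actual models may build the tensor differently.

```python
def compose_event(T, subject, obj):
    """Contract a third-order tensor with two entity vectors.

    e[i] = sum_j sum_k T[i][j][k] * subject[j] * obj[k]
    """
    return [
        sum(
            T[i][j][k] * subject[j] * obj[k]
            for j in range(len(subject))
            for k in range(len(obj))
        )
        for i in range(len(T))
    ]


# Toy 2x2x2 tensor with made-up values, for illustration only.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # slice that produces e[0]
    [[0.0, 2.0], [0.0, 0.0]],  # slice that produces e[1]
]
event = compose_event(T, subject=[1.0, 2.0], obj=[3.0, 4.0])
```

Because every output dimension multiplies subject and object components together, the same entities can yield different event vectors under different tensors, which is the kind of interaction an additive composition cannot express.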

Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification


Title	Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification
Authors	Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
Abstract	This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of using pre-training of such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse - but still acceptable - performance when compared to the single language model, while benefiting from better generalization properties across languages.
Tasks	Language Modelling, Sentiment Analysis
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02504v1
PDF	http://arxiv.org/pdf/1703.02504v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-large-amounts-of-weakly-supervised
Repo	https://github.com/spinningbytes/deep-mlsa
Framework	tf

SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks


Title	SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Authors	Martim Brandao, Kenji Hashimoto, Atsuo Takanishi
Abstract	Trajectory optimization and posture generation are hard problems in robot locomotion, which can be non-convex and have multiple local optima. Progress on these problems is further hindered by a lack of open benchmarks, since comparisons of different solutions are difficult to make. In this paper we introduce a new benchmark for trajectory optimization and posture generation of legged robots, using a pre-defined scenario, robot and constraints, as well as evaluation criteria. We evaluate state-of-the-art trajectory optimization algorithms based on sequential quadratic programming (SQP) on the benchmark, as well as new stochastic and incremental optimization methods borrowed from the large-scale machine learning literature. Interestingly we show that some of these stochastic and incremental methods, which are based on stochastic gradient descent (SGD), achieve higher success rates than SQP on tough initializations. Inspired by this observation we also propose a new incremental variant of SQP which updates only a random subset of the costs and constraints at each iteration. The algorithm is the best performing in both success rate and convergence speed, improving over SQP by up to 30% in both criteria. The benchmark’s resources and a solution evaluation script are made openly available.
Tasks	Legged Robots, Stochastic Optimization
Published	2017-10-09
URL	http://arxiv.org/abs/1710.03029v1
PDF	http://arxiv.org/pdf/1710.03029v1.pdf
PWC	https://paperswithcode.com/paper/sgd-for-robot-motion-the-effectiveness-of
Repo	https://github.com/martimbrandao/legopt-benchmark
Framework	none
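
The idea of updating with only a random subset of the costs at each iteration, mentioned in the abstract above, can be sketched on a toy problem. This is plain incremental gradient descent on a sum of one-dimensional quadratic costs, not the paper's incremental SQP variant and not a robot model; the function name, the decaying step size, and the cost centers are all assumptions made for illustration.

```python
import random


def incremental_descent(gradients, x0, subset_size=2, iters=200, seed=0):
    """Minimize a sum of costs using a random subset of gradients per step."""
    rng = random.Random(seed)
    x = x0
    for t in range(iters):
        subset = rng.sample(gradients, subset_size)
        g = sum(grad(x) for grad in subset) / subset_size
        x -= g / (t + 1)  # decaying step size (an assumption of this sketch)
    return x


# Toy costs 0.5 * (x - c)^2, whose sum is minimized at the mean of the centers.
centers = [1.0, 2.0, 3.0, 4.0]
gradients = [lambda x, c=c: x - c for c in centers]
x_final = incremental_descent(gradients, x0=0.0)
```

Each iteration touches only two of the four costs, yet the iterate settles near the minimizer of the full sum (2.5 for these centers).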

NormFace: L2 Hypersphere Embedding for Face Verification


Title	NormFace: L2 Hypersphere Embedding for Face Verification
Authors	Feng Wang, Xiang Xiang, Jian Cheng, Alan L. Yuille
Abstract	Thanks to the recent developments of Convolutional Neural Networks, the performance of face verification methods has increased rapidly. In a typical face verification method, feature normalization is a critical step for boosting performance. This motivates us to introduce and study the effect of normalization during training. But we find this is non-trivial, despite normalization being differentiable. We identify and study four issues related to normalization through mathematical analysis, which yields understanding and helps with parameter settings. Based on this analysis we propose two strategies for training using normalized features. The first is a modification of softmax loss, which optimizes cosine similarity instead of inner-product. The second is a reformulation of metric learning by introducing an agent vector for each class. We show that both strategies, and small variants, consistently improve performance by between 0.2% and 0.4% on the LFW dataset based on two models. This is significant because the performance of the two models on the LFW dataset is close to saturation at over 98%. Code and models are released at https://github.com/happynear/NormFace
Tasks	Face Verification, Metric Learning
Published	2017-04-21
URL	http://arxiv.org/abs/1704.06369v4
PDF	http://arxiv.org/pdf/1704.06369v4.pdf
PWC	https://paperswithcode.com/paper/normface-l2-hypersphere-embedding-for-face
Repo	https://github.com/anax32/dimensionality-reduction
Framework	tf
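
The first strategy in the abstract above, a softmax loss that optimizes cosine similarity instead of the inner product, can be sketched as follows. This is a minimal illustration, assuming both the feature and the class weight vectors are L2-normalized and the resulting cosines are multiplied by a fixed `scale` before the softmax; the function names and the default `scale=20.0` are assumptions of this sketch, not values taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_softmax_loss(feature, class_weights, label, scale=20.0):
    """Cross-entropy over scaled cosine similarities."""
    f = l2_normalize(feature)
    logits = [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


weights = [[3.0, 4.0], [-4.0, 3.0]]  # two toy class vectors
loss_correct = cosine_softmax_loss([3.0, 4.0], weights, label=0)
loss_wrong = cosine_softmax_loss([3.0, 4.0], weights, label=1)
```

Since the feature is normalized, multiplying it by any positive constant leaves the loss unchanged; only the angle to each class vector matters.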

Predicting Visual Features from Text for Image and Video Caption Retrieval


Title	Predicting Visual Features from Text for Image and Video Caption Retrieval
Authors	Jianfeng Dong, Xirong Li, Cees G. M. Snoek
Abstract	This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute Word2VisualVec, a deep neural network architecture that learns to predict a visual feature representation from textual input. Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron. We further generalize Word2VisualVec for video caption retrieval, by predicting from text both 3-D convolutional neural network features as well as a visual-audio representation. Experiments on Flickr8k, Flickr30k, the Microsoft Video Description dataset and the very recent NIST TrecVid challenge for video caption retrieval detail Word2VisualVec’s properties, its benefit over textual embeddings, the potential for multimodal query composition and its state-of-the-art results.
Tasks	Video Description
Published	2017-09-05
URL	http://arxiv.org/abs/1709.01362v3
PDF	http://arxiv.org/pdf/1709.01362v3.pdf
PWC	https://paperswithcode.com/paper/predicting-visual-features-from-text-for
Repo	https://github.com/danieljf24/w2vv
Framework	tf

Temporal Stability in Predictive Process Monitoring


Title	Temporal Stability in Predictive Process Monitoring
Authors	Irene Teinemaa, Marlon Dumas, Anna Leontjeva, Fabrizio Maria Maggi
Abstract	Predictive process monitoring is concerned with the analysis of events produced during the execution of a business process in order to predict as early as possible the final outcome of an ongoing case. Traditionally, predictive process monitoring methods are optimized with respect to accuracy. However, in environments where users make decisions and take actions in response to the predictions they receive, it is equally important to optimize the stability of the successive predictions made for each case. To this end, this paper defines a notion of temporal stability for binary classification tasks in predictive process monitoring and evaluates existing methods with respect to both temporal stability and accuracy. We find that methods based on XGBoost and LSTM neural networks exhibit the highest temporal stability. We then show that temporal stability can be enhanced by hyperparameter-optimizing random forests and XGBoost classifiers with respect to inter-run stability. Finally, we show that time series smoothing techniques can further enhance temporal stability at the expense of slightly lower accuracy.
Tasks	Time Series
Published	2017-12-12
URL	http://arxiv.org/abs/1712.04165v3
PDF	http://arxiv.org/pdf/1712.04165v3.pdf
PWC	https://paperswithcode.com/paper/temporal-stability-in-predictive-process
Repo	https://github.com/irhete/stability-predictive-monitoring
Framework	tf
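
The abstract above refers to a notion of temporal stability and to smoothing of successive predictions. One plausible formalization, assumed here rather than quoted from the paper, scores a case by one minus the mean absolute difference between consecutive prediction scores, and applies exponential smoothing with a hypothetical parameter `alpha`.

```python
def temporal_stability(scores):
    """1 - mean absolute change between successive prediction scores."""
    if len(scores) < 2:
        return 1.0
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return 1.0 - sum(diffs) / len(diffs)


def exponential_smoothing(scores, alpha=0.5):
    """Blend each new score with the previous smoothed score."""
    if not scores:
        return []
    smoothed = [scores[0]]
    for s in scores[1:]:
        smoothed.append(alpha * s + (1 - alpha) * smoothed[-1])
    return smoothed


raw = [0.2, 0.8, 0.3, 0.9]  # toy prediction scores for one ongoing case
smooth = exponential_smoothing(raw)
```

On this toy sequence the smoothed scores fluctuate less than the raw ones, so their stability value is higher; the price is that the smoothed score reacts more slowly to new evidence, which is where the accuracy trade-off mentioned in the abstract comes from.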

Learning from Between-class Examples for Deep Sound Recognition


Title	Learning from Between-class Examples for Deep Sound Recognition
Authors	Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada
Abstract	Deep learning methods have achieved high performance in sound recognition tasks. Deciding how to feed the training data is important for further performance improvement. We propose a novel learning method for deep sound recognition: Between-Class learning (BC learning). Our strategy is to learn a discriminative feature space by recognizing the between-class sounds as between-class sounds. We generate between-class sounds by mixing two sounds belonging to different classes with a random ratio. We then input the mixed sound to the model and train the model to output the mixing ratio. The advantages of BC learning are not limited only to the increase in variation of the training data; BC learning leads to an enlargement of Fisher’s criterion in the feature space and a regularization of the positional relationship among the feature distributions of the classes. The experimental results show that BC learning improves the performance on various sound recognition networks, datasets, and data augmentation schemes, in which BC learning proves to be always beneficial. Furthermore, we construct a new deep sound recognition network (EnvNet-v2) and train it with BC learning. As a result, we achieved a performance that surpasses the human level.
Tasks	Data Augmentation
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10282v2
PDF	http://arxiv.org/pdf/1711.10282v2.pdf
PWC	https://paperswithcode.com/paper/learning-from-between-class-examples-for-deep
Repo	https://github.com/mil-tokyo/bc_learning_image
Framework	torch
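
The mixing step from the abstract above (two sounds from different classes, a random ratio, and a target equal to that ratio) can be sketched as below. This is a simplified illustration using a plain linear mix of raw samples; the function name and arguments are hypothetical, and the paper's actual mixing procedure may account for further properties of sound that this sketch ignores.

```python
import random


def mix_between_class(sound_a, sound_b, label_a, label_b, num_classes, rng=random):
    """Mix two equal-length waveforms and build the ratio-valued target."""
    if label_a == label_b:
        raise ValueError("between-class mixing expects two different classes")
    r = rng.random()  # mixing ratio in [0, 1)
    mixed = [r * a + (1 - r) * b for a, b in zip(sound_a, sound_b)]
    target = [0.0] * num_classes
    target[label_a] = r
    target[label_b] = 1 - r
    return mixed, target, r


rng = random.Random(0)
mixed, target, ratio = mix_between_class(
    [0.1, -0.2, 0.3], [0.5, 0.5, -0.5], label_a=0, label_b=2, num_classes=4, rng=rng
)
```

The target is a soft label rather than a one-hot vector, so a model trained on such pairs would be fitted with a loss that accepts distributions as targets (for example a KL divergence, which is an assumption here).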

A Bridge Between Hyperparameter Optimization and Learning-to-learn


Title	A Bridge Between Hyperparameter Optimization and Learning-to-learn
Authors	Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil
Abstract	We consider a class of nested optimization problems involving inner and outer objectives. We observe that by taking into explicit account the optimization dynamics for the inner objective it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the specific setting, the variables of the outer objective take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We show that some recently proposed methods in the latter setting can be instantiated in our framework and tackled with the same gradient-based algorithms. Finally, we discuss possible design patterns for learning-to-learn and present encouraging preliminary experiments for few-shot learning.
Tasks	Few-Shot Learning, Hyperparameter Optimization, Meta-Learning
Published	2017-12-18
URL	https://arxiv.org/abs/1712.06283v3
PDF	https://arxiv.org/pdf/1712.06283v3.pdf
PWC	https://paperswithcode.com/paper/a-bridge-between-hyperparameter-optimization
Repo	https://github.com/lucfra/FAR-HO
Framework	tf
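
Taking the inner optimization dynamics into explicit account, as the abstract above puts it, means differentiating the outer objective through the inner update steps. A minimal sketch on a toy problem follows, assuming a one-dimensional quadratic inner objective, gradient descent as the inner dynamics, and the learning rate as the only hyperparameter; all names and targets are hypothetical. The derivative of the weight with respect to the learning rate is carried forward alongside the weight itself.

```python
def outer_and_hypergradient(lr, steps=5, w0=0.0, train_target=3.0, val_target=2.5):
    """Return the outer loss and its derivative with respect to lr."""
    w = w0
    z = 0.0  # z tracks dw/dlr through the unrolled inner updates
    for _ in range(steps):
        grad = w - train_target      # gradient of 0.5 * (w - train_target)^2
        z = z - grad - lr * z        # derivative of the update rule w.r.t. lr
        w = w - lr * grad            # inner gradient-descent step
    outer = 0.5 * (w - val_target) ** 2
    return outer, (w - val_target) * z


loss, hypergrad = outer_and_hypergradient(lr=0.3)
```

The returned hypergradient can be checked against a finite difference of the outer loss, and it could be fed to any gradient-based optimizer to tune `lr`.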

Masked Autoregressive Flow for Density Estimation


Title	Masked Autoregressive Flow for Density Estimation
Authors	George Papamakarios, Theo Pavlakou, Iain Murray
Abstract	Autoregressive models are among the best performing neural density estimators. We describe an approach for increasing the flexibility of an autoregressive model, based on modelling the random numbers that the model uses internally when generating data. By constructing a stack of autoregressive models, each modelling the random numbers of the next model in the stack, we obtain a type of normalizing flow suitable for density estimation, which we call Masked Autoregressive Flow. This type of flow is closely related to Inverse Autoregressive Flow and is a generalization of Real NVP. Masked Autoregressive Flow achieves state-of-the-art performance in a range of general-purpose density estimation tasks.
Tasks	Density Estimation
Published	2017-05-19
URL	http://arxiv.org/abs/1705.07057v4
PDF	http://arxiv.org/pdf/1705.07057v4.pdf
PWC	https://paperswithcode.com/paper/masked-autoregressive-flow-for-density
Repo	https://github.com/e-hulten/maf
Framework	pytorch
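
The autoregressive flow described in the abstract above can be sketched with a single layer in which each dimension is shifted and scaled using the dimensions before it. The `conditioner` below is a hypothetical stand-in for a masked neural network, and the base density is assumed to be a standard normal; the sketch shows the sampling direction, its inverse, and the change-of-variables log density.

```python
import math


def conditioner(prefix):
    """Toy stand-in for a masked network: (mu, alpha) from earlier dimensions."""
    s = sum(prefix)
    return 0.5 * s, 0.1 * s


def maf_sample(u):
    """Map base noise u to data x, one dimension at a time."""
    x = []
    for ui in u:
        mu, alpha = conditioner(x)
        x.append(ui * math.exp(alpha) + mu)
    return x


def maf_inverse(x):
    """Recover the noise u from x and accumulate the sum of log scales."""
    u, sum_alpha = [], 0.0
    for i, xi in enumerate(x):
        mu, alpha = conditioner(x[:i])
        u.append((xi - mu) * math.exp(-alpha))
        sum_alpha += alpha
    return u, sum_alpha


def maf_log_density(x):
    """log p(x) = sum_i log N(u_i; 0, 1) - sum_i alpha_i."""
    u, sum_alpha = maf_inverse(x)
    base = sum(-0.5 * (ui * ui + math.log(2 * math.pi)) for ui in u)
    return base - sum_alpha


noise = [0.3, -1.2, 0.7]
sample = maf_sample(noise)
recovered, _ = maf_inverse(sample)
```

Evaluating the density needs only one pass over the observed `x`, whereas sampling is inherently sequential because each dimension depends on the ones already generated.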

Semi-Adversarial Networks: Convolutional Autoencoders for Imparting Privacy to Face Images


Title	Semi-Adversarial Networks: Convolutional Autoencoders for Imparting Privacy to Face Images
Authors	Vahid Mirjalili, Sebastian Raschka, Anoop Namboodiri, Arun Ross
Abstract	In this paper, we design and evaluate a convolutional autoencoder that perturbs an input face image to impart privacy to a subject. Specifically, the proposed autoencoder transforms an input face image such that the transformed image can be successfully used for face recognition but not for gender classification. In order to train this autoencoder, we propose a novel training scheme, referred to as semi-adversarial training in this work. The training is facilitated by attaching a semi-adversarial module consisting of a pseudo gender classifier and a pseudo face matcher to the autoencoder. The objective function utilized for training this network has three terms: one to ensure that the perturbed image is a realistic face image; another to ensure that the gender attributes of the face are confounded; and a third to ensure that biometric recognition performance due to the perturbed image is not impacted. Extensive experiments confirm the efficacy of the proposed architecture in extending gender privacy to face images.
Tasks	Face Recognition
Published	2017-12-01
URL	http://arxiv.org/abs/1712.00321v3
PDF	http://arxiv.org/pdf/1712.00321v3.pdf
PWC	https://paperswithcode.com/paper/semi-adversarial-networks-convolutional
Repo	https://github.com/gianluca-pepe/semi-adversarial-network
Framework	tf

NoScope: Optimizing Neural Network Queries over Video at Scale


Title	NoScope: Optimizing Neural Network Queries over Video at Scale
Authors	Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia
Abstract	Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, object to detect, and reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NoScope cascades two types of models: specialized models that forego the full generality of the reference model but faithfully mimic its behavior for the target video and object; and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NoScope uses an efficient cost-based optimizer to search across models and cascades. With this approach, NoScope achieves two to three orders of magnitude speed-ups (265-15,500x real-time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
Tasks
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02529v3
PDF	http://arxiv.org/pdf/1703.02529v3.pdf
PWC	https://paperswithcode.com/paper/noscope-optimizing-neural-network-queries
Repo	https://github.com/stanford-futuredata/noscope
Framework	tf
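
The cascade described in the abstract above can be sketched as a chain of increasingly expensive checks. This is a minimal illustration, assuming a difference detector that reuses the previous label when a frame barely changes, a specialized model that answers only when confident, and a fallback to the reference model otherwise; the thresholds and the toy models are hypothetical, not NoScope's API.

```python
def run_cascade(frames, diff_detector, specialized, reference,
                diff_threshold=0.05, low=0.2, high=0.8):
    """Label each frame and record which cascade stage produced the label."""
    results = []
    prev_frame, prev_label = None, None
    for frame in frames:
        if prev_frame is not None and diff_detector(frame, prev_frame) < diff_threshold:
            label, stage = prev_label, "difference_detector"
        else:
            confidence = specialized(frame)
            if confidence <= low:
                label, stage = False, "specialized"
            elif confidence >= high:
                label, stage = True, "specialized"
            else:
                label, stage = reference(frame), "reference"
        results.append((label, stage))
        prev_frame, prev_label = frame, label
    return results


# Toy models over two-pixel "frames" (stand-ins for real networks).
def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def brightness(frame):
    return sum(frame) / len(frame)

def expensive_reference(frame):
    return max(frame) > 0.9

frames = [[0.0, 0.0], [0.0, 0.01], [0.9, 1.0], [0.4, 0.95]]
labels = run_cascade(frames, mean_abs_diff, brightness, expensive_reference)
```

Only the last frame reaches the expensive reference model in this toy run; the savings in a real system depend on how often the cheaper stages can answer.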

Learning for Disparity Estimation through Feature Constancy


Title	Learning for Disparity Estimation through Feature Constancy
Authors	Zhengfa Liang, Yiliu Feng, Yulan Guo, Hengzhu Liu, Wei Chen, Linbo Qiao, Li Zhou, Jianfeng Zhang
Abstract	Stereo matching algorithms usually consist of four steps, including matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement. Existing CNN-based methods only adopt CNN to solve parts of the four steps, or use different networks to deal with different steps, making it difficult for them to obtain the overall optimal solution. In this paper, we propose a network architecture to incorporate all steps of stereo matching. The network consists of three parts. The first part calculates the multi-scale shared features. The second part performs matching cost calculation, matching cost aggregation and disparity calculation to estimate the initial disparity using shared features. The initial disparity and the shared features are used to calculate the feature constancy that measures correctness of the correspondence between two input images. The initial disparity and the feature constancy are then fed to a sub-network to refine the initial disparity. The proposed method has been evaluated on the Scene Flow and KITTI datasets. It achieves state-of-the-art performance on the KITTI 2012 and KITTI 2015 benchmarks while maintaining a very fast running time.
Tasks	Disparity Estimation, Stereo Matching, Stereo Matching Hand
Published	2017-12-04
URL	http://arxiv.org/abs/1712.01039v2
PDF	http://arxiv.org/pdf/1712.01039v2.pdf
PWC	https://paperswithcode.com/paper/learning-for-disparity-estimation-through
Repo	https://github.com/leonzfa/iResNet
Framework	none

Multiple Instance Detection Network with Online Instance Classifier Refinement


Title	Multiple Instance Detection Network with Online Instance Classifier Refinement
Authors	Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
Abstract	Of late, weakly supervised object detection is of great importance in object recognition. Based on deep learning, weakly supervised detectors have achieved many promising results. However, compared with fully supervised detection, it is more challenging to train deep network based detectors in a weakly supervised manner. Here we formulate weakly supervised detection as a Multiple Instance Learning (MIL) problem, where instance classifiers (object detectors) are put into the network as hidden nodes. We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information. More precisely, instance labels inferred from weak supervision are propagated to their spatially overlapped instances to refine the instance classifier online. The iterative instance classifier refinement procedure is implemented using multiple streams in a deep network, where each stream supervises its latter stream. Weakly supervised object detection experiments are carried out on the challenging PASCAL VOC 2007 and 2012 benchmarks. We obtain 47% mAP on VOC 2007 that significantly outperforms the previous state-of-the-art.
Tasks	Multiple Instance Learning, Object Detection, Object Recognition, Weakly Supervised Object Detection
Published	2017-04-01
URL	http://arxiv.org/abs/1704.00138v1
PDF	http://arxiv.org/pdf/1704.00138v1.pdf
PWC	https://paperswithcode.com/paper/multiple-instance-detection-network-with
Repo	https://github.com/ppengtang/oicr
Framework	pytorch
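
The propagation of instance labels to spatially overlapped instances, described in the abstract above, can be sketched with box overlap. This is a minimal illustration, assuming the top-scoring proposal for an image-level class is taken as the seed and every proposal whose intersection-over-union with it exceeds a threshold inherits the class; the threshold of 0.5, the background index 0, and the function names are assumptions of this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_labels(proposals, scores, image_class, iou_threshold=0.5):
    """Give the image-level class to proposals overlapping the top-scoring one."""
    top = max(range(len(proposals)), key=lambda i: scores[i])
    return [
        image_class if iou(proposals[top], box) > iou_threshold else 0  # 0 = background
        for box in proposals
    ]


proposals = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.3, 0.5]  # toy scores for the image-level class
refined = propagate_labels(proposals, scores, image_class=7)
```

The refined labels would then serve as supervision for the next classifier stream, so that a low-scoring but well-overlapping proposal is no longer treated as background.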

MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs


Title	MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs
Authors	Pranav Rajpurkar, Jeremy Irvin, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng
Abstract	We introduce MURA, a large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. To evaluate models robustly and to get an estimate of radiologist performance, we collect additional labels from six board-certified Stanford radiologists on the test set, consisting of 207 musculoskeletal studies. On this test set, the majority vote of a group of three radiologists serves as gold standard. We train a 169-layer DenseNet baseline model to detect and localize abnormalities. Our model achieves an AUROC of 0.929, with an operating point of 0.815 sensitivity and 0.887 specificity. We compare our model and radiologists on the Cohen’s kappa statistic, which expresses the agreement of our model and of each radiologist with the gold standard. Model performance is comparable to the best radiologist performance in detecting abnormalities on finger and wrist studies. However, model performance is lower than best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies. We believe that the task is a good challenge for future research. To encourage advances, we have made our dataset freely available at https://stanfordmlgroup.github.io/competitions/mura .
Tasks	Anomaly Detection
Published	2017-12-11
URL	http://arxiv.org/abs/1712.06957v4
PDF	http://arxiv.org/pdf/1712.06957v4.pdf
PWC	https://paperswithcode.com/paper/mura-large-dataset-for-abnormality-detection
Repo	https://github.com/rajkumargithub/densenet.mura
Framework	pytorch
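
The abstract above compares the model and the radiologists using Cohen's kappa against a gold standard. The statistic itself is standard and can be sketched as follows; the toy labels are made up and are not MURA data.

```python
def cohens_kappa(rater_a, rater_b):
    """Agreement between two label sequences, corrected for chance."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters always use the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)


gold = [1, 1, 0, 0, 1, 0, 1, 0]   # toy gold standard (1 = abnormal)
model = [1, 1, 0, 0, 1, 0, 0, 1]  # toy predictions
kappa = cohens_kappa(gold, model)
```

Here the raters agree on 6 of 8 studies (0.75) while chance agreement is 0.5, giving a kappa of 0.5; raw accuracy alone would overstate the agreement.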

A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem


Title	A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
Authors	Zhengyao Jiang, Dixing Xu, Jinjun Liang
Abstract	Financial portfolio management is the process of constant redistribution of a fund into different financial products. This paper presents a financial-model-free Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. This framework is realized in three instances in this work with a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). They are, along with a number of recently reviewed or published portfolio-selection strategies, examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example of a cryptocurrency. All three instances of the framework monopolize the top three positions in all experiments, outdistancing other compared trading algorithms. Even with a high commission rate of 0.25% in the backtests, the framework is able to achieve at least 4-fold returns in 50 days.
Tasks	Portfolio Optimization
Published	2017-06-30
URL	http://arxiv.org/abs/1706.10059v2
PDF	http://arxiv.org/pdf/1706.10059v2.pdf
PWC	https://paperswithcode.com/paper/a-deep-reinforcement-learning-framework-for
Repo	https://github.com/AlphaHounds/UBS
Framework	tf
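
The effect of repeated redistribution under a commission rate, as discussed in the abstract above, can be sketched with a simple back-test loop. This is a rough illustration, assuming the cost of each rebalance is the commission rate times the total weight moved, which is an approximation chosen for brevity and not the paper's transaction model; the function name and the toy price relatives are hypothetical.

```python
def backtest(price_relatives, target_weights, commission=0.0025):
    """Final portfolio value, starting from 1.0, after rebalancing every period."""
    value = 1.0
    held = list(target_weights[0])  # start already invested in the first target
    for y, target in zip(price_relatives, target_weights):
        # Approximate cost of moving from the drifted weights to the target.
        turnover = sum(abs(t - h) for t, h in zip(target, held))
        value *= 1.0 - commission * turnover
        # Growth over the period: weighted sum of price relatives.
        growth = sum(yi * wi for yi, wi in zip(y, target))
        value *= growth
        # Weights drift with prices until the next rebalance.
        held = [yi * wi / growth for yi, wi in zip(y, target)]
    return value


relatives = [[1.1, 0.9], [0.9, 1.1]]  # closing price / opening price per asset
weights = [[1.0, 0.0], [0.0, 1.0]]    # toy allocation decisions per period
with_fees = backtest(relatives, weights)
without_fees = backtest(relatives, weights, commission=0.0)
```

A strategy that trades every period pays the commission on each reallocation, so frequent rebalancing must earn enough extra growth to offset it.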