Paper Group AWR 85
Soft Proposal Networks for Weakly Supervised Object Localization
Title | Soft Proposal Networks for Weakly Supervised Object Localization |
Authors | Yi Zhu, Yanzhao Zhou, Qixiang Ye, Qiang Qiu, Jianbin Jiao |
Abstract | Weakly supervised object localization remains challenging, where only image labels instead of bounding boxes are available during training. Object proposal is an effective component in localization, but often computationally expensive and incapable of joint optimization with some of the remaining modules. In this paper, to the best of our knowledge, we for the first time integrate weakly supervised object proposal into convolutional neural networks (CNNs) in an end-to-end learning manner. We design a network component, Soft Proposal (SP), to be plugged into any standard convolutional architecture to introduce the nearly cost-free object proposal, orders of magnitude faster than state-of-the-art methods. In the SP-augmented CNNs, referred to as Soft Proposal Networks (SPNs), iteratively evolved object proposals are generated based on the deep feature maps then projected back, and further jointly optimized with network parameters, with image-level supervision only. Through the unified learning process, SPNs learn better object-centric filters, discover more discriminative visual evidence, and suppress background interference, significantly boosting both weakly supervised object localization and classification performance. We report the best results on popular benchmarks, including PASCAL VOC, MS COCO, and ImageNet. |
Tasks | Object Localization, Weakly Supervised Object Detection, Weakly-Supervised Object Localization |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01829v1 |
http://arxiv.org/pdf/1709.01829v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-proposal-networks-for-weakly-supervised |
Repo | https://github.com/yeezhu/SPN.pytorch |
Framework | pytorch |
Chord Generation from Symbolic Melody Using BLSTM Networks
Title | Chord Generation from Symbolic Melody Using BLSTM Networks |
Authors | Hyungui Lim, Seungyeon Rhyu, Kyogu Lee |
Abstract | Generating a chord progression from a monophonic melody is a challenging problem because a chord progression requires a series of layered notes played simultaneously. This paper presents a novel method of generating chord sequences from a symbolic melody using bidirectional long short-term memory (BLSTM) networks trained on a lead sheet database. To this end, a group of feature vectors composed of 12 semitones is extracted from the notes in each bar of monophonic melodies. In order to ensure that the data shares uniform key and duration characteristics, the key and the time signatures of the vectors are normalized. The BLSTM networks then learn from the data to incorporate the temporal dependencies to produce a chord progression. Both quantitative and qualitative evaluations are conducted by comparing the proposed method with the conventional HMM and DNN-HMM based approaches. The proposed model achieves performance increases of 23.8% and 11.4% over these two baselines, respectively. User studies further confirm that the chord sequences generated by the proposed method are preferred by listeners. |
Tasks | |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01011v1 |
http://arxiv.org/pdf/1712.01011v1.pdf | |
PWC | https://paperswithcode.com/paper/chord-generation-from-symbolic-melody-using |
Repo | https://github.com/nprabala/Mixtape |
Framework | pytorch |
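The per-bar "12 semitones" feature described in the chord-generation abstract above can be illustrated with a small sketch. The abstract does not give the exact feature definition, so everything here is an assumption for illustration: the function name `bar_to_pitch_class_vector`, the duration weighting, the normalization to unit sum, and the use of a `key_offset` to transpose every bar to C.

```python
def bar_to_pitch_class_vector(notes, key_offset=0):
    """Build a 12-dimensional pitch-class vector for one bar (hypothetical helper).

    notes: list of (midi_pitch, duration_in_beats) tuples for a monophonic bar.
    key_offset: semitones of the tonic above C (e.g. 2 for D major); subtracting
        it transposes the bar to C, a simple stand-in for key normalization.
    """
    vec = [0.0] * 12
    for pitch, duration in notes:
        # Accumulate note duration on the transposed pitch class.
        vec[(pitch - key_offset) % 12] += duration
    total = sum(vec)
    if total > 0:
        # Assumed normalization: divide by total duration so bars of
        # different lengths are comparable.
        vec = [v / total for v in vec]
    return vec


# One bar in D major: D4 (1 beat), F#4 (1 beat), A4 (2 beats).
bar = [(62, 1.0), (66, 1.0), (69, 2.0)]
features = bar_to_pitch_class_vector(bar, key_offset=2)
```

After transposition the three notes land on pitch classes 0, 4 and 7 (C, E, G). A sequence of such vectors, one per bar, is the kind of input a bidirectional recurrent model could consume.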
Outcome-Oriented Predictive Process Monitoring: Review and Benchmark
Title | Outcome-Oriented Predictive Process Monitoring: Review and Benchmark |
Authors | Irene Teinemaa, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi |
Abstract | Predictive business process monitoring refers to the act of making predictions about the future state of ongoing cases of a business process, based on their incomplete execution traces and logs of historical (completed) traces. Motivated by the increasingly pervasive availability of fine-grained event data about business process executions, the problem of predictive process monitoring has received substantial attention in the past years. In particular, a considerable number of methods have been put forward to address the problem of outcome-oriented predictive process monitoring, which refers to classifying each ongoing case of a process according to a given set of possible categorical outcomes - e.g., Will the customer complain or not? Will an order be delivered, canceled or withdrawn? Unfortunately, different authors have used different datasets, experimental settings, evaluation measures and baselines to assess their proposals, resulting in poor comparability and an unclear picture of the relative merits and applicability of different methods. To address this gap, this article presents a systematic review and taxonomy of outcome-oriented predictive process monitoring methods, and a comparative experimental evaluation of eleven representative methods using a benchmark covering 24 predictive process monitoring tasks based on nine real-life event logs. |
Tasks | |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06766v4 |
http://arxiv.org/pdf/1707.06766v4.pdf | |
PWC | https://paperswithcode.com/paper/outcome-oriented-predictive-process |
Repo | https://github.com/irhete/predictive-monitoring-benchmark |
Framework | none |
Passing the Brazilian OAB Exam: data preparation and some experiments
Title | Passing the Brazilian OAB Exam: data preparation and some experiments |
Authors | Pedro Delfino, Bruno Cuconato, Edward Hermann Haeusler, Alexandre Rademaker |
Abstract | In Brazil, all legal professionals must demonstrate their knowledge of the law and its application by passing the OAB exams, the national bar exams. The OAB exams therefore provide an excellent benchmark for the performance of legal information systems since passing the exam would arguably signal that the system has acquired a capacity for legal reasoning comparable to that of a human lawyer. This article describes the construction of a new data set and some preliminary experiments on it, treating the problem of finding the justification for the answers to questions. The results provide a baseline performance measure against which to evaluate future improvements. We discuss the reasons for the poor performance and propose next steps. |
Tasks | |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05128v1 |
http://arxiv.org/pdf/1712.05128v1.pdf | |
PWC | https://paperswithcode.com/paper/passing-the-brazilian-oab-exam-data |
Repo | https://github.com/own-pt/oab-exams |
Framework | none |
Boosting Adversarial Attacks with Momentum
Title | Boosting Adversarial Attacks with Momentum |
Authors | Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li |
Abstract | Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions. |
Tasks | Adversarial Attack |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06081v3 |
http://arxiv.org/pdf/1710.06081v3.pdf | |
PWC | https://paperswithcode.com/paper/boosting-adversarial-attacks-with-momentum |
Repo | https://github.com/srk97/targeted-adversarial-mnist |
Framework | tf |
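The idea in the momentum-attack abstract above, accumulating a momentum term across iterations of a gradient-based attack, can be sketched on a toy model. This is a minimal illustration, not the authors' implementation: the logistic-regression victim model, the L1 normalization of the gradient, the sign step, the step size `eps / steps`, and all function names are assumptions chosen to keep the example self-contained.

```python
import math


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_gradient(x, w, b, y):
    """Gradient w.r.t. x of the logistic loss for a toy model p = sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def momentum_iterative_attack(x, w, b, y, eps=0.3, steps=10, mu=1.0):
    """Untargeted attack inside an L-infinity ball of radius eps (sketch)."""
    alpha = eps / steps          # assumed per-step size
    g = [0.0] * len(x)           # accumulated (momentum) direction
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(x_adv, w, b, y)
        l1 = sum(abs(v) for v in grad) or 1.0   # avoid division by zero
        # Momentum update: decay the old direction, add the normalized gradient.
        g = [mu * gi + v / l1 for gi, v in zip(g, grad)]
        # Ascend the loss along the sign of the accumulated direction.
        x_adv = [xi + alpha * sign(gi) for xi, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


x = [1.0, 0.5]
w = [2.0, -1.0]
b = 0.0
x_adv = momentum_iterative_attack(x, w, b, y=1)
```

On this toy input the perturbation lowers the model's score for the true class while staying within `eps` of the original point in every coordinate. Setting `mu=0.0` removes the momentum term and leaves a plain iterative sign-gradient attack for comparison.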
DPC-Net: Deep Pose Correction for Visual Localization
Title | DPC-Net: Deep Pose Correction for Visual Localization |
Authors | Valentin Peretroukhin, Jonathan Kelly |
Abstract | We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model corrections to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE(3) corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train a Deep Pose Correction network (DPC-Net) that predicts corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally-efficient sparse stereo visual odometry pipeline, that render it as accurate as a modern computationally-intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters. |
Tasks | Visual Localization, Visual Odometry |
Published | 2017-09-10 |
URL | http://arxiv.org/abs/1709.03128v3 |
http://arxiv.org/pdf/1709.03128v3.pdf | |
PWC | https://paperswithcode.com/paper/dpc-net-deep-pose-correction-for-visual |
Repo | https://github.com/utiasSTARS/dpc-net |
Framework | pytorch |
MSC: A Dataset for Macro-Management in StarCraft II
Title | MSC: A Dataset for Macro-Management in StarCraft II |
Authors | Huikai Wu, Junge Zhang, Kaiqi Huang |
Abstract | Macro-management is an important problem in StarCraft, which has been studied for a long time. Various datasets together with assorted methods have been proposed in the last few years. But these datasets have some defects for boosting the academic and industrial research: 1) There are neither standard preprocessing, parsing and feature extraction procedures nor predefined training, validation and test sets in some datasets. 2) Some datasets are only specified for certain tasks in macro-management. 3) Some datasets are either too small or don’t have enough labeled data for modern machine learning algorithms such as deep neural networks. So most previous methods are trained with various features and evaluated on different test sets from the same or different datasets, making them difficult to compare directly. To boost the research of macro-management in StarCraft, we release a new dataset MSC based on the platform SC2LE. MSC consists of well-designed feature vectors, pre-defined high-level actions and the final result of each match. We also split MSC into training, validation and test sets for the convenience of evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, which are two of the key tasks in macro-management. Various downstream tasks and analyses of the dataset are also described for the sake of research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC. |
Tasks | Real-Time Strategy Games, Starcraft, Starcraft II |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.03131v2 |
http://arxiv.org/pdf/1710.03131v2.pdf | |
PWC | https://paperswithcode.com/paper/msc-a-dataset-for-macro-management-in |
Repo | https://github.com/wuhuikai/MSC |
Framework | none |
Natural Language Processing with Small Feed-Forward Networks
Title | Natural Language Processing with Small Feed-Forward Networks |
Authors | Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David Weiss, Ryan McDonald, Slav Petrov |
Abstract | We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget. |
Tasks | |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00214v1 |
http://arxiv.org/pdf/1708.00214v1.pdf | |
PWC | https://paperswithcode.com/paper/natural-language-processing-with-small-feed |
Repo | https://github.com/bzz/LangID |
Framework | tf |
Multitask learning and benchmarking with clinical time series data
Title | Multitask learning and benchmarking with clinical time series data |
Authors | Hrayr Harutyunyan, Hrant Khachatrian, David C. Kale, Greg Ver Steeg, Aram Galstyan |
Abstract | Health care is one of the most exciting frontiers in data mining and machine learning. Successful adoption of electronic health records (EHRs) created an explosion in digital clinical data available for analysis, but progress in machine learning for healthcare research has been difficult to measure because of the absence of publicly available benchmark data sets. To address this problem, we propose four clinical prediction benchmarks using data derived from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) database. These tasks cover a range of clinical problems including modeling risk of mortality, forecasting length of stay, detecting physiologic decline, and phenotype classification. We propose strong linear and neural baselines for all four tasks and evaluate the effect of deep supervision, multitask training and data-specific architectural modifications on the performance of neural models. |
Tasks | Computational Phenotyping, Length-of-Stay prediction, Mortality Prediction, Time Series |
Published | 2017-03-22 |
URL | https://arxiv.org/abs/1703.07771v3 |
https://arxiv.org/pdf/1703.07771v3.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-and-benchmarking-with |
Repo | https://github.com/ksu-hmi/Class-Assignment-https-github.com-pwhitley2-Evaluation-of-Python-suite-to-construct-benchmark |
Framework | none |
Self-view Grounding Given a Narrated 360° Video
Title | Self-view Grounding Given a Narrated 360° Video |
Authors | Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun |
Abstract | Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users to follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim at automatically grounding the NFoVs of a 360° video given subtitles of the narrative (referred to as “NFoV-grounding”). We propose a novel Visual Grounding Model (VGM) to implicitly and efficiently predict the NFoVs given the video content and subtitles. Specifically, at each frame, we efficiently encode the panorama into a feature map of candidate NFoVs using a Convolutional Neural Network (CNN) and the subtitles to the same hidden space using an RNN with Gated Recurrent Units (GRU). Then, we apply soft-attention on candidate NFoVs to trigger a sentence decoder aiming to minimize the reconstruction loss between the generated and given sentence. Finally, we obtain the NFoV as the candidate NFoV with the maximum attention without any human supervision. To train VGM more robustly, we also generate a reverse sentence conditioning on one minus the soft-attention such that the attention focuses on candidate NFoVs less relevant to the given sentence. The negative log reconstruction loss of the reverse sentence (referred to as “irrelevant loss”) is jointly minimized to encourage the reverse sentence to be different from the given sentence. To evaluate our method, we collect the first narrated 360° videos dataset and achieve state-of-the-art NFoV-grounding performance. |
Tasks | |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08664v1 |
http://arxiv.org/pdf/1711.08664v1.pdf | |
PWC | https://paperswithcode.com/paper/self-view-grounding-given-a-narrated-360 |
Repo | https://github.com/ShihHanChou/360grounding |
Framework | pytorch |
Deep Convolutional Framelets: A General Deep Learning Framework for Inverse Problems
Title | Deep Convolutional Framelets: A General Deep Learning Framework for Inverse Problems |
Authors | Jong Chul Ye, Yoseob Han, Eunju Cha |
Abstract | Recently, deep learning approaches with various network architectures have achieved significant performance improvement over existing iterative reconstruction methods in various imaging problems. However, it is still unclear why these deep learning architectures work for specific inverse problems. To address these issues, here we show that the long-searched-for missing link is the convolution framelets for representing a signal by convolving local and non-local bases. The convolution framelets were originally developed to generalize the theory of low-rank Hankel matrix approaches for inverse problems, and this paper further extends the idea so that we can obtain a deep neural network using multilayer convolution framelets with perfect reconstruction (PR) under rectified linear unit (ReLU) nonlinearity. Our analysis also shows that the popular deep network components such as residual block, redundant filter channels, and concatenated ReLU (CReLU) do indeed help to achieve the PR, while the pooling and unpooling layers should be augmented with high-pass branches to meet the PR condition. Moreover, by changing the number of filter channels and bias, we can control the shrinkage behaviors of the neural network. This discovery leads us to propose a novel theory for deep convolutional framelets neural network. Using numerical experiments with various inverse problems, we demonstrated that our deep convolution framelets network shows consistent improvement over existing deep architectures. This discovery suggests that the success of deep learning is not from a magical power of a black-box, but rather comes from the power of a novel signal representation using non-local basis combined with data-driven local basis, which is indeed a natural extension of classical signal processing theory. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00372v5 |
http://arxiv.org/pdf/1707.00372v5.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-framelets-a-general-deep |
Repo | https://github.com/hanyoseob/framing-u-net |
Framework | none |
Soft-NMS – Improving Object Detection With One Line of Code
Title | Soft-NMS – Improving Object Detection With One Line of Code |
Authors | Navaneeth Bodla, Bharat Singh, Rama Chellappa, Larry S. Davis |
Abstract | Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are suppressed. This process is recursively applied on the remaining boxes. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss. To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process. Soft-NMS obtains consistent improvements for the coco-style mAP metric on standard datasets like PASCAL VOC 2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN) by just changing the NMS algorithm without any additional hyper-parameters. Using Deformable-RFCN, Soft-NMS improves state-of-the-art in object detection from 39.8% to 40.9% with a single model. Further, the computational complexity of Soft-NMS is the same as traditional NMS and hence it can be efficiently implemented. Since Soft-NMS does not require any extra training and is simple to implement, it can be easily integrated into any object detection pipeline. Code for Soft-NMS is publicly available on GitHub (http://bit.ly/2nJLNMu). |
Tasks | Object Detection |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04503v2 |
http://arxiv.org/pdf/1704.04503v2.pdf | |
PWC | https://paperswithcode.com/paper/soft-nms-improving-object-detection-with-one |
Repo | https://github.com/tkuanlun350/Kaggle_Ship_Detection_2018 |
Framework | tf |
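The Soft-NMS abstract above describes decaying the scores of overlapping boxes as a continuous function of their overlap with the top-scoring box M, instead of discarding them. A minimal sketch of that loop follows. The Gaussian form of the decay, the value of `sigma`, and the helper names are assumptions for illustration; the abstract only states that the decay is continuous in the overlap.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5):
    """Return (box, score) pairs in selection order, with decayed scores."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the box M with the current maximum score.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        # Decay every other score by an assumed Gaussian of its overlap with M.
        remaining = [
            (box, score * math.exp(-iou(best_box, box) ** 2 / sigma))
            for box, score in remaining
        ]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first heavily, so its score is reduced and it is selected last, after the non-overlapping third box, but it is not removed. Practical detectors usually apply a final score threshold to the returned list; that step is left out here to keep the sketch close to the abstract's description.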
Generative Face Completion
Title | Generative Face Completion |
Authors | Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang |
Abstract | In this paper, we propose an effective face completion algorithm using a deep generative model. Different from well-studied background completion, the face completion task is more challenging as it often requires generating semantically new pixels for the missing key components (e.g., eyes and mouths) that contain large appearance variations. Unlike existing nonparametric algorithms that search for patches to synthesize, our algorithm directly generates contents for missing regions based on a neural network. The model is trained with a combination of a reconstruction loss, two adversarial losses and a semantic parsing loss, which ensures pixel faithfulness and local-global contents consistency. With extensive experimental results, we demonstrate qualitatively and quantitatively that our model is able to deal with a large area of missing pixels in arbitrary shapes and generate realistic face completion results. |
Tasks | Facial Inpainting, Semantic Parsing |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05838v1 |
http://arxiv.org/pdf/1704.05838v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-face-completion |
Repo | https://github.com/easternCar/Face-Parsing-Network |
Framework | pytorch |
Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning
Title | Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning |
Authors | Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum |
Abstract | Knowledge bases (KB), both automatically and manually constructed, are often incomplete — many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of paths, previous path-based models have considered only the problem of predicting a missing relation given two entities or evaluating the truth of a proposed triple. Additionally, these methods have traditionally used random paths between fixed entity pairs or more recently learned to pick paths between them. We propose a new algorithm MINERVA, which addresses the much more difficult and practical task of answering questions where the relation is known, but only one entity. Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query to find predictive paths. Empirically, this approach obtains state-of-the-art results on several datasets, significantly outperforming prior methods. |
Tasks | |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05851v2 |
http://arxiv.org/pdf/1711.05851v2.pdf | |
PWC | https://paperswithcode.com/paper/go-for-a-walk-and-arrive-at-the-answer |
Repo | https://github.com/markWJJ/rl |
Framework | tf |
Fast MCMC sampling algorithms on polytopes
Title | Fast MCMC sampling algorithms on polytopes |
Authors | Yuansi Chen, Raaz Dwivedi, Martin J. Wainwright, Bin Yu |
Abstract | We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former is based on the volumetric-logarithmic barrier introduced by Vaidya whereas the latter uses John’s ellipsoids. We show that the Vaidya walk mixes in significantly fewer steps than the logarithmic-barrier based Dikin walk studied in past work. For a polytope in $\mathbb{R}^d$ defined by $n > d$ linear constraints, we show that the mixing time from a warm start is bounded as $\mathcal{O}(n^{0.5}d^{1.5})$, compared to the $\mathcal{O}(nd)$ mixing time bound for the Dikin walk. The cost of each step of the Vaidya walk is of the same order as the Dikin walk, and at most twice as large in terms of constant pre-factors. For the John walk, we prove an $\mathcal{O}(d^{2.5}\cdot\log^4(n/d))$ bound on its mixing time and conjecture that an improved variant of it could achieve a mixing time of $\mathcal{O}(d^2\cdot\text{polylog}(n/d))$. Additionally, we propose variants of the Vaidya and John walks that mix in polynomial time from a deterministic starting point. The speed-up of the Vaidya walk over the Dikin walk is illustrated in numerical examples. |
Tasks | |
Published | 2017-10-23 |
URL | http://arxiv.org/abs/1710.08165v3 |
http://arxiv.org/pdf/1710.08165v3.pdf | |
PWC | https://paperswithcode.com/paper/fast-mcmc-sampling-algorithms-on-polytopes |
Repo | https://github.com/yuachen/polytopewalk |
Framework | none |
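To make the polytope-sampling abstract above more concrete, here is a sketch of the logarithmic-barrier Dikin walk that it uses as the baseline. It is not the Vaidya or John walk. The sketch is restricted to two dimensions so it needs only the standard library, and it assumes a Gaussian proposal with covariance $(r^2/d)\,H(x)^{-1}$, where $H$ is the barrier Hessian, followed by a Metropolis-Hastings correction. The radius `r`, the function names and the unit-square test polytope are all illustrative choices.

```python
import math
import random


def barrier_hessian(x, A, b):
    """Hessian of -sum(log(b_i - a_i.x)) at x, returned as (h11, h12, h22)."""
    h11 = h12 = h22 = 0.0
    for (a1, a2), bi in zip(A, b):
        s = bi - (a1 * x[0] + a2 * x[1])   # slack of constraint i
        h11 += a1 * a1 / (s * s)
        h12 += a1 * a2 / (s * s)
        h22 += a2 * a2 / (s * s)
    return h11, h12, h22


def inside(x, A, b):
    """True if x strictly satisfies every constraint a_i.x < b_i."""
    return all(a1 * x[0] + a2 * x[1] < bi for (a1, a2), bi in zip(A, b))


def log_proposal(x, z, A, b, r):
    """Log density (up to a constant) of proposing z from x, with d = 2."""
    h11, h12, h22 = barrier_hessian(x, A, b)
    dx, dy = z[0] - x[0], z[1] - x[1]
    quad = h11 * dx * dx + 2 * h12 * dx * dy + h22 * dy * dy
    return 0.5 * math.log(h11 * h22 - h12 * h12) - quad / (r * r)


def dikin_step(x, A, b, r, rng):
    """One Metropolis-Hastings step targeting the uniform distribution."""
    h11, h12, h22 = barrier_hessian(x, A, b)
    # Cholesky factor H = L L^T for the 2x2 Hessian.
    l11 = math.sqrt(h11)
    l21 = h12 / l11
    l22 = math.sqrt(h22 - l21 * l21)
    # Solve L^T u = xi so that u has covariance H^{-1}.
    xi1, xi2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u2 = xi2 / l22
    u1 = (xi1 - l21 * u2) / l11
    scale = r / math.sqrt(2)               # r / sqrt(d) with d = 2
    z = (x[0] + scale * u1, x[1] + scale * u2)
    if not inside(z, A, b):
        return x                           # reject proposals outside the polytope
    log_accept = log_proposal(z, x, A, b, r) - log_proposal(x, z, A, b, r)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return z
    return x


# Unit square 0 < x < 1, 0 < y < 1 written as A x < b.
A = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
b = [1.0, 0.0, 1.0, 0.0]
rng = random.Random(0)
x = (0.5, 0.5)
samples = []
for _ in range(200):
    x = dikin_step(x, A, b, r=0.5, rng=rng)
    samples.append(x)
```

Because the proposal covariance shrinks as the point approaches a face of the polytope, the chain takes smaller steps near the boundary. According to the abstract, the Vaidya walk replaces the logarithmic barrier with a volumetric-logarithmic one to obtain a better mixing-time bound at a per-step cost of the same order.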