Paper Group AWR 324
A Note on the Inception Score. DeepInf: Social Influence Prediction with Deep Learning. MOrdReD: Memory-based Ordinal Regression Deep Neural Networks for Time Series Forecasting. Learning Adversarially Fair and Transferable Representations. In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking. Understanding Short-Horizon Bias in Stochastic Meta-Optimization. Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark. A generalizable approach for multi-view 3D human pose regression. Efficient First-Order Algorithms for Adaptive Signal Denoising. Incorporating Glosses into Neural Word Sense Disambiguation. Aspect Based Sentiment Analysis with Gated Convolutional Networks. Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise. Motion-based Object Segmentation based on Dense RGB-D Scene Flow. M-PACT: An Open Source Platform for Repeatable Activity Classification Research. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples.
A Note on the Inception Score
Title | A Note on the Inception Score |
Authors | Shane Barratt, Rishi Sharma |
Abstract | Deep generative models are powerful tools that have produced impressive results in recent years. These advances have been for the most part empirically driven, making it essential that we use high quality evaluation metrics. In this paper, we provide new insights into the Inception Score, a recently proposed and widely used evaluation metric for generative models, and demonstrate that it fails to provide useful guidance when comparing models. We discuss both suboptimalities of the metric itself and issues with its application. Finally, we call for researchers to be more systematic and careful when evaluating and comparing generative models, as the advancement of the field depends upon it. |
Tasks | |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.01973v2 |
PDF | http://arxiv.org/pdf/1801.01973v2.pdf |
PWC | https://paperswithcode.com/paper/a-note-on-the-inception-score |
Repo | https://github.com/kozistr/gan-metrics |
Framework | pytorch |
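For context, the Inception Score is defined as IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y|x) is the class posterior of a pretrained Inception network on a generated sample and p(y) its marginal over the generated set. A minimal numpy sketch, assuming `preds` holds those softmax outputs (the split-and-average protocol used by common implementations is omitted):

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    """IS = exp(E_x[KL(p(y|x) || p(y))]) from softmax outputs.

    preds: (N, num_classes) class probabilities p(y|x) produced by a
    pretrained Inception network on N generated samples.
    """
    p_y = preds.mean(axis=0, keepdims=True)  # marginal label distribution p(y)
    kl = (preds * (np.log(preds + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```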
DeepInf: Social Influence Prediction with Deep Learning
Title | DeepInf: Social Influence Prediction with Deep Learning |
Authors | Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, Jie Tang |
Abstract | Social and information networking activities such as on Facebook, Twitter, WeChat, and Weibo have become an indispensable part of our everyday life, where we can easily access friends’ behaviors and are in turn influenced by them. Consequently, effective social influence prediction for each user is critical for a variety of applications such as online recommendation and advertising. Conventional social influence prediction approaches typically design various hand-crafted rules to extract user- and network-specific features. However, their effectiveness heavily relies on the knowledge of domain experts. As a result, it is usually difficult to generalize them to different domains. Inspired by the recent success of deep neural networks in a wide range of computing applications, we design an end-to-end framework, DeepInf, to learn users’ latent feature representations for predicting social influence. In general, DeepInf takes a user’s local network as the input to a graph neural network for learning her latent social representation. We design strategies to incorporate both network structures and user-specific features into convolutional neural and attention networks. Extensive experiments on Open Academic Graph, Twitter, Weibo, and Digg, representing different types of social and information networks, demonstrate that the proposed end-to-end model, DeepInf, significantly outperforms traditional feature-engineering-based approaches, suggesting the effectiveness of representation learning for social applications. |
Tasks | Feature Engineering, Representation Learning |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05560v1 |
PDF | http://arxiv.org/pdf/1807.05560v1.pdf |
PWC | https://paperswithcode.com/paper/deepinf-social-influence-prediction-with-deep |
Repo | https://github.com/xptree/DeepInf |
Framework | pytorch |
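To make the pipeline concrete: DeepInf samples a user's local (ego) network and pushes node features through graph convolution/attention layers to obtain her latent representation. A minimal PyTorch sketch of one graph-convolution step under these assumptions (the paper's full model stacks such layers and adds graph attention):

```python
import torch
import torch.nn as nn

class EgoGCNLayer(nn.Module):
    """One graph-convolution step over an ego network:
    H' = ReLU(A_hat @ H @ W), with A_hat the normalized adjacency.
    Illustrative only; DeepInf also uses graph attention layers."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        # a_hat: (n, n) normalized adjacency of the sampled ego network
        # h:     (n, in_dim) node features (structural + user-specific)
        return torch.relu(a_hat @ self.lin(h))
```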
MOrdReD: Memory-based Ordinal Regression Deep Neural Networks for Time Series Forecasting
Title | MOrdReD: Memory-based Ordinal Regression Deep Neural Networks for Time Series Forecasting |
Authors | Bernardo Pérez Orozco, Gabriele Abbati, Stephen Roberts |
Abstract | Time series forecasting is ubiquitous in the modern world. Applications range from health care to astronomy, and include climate modelling, financial trading and monitoring of critical engineering equipment. To offer value over this range of activities, models must not only provide accurate forecasts, but also quantify and adjust their uncertainty over time. In this work, we directly tackle this task with a novel, fully end-to-end deep learning method for time series forecasting. By recasting time series forecasting as an ordinal regression task, we develop a principled methodology to assess long-term predictive uncertainty and describe rich multimodal, non-Gaussian behaviour, which arises regularly in applied settings. Notably, our framework is a wholly general-purpose approach that requires little to no user intervention. We showcase this key feature in a large-scale benchmark test with 45 datasets drawn both from a wide range of real-world application domains and from a comprehensive list of synthetic maps. This wide comparison encompasses state-of-the-art methods from both the machine learning and statistics literature, such as Gaussian Processes. We find that our approach not only provides excellent predictive forecasts, closely shadowing true future values, but also allows us to infer valuable information, such as the predictive distribution of the occurrence of critical events of interest, accurately and reliably even over long time horizons. |
Tasks | Time Series, Time Series Forecasting |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09704v4 |
PDF | http://arxiv.org/pdf/1803.09704v4.pdf |
PWC | https://paperswithcode.com/paper/mordred-memory-based-ordinal-regression-deep |
Repo | https://github.com/bperezorozco/ordinal_tsf |
Framework | none |
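The core recasting step, turning real-valued forecasting into ordinal classification over discretized value bins, can be sketched as follows (the bin count and encoding are illustrative assumptions, not the paper's exact setup). A sequence model then emits a softmax over bins at every step, which is what yields the full, possibly multimodal, predictive distribution:

```python
import numpy as np

def to_ordinal_targets(series, n_bins=100):
    """Discretize a real-valued series into n_bins ordinal classes
    (sketch; MOrdReD's binning/encoding details may differ)."""
    edges = np.linspace(series.min(), series.max(), n_bins + 1)
    # np.digitize against the interior edges yields indices 0..n_bins-1
    return np.clip(np.digitize(series, edges[1:-1]), 0, n_bins - 1)
```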
Learning Adversarially Fair and Transferable Representations
Title | Learning Adversarially Fair and Transferable Representations |
Authors | David Madras, Elliot Creager, Toniann Pitassi, Richard Zemel |
Abstract | In this paper, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. Motivated by a scenario where learned representations are used by third parties with unknown objectives, we propose and explore adversarial representation learning as a natural method of ensuring those parties act fairly. We connect group fairness (demographic parity, equalized odds, and equal opportunity) to different adversarial objectives. Through worst-case theoretical guarantees and experimental validation, we show that the choice of this objective is crucial to fair prediction. Furthermore, we present the first in-depth experimental demonstration of fair transfer learning and demonstrate empirically that our learned representations admit fair predictions on new tasks while maintaining utility, an essential goal of fair representation learning. |
Tasks | Representation Learning, Transfer Learning |
Published | 2018-02-17 |
URL | http://arxiv.org/abs/1802.06309v3 |
PDF | http://arxiv.org/pdf/1802.06309v3.pdf |
PWC | https://paperswithcode.com/paper/learning-adversarially-fair-and-transferable |
Repo | https://github.com/VectorInstitute/laftr |
Framework | none |
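A minimal sketch of the adversarial setup the abstract describes: an encoder and task classifier trained against an adversary that tries to recover the sensitive attribute from the representation. Dimensions and the plain cross-entropy adversary loss are assumptions; the paper derives specific (e.g., group-normalized) adversary objectives for each fairness criterion:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(10, 8), nn.ReLU())  # hypothetical dims
clf = nn.Linear(8, 1)  # task head: predict label y
adv = nn.Linear(8, 1)  # adversary: recover sensitive attribute a

def laftr_losses(x, y, a):
    """Returns (encoder/classifier objective, adversary objective).
    The encoder and classifier minimize task loss minus the adversary's
    loss; the adversary minimizes its own loss. For demographic parity
    the adversary sees only the code z, as here; in practice the two
    objectives are optimized in alternation."""
    z = enc(x)
    task_loss = F.binary_cross_entropy_with_logits(clf(z).squeeze(-1), y)
    adv_loss = F.binary_cross_entropy_with_logits(adv(z).squeeze(-1), a)
    return task_loss - adv_loss, adv_loss
```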
In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking
Title | In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking |
Authors | Yuezun Li, Ming-Ching Chang, Siwei Lyu |
Abstract | New developments in deep generative networks have significantly improved the quality and efficiency of generating realistic-looking fake face videos. In this work, we describe a new method to expose fake face videos generated with neural networks. Our method is based on detecting eye blinking in the videos, a physiological signal that is not well reproduced in synthesized fake videos. Our method is tested on benchmark eye-blinking detection datasets and also shows promising performance on detecting videos generated with DeepFake. |
Tasks | Face Swapping |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02877v2 |
PDF | http://arxiv.org/pdf/1806.02877v2.pdf |
PWC | https://paperswithcode.com/paper/in-ictu-oculi-exposing-ai-generated-fake-face |
Repo | https://github.com/danmohaha/WIFS2018_In_Ictu_Oculi |
Framework | tf |
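The paper's detector is a CNN+LSTM (LRCN) trained on cropped eye regions; purely to make the underlying physiological signal concrete, below is the classical eye-aspect-ratio blink heuristic computed from six eye landmarks, a common baseline for blink detection rather than the proposed method:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR over six eye landmarks; it drops sharply during a blink.
    eye: (6, 2) landmark coordinates in the usual ordering."""
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

def count_blinks(ear_series, thresh=0.2):
    below = ear_series < thresh
    # count falling edges: frames where EAR first drops below threshold
    return int(np.sum(below[1:] & ~below[:-1]))
```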
Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Title | Understanding Short-Horizon Bias in Stochastic Meta-Optimization |
Authors | Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse |
Abstract | Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training. There has been much recent interest in gradient-based meta-optimization, where one tunes hyperparameters, or even learns an optimizer, in order to minimize the expected loss when the training procedure is unrolled. But because the training procedure must be unrolled thousands of times, the meta-objective must be defined with an orders-of-magnitude shorter time horizon than is typical for neural net training. We show that such short-horizon meta-objectives cause a serious bias towards small step sizes, an effect we term short-horizon bias. We introduce a toy problem, a noisy quadratic cost function, on which we analyze short-horizon bias by deriving and comparing the optimal schedules for short and long time horizons. We then run meta-optimization experiments (both offline and online) on standard benchmark datasets, showing that meta-optimization chooses too small a learning rate by multiple orders of magnitude, even when run with a moderately long time horizon (100 steps) typical of work in the area. We believe short-horizon bias is a fundamental problem that needs to be addressed if meta-optimization is to scale to practical neural net training regimes. |
Tasks | |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02021v1 |
PDF | http://arxiv.org/pdf/1803.02021v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-short-horizon-bias-in |
Repo | https://github.com/renmengye/meta-optim-public |
Framework | tf |
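The mechanism is easy to exhibit on the paper's toy problem. For the noisy quadratic L(θ) = ½hθ² with stochastic gradients hθ + ε, ε ~ N(0, σ²), minimizing the expected one-step loss over the step size gives the greedy-optimal rate lr* = h·E[θ²] / (h²·E[θ²] + σ²), which shrinks toward zero as the iterate approaches the noise floor. The sketch below iterates this greedy choice; the paper's full analysis covers schedules, momentum, and the multivariate case, where the bias is most damaging:

```python
import numpy as np

def greedy_lr(x2, h=1.0, sigma=1.0):
    """Horizon-1 optimal step size on the noisy quadratic:
    argmin_lr E[0.5*h*(theta - lr*(h*theta + eps))^2]
            = h*x2 / (h^2*x2 + sigma^2), with x2 = E[theta^2]."""
    return h * x2 / (h ** 2 * x2 + sigma ** 2)

x2, h, sigma = 25.0, 1.0, 1.0
for t in range(10):
    lr = greedy_lr(x2, h, sigma)
    # exact update of the expected squared distance after one SGD step
    x2 = (1 - lr * h) ** 2 * x2 + (lr * sigma) ** 2
    print(f"step {t}: lr = {lr:.4f}, E[theta^2] = {x2:.4f}")
# the greedily chosen step size decays toward zero -- the short-horizon
# pressure toward small steps that the paper analyzes
```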
Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark
Title | Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark |
Authors | Akshat Agarwal, Ryan Hope, Katia Sycara |
Abstract | Research in deep reinforcement learning (RL) has coalesced around improving performance on benchmarks like the Arcade Learning Environment. However, these benchmarks conspicuously miss important characteristics like abrupt context-dependent shifts in strategy and temporal sensitivity that are often present in real-world domains. As a result, RL research has not focused on these challenges, resulting in algorithms which do not understand critical changes in context, and have little notion of real world time. To tackle this issue, this paper introduces the game of Space Fortress as an RL benchmark which incorporates these characteristics. We show that existing state-of-the-art RL algorithms are unable to learn to play the Space Fortress game. We then confirm that this poor performance is due to the RL algorithms’ context insensitivity and reward sparsity. We also identify independent axes along which to vary context and temporal sensitivity, allowing Space Fortress to be used as a testbed for understanding both characteristics in combination and also in isolation. We release Space Fortress as an open-source Gym environment. |
Tasks | Atari Games |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02206v1 |
PDF | http://arxiv.org/pdf/1809.02206v1.pdf |
PWC | https://paperswithcode.com/paper/challenges-of-context-and-time-in |
Repo | https://github.com/agakshat/spacefortress |
Framework | pytorch |
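Since the benchmark ships as an open-source Gym environment, interacting with it should follow the standard Gym loop; note that the import and environment id below are assumptions to be checked against the repo's README, not verified names:

```python
import gym
import spacefortress  # hypothetical import; registers the environments

env = gym.make("SpaceFortress-v0")  # hypothetical env id -- see the repo
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    # random policy, just to exercise the environment interface
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_return += reward
print("episode return:", episode_return)
```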
A generalizable approach for multi-view 3D human pose regression
Title | A generalizable approach for multi-view 3D human pose regression |
Authors | Abdolrahim Kadkhodamohammadi, Nicolas Padoy |
Abstract | Despite the significant improvement in the performance of monocular pose estimation approaches and their ability to generalize to unseen environments, multi-view (MV) approaches are often lagging behind in terms of accuracy and are specific to certain datasets. This is mainly due to the fact that (1) contrary to real world single-view (SV) datasets, MV datasets are often captured in controlled environments to collect precise 3D annotations, which do not cover all real world challenges, and (2) the model parameters are learned for specific camera setups. To alleviate these problems, we propose a two-stage approach to detect and estimate 3D human poses, which separates SV pose detection from MV 3D pose estimation. This separation enables us to utilize each dataset for the right task, i.e. SV datasets for constructing robust pose detection models and MV datasets for constructing precise MV 3D regression models. In addition, our 3D regression approach only requires 3D pose data and its projections to the views for building the model, hence removing the need for collecting annotated data from the test setup. Our approach can therefore be easily generalized to a new environment by simply projecting 3D poses into 2D during training according to the camera setup used at test time. As 2D poses are collected at test time using a SV pose detector, which might generate inaccurate detections, we model its characteristics and incorporate this information during training. We demonstrate that incorporating the detector’s characteristics is important to build a robust 3D regression model and that the resulting regression model generalizes well to new MV environments. Our evaluation results show that our approach achieves competitive results on the Human3.6M dataset and significantly improves results on a MV clinical dataset that is the first MV dataset generated from live surgery recordings. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2018-04-27 |
URL | https://arxiv.org/abs/1804.10462v2 |
PDF | https://arxiv.org/pdf/1804.10462v2.pdf |
PWC | https://paperswithcode.com/paper/a-generalizable-approach-for-multi-view-3d |
Repo | https://github.com/AimAlex/Multi-view-3D-Human-Pose |
Framework | pytorch |
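The generalization trick, synthesizing 2D training poses for a new camera setup by projecting 3D ground truth into each view, reduces to pinhole projection. A minimal sketch, assuming the target setup's intrinsics K and extrinsics (R, t) are known; the paper additionally perturbs such projections to mimic the 2D detector's error characteristics:

```python
import numpy as np

def project(points3d, K, R, t):
    """Project 3D joints into one camera view: u ~ K @ (R @ X + t).

    points3d: (J, 3) joint positions in world coordinates
    K: (3, 3) intrinsics; R: (3, 3) rotation; t: (3,) translation
    """
    cam = points3d @ R.T + t       # world -> camera coordinates
    uv = cam @ K.T                 # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide -> (J, 2) pixels
```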
Efficient First-Order Algorithms for Adaptive Signal Denoising
Title | Efficient First-Order Algorithms for Adaptive Signal Denoising |
Authors | Dmitrii Ostrovskii, Zaid Harchaoui |
Abstract | We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Each such estimator is associated with a time-invariant filter which is obtained adaptively, by solving a certain convex optimization problem. Adaptive convolution-type estimators were demonstrated to have favorable statistical properties. However, the question of their computational complexity remains largely unexplored, and in fact we are not aware of any publicly available implementation of these estimators. Our first contribution is an efficient implementation of these estimators via some known first-order proximal algorithms. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy. The proposed procedures and their analysis are illustrated on a simulated data benchmark. |
Tasks | Denoising |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11262v3 |
PDF | http://arxiv.org/pdf/1803.11262v3.pdf |
PWC | https://paperswithcode.com/paper/efficient-first-order-algorithms-for-adaptive |
Repo | https://github.com/ostrodmit/AlgoRec |
Framework | none |
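To make "convolution-type estimator" concrete: the denoised signal is obtained by convolving the observations with a time-invariant filter that is itself fit from the data. The toy below fits such a filter by ridge regression, purely as an illustration; the paper's estimators solve a different convex program, and its contribution is efficient proximal first-order solvers for it:

```python
import numpy as np

def denoise(y, m=16, lam=0.1):
    """Toy adaptive convolution-type denoiser: fit a length-m filter
    phi so the one-step prediction sum_j phi[j]*y[t-1-j] matches y[t],
    then use those predictions as the denoised signal."""
    T = len(y)
    # lagged design matrix: the row for time t holds y[t-1], ..., y[t-m]
    X = np.column_stack([y[m - k - 1:T - k - 1] for k in range(m)])
    phi = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y[m:])
    return np.concatenate([y[:m], X @ phi])  # first m samples passed through
```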
Incorporating Glosses into Neural Word Sense Disambiguation
Title | Incorporating Glosses into Neural Word Sense Disambiguation |
Authors | Fuli Luo, Tianyu Liu, Qiaolin Xia, Baobao Chang, Zhifang Sui |
Abstract | Word Sense Disambiguation (WSD) aims to identify the correct meaning of polysemous words in a particular context. Lexical resources like WordNet have proved to be of great help for WSD in knowledge-based methods. However, previous neural networks for WSD always rely on massive labeled data (context), ignoring lexical resources like glosses (sense definitions). In this paper, we integrate the context and glosses of the target word into a unified framework in order to make full use of both labeled data and lexical knowledge. We therefore propose GAS: a gloss-augmented WSD neural network which jointly encodes the context and glosses of the target word. GAS models the semantic relationship between the context and the gloss in an improved memory network framework, which bridges the gap between previous supervised methods and knowledge-based methods. We further extend the original gloss of a word sense via its semantic relations in WordNet to enrich the gloss information. The experimental results show that our model outperforms the state-of-the-art systems on several English all-words WSD datasets. |
Tasks | Word Sense Disambiguation |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08028v2 |
PDF | http://arxiv.org/pdf/1805.08028v2.pdf |
PWC | https://paperswithcode.com/paper/incorporating-glosses-into-neural-word-sense |
Repo | https://github.com/jimiyulu/WSD_MemNN |
Framework | tf |
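A stripped-down version of the gloss-augmented idea — score each candidate sense by matching an encoding of the context against an encoding of that sense's WordNet gloss — might look like this; GAS refines it with a multi-pass memory network, and all dimensions here are illustrative:

```python
import torch
import torch.nn as nn

class GlossScorer(nn.Module):
    """Score senses of a target word by context-gloss matching
    (sketch only; GAS uses an improved memory network instead)."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.ctx_rnn = nn.LSTM(dim, dim, batch_first=True)
        self.gls_rnn = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, context, glosses):
        # context: (1, Tc) token ids; glosses: (S, Tg), one row per sense
        _, (hc, _) = self.ctx_rnn(self.emb(context))  # hc: (1, 1, dim)
        _, (hg, _) = self.gls_rnn(self.emb(glosses))  # hg: (1, S, dim)
        return (hg[0] @ hc[0, 0]).softmax(dim=0)      # sense probabilities
```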
Aspect Based Sentiment Analysis with Gated Convolutional Networks
Title | Aspect Based Sentiment Analysis with Gated Convolutional Networks |
Authors | Wei Xue, Tao Li |
Abstract | Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. This architecture is much simpler than the attention layers used in existing models. Second, the computations of our model can easily be parallelized during training, because convolutional layers do not have the time dependency of LSTM layers, and the gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07043v1 |
PDF | http://arxiv.org/pdf/1805.07043v1.pdf |
PWC | https://paperswithcode.com/paper/aspect-based-sentiment-analysis-with-gated |
Repo | https://github.com/wxue004cs/GCAE |
Framework | pytorch |
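The Gated Tanh-ReLU Unit in the abstract is simple enough to sketch directly: one convolutional channel produces tanh sentiment features, a second produces ReLU gates conditioned on the aspect embedding, and their elementwise product is max-pooled over time (dimensions below are typical choices, not necessarily the paper's):

```python
import torch
import torch.nn as nn

class GTRU(nn.Module):
    """Gated Tanh-ReLU Unit for aspect-category sentiment analysis."""
    def __init__(self, emb_dim=300, n_filters=100, k=3, aspect_dim=300):
        super().__init__()
        self.conv_s = nn.Conv1d(emb_dim, n_filters, k)  # sentiment channel
        self.conv_g = nn.Conv1d(emb_dim, n_filters, k)  # gate channel
        self.aspect = nn.Linear(aspect_dim, n_filters)

    def forward(self, x, a):
        # x: (B, emb_dim, T) word embeddings; a: (B, aspect_dim) aspect
        s = torch.tanh(self.conv_s(x))
        g = torch.relu(self.conv_g(x) + self.aspect(a).unsqueeze(-1))
        return (s * g).max(dim=-1).values  # max-over-time pooling: (B, F)
```

A linear layer over the pooled features then predicts the sentiment polarity; because everything is convolutional, the whole stack parallelizes across time steps, which is the efficiency argument made above.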
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
Title | Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise |
Authors | Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel |
Abstract | The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by data poisoning adversaries. Numerous previous works assume that no source of labels can be trusted. We relax this assumption and assume that a small subset of the training data is trusted. This enables substantial gains in robustness to label corruption; in particular, even severe label noise can be combated with a small set of trusted, cleanly labeled data. We propose a loss correction technique that uses trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with various label noises at several strengths, and show that our method significantly outperforms existing methods. |
Tasks | Data Poisoning |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05300v4 |
PDF | http://arxiv.org/pdf/1802.05300v4.pdf |
PWC | https://paperswithcode.com/paper/using-trusted-data-to-train-deep-networks-on |
Repo | https://github.com/mmazeika/glc |
Framework | pytorch |
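The recipe (the authors call it the Gold Loss Correction) can be sketched in two steps: estimate a label-corruption matrix C from the trusted set using a classifier first trained on noisy labels, then train a corrected model whose predicted noisy-label distribution is softmax(logits) @ C. A sketch of the estimation step, assuming `probs_trusted` holds that noisy classifier's softmax outputs on the trusted examples:

```python
import numpy as np

def estimate_corruption_matrix(probs_trusted, y_trusted, k):
    """C[i, j] ~ p(noisy label j | true label i), estimated by averaging
    the noisy-label classifier's softmax outputs over trusted examples
    whose true label is i."""
    C = np.zeros((k, k))
    for i in range(k):
        rows = probs_trusted[y_trusted == i]
        C[i] = rows.mean(axis=0) if len(rows) else np.ones(k) / k
    return C

# training then matches softmax(logits) @ C -- the model's implied
# noisy-label distribution -- against the noisy labels, while trusted
# examples are trained on directly
```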
Motion-based Object Segmentation based on Dense RGB-D Scene Flow
Title | Motion-based Object Segmentation based on Dense RGB-D Scene Flow |
Authors | Lin Shao, Parth Shah, Vikranth Dwaracherla, Jeannette Bohg |
Abstract | Given two consecutive RGB-D images, we propose a model that estimates a dense 3D motion field, also known as scene flow. We take advantage of the fact that in robot manipulation scenarios, scenes often consist of a set of rigidly moving objects. Our model jointly estimates (i) the segmentation of the scene into an unknown but finite number of objects, (ii) the motion trajectories of these objects and (iii) the object scene flow. We employ a deep hourglass neural network architecture. In the encoding stage, the RGB and depth images undergo spatial compression and correlation. In the decoding stage, the model outputs three images containing a per-pixel estimate of the corresponding object center as well as object translation and rotation. This forms the basis for inferring the object segmentation and final object scene flow. To evaluate our model, we generated a new and challenging, large-scale, synthetic dataset that is specifically targeted at robotic manipulation: it contains a large number of scenes with a very diverse set of simultaneously moving 3D objects and is recorded with a simulated, static RGB-D camera. In quantitative experiments, we show that we outperform state-of-the-art scene flow and motion-segmentation methods on this dataset. In qualitative experiments, we show how our learned model transfers to challenging real-world scenes, visually generating better results than existing methods. |
Tasks | Motion Segmentation, Semantic Segmentation |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.05195v2 |
PDF | http://arxiv.org/pdf/1804.05195v2.pdf |
PWC | https://paperswithcode.com/paper/motion-based-object-segmentation-based-on |
Repo | https://github.com/stanford-iprl-lab/sceneflownet |
Framework | tf |
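One plausible way to decode the per-pixel object-center predictions into a motion segmentation, which the abstract leaves implicit, is to cluster pixels by the 3D center they vote for; the sketch below does this with DBSCAN and is an assumption about the mechanism, not the paper's exact inference procedure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_from_center_votes(centers, eps=0.05, min_samples=20):
    """Group pixels that vote for nearby 3D object centers into the
    same object id. centers: (H, W, 3) per-pixel predicted centers."""
    h, w, _ = centers.shape
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
        centers.reshape(-1, 3))
    return labels.reshape(h, w)  # per-pixel object ids (-1 = unassigned)
```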
M-PACT: An Open Source Platform for Repeatable Activity Classification Research
Title | M-PACT: An Open Source Platform for Repeatable Activity Classification Research |
Authors | Eric Hofesmann, Madan Ravi Ganesh, Jason J. Corso |
Abstract | There are many hurdles that prevent the replication of existing work which hinders the development of new activity classification models. These hurdles include switching between multiple deep learning libraries and the development of boilerplate experimental pipelines. We present M-PACT to overcome existing issues by removing the need to develop boilerplate code which allows users to quickly prototype action classification models while leveraging existing state-of-the-art (SOTA) models available in the platform. M-PACT is the first to offer four SOTA activity classification models, I3D, C3D, ResNet50+LSTM, and TSN, under a single platform with reproducible competitive results. This platform allows for the generation of models and results over activity recognition datasets through the use of modular code, various preprocessing and neural network layers, and seamless data flow. In this paper, we present the system architecture, detail the functions of various modules, and describe the basic tools to develop a new model in M-PACT. |
Tasks | Action Classification, Activity Recognition |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05879v3 |
PDF | http://arxiv.org/pdf/1804.05879v3.pdf |
PWC | https://paperswithcode.com/paper/m-pact-an-open-source-platform-for-repeatable |
Repo | https://github.com/MichiganCOG/M-PACT |
Framework | tf |
Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
Title | Learning Visually-Grounded Semantics from Contrastive Adversarial Samples |
Authors | Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun |
Abstract | We study the problem of grounding distributional representations of texts on the visual domain, namely visual-semantic embeddings (VSE for short). Beginning with an insightful adversarial attack on VSE embeddings, we show the limitations of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively. The large gap between the number of possible constitutions of real-world semantics and the size of parallel data, to a large extent, restricts the model from establishing the link between textual semantics and visual concepts. We alleviate this problem by augmenting the MS-COCO image captioning dataset with textual contrastive adversarial samples. These samples are synthesized using linguistic rules and the WordNet knowledge base. The construction procedure is both syntax- and semantics-aware. The samples force the model to ground learned embeddings to concrete concepts within the image. This simple but powerful technique brings a noticeable improvement over the baselines on a diverse set of downstream tasks, in addition to defending against known types of adversarial attacks. We release the code at https://github.com/ExplorerFreda/VSE-C. |
Tasks | Adversarial Attack, Image Captioning |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10348v1 |
PDF | http://arxiv.org/pdf/1806.10348v1.pdf |
PWC | https://paperswithcode.com/paper/learning-visually-grounded-semantics-from |
Repo | https://github.com/ExplorerFreda/VSE-C |
Framework | pytorch |
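The sample-synthesis ingredient — rule-based, WordNet-driven edits that minimally change a caption's meaning — can be approximated with a co-hyponym swap through NLTK's WordNet interface. The helper below is a rough sketch under that assumption; the paper's rules are richer and explicitly syntax- and semantics-aware:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def noun_contrastive(caption, target):
    """Swap `target` for a WordNet co-hyponym (a sibling sense under
    the same hypernym) to produce a contrastive caption."""
    synsets = wn.synsets(target, pos=wn.NOUN)
    if not synsets:
        return None
    for hyper in synsets[0].hypernyms():
        for sibling in hyper.hyponyms():
            name = sibling.lemmas()[0].name().replace("_", " ")
            if name != target:
                return caption.replace(target, name)
    return None

# noun_contrastive("a dog sitting on a couch", "dog")
# -> e.g. "a wolf sitting on a couch" (output depends on WordNet order)
```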