February 1, 2020

3217 words 16 mins read

Paper Group AWR 224

Targeted Mismatch Adversarial Attack: Query with a Flower to Retrieve the Tower

Title Targeted Mismatch Adversarial Attack: Query with a Flower to Retrieve the Tower
Authors Giorgos Tolias, Filip Radenovic, Ondřej Chum
Abstract Access to online visual search engines implies sharing of private user content - the query images. We introduce the concept of a targeted mismatch attack for deep-learning-based retrieval systems: generating an adversarial image that conceals the query image. The generated image looks nothing like the user's intended query, but leads to identical or very similar retrieval results. Transferring attacks to fully unseen networks is challenging. We show successful attacks on partially unknown systems by designing various loss functions for the adversarial image construction. These include, for example, loss functions for an unknown global pooling operation or an unknown input resolution used by the retrieval system. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and adversarial images.
Tasks Adversarial Attack
Published 2019-08-24
URL https://arxiv.org/abs/1908.09163v1
PDF https://arxiv.org/pdf/1908.09163v1.pdf
PWC https://paperswithcode.com/paper/targeted-mismatch-adversarial-attack-query
Repo https://github.com/gtolias/tma
Framework pytorch
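
A minimal sketch of the descriptor-matching idea behind such an attack, assuming a generic pretrained CNN with global pooling as the retrieval descriptor (the paper's full method adds losses for unknown pooling operations and input resolutions; the backbone, images, and hyperparameters below are placeholders):

```python
# Optimize a carrier image so that the retrieval network assigns it (nearly)
# the same global descriptor as the private query image.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(pretrained=True)   # stand-in for the retrieval backbone
backbone.fc = torch.nn.Identity()             # pooled conv features act as the descriptor
backbone.eval()

def descriptor(x):
    return F.normalize(backbone(x), dim=1)    # L2-normalized global descriptor

query = torch.rand(1, 3, 224, 224)       # private query image (placeholder)
carrier = torch.rand(1, 3, 224, 224)     # unrelated "flower" carrier image (placeholder)

with torch.no_grad():
    target_desc = descriptor(query)

adv = carrier.clone().requires_grad_(True)
opt = torch.optim.Adam([adv], lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss = 1.0 - (descriptor(adv) * target_desc).sum()   # cosine distance to the target descriptor
    loss.backward()
    opt.step()
    with torch.no_grad():
        adv.clamp_(0, 1)                                  # keep a valid image
```

After optimization, `adv` still looks like the carrier but retrieves results close to those of the hidden query against this particular backbone; transferring to unseen networks requires the additional losses described in the paper.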

Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization

Title Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization
Authors Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo
Abstract In this report, we introduce the winning method for the HACS Temporal Action Localization Challenge 2019. Temporal action localization is challenging since a target proposal may be related to several other candidate proposals in an untrimmed video. Existing methods cannot tackle this challenge well since temporal proposals are considered individually and their temporal dependencies are neglected. To address this issue, we propose sparse 2D temporal adjacent networks to model the temporal relationships between candidate proposals. This method is built upon the recently proposed 2D-TAN approach. The sampling strategy in 2D-TAN introduces the unbalanced context problem, where short proposals can perceive more context than long proposals. Therefore, we further propose a Sparse 2D Temporal Adjacent Network (S-2D-TAN). It is capable of incorporating more context information for long proposals and learning discriminative features from them. By combining our S-2D-TAN with a simple action classifier, our method achieves a mAP of 23.49 on the test set, which won first place in the HACS challenge.
Tasks Action Localization, Temporal Action Localization
Published 2019-12-08
URL https://arxiv.org/abs/1912.03612v1
PDF https://arxiv.org/pdf/1912.03612v1.pdf
PWC https://paperswithcode.com/paper/learning-sparse-2d-temporal-adjacent-networks
Repo https://github.com/researchmm/2D-TAN
Framework pytorch
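
To make the central data structure concrete, here is a rough sketch of the 2D temporal map used by 2D-TAN-style models, where cell (i, j) represents the candidate segment spanning clips i through j; the sparse sampling and the relation-modeling convolutions of S-2D-TAN are omitted, and the clip features are random placeholders:

```python
# Build a (T, T, D) map whose upper triangle holds one feature per candidate proposal.
import torch

def build_2d_temporal_map(clip_feats):
    """clip_feats: (T, D) per-clip features -> (T, T, D) proposal feature map."""
    T, D = clip_feats.shape
    fmap = torch.zeros(T, T, D)
    for i in range(T):
        for j in range(i, T):
            fmap[i, j] = clip_feats[i:j + 1].max(dim=0).values   # pool over the span i..j
    return fmap

feats = torch.randn(16, 256)                  # 16 clips, 256-d features (placeholder)
proposal_map = build_2d_temporal_map(feats)   # neighboring cells are temporally adjacent proposals
```

Because adjacent cells correspond to temporally adjacent proposals, convolutions over this map can model the dependencies between candidates that per-proposal methods ignore.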

MIMAMO Net: Integrating Micro- and Macro-motion for Video Emotion Recognition

Title MIMAMO Net: Integrating Micro- and Macro-motion for Video Emotion Recognition
Authors Didan Deng, Zhaokang Chen, Yuqian Zhou, Bertram Shi
Abstract Spatial-temporal feature learning is of vital importance for video emotion recognition. Previous deep network structures often focused on macro-motion, which extends over long time scales, e.g., on the order of seconds. We believe integrating structures capturing information about both micro- and macro-motion will benefit emotion prediction, because humans perceive both micro- and macro-expressions. In this paper, we propose to combine micro- and macro-motion features to improve video emotion recognition with a two-stream recurrent network, named MIMAMO (Micro-Macro-Motion) Net. Specifically, smaller and shorter micro-motions are analyzed by a two-stream network, while larger and more sustained macro-motions can be well captured by a subsequent recurrent network. Assigning specific interpretations to the roles of different parts of the network enables us to make choices of parameters based on prior knowledge: choices that turn out to be optimal. One of the important innovations in our model is the use of interframe phase differences rather than optical flow as input to the temporal stream. Compared with optical flow, phase differences require less computation and are more robust to illumination changes. Our proposed network achieves state-of-the-art performance on two video emotion datasets, the OMG emotion dataset and the Aff-Wild dataset. The most significant gains are for arousal prediction, for which motion information is intuitively more informative. Source code is available at https://github.com/wtomin/MIMAMO-Net.
Tasks Emotion Recognition, Optical Flow Estimation, Video Emotion Recognition
Published 2019-11-21
URL https://arxiv.org/abs/1911.09784v1
PDF https://arxiv.org/pdf/1911.09784v1.pdf
PWC https://paperswithcode.com/paper/mimamo-net-integrating-micro-and-macro-motion
Repo https://github.com/wtomin/MIMAMO-Net
Framework pytorch
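
A skeletal PyTorch version of the micro/macro split described in the abstract may help: per-frame features stand in for the micro-motion stream, and a recurrent layer aggregates them into macro-motion. The real MIMAMO Net feeds interframe phase differences into a dedicated temporal stream, which is not reproduced here, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class MicroMacroSkeleton(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_outputs=2):   # e.g. valence and arousal
        super().__init__()
        self.frame_encoder = nn.Sequential(                     # micro-motion stand-in
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)   # macro-motion over time
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, video):                    # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.frame_encoder(video.flatten(0, 1)).view(B, T, -1)
        out, _ = self.gru(feats)
        return self.head(out[:, -1])             # predict from the last time step

pred = MicroMacroSkeleton()(torch.rand(2, 8, 3, 64, 64))   # -> (2, 2)
```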

Universal audio synthesizer control with normalizing flows

Title Universal audio synthesizer control with normalizing flows
Authors Philippe Esling, Naotake Masuda, Adrien Bardet, Romeo Despres, Axel Chemla–Romeu-Santos
Abstract The ubiquity of sound synthesizers has reshaped music production and even entirely defined new music genres. However, the increasing complexity and number of parameters in modern synthesizers make them harder to master. Hence, the development of methods that allow users to easily create and explore sounds with synthesizers is a crucial need. Here, we introduce a novel formulation of audio synthesizer control. We formalize it as finding an organized latent audio space that represents the capabilities of a synthesizer, while constructing an invertible mapping to the space of its parameters. By using this formulation, we show that we can address automatic parameter inference, macro-control learning and audio-based preset exploration simultaneously within a single model. To solve this new formulation, we rely on Variational Auto-Encoders (VAE) and Normalizing Flows (NF) to organize and map the respective auditory and parameter spaces. We introduce disentangling flows, which allow performing the invertible mapping between separate latent spaces, while steering the organization of some latent dimensions to match target variation factors by splitting the objective as partial density evaluation. We evaluate our proposal against a large set of baseline models and show its superiority in both parameter inference and audio reconstruction. We also show that the model disentangles the major factors of audio variation as latent dimensions that can be directly used as macro-parameters. We also show that our model is able to learn semantic controls of a synthesizer by smoothly mapping to its parameters. Finally, we discuss the use of our model in creative applications and its real-time implementation in Ableton Live.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00971v1
PDF https://arxiv.org/pdf/1907.00971v1.pdf
PWC https://paperswithcode.com/paper/universal-audio-synthesizer-control-with
Repo https://github.com/acids-ircam/flow_synthesizer
Framework pytorch
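
The core formulation, an organized audio latent space tied to the parameter space by an invertible mapping, can be sketched with a single affine coupling layer; the spectral encoder, dimensions, and objective below are placeholders and do not reproduce the paper's disentangling-flow training:

```python
import torch
import torch.nn as nn

latent_dim = 16   # the synthesizer-parameter space is assumed to have the same dimension

class AffineCoupling(nn.Module):
    """One invertible coupling layer: transform half of z conditioned on the other half."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))            # outputs scale and shift
    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(s) + t], dim=-1)
    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=-1)

audio_encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU(),
                              nn.Linear(128, latent_dim))       # VAE-encoder stand-in
flow = AffineCoupling(latent_dim)

spectrum = torch.randn(4, 512)              # audio features (placeholder)
z = audio_encoder(spectrum)                 # organized audio latent
params = torch.sigmoid(flow(z))             # inferred synthesizer parameters in [0, 1]
z_back = flow.inverse(torch.logit(params.clamp(1e-6, 1 - 1e-6)))   # map parameters back
```

Because the mapping is invertible, the same model supports parameter inference (audio to parameters) and preset exploration (parameters back to the organized audio latent).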

Towards modular and programmable architecture search

Title Towards modular and programmable architecture search
Authors Renato Negrinho, Darshan Patil, Nghia Le, Daniel Ferreira, Matthew Gormley, Geoffrey Gordon
Abstract Neural architecture search methods are able to find high performance deep learning architectures with minimal effort from an expert. However, current systems focus on specific use-cases (e.g. convolutional image classifiers and recurrent language models), making them unsuitable for general use-cases that an expert might wish to write. Hyperparameter optimization systems are general-purpose but lack the constructs needed for easy application to architecture search. In this work, we propose a formal language for encoding search spaces over general computational graphs. The language constructs allow us to write modular, composable, and reusable search space encodings and to reason about search space design. We use our language to encode search spaces from the architecture search literature. The language allows us to decouple the implementations of the search space and the search algorithm, allowing us to expose search spaces to search algorithms through a consistent interface. Our experiments show the ease with which we can experiment with different combinations of search spaces and search algorithms without having to implement each combination from scratch. We release an implementation of our language with this paper.
Tasks Hyperparameter Optimization, Neural Architecture Search
Published 2019-09-30
URL https://arxiv.org/abs/1909.13404v1
PDF https://arxiv.org/pdf/1909.13404v1.pdf
PWC https://paperswithcode.com/paper/towards-modular-and-programmable-architecture
Repo https://github.com/negrinho/deep_architect
Framework tf
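
As a toy illustration of the idea, not the actual deep_architect API, a search space can be encoded as composable modules that expose named hyperparameter choices, so that any search algorithm only interacts with a uniform "list the open choices, assign a value" interface:

```python
import random

class Choice:
    def __init__(self, name, options):
        self.name, self.options, self.value = name, options, None

def conv_block(index):                         # a reusable search-space fragment
    return [Choice(f"conv{index}/filters", [32, 64, 128]),
            Choice(f"conv{index}/kernel", [3, 5])]

def search_space(num_blocks):
    space = [num_blocks]
    for i in range(max(num_blocks.options)):
        space += conv_block(i)                 # compose fragments into a larger space
    return space

def random_searcher(space):                    # any searcher can reuse the same interface
    for choice in space:
        choice.value = random.choice(choice.options)
    return {c.name: c.value for c in space}

print(random_searcher(search_space(Choice("num_blocks", [1, 2, 3]))))
```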

Pars-ABSA: an Aspect-based Sentiment Analysis dataset for Persian

Title Pars-ABSA: an Aspect-based Sentiment Analysis dataset for Persian
Authors Taha Shangipour Ataei, Kamyar Darvishi, Soroush Javdan, Behrouz Minaei-Bidgoli, Sauleh Eetemadi
Abstract Due to the increased availability of online reviews, sentiment analysis has witnessed booming interest from researchers. Sentiment analysis is the computational treatment of sentiment used to extract and understand the opinions of authors. While many systems were built to predict the sentiment of a document or a sentence, many others provide the necessary detail on various aspects of the entity (i.e., aspect-based sentiment analysis). Most of the available data resources are tailored to English and other popular European languages. Although Persian is a language with more than 110 million speakers, to the best of our knowledge, there is a lack of public datasets on aspect-based sentiment analysis for Persian. This paper provides a manually annotated Persian dataset, Pars-ABSA, which is verified by 3 native Persian speakers. The dataset consists of 5,114 positive, 3,061 negative and 1,827 neutral data samples from 5,602 unique reviews. Moreover, as a baseline, this paper reports the performance of some state-of-the-art aspect-based sentiment analysis methods, with a focus on deep learning, on Pars-ABSA. The obtained results are impressive compared to similar English state-of-the-art results.
Tasks Aspect-Based Sentiment Analysis, Sentiment Analysis
Published 2019-07-26
URL https://arxiv.org/abs/1908.01815v3
PDF https://arxiv.org/pdf/1908.01815v3.pdf
PWC https://paperswithcode.com/paper/pars-absa-an-aspect-based-sentiment-analysis
Repo https://github.com/Titowak/Pars-ABSA
Framework none

TU Wien @ TREC Deep Learning ‘19 – Simple Contextualization for Re-ranking

Title TU Wien @ TREC Deep Learning ‘19 – Simple Contextualization for Re-ranking
Authors Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury
Abstract The usage of neural network models puts multiple objectives in conflict with each other: Ideally we would like to create a neural model that is effective, efficient, and interpretable at the same time. However, in most instances we have to choose which property is most important to us. We used the opportunity of the TREC 2019 Deep Learning track to evaluate the effectiveness of a balanced neural re-ranking approach. We submitted results of the TK (Transformer-Kernel) model: a neural re-ranking model for ad-hoc search using an efficient contextualization mechanism. TK employs a very small number of lightweight Transformer layers to contextualize query and document word embeddings. To score individual term interactions, we use a document-length enhanced kernel-pooling, which enables users to gain insight into the model. Our best result for the passage ranking task is: 0.420 MAP, 0.671 nDCG, 0.598 P@10 (TUW19-p3 full). Our best result for the document ranking task is: 0.271 MAP, 0.465 nDCG, 0.730 P@10 (TUW19-d3 re-ranking).
Tasks Document Ranking, Word Embeddings
Published 2019-12-03
URL https://arxiv.org/abs/1912.01385v1
PDF https://arxiv.org/pdf/1912.01385v1.pdf
PWC https://paperswithcode.com/paper/tu-wien-trec-deep-learning-19-simple
Repo https://github.com/thunlp/ReInfoSelect
Framework pytorch
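
A hedged sketch of the kernel-pooling step used by TK-style re-rankers: build a query-document cosine match matrix from (contextualized) term embeddings, soft-count the matches with a bank of Gaussian kernels, and score with a small linear layer. The Transformer contextualization and the document-length enhancement are omitted, and the embeddings are random placeholders:

```python
import torch
import torch.nn.functional as F

def kernel_pool(q_emb, d_emb, mus, sigma=0.1):
    """q_emb: (Lq, D), d_emb: (Ld, D) -> one pooled feature per kernel."""
    sim = F.normalize(q_emb, dim=-1) @ F.normalize(d_emb, dim=-1).T   # (Lq, Ld) cosine matrix
    feats = []
    for mu in mus:
        k = torch.exp(-(sim - mu) ** 2 / (2 * sigma ** 2))    # soft match counts around mu
        feats.append(torch.log1p(k.sum(dim=1)).sum())           # pool over document, then query
    return torch.stack(feats)

mus = torch.linspace(-0.9, 1.0, steps=11)                  # kernel centers across the similarity range
q, d = torch.randn(5, 300), torch.randn(80, 300)            # query/document term embeddings
score = torch.nn.Linear(len(mus), 1)(kernel_pool(q, d, mus))   # relevance score
```

The per-kernel activations are what make the model inspectable: one can read off which similarity ranges contributed to the final score.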

Neural Document Expansion with User Feedback

Title Neural Document Expansion with User Feedback
Authors Yue Yin, Chenyan Xiong, Cheng Luo, Zhiyuan Liu
Abstract This paper presents a neural document expansion approach (NeuDEF) that enriches document representations for neural ranking models. NeuDEF harvests expansion terms from queries which lead to clicks on the document and weights these expansion terms with learned attention. It is plugged into a standard neural ranker and learned end-to-end. Experiments on a commercial search log demonstrate that NeuDEF significantly improves the accuracy of state-of-the-art neural rankers and expansion methods on queries with different frequencies. Further studies show the contribution of click queries and learned expansion weights, as well as the influence of document popularity on NeuDEF’s effectiveness.
Tasks
Published 2019-08-08
URL https://arxiv.org/abs/1908.02938v1
PDF https://arxiv.org/pdf/1908.02938v1.pdf
PWC https://paperswithcode.com/paper/neural-document-expansion-with-user-feedback
Repo https://github.com/thunlp/NeuDEF
Framework pytorch
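
A toy sketch of the expansion mechanism described above: terms harvested from queries that clicked on a document are embedded, weighted with learned attention, and appended to the document representation passed to the ranker. The vocabulary, embeddings, and click queries are placeholders, not the NeuDEF implementation:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 64)                         # shared term embeddings (toy vocabulary)
attn = nn.Linear(64, 1)                              # learned attention over expansion terms

doc_terms = torch.randint(0, 1000, (30,))            # original document terms
click_query_terms = torch.randint(0, 1000, (12,))    # terms from queries that clicked this doc

exp_emb = emb(click_query_terms)                                  # (12, 64)
weights = torch.softmax(attn(exp_emb).squeeze(-1), dim=0)          # per-term expansion weights
expansion = weights.unsqueeze(-1) * exp_emb                        # weighted expansion embeddings

doc_repr = torch.cat([emb(doc_terms), expansion], dim=0)   # expanded document fed to the ranker
```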

Modeling Tabular data using Conditional GAN

Title Modeling Tabular data using Conditional GAN
Authors Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni
Abstract Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms the Bayesian methods on most of the real datasets, whereas other deep learning methods do not.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00503v2
PDF https://arxiv.org/pdf/1907.00503v2.pdf
PWC https://paperswithcode.com/paper/modeling-tabular-data-using-conditional-gan
Repo https://github.com/DAI-Lab/SDGym
Framework none

Learning-Free Iris Segmentation Revisited: A First Step Toward Fast Volumetric Operation Over Video Samples

Title Learning-Free Iris Segmentation Revisited: A First Step Toward Fast Volumetric Operation Over Video Samples
Authors Jeffery Kinnison, Mateusz Trokielewicz, Camila Carballo, Adam Czajka, Walter Scheirer
Abstract Subject matching performance in iris biometrics is contingent upon fast, high-quality iris segmentation. In many cases, iris biometrics acquisition equipment takes a number of images in sequence and combines the segmentation and matching results for each image to strengthen the result. To date, segmentation has occurred in 2D, operating on each image individually. But such methodologies, while powerful, do not take advantage of potential gains in performance afforded by treating sequential images as volumetric data. As a first step in this direction, we apply the Flexible Learning-Free Reconstruction of Neural Volumes (FLoRIN) framework, an open source segmentation and reconstruction framework originally designed for neural microscopy volumes, to volumetric segmentation of iris videos. Further, we introduce a novel dataset of near-infrared iris videos, in which each subject’s pupil rapidly changes size due to visible-light stimuli, as a test bed for FLoRIN. We compare the matching performance for iris masks generated by FLoRIN, deep-learning-based (SegNet), and Daugman’s (OSIRIS) iris segmentation approaches. We show that by incorporating volumetric information, FLoRIN achieves a factor of 3.6 to an order of magnitude increase in throughput with only a minor drop in subject matching performance. We also demonstrate that FLoRIN-based iris segmentation maintains this speedup on low-resource hardware, making it suitable for embedded biometrics systems.
Tasks Iris Segmentation
Published 2019-01-06
URL http://arxiv.org/abs/1901.01575v1
PDF http://arxiv.org/pdf/1901.01575v1.pdf
PWC https://paperswithcode.com/paper/learning-free-iris-segmentation-revisited-a
Repo https://github.com/jeffkinnison/florin-iris
Framework none
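
To illustrate the volumetric idea (though not FLoRIN's exact neighborhood-thresholding rule), an iris video can be stacked into a 3D volume and segmented with a learning-free local threshold computed over 3D neighborhoods, so that temporally adjacent frames support each other; the window size and ratio below are arbitrary:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def volumetric_threshold(video, window=(3, 15, 15), ratio=0.85):
    """video: (T, H, W) grayscale volume -> boolean foreground mask."""
    volume = video.astype(np.float32)
    local_mean = uniform_filter(volume, size=window)   # mean over a 3D neighborhood
    return volume < ratio * local_mean                  # keep voxels darker than their surroundings

frames = np.random.randint(0, 256, size=(30, 240, 320), dtype=np.uint8)   # placeholder iris video
mask = volumetric_threshold(frames)   # candidate pupil/iris voxels across the whole sequence
```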

Data Diversification: An Elegant Strategy For Neural Machine Translation

Title Data Diversification: An Elegant Strategy For Neural Machine Translation
Authors Xuan-Phi Nguyen, Shafiq Joty, Wu Kui, Ai Ti Aw
Abstract A common approach to improving neural machine translation is to invent new architectures. However, the research process of designing and refining such new models is often exhausting. Another approach is to resort to huge amounts of extra monolingual data to conduct semi-supervised training, as in back-translation. But extra monolingual data is not always available, especially for low-resource languages. In this paper, we propose to diversify the available training data by using multiple forward and backward peer models to augment the original training dataset. Our method does not require extra data like back-translation, nor additional computations and parameters like using pretrained models. Our data diversification method achieves a state-of-the-art BLEU score of 30.7 in the WMT’14 English-German task. It also consistently and substantially improves translation quality in 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource translation tasks (English-Nepali and English-Sinhala).
Tasks Machine Translation
Published 2019-11-05
URL https://arxiv.org/abs/1911.01986v1
PDF https://arxiv.org/pdf/1911.01986v1.pdf
PWC https://paperswithcode.com/paper/data-diversification-an-elegant-strategy-for
Repo https://github.com/nxphi47/data_diversification
Framework pytorch
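
A high-level sketch of the data diversification recipe, with trivial stand-ins for the training and decoding steps of a real NMT toolkit (the helper functions here are hypothetical and only show the data flow): several forward and backward peer models each re-translate the parallel corpus, and their outputs are concatenated with the original data before the final model is trained.

```python
def train_peer(src, tgt, seed):        # stand-in: a real system would train an NMT model here
    return ("peer-model", seed)

def translate(model, sents):           # stand-in: a real system would decode with `model`
    return [f"<hyp:{model[1]}> {s}" for s in sents]

def diversify(src_sents, tgt_sents, k=3):
    augmented = list(zip(src_sents, tgt_sents))
    for seed in range(k):
        fwd = train_peer(src_sents, tgt_sents, seed)                    # source -> target peer
        bwd = train_peer(tgt_sents, src_sents, seed)                    # target -> source peer
        augmented += list(zip(src_sents, translate(fwd, src_sents)))    # synthetic target sides
        augmented += list(zip(translate(bwd, tgt_sents), tgt_sents))    # synthetic source sides
    return augmented    # the final model trains on original + diversified pairs

pairs = diversify(["ein Beispiel"], ["an example"])
```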

A Large-Scale Comparison of Historical Text Normalization Systems

Title A Large-Scale Comparison of Historical Text Normalization Systems
Authors Marcel Bollmann
Abstract There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder–decoder models, but studies have used different datasets, different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.
Tasks Machine Translation
Published 2019-04-03
URL http://arxiv.org/abs/1904.02036v1
PDF http://arxiv.org/pdf/1904.02036v1.pdf
PWC https://paperswithcode.com/paper/a-large-scale-comparison-of-historical-text
Repo https://github.com/coastalcph/histnorm
Framework none

WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series

Title WATTNet: Learning to Trade FX via Hierarchical Spatio-Temporal Representation of Highly Multivariate Time Series
Authors Michael Poli, Jinkyoo Park, Ilija Ilievski
Abstract Finance is a particularly challenging application area for deep learning models due to a low signal-to-noise ratio, non-stationarity, and partial observability. Non-deliverable forwards (NDF), a derivatives contract used in foreign exchange (FX) trading, presents additional difficulty in the form of the long-term planning required for an effective selection of the start and end date of the contract. In this work, we focus on tackling the problem of NDF tenor selection by leveraging high-dimensional sequential data consisting of spot rates, technical indicators and expert tenor patterns. To this end, we construct a dataset from the Depository Trust & Clearing Corporation (DTCC) NDF data that includes a comprehensive list of NDF volumes and daily spot rates for 64 FX pairs. We introduce WaveATTentionNet (WATTNet), a novel temporal convolution (TCN) model for spatio-temporal modeling of highly multivariate time series, and validate it across NDF markets with varying degrees of dissimilarity between the training and test periods in terms of volatility and general market regimes. The proposed method achieves a significant positive return on investment (ROI) in all NDF markets under analysis, outperforming recurrent and classical baselines by a wide margin. Finally, we propose two orthogonal interpretability approaches to verify noise stability and detect the driving factors of the learned tenor selection strategy.
Tasks Time Series
Published 2019-09-24
URL https://arxiv.org/abs/1909.10801v1
PDF https://arxiv.org/pdf/1909.10801v1.pdf
PWC https://paperswithcode.com/paper/wattnet-learning-to-trade-fx-via-hierarchical
Repo https://github.com/Zymrael/wattnet-fx-trading
Framework pytorch
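
A minimal sketch of the temporal-convolution building block underlying TCN-style models such as WATTNet: stacked causal 1D convolutions with increasing dilation give a long receptive field over the multivariate series. WATTNet's attention over the "spatial" (currency-pair) dimension is not shown, and all sizes are placeholders:

```python
import torch
import torch.nn as nn

class CausalDilatedBlock(nn.Module):
    def __init__(self, channels, dilation, kernel_size=3):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left-pad so no future leakage
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
    def forward(self, x):                                  # x: (B, C, T)
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))
        return torch.relu(out) + x                         # residual connection keeps T unchanged

tcn = nn.Sequential(*[CausalDilatedBlock(32, d) for d in (1, 2, 4, 8)])
series = torch.randn(4, 32, 128)    # batch of 32-channel FX feature series, 128 time steps
features = tcn(series)              # (4, 32, 128); receptive field grows with the dilations
```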

Continual Rare-Class Recognition with Emerging Novel Subclasses

Title Continual Rare-Class Recognition with Emerging Novel Subclasses
Authors Hung Nguyen, Xuejian Wang, Leman Akoglu
Abstract Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? We introduce RaRecognize, which (i) estimates a general decision boundary between the rare and the majority class, (ii) learns to recognize individual rare subclasses that exist within the training data, as well as (iii) flags instances from previously unseen rare subclasses as newly emerging. The learner in (i) is general in the sense that by construction it is dissimilar to the specialized learners in (ii), thus distinguishes minority from the majority without overly tuning to what is seen in the training data. Thanks to this generality, RaRecognize ignores all future instances that it labels as majority and recognizes the recurrent as well as emerging rare subclasses only. This saves effort at test time as well as ensures that the model size grows moderately over time as it only maintains specialized minority learners. Through extensive experiments, we show that RaRecognize outperforms state-of-the-art baselines on three real-world datasets that contain corporate-risk and disaster documents as rare classes.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1906.12218v1
PDF https://arxiv.org/pdf/1906.12218v1.pdf
PWC https://paperswithcode.com/paper/continual-rare-class-recognition-with
Repo https://github.com/hungnt55/RaRecognize
Framework none

Listwise View Ranking for Image Cropping

Title Listwise View Ranking for Image Cropping
Authors Weirui Lu, Xiaofen Xing, Bolun Cai, Xiangmin Xu
Abstract Rank-based learning with deep neural networks has been widely used for image cropping. However, the performance of ranking-based methods is often poor, and this is mainly due to two reasons: 1) image cropping is a listwise ranking task rather than pairwise comparison; 2) the rescaling caused by the pooling layer and the deformation in view generation damage the performance of composition learning. In this paper, we develop a novel model to overcome these problems. To address the first problem, we formulate image cropping as a listwise ranking problem to find the best view composition. For the second problem, a refined view sampling (called RoIRefine) is proposed to extract refined feature maps for candidate view generation. Given a series of candidate views, the proposed model learns the Top-1 probability distribution of views and picks the best one. By integrating refined sampling and listwise ranking, the proposed network, called LVRN, achieves state-of-the-art performance in both accuracy and speed.
Tasks Image Cropping
Published 2019-05-14
URL https://arxiv.org/abs/1905.05352v1
PDF https://arxiv.org/pdf/1905.05352v1.pdf
PWC https://paperswithcode.com/paper/listwise-view-ranking-for-image-cropping
Repo https://github.com/luwr1022/listwise-view-ranking
Framework pytorch
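
A small sketch of the listwise formulation: instead of comparing view pairs, the model scores all candidate crops of an image at once and is trained to match the Top-1 probability distribution over views with a ListNet-style cross-entropy. The scores and quality annotations below are random placeholders:

```python
import torch
import torch.nn.functional as F

def listwise_top1_loss(pred_scores, gt_scores):
    """Both inputs: (num_views,) scores for the candidate crops of one image."""
    pred_log_dist = F.log_softmax(pred_scores, dim=0)   # model's Top-1 distribution (log)
    gt_dist = F.softmax(gt_scores, dim=0)               # target Top-1 distribution
    return -(gt_dist * pred_log_dist).sum()             # cross-entropy between the two

pred = torch.randn(24, requires_grad=True)   # network scores for 24 candidate views
gt = torch.randn(24)                          # annotated composition quality for the same views
loss = listwise_top1_loss(pred, gt)
loss.backward()
```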