February 2, 2020

3186 words 15 mins read

Paper Group AWR 16

Learning Compressed Sentence Representations for On-Device Text Processing. SESS: Self-Ensembling Semi-Supervised 3D Object Detection. DeepPrivacy: A Generative Adversarial Network for Face Anonymization. Scale Match for Tiny Person Detection. Neural Arabic Question Answering. Leveraging BERT for Extractive Text Summarization on Lectures. VizSeq: A …

Learning Compressed Sentence Representations for On-Device Text Processing

Title Learning Compressed Sentence Representations for On-Device Text Processing
Authors Dinghan Shen, Pengyu Cheng, Dhanasekar Sundararaman, Xinyuan Zhang, Qian Yang, Meng Tang, Asli Celikyilmaz, Lawrence Carin
Abstract Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner-product operation between continuous embeddings. Detailed analysis and a case study further validate the effectiveness of the proposed methods.
Tasks Sentence Embeddings
Published 2019-06-19
URL https://arxiv.org/abs/1906.08340v1
PDF https://arxiv.org/pdf/1906.08340v1.pdf
PWC https://paperswithcode.com/paper/learning-compressed-sentence-representations
Repo https://github.com/Linear95/BinarySentEmb
Framework pytorch
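
The headline trick is cheap similarity: once embeddings are bits, relatedness is a Hamming computation. Below is a minimal numpy sketch of the binarize-then-Hamming idea, using a simple mean-threshold binarization; the paper's four strategies (including autoencoder-based ones) live in the repo.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    # Hard-threshold each dimension at its corpus mean (the simplest
    # plausible strategy; the paper also proposes learned variants).
    return (embeddings > embeddings.mean(axis=0)).astype(np.uint8)

def hamming_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Relatedness = fraction of matching bits (1 - normalized Hamming distance).
    return 1.0 - np.count_nonzero(a != b) / a.size

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 512))      # pretend these are sentence embeddings
bits = binarize(emb)
print(hamming_similarity(bits[0], bits[1]))
```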

SESS: Self-Ensembling Semi-Supervised 3D Object Detection

Title SESS: Self-Ensembling Semi-Supervised 3D Object Detection
Authors Na Zhao, Tat-Seng Chua, Gim Hee Lee
Abstract The performance of existing point cloud-based 3D object detection methods heavily relies on large-scale, high-quality 3D annotations. However, such annotations are often tedious and expensive to collect. Semi-supervised learning is a good alternative for mitigating the data annotation issue, but it has remained largely unexplored in 3D object detection. Inspired by the recent success of the self-ensembling technique in semi-supervised image classification, we propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance the generalization of the network on unlabeled and new unseen data. Furthermore, we propose three consistency losses to enforce consistency between two sets of predicted 3D object proposals, to facilitate the learning of the structural and semantic invariances of objects. Extensive experiments conducted on the SUN RGB-D and ScanNet datasets demonstrate the effectiveness of SESS in both inductive and transductive semi-supervised 3D object detection. Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method while using only 50% of the labeled data.
Tasks 3D Object Detection, Image Classification, Object Detection, Semi-Supervised Image Classification
Published 2019-12-26
URL https://arxiv.org/abs/1912.11803v1
PDF https://arxiv.org/pdf/1912.11803v1.pdf
PWC https://paperswithcode.com/paper/sess-self-ensembling-semi-supervised-3d
Repo https://github.com/Na-Z/sess
Framework pytorch
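
A hedged sketch of the self-ensembling core: an EMA teacher plus a consistency penalty between student and teacher predictions. SESS's actual three consistency losses operate on matched 3D proposals (center, class, size); the MSE stand-in below only illustrates the mechanism.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.999):
    # Teacher weights track an exponential moving average of the student's:
    # the "self-ensembling" in SESS (Mean Teacher style).
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1.0 - alpha)

def consistency_loss(student_pred, teacher_pred):
    # Placeholder for SESS's three losses over matched 3D proposals;
    # gradients flow only into the student.
    return F.mse_loss(student_pred, teacher_pred.detach())
```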

DeepPrivacy: A Generative Adversarial Network for Face Anonymization

Title DeepPrivacy: A Generative Adversarial Network for Face Anonymization
Authors Håkon Hukkelås, Rudolf Mester, Frank Lindseth
Abstract We propose a novel architecture which is able to automatically anonymize faces in images while retaining the original data distribution. We ensure total anonymization of all faces in an image by generating images based exclusively on privacy-safe information. Our model is based on a conditional generative adversarial network that generates images conditioned on the original pose and image background. The conditional information enables us to generate highly realistic faces with a seamless transition between the generated face and the existing background. Furthermore, we introduce a diverse dataset of human faces, including unconventional poses, occluded faces, and a vast variability in backgrounds. Finally, we present experimental results reflecting the capability of our model to anonymize images while preserving the data distribution, making the data suitable for further training of deep learning models. As far as we know, no other solution has been proposed that guarantees the anonymization of faces while generating realistic images.
Tasks Face Anonymization
Published 2019-09-10
URL https://arxiv.org/abs/1909.04538v1
PDF https://arxiv.org/pdf/1909.04538v1.pdf
PWC https://paperswithcode.com/paper/deepprivacy-a-generative-adversarial-network
Repo https://github.com/hukkelas/FDF
Framework none
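
A toy PyTorch sketch of the privacy-by-construction idea: the generator's inputs are only a face-masked background, a pose encoding, and noise, so the original face never reaches the network. Layer sizes and the 7-keypoint assumption are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FaceGenerator(nn.Module):
    # Illustrative conditional generator: it sees only privacy-safe inputs
    # (face-masked background, pose keypoints, noise), never the real face.
    def __init__(self, n_keypoints=7, size=16):
        super().__init__()
        self.size = size
        self.pose_embed = nn.Linear(n_keypoints * 2, size * size)
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, masked_bg, keypoints, noise):
        # masked_bg: (B,3,H,W) with the face zeroed out; noise: (B,1,H,W)
        pose_map = self.pose_embed(keypoints).view(-1, 1, self.size, self.size)
        return self.net(torch.cat([masked_bg, pose_map, noise], dim=1))
```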

Scale Match for Tiny Person Detection

Title Scale Match for Tiny Person Detection
Authors Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han
Abstract Visual object detection has achieved unprecedented advances with the rise of deep convolutional neural networks. However, detecting tiny objects (for example, tiny persons less than 20 pixels) in large-scale images remains not well investigated. The extremely small objects raise a grand challenge for feature representation, while the massive and complex backgrounds increase the risk of false alarms. In this paper, we introduce a new benchmark, referred to as TinyPerson, opening up a promising direction for tiny object detection at long distances and with massive backgrounds. We experimentally find that the scale mismatch between the dataset for network pre-training and the dataset for detector learning can deteriorate the feature representation and the detectors. Accordingly, we propose a simple yet effective Scale Match approach to align the object scales between the two datasets for favorable tiny-object representation. Experiments show the significant performance gain of our proposed approach over state-of-the-art detectors, and the challenging aspects of TinyPerson related to real-world scenarios. The TinyPerson benchmark and the code for our approach are publicly available (https://github.com/ucas-vg/TinyBenchmark). (Note: the AP evaluation rules in the benchmark were updated after this paper was accepted, so this paper uses the old rules. The benchmark will keep the old AP rules, but we recommend the new ones and will use them in later research.)
Tasks Human Detection, Object Detection
Published 2019-12-23
URL https://arxiv.org/abs/1912.10664v1
PDF https://arxiv.org/pdf/1912.10664v1.pdf
PWC https://paperswithcode.com/paper/scale-match-for-tiny-person-detection
Repo https://github.com/ucas-vg/TinyBenchmark
Framework none
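
The Scale Match idea reduces to a resampling rule: draw an object size from the target dataset's empirical size distribution and rescale the pre-training image so its own objects match. A rough sketch under those assumptions (PIL-style image, object sizes as sqrt of box area; the paper's estimator is histogram-based and more careful):

```python
import random

def scale_match(image, boxes, target_sizes):
    # image: a PIL.Image; boxes: [(w, h), ...] object sizes in the
    # pre-training image; target_sizes: empirical sqrt(w*h) object sizes
    # collected from the target (tiny-person) dataset.
    mean_src = sum((w * h) ** 0.5 for w, h in boxes) / len(boxes)
    sampled = random.choice(target_sizes)    # draw from the target distribution
    ratio = sampled / mean_src               # usually << 1 for TinyPerson
    resized = image.resize((int(image.width * ratio), int(image.height * ratio)))
    return resized, [(w * ratio, h * ratio) for w, h in boxes]
```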

Neural Arabic Question Answering

Title Neural Arabic Question Answering
Authors Hussein Mozannar, Karl El Hajal, Elie Maamary, Hazem Hajj
Abstract This paper tackles the problem of open-domain factual Arabic question answering (QA) using Wikipedia as our knowledge source. This constrains the answer of any question to be a span of text in Wikipedia. Open-domain QA for Arabic entails three challenges: annotated QA datasets in Arabic, large-scale efficient information retrieval, and machine reading comprehension. To deal with the lack of Arabic QA datasets, we present the Arabic Reading Comprehension Dataset (ARCD), composed of 1,395 questions posed by crowdworkers on Wikipedia articles, and a machine translation of the Stanford Question Answering Dataset (Arabic-SQuAD). Our system for open domain question answering in Arabic (SOQAL) is based on two components: (1) a document retriever using a hierarchical TF-IDF approach and (2) a neural reading comprehension model using the pre-trained bi-directional transformer BERT. Our experiments on ARCD indicate the effectiveness of our approach, with our BERT-based reader achieving a 61.3 F1 score and our open-domain system SOQAL achieving a 27.6 F1 score.
Tasks Information Retrieval, Machine Reading Comprehension, Machine Translation, Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2019-06-12
URL https://arxiv.org/abs/1906.05394v1
PDF https://arxiv.org/pdf/1906.05394v1.pdf
PWC https://paperswithcode.com/paper/neural-arabic-question-answering
Repo https://github.com/husseinmozannar/SOQAL
Framework tf
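
A flat TF-IDF retriever sketch with scikit-learn; SOQAL's retriever is hierarchical (documents first, then paragraphs), but the scoring idea is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(question, docs, k=3):
    # Score every document against the question in a shared TF-IDF space
    # and return the top-k; SOQAL repeats this at the paragraph level.
    vec = TfidfVectorizer(ngram_range=(1, 2)).fit(docs + [question])
    d, q = vec.transform(docs), vec.transform([question])
    scores = cosine_similarity(q, d).ravel()
    return [docs[i] for i in scores.argsort()[::-1][:k]]
```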

Leveraging BERT for Extractive Text Summarization on Lectures

Title Leveraging BERT for Extractive Text Summarization on Lectures
Authors Derek Miller
Abstract In the last two decades, automatic extractive text summarization on lectures has demonstrated to be a useful tool for collecting the key phrases and sentences that best represent the content. However, many current systems rely on dated approaches, producing sub-par outputs or requiring several hours of manual tuning to produce meaningful results. Recently, new machine learning architectures have provided mechanisms for extractive summarization through the clustering of output embeddings from deep learning models. This paper reports on the Lecture Summarization Service, a Python-based RESTful service that utilizes the BERT model for text embeddings and K-Means clustering to identify the sentences closest to the centroids for summary selection. The purpose of the service was to provide students with a utility that could summarize lecture content based on their desired number of sentences. On top of the summarization work, the service also includes lecture and summary management, storing content on the cloud so it can be used for collaboration. While the results of utilizing BERT for extractive summarization were promising, there were still areas where the model struggled, providing future research opportunities for further improvement.
Tasks Text Summarization
Published 2019-06-07
URL https://arxiv.org/abs/1906.04165v1
PDF https://arxiv.org/pdf/1906.04165v1.pdf
PWC https://paperswithcode.com/paper/leveraging-bert-for-extractive-text
Repo https://github.com/dmmiller612/bert-extractive-summarizer
Framework pytorch
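
The pipeline is compact enough to sketch end to end: embed sentences, cluster with K-Means, and keep the sentence nearest each centroid. A stand-in sentence encoder is used below; the paper pools BERT layers directly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer  # stand-in encoder

def summarize(sentences, n_sentences=3):
    # Embed, cluster, and keep the sentence nearest each centroid,
    # returned in document order.
    emb = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    km = KMeans(n_clusters=n_sentences, n_init=10).fit(emb)
    picks = sorted({int(np.argmin(np.linalg.norm(emb - c, axis=1)))
                    for c in km.cluster_centers_})
    return [sentences[i] for i in picks]
```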

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

Title VizSeq: A Visual Analysis Toolkit for Text Generation Tasks
Authors Changhan Wang, Anirudh Jain, Danlu Chen, Jiatao Gu
Abstract Automatic evaluation of text generation tasks (e.g., machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. These, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in a Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers the most common n-gram based metrics, accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.
Tasks Image Captioning, Machine Translation, Text Generation, Text Summarization, Video Description
Published 2019-09-12
URL https://arxiv.org/abs/1909.05424v1
PDF https://arxiv.org/pdf/1909.05424v1.pdf
PWC https://paperswithcode.com/paper/vizseq-a-visual-analysis-toolkit-for-text
Repo https://github.com/facebookresearch/vizseq
Framework none
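
Not the VizSeq API — just a sketch of the kind of instance-level, multiprocessing-accelerated scoring it surfaces, here with sacrebleu:

```python
from multiprocessing import Pool
import sacrebleu

def _score(pair):
    hyp, ref = pair
    return sacrebleu.sentence_bleu(hyp, [ref]).score

def instance_bleu(hypotheses, references, workers=4):
    # Per-example BLEU computed in parallel: the kind of per-instance
    # scoring VizSeq visualizes (this helper is not the VizSeq API).
    with Pool(workers) as pool:
        return pool.map(_score, list(zip(hypotheses, references)))
```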

Attribute-aware Pedestrian Detection in a Crowd

Title Attribute-aware Pedestrian Detection in a Crowd
Authors Jialiang Zhang, Lixiang Lin, Yang Li, Yun-chen Chen, Jianke Zhu, Yao Hu, Steven C. H. Hoi
Abstract Pedestrian detection is an initial step in outdoor scene analysis and plays an essential role in many real-world applications. Although pedestrian detection has enjoyed the merits of deep learning frameworks from generic object detectors, it is still a very challenging task due to heavy occlusion and highly crowded scenes. Generally, conventional detectors are unable to differentiate individuals from each other effectively in such dense environments. To tackle this critical problem, we propose an attribute-aware pedestrian detector that explicitly models people’s semantic attributes in a high-level feature detection fashion. Besides the typical semantic features of center position, target scale, and offset, we introduce a pedestrian-oriented attribute feature to encode the high-level semantic differences among the crowd. Moreover, a novel attribute-feature-based Non-Maximum Suppression (NMS) is proposed to distinguish a person from a highly overlapped group by adaptively rejecting false-positive results in very crowded settings. Furthermore, a novel ground-truth target is designed to alleviate the difficulties caused by the attribute configuration and extreme class imbalance during training. Finally, we evaluate our proposed attribute-aware pedestrian detector on two benchmark datasets, CityPersons and CrowdHuman. The experimental results show that our approach outperforms state-of-the-art methods by a large margin on pedestrian detection.
Tasks Pedestrian Detection
Published 2019-10-21
URL https://arxiv.org/abs/1910.09188v2
PDF https://arxiv.org/pdf/1910.09188v2.pdf
PWC https://paperswithcode.com/paper/csid-center-scale-identity-and-density-aware
Repo https://github.com/kalyo-zjl/APD
Framework pytorch
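
A sketch of the attribute-aware NMS idea: a highly overlapped box survives suppression when its attribute embedding says it is a different person. The thresholds and the cosine-similarity test are illustrative choices, not the paper's exact formulation.

```python
import torch
from torchvision.ops import box_iou

def attribute_nms(boxes, scores, attrs, iou_thr=0.5, attr_thr=0.8):
    # Greedy NMS, except a heavily overlapped box is kept when its
    # attribute embedding is dissimilar enough from the kept box.
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        ious = box_iou(boxes[i].unsqueeze(0), boxes[rest]).squeeze(0)
        sims = torch.cosine_similarity(attrs[i].unsqueeze(0), attrs[rest])
        # suppress only boxes that overlap a lot AND look like the same person
        order = rest[~((ious > iou_thr) & (sims > attr_thr))]
    return keep
```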

Controlling Neural Machine Translation Formality with Synthetic Supervision

Title Controlling Neural Machine Translation Formality with Synthetic Supervision
Authors Xing Niu, Marine Carpuat
Abstract This work aims to produce translations that convey source language content at a formality level that is appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target language formality. However, in practice, available training examples are limited to English sentence pairs of different styles, and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.
Tasks Machine Translation
Published 2019-11-20
URL https://arxiv.org/abs/1911.08706v2
PDF https://arxiv.org/pdf/1911.08706v2.pdf
PWC https://paperswithcode.com/paper/controlling-neural-machine-translation
Repo https://github.com/xingniu/multitask-ft-fsmt
Framework mxnet
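
Formality control at inference time is often implemented as a side constraint: a tag prepended to the source tells a single model which register to produce. A sketch of that mechanism follows (the paper's contribution is the training scheme that synthesizes the missing triplet element on the fly, not the tagging itself):

```python
FORMALITY_TAGS = ("<formal>", "<informal>")

def tag_source(src_tokens, formality):
    # Prepend a register token so one seq2seq model can be steered
    # toward formal or informal output at decode time.
    assert formality in FORMALITY_TAGS
    return [formality] + src_tokens

# One synthetic triplet: (tagged French source, English target). When only
# a bilingual pair of unknown formality exists, a formality classifier or
# style-transfer model supplies the missing label or alternate-style target.
src = tag_source("comment allez - vous ?".split(), "<formal>")
tgt = "how are you ?".split()
```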

A general representation of dynamical systems for reservoir computing

Title A general representation of dynamical systems for reservoir computing
Authors Sidney Pontes-Filho, Anis Yazidi, Jianhua Zhang, Hugo Hammer, Gustavo B. M. Mello, Ioanna Sandvig, Gunnar Tufte, Stefano Nichele
Abstract Dynamical systems are capable of performing computation in a reservoir computing paradigm. This paper presents a general representation of these systems as an artificial neural network (ANN). Initially, we implement the simplest dynamical system, a cellular automaton. The mathematical fundamentals behind an ANN are maintained, but the weights of the connections and the activation function are adjusted to work as an update rule in the context of cellular automata. The advantages of such an implementation are its ability to run on specialized and optimized deep learning libraries, the capability to generalize it to other types of networks, and the possibility of evolving cellular automata and other dynamical systems in terms of connectivity, update and learning rules. Our implementation of cellular automata constitutes an initial step towards a general framework for dynamical systems. It aims to evolve such systems to optimize their usage in reservoir computing and to model physical computing substrates.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01856v1
PDF https://arxiv.org/pdf/1907.01856v1.pdf
PWC https://paperswithcode.com/paper/a-general-representation-of-dynamical-systems
Repo https://github.com/SocratesNFR/EvoDynamic
Framework tf
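
The construction is concrete enough to show in a few lines: a fixed convolution binary-encodes each cell's neighborhood, and a rule-table lookup plays the role of the activation function. A sketch for an elementary cellular automaton (Rule 110 by default; boundary cells are zero-padded):

```python
import torch
import torch.nn.functional as F

def ca_step(state, rule=110):
    # A fixed conv encodes each (left, center, right) neighborhood as an
    # integer 0..7; the rule table acts as the "activation function".
    table = torch.tensor([(rule >> i) & 1 for i in range(8)], dtype=torch.float32)
    w = torch.tensor([[[4.0, 2.0, 1.0]]])
    idx = F.conv1d(state.view(1, 1, -1), w, padding=1).long().view(-1)
    return table[idx]

state = torch.zeros(32); state[16] = 1.0
for _ in range(8):
    state = ca_step(state)
    print("".join("#" if c else "." for c in state.int().tolist()))
```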

Exact Hard Monotonic Attention for Character-Level Transduction

Title Exact Hard Monotonic Attention for Character-Level Transduction
Authors Shijie Wu, Ryan Cotterell
Abstract Many common character-level, string-to-string transduction tasks, e.g. grapheme-to-phoneme conversion and morphological inflection, consist almost exclusively of monotonic transduction. Neural sequence-to-sequence models with soft attention, which are non-monotonic, often outperform popular monotonic models. In this work, we ask the following question: Is monotonicity really a helpful inductive bias in these tasks? We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce. With the help of dynamic programming, we are able to compute the exact marginalization over all monotonic alignments. Our models achieve state-of-the-art performance on morphological inflection. Furthermore, we find strong performance on two other character-level transduction tasks. Code is available at https://github.com/shijie-wu/neural-transducer.
Tasks Morphological Inflection
Published 2019-05-15
URL https://arxiv.org/abs/1905.06319v2
PDF https://arxiv.org/pdf/1905.06319v2.pdf
PWC https://paperswithcode.com/paper/exact-hard-monotonic-attention-for-character
Repo https://github.com/shijie-wu/neural-transducer
Framework pytorch
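
The exact marginalization is a short dynamic program. Assuming, for brevity, a uniform transition (the paper learns the alignment distribution), the monotonicity constraint i_j >= i_{j-1} becomes a cumulative logsumexp:

```python
import torch

def monotonic_log_marginal(log_emit):
    # log_emit[j, i] = log p(y_j | aligned to x_i). A uniform transition is
    # assumed for brevity; monotonicity i_j >= i_{j-1} is enforced by the
    # cumulative logsumexp. Runs in O(|y| * |x|).
    T, _ = log_emit.shape
    alpha = log_emit[0]                     # first output symbol, any position
    for j in range(1, T):
        alpha = log_emit[j] + torch.logcumsumexp(alpha, dim=0)
    return torch.logsumexp(alpha, dim=0)    # log p(y), all alignments summed
```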

Adaptive Neural Signal Detection for Massive MIMO

Title Adaptive Neural Signal Detection for Massive MIMO
Authors Mehrdad Khani, Mohammad Alizadeh, Jakob Hoydis, Phil Fleming
Abstract Symbol detection for Massive Multiple-Input Multiple-Output (MIMO) is a challenging problem for which traditional algorithms are either impractical or suffer from performance limitations. Several recently proposed learning-based approaches achieve promising results on simple channel models (e.g., i.i.d. Gaussian). However, their performance degrades significantly on real-world channels with spatial correlation. We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. MMNet’s design builds on the theory of iterative soft-thresholding algorithms and uses a novel training algorithm that leverages temporal and spectral correlation to accelerate training. Together, these innovations allow MMNet to train online for every realization of the channel. On i.i.d. Gaussian channels, MMNet requires two orders of magnitude fewer operations than existing deep learning schemes but achieves near-optimal performance. On spatially-correlated channels, it achieves the same error rate as the next-best learning scheme (OAMPNet) at 2.5dB lower SNR and with at least 10x less computational complexity. MMNet is also 4–8dB better overall than a classic linear scheme like the minimum mean square error (MMSE) detector.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04610v1
PDF https://arxiv.org/pdf/1906.04610v1.pdf
PWC https://paperswithcode.com/paper/adaptive-neural-signal-detection-for-massive
Repo https://github.com/mehrdadkhani/MMNet
Framework tf
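
MMNet unrolls updates shaped like classic iterative soft-thresholding (ISTA) into layers with learned parameters. A plain real-valued ISTA sketch of that update shape, with hand-picked step sizes and no learning:

```python
import numpy as np

def soft_threshold(x, theta):
    # The shrinkage operator at the heart of ISTA-style detectors.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista_detect(y, H, steps=10, alpha=0.05, theta=0.1):
    # Recover x from y = Hx + n with plain ISTA iterations; MMNet unrolls
    # updates of this shape and learns the parameters online per channel.
    x = np.zeros(H.shape[1])
    for _ in range(steps):
        x = soft_threshold(x + alpha * H.T @ (y - H @ x), theta)
    return x
```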

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Title A Latent Morphology Model for Open-Vocabulary Neural Machine Translation
Authors Duygu Ataman, Wilker Aziz, Alexandra Birch
Abstract Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings.
Tasks Machine Translation, Morphological Inflection
Published 2019-10-30
URL https://arxiv.org/abs/1910.13890v3
PDF https://arxiv.org/pdf/1910.13890v3.pdf
PWC https://paperswithcode.com/paper/a-latent-morphology-model-for-open-vocabulary-1
Repo https://github.com/d-ataman/lmm
Framework pytorch
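
A toy rendering of the two-latent-variable decoder: a continuous lemma vector plus approximately discrete inflection features (via Gumbel-Softmax) jointly condition a character-level RNN. A single categorical stands in for the paper's feature set, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentWordDecoder(nn.Module):
    # A continuous "lemma" vector and approximately discrete "inflection"
    # features condition a character decoder, mimicking word formation.
    def __init__(self, n_chars=40, lemma_dim=32, n_feats=8):
        super().__init__()
        self.feat_logits = nn.Linear(lemma_dim, n_feats)
        self.rnn = nn.GRU(lemma_dim + n_feats, 64, batch_first=True)
        self.out = nn.Linear(64, n_chars)

    def forward(self, lemma, max_len=12):
        # One categorical stands in for the paper's set of discrete features.
        feats = F.gumbel_softmax(self.feat_logits(lemma), tau=0.5, hard=True)
        cond = torch.cat([lemma, feats], dim=-1).unsqueeze(1)
        h, _ = self.rnn(cond.expand(-1, max_len, -1).contiguous())
        return self.out(h)              # (B, max_len, n_chars) char logits
```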

DADA: A Large-scale Benchmark and Model for Driver Attention Prediction in Accidental Scenarios

Title DADA: A Large-scale Benchmark and Model for Driver Attention Prediction in Accidental Scenarios
Authors Jianwu Fang, Dingxin Yan, Jiahuan Qiao, Jianru Xue
Abstract Driver attention prediction has recently attracted increasing attention in traffic scene understanding and is poised to become an essential problem in vision-centered and human-like driving systems. Unlike other attempts, this work predicts driver attention in accidental scenarios containing normal, critical and accidental situations simultaneously. This is challenging, however, because of the dynamic traffic scenes and the intricate, imbalanced accident categories. Hypothesizing that driver attention can play a selective role in identifying crash objects, and thereby assist driving accident detection or prediction, this paper designs a multi-path semantic-guided attentive fusion network (MSAFNet) that learns the spatio-temporal semantics and scene variation for prediction. To this end, a large-scale benchmark of 2000 video sequences (named DADA-2000) is contributed, with laborious annotations of driver attention (fixation, saccade, focusing time), accident objects/intervals, and accident categories, and thorough evaluations demonstrate superior performance over the state of the art. As far as we know, this is the first comprehensive and quantitative study of human-eye sensing in accidental scenarios. DADA-2000 is available at https://github.com/JWFangit/LOTVS-DADA.
Tasks Driver Attention Monitoring, Scene Understanding
Published 2019-12-18
URL https://arxiv.org/abs/1912.12148v1
PDF https://arxiv.org/pdf/1912.12148v1.pdf
PWC https://paperswithcode.com/paper/dada-a-large-scale-benchmark-and-model-for
Repo https://github.com/JWFangit/LOTVS-DADA
Framework none

Position Focused Attention Network for Image-Text Matching

Title Position Focused Attention Network for Image-Text Matching
Authors Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan
Abstract Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position clue to enhance the visual-text joint-embedding learning. We first split the images into blocks, from which we infer the relative position of each region in the image. Then, an attention mechanism is proposed to model the relations between the image regions and blocks and to generate a valuable position feature, which is further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical large-scale news dataset (Tencent-News) to validate the practical application value of the proposed method. As far as we know, this is the first attempt to test performance on a practical application. Our method achieves state-of-the-art performance on all three datasets.
Tasks Text Matching
Published 2019-07-23
URL https://arxiv.org/abs/1907.09748v1
PDF https://arxiv.org/pdf/1907.09748v1.pdf
PWC https://paperswithcode.com/paper/position-focused-attention-network-for-image
Repo https://github.com/HaoYang0123/Position-Focused-Attention-Network
Framework pytorch
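
A sketch of the position clue: regions attend over a grid of learned block-position embeddings, and the attended vector augments the region feature before matching. The dimensions and the additive fusion are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    # Regions attend over a grid of learned block-position embeddings;
    # the attended vector enhances the region feature before matching.
    def __init__(self, dim=256, grid=8):
        super().__init__()
        self.blocks = nn.Embedding(grid * grid, dim)   # one vector per block

    def forward(self, region_feat):
        # region_feat: (N, dim) features of detected image regions
        b = self.blocks.weight                          # (grid*grid, dim)
        attn = torch.softmax(region_feat @ b.t(), dim=-1)
        return region_feat + attn @ b                   # position-enhanced
```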