Paper Group ANR 612
GlobalTrait: Personality Alignment of Multilingual Word Embeddings
Title | GlobalTrait: Personality Alignment of Multilingual Word Embeddings |
Authors | Farhad Bin Siddique, Dario Bertero, Pascale Fung |
Abstract | We propose a multilingual model to recognize Big Five personality traits from text data in four different languages: English, Spanish, Dutch and Italian. Our analysis shows that words having a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. Therefore, we propose a personality alignment method, GlobalTrait, which has a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait are close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to other low-resource languages, and obtain better multilingual results than with simple monolingual and unaligned multilingual embeddings. We achieve an average F-score increase (across the three non-English languages) from 65 to 73.4 (+8.4) when comparing our monolingual model to the multilingual CNN trained on personality-aligned embeddings. We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset. |
Tasks | Multilingual Word Embeddings, Word Embeddings |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00240v2 |
PDF | http://arxiv.org/pdf/1811.00240v2.pdf |
PWC | https://paperswithcode.com/paper/globaltrait-personality-alignment-of |
Repo | |
Framework | |
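As a rough illustration of the cross-lingual embedding alignment that GlobalTrait builds on, the sketch below fits a least-squares linear map from source-language vectors to English vectors over paired anchor words. It is a generic alignment sketch with random placeholder vectors, not the paper's per-trait mapping.

```python
# Generic linear alignment sketch (not the per-trait GlobalTrait procedure): learn W so
# that source-language vectors mapped by W land near their English counterparts.
import numpy as np

def fit_alignment(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Least-squares map W with src_vecs @ W ~= tgt_vecs (rows are paired anchor words)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 100))   # placeholder Spanish vectors for 200 anchor words
tgt = rng.normal(size=(200, 100))   # placeholder English vectors for the same words
W = fit_alignment(src, tgt)
aligned = src @ W                   # source embeddings mapped into the English space
```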
A Stronger Baseline for Multilingual Word Embeddings
Title | A Stronger Baseline for Multilingual Word Embeddings |
Authors | Philipp Dufter, Hinrich Schütze |
Abstract | Levy, Søgaard and Goldberg’s (2017) S-ID (sentence ID) method applies word2vec on tuples containing a sentence ID and a word from the sentence. It has been shown to be a strong baseline for learning multilingual embeddings. Inspired by recent work on concept-based embedding learning, we propose SC-ID, an extension to S-ID: given a sentence-aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs. We perform experiments on the Parallel Bible Corpus across 1000+ languages and show that SC-ID yields up to 6% performance increase in a word translation task. In addition, we provide evidence that SC-ID is easily and widely applicable by reporting competitive results across 8 tasks on a EuroParl-based corpus. |
Tasks | Multilingual Word Embeddings, Word Embeddings |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00586v1 |
PDF | http://arxiv.org/pdf/1811.00586v1.pdf |
PWC | https://paperswithcode.com/paper/a-stronger-baseline-for-multilingual-word |
Repo | |
Framework | |
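For context, the S-ID baseline that SC-ID extends can be sketched in a few lines: word2vec is run over (sentence ID, word) tuples so that words sharing aligned sentence IDs across languages end up near each other. The toy parallel corpus below is an assumption; SC-ID would add sampled concept IDs processed in the same way.

```python
# Minimal S-ID-style sketch with gensim; the two-sentence "parallel corpus" is a toy stand-in.
from gensim.models import Word2Vec

parallel = {
    "S1": {"en": ["in", "the", "beginning"], "de": ["am", "anfang"]},
    "S2": {"en": ["and", "god", "said"],     "de": ["und", "gott", "sprach"]},
}

pairs = []
for sid, versions in parallel.items():
    for lang, words in versions.items():
        for w in words:
            pairs.append([f"SID:{sid}", f"{lang}:{w}"])  # one (sentence ID, word) tuple

model = Word2Vec(pairs, vector_size=50, window=1, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("en:god", topn=3))
```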
Generalised framework for multi-criteria method selection
Title | Generalised framework for multi-criteria method selection |
Authors | Jarosław Wątróbski, Jarosław Jankowski, Paweł Ziemba, Artur Karczmarczyk, Magdalena Zioło |
Abstract | Multi-Criteria Decision Analysis (MCDA) methods are widely used in various fields and disciplines. While most of the research has focused on the development and improvement of new MCDA methods, relatively little attention has been paid to their appropriate selection for a given decision problem. Their improper application decreases the quality of recommendations, as different MCDA methods deliver inconsistent results. The current paper presents a methodological and practical framework for selecting suitable MCDA methods for a particular decision situation. A set of 56 available MCDA methods was analyzed and, based on that analysis, a hierarchical set of method characteristics and a rule base were obtained. This analysis, the rules, and the modelling of uncertainty in the decision-problem description made it possible to build a framework supporting the selection of an MCDA method for a given decision-making situation. The practical studies indicate consistency between the methods recommended by the proposed approach and those used by the experts in reference cases. The results of the research also showed that the proposed approach can be used as a general framework for selecting an appropriate MCDA method for a given area of decision support, even in cases of data gaps in the decision-making problem description. The proposed framework was implemented within a web platform available for public use at www.mcda.it. |
Tasks | Decision Making |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.11078v1 |
PDF | http://arxiv.org/pdf/1810.11078v1.pdf |
PWC | https://paperswithcode.com/paper/generalised-framework-for-multi-criteria |
Repo | |
Framework | |
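The core idea of matching decision-problem characteristics against a rule base can be illustrated with a deliberately tiny sketch; the characteristics, rules and method lists below are hypothetical placeholders, not the paper's 56-method knowledge base.

```python
# Toy rule base: each rule maps a predicate over problem characteristics to candidate methods.
problem = {
    "weights": "precise",               # hypothetical characteristic values
    "preference_scale": "quantitative",
    "ranking_type": "complete",
}

rules = [
    (lambda p: p["weights"] == "precise" and p["ranking_type"] == "complete",
     ["AHP", "TOPSIS"]),
    (lambda p: p["weights"] == "fuzzy",
     ["Fuzzy AHP", "Fuzzy TOPSIS"]),
    (lambda p: p["ranking_type"] == "partial",
     ["ELECTRE III", "PROMETHEE I"]),
]

candidates = sorted({m for cond, methods in rules if cond(problem) for m in methods})
print("Suggested MCDA methods:", candidates)
```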
Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting
Title | Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting |
Authors | Clement Fung, Jamie Koerner, Stewart Grant, Ivan Beschastnikh |
Abstract | Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-party ML, to construct TorMentor: an anonymous hidden service that supports private multi-party ML. We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with new defenses for these attacks. We show that TorMentor effectively protects data providers against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy. We evaluate TorMentor with local and geo-distributed deployments on Azure/Tor. In an experiment with 200 clients and 14 MB of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65s. Code is available at: https://github.com/DistributedML/TorML |
Tasks | |
Published | 2018-11-23 |
URL | http://arxiv.org/abs/1811.09712v2 |
PDF | http://arxiv.org/pdf/1811.09712v2.pdf |
PWC | https://paperswithcode.com/paper/dancing-in-the-dark-private-multi-party |
Repo | |
Framework | |
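The kind of private training TorMentor coordinates can be gestured at with a clipped, noise-perturbed SGD update for logistic regression; this is a generic differential-privacy-style sketch on synthetic data, not the paper's brokered-learning protocol, broker, or Tor transport.

```python
# Noisy SGD sketch: per-example gradients are clipped and perturbed before each update.
import numpy as np

def noisy_sgd_logreg(X, y, epochs=20, lr=0.1, clip=1.0, noise_scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            g = (p - y[i]) * X[i]                                    # per-example gradient
            g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))        # clip its norm
            g += rng.normal(scale=noise_scale * clip, size=g.shape)  # add Gaussian noise
            w -= lr * g
    return w

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w = noisy_sgd_logreg(X, y)
```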
Smart Surveillance as an Edge Network Service: from Harr-Cascade, SVM to a Lightweight CNN
Title | Smart Surveillance as an Edge Network Service: from Harr-Cascade, SVM to a Lightweight CNN |
Authors | Seyed Yahya Nikouei, Yu Chen, Sejun Song, Ronghua Xu, Baek-Young Choi, Timothy R. Faughnan |
Abstract | Edge computing efficiently extends the realm of information technology beyond the boundary defined by the cloud computing paradigm. By performing computation near the source and destination, edge computing is promising for addressing the challenges in many delay-sensitive applications, like real-time human surveillance. Leveraging ubiquitously connected cameras and smart mobile devices, it enables video analytics at the edge. In recent years, many smart video surveillance approaches have been proposed for object detection and tracking using Artificial Intelligence (AI) and Machine Learning (ML) algorithms. This work explores the feasibility of two popular human-object detection schemes, Harr-Cascade and HOG feature extraction with an SVM classifier, at the edge, and introduces a lightweight Convolutional Neural Network (L-CNN) for human detection that leverages depthwise separable convolution for less computation. Single-board computers (SBCs) are used as edge devices for tests, and the algorithms are validated using real-world campus surveillance video streams and open data sets. The experimental results are promising: the final algorithm is able to track humans with decent accuracy, in real time, at a resource consumption affordable by edge devices. |
Tasks | Human Detection, Object Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1805.00331v2 |
PDF | http://arxiv.org/pdf/1805.00331v2.pdf |
PWC | https://paperswithcode.com/paper/smart-surveillance-as-an-edge-network-service |
Repo | |
Framework | |
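The two baseline detectors the abstract compares against the L-CNN are available off the shelf in OpenCV; the snippet below runs both on a single frame. The input file name is an assumption, and the L-CNN itself is not reproduced here (a depthwise-separable block is sketched under the companion paper further down).

```python
# Haar-cascade and HOG+SVM pedestrian detection with stock OpenCV models.
import cv2

frame = cv2.imread("frame.jpg")                      # assumed surveillance frame on disk
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_fullbody.xml")
haar_boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
hog_boxes, _weights = hog.detectMultiScale(gray, winStride=(8, 8))

for (x, y, w, h) in list(haar_boxes) + list(hog_boxes):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```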
Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding
Title | Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding |
Authors | Xi Zhang, Xiaolin Wu |
Abstract | Deep convolutional neural networks (DCNN) have enjoyed great successes in many signal processing applications because they can learn complex, non-linear causal relationships from input to output. In this light, DCNNs are well suited for the task of sequential prediction of multidimensional signals, such as images, and have the potential of improving the performance of traditional linear predictors. In this research we investigate how far DCNNs can push the envelope in terms of prediction precision. We propose, in a case study, a two-stage deep regression DCNN framework for nonlinear prediction of two-dimensional image signals. In the first-stage regression, the proposed deep prediction network (PredNet) takes the causal context as input and emits a prediction of the present pixel. Three PredNets are trained with the regression objectives of minimizing $\ell_1$, $\ell_2$ and $\ell_\infty$ norms of prediction residuals, respectively. The second-stage regression combines the outputs of the three PredNets to generate an even more precise and robust prediction. The proposed deep regression model is applied to lossless predictive image coding, and it outperforms the state-of-the-art linear predictors by an appreciable margin. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12568v1 |
PDF | http://arxiv.org/pdf/1810.12568v1.pdf |
PWC | https://paperswithcode.com/paper/nonlinear-prediction-of-multidimensional |
Repo | |
Framework | |
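The three first-stage objectives named in the abstract and a simple learned second-stage combiner can be written down directly; the PredNet architecture and its causal-context input are not reproduced, and the softmax-weighted blend below is only one plausible way to combine the three predictions.

```python
# The l1 / l2 / l_inf regression objectives, plus a toy learned combiner for stage two.
import torch
import torch.nn.functional as F

def l1_loss(pred, target):   return F.l1_loss(pred, target)
def l2_loss(pred, target):   return F.mse_loss(pred, target)
def linf_loss(pred, target): return (pred - target).abs().amax()

class Combiner(torch.nn.Module):
    """Blend the three PredNet outputs with learned softmax weights."""
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(3))
    def forward(self, p1, p2, pinf):
        w = torch.softmax(self.logits, dim=0)
        return w[0] * p1 + w[1] * p2 + w[2] * pinf

preds = [torch.rand(16, 1) for _ in range(3)]   # stand-ins for the three PredNet outputs
target = torch.rand(16, 1)
loss = l2_loss(Combiner()(*preds), target)
loss.backward()
```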
Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs
Title | Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs |
Authors | Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos |
Abstract | Visual and audiovisual speech recognition are witnessing a renaissance which is largely due to the advent of deep learning methods. In this paper, we present a deep learning architecture for lipreading and audiovisual word recognition, which combines Residual Networks equipped with spatiotemporal input layers and Bidirectional LSTMs. The lipreading architecture attains an 11.92% misclassification rate on the challenging Lipreading-In-The-Wild database, which is composed of excerpts from BBC-TV, each containing one of the 500 target words. Audiovisual experiments are performed using both intermediate and late integration, as well as several types and levels of environmental noise, and notable improvements over the audio-only network are reported, even in the case of clean speech. A further analysis on the utility of target word boundaries is provided, as well as on the capacity of the network to model the linguistic context of the target word. Finally, we examine difficult word pairs and discuss how visual information helps towards attaining higher recognition accuracy. |
Tasks | Lipreading, Speech Recognition |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01194v1 |
PDF | http://arxiv.org/pdf/1811.01194v1.pdf |
PWC | https://paperswithcode.com/paper/pushing-the-boundaries-of-audiovisual-word |
Repo | |
Framework | |
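A stripped-down skeleton of the described lipreading branch (spatiotemporal front-end, bidirectional LSTM, 500-way word classifier) is sketched below; the ResNet trunk, the audio branch and all hyperparameters are omitted or replaced with placeholder sizes.

```python
# Minimal lipreading skeleton: 3D-conv front-end -> BiLSTM over frames -> word classifier.
import torch
import torch.nn as nn

class LipreadNet(nn.Module):
    def __init__(self, num_words=500, feat=128):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, feat, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(feat), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),     # pool away space, keep the time axis
        )
        self.lstm = nn.LSTM(feat, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 256, num_words)

    def forward(self, x):                           # x: (batch, 1, time, H, W)
        f = self.frontend(x).squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, time, feat)
        out, _ = self.lstm(f)
        return self.head(out.mean(dim=1))           # average over time, then classify

logits = LipreadNet()(torch.randn(2, 1, 29, 88, 88))   # 29 grayscale mouth-ROI frames
print(logits.shape)                                     # torch.Size([2, 500])
```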
Predicting the Argumenthood of English Prepositional Phrases
Title | Predicting the Argumenthood of English Prepositional Phrases |
Authors | Najoung Kim, Kyle Rawlins, Benjamin Van Durme, Paul Smolensky |
Abstract | Distinguishing between arguments and adjuncts of a verb is a longstanding, nontrivial problem. In natural language processing, argumenthood information is important in tasks such as semantic role labeling (SRL) and prepositional phrase (PP) attachment disambiguation. In theoretical linguistics, many diagnostic tests for argumenthood exist but they often yield conflicting and potentially gradient results. This is especially the case for syntactically oblique items such as PPs. We propose two PP argumenthood prediction tasks branching from these two motivations: (1) binary argument-adjunct classification of PPs in VerbNet, and (2) gradient argumenthood prediction using human judgments as gold standard, and report results from prediction models that use pretrained word embeddings and other linguistically informed features. Our best results on each task are (1) $acc.=0.955$, $F_1=0.954$ (ELMo+BiLSTM) and (2) Pearson’s $r=0.624$ (word2vec+MLP). Furthermore, we demonstrate the utility of argumenthood prediction in improving sentence representations via performance gains on SRL when a sentence encoder is pretrained with our tasks. |
Tasks | Semantic Role Labeling, Word Embeddings |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07889v4 |
PDF | http://arxiv.org/pdf/1809.07889v4.pdf |
PWC | https://paperswithcode.com/paper/predicting-the-argumenthood-of-english |
Repo | |
Framework | |
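A toy version of the binary argument/adjunct task can be set up with word-vector features and a linear classifier; the features and random data below are hypothetical stand-ins, not the ELMo+BiLSTM system whose scores are quoted in the abstract.

```python
# Toy PP-argumenthood classifier over placeholder verb/PP embedding features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def featurize(verb_vec, pp_vec, pp_len):
    return np.concatenate([verb_vec, pp_vec, [pp_len, float(verb_vec @ pp_vec)]])

X = np.stack([featurize(rng.normal(size=50), rng.normal(size=50), rng.integers(2, 8))
              for _ in range(300)])
y = rng.integers(0, 2, size=300)            # 1 = argument, 0 = adjunct (fake labels)

clf = LogisticRegression(max_iter=1000).fit(X[:250], y[:250])
print("held-out accuracy:", clf.score(X[250:], y[250:]))
```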
The speaker-independent lipreading play-off; a survey of lipreading machines
Title | The speaker-independent lipreading play-off; a survey of lipreading machines |
Authors | Jake Burton, David Frank, Madhi Saleh, Nassir Navab, Helen L. Bear |
Abstract | Lipreading is a difficult gesture classification task. One problem in computer lipreading is speaker independence: achieving the same accuracy on test speakers not included in the training set as on speakers within the training set. The current literature on speaker-independent lipreading is limited, and the few independent test-speaker accuracy scores are usually aggregated with dependent test-speaker accuracies into an averaged performance figure, which leads to unclear independent results. Here we undertake a systematic survey of experiments with the TCD-TIMIT dataset using both conventional approaches and deep learning methods to provide a series of wholly speaker-independent benchmarks, and show that the best speaker-independent machine scores 69.58% accuracy with CNN features and an SVM classifier. This is lower than state-of-the-art speaker-dependent lipreading machines, but greater than previously reported in independence experiments. |
Tasks | Lipreading |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10597v1 |
PDF | http://arxiv.org/pdf/1810.10597v1.pdf |
PWC | https://paperswithcode.com/paper/the-speaker-independent-lipreading-play-off-a |
Repo | |
Framework | |
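The best configuration reported here (CNN features fed to an SVM) maps onto a few lines of scikit-learn; random vectors stand in for the extracted CNN features, and the one-speaker hold-out below only illustrates what "speaker-independent" evaluation means.

```python
# CNN-features + SVM sketch with a wholly held-out test speaker.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(400, 512))         # placeholder CNN features, one row per sample
labels = rng.integers(0, 12, size=400)      # toy class IDs
speakers = rng.integers(0, 8, size=400)     # speaker ID per sample

train, test = speakers != 7, speakers == 7  # speaker 7 never seen during training
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(feats[train], labels[train])
print("speaker-independent accuracy:", clf.score(feats[test], labels[test]))
```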
Measuring and Characterizing Generalization in Deep Reinforcement Learning
Title | Measuring and Characterizing Generalization in Deep Reinforcement Learning |
Authors | Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen |
Abstract | Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent’s performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported. |
Tasks | Representation Learning |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.02868v2 |
PDF | http://arxiv.org/pdf/1812.02868v2.pdf |
PWC | https://paperswithcode.com/paper/measuring-and-characterizing-generalization |
Repo | |
Framework | |
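One cheap way to probe the "states that differ only slightly from on-policy states" finding is to measure how often a Q-network's greedy action flips under a small state perturbation; the network and states below are random placeholders rather than the paper's benchmark agent or its proposed evaluation protocol.

```python
# Greedy-action agreement between visited states and slightly perturbed copies of them.
import torch

q_net = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def action_agreement(q_net, states, eps=0.05):
    with torch.no_grad():
        a_on  = q_net(states).argmax(dim=1)
        a_off = q_net(states + eps * torch.randn_like(states)).argmax(dim=1)
    return (a_on == a_off).float().mean().item()

on_policy_states = torch.randn(1000, 4)     # stand-in for states visited by the policy
print("greedy-action agreement:", action_agreement(q_net, on_policy_states))
```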
Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?
Title | Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics? |
Authors | Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard Jäger |
Abstract | We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to phylogenies inferred from expert-annotated ones, although on average, the latter are still superior. We conclude that future work on phylogenetic reconstruction can profit greatly from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family’s phylogeny, algorithms for automatic cognate detection are a useful complement to current research on language phylogenies. |
Tasks | |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05416v1 |
PDF | http://arxiv.org/pdf/1804.05416v1.pdf |
PWC | https://paperswithcode.com/paper/are-automatic-methods-for-cognate-detection |
Repo | |
Framework | |
Compressive Single-pixel Fourier Transform Imaging using Structured Illumination
Title | Compressive Single-pixel Fourier Transform Imaging using Structured Illumination |
Authors | Amirafshar Moshtaghpour, José M. Bioucas-Dias, Laurent Jacques |
Abstract | Single Pixel (SP) imaging is now a reality in many applications, e.g., biomedical ultrathin endoscopes and fluorescent spectroscopy. In this context, many schemes exist to improve the light throughput of these devices, e.g., using structured illumination driven by compressive sensing theory. In this work, we consider the combination of SP imaging with Fourier Transform Interferometry (SP-FTI) to reach high-resolution HyperSpectral (HS) imaging, as desirable, e.g., in fluorescent spectroscopy. While this association is not new, we here focus on optimizing the spatial illumination, structured as Hadamard patterns, during the optical path progression. We follow a variable-density sampling strategy for space-time coding of the light illumination, and show theoretically and numerically that this scheme allows us to reduce the number of measurements and the light exposure of the observed object compared to conventional compressive SP-FTI. |
Tasks | Compressive Sensing |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13200v2 |
PDF | http://arxiv.org/pdf/1810.13200v2.pdf |
PWC | https://paperswithcode.com/paper/compressive-single-pixel-fourier-transform |
Repo | |
Framework | |
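Variable-density selection of Hadamard illumination patterns can be sketched as follows; the decaying density profile and the 1-D "scene" are arbitrary assumptions, not the optimized space-time coding derived in the paper.

```python
# Pick Hadamard rows with a non-uniform density, then simulate single-pixel measurements.
import numpy as np
from scipy.linalg import hadamard

n = 64                                    # pattern length (power of two)
H = hadamard(n)                           # rows are +/-1 illumination patterns
density = 1.0 / (1.0 + np.arange(n))      # assumed profile favouring low-index rows
density /= density.sum()

rng = np.random.default_rng(0)
m = 16                                    # number of measurements (4x compression)
rows = rng.choice(n, size=m, replace=False, p=density)
patterns = H[rows]

scene = rng.normal(size=n)                # toy 1-D object
measurements = patterns @ scene           # one scalar reading per illumination pattern
```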
Single-channel Speech Dereverberation via Generative Adversarial Training
Title | Single-channel Speech Dereverberation via Generative Adversarial Training |
Authors | Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu |
Abstract | In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on a convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). In order to obtain better speech quality instead of only minimizing a mean square error (MSE), GAT is employed to make the dereverberated speech indistinguishable from the clean samples. Besides, our system can deal with a wide range of reverberation conditions and adapts well to varying environments. The experimental results show that the proposed model outperforms weighted prediction error (WPE) and deep neural network-based systems. In addition, DeReGAT is extended to an online speech dereverberation scenario, which achieves performance comparable to the offline case. |
Tasks | |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09325v1 |
PDF | http://arxiv.org/pdf/1806.09325v1.pdf |
PWC | https://paperswithcode.com/paper/single-channel-speech-dereverberation-via |
Repo | |
Framework | |
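The generative adversarial training idea, MSE to the clean target plus a "fool the discriminator" term, can be reduced to a single toy update step; the two small MLPs, the spectrogram shapes and the 0.01 weighting are placeholders, not the paper's CBLDNN system.

```python
# One adversarial-training step for a toy dereverberation network.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 257))  # dereverberator
D = nn.Sequential(nn.Linear(257, 256), nn.ReLU(), nn.Linear(256, 1))    # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

reverb, clean = torch.randn(32, 257), torch.randn(32, 257)   # toy spectrogram frames

# Discriminator step: clean frames -> 1, dereverberated frames -> 0.
d_loss = bce(D(clean), torch.ones(32, 1)) + bce(D(G(reverb).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: MSE to the clean target plus the adversarial term.
enhanced = G(reverb)
g_loss = nn.functional.mse_loss(enhanced, clean) + 0.01 * bce(D(enhanced), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```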
Real-Time Human Detection as an Edge Service Enabled by a Lightweight CNN
Title | Real-Time Human Detection as an Edge Service Enabled by a Lightweight CNN |
Authors | Seyed Yahya Nikouei, Yu Chen, Sejun Song, Ronghua Xu, Baek-Young Choi, Timothy R. Faughnan |
Abstract | Edge computing allows more computing tasks to take place on decentralized nodes at the edge of networks. Today many delay-sensitive, mission-critical applications can leverage these edge devices to reduce the time delay or even to enable real-time, online decision making thanks to their on-site presence. Human object detection, behavior recognition and prediction in smart surveillance fall into that category, where transmitting a huge volume of video streaming data can take valuable time and place heavy pressure on communication networks. It is widely recognized that video processing and object detection are computing-intensive and too expensive to be handled by resource-limited edge devices. Inspired by depthwise separable convolution and the Single Shot Multi-Box Detector (SSD), a lightweight Convolutional Neural Network (L-CNN) is introduced in this paper. By narrowing down the classifier’s searching space to focus on human objects in surveillance video frames, the proposed L-CNN algorithm is able to detect pedestrians with a computation workload affordable to an edge device. A prototype has been implemented on an edge node (Raspberry Pi 3) using OpenCV libraries, and satisfactory performance is achieved using real-world surveillance video streams. The experimental study has validated the design of the L-CNN and shown that it is a promising approach for computing-intensive applications at the edge. |
Tasks | Decision Making, Human Detection, Object Detection |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1805.00330v1 |
PDF | http://arxiv.org/pdf/1805.00330v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-human-detection-as-an-edge-service |
Repo | |
Framework | |
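The depthwise separable convolution the L-CNN builds on factorizes a standard k×k convolution into a per-channel k×k step and a 1×1 pointwise step, reducing the cost from k²·Cin·Cout to k²·Cin + Cin·Cout multiply-adds per output position. The block below is a generic sketch with illustrative channel sizes, not the paper's exact layer configuration.

```python
# Depthwise separable convolution block: per-channel 3x3 conv, then 1x1 pointwise conv.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)      # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # mix channels with 1x1
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

block = DepthwiseSeparableConv(32, 64, stride=2)
print(block(torch.randn(1, 32, 128, 128)).shape)   # torch.Size([1, 64, 64, 64])
```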
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video
Title | CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video |
Authors | Huizi Mao, Taeyoung Kong, William J. Dally |
Abstract | Detecting objects in a video is a compute-intensive task. In this paper we propose CaTDet, a system to speed up object detection by leveraging the temporal correlation in video. CaTDet consists of two DNN models that form a cascaded detector, and an additional tracker to predict regions of interest based on historic detections. We also propose a new metric, mean Delay (mD), which is designed for latency-critical video applications. Experiments on the KITTI dataset show that CaTDet reduces the operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector, while incurring an additional delay of 0.3 frames. On the CityPersons dataset, CaTDet achieves a 13.0x reduction in operations with a 0.8% mAP loss. |
Tasks | Object Detection |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00434v2 |
PDF | http://arxiv.org/pdf/1810.00434v2.pdf |
PWC | https://paperswithcode.com/paper/catdet-cascaded-tracked-detector-for |
Repo | |
Framework | |
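The tracker's role of proposing regions of interest from historic detections can be illustrated with a constant-velocity extrapolation of last-frame boxes; the box format and padding margin are simplifying assumptions, not CaTDet's actual tracker or cascade.

```python
# Extrapolate matched boxes (x1, y1, x2, y2) one frame ahead and pad them into ROIs.
import numpy as np

def predict_rois(prev_boxes, prev_prev_boxes, margin=0.2):
    boxes = 2 * prev_boxes - prev_prev_boxes        # constant-velocity extrapolation
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    pad = margin * np.stack([-w, -h, w, h], axis=1)
    return boxes + pad                              # expensive detector runs only here

prev_prev = np.array([[100., 100., 150., 200.]])
prev      = np.array([[110., 100., 160., 200.]])    # object moving right ~10 px/frame
print(predict_rois(prev, prev_prev))                # [[110.  80. 180. 220.]]
```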