Paper Group AWR 74
FA*IR: A Fair Top-k Ranking Algorithm. Deep Binary Reconstruction for Cross-modal Hashing. Robust Spatial Filtering with Graph Convolutional Neural Networks. weedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming. How intelligent are convolutional neural networks? Virtual to Real Reinforcement Learning for …
FA*IR: A Fair Top-k Ranking Algorithm
Title | FA*IR: A Fair Top-k Ranking Algorithm |
Authors | Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates |
Abstract | In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n ≫ k candidates, maximizing utility (i.e., select the “best” candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum. Utility is operationalized in two ways: (i) every candidate included in the top-k should be more qualified than every candidate not included; and (ii) for every pair of candidates in the top-k, the more qualified candidate should be ranked above. An efficient algorithm is presented for producing the Fair Top-k Ranking, and tested experimentally on existing datasets as well as new datasets released with this paper, showing that our approach yields small distortions with respect to rankings that maximize utility without considering fairness criteria. To the best of our knowledge, this is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list. |
Tasks | |
Published | 2017-06-20 |
URL | https://arxiv.org/abs/1706.06368v3 |
https://arxiv.org/pdf/1706.06368v3.pdf | |
PWC | https://paperswithcode.com/paper/fair-a-fair-top-k-ranking-algorithm |
Repo | https://github.com/MilkaLichtblau/FA-IR_Ranking |
Framework | none |
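A minimal Python sketch of the ranked group fairness test and greedy re-ranking described in the abstract above. The binomial-test construction follows the paper's high-level description; the multiple-testing adjustment of alpha across prefixes and the exact tie-breaking are simplified here, and the `(id, score)` candidate format is an assumption.

```python
from scipy.stats import binom

def min_protected(k, p, alpha):
    """For each prefix length 1..k, the smallest count of protected candidates that is
    not rejected as 'too low' at level alpha under Binomial(prefix, p). The paper
    additionally corrects alpha for the multiple tests across prefixes (omitted here)."""
    table = []
    for prefix in range(1, k + 1):
        tau = 0
        while binom.cdf(tau, prefix, p) < alpha:
            tau += 1
        table.append(tau)
    return table

def fair_topk(protected, non_protected, k, p, alpha):
    """Greedy merge of two lists of (id, score) pairs, each sorted by decreasing score.
    Assumes enough protected candidates are available to satisfy the constraint."""
    required = min_protected(k, p, alpha)
    ranking, tp, tn = [], 0, 0
    for i in range(k):
        must_take_protected = tp < required[i]
        if not must_take_protected and tn < len(non_protected) and (
                tp >= len(protected) or non_protected[tn][1] >= protected[tp][1]):
            ranking.append(non_protected[tn]); tn += 1   # best remaining candidate is non-protected
        else:
            ranking.append(protected[tp]); tp += 1       # fairness constraint (or score) picks a protected candidate
    return ranking
```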
Deep Binary Reconstruction for Cross-modal Hashing
Title | Deep Binary Reconstruction for Cross-modal Hashing |
Authors | Xuelong Li, Di Hu, Feiping Nie |
Abstract | With the increasing demand of massive multimodal data storage and organization, cross-modal retrieval based on hashing technique has drawn much attention nowadays. It takes the binary codes of one modality as the query to retrieve the relevant hashing codes of another modality. However, the existing binary constraint makes it difficult to find the optimal cross-modal hashing function. Most approaches choose to relax the constraint and perform a thresholding strategy on the real-valued representation instead of directly solving the original objective. In this paper, we first provide a concrete analysis of the effectiveness of multimodal networks in preserving the inter- and intra-modal consistency. Based on the analysis, we propose a Deep Binary Reconstruction (DBRC) network that can directly learn the binary hashing codes in an unsupervised fashion. The superiority comes from a simple but efficient activation function, named Adaptive Tanh (ATanh). The ATanh function can adaptively learn the binary codes and be trained via back-propagation. Extensive experiments on three benchmark datasets demonstrate that DBRC outperforms several state-of-the-art methods in both image2text and text2image retrieval tasks. |
Tasks | Cross-Modal Retrieval |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05127v2 |
http://arxiv.org/pdf/1708.05127v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-binary-reconstruction-for-cross-modal |
Repo | https://github.com/yolo2233/cross-modal-hasing-playground |
Framework | tf |
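A small PyTorch sketch of the Adaptive Tanh idea from the abstract: a tanh activation with a learnable slope, so the hashing layer's outputs can be pushed towards ±1 during training while remaining differentiable. The exact parameterization and regularization used in DBRC may differ; this only illustrates the mechanism.

```python
import torch
import torch.nn as nn

class ATanh(nn.Module):
    """Sketch of an Adaptive Tanh unit: tanh with a learnable, per-unit slope `alpha`.
    As alpha grows, the activation approaches the sign function, so the real-valued
    codes drift towards binary values while staying differentiable."""
    def __init__(self, num_units, init_alpha=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((num_units,), init_alpha))

    def forward(self, x):
        return torch.tanh(self.alpha * x)

# usage: real-valued hashing outputs are binarized at retrieval time
codes = ATanh(64)(torch.randn(8, 64))
binary_codes = torch.sign(codes)
```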
Robust Spatial Filtering with Graph Convolutional Neural Networks
Title | Robust Spatial Filtering with Graph Convolutional Neural Networks |
Authors | Felipe Petroski Such, Shagan Sah, Miguel Dominguez, Suhas Pillai, Chao Zhang, Andrew Michael, Nathan Cahill, Raymond Ptucha |
Abstract | Convolutional Neural Networks (CNNs) have recently led to incredible breakthroughs on a variety of pattern recognition problems. Banks of finite impulse response filters are learned on a hierarchy of layers, each contributing more abstract information than the previous layer. The simplicity and elegance of the convolutional filtering process makes them perfect for structured problems such as image, video, or voice, where vertices are homogeneous in the sense of number, location, and strength of neighbors. The vast majority of classification problems, for example in the pharmaceutical, homeland security, and financial domains, are unstructured. As these problems are formulated into unstructured graphs, the heterogeneity of these problems, such as number of vertices, number of connections per vertex, and edge strength, cannot be tackled with standard convolutional techniques. We propose a novel neural learning framework that is capable of handling both homogeneous and heterogeneous data, while retaining the benefits of traditional CNN successes. Recently, researchers have proposed variations of CNNs that can handle graph data. In an effort to create learnable filter banks of graphs, these methods either induce constraints on the data or require preprocessing. As opposed to spectral methods, our framework, which we term Graph-CNNs, defines filters as polynomials of functions of the graph adjacency matrix. Graph-CNNs can handle both heterogeneous and homogeneous graph data, including graphs having entirely different vertex or edge sets. We perform experiments to validate the applicability of Graph-CNNs to a variety of structured and unstructured classification problems and demonstrate state-of-the-art results on document and molecule classification problems. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00792v3 |
http://arxiv.org/pdf/1703.00792v3.pdf | |
PWC | https://paperswithcode.com/paper/robust-spatial-filtering-with-graph |
Repo | https://github.com/fps7806/Graph-CNN |
Framework | tf |
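The abstract defines Graph-CNN filters as polynomials of (functions of) the graph adjacency matrix. A minimal NumPy sketch of one such filter, with scalar coefficients standing in for the learnable filter banks of the paper:

```python
import numpy as np

def graph_conv(A, X, thetas):
    """Spatial graph filter as a polynomial in the adjacency matrix: Y = sum_k theta_k * A^k @ X.
    A: (n, n) adjacency matrix, X: (n, f) vertex features, thetas: list of scalar taps
    (one per power of A; the paper learns full filter banks rather than scalars)."""
    Y = np.zeros_like(X, dtype=float)
    Ak = np.eye(A.shape[0])
    for theta in thetas:
        Y += theta * (Ak @ X)
        Ak = Ak @ A          # next power of the adjacency matrix
    return Y
```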
weedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming
Title | weedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming |
Authors | Inkyu Sa, Zetao Chen, Marija Popovic, Raghav Khanna, Frank Liebisch, Juan Nieto, Roland Siegwart |
Abstract | Selective weed treatment is a critical step in autonomous crop management as related to crop health and yield. However, a key challenge is reliable and accurate weed detection to minimize damage to surrounding plants. In this paper, we present an approach for dense semantic weed classification with multispectral images collected by a micro aerial vehicle (MAV). We use the recently developed encoder-decoder cascaded Convolutional Neural Network (CNN), SegNet, that infers dense semantic classes while allowing any number of input image channels and class balancing with our sugar beet and weed datasets. To obtain training datasets, we established an experimental field with varying herbicide levels resulting in field plots containing only either crop or weed, enabling us to use the Normalized Difference Vegetation Index (NDVI) as a distinguishable feature for automatic ground truth generation. We train 6 models with different numbers of input channels and condition (fine-tune) them to achieve about 0.8 F1-score and 0.78 Area Under the Curve (AUC) classification metrics. For model deployment, an embedded GPU system (Jetson TX2) is tested for MAV integration. The dataset used in this paper is released to support the community and future work. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03329v1 |
http://arxiv.org/pdf/1709.03329v1.pdf | |
PWC | https://paperswithcode.com/paper/weednet-dense-semantic-weed-classification |
Repo | https://github.com/inkyusa/weedNet |
Framework | none |
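A short NumPy sketch of the NDVI-based ground-truth step described above: on plots treated to contain only crop or only weed, thresholding NDVI separates vegetation from soil, and the plot identity then supplies the class label. The 0.4 threshold is an illustrative assumption, not the paper's value.

```python
import numpy as np

def ndvi_labels(nir, red, veg_thresh=0.4, eps=1e-8):
    """NDVI = (NIR - Red) / (NIR + Red); pixels above the threshold are treated as
    vegetation. nir and red are float arrays of the same shape."""
    ndvi = (nir - red) / (nir + red + eps)
    vegetation = ndvi > veg_thresh
    return ndvi, vegetation
```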
How intelligent are convolutional neural networks?
Title | How intelligent are convolutional neural networks? |
Authors | Zhennan Yan, Xiang Sean Zhou |
Abstract | Motivated by the Gestalt pattern theory, and the Winograd Challenge for language understanding, we design synthetic experiments to investigate a deep learning algorithm’s ability to infer simple (at least for human) visual concepts, such as symmetry, from examples. A visual concept is represented by randomly generated, positive as well as negative, example images. We then test the ability and speed of algorithms (and humans) to learn the concept from these images. The training and testing are performed progressively in multiple rounds, with each subsequent round deliberately designed to be more complex and confusing than the previous round(s), especially if the concept was not grasped by the learner. However, if the concept was understood, all the deliberate tests would become trivially easy. Our experiments show that humans can often infer a semantic concept quickly after looking at only a very small number of examples (this is often referred to as an “aha moment”: a moment of sudden realization), and perform perfectly during all testing rounds (except for careless mistakes). On the contrary, deep convolutional neural networks (DCNN) could approximate some concepts statistically, but only after seeing many (×10^4) more examples, and they still make obvious mistakes, especially during deliberate testing rounds or on samples outside the training distributions. This signals a lack of true “understanding”, or a failure to reach the right “formula” for the semantics. We did find that some concepts are easier for DCNN than others. For example, simple “counting” is more learnable than “symmetry”, while “uniformity” or “conformance” are much more difficult for DCNN to learn. To conclude, we propose an “Aha Challenge” for visual perception, calling for focused and quantitative research on Gestalt-style machine intelligence using limited training examples. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06126v2 |
http://arxiv.org/pdf/1709.06126v2.pdf | |
PWC | https://paperswithcode.com/paper/how-intelligent-are-convolutional-neural |
Repo | https://github.com/zhennany/synthetic |
Framework | none |
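A toy generator in the spirit of the synthetic experiments described above: binary images that are (or are not) left-right symmetric serve as positive and negative examples of the "symmetry" concept. The paper's generators and round-by-round difficulty schedule are more elaborate; this only illustrates the setup.

```python
import numpy as np

def make_symmetry_example(size=32, positive=True, rng=None):
    """Return a binary image: mirrored left half (positive) or two independent halves
    (negative, symmetric only by chance)."""
    rng = rng or np.random.default_rng()
    half = rng.integers(0, 2, size=(size, size // 2))
    if positive:
        img = np.concatenate([half, half[:, ::-1]], axis=1)   # mirror the left half
    else:
        other = rng.integers(0, 2, size=(size, size // 2))
        img = np.concatenate([half, other], axis=1)           # independent right half
    return img
```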
Virtual to Real Reinforcement Learning for Autonomous Driving
Title | Virtual to Real Reinforcement Learning for Autonomous Driving |
Authors | Xinlei Pan, Yurong You, Ziyan Wang, Cewu Lu |
Abstract | Reinforcement learning is considered as a promising direction for driving policy learning. However, training an autonomous driving vehicle with reinforcement learning in a real environment involves unaffordable trial-and-error. It is more desirable to first train in a virtual environment and then transfer to the real environment. In this paper, we propose a novel realistic translation network to make a model trained in a virtual environment workable in the real world. The proposed network converts non-realistic virtual image input into a realistic one with a similar scene structure. Given realistic frames as input, a driving policy trained by reinforcement learning can adapt well to real-world driving. Experiments show that our proposed virtual-to-real (VR) reinforcement learning (RL) approach works well. To our knowledge, this is the first successful case of a driving policy trained by reinforcement learning that can adapt to real-world driving data. |
Tasks | Autonomous Driving, Domain Adaptation, Image-to-Image Translation, Synthetic-to-Real Translation, Transfer Learning |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.03952v4 |
http://arxiv.org/pdf/1704.03952v4.pdf | |
PWC | https://paperswithcode.com/paper/virtual-to-real-reinforcement-learning-for |
Repo | https://github.com/lh-wang/ACC |
Framework | tf |
Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets
Title | Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets |
Authors | Rotem Dror, Gili Baumer, Marina Bogomolov, Roi Reichart |
Abstract | With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous setups. However, such multiple comparisons pose significant challenges to traditional statistical analysis methods in NLP and can lead to erroneous conclusions. In this paper, we propose a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks. We discuss the theoretical advantages of this framework over the current, statistically unjustified, practice in the NLP literature, and demonstrate its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification and word similarity prediction. |
Tasks | Dependency Parsing, Sentiment Analysis |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09500v1 |
http://arxiv.org/pdf/1709.09500v1.pdf | |
PWC | https://paperswithcode.com/paper/replicability-analysis-for-natural-language |
Repo | https://github.com/rtmdrr/replicability-analysis-NLP |
Framework | none |
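The framework above asks how many of the evaluation datasets show a genuine improvement, rather than testing each dataset in isolation. A hedged sketch of a Bonferroni-style partial-conjunction count follows; the paper also discusses a Fisher-based variant and the dependence assumptions behind each, which are omitted here.

```python
import numpy as np

def count_rejections_bonferroni(p_values, alpha=0.05):
    """Estimate how many of the n datasets show a genuine effect.
    Bonferroni partial-conjunction p-value: p^{u/n} = (n - u + 1) * p_(u), where p_(u)
    is the u-th smallest of the per-dataset p-values; the estimate is the largest u for
    which p^{1/n}, ..., p^{u/n} all stay below alpha."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = len(p)
    k_hat = 0
    for u in range(1, n + 1):
        if (n - u + 1) * p[u - 1] <= alpha:
            k_hat = u
        else:
            break
    return k_hat
```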
Snorkel: Rapid Training Data Creation with Weak Supervision
Title | Snorkel: Rapid Training Data Creation with Weak Supervision |
Authors | Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré |
Abstract | Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10160v1 |
http://arxiv.org/pdf/1711.10160v1.pdf | |
PWC | https://paperswithcode.com/paper/snorkel-rapid-training-data-creation-with |
Repo | https://github.com/HazyResearch/metal |
Framework | pytorch |
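A minimal sketch of the labeling-function workflow described above, using hypothetical spam-detection heuristics. Majority vote stands in for Snorkel's generative label model, which additionally learns the accuracies and correlations of the labeling functions; the function names and label constants here are illustrative, not Snorkel's API.

```python
import re
import numpy as np

ABSTAIN, NEG, POS = -1, 0, 1

# Hypothetical labeling functions: each votes POS/NEG or abstains on an example.
def lf_contains_offer(text):  return POS if "offer" in text.lower() else ABSTAIN
def lf_contains_thanks(text): return NEG if "thanks" in text.lower() else ABSTAIN
def lf_many_links(text):      return POS if len(re.findall(r"http", text)) > 2 else ABSTAIN

def label_matrix(texts, lfs):
    """One row per example, one column per labeling function."""
    return np.array([[lf(t) for lf in lfs] for t in texts])

def majority_vote(L):
    """Average the non-abstaining votes into a probabilistic label in [0, 1] —
    a simple stand-in for the learned generative label model."""
    probs = []
    for row in L:
        votes = row[row != ABSTAIN]
        probs.append(0.5 if votes.size == 0 else votes.mean())
    return np.array(probs)
```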
Robust Physical-World Attacks on Deep Learning Models
Title | Robust Physical-World Attacks on Deep Learning Models |
Authors | Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song |
Abstract | Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier. |
Tasks | |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08945v5 |
http://arxiv.org/pdf/1707.08945v5.pdf | |
PWC | https://paperswithcode.com/paper/robust-physical-world-attacks-on-deep |
Repo | https://github.com/evtimovi/robust_physical_perturbations |
Framework | tf |
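A hedged PyTorch sketch of an RP2-style optimization: a sticker-shaped (masked) perturbation is optimized to cause targeted misclassification across photos of the same object taken under different physical conditions. The `model` interface, loss weights, and regularizer are assumptions; the paper's objective additionally handles printability and per-image transformations.

```python
import torch

def rp2_attack(model, images, mask, target, steps=300, lr=0.1, lam=1e-3):
    """images: (B, C, H, W) photos of the same object under varying conditions;
    mask: (C, H, W) binary sticker region; target: desired (wrong) class index.
    Returns the masked perturbation after optimization."""
    delta = torch.zeros_like(images[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    targets = torch.full((images.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        adv = torch.clamp(images + mask * delta, 0.0, 1.0)     # same sticker applied to every condition
        loss = torch.nn.functional.cross_entropy(model(adv), targets)
        loss = loss + lam * (mask * delta).abs().sum()          # keep the perturbation small
        opt.zero_grad(); loss.backward(); opt.step()
    return (mask * delta).detach()
```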
Large-scale Cloze Test Dataset Created by Teachers
Title | Large-scale Cloze Test Dataset Created by Teachers |
Authors | Qizhe Xie, Guokun Lai, Zihang Dai, Eduard Hovy |
Abstract | Cloze tests are widely adopted in language exams to evaluate students’ language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. With missing blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language understanding and a wider attention span than previously automatically-generated cloze datasets. We test the performance of purpose-built baseline models, including a language model trained on the One Billion Word Corpus, and show that humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability of comprehending the long-term context to be the key bottleneck. |
Tasks | Language Modelling |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03225v3 |
http://arxiv.org/pdf/1711.03225v3.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-cloze-test-dataset-created-by |
Repo | https://github.com/qizhex/Large-scale-Cloze-Test-Dataset-Designed-by-Teachers |
Framework | pytorch |
LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Title | LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes |
Authors | Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake |
Abstract | Deep neural network (DNN) architectures have been shown to outperform traditional pipelines for object segmentation and pose estimation using RGBD data, but the performance of these DNN pipelines is directly tied to how representative the training data is of the true data. Hence a key requirement for employing these methods in practice is to have a large set of labeled data for your specific robotic manipulation task, a requirement that is not generally satisfied by existing datasets. In this paper we develop a pipeline to rapidly generate high quality RGBD data with pixelwise labels and object poses. We use an RGBD camera to collect video of a scene from multiple viewpoints and leverage existing reconstruction techniques to produce a 3D dense reconstruction. We label the 3D reconstruction using a human assisted ICP-fitting of object meshes. By reprojecting the results of labeling the 3D scene we can produce labels for each RGBD image of the scene. This pipeline enabled us to collect over 1,000,000 labeled object instances in just a few days. We use this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture. |
Tasks | 3D Reconstruction, Pose Estimation, Semantic Segmentation |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.04796v3 |
http://arxiv.org/pdf/1707.04796v3.pdf | |
PWC | https://paperswithcode.com/paper/labelfusion-a-pipeline-for-generating-ground |
Repo | https://github.com/RobotLocomotion/LabelFusion |
Framework | tf |
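A NumPy sketch of the reprojection step described above: once the fused 3D reconstruction has been labeled by human-assisted ICP fitting of object meshes, per-pixel labels for each RGBD frame are obtained by projecting the labeled 3D points through that frame's camera. Occlusion handling (a z-buffer) is omitted, and the array shapes are assumptions.

```python
import numpy as np

def reproject_labels(points_world, labels, K, T_world_to_cam, height, width):
    """points_world: (N, 3) labeled reconstruction points; labels: (N,) integer object ids;
    K: 3x3 intrinsics; T_world_to_cam: 4x4 extrinsics for this frame.
    Returns an (H, W) label image with 0 as background."""
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    cam = (T_world_to_cam @ homog.T).T[:, :3]
    in_front = cam[:, 2] > 0                                   # keep points in front of the camera
    uv = (K @ cam[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    label_img = np.zeros((height, width), dtype=int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    label_img[uv[valid, 1], uv[valid, 0]] = labels[in_front][valid]
    return label_img
```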
Fast Reading Comprehension with ConvNets
Title | Fast Reading Comprehension with ConvNets |
Authors | Felix Wu, Ni Lao, John Blitzer, Guandao Yang, Kilian Weinberger |
Abstract | State-of-the-art deep reading comprehension models are dominated by recurrent neural nets. Their sequential nature is a natural fit for language, but it also precludes parallelization within an instance and often becomes the bottleneck when deploying such models to latency-critical scenarios. This is particularly problematic for longer texts. Here we present a convolutional architecture as an alternative to these recurrent architectures. Using simple dilated convolutional units in place of recurrent ones, we achieve results comparable to the state of the art on two question answering tasks, while at the same time achieving up to two orders of magnitude speedups for question answering. |
Tasks | Question Answering, Reading Comprehension |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04352v1 |
http://arxiv.org/pdf/1711.04352v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-reading-comprehension-with-convnets |
Repo | https://github.com/felixgwu/FastFusionNet |
Framework | pytorch |
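A compact PyTorch sketch of the dilated-convolution idea above: stacking kernel-3 convolutions with dilations 1, 2, 4, ... grows the receptive field exponentially with depth while every position is computed in parallel, unlike an RNN. The layer sizes and residual wiring are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DilatedEncoder(nn.Module):
    """Stack of dilated 1D convolutions over a token representation sequence."""
    def __init__(self, dim=128, layers=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, dilation=2 ** i, padding=2 ** i)
            for i in range(layers)
        ])

    def forward(self, x):                    # x: (batch, dim, seq_len)
        for conv in self.convs:
            x = torch.relu(conv(x)) + x      # residual connection; sequence length is preserved
        return x
```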
Detection-aided liver lesion segmentation using deep learning
Title | Detection-aided liver lesion segmentation using deep learning |
Authors | Miriam Bellver, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Xavier Giro-i-Nieto, Jordi Torres, Luc Van Gool |
Abstract | A fully automatic technique for segmenting the liver and localizing its unhealthy tissues is a convenient tool for diagnosing hepatic diseases and assessing the response to the corresponding treatments. In this work we propose a method to segment the liver and its lesions from Computed Tomography (CT) scans using Convolutional Neural Networks (CNNs), which have achieved good results in a variety of computer vision tasks, including medical imaging. The network that segments the lesions consists of a cascaded architecture, which first focuses on the region of the liver in order to segment the lesions within it. Moreover, we train a detector to localize the lesions, and mask the results of the segmentation network with the positive detections. The segmentation architecture is based on DRIU, a Fully Convolutional Network (FCN) with side outputs that work on feature maps of different resolutions, to finally benefit from the multi-scale information learned by different stages of the network. The main contribution of this work is the use of a detector to localize the lesions, which we show to be beneficial for removing false positives triggered by the segmentation network. Source code and models are available at https://imatge-upc.github.io/liverseg-2017-nipsws/ . |
Tasks | Computed Tomography (CT), Lesion Segmentation, Medical Image Segmentation |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.11069v1 |
http://arxiv.org/pdf/1711.11069v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-aided-liver-lesion-segmentation |
Repo | https://github.com/Sempronius/LITS_test |
Framework | tf |
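A small NumPy sketch of the detection-aided masking step described above: lesion-segmentation probabilities are kept only inside positively detected boxes, suppressing false positives triggered elsewhere by the segmentation network. The box format and threshold value are assumptions.

```python
import numpy as np

def mask_segmentation_with_detections(seg_prob, boxes, threshold=0.5):
    """seg_prob: (H, W) lesion probability map from the segmentation network;
    boxes: iterable of (y0, x0, y1, x1) positive detections.
    Returns the final binary lesion mask."""
    keep = np.zeros_like(seg_prob, dtype=bool)
    for y0, x0, y1, x1 in boxes:
        keep[y0:y1, x0:x1] = True            # only pixels inside a positive detection survive
    return (seg_prob * keep) > threshold
```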
Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms
Title | Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms |
Authors | Wenpeng Yin, Hinrich Schütze |
Abstract | In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text t^x. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from its local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text t^x that are distant or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence modeling with zero-context (sentiment analysis), single-context (textual entailment) and multiple-context (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs. |
Tasks | Natural Language Inference, Representation Learning, Sentiment Analysis |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00519v2 |
http://arxiv.org/pdf/1710.00519v2.pdf | |
PWC | https://paperswithcode.com/paper/attentive-convolution-equipping-cnns-with-rnn |
Repo | https://github.com/kenkenling/NLP |
Framework | pytorch |
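A rough NumPy sketch of attentive convolution as described above: for each word of the focus text t^x, an attention-weighted context vector is computed over a (possibly external) text t^y and combined with the local window inside the convolution step. The dot-product scoring and parameter shapes are simplified assumptions; the paper's formulation is richer.

```python
import numpy as np

def attentive_convolution(tx, ty, W1, W2, b):
    """tx: (n, d) focus-text word vectors; ty: (m, d) context-text word vectors;
    W1: (h, 3d) local-window weights; W2: (h, d) attended-context weights; b: (h,) bias."""
    scores = tx @ ty.T                                  # (n, m) attention energies
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    attended = alpha @ ty                               # one attention-weighted context per word
    n, d = tx.shape
    padded = np.vstack([np.zeros((1, d)), tx, np.zeros((1, d))])
    out = np.empty((n, W1.shape[0]))
    for i in range(n):
        local = padded[i:i + 3].reshape(-1)             # tri-gram window around word i
        out[i] = np.tanh(W1 @ local + W2 @ attended[i] + b)
    return out
```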
Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation
Title | Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation |
Authors | Lei Cai, Hongyang Gao, Shuiwang Ji |
Abstract | Variational auto-encoder (VAE) is a powerful unsupervised learning framework for image generation. One drawback of VAE is that it generates blurry images due to its Gaussianity assumption and thus L2 loss. To allow the generation of high quality images by VAE, we increase the capacity of the decoder network by employing residual blocks and skip connections, which also enable efficient optimization. To overcome the limitation of the L2 loss, we propose to generate images in a multi-stage manner from coarse to fine. In the simplest case, the proposed multi-stage VAE divides the decoder into two components in which the second component generates refined images based on the coarse images generated by the first component. Since the second component is independent of the VAE model, it can employ other loss functions beyond the L2 loss and different model architectures. The proposed framework can be easily generalized to contain more than two components. Experiment results on the MNIST and CelebA datasets demonstrate that the proposed multi-stage VAE can generate sharper images as compared to those from the original VAE. |
Tasks | Image Generation |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07202v1 |
http://arxiv.org/pdf/1705.07202v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-stage-variational-auto-encoders-for |
Repo | https://github.com/divelab/msvae |
Framework | tf |
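A condensed PyTorch sketch of the coarse-to-fine decoding described above: a standard VAE decoder produces a coarse reconstruction, and a second, independent component refines it, so the refiner can be trained with a loss other than L2. Layer sizes and the MLP parameterization are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class MultiStageVAE(nn.Module):
    """Stage 1: ordinary VAE with a coarse decoder; stage 2: refinement network over the coarse image."""
    def __init__(self, z_dim=32, img_dim=784):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, z_dim), nn.Linear(256, z_dim)
        self.dec_coarse = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                        nn.Linear(256, img_dim), nn.Sigmoid())
        self.refiner = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        coarse = self.dec_coarse(z)
        fine = self.refiner(coarse)                                # stage 2 refines the coarse output
        return coarse, fine, mu, logvar
```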