What Are Precision and Recall?

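Precision = number of correct items extracted / total number of items extracted

Recall = number of correct items extracted / number of relevant items in the sample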
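The two ratios above can be sketched in a few lines of Python. This is a minimal illustration, assuming the extracted items and the relevant items can each be represented as a set; the function name precision_recall and the sample data are made up for the example.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for a collection of extracted items.

    extracted: items the system returned
    relevant:  items in the sample that are actually correct
    """
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant  # correctly extracted items
    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 items extracted, 3 of them correct, 6 relevant in total.
p, r = precision_recall({"a", "b", "c", "x"}, {"a", "b", "c", "d", "e", "f"})
print(p, r)  # 0.75 0.5
```

Returning 0.0 for an empty set is a convention chosen here for simplicity; other conventions exist.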



Precision (P) and recall (R) are commonly combined into a single score, the F-measure:

F = (b^2 + 1) * P * R / (b^2 * P + R)

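Here b is a preset value that sets the relative weight of P and R: b greater than 1 gives more weight to R (recall), and b less than 1 gives more weight to P (precision). It is usually set to 1, which weights the two equally.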
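As a rough sketch, the F-measure can be computed directly from P and R. The function name f_measure and the sample values are assumptions made for illustration only.

```python
def f_measure(p, r, b=1.0):
    """Weighted F-measure: (b^2 + 1) * P * R / (b^2 * P + R)."""
    denom = b * b * p + r
    if denom == 0:
        return 0.0  # convention used here when both P and R are 0
    return (b * b + 1) * p * r / denom


p, r = 0.75, 0.5
print(f_measure(p, r))         # b = 1: harmonic mean of P and R
print(f_measure(p, r, b=2.0))  # b > 1: score moves toward R
print(f_measure(p, r, b=0.5))  # b < 1: score moves toward P
```

Because P is larger than R in this sample, raising b lowers the score and lowering b raises it, which shows which of the two quantities each setting favors.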


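In text classification, precision and recall can also be used to measure the performance of a classifier. For example, in opinion mining, they measure how well a classifier identifies positive opinions: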

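Precision = number of true positive opinions identified / total number of items identified as positive opinions

Recall = number of true positive opinions identified / total number of true positive opinions in the sample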


In a statistical classification task, the Precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items which were not labeled as belonging to the positive class but should have been).

In a classification task, a Precision score of 1.0 for a class C means that every item labeled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labeled correctly) whereas a Recall of 1.0 means that every item from class C was labeled as belonging to class C (but says nothing about how many other items were incorrectly also labeled as belonging to class C).
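The definitions in terms of true positives, false positives, and false negatives can be sketched as follows. This is a minimal example, assuming gold labels and predicted labels are given as two parallel lists; the function name and the "pos"/"neg" labels are hypothetical.

```python
def precision_recall_for_class(y_true, y_pred, positive):
    """Return (precision, recall) for one class from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["pos", "pos", "neg", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
print(precision_recall_for_class(y_true, y_pred, "pos"))
```

A classifier that labels only one item as "pos", and labels it correctly, would score a precision of 1.0 here while its recall stays low, which is the asymmetry the paragraph above describes.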

Opinion mining also offers an interesting application (see Bing Liu, "Sentiment Analysis and Subjectivity"):

One of the bottlenecks in applying supervised learning is the manual effort involved in annotating a large number of training examples. To save the manual labeling effort, a bootstrapping approach to label training data automatically is reported in [80, 81]. The algorithm works by first using two high-precision classifiers (HP-Subj and HP-Obj) to automatically identify some subjective and objective sentences. The high-precision classifiers use lists of lexical items (single words or n-grams) that are good subjectivity clues. HP-Subj classifies a sentence as subjective if it contains two or more strong subjective clues. HP-Obj classifies a sentence as objective if there are no strongly subjective clues. These classifiers will give very high precision but low recall. The extracted sentences are then added to the training data to learn patterns. The patterns (which form the subjectivity classifiers in the next iteration) are then used to automatically identify more subjective and objective sentences, which are then added to the training set, and the next iteration of the algorithm begins.
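The first step of this bootstrapping idea, seeding the training set with two high-precision rules, might look roughly like the sketch below. It is not the implementation from [80, 81]: the clue list STRONG_CLUES, the whitespace tokenization, and all function names are assumptions made for illustration, and the pattern-learning iterations are left out.

```python
# Hypothetical clue list; a real system would use a curated lexicon.
STRONG_CLUES = {"love", "hate", "terrible", "wonderful", "awful"}


def count_strong_clues(sentence):
    """Count words in the sentence that appear in the clue list."""
    words = sentence.lower().split()
    return sum(1 for w in words if w.strip(".,!?") in STRONG_CLUES)


def hp_subj(sentence):
    """Subjective if the sentence has two or more strong clues."""
    return count_strong_clues(sentence) >= 2


def hp_obj(sentence):
    """Objective if the sentence has no strong clues."""
    return count_strong_clues(sentence) == 0


def seed_labels(sentences):
    """Split sentences into confidently labeled seeds and the rest."""
    labeled, unlabeled = [], []
    for s in sentences:
        if hp_subj(s):
            labeled.append((s, "subjective"))
        elif hp_obj(s):
            labeled.append((s, "objective"))
        else:
            unlabeled.append(s)  # left for later iterations
    return labeled, unlabeled


sentences = [
    "I love this wonderful phone.",
    "The phone weighs 150 grams.",
    "I hate the battery.",
]
labeled, unlabeled = seed_labels(sentences)
print(labeled)
print(unlabeled)
```

The sentence with exactly one clue stays unlabeled, which illustrates why such strict rules trade recall for precision.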

