EMBL
CAIPIRINI


Set A (Please, input a list of relevant/interesting examples) :

Examples: #1, #2, #3, #4.
The topic of interest is defined via:

Gene Ids (Entrez/Ensembl) [*]
PubMed Ids [*]
PubMed Query [#]

[*] One id per line (up to 25000).

[#] Up to 25000 retrieved results.

Note: The abstracts extracted from Set A will be used as the positive training set. Caipirini therefore strongly recommends selecting Set A carefully.
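
The [*] input rule above (one id per line, up to 25000) can be pictured with a small parser. This is only an illustrative sketch, not Caipirini's own code: the function name parse_id_list is hypothetical, and skipping blank lines and repeated ids is an assumption about reasonable behaviour.

```python
MAX_IDS = 25000  # limit stated in the form's [*] footnote


def parse_id_list(text, max_ids=MAX_IDS):
    """Return the unique, non-empty ids in `text` (one id per line), in input order."""
    seen = set()
    ids = []
    for line in text.splitlines():
        identifier = line.strip()
        if not identifier or identifier in seen:
            continue  # assumption: blank lines and repeated ids are ignored
        seen.add(identifier)
        ids.append(identifier)
    if len(ids) > max_ids:
        raise ValueError(f"{len(ids)} ids given; at most {max_ids} are accepted")
    return ids
```

For example, parse_id_list("7157\n\n672\n7157\n") yields the two ids 7157 and 672.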

Set B (Please, enter a background set) :

Please, specify the input type:

Gene Ids (Entrez/Ensembl) [*]
PubMed Ids [*]
PubMed Query [#]
Random

[*] One id per line (up to 25000).

[#] Up to 25000 retrieved results.

Hint: While Set B is typically a list of non-relevant or uninteresting examples, Caipirini suggests that the background (interchangeably called reference) set consist of more specific examples against which Set A is to be compared.

Note: When the option Random is selected, Set B will consist of abstracts randomly selected from PubMed (excluding those in Set A, and equal to Set A in number); these abstracts will be treated as not interesting and used as the negative training set. In this case, truly interesting abstracts may be sampled and treated as not relevant. Caipirini therefore strongly recommends selecting Set B more carefully.
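
The Random option can be sketched as drawing, from all abstracts outside Set A, as many ids as Set A contains. The snippet below is a minimal illustration under stated assumptions: sample_background is a hypothetical name, all_pmids stands in for the PubMed ids available to sample from, and the seed parameter is added only to make the example repeatable.

```python
import random


def sample_background(all_pmids, set_a, seed=None):
    """Pick as many PubMed ids as Set A holds, at random, from ids outside Set A."""
    positives = set(set_a)
    candidates = sorted(set(all_pmids) - positives)  # sorted for repeatable sampling
    if len(candidates) < len(positives):
        raise ValueError("not enough abstracts outside Set A to sample from")
    rng = random.Random(seed)
    return rng.sample(candidates, len(positives))
```

Nothing in such a draw checks whether a sampled abstract is actually relevant, which is why a hand-picked Set B is the safer choice.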

Note: While Set B is usually a background set of irrelevant examples, for some users Set B may refer to a specific context; in these cases, Caipirini can perform a two-class separation of Set C into abstracts more relevant to Set A and abstracts more relevant to Set B.

Set C (Please, define the abstracts to be classified) :

Please, specify the input type:

Remaining PubMed [+]
PubMed Ids [*]
PubMed Query [#]

[+] Only the remaining abstracts (i.e., all other abstracts that are not in Set A or B) that contain terms from the Training Sets A & B.

[*] One id per line (up to 25000).

[#] Up to 25000 retrieved results.
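
The Remaining PubMed option [+] can be read as a filter: drop everything already in Set A or B, then keep only abstracts that share at least one term with the training sets. The sketch below assumes a hypothetical in-memory mapping abstract_terms from PubMed id to the set of terms found in that abstract; how Caipirini actually stores or indexes terms is not described here.

```python
def remaining_candidates(abstract_terms, set_a, set_b):
    """Return ids outside Sets A and B whose abstracts share a term with them.

    abstract_terms: dict mapping PubMed id -> set of terms in that abstract.
    """
    training_ids = set(set_a) | set(set_b)
    training_terms = set()
    for pmid in training_ids:
        training_terms |= abstract_terms.get(pmid, set())
    return [
        pmid
        for pmid, terms in abstract_terms.items()
        if pmid not in training_ids and terms & training_terms
    ]
```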

Hint: When the expected results are known, Set C can serve as a test set. In any case, the abstracts of Set C are ranked by their similarity to Sets A and B, using the scores of a trained linear SVM.

Note: When Set B is a background set of irrelevant examples, Caipirini will rank Set C so that the top-ranked abstracts are those most related to Set A.

Note: When Set B consists of a second set of relevant/interesting examples (other than Set A), Caipirini will separate Set C into two classes: highly ranked abstracts will be related to Set A, whereas abstracts with smaller scores will be related to Set B.
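
How a linear model turns into a ranking can be shown with a toy example. The sketch assumes a linear SVM has already been trained and reduced to one weight per term; the weights, ids and function names below are made up for illustration, and the convention that positive weights point towards Set A is an assumption.

```python
def decision_value(terms, weights, bias=0.0):
    """Linear decision value w.x + b for a binary bag-of-terms vector."""
    return bias + sum(weights.get(term, 0.0) for term in set(terms))


def rank_abstracts(set_c, weights, bias=0.0):
    """Return (pmid, score) pairs, highest score (most Set A-like) first."""
    scored = [
        (pmid, decision_value(terms, weights, bias))
        for pmid, terms in set_c.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up weights: positive terms pull towards Set A, negative ones towards Set B.
weights = {"apoptosis": 1.5, "caspase": 1.0, "photosynthesis": -2.0}
set_c = {
    "pmid_1": {"apoptosis", "caspase"},
    "pmid_2": {"photosynthesis"},
    "pmid_3": {"caspase", "photosynthesis"},
}
ranking = rank_abstracts(set_c, weights)
```

Here pmid_1 ranks first (score 2.5), followed by pmid_3 (-1.0) and pmid_2 (-2.0); with a second relevant Set B, the low end of the list is as informative as the top.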

Term Types (Please, select semantic categories to be used for training) :
Organisms
Gene/Proteins
Small Molecules
Diseases
Symptoms
Note: By default, all terms mentioned in the abstracts are used. Deselect term types to exclude them from training.
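
The effect of deselecting term types can be pictured as a filter over tagged terms. This is a hedged sketch only: filter_terms and the (term, type) pair representation are hypothetical, chosen so that the default (nothing deselected) keeps every term.

```python
def filter_terms(tagged_terms, deselected_types=()):
    """Drop terms whose semantic type was deselected; keep everything by default.

    tagged_terms: list of (term, term_type) pairs.
    """
    excluded = set(deselected_types)
    return [term for term, term_type in tagged_terms if term_type not in excluded]
```

For instance, passing ["Symptoms"] as deselected_types would leave organism and gene/protein terms in place and remove only the symptom terms.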

Data Description (Please, enter a name or title for the task; optional) :