BLAST Search Help
What is BLAST?
The Basic Local Alignment Search Tool (BLAST) performs
rapid sequence comparisons by finding high-scoring segment
pairs (HSPs), locally aligned segments whose scores measure
local similarity. For more information visit
NCBI-BLAST.
BLAST Search
Format
The BLAST 'Search' input area accepts several
different types of input and automatically determines
the format. For this to work, unique identifiers
(e.g., UniProtKB accession numbers) must follow
certain input conventions.
Accepted input types are:
-
Unique Sequence Identifiers
BLAST allows searches by UniProt database unique
identifiers, including UniProtKB, UniRef, and UniParc IDs
(e.g., CYC_HUMAN, P00001, UniRef100_P99999,
UniRef90_P99999, UniRef50_P99999, UPI0000128BBF).
For more examples, please see the
complete table of searchable UniProtKB
database unique identifiers. Spaces within the
identifier are not allowed, although spaces before
or after it are allowed.
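The sketch below illustrates how an input box might recognize a single identifier. It is an illustration under stated assumptions, not the service's actual parser: the accession pattern follows the format UniProt documents for UniProtKB accession numbers, while the UniRef, UniParc, and entry-name (e.g., CYC_HUMAN) patterns and the function name `classify_identifier` are simplified and hypothetical.

```python
import re
from typing import Optional

# UniProtKB accession format as documented by UniProt.
ACCESSION = r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"

# Simplified, assumed patterns for the other identifier types (checked in order).
PATTERNS = [
    ("UniRef", re.compile(rf"UniRef(?:100|90|50)_(?:{ACCESSION}|UPI[0-9A-F]{{10}})")),
    ("UniParc", re.compile(r"UPI[0-9A-F]{10}")),
    ("UniProtKB accession", re.compile(ACCESSION)),
    ("UniProtKB ID", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def classify_identifier(text: str) -> Optional[str]:
    """Return the identifier type, or None if the input is not a single identifier."""
    token = text.strip()  # spaces before or after the identifier are allowed
    if not token or any(ch.isspace() for ch in token):
        return None  # spaces inside the identifier are not allowed
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return None


print(classify_identifier("  P00001 "))       # UniProtKB accession
print(classify_identifier("UniRef50_P99999"))  # UniRef
print(classify_identifier("P0 0001"))          # None
```

Anything that is not recognized as a single identifier would then fall through to the sequence formats described next.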
-
Raw amino acid sequence format:
MSEPQRLFFAIDLPAEIREQIIHWRATHFPPEAGRPVAADNLHLTLAFLGEVS
AEKEKALSLLAGRIRQPGFTLTLDDAGQWLRSRVVWLGMRQPPRGLIQLAN
MLRSQAARSGCFQSNRPFHPHITLLRDASEAVTIPPPGFNWSYAVTEFTLYA
SSFARGRTRYTPLKRWALTQ
The sequence can be interspersed with
numbers and/or spaces, such as
1 msepqrlffa idlpaeireq iihwrathfp peagrpvaad nlhltlaflg evsaekekal
61 sllagrirqp gftltlddag qwlrsrvvwl gmrqpprgli qlanmlrsqa arsgcfqsnr
121 pfhphitllr daseavtipp pgfnwsyavt eftlyassfa rgrtrytplk rwaltq
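A minimal sketch of how numbered, spaced input like the example above could be reduced to a plain sequence, assuming the rules stated in this help page (digits and spaces are ignored, lower case is mapped to upper case). The function name `normalize_raw_sequence` is hypothetical.

```python
import re


def normalize_raw_sequence(text: str) -> str:
    """Remove digits and whitespace, then map lower-case letters to upper case."""
    return re.sub(r"[\d\s]+", "", text).upper()


# Two truncated lines taken from the numbered example.
numbered = """1 msepqrlffa idlpaeireq
61 sllagrirqp gftltlddag"""
print(normalize_raw_sequence(numbered))
# MSEPQRLFFAIDLPAEIREQSLLAGRIRQPGFTLTLDDAG
```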
-
FASTA format:
A sequence in FASTA format begins with a
single-line description, followed by lines of
sequence data. The description line is
distinguished from the sequence data by a
greater-than (">") symbol in the first
column. One benefit of using FASTA format is that
the sequence identifier will be reported with the
results. An example sequence in FASTA format
is:
>gi|3287971|sp|P37025|LIGT_ECOLI 2'-5' RNA ligase
MSEPQRLFFAIDLPAEIREQIIHWRATHFPPEAGRPVAADNLHLTLAFLGEVSAEK
EKALSLLAGRIRQPGFTLTLDDAGQWLRSRVVWLGMRQPPRGLIQLANMLRSQA
ARSGCFQSNRPFHPHITLLRDASEAVTIPPPGFNWSYAVTEFTLYASSFARGRTRY
TPLKRWALTQ
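The description-line convention can be illustrated with a small parser. This is a sketch, not UniProt's implementation; it assumes the identifier reported with the results is the first whitespace-delimited word of the description line, and the name `parse_fasta` is hypothetical.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    description = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # a description line starts a new record
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>gi|3287971|sp|P37025|LIGT_ECOLI 2'-5' RNA ligase
MSEPQRLFFAIDLPAEIREQ
IIHWRATHFP"""
for description, sequence in parse_fasta(example):
    print(description.split()[0], len(sequence))  # gi|3287971|sp|P37025|LIGT_ECOLI 30
```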
Sequences are expected to be represented in
the standard IUB/IUPAC amino acid codes,
with these exceptions: lower-case letters are
accepted and are mapped to upper case, and U and
* are acceptable letters (see below).
Numerical digits in the query sequence are
automatically removed. The supported amino acid
codes are:
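| Code | Meaning |
| A | alanine |
| B | aspartate or asparagine |
| C | cysteine |
| D | aspartate |
| E | glutamate |
| F | phenylalanine |
| G | glycine |
| H | histidine |
| I | isoleucine |
| K | lysine |
| L | leucine |
| M | methionine |
| N | asparagine |
| P | proline |
| Q | glutamine |
| R | arginine |
| S | serine |
| T | threonine |
| U | selenocysteine |
| V | valine |
| W | tryptophan |
| Y | tyrosine |
| Z | glutamate or glutamine |
| X | any |
| * | translation stop |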
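As a rough illustration, a query could be checked against the supported codes listed above once digits and whitespace have been removed. The helper name `invalid_codes` is hypothetical, and the check is case-insensitive because lower-case letters are mapped to upper case.

```python
# Supported one-letter codes (J and O are not in the list), plus '*' for translation stop.
VALID_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWXYZ*")


def invalid_codes(sequence: str) -> set:
    """Characters not among the supported codes; lower case is treated as upper case."""
    return set(sequence.upper()) - VALID_CODES


print(invalid_codes("msepqU*"))        # set()
print(sorted(invalid_codes("MSEJO")))  # ['J', 'O']
```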
Databases Available for BLAST and Text Search
There are three protein databases that can be
searched using the BLAST program as well as the Text Search Form. These databases
are:
UniProt Knowledgebase (UniProtKB)
UniProtKB is the central
access point for extensive curated protein information,
including function, classification, and cross-references.
Search UniProtKB to retrieve "everything that is known"
about a particular sequence. You can query UniProtKB using
Text Search or BLAST search.
UniProt Reference Clusters (UniRef) Databases
The UniRef databases provide clustered sets of sequences from the
UniProt Knowledgebase and selected UniParc records.
If you are interested in searching UniRef clusters that do not contain
UniParc sequences, choose the "UniRef-no UniParc" option from the
"Select a database to search" drop-down menu.
UniProt Archive (UniParc)
UniParc contains available protein sequences
collected from many different sources. The sequence data are archived
to facilitate examination of changes to sequence data. Search UniParc
if you want to examine the "history" of a particular sequence. You can
query UniParc using Text Search or BLAST search.
BLAST Options
-
Composition-based Statistics
BLAST permits calculated E-values to take
into account the amino acid composition of the
individual database sequences involved in
reported alignments. This improves E-value
accuracy, thereby reducing the number of false
positive results.
The improved statistics are achieved with a
scaling procedure that in effect employs a
slightly different scoring system for each
database sequence. As a result, raw BLAST
alignment scores in general will not
correspond precisely to those implied by any
standard substitution matrix. Furthermore,
identical alignments can receive different
scores, based upon the compositions of the
sequences they involve.
-
Filter
Low-complexity
Masks off segments of the query sequence that
have low compositional complexity, as
determined by the SEG program of Wootton &
Federhen (Computers and Chemistry, 1993). Filtering can
eliminate statistically significant but
biologically uninteresting reports from the
BLAST output (e.g., hits against common
acidic-, basic- or proline-rich regions),
leaving the more biologically interesting
regions of the query sequence available for
specific matching against database sequences.
Filtering is only applied to the query
sequence, not to
database sequences.
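The idea behind low-complexity filtering can be sketched with a sliding window and Shannon entropy. This is a deliberately simplified stand-in, not the SEG algorithm itself: SEG applies additional trigger, extension, and optimization steps, while the sketch simply replaces every residue in a low-entropy window with X. The window length of 12 and the 2.2-bit cutoff are assumed values chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, cutoff: float = 2.2) -> str:
    """Replace residues in any window whose entropy is below `cutoff` with 'X'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# A hypothetical query with an inserted proline-rich stretch.
query = "MSEPQRLFFAIDLPAE" + "P" * 12 + "WRATHFPPEAGR"
print(mask_low_complexity(query))
```

Because only the query is masked, the database sequences are left untouched, matching the behavior described above.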
Mask for Lookup Table Only
This option masks only for purposes of
constructing the lookup table used by BLAST.
BLAST searches consist of two phases, finding
hits based upon a lookup table and then
extending them. The option to "Mask for lookup
table only" masks only for the lookup table so
that no hits are found based upon
low-complexity sequence. The BLAST extensions
are performed without masking and so they can
be extended through low-complexity sequence.
This option is still experimental and may
change in the near future.
Mask Lower Case
With this option selected, you can paste a FASTA
sequence in upper-case characters and mark the
regions you would like filtered in lower case.
This allows you to customize what is filtered
from the sequence during the comparison to the
BLAST databases.
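A minimal sketch of what lower-case masking amounts to: record which positions were typed in lower case, then work with an all-upper-case sequence plus those positions. The function name `lowercase_mask` and the use of 0-based positions are illustrative assumptions.

```python
from typing import List, Tuple


def lowercase_mask(seq: str) -> Tuple[str, List[int]]:
    """Return the upper-cased sequence and the 0-based positions typed in lower case."""
    positions = [i for i, ch in enumerate(seq) if ch.islower()]
    return seq.upper(), positions


print(lowercase_mask("MSEpqrLFF"))  # ('MSEPQRLFF', [3, 4, 5])
```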
-
Expect
The Expect threshold sets the statistical
significance threshold for reporting database
sequence matches. The default value is 10: a match
is reported only if no more than 10 matches with an
equal or better score would be expected merely by
chance. Lower Expect thresholds are more stringent,
so fewer chance matches are reported. Raising the
Expect threshold shows less stringent matches and
is recommended when searching with short sequences:
a short query is more likely to occur by chance in
the database than a longer one, so even a perfect
(ungapped) match can have low statistical
significance and may not be reported. Increasing
the Expect threshold lets you look farther down
the hit list and see matches that would normally
be discarded because of low statistical significance.
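The effect of query length on significance can be seen from the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the alignment score. The sketch assumes lambda = 0.267 and K = 0.041 (values of the kind reported for BLOSUM62 with gap costs 11/1), a hypothetical database of 10^8 residues, and ignores the effective-length corrections real BLAST applies.

```python
import math

# Assumed Karlin-Altschul parameters (illustrative, BLOSUM62-like with gap costs 11/1).
LAMBDA = 0.267
K = 0.041


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float = LAMBDA, k: float = K) -> float:
    """Simplified E = K * m * n * exp(-lambda * S), without effective-length corrections."""
    return k * query_len * db_len * math.exp(-lam * score)


db_residues = 10**8  # hypothetical database size
print(round(expect_value(50, 10, db_residues)))     # 65
print(expect_value(200, 300, db_residues) < 1e-10)  # True
```

Under these assumptions, a 10-residue query whose perfect match scores 50 gets an E-value of about 65, above the default threshold of 10, so it would be hidden unless the threshold is raised; a 200-point hit for a longer query is far below the threshold.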
-
Word Size
The word size is the length of the initial segment
(word) that must be matched between a database
sequence and the query sequence to start an alignment.
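The seeding step can be sketched as a lookup table of query words. This is a simplification: protein BLAST also seeds on "neighborhood" words that score above a threshold against a query word, whereas the sketch below only finds exact word matches. The function name `word_hits` and the word size of 3 are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_hits(query: str, subject: str, word_size: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)  # lookup table built from the query
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


print(word_hits("MSEPQR", "AAMSEPAA"))  # [(0, 2), (1, 3)]
```

Each hit would then be extended in both directions, which is the second phase described under "Mask for Lookup Table Only".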
-
Matrix
A key element in evaluating the quality of a pairwise
sequence alignment is the "substitution matrix", which
assigns a score for aligning any possible pair of residues.
The matrix used in a BLAST search can be changed
depending on the type of sequences you are searching
with. The user may choose from a list of matrices that cover
various evolutionary constraints (more information can be
found in a description of BLAST scoring matrices).
For each matrix, a default matrix-dependent gap cost is
displayed. Gap costs are described below.
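To make the role of the matrix concrete, the sketch below scores an ungapped pair of segments by summing substitution scores. It includes only the handful of BLOSUM62 entries the example needs (the full matrix covers every residue pair), and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# A few BLOSUM62 entries, just enough for the example below.
BLOSUM62_SUBSET: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("K", "K"): 5,
    ("R", "R"): 5, ("W", "W"): 11, ("I", "L"): 2, ("K", "R"): 2,
}


def pair_score(x: str, y: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]  # raises KeyError if the pair is missing from the subset


def ungapped_score(seg1: str, seg2: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum of substitution scores over aligned positions."""
    if len(seg1) != len(seg2):
        raise ValueError("ungapped segments must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(seg1, seg2))


print(ungapped_score("AKIW", "ARLW", BLOSUM62_SUBSET))  # 4 + 2 + 2 + 11 = 19
```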
-
Matrix-dependent Gap Cost
The pull-down menu shows the Gap Costs
(the penalty to open a gap and the penalty to
extend it). Only a limited number of combinations
are offered for these parameters. Increasing the
Gap Costs results in alignments with fewer gaps.
The gap open penalty is the score deducted for
initiating a gap in a sequence. To make the match
more significant, the user can try making this
penalty larger. The gap extension penalty is added
to the gap open penalty for each residue in the gap,
effectively penalizing longer gaps; users who want
to avoid long gaps can increase it. Usually one
would expect a few long gaps rather than many short
gaps, so the gap extension penalty should be lower
than the gap open penalty. An exception is when one
or both sequences are single reads with possible
sequencing errors, in which case you would expect
many single-base gaps. This can be achieved by
setting the gap open penalty to zero (or very low)
and using the gap extension penalty to control gap
scoring.
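The gap-cost rule described above (open penalty once, plus the extension penalty for each residue in the gap) can be written as a small function. The 11/1 defaults are an assumed example setting, not necessarily what the pull-down menu offers.

```python
from typing import List


def gap_cost(length: int, open_penalty: int = 11, extend_penalty: int = 1) -> int:
    """Penalty for one gap: open once, then extend for every residue in the gap."""
    if length <= 0:
        return 0
    return open_penalty + extend_penalty * length


def total_gap_cost(gap_lengths: List[int], open_penalty: int = 11,
                   extend_penalty: int = 1) -> int:
    """Sum of penalties over all gaps in an alignment."""
    return sum(gap_cost(n, open_penalty, extend_penalty) for n in gap_lengths)


print(total_gap_cost([6]))        # one 6-residue gap: 11 + 6 = 17
print(total_gap_cost([2, 2, 2]))  # three 2-residue gaps: 3 * (11 + 2) = 39
print(total_gap_cost([1, 1, 1], open_penalty=0, extend_penalty=2))  # 6
```

With a high open penalty and a low extension penalty, one long gap is much cheaper than several short ones; setting the open penalty to zero removes that preference, as suggested for error-prone single reads.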
-
Adjust Gap Costs
Alignments between sequences are often optimized by
allowing gaps within one or both sequences. Like mismatches
between aligned residues, gaps have a "cost" associated with
them. There are separate penalties to open and to extend gaps.
Increasing the Gap Costs results in alignments with fewer and
shorter gaps. The Gap Open cost
(or Gap Existence cost) is the score deducted for the initiation
of a gap in a sequence. To make the match more significant, the
user can try making this gap penalty larger. The Gap Extend
cost is added to the Gap Open cost for each residue in the gap,
effectively penalizing longer gaps. The user can therefore select
against long gaps by increasing this penalty. Usually one would
expect a few long gaps rather than many short gaps, so the Gap
Extend cost should be lower than the Gap Open cost. The Gap
Costs can be adjusted relative to the default value using the
pull-down menu.
-
Number of Hits to Display
Limits the number of matching database
sequences (hits) that will be reported.
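One way to picture how the Expect threshold and the number of hits to display interact: discard hits above the threshold, sort the rest by score, and keep the top N. The `Hit` record, the function name, and the example accessions are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str
    score: float
    evalue: float


def hits_to_report(hits: List[Hit], expect: float, max_hits: int) -> List[Hit]:
    kept = [h for h in hits if h.evalue <= expect]  # apply the Expect threshold
    kept.sort(key=lambda h: h.score, reverse=True)   # best scores first
    return kept[:max_hits]                           # apply the display limit


hits = [Hit("P1", 50.0, 1e-5), Hit("P2", 80.0, 1e-12), Hit("P3", 20.0, 15.0)]
print([h.accession for h in hits_to_report(hits, expect=10.0, max_hits=5)])  # ['P2', 'P1']
```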
-
Alignment
Aligns your query sequence with each database
match, pair by pair. Matching residues are connected
by a "|" symbol, mismatched residues are separated
by a space, and gaps are shown with a "-" symbol.
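The display convention above can be sketched as a function that builds the middle row from two aligned rows. The function name is hypothetical, and the sketch assumes the rows already contain "-" at gap positions.

```python
def alignment_midline(query_row: str, subject_row: str) -> str:
    """'|' where residues are identical, a space for mismatches and gap columns."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query_row, subject_row)
    )


top, bottom = "MSEP-QR", "MSDPAQR"
print(top)
print(alignment_midline(top, bottom))
print(bottom)
```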
References
NCBI BLAST Help Pages.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z,
Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST:
a new generation of protein database search programs.
Nucleic Acids Res. 25(17):3389-3402.
Altschul SF, Boguski MS, Gish W, Wootton JC (1994)
Issues in searching molecular sequence databases.
Nature Genetics 6:119-129.
Pearson WR (1991) Searching protein sequence
libraries: comparison of the sensitivity and
selectivity of the Smith-Waterman and FASTA
algorithms. Genomics 11(3):635-650.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ
(1990) Basic local alignment search tool.
J. Mol. Biol. 215:403-410.
Additional questions
If you have additional questions,
please contact
UniProt help.
BLAST Result Help

BLAST Result Main Sections
1. Display Options - Organizes the type and order of information displayed
2. Save Options - Allows user to save chosen results in different formats
3. Tools - Performs BLAST or multiple alignment on chosen results
4. Result Contents - Default BLAST search result information
Sort Table By
Results can be sorted by the values in any column. The
default sort is by Score.
Limit display
The limit display function shows only those records from
the BLAST results that fit the indicated criteria.
Save Search Results As
Search results can be saved to the user's local computer.
The results will be saved for selected entries or, if no proteins
are selected, for all entries. Clicking "Table" will save the results
as a tab-delimited text file, which may be imported into a
spreadsheet for easier viewing or analysis. Clicking "FASTA"
will save the query and target IDs and sequences in FASTA format.
Clicking "Alignment" will save the BLAST and SSearch results
as a text file. The BLAST and SSearch sections are headed
by the titles "Sequence Alignments Generated by BLAST
Search" and "Sequence Alignments Generated by Similarity
Search" respectively.
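A tab-delimited export like the "Table" option can be sketched with Python's standard `csv` module. The column names and the example row are illustrative assumptions, not the exact columns the page writes.

```python
import csv
import io
from typing import Dict, List


def results_to_tsv(rows: List[Dict[str, object]], columns: List[str]) -> str:
    """Write a header row plus one tab-separated line per result."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header row
    for row in rows:
        writer.writerow([row.get(col, "") for col in columns])
    return buffer.getvalue()


print(results_to_tsv([{"ID": "P37025", "Length": 176}], ["ID", "Length"]), end="")
```

A spreadsheet program can open the resulting text directly because each field is separated by a tab character.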
Tools: BLAST and Multiple Alignment
Retrieved entries can be further analyzed using the
sequence analysis programs available in the Tools section
of the results page (section 3 in the figure above).
For BLAST, select one protein using the
checkboxes on the left side of the results table,
then click BLAST. A new BLAST query page will be
displayed, along with whatever parameters were
selected in the initial search.
For multiple alignment, check at least 2 proteins (but no
more than 50), then click the Multiple Alignment
button. A multiple alignment and neighbor-joining tree
will be generated with ClustalW.
Results Display
Results of the search are displayed in a table. The exact columns
displayed will depend on the database selected, but all will
display an ID column, a protein length column, two SSearch columns
and three BLAST Search columns. UniRef and UniProtKB
searches will also show the protein name and organism.
The description below refers to the columns displayed
for UniProtKB search results.
ID/Accession
The UniProtKB ID refers to the record identifier, while the
accession number refers to the sequence identifier. Each record
may contain multiple accessions (for example, combined sequences
in the UniRef databases). This column displays the primary
accession. See the FAQ
for more information regarding IDs vs. Accessions.
When you BLAST against UniRef, the results page will list
either UniRef-UniProtKB (UniRef100_P99999) sequences or
UniRef-UniParc (UniRef100_UPI000011E43A) sequences. UniRef-UniParc
sequences do not have a protein or organism name attached to them.
Protein Name
The common or trivial name given to a protein that
identifies its function or specifies its features.
Organism
The genus and species of the source organism from which
the sequence originated.
Length
Number of amino acids in the peptide or protein.
SSearch Columns
SSearch is a pairwise implementation of the Smith-Waterman
alignment algorithm. When two sequences are aligned, only the shared
region is shown. Within the shared region, amino acids
from one or both sequences can be aligned with either
amino acids or gaps from the other sequence. The total
length of the shared region, including gaps, is the Overlap.
The percent of identical residues in the alignment is given
under %iden. Clicking on that number displays the SSearch
full-length alignment.
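For intuition about Overlap and %iden, here is a compact Smith-Waterman sketch. It is a simplification: SSearch scores alignments with a substitution matrix and affine gap penalties, whereas this version uses a simple match/mismatch score and a linear gap penalty (the values 2, -1, -2 are assumptions). Computing %iden over the full overlap, gaps included, is also an assumption.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    h: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    row_a: List[str] = []
    row_b: List[str] = []
    while i > 0 and j > 0 and h[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + step:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


def overlap_and_identity(row_a: str, row_b: str) -> Tuple[int, float]:
    """Overlap = aligned columns including gaps; %iden = identical columns / overlap."""
    overlap = len(row_a)
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return overlap, (100.0 * identical / overlap) if overlap else 0.0


score, top, bottom = smith_waterman("MSEPQR", "MSEQR")
print(score, top, bottom)  # 8 MSEPQR MSE-QR
overlap, iden = overlap_and_identity(top, bottom)
print(overlap, round(iden, 1))  # 6 83.3
```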
BLAST Sequence Similarity Columns
There are three columns related to BLAST results. The
column in the extreme right shows the BLAST results in
graphical format. The top bar represents the query
sequence. The bars below show the region(s) on other
sequences matched to the query sequence. The bar color
indicates the magnitude of the BLAST score. Click on
one of these bars to see its BLAST alignment paired
with the query sequence.
Text Search Help
Select a UniProt Database
Depending on the route by which the Text Search page
was accessed, you may need to select a
database to search.
The default database is UniProtKB.
Selecting a Field
Searchable fields can be selected from a dropdown menu.
The items will vary according to database selected, since
each database contains different types of information.
The most comprehensive is the UniProt Knowledgebase (UniProtKB),
and its dropdown selections have been customized to
facilitate searches. The following paragraph describes the
UniProtKB dropdown items.
Other than the "Any Field" and "All Unique IDs" choices,
items are grouped by Main Category (the category names are shown
in red on the dropdown menu). Each Main Category has multiple
subcategories (shown in black on the dropdown menu). Certain
subcategories themselves represent multiple specific fields of
information (see the table below). For example, selecting
"Class UIDs" will attempt to match the query against
multiple types of domain, motif, or classification unique
identifiers.
Examples of UniParc and
UniRef searchable fields and IDs are also found in the
corresponding searchable-field tables below.
Query Input
Enter a unique identifier or other search string in the
box provided. Certain items (such as "Length" or anything
with "ID") will be exact match searches. Other items will be
substring searches (as if preceded and followed by wild cards).
Entering "not null" will cause the search to return only
those entries that have some data in the selected field, while
entering "null" will return only those that lack
data in the selected field.
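The matching rules above can be summarized in a small function. This is an illustrative sketch, not the site's query engine: the name `field_matches` is hypothetical, and treating substring searches as case-insensitive is an assumption the help text does not state.

```python
from typing import Optional


def field_matches(value: Optional[str], query: str, exact: bool) -> bool:
    """Illustrative matching rules for a single field value."""
    q = query.strip()
    if q.lower() == "not null":
        return bool(value)  # any data present in the field
    if q.lower() == "null":
        return not value  # no data in the field
    if value is None:
        return False
    if exact:
        return value == q  # e.g. "Length" or any field with "ID"
    return q.lower() in value.lower()  # substring, as if wrapped in wildcards


print(field_matches("2'-5' RNA ligase", "RNA lig", exact=False))  # True
print(field_matches("P37025", "P3702", exact=True))               # False
```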
Add Input Box Button
If desired, multiple fields can be searched simultaneously.
Pressing the "Add input box" button adds another query line,
up to a maximum of 10 lines. The added input boxes are
connected by logical operator choices (see below), with the
default being the AND operator.
Using the Logical Operators AND, OR, NOT
Text Search supports the logical 'AND', 'OR', and
'NOT' operators. For example, to retrieve
results that include Pfam domains A or B, type A in
your first query field and add a query line by clicking
the "Add input box" button. Enter your
second query (B) and select the OR operator.
Similarly, to retrieve multi-domain proteins that have
both Pfam domains A and B, use the 'AND'
operator. Proteins that have domain A and not domain B
can be retrieved using the 'NOT' operator.
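The three operators behave like set operations on the entries each input box matches. The sketch below assumes that several input boxes are combined strictly left to right, which the help text does not specify; the accession sets are hypothetical.

```python
from typing import List, Set


def combine(left: Set[str], right: Set[str], operator: str) -> Set[str]:
    if operator == "AND":
        return left & right  # entries matching both queries
    if operator == "OR":
        return left | right  # entries matching either query
    if operator == "NOT":
        return left - right  # entries matching the first but not the second
    raise ValueError(f"unknown operator: {operator}")


def evaluate(result_sets: List[Set[str]], operators: List[str]) -> Set[str]:
    """Apply operators left to right (an assumed evaluation order)."""
    if len(operators) != len(result_sets) - 1:
        raise ValueError("need exactly one operator between consecutive input boxes")
    result = result_sets[0]
    for op, nxt in zip(operators, result_sets[1:]):
        result = combine(result, nxt, op)
    return result


has_domain_a = {"P1", "P2", "P3"}  # hypothetical entries with Pfam domain A
has_domain_b = {"P2", "P4"}        # hypothetical entries with Pfam domain B
print(sorted(evaluate([has_domain_a, has_domain_b], ["NOT"])))  # ['P1', 'P3']
```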
Number of Results
The default number of results shown on any one page
is 50, to provide the fastest results. However, this
number can be adjusted by clicking on the
Display Options button from within
the results page.
Search Categorizations and Unique Identifiers
The following tables indicate the searchable
fields for UniProtKB,
UniParc, UniRef
and Batch Retrieval search functions and
corresponding sample unique identifiers.
Searchable UniProtKB Fields and Identifiers
The following table indicates the searchable fields and
the corresponding sample unique identifiers for
the UniProtKB database, for performing Text and BLAST
searches.
Searchable UniProtKB Fields and Identifiers
The following table indicates the searchable fields and
the corresponding sample unique identifiers for
the UniProtKB database, for performing Batch Retrieval
searches.
Searchable UniParc Fields and Identifiers
The following table indicates the searchable fields and
the corresponding sample unique identifiers for
the UniParc database, for performing Text and Batch Retrieval
searches.
| Category | Example |
| UniParc ID | UPI0000037924 |
| Length | 990<(aa)<=1000 |
| Checksum | 63F249C54CE90737 |
| EMBL ID | AAA56753 |
| ENSEMBL ID | ENSP00000259755 |
| ENSEMBLCBRIGGSAE ID | ENSCBRP00000013291 |
| ENSEMBLCELEGANS ID | F48E3.3 |
| ENSEMBLFLY ID | CG8713-PA |
| ENSEMBLFUGU ID | SINFRUP00000164896 |
| ENSEMBLHUMAN ID | ENSP00000259755 |
| ENSEMBLMOSQUITO ID | ENSANGP00000022463 |
| ENSEMBLMOUSE ID | ENSMUSP00000057690 |
| ENSEMBLRAT ID | ENSRNOP00000000393 |
| ENSEMBLZEBRAFISH ID | ENSDARP00000014354 |
| EMBLWGS ID | EAA29811 |
| EPO ID | AX090287 |
| FLYBASE ID | CG8713-PA |
| IPI ID | IPI00218934 |
| JPO ID | BD533907 |
| PDB ID | 1FRY |
| PIR ID | T09415 |
| PIRARC ID | A55135 |
| REFSEQP ID | NP_005851 |
| REMTREMBL ID | AAB35171 |
| SPTREMBL ID | Q40315 |
| SPTREMBL Splice variant ID | Q9UBR5-1 |
| SWISSPROT ID | P01032 |
| SWISSPROT Splice variant ID | Q9Y2G2-1 |
| TREMBLNEW ID | AAQ89276 |
| USPO ID | AAN99133 |
| WORMBASE ID | R10E11.2 |
Searchable UniRef Fields and Identifiers
The following table indicates the searchable fields and
the corresponding sample unique identifiers for
the UniRef database, for performing Text and Batch Retrieval
searches.
| Category | Example |
| Common name | human |
| Checksum | 289B4B554A61870E |
| Length | 990<(aa)<=1000 |
| Lineage | Homo |
| NCBI Taxon ID | 9606 |
| Organism name | Homo sapiens |
| Protein name | polymerase |
| PubMed ID | 12798038 |
| Taxon Group | Euk/mammal |
| Taxon Group ID | 40674 |
| UniParc ID | UPI0000037924 |
| UniProtKB Accession | P00001 |
| UniProtKB ID | CYC_HUMAN |
| UniRef ID | UniRef100_Q9H244 |
Text Search Result Help

Text Search Result Main Sections
1. Refine Search - Allows user to include more search parameters
2. Display Options - Organizes the type of information displayed
3. Save Options - Allows user to save chosen results in different formats
4. Tools - Performs BLAST or multiple alignment on chosen results
5. Result Contents - Default Text Search result information
Refining the search
The Refine Search area (section 1 in the figure above)
allows modification of your search or a new search.
See the Text Search Help for more
detailed information.
Changing Display
You can modify and improve the output of the
search result by choosing which columns to
display. Additional information can be shown by
highlighting the relevant field(s) in the "Columns not
in display" list and moving them to the "Columns in display"
list with the ">>" button. Conversely,
columns can be removed from display. Click "Apply"
for the changes to take effect.
Save Search Results As
Search results can be saved to the user's local computer.
The results will be saved for selected entries or, if no proteins
are selected, for all entries. Clicking "Table" will save the displayed
columns as a tab-delimited text file, which may be imported into a
spreadsheet for easier viewing or analysis. Clicking "FASTA"
will save the IDs and sequences in FASTA format, while clicking "XML"
will save all the available information for each entry
in an XML format text file.
Tools: BLAST and Multiple Alignment
Retrieved entries can be further analyzed using the
sequence analysis programs available in the Tools section
of the results page (section 4 in the figure above).
For BLAST, select one protein using the
checkboxes on the left side of the results table,
then click BLAST. A new BLAST query page will be
displayed, along with whatever parameters were
selected in the initial search.
For multiple alignment, check at least 2 proteins (but no
more than 50), then click the Multiple Alignment
button. A multiple alignment and neighbor-joining tree
will be generated with ClustalW.
Results Display
Results of the search are displayed in a customizable table.
The exact columns displayed will depend on the database selected,
the fields searched for, and user preference. By default, UniProtKB
searches will display the ID/Accession, the Protein Name and its
Length, the Organism Name and its Taxonomic Group, and the Matched Fields.
UniRef searches will display the same as UniProtKB, except that
the ID/Accession column is replaced by the UniRef and
UniProtKB ID columns (the latter being the primary UniProt
entry for the UniRef entry). For UniParc (Archive) searches, the columns are
Archive ID, SPTREMBL ID, Length, sequence Checksum, and Matched Fields.
The following paragraphs describe the default columns displayed
for UniProtKB searches.
ID/Accession
The UniProtKB ID refers to the record identifier, while the
accession number refers to the sequence identifier. Each record
may contain multiple accessions. This column displays the primary
accession. See the FAQ
for more information regarding IDs vs. Accessions.
Protein Name
The common or trivial name given to a protein that
identifies its function or specifies its features.
Length
Number of amino acid residues in the peptide or protein.
Organism
The genus and species of the source organism from which
the sequence was extracted.
Taxon Group
The taxonomic group to which the organism belongs.
For example, any Arabidopsis sequence will display
the Euk/Plant taxonomic group.
Matched Fields
The field(s) which the query matched.
Batch Retrieval Search Help
Batch retrieval allows you to retrieve
multiple entries from UniProt databases.
Multiple entry IDs should be separated
by line breaks or spaces. The maximum number of entries you can retrieve depends on several factors, including the type of entry, connection speed, and server load. To avoid excessive server load, we recommend retrieving no more than 2000 entries at a time.
You may use different types of IDs by choosing "Any Unique ID" in the ID field, but if your entries have the same type of ID, then define the ID field to speed up the retrieval process.
Follow the link for a list of searchable fields and
the corresponding unique identifiers.
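A client preparing a large list could split it along the lines recommended above. This sketch is an assumption-based helper, not part of the service: it splits on spaces and line breaks, drops repeated IDs (an extra assumption), and chunks the list into batches of at most 2000.

```python
from typing import List


def batch_ids(text: str, batch_size: int = 2000) -> List[List[str]]:
    """Split on spaces/line breaks, drop repeated IDs, and chunk into batches."""
    ids = list(dict.fromkeys(text.split()))  # dict keeps first-seen order
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


print(batch_ids("P00001 P37025\nP00001\nCYC_HUMAN", batch_size=2))
# [['P00001', 'P37025'], ['CYC_HUMAN']]
```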
Bibliography Submission Help
General submission information
The protein bibliography of the UniProtKB database
aims to provide comprehensive and categorized
information for each protein entry. The
bibliography information is highly cross-
referenced with other protein and literature
databases and will greatly facilitate information
retrieval on a structural and functional basis as
well as by other conventional searches. We invite
and encourage submission of any new or missing
information in all categories, published or
unpublished, from the scientific community on a
continuing basis. To submit bibliographical
information to the UniProtKB database, please use a
UniProtKB unique identifier (ID, accession number)
or search for UniProtKB IDs or accession numbers
using sequence or text searches. For new protein
sequences, please first submit the protein sequence
to UniProtKB and acquire a protein entry ID or
accession number before submitting bibliography
information.
For protein bibliography submission, please provide
author names, journal or book name, the year/volume/page
information, and the title or description of the
citation. Please also include your contact information
for submission verification and further correspondence.
For submission of detailed protein information in
different categories, please follow the instructions below.
Sample Submission
How to describe and categorize protein information
For any protein entry listed in the UniProtKB database, please
provide additional protein information in any of the following
categories (at least one category of information is required)
and specify whether the information is derived directly from
experiments (Exp.) or is predicted (Pred.) from modeling,
inference, similarity, or any other analyses. The description
in the Description/Method field should be as concise as possible,
and is limited to 400 words or 4000 characters. For experimentally
determined protein information, please indicate whether it was obtained
from in vitro experiments or from experiments conducted on intact
organisms or in cultured cells, and also specify those organisms
or cell types.
- Protein name
Specify the name of the protein as a brief, free-text
description derived from structural features, function, protein
family, or a combination of these. Indicate whether the naming was derived
from experimental evidence or from prediction through modeling
or comparison. For enzymes, please refer to the Nomenclature
Committee of the International Union of Biochemistry and Molecular
Biology for the recommended name and EC number. If no number has
been assigned, please give the partial number (e.g., 1.1.-.-) that
most accurately reflects what is known (or inferred) about the
activity of the enzyme. For names of other specialized proteins,
please follow standard names and those recommended by nomenclature
committees.
- Organism
Specify the organism in which the protein naturally occurs, i.e., the
organism in which the protein is genetically expressed. The organism
is also referred to as the biological source of the protein; in this
usage, it is the natural source of the protein rather than the
experimental source that is described.
- Genetic information
Specify information related to the gene expression of the protein
represented in the entry, which may include gene name, map position,
genome localization, gene origin, genetic code, start codon, and
introns.
- Tissue/cellular localization
Specify the tissues or cell types in which the native proteins are
present or expressed as determined by mRNA detection (e.g., in situ
hybridization, Northern blot), or protein detection (e.g., protein
isolation and purification, immunocytochemistry). Please also
specify what subcellular compartment the protein is localized
in or isolated from, and whether the protein is secreted from
or taken up into the cells.
- Structure
Provide information on 3D structures regarding conformation
(e.g., helix, loop, folding), physical contacts between
side chains and interactions with other molecules such
as substrates and ions as determined by X-ray crystallography.
- Features
It is highly recommended that information on the following
features be provided along with the specified residues
and their ranges.
- Product: any relatively stable (i.e., can be
isolated) peptide chain, including chains cleaved from
a precursor form and remaining bound together in the
same molecule.
- Domain: this depicts structural or evolutionary
domains. A "domain" carries the connotation of having some
degree of spatial coherence, i.e., secondary or tertiary
structure or is evolutionarily mobile.
- Region: this remains generally unstandardized
at this time to allow the annotation of new features
that are not yet well understood or standardized. A
"Region" should carry only the connotation of being
contiguous sequence, as opposed to the spatial connotation
of a "Domain".
- Bonds: including "cleavage sites" in activation
processing, "cross-link" when two or more residues form
a covalent bond through side chains, and the "disulfide bond".
- Sites: including active sites, binding sites,
modified sites, inhibitory sites.
- Molecular complex/interactions
Specify whether the protein forms complexes with other
molecules such as proteins, DNA/RNA, carbohydrates, and lipids.
Also specify what interactions the protein has with other
molecules, stable or transient, as its functional
requirements, such as activated by or activating other
signal transducers in a signal transduction pathway.
- Function
Describe any functionality pertaining to the protein. This
may include biological, biochemical, or pharmacological
functions at whole body, cellular, or molecular levels.
- Regulation
Specify how the protein is regulated or modulated by biological
and chemical agents, including hormones, drugs, environmental
factors, and any other agents. The regulation may occur at the gene,
mRNA, or protein level.
- Degradation
Specify the cellular degradation pathway of the final protein
product, the degradation enzyme(s), and the protein turnover
rate.
- Disease-related information
Specify any disease conditions associated with genetic
mutations or with transcriptional, translational and/or
posttranslational alterations in primary sequence or
tertiary structure, or in content and expression pattern.
- Sequence
Please provide information on either partial or complete
sequences of protein, mRNA/cDNA, or gene/chromosome regarding
sequence corrections, reinterpretation, confirmation,
and variant, partial or fragmentary sequences.
- Other
Please add any interesting information that does not clearly
fall into the above categories.
Bibliography Retrieval help
References in UniProt literature information are mapped to
individual UniProtKB protein entries. You may retrieve bibliographic
information by entering the protein's UniProtKB ID or accession number.
This help page was last modified on
August 22, 2003. If you have any
comments or questions please contact
UniProt help.