The chemoinformatic methods used in building QSAR models can be divided into three groups:
(i) Extracting descriptors from molecular structure,
(ii) Choosing those that are informative in the context of the analyzed activity, and
(iii) Using the values of the descriptors as independent variables to define a mapping that correlates them with the activity in question.

2.1. Generation of Molecular Descriptors from Structure
Small-molecule compounds are defined by their structure, encoded as a set of atoms and the covalent bonds between them.
First, the information that determines biological activity is not stated explicitly; it has to be extracted from the structure. Various rationally designed molecular descriptors accentuate different chemical properties that are implicit in the structure of the molecule. It is these properties that may correlate more directly with the activity. Such properties range from physicochemical and quantum-chemical to geometrical and topological features.
The second reason for the use and development of molecular descriptors stems from the paradigm of feature space that prevails in statistical data analysis. Most methods employed to predict activity require as input numerical feature vectors of uniform length for all molecules. Chemical structures are diverse in size and nature and as such do not fit this model directly. To circumvent this problem, molecular descriptors convert the structure into well-defined sets of numerical values.
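As a minimal sketch of how structures of different sizes can be mapped to vectors of uniform length, the example below counts atoms of a few element types. The element vocabulary `ELEMENTS`, the function name, and the toy atom lists are assumptions made for illustration; real descriptor packages compute far richer features.

```python
from collections import Counter

# Hypothetical fixed vocabulary of heavy-atom element types.
ELEMENTS = ("C", "N", "O", "S")

def element_count_vector(atoms, elements=ELEMENTS):
    """Map a list of heavy-atom symbols to a fixed-length numeric vector.

    The vector holds one count per element in `elements`, followed by
    the total number of heavy atoms, so every molecule yields the same
    number of features regardless of its size.
    """
    counts = Counter(atoms)
    vector = [counts.get(element, 0) for element in elements]
    vector.append(len(atoms))
    return vector

# Toy hydrogen-suppressed atom lists (illustrative only).
ethanol = ["C", "C", "O"]
pyridine = ["C", "C", "C", "C", "C", "N"]

print(element_count_vector(ethanol))   # [2, 0, 1, 0, 3]
print(element_count_vector(pyridine))  # [5, 1, 0, 0, 6]
```

Both molecules produce five features even though they differ in size, which is the property that statistical methods require of their input.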

2.2. Selection of Relevant Molecular Descriptors
Many applications are capable of generating hundreds or thousands of different molecular descriptors. Such a large number of descriptors has negative effects on several aspects of QSAR analysis, including the interpretability of the final model. To handle these problems, a wide range of methods for automatically narrowing the set of descriptors to the most informative ones is used in QSAR analysis.
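One simple way to narrow a descriptor set is a filter: discard descriptors that barely vary across the molecules, then keep those most correlated with the activity. The sketch below illustrates that idea only; the function names, the thresholds, and the toy data are assumptions, and this filter is just one of the many selection methods in use.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def select_descriptors(columns, activity, k=2, min_variance=1e-8):
    """Return the names of up to k descriptors ranked by |correlation|.

    columns: dict mapping descriptor name -> list of values,
             one value per molecule.
    Descriptors whose variance is at most min_variance are dropped first.
    """
    scored = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant descriptor carries no information
        scored.append((abs(pearson(values, activity)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical descriptor table for four molecules.
columns = {
    "mol_weight": [100, 120, 140, 160],
    "n_rings": [1, 1, 1, 1],
    "logp": [1.0, 2.1, 2.9, 4.2],
    "noise": [3, 1, 4, 1],
}
activity = [0.9, 2.0, 3.1, 4.0]
print(select_descriptors(columns, activity, k=2))
```

In this toy table the constant `n_rings` column is removed by the variance filter and the weakly correlated `noise` column is ranked last.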

2.3. Mapping the Descriptors to Activity
After the calculation and selection of relevant molecular descriptors, the final task remains: creating a function that maps their values to the analyzed activity. Candidate functions may be linear or non-linear, and many methods exist for carrying out the training to obtain the best function.
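For the linear case, the mapping can be illustrated with ordinary least squares. The sketch below solves the normal equations with a small Gauss-Jordan elimination so that it needs no external libraries; the function names and the toy data are assumptions, and in practice a numerical library would normally be used instead.

```python
def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, w1, w2, ...].

    X: list of descriptor vectors (one per molecule).
    y: list of measured activities.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept term
    m = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]

def predict(coeffs, x):
    """Predicted activity for one descriptor vector."""
    return coeffs[0] + sum(w * v for w, v in zip(coeffs[1:], x))

# Toy data generated from activity = 1 + 2*d1 - 0.5*d2 (illustrative only).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y = [3.0, 0.5, 2.5, 4.5, 0.0]
coeffs = fit_linear(X, y)
print([round(c, 6) for c in coeffs])
```

Non-linear mappings follow the same pattern of fitting parameters to training data, but with a different functional form and training procedure.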


3.1. MOLECULAR DESCRIPTORS: Properties of molecules are characterized by numerical values called molecular descriptors. Molecular descriptors encode structural features of molecules as numbers. They vary in the complexity of the encoded information and in computation time. Descriptors that are based only on the two-dimensional representation of a chemical structure are truly structural descriptors.

Molecular descriptors are of diverse types. They are categorized into fragment descriptors, which involve properties of sections of molecules, and whole molecule descriptors, which are based on the properties of the intact molecule.

3.1.1. FRAGMENT DESCRIPTORS: The very earliest descriptors used in QSAR were of this type. QSAR was performed using substituent constants such as hydrophobic constants π, molar refractivity MR, Hammett constants σ and several other, less well-known constants. The recent explosion in the number of molecular descriptors is due in part to the ease with which they can be generated by computational methods, such as molecular orbital calculations [20-22]. There has also been a focus on developing fragment descriptors that are computationally very efficient. The reason is that rapid searching for leads in large chemical libraries (databases of real chemical compounds) or virtual libraries (databases of chemically reasonable molecules that have not yet been synthesized) requires efficient, information-rich descriptors. Surprisingly simple descriptors can yield useful models. For example, molecules may be represented simply by counting the numbers of atoms of a specific elemental type with a specific number of connections (a measure of atomic hybridization). A current trend is to employ fragment descriptors based on important molecular properties such as hydrophobic groups (e.g. aromatic rings), hydrogen bond donors (e.g. amines), hydrogen bond acceptors (e.g. carbonyls), positive charges (e.g. NH4+) and negative charges (e.g. PO3-). The rationale for this was first described by Andrews and coworkers [23]. Other fingerprint and general fragment-based methods such as molecular holograms [24, 25] generalize this approach of breaking molecules into fragments. Another important class of fragment-based descriptors, the van der Waals surface area (VSA) descriptors, has been reported by Labute to have attributes that make them widely applicable QSAR descriptors [26].
VSA descriptors are derived by adding together the van der Waals surface area contributions of atoms exhibiting a given property (chosen from steric, electrostatic and lipophilic properties) within a given binned property range. Linear combinations of VSA descriptors correlate well with most commonly used descriptors. Fragment-based descriptors have the advantages of being computationally efficient and independent of molecular conformation or 3D structure.
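The binning step described above can be sketched as follows. This is an illustration of the summation only: the per-atom surface areas, the property values, and the bin edges are hypothetical inputs, and the sketch does not reproduce how any particular package computes atomic surface areas or assigns properties.

```python
import bisect

def vsa_descriptors(atom_areas, atom_properties, bin_edges):
    """Sum per-atom surface-area contributions into property bins.

    atom_areas:      surface-area contribution of each atom.
    atom_properties: property value of each atom (e.g. a partial charge).
    bin_edges:       sorted interior edges; n edges define n + 1 bins.
    Returns one descriptor value per bin.
    """
    bins = [0.0] * (len(bin_edges) + 1)
    for area, prop in zip(atom_areas, atom_properties):
        index = bisect.bisect_right(bin_edges, prop)
        bins[index] += area
    return bins

# Hypothetical per-atom data for a four-atom fragment.
areas = [10.0, 8.5, 12.0, 6.0]
properties = [-0.3, 0.1, 0.45, 0.1]
edges = [0.0, 0.25]

print(vsa_descriptors(areas, properties, edges))  # [10.0, 14.5, 12.0]
```

Because each descriptor is a sum over atoms, the result does not depend on the conformation of the molecule once the per-atom contributions are fixed.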

3.1.2. WHOLE MOLECULE DESCRIPTORS: They typically capture information on molecular size and lipophilicity through properties such as the molecular weight or molecular volume and the logarithm of the octanol-water partition coefficient (log P). The relationship between log P and some biological responses was often found to be inverse parabolic, with a maximum in the biological response at some optimum log P value. The explanation for this relationship was that it described the partitioning of drug molecules into biological membranes. An important class of whole molecule descriptors is the topological descriptors [27-31]. These involve treating molecules as topological objects in which atoms become the vertices, and bonds the edges, of a molecular graph.

Figure: Conversion of a molecular structure into a molecular graph. Three-dimensional structure (left), two-dimensional, hydrogen-suppressed structure (centre) and hydrogen-suppressed molecular graph (right).

Topological indices are 2D descriptors based on graph theory concepts. These indices have been widely used in QSAR studies. They help to differentiate molecules mostly according to their size, degree of branching, flexibility, and overall shape. The most widely known descriptors are those that were originally proposed by Randic [32] and extensively developed by Kier and Hall [33]. The strength of this approach is that the required information is embedded in the hydrogen-suppressed framework, and thus no experimental measurements are needed to define molecular connectivity indices. For each bond k, joining atoms i and j, the term Ck is calculated. The summation of these terms over all bonds then gives χ, the molecular connectivity index for the molecule.

$C_k = (\delta_i \delta_j)^{-0.5}$, where $\delta = \sigma - h$

Here δ is the count of formally bonded carbons, σ is the number of sigma electrons of the atom, and h is the number of bonds to hydrogen atoms. To correct for differences in valence, Kier and Hall proposed a valence delta (δv) term to calculate valence connectivity indices [34].
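The bond-term summation above can be sketched in a few lines. The sketch assumes a hydrogen-suppressed graph given as a list of bonds between atom indices, so that δ for each atom is simply its number of non-hydrogen neighbours; the function name and input format are choices made for this illustration, and the valence (δv) variant is not covered.

```python
def connectivity_index(bonds):
    """First-order molecular connectivity index of a hydrogen-suppressed graph.

    bonds: list of (i, j) pairs of atom indices, one pair per bond.
    Each bond contributes (delta_i * delta_j) ** -0.5, where delta is
    the number of non-hydrogen neighbours of the atom.
    """
    delta = {}
    for i, j in bonds:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    return sum((delta[i] * delta[j]) ** -0.5 for i, j in bonds)

# Carbon skeletons of two C4 isomers.
n_butane = [(0, 1), (1, 2), (2, 3)]    # linear chain
isobutane = [(0, 1), (0, 2), (0, 3)]   # one branched centre

print(round(connectivity_index(n_butane), 4))   # 1.9142
print(round(connectivity_index(isobutane), 4))  # 1.7321
```

The two isomers have the same number of atoms and bonds, yet the branched skeleton gives a lower index, which shows how the index encodes the degree of branching.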

Molecular connectivity indices have been shown to be closely related to many physicochemical parameters such as boiling points, melting points, dipole moment, solubility, molar refraction, polarizability, and partition coefficients [35, 36].

Recently, eigenvalues of molecular matrices derived from graphs have shown promise as descriptors useful for QSAR [37-39] and for molecular diversity purposes (e.g., characterization of chemical libraries and databases, and design of optimally diverse combinatorial libraries). Modified adjacency matrices describe how atoms in a molecule are connected. They provide a means of combining molecular properties with topological information that encodes the way a molecule is connected.

The figure below shows an example of a modified adjacency matrix. Diagonalisation of these matrices provides eigenvalue descriptors. A modification of this eigenvalue approach has been particularly useful in the description of molecular diversity (dissimilarity between molecules).

Figure: Conversion of a molecule into an adjacency matrix. Off-diagonal elements are 1 if the two atoms are bonded, 0 if not.
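A minimal sketch of the matrix construction and its diagonalisation is given below. Two points are assumptions of this sketch rather than details taken from the text: the diagonal is filled with one user-supplied atomic property value per atom, and the eigenvalues are obtained with the classical cyclic Jacobi rotation method, chosen here only because it can be written without external libraries.

```python
import math

def modified_adjacency(n_atoms, bonds, diagonal=None):
    """Symmetric matrix with 1 for bonded atom pairs and 0 otherwise.

    diagonal: optional list with one atomic property value per atom
    (an assumption of this sketch); defaults to zeros.
    """
    m = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        m[i][j] = m[j][i] = 1.0
    if diagonal is not None:
        for i, value in enumerate(diagonal):
            m[i][i] = float(value)
    return m

def eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal part is numerically zero
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

# Three-atom chain with hypothetical atomic property values on the diagonal.
matrix = modified_adjacency(3, [(0, 1), (1, 2)], diagonal=[0.1, 0.1, 0.4])
print(eigenvalues(matrix))
```

The sorted eigenvalues, or selected ones such as the lowest and highest, can then serve as descriptors of the molecule.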

