Scientific paper

The DiZyme web resource is described in detail in the paper "DiZyme: Open-Access Expandable Resource for Quantitative Prediction of Nanozyme Catalytic Activity", published in the journal Small (Wiley-VCH).
A user-friendly expandable database of >300 existing inorganic nanozymes is developed by data collection from >100 articles. Data analysis is performed to reveal the features responsible for catalytic activities of nanozymes, and new descriptors are proposed for its ML-assisted prediction. A random forest regression model for evaluation of nanozyme peroxidase activity is developed and optimized by correlation-based feature selection and hyperparameter tuning, achieving performance up to R2 = 0.796 for Kcat and R2 = 0.627 for Km. Experiment-confirmed unknown nanozyme activity prediction is also demonstrated. Moreover, the DiZyme expandable, open-access resource containing the database, predictive algorithm, and visualization tool is developed to boost novel nanozyme discovery worldwide.

Nanozyme activity

Nanozymes are nanomaterials with enzyme-like characteristics. The most common nanozymes show peroxidase and oxidase activities. Other, more complex activities (hydrolase, catalase, phosphatase, laccase, and superoxide dismutase) have started to appear but are much less represented in the literature. Several nanomaterials show multi-enzymatic activity, usually because of a shared reaction mechanism: peroxidase, oxidase, and catalase all belong to the oxidoreductases, the class of enzymes that catalyze oxidation and reduction reactions. Owing to their long storage time and high stability under various conditions, nanozymes have been extensively exploited in cancer theranostics, environmental protection, cytoprotection, biosensing, and other applications. Of particular interest is the ability to regulate the catalytic activity of a nanomaterial by changing its composition, shape, size, crystal structure, and surface chemistry.
The catalytic activity of nanomaterials follows Michaelis-Menten kinetics and is conventionally evaluated by the following kinetic parameters: the Michaelis-Menten constant (Km, mM), the substrate concentration required to reach half the maximum reaction rate; the catalytic rate constant (Kcat); and the maximum reaction velocity (Vmax, mM/s), the reaction rate when the enzyme is fully saturated with substrate, meaning that all binding sites are constantly reoccupied.
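The relation between these parameters can be illustrated with a minimal sketch of the standard Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). The function names and the numeric values are illustrative assumptions. The relation Kcat = Vmax / [catalyst] is the usual enzyme-kinetics definition and is not taken from the DiZyme text itself.

```python
def michaelis_menten_rate(s, vmax, km):
    """Reaction rate (mM/s) at substrate concentration s (mM)."""
    if s < 0 or km <= 0:
        raise ValueError("s must be >= 0 and km must be > 0")
    return vmax * s / (km + s)


def kcat_from_vmax(vmax, catalyst_conc):
    """Catalytic rate constant (1/s), assuming Kcat = Vmax / [catalyst].

    vmax and catalyst_conc must use the same concentration unit.
    """
    if catalyst_conc <= 0:
        raise ValueError("catalyst_conc must be > 0")
    return vmax / catalyst_conc


# Illustrative values only (not taken from the DiZyme database).
vmax, km = 2.0e-4, 0.5
rate_at_km = michaelis_menten_rate(s=km, vmax=vmax, km=km)
print(rate_at_km, vmax / 2)  # at [S] = Km the rate is half of Vmax
```

At a substrate concentration equal to Km the computed rate is half of Vmax, which matches the definition of Km given above.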


We calculated and selected the essential parameters for predicting the catalytic activity of nanozymes. Composition is described by mean electronegativity, mean redox potential, and mean charge density; material properties by crystal system; shape by dimensionality; size by volume; synthesis conditions by the presence or absence of a neutral polymer and a surfactant; and analysis conditions by pH, temperature, substrate type, and the concentrations of H2O2, substrate, and catalyst.
The categorical parameter crystal system was encoded as follows: 1 – triclinic, 2 – monoclinic, 3 – orthorhombic, 4 – tetragonal, 5 – trigonal, 6 – hexagonal, 7 – cubic.
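The encoding above can be written as a simple lookup table. This is a minimal sketch; the names CRYSTAL_SYSTEM_CODES and encode_crystal_system are hypothetical and do not come from the DiZyme code base.

```python
# Integer codes for the crystal system, as listed in the text.
CRYSTAL_SYSTEM_CODES = {
    "triclinic": 1,
    "monoclinic": 2,
    "orthorhombic": 3,
    "tetragonal": 4,
    "trigonal": 5,
    "hexagonal": 6,
    "cubic": 7,
}


def encode_crystal_system(name):
    """Return the integer code for a crystal system name (case-insensitive)."""
    key = name.strip().lower()
    if key not in CRYSTAL_SYSTEM_CODES:
        raise ValueError(f"unknown crystal system: {name!r}")
    return CRYSTAL_SYSTEM_CODES[key]


print(encode_crystal_system("Cubic"))
```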
Nanozymes were assigned to one of three dimensionality categories: 1D, particles with one dimension more than 6 times larger than the others; 2D, particles with one dimension more than 6 times smaller than the others; and 3D, all remaining particles.
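The dimensionality rule can be sketched as a small function. The function name and the order of the checks are assumptions: the text does not say which category wins when a particle satisfies both the 1D and the 2D condition, so this sketch tests the 1D condition first.

```python
def dimensionality(length, width, depth, ratio=6.0):
    """Return 1, 2, or 3 for a particle with the given three sizes.

    1: one size is more than `ratio` times larger than both others.
    2: one size is more than `ratio` times smaller than both others.
    3: everything else.
    """
    small, mid, large = sorted((length, width, depth))
    if small <= 0:
        raise ValueError("all sizes must be positive")
    if large > ratio * mid:   # rod-like or wire-like particle
        return 1
    if mid > ratio * small:   # sheet-like or plate-like particle
        return 2
    return 3


print(dimensionality(100, 10, 10))  # elongated particle
print(dimensionality(50, 50, 5))    # flat particle
print(dimensionality(10, 12, 15))   # roughly isotropic particle
```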

In the descriptor formulas, AxByOz is the chemical formula of the nanozyme; n is the number of non-oxygen elements; X is the Pauling electronegativity, eV; mCD is the mean charge density; R is the low-spin ionic radius, pm; OS is the oxidation state; RedOx is the redox potential, V; x, y, and z are stoichiometric coefficients; the subscripts A and B indicate that a characteristic belongs to the non-oxygen element A or B, respectively. In this way, the parameters describing the elemental composition, material physicochemical properties, shape, size, and the synthesis and analysis conditions were calculated and selected to describe the predicted processes comprehensively.

Resource functions

Web resource structure:
Open database of existing inorganic nanozymes with original article links and sorting and filtering functions;
Interactive data visualization tool, with parameter selection and graph scale functions;
Data upload, for adding new nanozymes to the platform;
Predictive algorithm for different levels of user request:
The basic level gives a first estimate of the catalytic activity of a material: the user enters only the material formula, and the output is a table with ranges of Km and Kcat for different nanozyme sizes under a standardized peroxidase-like activity assay.
The progressive level assumes complete knowledge of the physical characteristics of the nanomaterial; the result shows how the target constants vary with each of the analysis conditions.
The advanced level is intended for researchers who use their own methodology to analyze the peroxidase activity of the target nanomaterial, so the user needs to enter all 13 parameters (formula, crystal system, length, width, depth, neutral polymer, surfactant, substrate type, pH, temperature, substrates and catalyst concentrations) to predict Km and Vmax.
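As an illustration, the 13 inputs of the advanced level could be grouped in a single record. This is only a sketch: the class name, field names, types, and example values are hypothetical, and "substrates and catalyst concentrations" is read here as three values (substrate, H2O2, and catalyst), which is what brings the total to 13.

```python
from dataclasses import dataclass, fields


@dataclass
class AdvancedRequest:
    """Hypothetical container for the 13 advanced-level inputs."""
    formula: str
    crystal_system: str
    length: float
    width: float
    depth: float
    neutral_polymer: bool   # present (True) or absent (False)
    surfactant: bool        # present (True) or absent (False)
    substrate_type: str
    ph: float
    temperature: float
    substrate_conc: float
    h2o2_conc: float
    catalyst_conc: float


# Illustrative values only; units follow whatever the platform's form expects.
request = AdvancedRequest(
    formula="Fe3O4", crystal_system="cubic",
    length=10.0, width=10.0, depth=10.0,
    neutral_polymer=False, surfactant=False,
    substrate_type="TMB", ph=4.0, temperature=25.0,
    substrate_conc=0.5, h2o2_conc=10.0, catalyst_conc=0.01,
)
print(len(fields(request)))  # number of input parameters
```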

Prediction algorithm

Predictions of peroxidase activity, expressed as Km and Kcat, are made by a random forest regression algorithm. The random forest-based ML regression model for quantitative evaluation of nanozyme peroxidase activity achieves performance up to R2 = 0.80 for Kcat and R2 = 0.63 for Km.
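For readers unfamiliar with the metric, R2 is the coefficient of determination. The sketch below uses its standard definition, R2 = 1 - SS_res / SS_tot; whether the reported values were computed on raw or transformed constants is not stated here, and the function name is an assumption.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


print(r2_score([1, 2, 3], [1, 2, 3]))  # perfect prediction
print(r2_score([1, 2, 3], [2, 2, 2]))  # no better than predicting the mean
```

A value of 1 means perfect prediction, and 0 means the model does no better than always predicting the mean of the observed values.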
Random forest is a supervised learning algorithm that uses an ensemble learning method for classification and regression. The trees in a random forest are built independently, with no interaction between them, so they can be run in parallel. A random forest is a meta-estimator: it combines the results of multiple predictions.
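The ensemble idea can be shown with a toy example. The sketch below is not the DiZyme model and not a full random forest: it bags one-split regression trees (stumps) on one-dimensional data and averages their predictions. A real random forest also grows deeper trees and samples a random subset of features at each split. All names and data are illustrative.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data; return a predictor."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # both sides stay non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x values identical: predict the mean
        constant = mean(ys)
        return lambda x: constant
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x < t else r_mean


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Train each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Combine the trees by averaging their individual predictions."""
    return mean(tree(x) for tree in trees)


xs = list(range(10))
ys = [1.0] * 5 + [3.0] * 5  # a step from 1 to 3 at x = 5
forest = fit_forest(xs, ys)
print(predict(forest, 0), predict(forest, 9))
```

Each stump sees a different bootstrap sample and is fitted without reference to the others, which is why the trees can be built in parallel.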
To calculate the input parameters for the ML model, the platform contains algorithms that extract the elements from the formula and calculate the descriptors, together with a database of element constants.

The table shows the elements present in the table of constants.

Prediction limitations

Error 500 can have the following causes:

1. A non-stoichiometric nanozyme formula was entered
2. Element symbols were entered in lowercase
3. The database of constants does not contain the entered elements or the corresponding oxidation state of an entered element
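A formula can be checked against these causes before it is submitted. The sketch below is a hypothetical client-side check, not the platform's own validation: the function name is invented, SUPPORTED_ELEMENTS is a made-up subset standing in for the real table of constants, parentheses in formulas are not handled, and the oxidation-state condition is not checked because it depends on the platform's database.

```python
import re

# Hypothetical subset; the real list is the platform's table of constants.
SUPPORTED_ELEMENTS = {"Fe", "Co", "Mn", "Ce", "Cu", "O"}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")        # one element plus its count
WHOLE = re.compile(r"(?:[A-Z][a-z]?\d*)+")       # a full simple formula


def check_formula(formula, supported=SUPPORTED_ELEMENTS):
    """Return a list of problems; an empty list means the checks passed."""
    # Cause 1: fractional coefficients such as Fe2.5O4.
    if re.search(r"\d[.,]\d", formula):
        return ["non-integer coefficient (non-stoichiometric formula)"]
    if WHOLE.fullmatch(formula) is None:
        # Cause 2: symbols typed in lowercase, such as fe3o4.
        if formula[:1].islower() or re.search(r"\d[a-z]", formula):
            return ["element symbols must start with a capital letter"]
        return ["unrecognized formula syntax"]
    # Cause 3 (elements only): symbols missing from the table of constants.
    symbols = [symbol for symbol, _ in TOKEN.findall(formula)]
    missing = sorted(set(symbols) - set(supported))
    if missing:
        return ["no constants for: " + ", ".join(missing)]
    return []


for example in ("Fe3O4", "fe3o4", "Fe2.5O4", "Xe2O3"):
    print(example, check_formula(example))
```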

Limitations of prediction efficiency

It is also essential to explore the current limitations of the developed model, in order to provide insight into its interpolative and extrapolative power, to define the reliability of single predictions, and to highlight what data is needed to improve model performance further. Because of data sparsity and the prevalent non-Gaussian distribution of the parameters, the algorithm predicts the catalytic activity constants with high accuracy in the regions of maximum probability density of the selected features and constants. Many samples outside the predictive intervals have under-represented values, such as low concentrations of substrate, H2O2, and catalyst, or poorly represented types of crystal system, dimensionality, and substrate.