Prediction of the physical properties of pure chemical compounds through different computational methods.
Liquid thermal conductivities, viscosities, thermal decomposition temperatures, electrical conductivities, normal boiling point temperatures, sublimation and vaporization enthalpies, saturated liquid speeds of sound, standard molar chemical exergies, refractive indices, and freezing point temperatures of pure organic compounds and ionic liquids are important thermophysical properties needed for the design and optimization of products and chemical processes. Since purifying compounds sufficiently and measuring their thermophysical properties experimentally are both costly and time-consuming, predictive models are of great importance in engineering. The liquid thermal conductivity of pure organic compounds was the first property investigated in this study, for which a general model, a quantitative structure property relationship, and a group contribution method were developed. The gene expression programming mathematical strategy [1, 2], first introduced by our group for the development of non-linear models for thermophysical properties, was successfully implemented to develop an explicit model for the thermal conductivity of approximately 1600 liquids at different temperatures and atmospheric pressure. The resulting correlation shows an average absolute relative deviation of about 9% from the corresponding DIPPR 801 data. It should be mentioned that gene expression programming is a complicated mathematical algorithm that requires significant computing power, and that this is the largest database of a thermophysical property to have been successfully handled by this strategy. The quantitative structure property relationship was developed using the sequential search algorithm and the same database used in the previous step.
The model shows an average absolute relative deviation (AARD), standard deviation error, and root mean square error of 7.4%, 0.01, and 0.01, respectively, over the training, validation, and test sets. The same database was then used to develop a group contribution model for liquid thermal conductivity. Statistical analysis of this model shows an AARD of approximately 7.1% from the corresponding DIPPR 801 data. In the next stage, an extensive database of the viscosities of 443 ionic liquids was compiled from the literature (more than 200 articles) and employed to develop a group contribution model. Using this model, a training set composed of 1336 experimental data points was correlated with a low AARD of about 6.3%. A test set consisting of 336 data points was used to validate the model, which shows an AARD of 6.8% for that set. In the next part of this study, an extensive database of the thermal decomposition temperatures of 586 ionic liquids was compiled from the literature and used to develop a quantitative structure property relationship. The proposed relationship produces an acceptable AARD of less than 5.2% over all 586 experimental data values. An updated database of thermal decomposition temperatures, covering 613 ionic liquids, was subsequently used to develop a group contribution model. Using this model, a training set comprising 489 data points was correlated with a low AARD of 4.5%. A test set consisting of 124 data points was employed to test its predictive capability; the model shows an AARD of 4.3% for the test set. The electrical conductivity of ionic liquids was the next property investigated in this study. Initially, a database of the electrical conductivities of 54 ionic liquids was collected from the literature.
Then, it was used to develop two models: a quantitative structure property relationship and a group contribution model. Since the electrical conductivity of ionic liquids has a complicated dependence on temperature and chemical structure, the least squares support vector machine strategy was used as a non-linear regression tool to correlate it. The deviation of the quantitative structure property relationship from the 783 experimental data points used in its development (training set) is 1.8%. The validity of the model was then evaluated using another set of 97 experimental data points (deviation: 2.5%). Finally, the reproducibility and reliability of the model were successfully assessed using a last set of 97 experimental data points (deviation: 2.7%). Using the group contribution model, a training set composed of 863 experimental data points was correlated with a low AARD of about 3.1%. The model was then validated using a set of 107 experimental data points, with a low AARD of 3.6%. Finally, a test set consisting of 107 data points was used to test the model, which shows an AARD of 4.9% for that set. In the next stage, the most comprehensive database of normal boiling point temperatures, covering approximately 18000 pure organic compounds, was prepared and used to develop a quantitative structure property relationship. To develop the model, the sequential search algorithm was first used to select the best subset of molecular descriptors. A three-layer feed-forward artificial neural network was then used as a regression tool to develop the final model. This appears to be the first time that the quantitative structure property relationship technique has been successfully applied to a database as large as the one used here for the normal boiling point temperatures of pure organic compounds.
Generally, handling large databases of compounds has always been a challenge in the quantitative structure property relationship field because of the large number of chemical structures involved (in particular, the optimization of those structures), the high demand for computational power, and the very high failure rate of the software packages. As a result, this study is regarded as a significant step forward in the field. A comprehensive database of the sublimation enthalpies of 1269 pure organic compounds at 298.15 K was compiled from the literature and used to develop an accurate group contribution model. The model is capable of predicting the sublimation enthalpies of organic compounds at 298.15 K with an acceptable AARD of 6.4% between predicted and experimental values. Vaporization enthalpies of organic compounds at 298.15 K were also investigated. An extensive database of 2530 pure organic compounds was used to develop a comprehensive group contribution model, which shows an acceptable AARD of 3.7% from the experimental data. The speed of sound in the saturated liquid phase was the next property investigated in this study. Initially, a collection of 1667 experimental data points for 74 pure chemical compounds was extracted from the ThermoData Engine of the National Institute of Standards and Technology. Then, a least squares support vector machine group contribution model was developed. The model shows a low AARD of 0.5% from the corresponding experimental data. In the next part of this study, a simple group contribution model was presented for the prediction of the standard molar chemical exergy of pure organic compounds. It is capable of predicting the standard molar chemical exergy of pure organic compounds with an acceptable AARD of 1.6% from the literature data for 133 organic compounds.
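The group contribution models summarized here estimate a property from the functional groups that make up a molecule. As a minimal sketch only, assuming a purely linear form (an intercept plus the sum of group count times group contribution) fitted by ordinary least squares, the idea can be illustrated in Python. The function names, the three groups, and all numbers below are hypothetical toy values, not data or results from this study, and the models in this work may use different functional forms.

```python
import numpy as np


def fit_group_contributions(group_counts, property_values):
    """Fit an intercept and one contribution per group by least squares.

    group_counts: rows are compounds, columns are counts of each group.
    property_values: one property value per compound.
    """
    counts = np.asarray(group_counts, dtype=float)
    values = np.asarray(property_values, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(counts)), counts])
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return coeffs[0], coeffs[1:]


def predict_property(intercept, contributions, group_counts):
    """Predict the property as intercept + sum(count * contribution)."""
    counts = np.asarray(group_counts, dtype=float)
    return intercept + counts @ contributions


# Hypothetical toy data: columns are counts of (CH3, CH2, OH) groups.
toy_counts = [
    [2, 0, 0],
    [2, 1, 0],
    [2, 2, 0],
    [1, 1, 1],
    [1, 2, 1],
    [2, 1, 1],
    [1, 3, 0],
]
# Invented property values in arbitrary units (exactly linear by design).
toy_values = [20.0, 23.0, 26.0, 38.0, 41.0, 43.0, 24.0]

intercept, contributions = fit_group_contributions(toy_counts, toy_values)
estimate = predict_property(intercept, contributions, [1, 1, 0])
print(intercept, contributions, estimate)
```

Because the toy values were generated from an exactly linear rule, the fit recovers that rule; real property data would leave residuals, which is where a deviation measure such as the AARD comes in.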
The largest databank ever reported for refractive indices, covering approximately 12000 pure organic compounds, was initially prepared. A novel computational scheme that couples the sequential search strategy with the genetic function approximation (GFA) strategy was used to develop a model for the refractive indices of pure organic compounds. The scheme was found to combine the ability to handle large databases (the advantage of the sequential search algorithm over other variable subset selection methods) with the ability to choose the most accurate subset of variables (the advantage of genetic algorithm-based variable subset selection methods such as GFA). The model shows a promising average absolute relative deviation of 0.9% from the corresponding literature values. Subsequently, a group contribution model was developed based on the same database. This model shows an average absolute relative deviation of 0.83% from the corresponding literature values. The freezing point temperature of organic compounds was the last property investigated. Initially, the largest databank ever reported in the open literature for freezing points, covering more than 16500 pure organic compounds, was prepared. Then, the sequential search algorithm was successfully applied to derive a model. The model shows an average absolute relative deviation of 12.6% from the corresponding literature values. The same database was used to develop a group contribution model. This model shows an average absolute relative deviation of 10.76%, which is adequate for many practical applications.
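The accuracy figures quoted throughout this abstract are average absolute relative deviations, together with a root mean square error in one case. The abstract does not spell out the formulas, so the following is a minimal sketch assuming the usual definitions: the AARD in percent is 100/N times the sum of |predicted - experimental| / |experimental|, and the root mean square error is the square root of the mean squared difference. The function names and sample numbers are illustrative only.

```python
import math


def aard_percent(predicted, experimental):
    """Average absolute relative deviation, in percent."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(predicted)


def rmse(predicted, experimental):
    """Root mean square error, in the units of the property."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))


# Illustrative values only: two predictions, each off by 10 from 100.
sample_predicted = [110.0, 90.0]
sample_experimental = [100.0, 100.0]
print(aard_percent(sample_predicted, sample_experimental))
print(rmse(sample_predicted, sample_experimental))
```

Note that the AARD is undefined when an experimental value is zero, which is one reason it suits strictly positive properties such as absolute temperatures and thermal conductivities.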