Missing body measurements prediction in fashion industry: a comparative approach

Using artificial intelligence to predict body dimensions, rather than having them measured by stylists or 3D scanners, makes it easy to obtain all the measurements of individual consumers and can consequently reduce the cost of population survey campaigns. In this paper, we compare several machine learning models for predicting about 30 measurements used in the fashion industry to construct clothes from 6 easy-to-measure body dimensions and demographic information. The four types of models studied are linear regressions, random forests, gradient boosting trees and support vector regressions. To construct and train them we used anthropometric measurements of 9000 adult individuals of the French population, collected by the French Institute of Textiles and Clothing (IFTH) during a national measurement campaign conducted between 2003 and 2005. We analyzed the prediction performance of the models in terms of individual and global predictions, as well as the effect of the training dataset size and the importance of the input features. The linear and support vector regressions gave the best results with respect to evaluation metrics and predicted distributions, and required less training data than the tree-based models. Weight and height were the most important input features for the models considered, while hip girth was the least important of the input measurements. Since the set of body dimensions used in the fashion industry and the morphology both depend on gender, we treated men and women separately and compared them.


Introduction
In the design of clothing, many measurements of the human body are required to obtain garments adapted to human morphologies. Some of these dimensions, such as the height or the waist girth, are said to be easy-to-measure since one can obtain them oneself with a tape measure by finding the associated key body points. However, other dimensions are difficult to obtain without the help of a professional stylist. Examples include the back width and length, which are required for the manufacture of men's shirts. Moreover, even professional measurements are subject to errors due to the variability of measurements between stylists.
3D scanning devices have been used by the textile industry to minimize the errors of manual measurements and to reduce their acquisition time. In general, 3D scanning models the human body in the form of a mesh made up of points and triangles. Software such as Anthroscan from Human Solutions or 3D Measurement Software from ProtoTech Solutions can extract up to 200 measurements of the human body from a mesh. This extraction is usually performed by using computational geometry together with landmarks placed on the subject's body before the acquisition (Lu and Wang 2008) or, more recently, with deep learning-based methods (Kaashki et al. 2021). For a comprehensive survey of 3D scanning technologies and the associated processing stages, see Bartol et al. (2021).
Smartphone scanning applications have also been developed to extract measurements. The principle is to position the device in front of the person so that the phone camera collects pictures of the individual for the extraction. The technologies involved generally use computer vision, plane geometry (Hung et al. 2004) and artificial intelligence (Ashmawi et al. 2019; de Souza et al. 2020), and allow users to obtain a limited number of their body measurements.
The ever-increasing development of artificial intelligence in recent years has allowed its application to many fields of the fashion and apparel industry. These methods include machine learning, decision support systems, expert systems, optimization, and image recognition and vision, and have given interesting results in apparel manufacturing, apparel design, distribution and fabric production. Exhaustive reviews of the impact of these techniques on this specific range of applications can be found in Guo et al. (2011) and Giri et al. (2019).
The aim of this paper is to apply artificial intelligence methods to facilitate the extraction of measurements and to accurately predict full-body anthropometric measurements from a subset of easy-to-measure dimensions. Our approach is of particular interest because it allows all the measurements of individual consumers to be obtained from measurements they take themselves, and consequently reduces the cost of population survey campaigns.
Extracting the body dimensions of individual customers is helpful now that internet shopping for clothes has become increasingly popular, especially since the pandemic crisis. In practice, users can measure some of their body dimensions and deduce all the others with the help of artificial intelligence. They can then buy from online clothing stores by filling in these measurements, in order to obtain clothes adapted as closely as possible to their morphology.
On the other hand, since there is substantial variation in body measurements between people, the usual method for providing accurate sizing charts of a population for garment pattern making is to survey a representative sample of the target population. These surveys generally include general information about the clothing habits and demographic characteristics of the individuals surveyed, together with the measurement of their body sizes by stylists or with 3D scanning devices. Implementing such campaigns can therefore require many professionals as well as advanced technology deployed across a territory or a country, and is consequently very expensive. Thus, predicting full-body anthropometric measurements can significantly reduce the complexity of the survey process and its costs. The idea is to collect only some of the easy-to-measure key body dimensions and to recover all the others with artificial intelligence. Under these conditions, the survey can be conducted online, with respondents measuring themselves with a tape or a scanning application, even in pandemic situations.
In this paper, we have used and compared different machine learning models to deduce all the body dimensions necessary for making all types of clothing (36 for women, 31 for men) from 6 easy-to-measure body dimensions, such as the height or the waist girth, and demographic information. The machine learning models we have used are linear regressions, random forests, gradient boosting trees and support vector regressions. We have computed and compared their accuracies in terms of individual and global predictions. Moreover, we have studied the influence of the training dataset size and the importance of the input features on the model performance. To train and test our methods, we have used a database of 9000 adult individuals surveyed by the French Institute of Textiles and Clothing (IFTH) during a national measurement campaign between 2003 and 2005, with the help of a 3D scanning device from Human Solutions coupled with Anthroscan and of professional stylists.
The paper is organized as follows. In the "General scheme" section we present the general scheme of our work. The dataset and its preprocessing are presented in the "Anthropometric data and preprocessing" section, and the models, their evaluation metrics and the associated hyperparameter tuning in the "Models, evaluation metrics and hyperparameters tuning" section. The "Results" section reports the comparison of the performance of the models, the influence of the training dataset size and the importance of the input features. We discuss and interpret these results in the "Discussion" section and we conclude in the "Conclusion" section.
The estimation of body measurements is intrinsically related to the study of the proportions of the human body and is not a problem confined to the textile industry. The beginnings of this old subject can be found in the work of Vitruvius, the Roman architect of the first century BC, in the first chapter, titled "On Symmetry: In Temples And In The Human Body", of book III of his treatise "De Architectura", and in its considerable influence on the architects and artists of the Renaissance.
The first modern statistical studies in this direction were made by Rollet (1892) and Pearson (1899), who estimated human height from long bone lengths. This question also arises in the health sector and has been particularly studied for elderly people, for whom measurements can become difficult to take because of the position required or the deformations of the skeleton. Chumlea and other researchers used several linear regressions to predict the height and/or the weight from the easy-to-measure knee height in Chumlea et al. (1985) and Chumlea and Guo (1992), using the National Health Examination Survey (NHES) database (Gordon and Miller 1964).
Afterward, the development of artificial intelligence encouraged many researchers to apply machine learning models to this problem. The analysis of the correlations between body dimensions shows that there are important linear dependencies between them. Thus, several studies in the literature have been conducted using linear machine learning models. For example, Indah et al. (2016) estimated 35 measurements on a small dataset of 45 elderly Javanese from age and body mass index with linear regression models. Miguel-Hurtado et al. (2016) predicted the sex, height, weight and foot size from 22 hand-related features with linear and logistic regression.
The ever-increasing interest in applying deep learning to industrial problems has led to several research studies applying artificial neural networks to apparel industry problems. Liu et al. (2017) showed that back propagation neural networks are accurate and stable in the prediction of 10 lower body dimensions from the height and the hip and waist girths. Wang et al. (2019) showed that this type of neural network can be outperformed by radial basis function artificial neural networks. They deduced 8 measurements from 4 easy-to-measure dimensions and studied the effect of the volume of the training dataset, initially composed of 180 samples. In Wang et al. (2021), generalized regression neural networks were used to predict 76 body measurements from 7 easy-to-measure dimensions with the Anthropometric Survey of US Army Personnel (ANSUR) II dataset (Gordon et al. 2014).
In order to take into account both linear and non-linear aspects of body dimensions, Liu et al. (2014) used a combination of multiple linear regression and radial basis function network models to predict 60 body dimensions from 8 feature parameters. Similarly, Chan et al. (2005) and AP et al. (2003) studied the relationship between men's tailor-made shirt pattern parameters and body parameters with multiple linear regression and artificial neural networks. Rativa et al. (2018) compared several machine learning regression models (support vector regression, Gaussian process and artificial neural network) to estimate the height and the weight from different groups of anthropometric measurements on the National Health and Nutrition Examination Survey (NHANES) III (NCHS 1994) and the ANSUR datasets (Gordon et al. 1989).

Methods
In this section we explain our approach to estimate body dimensions from a set of easy-to-measure measurements.

General scheme
In collaboration with several modelers, the institute IFTH has identified 36 (respectively 31) measurements for women (respectively men) that are needed for garment pattern making. These measurements are given in Table 1 together with their precision steps and associated units. The precision step of a measurement is defined as the acceptable error when the measurement is taken by a professional stylist. Among them, we have chosen six key easy-to-measure body dimensions as input features: the height, the weight, the chest/bust, waist and hip girths, and the inside leg length (see Fig. 1). Their particularity is that they can be measured without assistance, with a tape measure or with a scanning application installed on a smartphone. In addition to these dimensions, we have included two pieces of demographic information, the age and the socioprofessional category, as well as the shoe size of the individual. Then, four types of machine learning algorithms have been trained and tested on a dataset from a French national measurement campaign to predict the 30 (respectively 25) remaining measurements for women (respectively men).

Anthropometric data and preprocessing
In this paper we have used a dataset of individuals of the French population, collected by the institute IFTH during a national measurement campaign between 2003 and 2005 using 3D scanners. Each person is thus represented by standing and sitting meshes composed of around 400k vertices and 800k triangles. The software Anthroscan has been used to extract around 90 body measurements for each position. Hence, about 180 measurements have been collected per individual, together with some additional manual measurements taken by professional stylists and demographic information such as the gender, the education level and clothing size habits. The campaign was carried out in 37 different locations in France in order to represent the target population correctly. The collected sample contains 11 500 individuals aged from 5 to 70 years. In this work we have restricted the sample to the 9000 adults, composed of 5000 women and 4000 men. Missing values in measurement features have been filled in using a k-nearest neighbors algorithm with Gower's distance (Gower 1971). The continuous input variables have all been standardized. The socioprofessional category is a nominal categorical variable based on a statistical nomenclature created by the National Institute of Statistics and Economic Studies (Insee) to classify the professions of the French population. It defines seven groups (managers, students, employees, farmers/merchants, laborers, inactive persons and retirees) and has been encoded with a one-hot encoding that drops one category (farmers/merchants) to avoid multicollinearity issues.
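The imputation step described above can be illustrated with a minimal sketch. The paper does not state the number of neighbors, how the neighbors' values are aggregated, or which implementation was used; the choice k = 2, the averaging of the neighbors' values, the toy data and the function names `gower_distance` and `knn_impute_numeric` are all assumptions made for illustration only.

```python
import numpy as np

def gower_distance(a_num, b_num, a_cat, b_cat, ranges):
    """Gower distance between two records with numeric and categorical parts.

    Numeric features contribute |a - b| / range, categorical features
    contribute 0 if equal and 1 otherwise. Numeric features missing (NaN)
    in either record are skipped, and the sum is averaged over the
    features that were actually compared.
    """
    valid = ~(np.isnan(a_num) | np.isnan(b_num))
    num_part = np.abs(a_num[valid] - b_num[valid]) / ranges[valid]
    cat_part = (a_cat != b_cat).astype(float)
    n_compared = valid.sum() + len(cat_part)
    if n_compared == 0:
        return np.inf
    return (num_part.sum() + cat_part.sum()) / n_compared

def knn_impute_numeric(X_num, X_cat, k=3):
    """Fill NaNs in X_num with the mean of the k Gower-nearest donor rows."""
    X_filled = X_num.copy()
    ranges = np.nanmax(X_num, axis=0) - np.nanmin(X_num, axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    n_rows = X_num.shape[0]
    for i in range(n_rows):
        for j in np.flatnonzero(np.isnan(X_num[i])):
            # Donors are the other rows in which feature j was observed.
            donors = [r for r in range(n_rows)
                      if r != i and not np.isnan(X_num[r, j])]
            dists = [gower_distance(X_num[i], X_num[r], X_cat[i], X_cat[r], ranges)
                     for r in donors]
            nearest = np.array(donors)[np.argsort(dists)[:k]]
            X_filled[i, j] = X_num[nearest, j].mean()
    return X_filled

# Toy data (not IFTH data): height (cm) and waist girth (cm), one value missing.
X_num = np.array([[170.0, 80.0],
                  [171.0, np.nan],
                  [169.0, 79.0],
                  [190.0, 100.0],
                  [172.0, 81.0]])
X_cat = np.array([["employee"], ["employee"], ["employee"], ["manager"], ["student"]])
X_imputed = knn_impute_numeric(X_num, X_cat, k=2)
```

In this toy example the two nearest donors of the incomplete record are the two other employees of similar height, so the missing waist girth is filled with the mean of their values.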

Models, evaluation metrics and hyperparameters tuning
In this section we present the machine learning models used in this paper, the evaluation metrics considered and how we have chosen the associated hyperparameters. The models that have been compared for our regression tasks are linear regression models (LR), random forest models (RF), gradient boosting tree models (GB) and support vector regression models (SVR).
The random forest is an ensemble machine learning algorithm that combines multiple decision trees trained in parallel on randomly extracted subsets of the dataset. This algorithm can be used to solve regression problems by averaging the predictions of the decision trees. Random forests are usually used on large datasets and have much better performance and fewer overfitting issues than single decision trees. However, they can have a high training time and are difficult to interpret (Ho 1995).
Gradient boosting trees are also an ensemble learning algorithm based on decision trees that can be used for regression. This time, the trees are trained iteratively, each new tree being fitted to correct the errors of the previous ones. This model is generally more accurate than random forests but is, once again, hard to interpret (Breiman 1997).
The main idea behind support vector machine models is to map the dataset into a high-dimensional space where it is easier to perform regression analysis. To this end, kernel functions are used, which permit non-linear analysis. These models are robust prediction methods and can achieve better accuracy than models based on decision trees with less computation time (Cortes and Vapnik 1995).
Fig. 1 The 6 easy-to-measure key body measurements
To evaluate the prediction accuracy of the models, we have used the mean absolute error (MAE) defined by
$$\mathrm{MAE}(x, y) = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|,$$
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ are n-dimensional vectors, and we have evaluated it between the real test values $y$ and the values $x$ predicted by a model. This metric has been chosen so that it can be compared with the precision steps given in Table 1. The precision ratio PR of a measurement is defined to be the ratio between the MAE associated with the best model and the precision step of the measurement, that is to say:
$$\mathrm{PR}(y) = \frac{\min_{m \in \mathrm{models}} \mathrm{MAE}(y_{\mathrm{test}}, m(X_{\mathrm{test}}))}{\mathrm{PS}(y)},$$
where $y$ is a measurement of Table 1, $\mathrm{PS}(y)$ is its precision step, $m$ is a trained model with $m \in \mathrm{models}$, $X_{\mathrm{test}}$ is the input test matrix and $y_{\mathrm{test}}$ is the vector of real test values.
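The two metrics defined above can be written in a few lines. The following is a minimal sketch in which the function names, the numerical values and the precision step of 1 cm are hypothetical and are not taken from Table 1.

```python
import numpy as np

def mae(x, y):
    """Mean absolute error between predicted values x and real values y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs(x - y)))

def precision_ratio(y_test, predictions_by_model, precision_step):
    """MAE of the best model divided by the precision step of the measurement."""
    best_mae = min(mae(pred, y_test) for pred in predictions_by_model.values())
    return best_mae / precision_step

# Hypothetical values (cm) for one measurement with a precision step of 1 cm.
y_test = [40.0, 42.0, 38.0, 41.0]
predictions = {
    "LR": [40.5, 41.0, 38.5, 41.0],  # absolute errors 0.5, 1.0, 0.5, 0.0
    "RF": [41.0, 43.0, 37.0, 42.0],  # absolute errors 1.0, 1.0, 1.0, 1.0
}
pr = precision_ratio(y_test, predictions, precision_step=1.0)
```

A precision ratio below 1 means that the average error of the best model is smaller than the error tolerated for a professional stylist on that measurement.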
The machine learning algorithms considered in this work have a wide range of hyperparameters with significant effects on the performance of the models, which we have tuned with Bayesian optimization. This method comes from global optimization theory (Mockus et al. 1978) and is applicable to the problem of minimizing a scalar-valued function that is costly to evaluate. It has been applied to machine learning problems where the function is an evaluation metric of a model. This optimization method generally gives better results and is much faster than grid-search and random-search cross-validation.
Hence we have used Bayesian optimization to tune the hyperparameters of the RF, GB and SVR models by minimizing the MAE. The hyperparameters that we have tuned for the RF and GB models are the number of trees, their maximum depth, the minimum number of samples required to split an internal node and the minimum number of samples required at a leaf node. For the SVR model, the hyperparameters tuned are the kernel function, its coefficient γ, the regularization parameter and the width of the epsilon-tube within which no penalty is applied in the training loss function.
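To make the tuning principle concrete, here is a minimal sketch of a Bayesian optimization loop. The paper does not name the library, the surrogate model or the acquisition function that were used; the Gaussian process surrogate, the expected improvement criterion, the single tuned hyperparameter (the SVR regularization parameter on a log scale), the search range and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-in for standardized inputs and one target measurement.
X = rng.normal(size=(200, 6))
y = 60.0 + 4.0 * X[:, 0] + 2.5 * X[:, 1] + rng.normal(scale=1.0, size=200)

def cv_mae(log10_c):
    """Cross-validated MAE of a linear-kernel SVR for a given log10(C)."""
    model = SVR(kernel="linear", C=10.0 ** log10_c, epsilon=0.5)
    scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_absolute_error")
    return -scores.mean()

def expected_improvement(candidates, gp, best):
    """Expected improvement (for minimization) at each candidate point."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

low, high = -2.0, 2.0                          # search range for log10(C)
tried = list(rng.uniform(low, high, size=4))   # random initial design
losses = [cv_mae(v) for v in tried]
initial_best = min(losses)

grid = np.linspace(low, high, 200).reshape(-1, 1)
for _ in range(10):
    # Fit the surrogate on all evaluations made so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True,
                                  alpha=1e-6, random_state=0)
    gp.fit(np.array(tried).reshape(-1, 1), np.array(losses))
    # Evaluate the costly objective only where the surrogate is most promising.
    ei = expected_improvement(grid, gp, min(losses))
    next_point = float(grid[np.argmax(ei), 0])
    tried.append(next_point)
    losses.append(cv_mae(next_point))

best_log10_c = tried[int(np.argmin(losses))]
best_mae = min(losses)
```

The point of the approach is that each new hyperparameter value is chosen from a cheap surrogate of the objective, so the expensive cross-validation is run far fewer times than in an exhaustive grid search.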

Model prediction performance
We have trained the machine learning models considered in "Models, evaluation metrics and hyperparameters tuning" to estimate each key measurement used in the fashion industry (see Table 1) from our input features, separating men from women. To this end we have split the dataset into two sets, one to train the algorithms and another to test them, following a 70-30% division. The resulting evaluation scores comparing the test values and the predicted values on the test set, sorted in descending order with respect to the precision ratios, are given in Tables 2 and 3 for women and men respectively.
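The comparison protocol can be sketched as follows with scikit-learn. This is an illustration under stated assumptions rather than the code used for the paper: the data are synthetic, the target is a hypothetical mostly linear function of the inputs, and the hyperparameter values are fixed by hand instead of being tuned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 600
# Synthetic, standardized stand-ins for the input features (not IFTH data).
X = rng.normal(size=(n, 6))
# Hypothetical target that depends mostly linearly on the inputs.
y = 60.0 + 4.0 * X[:, 0] + 2.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=1.0, size=n)

# 70-30% division between training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hand-picked hyperparameters, for illustration only.
models = {
    "LR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=200, max_depth=14, random_state=0),
    "GB": GradientBoostingRegressor(n_estimators=200, max_depth=5, random_state=0),
    "SVR": SVR(kernel="linear", C=58.0, epsilon=0.72),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

# Report the models from lowest to highest test MAE.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {score:.3f}")
```

In the setting of the paper, this loop would be run once per target measurement and per gender, and the lowest MAE would then be divided by the precision step to obtain the precision ratio.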
The hyperparameters obtained for the models by Bayesian optimization are not much affected by gender. For both the RF and GB models, the averages of the hyperparameters obtained are similar, except for the maximum depth of the trees. We have obtained on average 550 trees and a minimum of 6 samples required to split an internal node and to be at a leaf node. However, the maximum depth of the trees is on average 14 for the RF models and 5 for the GB models. The tuning of the hyperparameters of the SVR models gives on average 58 for the regularization parameter, 0.72 for the epsilon-tube and 10.64 for the γ coefficient. The linear, sigmoid and radial basis function kernels have also been compared, and for every measurement and both genders the linear kernel has the best performance.

Comparison of model prediction distributions
The objective of this work was to make individual predictions as well as to update past sizing systems. To this end we were interested not only in the average prediction performance of the models, but also in the distributions of the values they predict. In this section we have computed and compared, for each measurement, the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951) between the density estimations of the real values of the test set and of the values predicted by our models. For two distributions $p = (p_k)$ and $q = (q_k)$, the KL divergence (or relative entropy) is defined by
$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}.$$
This statistical distance is used in probability and information theory to measure the difference between probability distributions. Here it quantifies which model best respects the expected distribution, although it is non-symmetric and does not satisfy the triangle inequality.
In Table 4 we have summarized, for each measurement and both genders, which model has the lowest KL divergence with respect to the distribution of the test set. Again, for almost all measurements the LR and SVR models are better than the RF and GB models.
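The following minimal sketch shows how such a comparison of distributions can be computed. The paper does not specify the density estimator or the direction of the divergence; here the densities are approximated by normalized histograms on a common grid, the real test values play the role of p, and the data are synthetic. All of these are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence sum_k p_k * log(p_k / q_k)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_k = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def histogram_distribution(values, bin_edges, eps=1e-9):
    """Normalized histogram with a small floor so that no bin is exactly 0."""
    counts, _ = np.histogram(values, bins=bin_edges)
    probs = counts.astype(float) + eps
    return probs / probs.sum()

rng = np.random.default_rng(0)
y_test = rng.normal(loc=40.0, scale=3.0, size=2000)            # "real" values
pred_close = y_test + rng.normal(scale=0.5, size=y_test.size)  # spread preserved
pred_shrunk = 40.0 + 0.5 * (y_test - 40.0)                     # spread too narrow

edges = np.linspace(25.0, 55.0, 31)
p = histogram_distribution(y_test, edges)
kl_close = kl_divergence(p, histogram_distribution(pred_close, edges))
kl_shrunk = kl_divergence(p, histogram_distribution(pred_shrunk, edges))
```

The second set of predictions is pulled toward the mean, so its distribution is too narrow and its divergence from the test distribution is larger than that of the first set, even though each individual prediction stays fairly close to its real value. This is the kind of effect that an average error metric alone does not reveal.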

The effect of the training dataset size
Since the training dataset is one of the key factors that can affect the performance of our models, training datasets of increasing sizes were built in order to investigate its effect. For each training dataset, the data were extracted randomly from the remaining data in the dataset. Our 4 models were then trained on these subsets, with Bayesian optimization of the hyperparameters, and tested. As an illustration, Fig. 2 shows the results for the waist-knee height for women, which is a well-estimated measurement.
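The experiment on the training dataset size can be sketched as below. The sizes, the synthetic data and the use of only two models with default or hand-picked hyperparameters are assumptions made for illustration; in particular, the hyperparameter tuning step is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_total, n_test = 3000, 500
# Synthetic stand-in for the inputs and one target measurement.
X = rng.normal(size=(n_total, 6))
y = 50.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=1.0, size=n_total)

# Fixed test set; the remaining rows form the pool for the training subsets.
X_test, y_test = X[:n_test], y[:n_test]
pool = np.arange(n_test, n_total)

sizes = [50, 100, 250, 500, 1000, 2000]
curves = {"LR": [], "RF": []}
for size in sizes:
    # Draw a random training subset of the requested size from the pool.
    subset = rng.choice(pool, size=size, replace=False)
    candidates = (
        ("LR", LinearRegression()),
        ("RF", RandomForestRegressor(n_estimators=100, random_state=0)),
    )
    for name, model in candidates:
        model.fit(X[subset], y[subset])
        curves[name].append(mean_absolute_error(y_test, model.predict(X_test)))
```

Plotting `curves` against `sizes` gives one line per model, in the same spirit as Fig. 2.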

Features importance
The permutation feature importance measures the importance of an input feature of a model by calculating the increase in the model's prediction error after randomly shuffling the values of that feature. A feature is said to be important if permuting its values increases the model error. This notion was introduced in Breiman (2001) for random forests and in Fisher et al. (2019) for a general model. We now explain how to calculate the permutation feature importance. For a trained model $m$ with $m \in \mathrm{models}$, the input test matrix $X_{\mathrm{test}}$, a column feature $j$ in $X_{\mathrm{test}}$ and the target test feature $y_{\mathrm{test}}$:
• We compute the error metric $\mathrm{MAE}(y_{\mathrm{test}}, m(X_{\mathrm{test}}))$;
• We define the matrix $X_{\mathrm{test,perm}}$ to be the matrix $X_{\mathrm{test}}$ in which the column feature $j$ has been randomly permuted;
• We compute the associated error metric $\mathrm{MAE}(y_{\mathrm{test}}, m(X_{\mathrm{test,perm}}))$;
• We calculate the permutation feature importance $\mathrm{FI}_j$ of $j$ as the increase of the permuted error $\mathrm{MAE}(y_{\mathrm{test}}, m(X_{\mathrm{test,perm}}))$ over the baseline error $\mathrm{MAE}(y_{\mathrm{test}}, m(X_{\mathrm{test}}))$.
In Table 5 we have presented the means of the permutation feature importances over all measurements for all models, and in Table 6 we have counted the number of measurements for which each feature is the most important.
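The permutation procedure described above can be sketched as follows. The increase in error is computed here as a difference between the permuted and baseline MAE; this is an assumption, since a ratio of the two errors is another common convention. The data and the least-squares model standing in for a trained model m are also hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between real and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))

def permutation_importance_mae(predict, X_test, y_test, rng):
    """Increase in MAE after shuffling each column of X_test in turn."""
    baseline = mae(y_test, predict(X_test))
    importances = np.empty(X_test.shape[1])
    for j in range(X_test.shape[1]):
        X_perm = X_test.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle feature j only
        # Assumed convention: difference between permuted and baseline error.
        importances[j] = mae(y_test, predict(X_perm)) - baseline
    return importances

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Column 0 matters most, column 1 less, column 2 is not used at all.
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=1000)
X_train, X_test, y_train, y_test = X[:700], X[700:], y[:700], y[700:]

# Least-squares linear model with intercept, standing in for a trained model m.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

def predict(M):
    """Apply the fitted linear model to a feature matrix M."""
    return np.column_stack([M, np.ones(len(M))]) @ coef

fi = permutation_importance_mae(predict, X_test, y_test, rng)
```

On this toy example the importances are ordered like the true coefficients, and the unused column has an importance close to zero.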

Discussion
For both genders, all measurements are estimated with an MAE and a precision ratio smaller than 3.6, and the models have similar performances. Measurements can have different prediction accuracies depending on the gender; for example, the knee girth has a precision ratio of 2.45 for women and 2.10 for men. For both genders, the same five measurements have a precision ratio less than 1, in almost exactly the same order; these measurements are given in Fig. 3. The three least well estimated measurements are the same for women and men and are also given in Fig. 3. One can see that neck measurements are not easily obtained by our models. The average MAE is between 1.25 and 1.4% for each model, and the predictions for women are slightly better than for men. The linear and support vector regressions have an average MAE between 1.25 and 1.3% and are therefore better than the models based on decision trees, which have an average MAE between 1.3 and 1.4%. For almost all measurements, the LR and SVR models are better than the RF and GB models. The computation of the KL divergences corroborates the previous discussion and the results of the "Model prediction performance" section, since the LR and SVR models have better distributions than the RF and GB models. We note that this time the men have slightly better results than the women, contrary to the results with the MAE metric. To illustrate this, Fig. 4 shows the boxplots and the density estimations of the values predicted by the models compared to the real test values of the bustpoint-waist length for women, which is one of the less well estimated measurements.
Fig. 3 The five best estimated measurements by the models for both genders in figures (a) to (e). The three least well estimated measurements by the models for both genders in figures (f) to (h) and the fourth least well estimated measurement for women (respectively men) in figure (i) (respectively (j))
Fig. 4 Boxplots and density estimations of the predicted values by the models compared to the real test values of the bustpoint-waist length for women. LR: linear regression. RF: random forest. GB: gradient boosting. SVR: support vector regression
Moreover, the effect of the training dataset size is almost always the same for both genders and all measurements. One can see that the LR and SVR models rapidly reach good and very stable results, while the RF and GB models need a larger dataset and have rather unstable results. This instability is probably due to the fact that these two models involve randomness in their construction and learning. For the LR and SVR models, a dataset of approximately 500 individuals was needed to obtain good results. One of the only measurements whose prediction performance behaves differently across training dataset sizes is the upper arm girth. This particularity is more pronounced for the men dataset, where the RF model has better results than the LR and SVR models and continues to improve without stabilizing, see Fig. 5.
We have also studied the evolution of the KL divergence of the density estimations between the real values of the test set and the predicted values by our models when increasing the size of the training datasets and the results are very similar to the evolution of the MAE.
It turns out that, even if the weight and height have the most influence for both sexes, the weight is the most important input feature for women while the height is the most important feature for men. The inside leg length and the chest/bust, waist and hip girths have more influence for women than for men. For both genders and all models, the inside leg length and the chest/bust and waist girths are more important than the hip girth. It is interesting to note that the only measurement for which the hip girth is the most important feature is the thigh girth, for the RF model and both sexes, and that this measurement has the particularity of not being well estimated in terms of evaluation metrics (see Tables 2 and 3) while having good distribution results (see Table 4). The age, the shoe size and the socioprofessional category do not have much importance. We also see that the LR model is less stable than the other ones with respect to the variation of the inputs, while tree-based models are more resistant.
Fig. 5 MAE of the models trained by datasets with various sizes for the upper arm girth measurement for men. LR: linear regression. RF: random forest. GB: gradient boosting. SVR: support vector regression
The only measurement for which the age is the most important input feature is the left shoulder slope (acromiale-neck base), for the RF model for women and the GB model for men; it is also the most difficult measurement to predict for both genders (see Tables 2, 3 and 4).

Conclusions
In this article, we have compared linear regressions, random forests, gradient boosting trees and support vector regressions for predicting about thirty measurements used in the fashion industry to construct clothes from 6 easy-to-measure body dimensions and demographic information. Our work shows that, for both genders, the models have good prediction accuracies and distributions. More precisely, the average MAE per model is less than 1.4 and is slightly better for women than for men, while the KL divergences between test values and predicted values in the test set are below 0.03 and are slightly better for men than for women. The results suggest using linear and support vector regressions to estimate body dimensions. These models have better MAE and distribution results, generally need only 500 samples to be correctly trained, and are more stable than tree-based models. It turns out that girth measurements are more difficult to estimate than height measurements. The study of the importance of the input features indicates that, for both genders, the clothing habits and the demographic characteristics are not important, and that the weight and height have the most influence for the models considered while the hip girth is the least important of the input measurements. This result is positive, since it is much easier to measure one's weight and height than the hip girth, which is often confused with the high hip girth. More precisely, the weight is the most important feature for women while the height is the most important feature for men. It has been shown in the literature that artificial neural networks give good results in predicting body dimensions. Hence, future research can compare the models used in this work with deep learning models and study their individual and global prediction results as well as the influence of the input features and of the training dataset size.

Fig. 2
MAE of the models trained by datasets with various sizes for the waist-knee height measurement for women. LR: linear regression. RF: random forest. GB: gradient boosting. SVR: support vector regression

Table 1
Measurements used in fashion industry according to the modelers of the French Institute of Textiles and Clothing (IFTH)

Table 2
Prediction performance metrics of the measurements used in fashion industry for the women dataset

Table 3
Prediction performance metrics of the measurements used in fashion industry for the men dataset

Table 4
Models with lowest KL divergence from the expected distribution sorted by lowest divergence

Table 5
Table of means of permutation feature importances for all measurements and both genders

Table 6
Count table of permutation feature importances for all measurements and both genders