With the development of new technologies, new marketing strategies also emerge. Integrating artificial intelligence technologies into businesses enables significant innovation in the marketing and advertising sectors. As the Golive R&D department, we closely monitor such changes and develop innovative projects for various companies. One of those projects is the Artificial Intelligence Based Video Marketing Application.
The aim of this project is to detect products such as clothing, shoes and similar marketing items in a given image, to change properties of these products such as color and size, and to detect humans. With human detection, it becomes possible to change a person's pose and create images with different human postures, allowing brands to conduct their marketing and advertising efforts more effectively and efficiently.
The project has 4 phases:
- Image Synthesis
- Pixel-Based Detection of the Product in the Image
- Pixel-Based Detection of the Humans in the Image
- 3D Human Modelling
Image Synthesis
The first stage aims to synthesize three-dimensional models of the products in the inventories of e-commerce websites. NeRF and 3D inpainting were chosen as the primary candidate algorithms. However, GAN-based methods were observed to achieve higher success rates than either of them.
Therefore, it was decided to use GAN-based methods in the project. Figures 1 and 2 show examples of results produced with GAN.
During the image synthesis stage, the synthetic images obtained with the GAN are compared for similarity with the real images used to generate them, or with other related real images in the dataset.
Pixel-Based Detection of the Product in the Image
For pixel-based detection of the product in the image (shoes in this project), the UNet network was modified. The model was trained on a dataset consisting only of images of shoes, either on their own or worn by people. Figure 3 shows an example of the results. The masks produced by the model were compared with the existing masks using the accuracy_score, jaccard_score and f1_score metrics, with the following results:
- Accuracy Score: 99.3077840684336
- Jaccard Score: 98.65040165159472
- F-1 Score: 99.3077840684336
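To make the three scores concrete, the following is a minimal sketch of how they can be computed for a predicted mask against a reference mask. It is not the project's evaluation code: the function names `confusion_counts` and `mask_scores` are hypothetical, the masks are assumed to be flat lists of 0/1 pixel labels, and the scores are computed for the foreground (product) class only. The post does not state which averaging mode was used for the reported figures. In practice, the scikit-learn functions of the same names as the metrics above would typically be used on flattened arrays.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # product pixel correctly detected
        elif p and not t:
            fp += 1          # background pixel marked as product
        elif t:
            fn += 1          # product pixel missed
        else:
            tn += 1          # background pixel correctly left out
    return tp, fp, fn, tn


def mask_scores(pred, truth):
    """Return accuracy, Jaccard and F1 as fractions in [0, 1].

    Assumption: scores are for the foreground class only; multiply by 100
    to express them as percentages.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("masks must not be empty")
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Intersection over union of the two foreground regions.
        "jaccard": tp / union if union else 1.0,
        # Harmonic mean of precision and recall.
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


# Toy 8-pixel example (made-up values, not project data).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 1, 1, 0, 0, 0, 0, 0]
scores = mask_scores(predicted, reference)
```

Accuracy counts correctly labelled background pixels as well, so on images where the product covers a small area it tends to be higher than the Jaccard score, which only looks at the overlap of the two foreground regions.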
Pixel-Based Detection of the Humans in the Image
For pixel-based detection of the person in the image, a UNet model that segments the image into four classes was used: hair, skin, clothing and background. Combining the hair, skin and clothing classes produced by the model yields the pixel-based human detection. Figure 4 shows an example of the results. The model's success rate was tested against u2net, a pre-trained human segmentation model. The test results are as follows:
- Accuracy Score: 90.80222778320308
- Jaccard Score: 83.42822207931677
- F-1 Score: 90.80222778320308
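The step of combining the hair, skin and clothing classes into a single human mask can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the class ids, the `HUMAN_CLASSES` set and the `human_mask` function are hypothetical, and the segmentation output is assumed to be a 2D grid with one class id per pixel.

```python
# Hypothetical class ids; the actual label encoding is not given in the post.
BACKGROUND, HAIR, SKIN, CLOTHING = 0, 1, 2, 3
HUMAN_CLASSES = {HAIR, SKIN, CLOTHING}


def human_mask(class_map):
    """Turn a per-pixel class map into a binary human/background mask.

    class_map: list of rows, each row a list of class ids.
    Returns a grid of the same shape with 1 for human pixels, 0 otherwise.
    """
    return [
        [1 if label in HUMAN_CLASSES else 0 for label in row]
        for row in class_map
    ]


# Toy 3x4 class map (made-up values, not project data).
segmentation = [
    [BACKGROUND, HAIR, HAIR, BACKGROUND],
    [BACKGROUND, SKIN, SKIN, BACKGROUND],
    [CLOTHING, CLOTHING, CLOTHING, CLOTHING],
]
mask = human_mask(segmentation)
```

A binary mask of this form can then be compared pixel by pixel with the mask from a reference segmentation model.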
3D Human Modelling
In the last stage, 3D human modelling, the success of modelling clothed humans in 3D was measured with the LPIPS metric. Images of 34 different people were taken from 3 different angles. Figure 5 shows an example of the results. The LPIPS similarity ratio was found to be 0.788610.
High Success Rate
As a result, the project's success rate for pose transfer in 3D human modelling was up to 88%. The success rate was 99.30% in product detection and 90.8% in human detection. Success in 3D human modelling increased to 78%. The project was also presented for TÜBİTAK support.
We believe that this project holds great potential for companies in the marketing and advertising sectors. Thanks to the final application, companies will be able to promote their products more effectively and offer more attractive and compelling advertisements to potential customers.