Product Search and Product Classification for E-commerce
Product search relies on categories as a supporting system. A categorized offer reduces mistakes, which is why product classification plays a crucial role in the quality of the shopping experience. A product catalog can be full of perfectly named and described items, but if they do not match the user’s vocabulary, the search fails to meet the user’s expectations. Therefore, the idea of using users’ search history to extend product data looks promising. If we extend product names and the classification hierarchy with users’ phrases, we also make search mechanisms friendlier and the classification clearer.
This article presents a research framework for applying machine learning classifiers to check product database consistency. The results of the investigation are presented with implications for academia and practitioners.
Background
When you search or browse for products on e-commerce services, you naturally look for content that matches what you want to find. Users usually identify products and services by their forms, functions, and features (3F). These dimensions are also used in the product classification process. Classification supports decision making by providing a double check. Users can be sure that they have the right product because of the 3F in the product information (name, description, product codes, etc.), but they can also check which category the product belongs to. Is it the right category or an unexpected one? Together, these factors make decision making more convenient. This in turn leads to acceptance of the technology used and of the way purchases are made on the portal, and ultimately improves the customer experience.
Recently, companies have been investing large budgets and a lot of attention in building product search that works well for searching and categorization and makes buying easy for customers. There is also an emerging phenomenon in which customers themselves deliver product-related content. They rate products and services, and they write recommendation notes, opinions, and reviews. They are also ready to help others evaluate product features and functionalities (Sun et al. 2019). Investigations of user activity in web shops are continuously reported (Naab & Sehl 2017; Yi et al. 2019; Ukpabi & Karjaluoto 2018). This research also confirms that users’ content can be analyzed for latent expectations that, when unmet, lead to bounces or abandoned baskets.
Product name shortness
Product names on the web are usually short texts. This makes them easy to present on a smartphone or to print on a fiscal receipt. On the other hand, the brevity of the information raises financial and operational risk for both the customer and the shop operator. Brevity is not limited to product descriptions; it also comes from the user. For example, in the case of booking.com (Bernardi, Mavridis & Estevez 2019), the poverty of the queries that customers enter when searching for an offer has real consequences. Users face extremely important decisions when, for example, they are searching for a reservation of several dozen rooms for a company event. In such a case, the user may even provoke a misleading offer through incomplete queries. A responsible supplier strives to make the query more precise with a suggestion system, and the offer more precise with multi-level classification and a set of parametric filters that group products into sets that help users make their selection (e.g., Ingram & Gaskell 2019).
Search relevance means that the results meet the individual user’s expectations (search for meaning). Therefore, judging search relevance involves a lot of subjective bias (Nelson 2015; Ma et al. 2014). Search relevance is contextual because users compare the results they obtain with others available on the internet. Search result relevance can also be perceived as adequacy to expectations in a dynamic context, where the user’s progress in gathering knowledge from the market makes it vary over time (Rokonuzzaman et al. 2020).
Recommendation systems
Product search on the web is usually supported by several recommendation systems. They provide prompts, options, product variants, or sets (Beel, Gipp, Langer & Breitinger 2016). If a recommendation system works beyond up-selling, it can reduce customer effort; however, it raises the following challenges (which also apply to product classification mechanisms): a) the need to distinguish random associations of products viewed but not purchased from products deliberately purchased together, b) capturing the short-range language patterns used in search, and c) mastering user vocabulary (Nigam et al. 2019). The recommender can interactively display suggestions in a drop-down list as the searcher types a query phrase. The suggestions can be retrieved from similar queries submitted by other users or from previously prepared classification variants, some of which can include users’ content (Sandvig 2011).
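The drop-down suggestion mechanism described above can be illustrated with a minimal Python sketch. It assumes that suggestions are simply the most frequent past queries starting with the typed prefix; the function name suggest and the sample query log are hypothetical and are not taken from the cited works.

```python
from collections import Counter

def suggest(prefix, query_log, limit=3):
    """Return the most frequent logged queries that start with the typed prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    # Count how often each normalized query was submitted by other users.
    counts = Counter(q.strip().lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical query log; a real shop would read it from its search history.
QUERY_LOG = [
    "running shoes", "running shoes", "running socks",
    "rain jacket", "running shoes men", "rain jacket",
]

print(suggest("run", QUERY_LOG))
```

A production recommender would also have to deal with typos, synonyms, and the classification variants mentioned above. The prefix match here only shows the basic retrieval step.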

Search phrase variants show how users adjust terms from their personal vocabulary to product names (descriptions) or to the product classification until the search returns the intended results. Product classification belongs to the research fields known as ontology and taxonomy matching. In taxonomy matching, data are annotated for relationships (not for meaning). To find an object in a certain category, a matching algorithm must establish the meaning from external data or from the context within the schema. In contrast, ontologies, as logical systems of axioms, annotate data according to functional meaning (Shvaiko & Euzenat 2005). When a classification is created in the ontology manner, comparable items are grouped according to functionality. During a search, the user focuses on matching expectations with functionalities, but those expectations are known only to the user. That is why we can state that search phrases link hidden variables on the user side with product features represented by text objects (product names).
Query expansion mechanisms can help searchers refine their queries by recommending additional or alternative search terms based on grouped functionalities. User-side automated assistance can improve searcher performance (as measured by the number of relevant documents found) by approximately 20% (Jansen & McNeese 2005). Thus, effective product classification should provide at least two major enhancements: a) recall, bringing to the screen a collection that contains as many as possible of the products that respond to the customer’s request, and b) precision, limiting the match to the subset of objects that represent the customer’s request, with a minimal number of irrelevant objects (Nigam et al. 2019). Thus, we formulate a research hypothesis as follows:
H1: Supplementing product names, as textual data that represent the features and functionalities of real objects, with users’ content as additional product description vocabulary can significantly improve classification accuracy.
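The mechanism behind H1 can be sketched in Python under strong simplifying assumptions. A plain token-overlap score stands in for the machine learning classifiers of the underlying study, and all product names, user phrases, and test queries are hypothetical. The toy data are constructed to show the mechanism and are not evidence for the hypothesis.

```python
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def build_vocabulary(labelled_texts):
    """Map each category to a Counter of the tokens seen in its texts."""
    vocab = defaultdict(Counter)
    for text, category in labelled_texts:
        vocab[category].update(tokenize(text))
    return vocab

def classify(text, vocab):
    """Return the category sharing the most tokens with text, or None if none overlap."""
    scores = {cat: sum(counts[t] for t in tokenize(text))
              for cat, counts in vocab.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

def accuracy(samples, vocab):
    """Share of samples whose predicted category equals the labelled one."""
    hits = sum(1 for text, cat in samples if classify(text, vocab) == cat)
    return hits / len(samples)

# Hypothetical catalogue names written in professional vocabulary.
PRODUCT_NAMES = [
    ("athletic footwear mesh", "shoes"),
    ("leather oxford footwear", "shoes"),
    ("waterproof shell outerwear", "jackets"),
]
# Hypothetical user search phrases already linked to a category.
USER_PHRASES = [
    ("sneakers", "shoes"),
    ("trainers", "shoes"),
    ("raincoat", "jackets"),
]
# Hypothetical held-out queries used for evaluation.
TEST_QUERIES = [
    ("white sneakers", "shoes"),
    ("raincoat women", "jackets"),
    ("mesh trainers", "shoes"),
]

baseline = build_vocabulary(PRODUCT_NAMES)
extended = build_vocabulary(PRODUCT_NAMES + USER_PHRASES)
print(accuracy(TEST_QUERIES, baseline), accuracy(TEST_QUERIES, extended))
```

With product names alone, queries such as "white sneakers" share no token with the catalogue and stay unclassified. Once the user phrases are added to the category vocabulary, the same queries can be matched.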
Although supporting technology plays a key role in delivering relevant results during search and browsing, the quality of the product database, as the source of content, cannot be ignored. A good practice for online stores is to present products categorized for major user profiles (Koehn et al., 2020; Gottlieb & Lorimor, 2017). Looking at the classification criteria from the customer’s perspective, we find a focus on maximizing customer satisfaction through data clarity and high precision at acceptable recall, which aims at effective self-service (Kettunen et al., 2018). Combining user content (reviews) with product features helps to understand the rationale for purchase decisions and the helpfulness of online reviews (Hou et al., 2019; Min and Park 2012). The search phrases that customers type include hidden variables related to all the contextual factors interacting at the moment of action. Therefore, although these phrases can convert to purchases very well in certain circumstances, they can be useless or harmful in other situations. For example, a customer looking for an item to replace in an emergency usually prefers the same product, producer, brand, and functionality, because this reduces the risk of a wrong choice. When the same user is looking for the same product for a new construction (design), she/he is going to look around to check viable options. Thus, we state that user content can positively influence product classification, and in turn browsing and searchability, when it matches the user’s vocabulary in certain contexts, but can be harmful in others. In other words, the same user content can work positively for one user group and negatively for another, depending on whether it matches the context, as the example above shows.
Because the users’ vocabulary, represented by their search phrases, also reflects a considerable number of users and their contexts of action, the typical application of users’ content as a universal extension of product content (names or descriptions) can provoke data ambiguity. Thus, hypothesis H2 is formulated as follows:
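The balance between precision and recall mentioned above can be made concrete with a short Python sketch. The product identifiers and the two result sets are hypothetical; the formulas are the standard set-based definitions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a result set against the set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of displayed products that answer the request.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of products answering the request that were displayed.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical products that actually answer the customer's request.
RELEVANT = {"p1", "p2", "p3", "p4"}
# A narrow category shows only relevant products but misses half of them.
NARROW = ["p1", "p2"]
# A broad category shows every relevant product plus four irrelevant ones.
BROAD = ["p1", "p2", "p3", "p4", "p7", "p8", "p9", "p10"]

print(precision_recall(NARROW, RELEVANT))  # (1.0, 0.5)
print(precision_recall(BROAD, RELEVANT))   # (0.5, 1.0)
```

The narrow result set corresponds to high precision with low recall. The broad one recalls everything at the cost of precision, which is the trade-off a classification scheme has to settle for its main user profiles.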
H2: Supplementing professionally classified textual objects with users’ content leads to ambiguous classification.
In regular business purchases, where customers repeat their selections and purchase contexts, using individual vocabulary can be beneficial. Specific vocabulary can be obtained not only from the user’s dialect but also from the manufacturer’s catalogue. This approach can therefore produce both product catalogue personalization for e-commerce and data ambiguity. Product managers and web shop operators thus face a dilemma: from one perspective users’ content adds value, but from another it tilts product information toward the subjective view of a certain customer group or even lowers data quality (Nepomuceno et al., 2014).
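One possible way to quantify the ambiguity named in H2 is the entropy of the categories that a single user phrase ends up linked to. The Python sketch below assumes this measure for illustration only; it is not claimed to be the method of the underlying study, and the purchase log is hypothetical.

```python
import math
from collections import Counter, defaultdict

def category_entropy(phrase_category_pairs):
    """Return, per phrase, the Shannon entropy (in bits) of its linked categories."""
    by_phrase = defaultdict(Counter)
    for phrase, category in phrase_category_pairs:
        by_phrase[phrase.lower()][category] += 1
    entropies = {}
    for phrase, counts in by_phrase.items():
        total = sum(counts.values())
        # 0 bits means the phrase always points to one category;
        # higher values mean the phrase is spread over several categories.
        entropies[phrase] = sum(
            (n / total) * math.log2(total / n) for n in counts.values()
        )
    return entropies

# Hypothetical log of (search phrase, category of the purchased product).
PURCHASE_LOG = [
    ("pump", "bicycle accessories"),
    ("pump", "garden irrigation"),
    ("pump", "bicycle accessories"),
    ("pump", "garden irrigation"),
    ("torx screwdriver", "hand tools"),
    ("torx screwdriver", "hand tools"),
]

for phrase, bits in category_entropy(PURCHASE_LOG).items():
    print(phrase, bits)
```

A phrase with zero entropy is a safe extension of a product name. A phrase such as the hypothetical "pump" carries different meanings for different user groups and would blur the category boundaries if it were added everywhere.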


The positive influence of users’ phrases on search results in e-commerce systems can prompt company professionals to work systematically on improving product descriptions and classification in order to satisfy customers’ expectations, one of which is that search results match the vocabulary they use for searching. This will ultimately define a successful self-service concept in the e-commerce business (Lee & Yoon, 2018).
Based on: Pawłowski, M. (2021). Machine Learning Based Product Classification for eCommerce. Journal of Computer Information Systems, 1-10. http://wwww.tandfonline.com/10.1080/08874417.2021.1910880



