Sunday, March 11, 2018

Elastic Search - Employee



Elasticsearch: Fast Employee Search and Aggregation
A couple of years back, I was reading about Elasticsearch and found it very useful. I built a proof of concept (PoC) that presented some of the data used by our clients in a more useful way. I cannot discuss that PoC here because the data is PII. The results were intuitive and produced a lot of actionable information. It had a significant impact on the clients, leading to efficiency savings and an improved bottom line. The PoC was well received by the clients and was added to the product as a feature, built on Elasticsearch and Kibana.
Here I demonstrate features similar to those in the PoC, using generic data. A randomly generated employee database serves as the sample dataset for the demo.
Elasticsearch has an interesting feature called aggregation. You can bucket your employee data into logical units for further aggregation and search. For example, a typical employee record consists of Employee Number, First Name, Last Name, Address, City, State and Zip. We can aggregate on City, State and Zip, so that searches can be narrowed within these buckets. With aggregation we can also find out how many employees there are in a particular city, state or zip code.

The attached code contains the full implementation. Here I illustrate some key concepts associated with Elasticsearch.
Indexing: While indexing (seeding) the data, we configure edge n-grams, analyzers and filters. These make text search fast, whether the search text is partial or complete. The key requirement is that partial text must be searchable while the complete text remains available for aggregation. This is achieved with edge n-grams and analyzers for partial matching, plus keyword sub-fields that keep the complete value for aggregation.
Code Snippet:

public void CreateEmployeeIndex3()
{
    client.CreateIndex(IndexName, i => i
        .Settings(s => s
            .Analysis(a => a
                // Token filter that emits the leading prefixes (1 to 50 characters) of each token.
                .TokenFilters(tf => tf
                    .EdgeNGram("edge_ngrams", e => e
                        .MinGram(1)
                        .MaxGram(50)
                        .Side(EdgeNGramSide.Front)))
                .Analyzers(analyzer => analyzer
                    // Index-time analyzer: lowercase, then expand each token into its prefixes.
                    .Custom("partial_text", ca => ca
                        .Filters(new string[] { "lowercase", "edge_ngrams" })
                        .Tokenizer("standard"))
                    // Search-time analyzer: no n-grams, so the query text is matched as typed.
                    .Custom("full_text", ca => ca
                        .Filters(new string[] { "standard", "lowercase" })
                        .Tokenizer("standard")))))
        .Mappings(m => m
            .Map<EmployeeInfo>(mm => mm
                .AutoMap()
                .Properties(p => p
                    // Fields that only need partial-text search.
                    .Text(t => t
                        .Name(n => n.Employee_Num)
                        .Analyzer("partial_text")
                        .SearchAnalyzer("full_text"))
                    .Text(t => t
                        .Name(n => n.First_Name)
                        .Analyzer("partial_text")
                        .SearchAnalyzer("full_text"))
                    .Text(t => t
                        .Name(n => n.Last_Name)
                        .Analyzer("partial_text")
                        .SearchAnalyzer("full_text"))
                    .Text(t => t
                        .Name(n => n.Address)
                        .Analyzer("partial_text")
                        .SearchAnalyzer("full_text"))
                    // City, State and Zip each get two sub-fields:
                    // a text sub-field for partial search and a keyword sub-field for aggregation.
                    .Text(t => t
                        .Name(n => n.City)
                        .Fields(f => f
                            .Text(tt => tt
                                .Name("mycity")
                                .Analyzer("partial_text")
                                .SearchAnalyzer("full_text"))
                            .Keyword(k => k
                                .Name("keyword_city")
                                .IgnoreAbove(256))))
                    .Text(t => t
                        .Name(n => n.State)
                        .Fields(f => f
                            .Text(tt => tt
                                .Name("mystate")
                                .Analyzer("partial_text")
                                .SearchAnalyzer("full_text"))
                            .Keyword(k => k
                                .Name("keyword_state")
                                .IgnoreAbove(256))))
                    .Text(t => t
                        .Name(n => n.Zip)
                        .Fields(f => f
                            .Text(tt => tt
                                .Name("myzip")
                                .Analyzer("partial_text")
                                .SearchAnalyzer("full_text"))
                            .Keyword(k => k
                                .Name("keyword_zip")
                                .IgnoreAbove(256))))
                )))
    );
}
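
To make the effect of the edge_ngrams filter more concrete, here is a small stand-alone sketch in plain C# (no Elasticsearch involved) that imitates what the "partial_text" analyzer does to a single token: lowercase it, then emit its leading prefixes. The class name EdgeNGramSketch, the method names and the sample value "Johnson" are made up for illustration, and the real filter runs inside Elasticsearch rather than in your application.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: imitates lowercase + front edge n-grams for ONE token.
// This is not how Elasticsearch is implemented; it just shows the resulting terms.
public static class EdgeNGramSketch
{
    // Returns the leading prefixes of the lowercased token,
    // from minGram characters up to maxGram characters (or the token length).
    public static IEnumerable<string> EdgeNGrams(string token, int minGram = 1, int maxGram = 50)
    {
        string lowered = token.ToLowerInvariant();
        int longest = Math.Min(maxGram, lowered.Length);
        for (int length = minGram; length <= longest; length++)
        {
            yield return lowered.Substring(0, length);
        }
    }

    // Prints the prefixes for a made-up last name.
    public static void PrintDemo()
    {
        Console.WriteLine(string.Join(", ", EdgeNGrams("Johnson")));
    }
}
```

Calling EdgeNGramSketch.PrintDemo() prints "j, jo, joh, john, johns, johnso, johnson". Because every prefix is stored as its own term at index time, a query such as "joh" only needs to be lowercased (the "full_text" search analyzer) to match; it does not have to be expanded into n-grams itself.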







Aggregation:
Illustrating aggregation using the UI.






Here City, State and Zip are the aggregation buckets. When we select any or all of these and click Update, the UI lists all the cities, states and zip codes to which the employees belong.
We can then select any of these values to refine the search further.
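
For readers who want to see roughly what sits behind such a UI, below is a minimal sketch of a terms aggregation on the keyword_city sub-field defined in the index mapping above. It assumes the NEST 5.x client matching Elasticsearch 5.5.0 and NEST's default field-name inference; the class CityAggregationSketch, the method name, the aggregation name "employees_per_city" and the stand-in EmployeeInfo class are assumptions for illustration, not code taken from the downloadable project.

```csharp
using System;
using Nest;

// Stand-in for the project's EmployeeInfo class (assumed shape, only the field used here).
// Remove it if the real class is already in scope.
public class EmployeeInfo
{
    public string City { get; set; }
}

public static class CityAggregationSketch
{
    // Prints how many employees fall into each city bucket.
    public static void PrintEmployeesPerCity(IElasticClient client, string indexName)
    {
        var response = client.Search<EmployeeInfo>(s => s
            .Index(indexName)
            .Size(0) // only the bucket counts are needed, not the documents
            .Aggregations(a => a
                .Terms("employees_per_city", t => t
                    // Aggregate on the un-analyzed keyword sub-field, not the n-grammed text.
                    .Field(f => f.City.Suffix("keyword_city"))
                    .Size(20))));

        if (!response.IsValid)
        {
            Console.WriteLine("Aggregation request failed.");
            return;
        }

        foreach (var bucket in response.Aggs.Terms("employees_per_city").Buckets)
        {
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
        }
    }
}
```

The aggregation targets the keyword sub-field because the text sub-fields are broken into edge n-grams at index time, so bucketing on them would count prefixes rather than whole city names. The same pattern should apply to keyword_state and keyword_zip.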


Search:
Search in Elasticsearch is very fast.







Let's say I am searching for an employee. I can search by employee number, first name or last name. The complete text is not required; partial text is enough, and Elasticsearch will find the matching employees.
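
A search like this could be expressed with a multi_match query over the three fields. The following is a sketch under the same assumptions as the index definition earlier (NEST 5.x, default field-name inference); the class EmployeeSearchSketch, the method name, the result size of 10 and the stand-in EmployeeInfo class are hypothetical and may differ from the downloadable code.

```csharp
using System.Collections.Generic;
using Nest;

// Stand-in for the project's EmployeeInfo class (assumed shape, only the fields used here).
// Remove it if the real class is already in scope.
public class EmployeeInfo
{
    public string Employee_Num { get; set; }
    public string First_Name { get; set; }
    public string Last_Name { get; set; }
}

public static class EmployeeSearchSketch
{
    // Returns up to 10 employees whose number, first name or last name
    // matches the (possibly partial) search text.
    public static IEnumerable<EmployeeInfo> FindEmployees(
        IElasticClient client, string indexName, string searchText)
    {
        var response = client.Search<EmployeeInfo>(s => s
            .Index(indexName)
            .Size(10)
            .Query(q => q
                .MultiMatch(mm => mm
                    .Fields(f => f
                        .Field(e => e.Employee_Num)
                        .Field(e => e.First_Name)
                        .Field(e => e.Last_Name))
                    .Query(searchText))));

        return response.Documents;
    }
}
```

Partial input works here because the prefixes were already stored at index time by the "partial_text" analyzer, so the query text only has to match one of those stored prefixes.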

NOTE:
· I used Visual Studio 2017 Community Edition.
· The Elasticsearch version is 5.5.0.


The complete code can be downloaded from the link below. I have not included the packages; restore them from NuGet.

https://sites.google.com/site/letuscodecrazy/home-1

Elasticsearch itself can be downloaded from its website.


Important Links: