Mosaic ITES Services	Rank Smart

HOW HAS GOOGLE ADOPTED LSI?

Issue: 8th Oct, 2003
Author: Vikas Malhotra, eBiz Consultant

Proof of LSI

Semantic text processing draws essentially on linguistics. Consider a statement such as "I am optimizing a paragraph for search engine." At least three or four of its words (I, am, a, for) are excess, in the sense that they do not contribute actively to the meaning of the sentence; they serve only its grammar. Natural language contains many such words that are redundant from the point of view of search engines or semantic meaning. Function words, conjunctions, prepositions, auxiliary verbs, and several other word classes give a sentence its grammatical structure but add little content. Ironically, these are the most frequently used words in English.

In the very first step of LSI, these words are picked out and ignored. The document is then left with words that may carry some semantic meaning. We can discard:

  • Articles, prepositions, and conjunctions
  • Common verbs and pronouns
  • Common adjectives (big, late, high)
  • Frilly words (therefore, thus, however, albeit, etc.)
  • Any word that appears in every document or in only one document
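
The discard step can be sketched in a few lines of Python. This is a minimal illustration, not the method any search engine is known to use: the stop list is a tiny hand-picked sample, the function names are hypothetical, and the document-frequency rule reflects one reading of the last bullet above (drop words found in every document or in only one).

```python
import re

# Illustrative stop list only; a real system would use a far longer one.
STOP_WORDS = {
    "i", "am", "a", "an", "the", "for", "of", "in", "and", "or", "but",
    "is", "are", "to", "therefore", "thus", "however", "albeit",
}

def content_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]

def prune_by_document_frequency(docs_tokens):
    """Drop words that occur in every document or in only one document."""
    n_docs = len(docs_tokens)
    doc_freq = {}
    for tokens in docs_tokens:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    keep = {w for w, df in doc_freq.items() if 1 < df < n_docs}
    return [[t for t in tokens if t in keep] for tokens in docs_tokens]

print(content_words("I am optimizing a paragraph for search engine"))
```

Run on the example sentence, the filter keeps only the words that carry content: optimizing, paragraph, search, engine.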

Inside the Grid

Now our document has a much-reduced collection of words to which we can apply our statistical methodology, and we can start to index this collection. The usual way of representing it is a grid, or matrix; this is why experts describe the LSI method as 'thinking inside the grid'. The grid lists the documents along one axis and the words contained in the documents along the other; in the grids below, each row is a document and each column is a keyword.

For a conventional keyword search, we just put a cross (x) in the cell where a document's row meets a keyword's column if the word appears in that document, and leave the cell blank if it does not. The grid then looks like this:

Document name / Keywords contained   Elevation   Topography   Height   Tiger
GIS mapping                                      x            x
Topology                                         x            x
Rainfall harvesting                              x            x
Poetries of William Blake

Clearly, a cell may contain only a cross or a blank; there is no middle ground. This gives us a keyword-search analysis of our documents. Note that a word is left out, or filed under a different column head, when its form varies, say when 'topologies' appears in the document rather than 'topology'. If, instead of recording the mere presence of each keyword in a document, we count how many times the word appears in that document, the grid may look something like this:

Document name / Keywords contained   Elevation   Topography   Height   Tiger
GIS mapping                          6           8            6        1
Topology                             6           6            3        0
Rainfall harvesting                  2           3            7        0
Poetries of William Blake            0           0            0        5
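
A count grid of this kind can be built mechanically from tokenized documents. The sketch below assumes the documents have already been reduced to content words; the two toy documents and the function names are hypothetical and are not the data behind the grid above.

```python
from collections import Counter

KEYWORDS = ["elevation", "topography", "height", "tiger"]

def count_grid(docs, keywords=KEYWORDS):
    """Return {document name: [count of each keyword]} for tokenized documents."""
    grid = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        grid[name] = [counts[k] for k in keywords]
    return grid

def presence_grid(grid):
    """Collapse counts to the cross-or-blank form used for keyword search."""
    return {name: ["x" if c > 0 else "" for c in row] for name, row in grid.items()}

# Hypothetical toy documents, already reduced to content words.
docs = {
    "GIS mapping": ["elevation", "topography", "topography", "height"],
    "Poetries of William Blake": ["tiger", "tiger", "forest"],
}
counts = count_grid(docs)
print(counts)
print(presence_grid(counts))
```

The same counts feed both views: the cross-or-blank grid only records whether a count is non-zero, while the count grid keeps how often each keyword occurs.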

 

These figures carry mathematical meaning. We can calculate the mean, median, and mode of the occurrences of particular words across the documents, and the correlation between them, which gives us a detailed analysis of our document collection. In LSI, we do exactly this. After removing unnecessary words from the documents, we generate the term-document matrix. A graphical representation of this matrix gives the term space, which has as many dimensions as there are content words, because plotting the matrix requires one axis per content word.
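
As a rough illustration of the kind of statistics meant here, the sketch below computes the mean count of a keyword and the Pearson correlation between two keyword columns, using the counts from the grid above. The choice of Pearson correlation is an assumption; the article does not say which correlation measure is intended.

```python
from math import sqrt

# Counts copied from the grid above; each list follows the document order
# GIS mapping, Topology, Rainfall harvesting, Poetries of William Blake.
COUNTS = {
    "Elevation":  [6, 6, 2, 0],
    "Topography": [8, 6, 3, 0],
    "Height":     [6, 3, 7, 0],
    "Tiger":      [1, 0, 0, 5],
}

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

print("mean Elevation count:", mean(COUNTS["Elevation"]))
print("Elevation vs Topography:", round(pearson(COUNTS["Elevation"], COUNTS["Topography"]), 2))
print("Elevation vs Tiger:", round(pearson(COUNTS["Elevation"], COUNTS["Tiger"]), 2))
```

On these counts, Elevation and Topography rise and fall together across the documents, while Elevation and Tiger move in opposite directions.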

Following this theory, if we analyse a real-life document collection and note the occurrences of each content word, we get a large number of relevant content words. If these are recorded in a matrix, as above, and plotted on a graph, the resulting term space has a correspondingly large number of dimensions. Each document in the collection is treated as a vector with its content-word counts as components. Documents with many words in common have vectors that lie near each other and are therefore judged to be semantically close. Documents with few words in common have vectors that lie far apart and are therefore semantically distant.
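
One common way to measure how near two document vectors lie is the cosine of the angle between them. The article does not name a distance measure, so cosine similarity is an assumption here; the vectors are the rows of the count grid above.

```python
from math import sqrt

# Document vectors taken from the count grid above
# (components: Elevation, Topography, Height, Tiger).
DOCS = {
    "GIS mapping": [6, 8, 6, 1],
    "Topology": [6, 6, 3, 0],
    "Rainfall harvesting": [2, 3, 7, 0],
    "Poetries of William Blake": [0, 0, 0, 5],
}

def cosine(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = sqrt(sum(a * a for a in u))
    length_v = sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)

for other in ("Topology", "Rainfall harvesting", "Poetries of William Blake"):
    print("GIS mapping vs", other, round(cosine(DOCS["GIS mapping"], DOCS[other]), 2))
```

With these counts, "GIS mapping" scores much closer to "Topology" than to "Poetries of William Blake", which matches the intuition that documents sharing many words are semantically close.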

Such a space can be described mathematically, although it is difficult to visualize. If you do try to visualize this multi-dimensional space, however, you gain another interesting insight into LSI. Try looking at a tree branch full of green leaves. Since leaves sprout in every possible direction, you will never manage to see all of them. From whichever angle you look at the branch, a few leaves will be hidden behind others, so you can never see all the leaves at once.

This idea can be thought of as 'loss of information', and it is a similar picture that you can use to visualize your n-dimensional term space. From whichever angle you look, some vectors in the n-dimensional term space always overlap others, and the boundaries blur or collapse. In other words, similar keywords or content words lose their distinct identity and get squeezed together. Hence the differences between singulars and plurals, or between synonyms and similar-meaning words, tend toward zero.
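
The article does not name the mathematics behind this squeezing. In the usual textbook description of LSI it is a truncated singular value decomposition (SVD), which keeps only the few strongest directions of the term space. The sketch below is an illustration under that assumption: it uses plain power iteration with deflation so that it runs without external libraries, all function names are hypothetical, and a real system would call a linear algebra library instead.

```python
from math import sqrt

def matvec(a, v):
    """Multiply matrix a (a list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def norm(v):
    """Euclidean length of a vector."""
    return sqrt(sum(x * x for x in v))

def top_singular_triplet(a, iterations=200):
    """Power iteration on A^T A: largest singular value with its vectors."""
    at = [list(col) for col in zip(*a)]
    n = len(a[0])
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        length = norm(w)
        if length == 0.0:
            break
        v = [x / length for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av] if sigma > 0.0 else av
    return sigma, u, v

def truncated_svd(a, k):
    """Return the k strongest (sigma, u, v) triplets, found by deflation."""
    residual = [[float(x) for x in row] for row in a]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        triplets.append((sigma, u, v))
        # Subtract the component just found so the next pass finds the next one.
        residual = [[residual[i][j] - sigma * u[i] * v[j]
                     for j in range(len(v))] for i in range(len(u))]
    return triplets

def reduced_doc_coords(triplets):
    """Coordinates of each document (row) in the reduced k-dimensional space."""
    n_docs = len(triplets[0][1])
    return [[sigma * u[i] for sigma, u, _v in triplets] for i in range(n_docs)]

# Rows are documents, columns are keywords, as in the count grid above.
NAMES = ["GIS mapping", "Topology", "Rainfall harvesting", "Poetries of William Blake"]
COUNTS = [[6, 8, 6, 1], [6, 6, 3, 0], [2, 3, 7, 0], [0, 0, 0, 5]]

triplets = truncated_svd(COUNTS, k=2)
for name, coords in zip(NAMES, reduced_doc_coords(triplets)):
    print(name, [round(c, 2) for c in coords])
```

Each document is now described by two numbers instead of four. Keeping fewer dimensions than there are keywords is the deliberate loss of information described above: keywords that tend to occur together end up sharing a direction.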

One thing to note here is that, although loss of information is usually considered a bad thing, it becomes a blessing when it comes to LSI. By exploiting a feature of natural language, namely that similar-meaning words occur together, this technique cuts out noise and unnecessary information. In the end, we can separate the wheat from the chaff.

Every day, Google takes another step towards converting its whole search mechanism into an LSI-enabled one. Although LSI has not been adopted uniformly or in its entirety, and not all searches return a semantic word set yet, the transition is visible in the search results. A search for 'phone' shows results in which the keyword 'phone' is contained and highlighted. However, if you add a tilde (~) before your keyword and search for '~phone', the results include the Web site for Nokia, with the word 'Nokia' now highlighted. From its new method of indexing, Google has determined that Nokia is relevant to phone.

About The Author

Vikas Malhotra is an Internet marketing consultant who has helped over 100 businesses increase their visibility and sales online. He is an online branding expert with experience in online media planning and buying, and in search engine marketing. To learn more, visit mosaic-service.com.

                 

Get in Touch

Call an expert on: +91 11 41623530/31
E-mail : vikas[at]mosaic-service.com

Using This Article

If you want to publish this article on your own web site, please use the whole article up to and including About the Author.


Mosaic ITES Services Pvt. Ltd., A-14, Sector 7, Noida, UP 201301, India.

Tel: (0120) 4626500 Email: sales[at]mosaic-service.com