A Data Science Central Community
Natural Language Processing (NLP) is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language. NLP is considered a sub-field of artificial intelligence and has significant overlap with computational linguistics; it is concerned with the interactions between computers and human (natural) languages.
Natural language generation systems convert information from computer databases into readable human language, while natural language understanding systems convert human language into representations that are easier for computer programs to manipulate. NLP encompasses both text and speech, but work on speech processing has evolved into a separate field.
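The two directions above can be sketched in a few lines. This is a toy illustration of the generation/understanding round trip, not any particular system: the record fields, the sentence template, and the pattern used to parse it back are all made-up examples.

```python
import re

# Natural language generation: a structured database record -> a readable sentence.
# The record fields and the template are hypothetical examples.
def generate(record):
    return f"{record['city']} will be {record['condition']} with a high of {record['high']} degrees."

# Natural language understanding: a sentence -> a structured representation.
# Real NLU systems use parsing and statistical models; this bare-bones
# pattern match only shows the direction of the mapping.
def understand(sentence):
    m = re.match(r"(\w+) will be (\w+) with a high of (\d+) degrees\.", sentence)
    return {"city": m.group(1), "condition": m.group(2), "high": int(m.group(3))}

record = {"city": "Paris", "condition": "sunny", "high": 24}
sentence = generate(record)
print(sentence)
print(understand(sentence))
```

The point of the round trip is that the structured form, not the sentence, is what downstream programs can easily manipulate.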
Why Natural Language Processing? Applications that process large amounts of text require NLP expertise.
For computer systems this task is hard. When people see text, they understand its meaning (by and large). For example: According to research, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf but the wrod as a wlohe. When computers see text, they get only character strings (and perhaps HTML tags). We would like computer agents to see meanings and to process text intelligently. These desires have led to many proposals for structured, semantically marked-up formats, but human beings still resolutely use plain text in human languages. This problem is not likely to simply go away.
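The gap described above is easy to demonstrate: to a program, text starts out as nothing but a character sequence, and even the first step toward "seeing meaning" is imposing structure on that sequence. The strings below are just illustrative examples.

```python
# A computer initially sees text only as a sequence of characters.
# To a string comparison, a scrambled word and the intended word are
# simply different character sequences, even though a human reads
# both the same way.
scrambled = "iprmoetnt"
intended = "important"
print(scrambled == intended)  # no notion of meaning, only characters

# The first step toward structure is splitting the raw string into tokens.
text = "We'd like computer agents to see meanings."
tokens = text.lower().replace(".", "").split()
print(tokens)
```

Everything beyond this point, from assigning parts of speech to resolving meaning, has to be built on top of such raw character data.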
NLP is difficult because language is flexible: new words and new meanings appear constantly, and the same words mean different things in different contexts. Language is also subtle and complex, with many hidden variables, including knowledge of the world, knowledge of the context, and knowledge of the conventions of human communication (for example, "can you tell me the time?" is a request, not a yes/no question). There is a problem of scale (an effectively infinite space of possible words, meanings, and contexts), a problem of sparsity (statistical analysis is hard because most words, phrases, and concepts in new text have never been seen before), long-range correlations, and more.
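The sparsity problem mentioned above shows up in almost any sample of running text: a large share of the distinct words occur only once, so frequency estimates for most words rest on very little evidence. A quick count makes this visible; the sample text below is arbitrary, and any larger corpus shows the same pattern even more strongly.

```python
from collections import Counter

# Count word frequencies in a small sample of running text and
# report how many distinct words are "hapaxes" (occur exactly once).
text = """Natural language processing is difficult because language is
flexible: new words appear constantly, meanings shift with context,
and communication relies on hidden knowledge of the world."""

cleaned = text.lower().replace(":", " ").replace(",", " ").replace(".", " ")
counts = Counter(cleaned.split())
hapaxes = [w for w, c in counts.items() if c == 1]
print(f"{len(hapaxes)} of {len(counts)} distinct words occur only once")
```

Even in this tiny sample, the overwhelming majority of distinct words appear a single time, which is why purely statistical treatment of rare words is so hard.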
In this area, Teradata offers Aster analytic solutions involving Attensity, which make it easy to handle, analyze, and derive meaning from large volumes of textual data. Specifically, they apply linguistic principles to extract the context of entities and their relationships, much as a human would; they automatically detect and extract entities such as names, places...; and they apply custom classification rules to categorize texts by content, rank them by relevance, and surface information. These capabilities can also be combined with historical transaction or contact data: by understanding what customers have expressed on the web, what has gone wrong, or what interests them, a company can define appropriate communications and offers, or identify high-potential target customers.
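To make the idea of "custom classification rules" concrete, here is a minimal sketch of rule-based text classification: keyword rules assign each text a category and a crude relevance score. The categories, keywords, and scoring below are invented for illustration and do not reflect Attensity's or Aster's actual rule language.

```python
# Hypothetical keyword rules: category -> trigger words.
RULES = {
    "complaint": ["broken", "refund", "disappointed"],
    "interest": ["pricing", "demo", "upgrade"],
}

def classify(text):
    """Return (best category, score), where the score counts rule hits."""
    words = text.lower().split()
    scores = {cat: sum(words.count(kw) for kw in kws) for cat, kws in RULES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

print(classify("The screen arrived broken and I want a refund"))
```

Real systems layer linguistic analysis (negation, entity context, relevance ranking) on top of this kind of rule matching, but the basic shape, rules in, category and score out, is the same.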
To go further on this subject, you can consult the link below: