What is Big Data: we have collected the most important things about big data, its characteristics, classification, and examples, and how social networks work with it.

It was predicted that the total global volume of data created and replicated in 2011 could be about 1.8 zettabytes (1.8 trillion gigabytes) - about 9 times more than what was created in 2006.

More complex definition

Nevertheless, big data involves more than just analyzing vast amounts of information. The problem is not that organizations create huge amounts of data, but that most of it arrives in formats that fit poorly into the traditional structured database model: web logs, video, text documents, machine code, or geospatial data. All of this is stored in many different repositories, sometimes even outside the organization. As a result, a corporation can have access to a huge amount of its own data yet lack the tools to establish relationships between these data and draw meaningful conclusions from them. Add the fact that data is now updated more and more frequently, and you get a situation in which traditional methods of analysis cannot keep up with huge volumes of constantly refreshed data. This ultimately paves the way for big data technologies.

Best Definition

In essence, the concept of big data means working with information of huge volume and diverse composition, frequently updated and scattered across different sources, with the aim of increasing operational efficiency, creating new products, and improving competitiveness. The consulting firm Forrester puts it succinctly: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business intelligence and big data?

Craig Baty, Chief Marketing Officer and Chief Technology Officer of Fujitsu Australia, pointed out that business analysis is a descriptive process of examining the results a business achieved in a given period, whereas the processing speed of big data makes analysis predictive, capable of offering the business recommendations for the future. Big data technologies can also analyze more types of data than business intelligence tools, making it possible to look beyond structured storage alone.

Matt Slocum of O'Reilly Radar believes that although big data and business intelligence have the same goal (finding answers to a question), they differ in three aspects.

  • Big data is designed to process larger amounts of information than business intelligence, and this, of course, fits the traditional definition of big data.
  • Big data is designed to process faster and more rapidly changing information, which means deep exploration and interactivity. In some cases, the results are generated faster than the web page loads.
  • Big data is designed to handle unstructured data whose uses we are only beginning to explore once we have learned to collect and store it; we need algorithms and dialogue-driven tools to make it easier to find the trends contained in these arrays.

According to the Oracle Information Architecture: An Architect's Guide to Big Data white paper published by Oracle, we approach information differently when working with big data than when doing business analysis.

Working with big data is unlike the typical business intelligence process, where simply adding up known values yields a result: for example, summing paid invoices gives annual sales. When working with big data, the result is obtained by refining the data through sequential modeling: a hypothesis is put forward; a statistical, visual, or semantic model is built; the hypothesis is tested against it; and then the next one is put forward. This process requires the researcher either to interpret visual meanings, to formulate interactive knowledge-based queries, or to develop adaptive machine-learning algorithms capable of producing the desired result. Moreover, the lifetime of such an algorithm can be quite short.

Big Data Analysis Techniques

There are many different methods for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not claim to be complete, but it reflects the most popular approaches across industries. It should be understood that researchers continue to create new methods and improve existing ones. In addition, some of the techniques listed are not applicable exclusively to big data and can be used successfully on smaller arrays (for example, A/B testing or regression analysis). Of course, the larger and more diverse the array being analyzed, the more accurate and relevant the resulting conclusions.

A/B testing. A technique in which a control sample is compared in turn with other samples. This makes it possible to identify the optimal combination of indicators for achieving, for example, the best consumer response to a marketing offer. Big data makes it possible to run a huge number of iterations and thus obtain a statistically significant result.
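As a sketch of the statistics behind such a comparison, the following hypothetical example checks whether a variant's conversion rate differs significantly from the control's; the function name and visitor counts are invented for illustration.

```python
import math

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: does variant B convert at a
    significantly different rate than control A?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se                               # z statistic

# Control: 200 of 10,000 visitors convert; variant: 260 of 10,000.
z = ab_z_test(200, 10_000, 260, 10_000)
print(round(z, 2))  # |z| > 1.96 suggests significance at the 5% level
```

With big data volumes, even small lifts can clear the significance threshold, which is exactly the advantage of running many iterations on large samples.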

Association rule learning. A set of techniques for identifying relationships (association rules) between variables in large data arrays. Used in data mining.
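A minimal sketch of the idea, assuming simple market baskets (the item names and thresholds are invented): count how often item pairs co-occur, and keep rules whose confidence P(B given A) is high enough.

```python
from itertools import combinations
from collections import Counter

def pair_rules(transactions, min_support=2, min_conf=0.5):
    """Find rules A -> B among item pairs: support counts co-occurrence,
    confidence is the share of A-baskets that also contain B."""
    item_counts = Counter(i for t in transactions for i in set(t))
    pair_counts = Counter(p for t in transactions
                          for p in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):       # try both directions
            conf = n / item_counts[x]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules

baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
print(pair_rules(baskets))
```

Real association-rule miners (Apriori, FP-Growth) generalize this from pairs to arbitrary item sets.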

Classification. A set of techniques for predicting consumer behavior in a particular market segment (purchase decisions, churn, consumption volume, and so on). Used in data mining.

Cluster analysis. A statistical method for classifying objects into groups by identifying common features that are not known in advance. Used in data mining.

Crowdsourcing. A technique for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques for combining data from multiple sources, for example, analyzing user comments on social networks and comparing them with real-time sales results.

Data mining. A set of techniques for identifying the consumer categories most receptive to a promoted product or service, determining the characteristics of the most successful employees, and predicting consumers' behavioral model.

Ensemble learning. This method combines many predictive models, which improves the quality of the resulting predictions.
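The mechanism can be sketched as a majority vote over several deliberately crude, hypothetical models; the churn rules and field names below are invented for illustration.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Ensemble by voting: each model predicts a label,
    and the label chosen by most models wins."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three weak, hand-written rules for flagging a churn-risk customer.
models = [
    lambda c: "churn" if c["months_inactive"] > 2 else "stay",
    lambda c: "churn" if c["complaints"] > 1 else "stay",
    lambda c: "churn" if c["spend_trend"] < 0 else "stay",
]
customer = {"months_inactive": 4, "complaints": 0, "spend_trend": -5}
print(majority_vote(models, customer))
```

Each rule alone is unreliable, but their combined vote is more robust, which is the point of ensembling.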

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can combine and mutate. As in natural evolution, the fittest individuals survive.
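A toy sketch under simple assumptions (the OneMax objective, population size, and mutation rate are arbitrary choices): bit-string "chromosomes" are selected by fitness, crossed over, and occasionally mutated.

```python
import random

def genetic_max(fitness, n_bits=8, pop=20, gens=40, seed=1):
    """Toy genetic algorithm: keep the fitter half of the population,
    then refill it with crossed-over, mutated children."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)  # fittest first
        parents = population[: pop // 2]            # survival of the fittest
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_bits)               # rare point mutation
            child[i] ^= rng.random() < 0.1
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Maximize the number of 1-bits (the classic "OneMax" problem).
best = genetic_max(fitness=sum)
print(best)
```

Swapping in a real fitness function (for example, predicted revenue of a product mix) turns the same loop into a practical optimizer.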

Machine learning. A field of computer science (historically, the name "artificial intelligence" has attached to it) that aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of natural-language recognition techniques borrowed from computer science and linguistics.

Network analysis. A set of techniques for analyzing links between nodes in networks. Applied to social networks, it makes it possible to analyze the relationships between individual users, companies, communities, and so on.
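A minimal sketch of one network metric, degree centrality, on an invented friendship graph: nodes with the most links are the hubs of the network.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count each node's links; highly connected nodes
    are the 'hubs' of the social graph."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

# Hypothetical friendship links between users.
friendships = [("ann", "bob"), ("ann", "carol"), ("ann", "dave"),
               ("bob", "carol")]
print(degree_centrality(friendships))
```

Real social-graph analysis adds weighted edges, community detection, and centrality measures such as PageRank, but all of them start from this link structure.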

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more metrics. Helps in making strategic decisions, for example, about the composition of the product line to bring to market, or in conducting investment analysis.

Pattern recognition. A set of techniques, with elements of self-learning, for predicting consumers' behavioral model.

Predictive modeling. A set of techniques for creating a mathematical model of a predetermined probable scenario. For example, analyzing a CRM system's database for conditions that might push subscribers to switch providers.

Regression. A set of statistical methods for identifying patterns between changes in a dependent variable and one or more independent variables. Often used for forecasting and prediction. Used in data mining.
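A minimal sketch with one independent variable (the numbers are invented): ordinary least squares finds the line relating the variables, which can then be used for forecasting.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical ad spend (thousands) vs. sales; fit, then forecast spend = 6.
a, b = linear_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.0, 8.1, 9.9])
print(round(a + b * 6, 2))   # extrapolated sales
```

Multiple regression extends the same least-squares idea to several independent variables.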

Sentiment analysis. Techniques for assessing consumer sentiment based on natural-language recognition. They make it possible to isolate, from the general information flow, messages related to a subject of interest (for example, a consumer product), and then to evaluate the polarity of each judgment (positive or negative), its degree of emotionality, and so on.
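The simplest lexicon-based variant can be sketched in a few lines; the mini-dictionary and example messages are invented, and production systems use large weighted lexicons and handle negation, sarcasm, and context.

```python
# Hypothetical mini-lexicon of word polarities.
LEXICON = {"great": 1, "love": 1, "fast": 1,
           "terrible": -1, "slow": -1, "broken": -1}

def polarity(message):
    """Score a message by summing word polarities from the lexicon:
    > 0 is positive, < 0 negative, 0 neutral."""
    words = message.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(w, 0) for w in words)

reviews = ["I love this phone, it is fast",
           "Terrible battery and a slow, broken screen"]
print([polarity(r) for r in reviews])
```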

Signal processing. A set of techniques borrowed from radio engineering that aims to recognize a signal against background noise and analyze it further.

Spatial analysis. A set of techniques, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographic coordinates, the geometry of objects. Geographic information systems (GIS) often serve as the source of big data here.

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest on this list is Apache Hadoop, open-source software that over the past five years has been proven as a data analyzer by most stock trackers. As soon as Yahoo opened the Hadoop code to the open-source community, a whole new trend of building Hadoop-based products quickly emerged in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop, and their developers range from startups to well-known global companies.

Markets for Big Data Management Solutions

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze big data, colloquially called Big Data, is perceived as an unambiguous boon. But is it really? What can the unbridled accumulation of data lead to? Most likely to what psychologists, speaking of an individual, call pathological hoarding, or syllogomania, figuratively "Plyushkin's syndrome." In English, the compulsive urge to collect everything is called hoarding (from "hoard," a stockpile). Classifications of mental illness rank hoarding as a mental disorder. In the digital age, digital hoarding is added to traditional material hoarding, and both individuals and entire enterprises and organizations can suffer from it.

World and Russian market

Big data landscape - Main providers

Nearly all the leading IT companies have shown interest in tools for collecting, processing, managing, and analyzing big data, which is quite natural. First, they encounter the phenomenon directly in their own business; second, big data opens up excellent opportunities for developing new market niches and attracting new customers.

A lot of startups have appeared on the market that do business on processing huge amounts of data. Some of them use ready-made cloud infrastructure provided by large players like Amazon.

Theory and practice of Big Data in industries

The history of development

2017

TmaxSoft forecast: the next "wave" of Big Data will require DBMS modernization

Businesses know that the huge amounts of data they accumulate contain important information about their business and customers. If a company can successfully apply this information, it will have a significant advantage over its competitors and will be able to offer better products and services. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the necessary storage capacity, data exchange processes, utilities, and applications needed to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, increasing the processing power needed to analyze ever-increasing volumes of data can require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could be used to develop new applications and services.

On February 5, 2015, the White House released a report discussing how companies use big data to set different prices for different buyers, a practice known as "price discrimination" or "differential pricing" (personalized pricing). The report describes the benefits of big data for both sellers and buyers and concludes that many of the issues raised by the advent of big data and differential pricing can be addressed within existing anti-discrimination and consumer-protection laws and regulations.

The report notes that at this time, there is only anecdotal evidence of how companies are using big data in the context of individualized marketing and differentiated pricing. This information shows that sellers use pricing methods that can be divided into three categories:

  • studying the demand curve;
  • Steering and differentiated pricing based on demographics; and
  • target behavioral marketing (behavioral targeting - behavioral targeting) and individualized pricing.

Studying the demand curve: To understand demand and consumer behavior, marketers often run experiments in which customers are randomly assigned one of two possible price categories. "Technically, these experiments are a form of differential pricing because they result in different prices for different customers, even if they are 'non-discriminatory' in the sense that all customers have the same chance of 'hitting' the higher price."
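Such an experiment is easy to simulate. In this hypothetical sketch the two price points and purchase probabilities are invented; each customer is randomly shown one of the two prices, and comparing conversion rates traces out two points on the demand curve.

```python
import random

def price_experiment(n_customers, buy_prob_at, seed=42):
    """Randomly assign each customer one of two prices and record
    whether they buy; conversion per price approximates demand."""
    rng = random.Random(seed)
    results = {9.99: [0, 0], 14.99: [0, 0]}   # price -> [buyers, shown]
    for _ in range(n_customers):
        price = rng.choice(list(results))      # random, non-discriminatory
        bought = rng.random() < buy_prob_at[price]
        results[price][0] += bought
        results[price][1] += 1
    return {p: buyers / shown for p, (buyers, shown) in results.items()}

# Hypothetical demand: fewer customers buy at the higher price.
rates = price_experiment(10_000, {9.99: 0.30, 14.99: 0.18})
print(rates)
```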

Steering: This is the practice of presenting products to consumers based on their membership in a certain demographic group. For example, a computer company's website may offer the same laptop to different types of customers at different prices based on the information they provide about themselves (for example, whether the user represents a government agency, a scientific or commercial institution, or is an individual) or on their geographic location (for example, determined by the computer's IP address).

Targeted behavioral marketing and customized pricing: In these cases, buyers' personal data is used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data collected by advertising networks and third-party cookies about user activity on the Internet to target their advertising materials. On the one hand, this approach lets consumers receive advertisements for goods and services of interest to them; on the other, it may worry consumers who do not want certain kinds of their personal data (such as information about visits to websites related to medical or financial matters) used without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The report speculates that this may be because the methods are still being developed, or because companies are reluctant to adopt individualized pricing (or prefer to keep quiet about it), possibly fearing a backlash from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While acknowledging that there are issues of transparency and discrimination when using big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also highlights the need for “ongoing scrutiny” when companies use confidential information in a non-transparent manner or in ways that are not covered by the existing regulatory framework.

This report continues the White House's efforts to study the use of big data and discriminatory pricing on the Internet and their consequences for American consumers. The White House working group on Big Data had released its report on the subject in May 2014. The Federal Trade Commission (FTC) also addressed these issues during its September 2014 workshop on discrimination connected with the use of big data.

2014

Gartner demystifies Big Data

A fall 2014 policy brief from Gartner lists and debunks a number of myths about Big Data that are common among CIOs.

  • Everyone implements Big Data processing systems faster than us

Interest in Big Data technologies is at an all-time high: 73% of the organizations surveyed by Gartner analysts this year are already investing in them or planning to do so. But most of these initiatives are still at a very early stage, and only 13% of those surveyed have actually deployed such solutions. The hardest part is figuring out how to monetize Big Data and deciding where to start. Many organizations get stuck in the pilot phase because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it.

Some CIOs believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error does indeed affect the result less, analysts say, but there are also more errors. In addition, most of the analyzed data is external, of unknown structure or origin, so the probability of errors grows. Thus, in the world of Big Data, quality actually matters much more.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format with automatic schema generation as it is read. It is believed that this will allow the analysis of information from the same sources using multiple data models. Many believe that this will also enable end users to interpret any set of data in their own way. In reality, most users often want the traditional out-of-the-box schema where the data is formatted appropriately and there is agreement on the level of information integrity and how it should relate to the use case.

  • Data warehouses do not make sense to use for complex analytics

Many information management system administrators believe that it makes no sense to spend time creating a data warehouse, given that complex analytical systems use new data types. In fact, many sophisticated analytics systems use information from a data warehouse. In other cases, new data types need to be additionally prepared for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation, and the required level of quality - such preparation can take place outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for storage or as critical elements of an analytical infrastructure. The underlying technologies of data lakes lack the maturity and breadth of functionality found in data warehouses. Therefore, leaders responsible for managing data should wait until the lakes reach the same level of development, according to Gartner.

Accenture: 92% of those who implemented big data systems are satisfied with the result

Among the main advantages of big data, respondents named:

  • "search for new sources of income" (56%),
  • "improving customer experience" (51%),
  • "new products and services" (50%) and
  • "an influx of new customers and maintaining the loyalty of old ones" (47%).

When introducing new technologies, many companies have faced traditional problems. For 51%, the stumbling block was security, for 47% - the budget, for 41% - the lack of necessary personnel, and for 35% - difficulties in integrating with the existing system. Almost all surveyed companies (about 91%) plan to soon solve the problem with a shortage of staff and hire big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the internet. 79% of respondents noted that companies that do not deal with big data will lose their competitive advantage.

However, the respondents disagreed on what exactly should be considered big data. 65% of respondents believe that these are “large data files”, 60% are sure that this is “advanced analytics and analysis”, and 50% that this is “data visualization tools”.

Madrid spends 14.7 million euros on big data management

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The cost of the project is 14.7 million euros, and the solutions to be implemented will be based on technologies for analyzing and managing big data. With their help, the city administration will manage the work with each service provider and pay accordingly, depending on the level of services.

This concerns the administration's contractors, who monitor the condition of streets, lighting, irrigation, and green spaces, clean up the territory, and remove and process garbage. In the course of the project, 300 key performance indicators for city services have been developed for specially assigned inspectors, on the basis of which 1.5 thousand different checks and measurements will be carried out daily. In addition, the city will start using an innovative technology platform called Madrid iNTeligente (MiNT), Smarter Madrid.

2013

Experts: The peak of fashion for Big Data

All vendors in the data management market, without exception, are currently developing technologies for Big Data management. This new technological trend is being actively discussed by the professional community: developers, industry analysts, and potential consumers of such solutions.

As Datashift found, as of January 2013 the wave of discussion around big data had exceeded all conceivable dimensions. After analyzing the number of mentions of Big Data on social networks, Datashift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. That is the equivalent of 260 posts per hour, with a peak of 3,070 mentions per hour.

Gartner: Every second CIO is ready to spend money on Big data

After several years of experimentation with Big Data technologies and the first implementations in 2013, the adoption of such solutions will increase significantly, Gartner predicts. The researchers surveyed IT leaders around the world and found that 42% of those surveyed had already invested in Big Data technologies or were planning such investments within the next year (data as of March 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is changing rapidly and demands new approaches to information processing. Many companies have already realized that big data arrays are critical, and working with them yields benefits unavailable through traditional information sources and processing methods. In addition, the constant media hype around big data fuels interest in the relevant technologies.

Frank Buytendijk, a vice president at Gartner, even urged companies to temper their enthusiasm, as some worry that they are lagging behind competitors in mastering big data.

“There is no need to worry, the possibilities for realizing ideas based on big data technologies are virtually limitless,” he said.

Gartner predicts that by 2015, 20% of the Global 1000 companies will have a strategic focus on "information infrastructure."

In anticipation of the new opportunities that big data processing technologies will bring, many organizations are already organizing the process of collecting and storing various kinds of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data: email messages, multimedia, and other similar content. According to Gartner, those who learn to handle a wide variety of information sources will win the data race.

Cisco poll: Big Data will help increase IT budgets

The Cisco Connected World Technology Report (spring 2013), conducted in 18 countries by the independent analyst firm InsightExpress, surveyed 1,800 college students and an equal number of young professionals aged 18 to 30. The survey was conducted to gauge the readiness of IT departments to implement Big Data projects and to understand the associated challenges, technological flaws, and strategic value of such projects.

Most companies collect, record and analyze data. However, according to the report, many companies face a range of complex business and information technology challenges in connection with Big Data. For example, 60 percent of those surveyed acknowledge that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent said that they are already getting real strategic benefits from the accumulated information.

More than half of the CIOs surveyed believe Big Data projects will help increase IT budgets in their organizations, as demands on technology, staff, and professional skills will grow. At the same time, more than half of the respondents expected such projects to increase IT budgets in their companies as early as 2012, and 57 percent were confident that Big Data would increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies can affect the speed of distribution of Big Data solutions and the value of these solutions for business.

Companies collect and use data of various types, both structured and unstructured. Here are the sources from which survey participants receive data (Cisco Connected World Technology Report):

Nearly half (48 percent) of CIOs predict that the load on their networks will double over the next two years. (This is especially true in China, where 68 percent of those surveyed hold this point of view, and in Germany, 60 percent.) 23 percent of respondents expect network traffic to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosive growth in network traffic.

27 percent of those surveyed admitted that they need better IT policies and information security measures.

21 percent need more bandwidth.

Big Data opens up new opportunities for IT departments to add value and form close relationships with business units to increase revenue and strengthen the financial position of the company. Big Data projects make IT departments a strategic partner of business departments.

According to 73 percent of respondents, it is the IT department that will become the main engine for implementing the Big Data strategy. At the same time, respondents believe that other departments will also be involved in the implementation of this strategy. First of all, this concerns the departments of finance (named by 24 percent of respondents), research and development (20 percent), operations (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Millions of new jobs needed to manage big data

Global IT spending will reach $3.7 trillion by 2013, up 3.8% from IT spending in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at a much faster pace, according to a Gartner report.

By 2015, 4.4 million jobs in information technology will be created worldwide to serve big data, of which 1.9 million will be in the United States. Moreover, each such job will generate three additional jobs outside IT, so that in the US alone, 6 million people will be working to support the information economy over the next four years.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both private and public educational systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that the role of cultivating qualified IT personnel should be taken on directly by the companies that urgently need them, since such employees will be their ticket into the new information economy of the future.

2012

First skepticism about Big Data

Analysts at Ovum and Gartner suggest that for big data, the trendy topic of 2012, it may be time to let go of illusions.

The term "Big Data" at this time usually refers to the ever-growing volume of information coming online from social media, sensor networks and other sources, as well as the growing range of tools used to process data and identify important business from it. -trends.

"Because of (or in spite of) the hype surrounding the idea of big data, manufacturers in 2012 looked at this trend with great hope," said Tony Baer, an analyst at Ovum.

Baer said that DataSift had conducted a retrospective analysis of big data mentions in social media.

In the Russian-speaking world, both the English term big data and its literal Russian translation are used; the Russian phrase is a calque of the English term. Big data has no strict definition. It is impossible to draw a clear boundary: is it 10 terabytes or 10 megabytes? The very name is subjective. The word "big" is like the "one, two, many" of primitive tribes.

However, there is an established view that big data is a set of technologies designed to perform three operations. First, to process volumes of data larger than in "standard" scenarios. Second, to work with rapidly arriving data in very large volumes: there is not just a lot of data, there is constantly more and more of it. Third, to work with structured and poorly structured data in parallel and in different aspects. Big data assumes that algorithms receive a stream of information that is not always structured, and that more than one idea can be extracted from it.

A typical example of big data is the information coming from physical experimental facilities, which produce huge amounts of data continuously. Scientists use these streams to solve many problems in parallel.

Big data entered the public space because these data came to affect almost everyone, not just the scientific community, where such problems had long been solved. Big data technology reached the public when the conversation turned to a very specific number: the planet's 7 billion inhabitants, gathering in social networks and other projects that aggregate people. On YouTube, Facebook, and VKontakte, the number of users is measured in billions, and the number of operations they perform simultaneously is huge. The data flow in this case consists of user actions, for example, the data of the same YouTube hosting service flowing over the network in both directions. Processing means not only interpretation but also the ability to handle each of these actions correctly, that is, to put it in the right place and make the data quickly available to every user, since social networks do not tolerate waiting.

Much of what concerns big data, including the approaches used to analyze it, has actually been around for a long time. For example, processing images from surveillance cameras, where we deal not with a single picture but with a data stream. Or robot navigation. All this has existed for decades; it is just that data processing tasks now touch far more people and ideas.

Many developers are accustomed to working with static objects and thinking in terms of states. In big data, the paradigm is different. You have to be able to work with an unceasing stream of data, and this is an interesting task. It affects more and more areas.
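The difference in paradigm can be sketched as follows: instead of loading a finished array and computing over its state, the code consumes a potentially unbounded iterator while keeping only constant-size state (the sensor readings below are invented).

```python
def running_mean(stream):
    """Process an unbounded stream one event at a time, keeping only
    constant state (count and total) instead of storing the whole array."""
    count = total = 0
    for value in stream:
        count += 1
        total += value
        yield total / count          # an up-to-date answer at every step

# Simulated sensor stream; in production this iterator would never end.
readings = iter([10, 20, 30, 40])
print(list(running_mean(readings)))
```

The same generator works whether the stream holds four readings or four billion, which is what makes this style suitable for unceasing data flows.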

In our lives, more and more hardware and software are starting to generate a large amount of data - for example, the "Internet of Things".

Things already generate huge flows of information. The Potok police system receives information from all cameras and lets cars be found using this data. Fitness bracelets, GPS trackers and other devices that serve people and businesses are coming into ever wider use.

The Moscow Department of Informatization is recruiting a large number of data analysts, because there are a lot of statistics on people and they are multi-criteria (statistics are collected about each person and each group of people according to a very large number of criteria). Regularities and trends must be found in these data. Such tasks call for mathematicians with an IT education, because in the end the data is stored in structured DBMSs, and you need to be able to query them and extract information.
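As a rough illustration of that last point, here is a minimal sketch of querying multi-criteria statistics from a structured DBMS with Python's built-in sqlite3 module; the schema and the numbers are invented for the example.

```python
import sqlite3

# Hypothetical schema: one row per (person, metric) observation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (person_id INTEGER, district TEXT, metric TEXT, value REAL)")
rows = [
    (1, "Center", "trips_per_day", 4.0),
    (2, "Center", "trips_per_day", 2.0),
    (3, "North",  "trips_per_day", 6.0),
]
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", rows)

# A "regularity across a group": the average metric per district.
cur = conn.execute(
    "SELECT district, AVG(value) FROM stats "
    "WHERE metric = 'trips_per_day' GROUP BY district ORDER BY district"
)
print(cur.fetchall())  # [('Center', 3.0), ('North', 6.0)]
```

The point is not the toy query itself but the skill the text describes: knowing how to turn a question about a group of people into an aggregation over structured storage.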

Previously, we did not consider big data a problem for the simple reason that there was nowhere to store it and no networks to transmit it. When these opportunities appeared, the data immediately filled the entire volume provided to them. But no matter how much bandwidth and storage capacity are expanded, there will always be sources - physics experiments, for example, or simulations of airflow over a wing - that produce more information than we can transmit. According to Moore's law, the performance of modern parallel computing systems is steadily increasing, and so are the speeds of data transmission networks. However, data must also be saved to and retrieved from media (hard drives and other types of memory) quickly, and this is yet another challenge of big data processing.

The term "Big Data" may be recognizable today, but there is still quite a bit of confusion about what it actually means. In truth, the concept is constantly evolving and being redefined, as it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science, and the Internet of Things. But what is Big Data technology, and how is it changing our world? Let's try to understand the essence of Big Data and what it means in simple words.

The Amazing Growth of Big Data

It all started with an “explosion” in the amount of data we have created since the dawn of the digital age. This is largely due to the development of computers, the Internet and technologies that can "snatch" data from the world around us. Data by itself is not a new invention. Even before the era of computers and databases, we used paper transaction records, client records, and archive files, which are data. Computers, especially spreadsheets and databases, have made it easy for us to store and organize data on a large scale. All of a sudden, information is available at the click of a mouse.

However, we have come a long way from the original tables and databases. Today, every two days we create as much data as was created from the dawn of time up to the year 2000. That's right, every two days. And the amount of data we create continues to skyrocket: by 2020, the amount of available digital information will grow from about 5 zettabytes to 20 zettabytes.

Nowadays, almost every action we take leaves its mark. We generate data whenever we access the Internet, when we carry our smartphones equipped with a search engine, when we talk with our acquaintances through social networks or chats, etc. In addition, the amount of machine-generated data is also growing rapidly. Data is generated and shared when our smart home devices communicate with each other or with their home servers. Industrial equipment in plants and factories is increasingly equipped with sensors that accumulate and transmit data.

The term "Big Data" refers to the collection of all this data and our ability to use it to our advantage in a wide range of areas, including business.

How does Big Data technology work?

Big Data works on the principle that the more you know about a particular subject or phenomenon, the more reliably you can reach a new understanding and predict what will happen in the future. As more data points are compared, relationships that were previously hidden emerge, and these relationships allow us to learn and make better decisions. Most often this is done through a process that involves building models from the data we can collect and then running simulations that tweak the values of the data points each time and observe how they affect the results. This process is automated: modern analytics technologies run millions of these simulations, tweaking every possible variable until they find a model - or idea - that helps solve the problem at hand.
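A toy version of this "tweak every variable and see what happens" loop might look as follows; the spend/sales numbers and the parameter grid are invented for illustration, and a real system would sweep far more variables.

```python
import itertools

# Toy observed data: ad spend vs. sales (hypothetical numbers,
# generated from sales = 2*spend + 1).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

def error(a, b):
    """Sum of squared errors for the candidate model sales = a*spend + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(spend, sales))

# "Run the simulation, tweaking every possible variable":
# try every (a, b) combination on a grid and keep the best fit.
grid = itertools.product([0.5 * k for k in range(9)], repeat=2)
best = min(grid, key=lambda ab: error(*ab))
print(best)  # (2.0, 1.0) - the model that explains the data
```

Each grid point is one "simulation"; the automated search simply keeps the parameter combination whose predictions deviate least from the observations.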

(Image: Bill Gates atop a stack of paper equal to the contents of one CD)

Until recently, data was limited to spreadsheets or databases - and everything was very organized and tidy. Anything that could not be easily organized into rows and columns was considered too complex to work with and was ignored. However, progress in storage and analytics means that we can now capture, store and process large amounts of data of various types. As a result, "data" today can mean anything from databases to photographs, videos, sound recordings, written texts, and sensor data.

To make sense of all this messy data, projects based on Big Data often use cutting-edge analytics involving artificial intelligence and machine learning. By teaching computers to determine what particular data represents - for example, through pattern recognition or natural language processing - we can teach them to identify patterns much faster and more reliably than we ourselves can.
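As a deliberately simplified sketch of pattern recognition (a nearest-centroid classifier in plain Python, not any production ML library), the idea of teaching a machine to label data from examples fits in a few lines; the "cat"/"dog" measurements are invented.

```python
# Each class is summarized by the mean of its training points;
# a new point is assigned to the closest class centroid.
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(point, centroids):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

training = {
    "cat": [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0)],
    "dog": [(3.0, 3.1), (2.9, 3.3), (3.2, 2.8)],
}
centroids = {label: centroid(pts) for label, pts in training.items()}
print(classify((1.05, 1.1), centroids))  # cat
print(classify((3.1, 3.0), centroids))   # dog
```

Real systems replace the two-dimensional toy features with thousands of dimensions extracted from images or text, but the principle - learn a pattern from labeled examples, then match new data against it - is the same.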

How is Big Data used?

This ever-increasing flow of information about sensor data, text, voice, photo and video data means that we can now use data in ways that were unimaginable just a few years ago. This brings revolutionary changes to the business world in almost every industry. Companies today can predict, with incredible accuracy, which specific categories of customers will want to make an acquisition, and when. Big Data also helps companies perform their activities much more efficiently.

Even outside of business, Big Data projects are already helping to change our world in a variety of ways:

  • Improving healthcare – Data-driven medicine is able to analyze vast amounts of medical information and images for models that can help detect disease at an early stage and develop new drugs.
  • Predicting and responding to natural and man-made disasters. Sensor data can be analyzed to predict where earthquakes might occur, and human behavior patterns provide clues that help organizations provide assistance to survivors. Big Data technology is also being used to track and protect the flow of refugees from war zones around the world.
  • Preventing crime. Police forces are increasingly using data-driven strategies that include their own intelligence and information from open access to make better use of resources and to take countermeasures where necessary.

The best books about Big-Data technology

  • Everybody Lies. Search engines, Big Data and the Internet know everything about you.
  • Big Data. All the technology in one book.
  • The Happiness Industry. How Big Data and new technologies help add emotion to goods and services.
  • A Revolution in Analytics. How to improve your business with operational analytics in the era of Big Data.

Problems with Big Data

Big Data gives us unprecedented insights and opportunities, but it also raises issues and questions that need to be addressed:

  • Data Privacy – The Big Data we generate today contains a lot of information about our personal lives that we have every right to keep private. More and more often, we are asked to strike a balance between the amount of personal data we disclose and the convenience that applications and services based on the use of Big Data offer.
  • Data Protection - Even if we think we're fine with someone having our data for a specific purpose, can we trust them to keep our data safe and secure?
  • Data discrimination - when all the information is known, will it be acceptable to discriminate against people based on data from their personal lives? We already use credit scores to decide who can borrow money, and insurance is heavily data-driven too. We should expect to be analyzed and evaluated in more detail, but care should be taken that this does not complicate the lives of people who have fewer resources and limited access to information.

Addressing these issues is an important part of Big Data, and organizations that want to use such data need to deal with them. Failure to do so can leave a business vulnerable, not only in terms of its reputation, but also legally and financially.

Looking to the future

Data is changing our world and our lives at an unprecedented pace. If Big Data is capable of all this today, just imagine what it will be capable of tomorrow. The amount of data available to us will only increase, and analytics technology will become even more advanced.

For businesses, the ability to apply Big Data will become increasingly critical in the coming years. Only those companies that view data as a strategic asset will survive and thrive. Those who ignore this revolution risk being left behind.



According to Research & Trends

Big Data ("big data") has been the talk of the town in the IT and marketing press for several years now - and understandably so: digital technologies have permeated the life of the modern person, and "everything is written down." The volume of data on various aspects of life is growing, and with it the possibilities for storing information.

Global technologies for information storage

Source: Hilbert and Lopez, "The world's technological capacity to store, communicate, and compute information," Science, 2011.

Most experts agree that accelerating data growth is an objective reality. Social networks, mobile devices, measuring devices and business information are just a few of the sources that can generate huge amounts of information. According to the IDC Digital Universe study published in 2012, over the following 8 years the amount of data in the world will reach 40 ZB (zettabytes), equivalent to 5,200 GB per inhabitant of the planet.

Growth of collected digital information in the USA


Source: IDC

A significant part of the information is not created by people, but by robots interacting both with each other and with other data networks, such as, for example, sensors and smart devices. At this rate of growth, the amount of data in the world, according to researchers, will double every year. The number of virtual and physical servers in the world will grow tenfold due to the expansion and creation of new data centers. As a result, there is a growing need for efficient use and monetization of this data. Since the use of Big Data in business requires considerable investment, it is necessary to clearly understand the situation. And it is, in essence, simple: you can increase business efficiency by reducing costs and/or increasing sales.

What is Big Data for?

The Big Data paradigm defines three main types of tasks.

  • Storing and managing hundreds of terabytes or petabytes of data that conventional relational databases cannot efficiently use.
  • Organization of unstructured information consisting of texts, images, videos and other types of data.
  • Big Data analysis, which raises the question of how to work with unstructured information, the generation of analytical reports, and the implementation of predictive models.

The Big Data project market intersects with the business intelligence (BI) market, whose global volume in 2012, according to experts, amounted to about 100 billion dollars. It includes network technologies, servers, software and technical services.

The use of Big Data technologies is also relevant for revenue assurance (RA) solutions designed to automate companies' operations. Modern revenue assurance systems include tools for detecting inconsistencies and for in-depth data analysis, allowing timely detection of possible losses or distortions of information that could lower financial results. Against this background, Russian companies, confirming demand for Big Data technologies in the domestic market, note that the factors stimulating the development of Big Data in Russia are data growth, faster managerial decision-making and improvement in decision quality.

What prevents working with Big Data

Today, only 0.5% of accumulated digital data is analyzed, even though there are objectively industry-wide tasks that could be solved with analytical solutions of the Big Data class. Developed IT markets already have results by which to evaluate the expectations associated with accumulating and processing big data.

One of the main factors that slows down the implementation of Big Data projects, in addition to high cost, is the problem of choosing the data to be processed: that is, the definition of what data should be extracted, stored and analyzed, and which should not be taken into account.

Many business representatives note that difficulties in implementing Big Data projects stem from a shortage of specialists - marketers and analysts. The return on investment in Big Data depends directly on the quality of work of the employees engaged in deep and predictive analytics. The huge potential of data that already exists in an organization often cannot be used effectively by marketers themselves because of outdated business processes or internal regulations. Big Data projects are therefore often perceived by businesses as difficult not only to implement but also to evaluate: to assess the value of the collected data. The specifics of working with data require marketers and analysts to shift their attention from technology and reporting to solving specific business problems.

Due to the large volume and high speed of the data flow, the collection process involves real-time ETL procedures. For reference: ETL (from the English Extract, Transform, Load) is one of the main processes in data warehouse management, comprising the extraction of data from external sources, their transformation, and cleaning to meet business needs. ETL should be viewed not only as a process of transferring data from one application to another, but also as a tool for preparing data for analysis.
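A minimal ETL sketch in Python, assuming a CSV source and a SQLite warehouse (both hypothetical), might look like this - each stage maps to one letter of the acronym:

```python
import csv
import io
import sqlite3

# A tiny, invented "external source" with messy whitespace and a bad row.
raw = "name,amount\n Alice ,10\nBob,\n Carol ,7\n"

# Extract: pull raw records out of the source.
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: clean whitespace, cast types, drop records that fail validation.
clean = [
    (r["name"].strip(), int(r["amount"]))
    for r in rows
    if r["amount"].strip()
]

# Load: write the prepared records into the warehouse table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO payments VALUES (?, ?)", clean)

print(db.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())  # (2, 17)
```

The transform stage is where "preparing data for analysis" happens: the bad row is rejected and the loaded table is already in the shape the analyst needs.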

The security of data coming from external sources must then be ensured with solutions that match the volume of the information collected. Since Big Data analysis methods so far develop only in response to growth in data volumes, the ability of analytical platforms to use new methods of preparing and aggregating data plays an important role. This suggests that, for example, data about potential buyers, or a massive data warehouse with a history of clicks on online store sites, can be of interest for solving a variety of problems.

Difficulties do not stop

Despite all the difficulties of implementing Big Data, business intends to increase investment in this area. According to Gartner, in 2013, 64% of the world's largest companies had already invested, or planned to invest, in deploying Big Data technologies for their business, up from 58% in 2012. According to the Gartner study, the industries leading in Big Data investment are media companies, telecoms, the banking sector and service companies. Successful results of Big Data implementation have already been achieved by many major retail players using data obtained from RFID tools, logistics and replenishment systems, and loyalty programs. Successful retail experience stimulates other market sectors to find new effective ways to monetize big data and to turn its analysis into a resource that works for business development. Thanks to this, experts predict that through 2020 investment in data management and storage will decrease from $2 to $0.2 per gigabyte, while investment in the study and analysis of the technological properties of Big Data will grow by only 40%.

The costs presented in various investment projects in the field of Big Data are of a different nature. Cost items depend on the types of products that are selected based on certain decisions. The largest part of the costs in investment projects, according to experts, falls on products related to the collection, structuring of data, cleaning and information management.

How it's done

There are many combinations of software and hardware that allow you to create effective Big Data solutions for various business disciplines: from social media and mobile applications, to business data mining and visualization. An important advantage of Big Data is the compatibility of new tools with databases widely used in business, which is especially important when working with cross-disciplinary projects, such as organizing multi-channel sales and customer support.

The sequence of working with Big Data consists of collecting data, structuring the information received using reports and dashboards, extracting insights and context, and formulating recommendations for action. Since working with Big Data implies high data-collection costs whose result is not known in advance, the main task is to understand clearly what the data is for, not how much of it is available. Data collection then becomes a process of obtaining only the information needed to solve specific problems.
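The collect-structure-insight sequence can be shown in miniature; the channels and conversion flags below are invented, and the "business question" driving the collection is which marketing channel converts best.

```python
from collections import Counter

# Collect: events gathered for one specific question -
# hypothetical (channel, converted?) pairs.
events = [
    ("email", True), ("email", False), ("email", True),
    ("social", True), ("social", False), ("social", False),
]

# Structure: aggregate the raw events into counts.
visits = Counter(ch for ch, _ in events)
conversions = Counter(ch for ch, ok in events if ok)

# Insight: a dashboard-style metric - conversion rate per channel -
# which directly supports a recommendation for action.
report = {ch: conversions[ch] / visits[ch] for ch in visits}
print({ch: round(rate, 2) for ch, rate in report.items()})  # {'email': 0.67, 'social': 0.33}
```

Because the events were collected for a concrete question, every step of the pipeline feeds a decision rather than accumulating data for its own sake.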

For example, telecommunications providers aggregate a huge amount of data, including constantly updated geolocation. This information may be of commercial interest to advertising agencies, which can use it to serve targeted and localized advertising, as well as to retailers and banks. Such data can play an important role in deciding whether to open a retail outlet in a particular location, based on evidence of a strong targeted flow of people. There is an example of measuring the effectiveness of advertising on outdoor billboards in London. At present, the reach of such advertising can be measured only by stationing people near the advertising structures with a special device that counts passers-by. Compared to this type of measurement, a mobile operator has far more opportunities: it knows exactly the location of its subscribers, along with their demographic characteristics - gender, age, marital status, and so on.

Based on such data, the prospect opens up of changing the content of an advertising message to match the preferences of the particular person passing the billboard. If the data shows that a passer-by travels a lot, they can be shown an ad for a resort. The organizers of a football match can only estimate the number of fans once they actually come to the match. But if they could ask the mobile operator where visitors were an hour, a day or a month before the match, the organizers could plan where to advertise the next matches.

Another example is how banks can use Big Data to prevent fraud. If a client reports a lost card, and when a purchase is then made with that card the bank sees in real time whether the client's phone is in the area where the transaction is taking place, it can check the client's claim and determine whether someone is trying to deceive it. In the opposite situation, when a client makes a purchase in a store and the bank sees that the card being charged and the client's phone are in the same place, it can conclude that the card is being used by its owner. Thanks to such advantages of Big Data, the boundaries of traditional data warehouses are being expanded.
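A simplified version of such a location check (a toy rule, not any real bank's fraud model) could compare the point of sale with the phone's position using the haversine formula; the coordinates and the 50 km threshold are illustrative.

```python
import math

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in km,
    via the haversine formula (Earth radius ~6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def looks_fraudulent(card_location, phone_location, threshold_km=50.0):
    """Flag a transaction when the cardholder's phone is far from
    the point of sale - a simplified stand-in for a real fraud model."""
    return distance_km(card_location, phone_location) > threshold_km

moscow, spb = (55.75, 37.62), (59.93, 30.34)
print(looks_fraudulent(moscow, moscow))  # False: card and phone together
print(looks_fraudulent(moscow, spb))     # True: phone hundreds of km away
```

A production system would combine this signal with many others (spending pattern, merchant category, device history) rather than rely on distance alone.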

To decide on implementing Big Data solutions, a company needs to calculate an investment case, and this causes great difficulty because of the many unknown components. The paradox of analytics in such cases is predicting the future based on the past, data about which is often missing. The most important factor here is clear planning of your initial actions:

  • First, define one specific business problem for which Big Data technologies will be used; this task will become the core for judging whether the chosen concept is correct. Focus on collecting the data related to this particular task, and during the proof of concept you will be able to try various tools, processes and management methods that will allow you to make more informed decisions in the future.
  • Second, a company without data-analytics skills and experience is unlikely to implement a Big Data project successfully. The necessary knowledge always comes from previous analytics experience, which is the main factor affecting the quality of work with data. The culture of using data also plays an important role, since analysis often reveals harsh truths about a business, and accepting and working with those truths requires developed methods for working with data.
  • Third, the value of Big Data technologies lies in providing insights. Good analysts - specialists who deeply understand the commercial meaning of data and know how to apply it correctly - remain in short supply on the market. Data analysis is a means to achieve business goals, and to understand the value of Big Data you need an appropriate behavior model and an understanding of your actions. In that case, big data will yield a lot of useful information about consumers, on the basis of which useful business decisions can be made.

Despite the fact that the Russian Big Data market is just beginning to take shape, some projects in this area are already being implemented quite successfully. Some of them are successful in the field of data collection, such as projects for the Federal Tax Service and Tinkoff Credit Systems, others in terms of data analysis and practical application of its results: this is the Synqera project.

Tinkoff Credit Systems Bank implemented a project to implement the EMC2 Greenplum platform, which is a tool for massively parallel computing. In recent years, the bank has increased its requirements for the speed of processing accumulated information and real-time data analysis, caused by the high growth rate in the number of credit card users. The Bank announced plans to expand the use of Big Data technologies, in particular for processing unstructured data and working with corporate information obtained from various sources.

The Federal Tax Service of Russia is currently creating an analytical layer for its federal data warehouse. On its basis, a single information space and technology for accessing tax data for statistical and analytical processing is being created. During the project, work is being carried out to centralize analytical information covering more than 1,200 local-level sources of the Federal Tax Service.

Another interesting example of real-time big data analysis is the Russian startup Synqera, which developed the Simplate platform. The solution is built on processing large data arrays: the program analyzes information about customers, their purchase history, age, gender and even mood. Touch screens with sensors that recognize customers' emotions were installed at the checkouts of a chain of cosmetics stores. The program determines a person's mood, analyzes information about them, determines the time of day, and scans the store's discount database, after which it sends the buyer targeted messages about promotions and special offers. This solution improves customer loyalty and increases the retailer's sales.

Among successful foreign cases, the experience of Dunkin' Donuts, which uses real-time data to sell products, is interesting. Digital displays in its stores show offers that change every minute, depending on the time of day and product availability. From cash receipts, the company learns which offers received the greatest response from buyers. This approach to data processing increased profits and the turnover of goods in the warehouse.

As the experience of implementing Big Data projects shows, this area is designed to successfully solve modern business problems. At the same time, an important factor in achieving commercial goals when working with big data is choosing the right strategy, which includes analytics that identifies consumer needs, as well as the use of innovative technologies in the field of Big Data.

According to a global survey conducted annually since 2012 by Econsultancy and Adobe among company marketers, "big data" characterizing people's actions on the Internet can do a lot. It can optimize offline business processes, help understand how mobile device owners use their devices to search for information, or simply "make marketing better," i.e. more efficient. Moreover, the last function is growing in popularity from year to year, as our diagram shows.

The main areas of work of Internet marketers in terms of customer relations


Source: Econsultancy and Adobe, published by emarketer.com

Note that the respondents' nationality does not matter much. According to a survey conducted by KPMG in 2013, the proportion of "optimists" - those who use Big Data when developing business strategy - is 56%, and the variation from region to region is small: from 63% in North America to 50% in EMEA.

Use of Big Data in various regions of the world


Source: KPMG, published by emarketer.com

Meanwhile, marketers' attitude to such "fashion trends" is somewhat reminiscent of a well-known joke:

- Tell me, Vano, do you like tomatoes?
- I like to eat them, but otherwise no.

Even though marketers say they "love" Big Data and even seem to use it, in reality "it's complicated," as people write about their heartfelt attachments on social networks.

According to a survey conducted by Circle Research in January 2014 among European marketers, 4 out of 5 respondents do not use Big Data (even though they, of course, "love" it). The reasons vary. Inveterate skeptics are few - 17% - exactly as many as their antipodes, those who confidently answer "yes." The rest hesitate and doubt: the "swamp." They evade a direct answer with plausible excuses like "not yet, but soon" or "we'll wait until the others start."

Use of Big Data by marketers, Europe, January 2014


Source: dnx, published by emarketer.com

What confuses them? Mostly mundane things. Some (exactly half) simply do not trust the data. Others (also quite a few - 55%) find it difficult to match their "data" and "user" sets with each other. Some simply suffer from (to put it politely) internal corporate disorder: data wanders ownerless between marketing departments and IT structures. For others, the software cannot cope with the influx of work. And so on. Since the shares total well over 100%, it is clear that a "multiple barrier" situation is not uncommon.

Barriers preventing the use of Big Data in marketing


Source: dnx, published by emarketer.com

Thus, we have to conclude that so far "Big Data" is a great potential that has yet to be tapped. Incidentally, this may be why Big Data is losing its "fashionable trend" halo, as suggested by the survey data from Econsultancy already mentioned above.

The most significant trends in digital marketing 2013-2014


Source: Econsultancy and Adobe

They are being replaced by a new king: content marketing. For how long?

It cannot be said that Big Data is a fundamentally new phenomenon. Big data sources have existed for years: databases of customer purchases, credit histories, lifestyle data. For years, scientists have used this data to help companies assess risk and predict future customer needs. Today, however, the situation has changed in two respects:

  • More sophisticated tools and methods have emerged for analyzing and combining different datasets;
  • These analytical tools are complemented by an avalanche of new data sources driven by the digitization of virtually every method of data collection and measurement.

The range of available information is both inspiring and intimidating for researchers raised in a structured research environment. Consumer sentiment is captured by websites and all kinds of social media. Ad exposure is recorded not only by set-top boxes, but also by digital tags and mobile devices communicating with the TV.

Behavioral data (such as number of calls, shopping habits and purchases) is now available in real time. Thus, much of what could previously be learned through research can now be learned through big data sources. And all these information assets are constantly being generated, regardless of any research processes. These changes make us wonder if big data can replace classical market research.

It's not about the data, it's about questions and answers

Before sounding the death knell for classical research, we must remind ourselves that what is decisive is not the presence of this or that data asset, but something else. What exactly? Our ability to answer questions, that's what. A funny thing about the new world of big data is that results from new data assets lead to even more questions, and those questions tend to be best answered by traditional research. Thus, as big data grows, we see a parallel increase in the availability of, and demand for, "small data" that can provide answers to questions from the world of big data.

Consider a situation: a large advertiser constantly monitors store traffic and sales volumes in real time. Existing research methodologies (in which we ask research-panel participants about their buying motivations and behavior at the point of sale) help us better target specific customer segments. These methodologies can be expanded to include a wider range of big data assets, to the point where big data becomes a passive observation tool and research becomes a method of ongoing, narrowly focused investigation of changes or events that need to be studied. This is how big data can free research from unnecessary routine. Primary research no longer needs to focus on what is going on (big data will do that). Instead, it can focus on explaining why we see certain trends or deviations from them. The researcher will be able to think less about getting data and more about how to analyze and use it.

At the same time, we see big data solving one of our biggest problems: overly long studies. Examination of the studies themselves has shown that bloated research instruments have a negative impact on data quality. Although many experts have long acknowledged this problem, they invariably responded with the phrase "But I need this information for senior management," and long interviews continued.

In the world of big data, where quantitative indicators can be obtained through passive observation, this issue becomes moot. Again, let's think back to all of this consumption research. If big data gives us insights about consumption through passive observation, then primary research in the form of surveys no longer needs to collect this kind of information, and we can finally back up our vision of short surveys not only with good wishes, but also with something real.

Big Data needs your help

Finally, "big" is only one characteristic of big data: it refers to the sheer size and scale of the data. It is, of course, the headline characteristic, since the volume of this data exceeds anything we have worked with before. But other characteristics of these new data streams matter too: they are often poorly formatted, unstructured (or, at best, partially structured), and full of uncertainty. The emerging field of data management aptly named "entity analytics" aims to overcome the noise in big data. Its task is to analyze these datasets and determine how many observations refer to the same person, which observations are current, and which are usable.

This kind of data cleansing is necessary to remove noise or erroneous data when working with big or small data assets, but it is not enough. We also need to create context around big data assets based on our previous experience, analytics and category knowledge. In fact, many analysts point to the ability to manage the uncertainty inherent in big data as a source of competitive advantage, as it enables better decision making.

And this is where primary research is not only freed from routine thanks to big data, but also contributes to content creation and analysis within big data.

A prime example is the application of our brand equity framework to social media (we are referring to Millward Brown's new approach to measuring brand value, the Meaningfully Different Framework). This model is behavior-tested within specific markets, implemented on a standard basis, and can easily be applied to other marketing directions and decision-support information systems. In other words, our brand equity model, which is grounded in survey research (though not only survey research), has all the properties needed to overcome the unstructured, disconnected, and uncertain nature of big data.

Consider consumer sentiment data provided by social media. In its raw form, peaks and valleys in consumer sentiment are very often minimally correlated with offline measures of brand equity and behavior: there is simply too much noise in the data. But we can reduce this noise by applying our models of consumer meaning, brand differentiation, dynamics, and identity to raw consumer sentiment data, which is a way of processing and aggregating social media data along these dimensions.

Once the data is organized according to our framework, the trends identified usually match the brand equity and behavior measurements obtained offline. In effect, social media data cannot speak for itself; using it for this purpose requires our experience and the models we have built around brands. And when social media gives us unique information expressed in the language consumers use to describe brands, we should use that language in designing our studies, making primary research far more effective.

The Benefits of Liberated Research

This brings us back to the fact that big data is not so much replacing research as it is freeing it up. Researchers will be relieved of having to create a new study for each new case. The ever-growing big data assets can be used for different research topics, allowing subsequent primary research to delve deeper into the topic and fill in the gaps. Researchers will be freed from having to rely on overly inflated surveys. Instead, they will be able to use short surveys and focus on the most important parameters, which improves the quality of the data.

Thus freed, researchers will be able to apply their established principles and insights to add precision and meaning to big data assets, opening new areas for survey research. This cycle should lead to deeper understanding across a range of strategic issues and, ultimately, to what should always be our primary goal: informing and improving the quality of brand and communications decisions.

What is big data? Let's look at the Oxford dictionary first:

Data: quantities, characters, or symbols on which operations are performed by a computer, and which can be stored and transmitted in the form of electrical signals or recorded on magnetic, optical, or mechanical media.

The term big data is used to describe large, exponentially growing data sets. Machine learning is indispensable for processing data at this scale.

Benefits provided by Big Data:

  1. Data can be collected from a variety of sources.
  2. Business processes improve through real-time analytics.
  3. Huge volumes of data can be stored.
  4. Insights: Big Data uncovers hidden information in structured and semi-structured data.
  5. Big Data helps reduce risk and supports smarter decisions through proper risk analytics.

Examples of Big Data

The New York Stock Exchange generates about 1 terabyte of trading data daily for the previous session.

Social media: statistics show that about 500 terabytes of new data are added to Facebook's databases every day, generated mainly by photo and video uploads to the social network's servers, messaging, comments on posts, and so on.

A jet engine generates 10 terabytes of data for every 30 minutes of flight. With thousands of flights every day, the volume of data reaches petabytes.
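The arithmetic behind that last claim is easy to check with a back-of-the-envelope calculation. The average flight duration and daily flight count below are illustrative assumptions, not figures from the article:

```python
# Rough estimate of daily jet-engine data volume.
tb_per_half_hour = 10        # from the example above
avg_flight_hours = 2         # assumed average flight duration
flights_per_day = 25_000     # assumed rough global daily flight count

tb_per_flight = tb_per_half_hour * avg_flight_hours * 2   # 40 TB per flight
total_tb = tb_per_flight * flights_per_day                # 1,000,000 TB per day
total_pb = total_tb / 1_000                               # decimal petabytes

print(f"~{total_pb:,.0f} PB of engine data per day")
```

Even with conservative assumptions, the daily total lands comfortably in the petabyte range.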

Big Data classification

Big data forms:

  • Structured
  • Unstructured
  • Semi-structured

Structured form

Data that can be stored, accessed, and processed in a fixed format is called structured data. Over a long period, computer science has made great strides in improving techniques for working with this type of data (whose format is known in advance) and has learned to extract value from it. However, even here problems are already emerging as volumes grow toward sizes measured in zettabytes.

1 zettabyte equals one billion terabytes

Looking at these numbers, it is easy to see why the term Big Data arose, along with the difficulties of storing and processing such data.

Data stored in a relational database is structured: it looks like, for example, a table of company employees.
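As a minimal sketch of what structured data looks like in practice, here is a hypothetical employee table built with Python's built-in sqlite3 module (the table name, columns, and rows are invented for the example):

```python
import sqlite3

# In-memory relational database with a fixed schema known in advance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL
    )
""")
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Alice", "Finance", 65000.0),
     ("Bob",   "Sales",   52000.0),
     ("Carol", "Finance", 71000.0)],
)

# Because the format is fixed, querying is straightforward.
for name, salary in conn.execute(
        "SELECT name, salary FROM employees WHERE dept = 'Finance'"):
    print(name, salary)
conn.close()
```

The fixed schema is precisely what makes this data "structured": every row has the same, predeclared shape.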

Unstructured form

Data of unknown structure is classified as unstructured. In addition to its large size, this form is characterized by a number of difficulties in processing and extracting useful information. A typical example of unstructured data is a heterogeneous source containing a combination of simple text files, pictures, and videos. Organizations today have access to a large amount of raw or unstructured data, but do not know how to make use of it.

Semi-structured form

This category combines features of both of the above: semi-structured data has some form, but it is not actually defined by tables in a relational database. An example of this category is personal data presented in an XML file:

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>satish mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
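Records like these can be processed with Python's standard xml.etree module. The tag names in this sketch are assumptions, since the article does not show the exact markup:

```python
import xml.etree.ElementTree as ET

# Semi-structured data: there is a shape (nested tags),
# but no rigid relational schema behind it.
xml_data = """
<records>
    <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
    <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
    <rec><name>satish mane</name><sex>Male</sex><age>29</age></rec>
</records>
"""

root = ET.fromstring(xml_data)
for rec in root.findall("rec"):
    print(rec.findtext("name"), rec.findtext("age"))
```

Note that nothing forces every `<rec>` to have the same fields, which is exactly what distinguishes semi-structured data from a relational table.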

Characteristics of Big Data

Growth of Big Data over time:

Blue represents structured data (enterprise data) stored in relational databases; the other colors represent unstructured data from various sources (IP telephony, devices and sensors, social networks, and web applications).

According to Gartner, big data varies in volume, generation rate, variety, and variability. Let's consider these characteristics in more detail.

  1. Volume. The term Big Data itself implies large size, and the size of the data is the most important indicator of its potential recoverable value. Every day, 6 million people use digital media, generating an estimated 2.5 quintillion bytes of data. Volume is therefore the first characteristic to consider.
  2. Variety is the next aspect. It refers to the heterogeneity of sources and the nature of the data, which can be either structured or unstructured. Previously, spreadsheets and databases were the only data sources considered in most applications; today, analytical applications also take in emails, photos, videos, PDF files, and audio. Such a variety of unstructured data creates problems for storage, mining, and analysis: 27% of companies are not sure they are working with the right data.
  3. Generation rate (velocity). How quickly data is accumulated and processed to meet demand determines its potential. Velocity describes the speed of the information flow from its sources: business processes, application logs, social networks and media sites, sensors, and mobile devices. The flow of data is huge and continuous in time.
  4. Variability describes how data changes at different points in time, which complicates processing and management; the largely unstructured nature of most data is one example.

Big Data analytics: what is the use of big data

Promotion of goods and services: access to data from search engines and from sites such as Facebook and Twitter allows businesses to craft marketing strategies more precisely.

Customer service improvement: traditional customer feedback systems are being replaced by new ones that use Big Data and natural language processing to read and evaluate customer feedback.
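Real systems of this kind rely on trained NLP models; as a toy sketch of the underlying idea, feedback can be classified by matching it against keyword lists (the word lists and example texts below are purely illustrative):

```python
# Naive keyword-based sentiment scoring -- a simplified stand-in
# for the natural language processing systems described above.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "terrible", "rude", "refund"}

def score_feedback(text: str) -> str:
    # Normalize words: strip punctuation and lowercase.
    words = {w.strip(".,!?").lower() for w in text.split()}
    balance = len(words & POSITIVE) - len(words & NEGATIVE)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"

print(score_feedback("Great service, fast delivery!"))   # positive
print(score_feedback("Terrible app, always broken."))    # negative
```

A production system would replace the keyword sets with a trained model, but the pipeline shape (normalize text, score it, route the feedback) is the same.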

Risk calculation: assessing the risks associated with launching a new product or service.

Operational efficiency: big data is structured so that the necessary information can be extracted and accurate results delivered quickly. This combination of Big Data and storage technologies helps organizations optimize their handling of rarely used information.
