Data Warehouse And Its Applications In Agriculture

by admin on September 12, 2011

data visualization
by hsingy

Data Warehouse And Its Applications In Agriculture

 

DATA WAREHOUSE AND ITS APPLICATIONS IN AGRICULTURE

K.P.Wagh                                                                               Dr. Satish R. Kolhe                

Assistant Professor                                                                 Reader                                    

Gf’s GCOE Jalgaon                                                                NMU Jalgaon                         

Kishorwagh2000@yahoo.com                                               srkolhe2000@gmail.com       

 

A Data warehouse is a repository of integrated information, available for queries and analysis.  Data and information are extracted from heterogeneous sources as they are generated.  This makes it much easier and more efficient to run queries over data that originally came from different sources. In other words Data warehouse is a database that is used to hold data for reporting and analysis. 

 Economic foundation and productivity growth depends on agricultural sectors. Agriculture is the driving force behind the way of live and source of earnings for the majority of peoples. More than 60 percents of population are living in rural areas and the majority are farmers. The rural communities as a main producer for country food productivity and food security earn only 11 percents of Gross Domestic Product (GDP). The arrival of information age guides this country to new development strategies.

National Electronics and Computer Technology Center (NECTEC) in collaboration with the Ministry of Agriculture, has launched “Agriculture Information Network” as a response to the unmet information requirements of the agricultural sector. Farmers should gain benefit from the contents provided which include risk assessment, agriculture warning system and agricultural knowledge base, which aim to improve technology, productivity, income and stability of India agriculture sector through the age of Information Technology. The data warehouse consists of common databases and geo-spatial databases from various departments and organizations in the country and abroad. Farmers can get access to the contents through Internet by themselves or from groups of professional people called “Information Brokers”.

 

Keywords: Data Warehouse, Agriculture, IT

 

 

1.    Introduction

A  Data  warehouse [1] is  a  repository  of  integrated  information,  available  for  queries  and analysis.  Data  and  information  are  extracted  from  heterogeneous  sources  as  they  are generated.  This  makes  it  much  easier  and  more  efficient  to  run  queries  over  data  that originally came from  different  sources.  In other words Data warehouse is a database that is used to hold data for reporting and analysis. 

  

Goals of Data Warehousing

To facilitate reporting as well as analysis Maintain an organizations historical information Be an adaptive and resilient source of information Be the foundation for decision making

  

Data Warehouse Architecture

Data warehouse Architecture comprises of

Operational source systems A data staging area One or more conformed data marts A data warehouse database

 

Operational Source Systems

Operational  source  systems [1]  are  developed  to  capture  and  process  original  business transactions.  These  systems  are  designed  for  data  entry,  not  for  reporting,  but  it  is  from here the data in data warehouse gets populated.

 

Data Staging Area

Data staging area  is where  the  raw operational  data is  extracted,  cleaned,  transformed and combined  so  that  it  can  be  reported  on  and  queried  by  users.  This area lies between the operational source systems and the user database and is typically not accessible to users.

 

Data staging is a major process that includes the following sub procedures:

Extraction

The extract  step  is  the  first  step  of  getting  data  into the  data  warehouse  environment. Extracting means reading and understanding the  source data,  and  copying  the pas  that are needed to the data staging for further work.

Transformation

Once  the  data  is  extracted  into  the  data  staging  area,  there  are  many  transformation steps, including

 

1.  Cleaning the data by correcting misspellings, resolving domain conflicts, dealing with         missing data elements, and parsing into standard formats.

2.  Purging selected fields from the legacy data that are not useful for data warehouse.

3.  Combining  data  sources  by  matching  exactly  on  key  values  or  by  performing  fuzzy    matches on non-key  attributes.

4.  Creating  surrogate  keys  for  each  dimension  record  in  order  to  avoid  dependency  on legacy  defined  keys,  where  the  surrogate  key  generation  process  enforces  referential integrity between the dimension tables and fact tables.

5.  Building the aggregates for boosting the performance of common queries.

Loading and indexing

At  the  end  of  transformation  process,  the  data  is  in  the  form  of  load  record  images. Loading  in  the  data  warehouse  environment  usually  takes  the  form  of  replicating  the dimensional  tables  and  fact  tables  and  presenting  these  tables  to  bulk  loading facilitates each  recipient  data mart.  Bulk  loading  is a very important  capability  that  is to  be  contrasted  with  record-at-a  time  loading,  which  is  far  slower.  The target data mart must then index the newly arrived data for query performance.

 

Data Mart

Data  mart  is  a  logical  subset  of  an  enterprise-wide  data  warehouse.  For example, a data warehouse for a retail chain is constructed incrementally from individual, conformed data marts dealing with separate subject areas such as product sales. Dimensional  data  marts  are  organized  by  subject  area  such  as  sales,  finance,  and  marketing  and  coordinated  by  data  category  such  as  customer,  product,  and  location. These  flexible  information  stores  allows  data  structures  to  respond  to  business  changes-product  line  additions,  new  staff  responsibilities,  mergers,  consolidations,  and acquisitions.

  

Data Warehouse Database

A data  warehouse database  contains  the  data  that  is  organized  and  stored  specifically  for direct  user  queries  and  reports.  It  differs  from  an  OLTP  database  in  the  sense  that  it  is

designed primarily for reads not writes. An  OLAP  application  is  a  system  designed  for  few  but  complex  (read  only)  request.  An OLTP  application  is  a  system  designed  for  many  but  simple  concurrent  (and  updating) requests.

 

Metadata

Metadata defines the content and location of the data in the data warehouse, relationships between the operational databases and the data warehouse and the business views of the data in the data in the warehouse as accessible to the end-user tools. Metadata is searched by user to find the subject areas and the definitions of the data.

For decision support, the pointers required to data warehouse are provided by the metadata. Therefore, it acts as logical link between the decision support system application and the data warehouse. Thus, any data warehouse design should assure that there is a mechanism that populates and maintains the metadata repository and that all access paths to data warehouse have metadata as an entry point. In other words there should be no direct access permitted to the data-warehouse data if it does the user metadata definitions to gain the access. Meta data definition can be done by the user in any given data warehousing environment. The software environment as decided by the software tools used will provide a facility for metadata definition in a metadata repository.

 

OLAP Vs OLTP

 

OLTP (Online Transactional Processing)

OLTP servers handle mission-critical production data accessed through simple queries Usually handles queries of an automated nature OLTP applications consist of a large number of relatively simple transactions. Most often contains data organised on the basis of logical relations between normalised tables

• OLAP (Online Analytical Processing)

OLAP servers handle management-critical data accessed through an iterative analytical investigation Usually handles queries of an ad-hoc nature supports more complex and demanding transactions contains logically organised data in multiple dimensions

 

2.    Warehouse Schema Design

Dimensional modeling is a term used to refer a set of data modeling techniques that have

gained popularity  and acceptance for  data  warehouse  implementation.  Dimensional modeling is one of the key techniques in data warehousing.  Two types of tables are used in dimensional modeling: Fact tables and dimensional tables

 

 

 

Fact Tables  

These are used to record actual facts and measures in the business.  Facts are numeric data items that are of interest to the business.  Example, telecommunication- length of call in minutes, average number of calls.

 

Dimensional Tables 

Dimensional tables establish the context of the facts.  Dimensional tables store fields that describe the facts.  Example, telecommunication- call origin, call destination.  A schema is a fact table plus its related dimensional table.

  

3. Crucial Decision in Designing a Data Warehouse

The job of designing and implementing a data warehouse [3] is very challenging and difficult one, even though at the same time, there is a lot of focus and importance attached to it. The designer of the data warehouse may be asked by the top management:”take all enterprise data and build a data warehouse such that the management can get answer to all their questions”. This is daunting task with responsibility being visible and exciting. But how to get started? Where to start? Which data should be put first? Where is that data available? Which query should be answered? How would bring down the scope of project to something smaller and manageable, yet be scalable to gradually upgrade to upgrade to comprehensive data warehouse environment finally?

The recent trend is to build data marts for before a real large data warehouse is built. People want something smaller, so as to get manageable results before proceeding to the real data warehouse.

RALPH KIMBALL identified a nine step method as follows:

Step 1: Choose the subject matter.

Step 2: Decide the what the fact table represents.

Step 3: Identify and confirm the dimension.

Step 4: Choose the facts.

Step 5: Store precalculation in the fact table.

Step 6: Define the dimension and tables.

Step 7: Define the duration of database and periodicity of updation.

Step 8: Track slowly the change in dimension.

Step 9: Decide the query priorities and query modes.

All the above steps are required before the data warehousing is implemented. The final step or step 10 is implemented a simple data warehouse or data mart. The approach should be ‘from simpler to complex’. First only a few data marts are identified, designed and implemented. A data warehouse then will emerge gradually.

Let us discuss the above mentioned steps in detail. Interaction with the user is essential for obtaining answers to many questions. The user to be interviewed includes top management, middle management, executives, as also operational users, in addition to sales force and marketing teams. A clear picture emerges from the entire project on the data warehousing as to what are their problems and how they can be possibly solved with the help of data warehousing.

4.  Various Technology Considerations

The following or technological issues [3] are required to be considered for designing and implementing a data warehouse:

1. The hardware platforms for Data Warehouse

2. DBMS for supporting data warehouse

3. Communication and network infrastructure for a Data Warehouse

4. The system management /operating system platforms

5. The software tools for building, operating and using Data Warehouse

  

Hardware Platform

Organization normally tend to utilize the already existing hardware platform for data warehouse development however the disk storage requirements for a data warehouse will be significantly large, especially in comparison with single application.

If data warehouse or data mart is small in data size, normal Pentium server will be probably sufficient with not very high reliability standards. However for a regular large data warehouse application the server has to be specialized for the tasks associated with a data warehouse. A mainframe, for example is well suited for this purpose, as a data warehouse server. What are the features required for a successful data warehouse server? Firstly it should be able to support large data volume and complex query processing. In addition, it has to be highly scalable. As the user population keeps on growing, the network traffic and the access traffic increase significantly. Therefore, the requirement of data warehouse server is the scalable high performance for data loading and ad hoc query processing as well as the ability to support large database in a reliable and efficient manner. If the querying is going to be on a large public data network then multiprocessor configuration will be required for parallel query processing. In case of a complex server of configuration with multiple processors and large I/O bandwidth a proper balance needs to be made between I/O and processing power.

 

DBMS Selection

Next to hardware solutions a factor most critical is the database selection. This determines the speed performance of the data warehousing environment. The requirement Of a DBMS for data warehousing and requirement are scalability and high volume storage and processing and throughput in traffic.

            The majority of established RDBMS vendors have implemented various degree of parallelism in their products. Even though all the vendors have implemented various degrees of parallelism in their products. Even though all the well known vendors-IBM, ORACLE SYBASE-support parallel database processing, some of them have improved their architectures so as to better suit the specialized requirement of the data warehouse. The RDBMS products provide additional modules for OLAP cubes. The correct choice of OLAP server DB server and web server can be made by the designer or user of Data warehouse depending on the requirement.

 

Communication and Networking Infrastructure

Data warehouse can be internet enabled or intranet enabled as the choice may be. If web enabled the networking is taken care by the internet. If only Intranet based then the appropriate LAN operational environment should be accessible to all the identified users. Thus network expansion may be required as per the needs. In web enabled data warehouses, issues of security privacy and accessibility need to be considered carefully .Accordingly web enablement facilities should be ensured in the software tools used for data warehouse development.

  

Stages in Implementation

A data warehouse cannot be purchased and installed. Its implementation requires the integration of implementation of many products. Following are the steps of the Data Warehouse implementation:

Step 1: Collect and analyze business requirement.

Step2: Create a data model and physical design and data warehouse after deciding the                

            appropriate hardware platform.

Step 3: Define the data sources

Step 4: Choose the DBMS and software platform for data warehouse.

Step 5: Extract the data from operational data sources, translate it, clean-up and load into the      

            data warehouse model or data mart.

Step 6: Choose database access and reporting tools.

Step 7: Choose database connectivity software.

Step 8: Choose the data analysis (OLAP) and presentation (client GUI) software.

Step 9: Keep refreshing the data warehouse periodically.

  

Access Tools

With the exception of SAS(of SAS institute), all the Data Warehouses /OLAP vendors are not currently providing comprehensive single-window software tools capable of handling all aspects of data warehousing project implementation .SAS alone meets the requirement largely independently as it has its own database internally with a capability of import data from any vendor DBMS software. Therefore one can implement a data warehousing and data mining solution independently with SAS.

The best way to choose a group of tools is to understand the capability and compatibility of different type of access to the data and reporting by selecting best tool in market for that kind of access. The types of access and reporting are as follows:

1. Time series analysis 2. Data visualization, graphing, charting and pivoting 3. Complex textual search (text mining) 4. General stastical analysis. 5. Artificial intelligence techniques for hypothesis testing, trends discovery, identification and validation of data clusters and segments(also useful for data mining) 6. Mapping of specifial information into geographic information system 7. Ad hoc user-specific queries 8. Predefined repeatable queries 9. Drilling down interactically 10. Reporting the analysis by drilling down 11. Complex queries with multi-table forces, multilevel sub-queries, sophisticated search criteria.

In some applications, the user requirement may exceed the capability of tools. A number of query tools are available in the market today which enables an ordinary user to build customized reports by easily composing and executing ad hoc queries without any necessity to have the knowledge of the underlying design details or data base technology, SQL, or even the data model

  

5.  Its Applications in Agriculture

  

Project: Agriculture Information System Network (AGRISNET)

Department of Agriculture and Cooperation (DAC) [2] have taken steps to establish “Agricultural Information System Network (AGRISNET)” in collaboration with NIC. The Proposal recommends (i) the state-of-the-art IT infrastructure requirements to establish AGRISNET as the INTRANET over NICNET, (ii) development of databases and information systems for decision support for evaluation, monitoring and policy formulations, and (iii) human resources development, (iv) multi-media based training and demonstration of transfer of technology to strengthen Farm Research and Education using broadcast VSATs, (v) special interest groups in respect of subjects, problems, programmes, schemes, etc, and above all, to make Indian Agriculture on-line for INTERNET and INTRANET access through AGRISNET Nodes. AGRISNET Nodes are envisaged to be established at 

  • DAC Hqrs (Krishi Bhawan), 

  • DAC Attached Offices and its regional offices, 

  • DAC Subordinate Offices and its regional units, 

  • DAC Public Sector Undertakings (NSC&SFCI) and sub-units, 

  • DAC Autonomous Organizations, 

  • Apex Cooperative Organizations 

  • State Agriculture Departments 

  • NCT/UT Agriculture Departments 

  • District Agriculture Offices and 

  • Block Agriculture offices 

In this direction, IFFCO has taken up a project in association with Indian Space Research Organisation (ISRO) to utilise satellite based remote sensing data and Geographical Information Systems (GIS). Attention may be drawn to the fact that the developed countries have been utilising precision farming with the help of IT tools for a long time. While this will take a long time for our country due to small holdings, it is to be noted that GIS has an invaluable role to play even in the existing conditions. Remote sensing and GIS information can provide warnings on evolving crop stresses, crop vigour, etc.

The IFFCO-ISRO GIS project extends support for efficient and timely availability of IFFCO’s fertiliser to farmers though better logistics & efficient operations. It endeavours to provide farmers’ advisory services to provide decision support to farmers on land related issues, weather, etc. In addition to the GIS based services, effort is being made to create databases that contain information of interest to the farmers. These include recommendation on package of practices for major cereals, pulses, horticulture, floriculture and animal husbandry, etc.Information on all the inputs such as seeds, fertiliser, sources, current availability, prices, availability of credit, alternatives available and terms and conditions, etc. are sought to be provided. An important service envisaged is to provide access to the nearest expert in case of stress or any other problem witnessed in the crops. Facilities are sought to be provided to encourage and share farm experiences by forging various crop forums. Many of the agricultural extension services are also proposed to be made online using aspects of multimedia.

In order to encourage farmers to obtain best possible price, information on various agricultural output markets (mantis) is also being provided. The objective of this activity is to provide status of price at different mandies to facilitate farmer to move his produce to the mandi where he can expect better price. Other areas of interest to farmers such as distance education, location specific news, etc. are also planned. Access to other related sites of interest such those relating to courts, health, etc. are also sought to be provided.

  

6.  Conclusions

Analytical exploration of vast amount of agricultural data can best be support by appropriate application of Data Warehousing and OLAP technologies. A Data Warehouse provides efficient and reliable structure of storage for vast amount data while OLAP techniques provide mechanisms for analysis of this data.

 

7.  References

[1] Data warehouse and its applications in Agriculture, Anil Rai, Indian Agricultural Statistics Research Institute Library Avenue, New Delhi.

[2] Information Technology in Agriculture, S.C. Mittal.

[3] Data Warehousing concepts, Techniques, Products and Applications, C.S.R.Prabhu.

 

  

data visualization

More Data Visualization Articles

Previous post:

Next post: