Friday, August 10, 2012

Data Retrieval & Classification using jsoup


In domain-specific data retrieval, the main idea is to fetch content that falls within a certain domain and display it to the user. The relevant data may not be the whole website or even a whole web page; it might be a small section within a page. A technical approach is therefore needed to retrieve just that data. A few systems and libraries can retrieve specific data from websites, and among them jsoup looks promising because of its features.
The jsoup library is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. It is open source, which makes it a good development tool for this project because it can be modified to suit the purpose. As jsoup is developed specifically for the Java environment, it also fits the development process well. The project needs to extract certain sections of a given website whose URL is available. jsoup suits this because it can parse HTML fetched from a URL, read from a file, or supplied as a string. So jsoup can be used to fetch the data from the given URL, store it, and then extract or scrape sections of it.
The jsoup API has many features that help the extraction process. For example, data can be extracted by reading the DOM structure of a page. Since websites are built with HTML, jsoup can parse a page, walk its DOM structure, and get the intended content. HTML tags and attributes are easy to identify in jsoup, and data can be retrieved by referring to them. Tags are represented as elements, and elements provide a range of DOM-like methods to find other elements and to extract and manipulate their data. The DOM getters are contextual: called on the parent document, they find matching elements under the document; called on a child element, they find elements under that child. jsoup provides many such getters, which makes data extraction easy because only the intended sections are extracted instead of whole web pages.
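To make the idea of contextual getters concrete, here is a minimal sketch in plain Java. It is a toy element tree, not jsoup's implementation: the Node class and the sample page are hypothetical, and only the method name getElementsByTag is borrowed from jsoup's Element API. It shows that the same getter returns different results depending on the element it is called on.

```java
import java.util.ArrayList;
import java.util.List;

class ContextualGetterDemo {
    // Builds a tiny hypothetical page: <html><div><p/></div><div><p/></div></html>
    static Node sampleDocument() {
        Node article = new Node("div").add(new Node("p"));
        Node footer = new Node("div").add(new Node("p"));
        return new Node("html").add(article).add(footer);
    }

    public static void main(String[] args) {
        Node doc = sampleDocument();
        Node article = doc.children.get(0);
        // Called on the document: finds every <p> in the page.
        System.out.println(doc.getElementsByTag("p").size());     // prints 2
        // Called on one child: finds only the <p> under that child.
        System.out.println(article.getElementsByTag("p").size()); // prints 1
    }
}

// Toy element: a tag name plus child elements (an assumption for illustration).
class Node {
    final String tag;
    final List<Node> children = new ArrayList<>();

    Node(String tag) {
        this.tag = tag;
    }

    Node add(Node child) {
        children.add(child);
        return this; // allow chaining while building the tree
    }

    // Collects this node and every descendant whose tag matches.
    List<Node> getElementsByTag(String wanted) {
        List<Node> found = new ArrayList<>();
        collect(this, wanted, found);
        return found;
    }

    private static void collect(Node node, String wanted, List<Node> found) {
        if (node.tag.equals(wanted)) {
            found.add(node);
        }
        for (Node child : node.children) {
            collect(child, wanted, found);
        }
    }
}
```

With jsoup itself, the tree would come from parsing real HTML rather than being built by hand, and the search would start from a parsed document or one of its elements.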
The extracted data needs to be classified by content type, that is, as text, images, videos, links, and so on, and jsoup can be used for this categorization. It can identify text, links, and images separately, so the extraction process can be split by type. It identifies the content type from the HTML tags and then uses the matching functions to extract each type. Content classification can thus be achieved while the data is being extracted.
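The classification step can be sketched as a simple mapping from tag name to content type. This is only an illustration in plain Java: the ContentType categories and the rule that every unrecognised tag counts as text are assumptions, and in a real extractor the tag names would come from the parsed elements (for example via jsoup's Element.tagName()) rather than from a hand-written list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ContentClassifierDemo {
    enum ContentType { TEXT, LINK, IMAGE, VIDEO }

    // Maps an HTML tag name to a content category.
    // Anything unrecognised is treated as text (an assumption of this sketch).
    static ContentType classify(String tagName) {
        switch (tagName.toLowerCase(Locale.ROOT)) {
            case "a":     return ContentType.LINK;
            case "img":   return ContentType.IMAGE;
            case "video": return ContentType.VIDEO;
            default:      return ContentType.TEXT;
        }
    }

    // Groups tag names by category so each group can be extracted separately.
    static Map<ContentType, List<String>> group(List<String> tagNames) {
        Map<ContentType, List<String>> groups = new EnumMap<>(ContentType.class);
        for (String tag : tagNames) {
            groups.computeIfAbsent(classify(tag), k -> new ArrayList<>()).add(tag);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("p", "a", "img", "h1", "video", "a");
        // prints {TEXT=[p, h1], LINK=[a, a], IMAGE=[img], VIDEO=[video]}
        System.out.println(group(tags));
    }
}
```

Keeping the tag-to-type rule in one method means new content types can be added without touching the extraction code that consumes the groups.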
Considering the requirements of the data extraction process, jsoup is a highly capable tool for the job. Its features and functions make Java-based data extraction much easier, and because it is open source, it is cost-effective as well.


Sunday, May 27, 2012

Using OpenJPA in Development

By now you know what OpenJPA is used for, but I haven't yet shown you how to use it in practice. Let me explain with a simple example. I am using the NetBeans IDE, which makes OpenJPA easy to use because it creates the entity classes automatically. Follow the steps below.
1. Create a new NetBeans web project.
2. Right-click on the project and from the menu select New --> Other --> Persistence --> Persistence Unit.
You will get the following dialog box.
3. Leave the name as it is, and in the persistence provider section select OpenJPA. If it is not in the list, you need to create a new library: go to New Library and add your openjpa-all-2.1.0 jar (you need to have OpenJPA unzipped in a folder).
4. Then click OK to complete the persistence unit creation.
5. Now create a new package to hold the entity files (for example, com.openjpa.domain).
6. Right-click on the package and select New --> Other --> Persistence --> Entity Classes from Database.
Here you need to select your database connection; the database schema comes from that connection.
For example, see the figure below.
7. Once the data source is selected, the tables in the database are displayed, and you can select which tables to import as entity classes.
8. Once you import the tables, NetBeans creates the entity classes automatically.
9. If you take a look at these classes, you can see that the table mappings and queries have been generated.
10. Then you can create your DAO files and access the data objects using the named queries or normal queries.
11. Then you can use them in your servlets and JSPs as required.

Those are the steps to create OpenJPA entity objects using NetBeans. I hope this post helped you understand the basics of OpenJPA. See you in the next post.


MVC Architecture of OpenJPA


In the development process, the standard way is to use the Model View Controller (MVC) architecture, which separates an application into three components.
1.      Model – The model represents the data and the rules that govern access to and updates of that data. Basically, it contains the database objects to be accessed.
2.      View – The view is the set of web pages that we display. HTML and JSP pages fall under the view category.
3.      Controller – This is the controlling mechanism between model and view. It communicates with both and translates and transports data objects between them.

Figure MVC Architecture of JPA

1.      DAO – DAO stands for Data Access Object; it provides an abstract interface to the database. In the DAO layer we write the queries that access the database. We can write as many DAOs, with as many methods, as we need. When writing a DAO we can use OpenJPA named queries directly.
2.      Domain – The domain consists of the entity classes for the tables in the database. For each table, we can create an entity class that maps to it and work with the table rows as objects. This is a much more efficient approach than traditional hand-written relational database mapping. Domain classes are also called POJOs (Plain Old Java Objects).
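The split between the DAO layer and the domain layer can be sketched in plain Java as follows. The Customer class, the CustomerDao interface, and the in-memory implementation are all hypothetical names chosen for illustration; in the actual project the domain class would be a generated JPA entity and the DAO would run named queries through an entity manager instead of reading from a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class DaoDemo {
    // Domain class (POJO): stands in for an entity class mapped to a table.
    static class Customer {
        private final long id;
        private final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }

        long getId() { return id; }
        String getName() { return name; }
    }

    // DAO: the abstract interface the rest of the application codes against.
    interface CustomerDao {
        void save(Customer customer);
        Optional<Customer> findById(long id);
        List<Customer> findAll();
    }

    // In-memory stand-in for a database-backed DAO (assumption for this sketch).
    static class InMemoryCustomerDao implements CustomerDao {
        private final Map<Long, Customer> rows = new LinkedHashMap<>();

        @Override
        public void save(Customer customer) {
            rows.put(customer.getId(), customer);
        }

        @Override
        public Optional<Customer> findById(long id) {
            return Optional.ofNullable(rows.get(id));
        }

        @Override
        public List<Customer> findAll() {
            return new ArrayList<>(rows.values());
        }
    }

    public static void main(String[] args) {
        CustomerDao dao = new InMemoryCustomerDao();
        dao.save(new Customer(1L, "Alice"));
        dao.save(new Customer(2L, "Bob"));
        System.out.println(dao.findById(2L).map(Customer::getName).orElse("not found")); // prints Bob
        System.out.println(dao.findAll().size()); // prints 2
    }
}
```

Because callers depend only on the interface, the in-memory class could later be swapped for a JPA-backed one without changing the servlets or controllers that use it.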

In order to create the connection between the database and the domain classes, we need an XML file that holds the connection details. That file is named “persistence.xml” by default. It is similar to a normal database connection setup, but it is defined in XML format. A typical persistence.xml file would contain the following details.
This holds the connection details, and there are two ways to retrieve the data objects. One is to use an entity manager, and the other is to use a session factory (Hibernate's native API rather than part of JPA). I used the entity manager factory in the DAO layer.
I hope this post helps you understand the MVC architecture of OpenJPA. Stay in touch :)



Thursday, November 24, 2011

What is Spring MVC Architecture

As I mentioned earlier, Spring has an MVC architecture, which helps make Spring a high-performance framework compared to others. In the MVC architecture of Spring, the important parts are the dispatcher servlet, handler mapping, view resolver, and views. The communication between these components proceeds in the following steps.

1. The user request is received by the dispatcher servlet. The dispatcher servlet is the key to mapping requests to the correct resources. Its XML configuration defines each resource along with the path to it.
2. Then the dispatcher servlet invokes the controller for the request.
3. The controller processes the request by calling the appropriate service methods and returns a ModelAndView object to the DispatcherServlet. The ModelAndView object contains the model data and the view name.
4. The DispatcherServlet sends the view name to a ViewResolver to find the actual View to invoke.
5. The DispatcherServlet then passes the model object to the View, which uses the model data to render the result back to the user.
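The five steps above can be sketched as a small simulation in plain Java. None of this is Spring code: the ModelAndView class here is a stand-in with the same name as Spring's, and the /greet path, the greeting view name, and the /WEB-INF/jsp/ prefix are hypothetical values picked for the example. The "rendering" is reduced to building a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class DispatchFlowDemo {
    // Stand-in for Spring's ModelAndView: model data plus a logical view name.
    static class ModelAndView {
        final String viewName;
        final Map<String, Object> model = new HashMap<>();

        ModelAndView(String viewName) {
            this.viewName = viewName;
        }
    }

    // Handler mapping: request path -> controller (used in step 2).
    static final Map<String, Function<String, ModelAndView>> HANDLERS = new HashMap<>();

    static {
        // Step 3: the controller does its work and returns model + view name.
        HANDLERS.put("/greet", name -> {
            ModelAndView mav = new ModelAndView("greeting");
            mav.model.put("name", name);
            return mav;
        });
    }

    // Step 4: the view resolver turns a logical view name into a view resource.
    static String resolveView(String viewName) {
        return "/WEB-INF/jsp/" + viewName + ".jsp";
    }

    // Steps 1, 2 and 5: receive the request, invoke the controller, render the view.
    static String dispatch(String path, String param) {
        Function<String, ModelAndView> controller = HANDLERS.get(path);
        if (controller == null) {
            return "404 Not Found";
        }
        ModelAndView mav = controller.apply(param);
        return resolveView(mav.viewName) + " rendered with " + mav.model;
    }

    public static void main(String[] args) {
        // prints /WEB-INF/jsp/greeting.jsp rendered with {name=World}
        System.out.println(dispatch("/greet", "World"));
        // prints 404 Not Found
        System.out.println(dispatch("/missing", "World"));
    }
}
```

The point of the separation is that the controller never knows which JSP file is used; it only names a view, and the resolver decides where that view lives.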

What is a Controller in Spring Framework

In the Spring framework, requests received by the dispatcher servlet are normally directed to a controller class. The controller class maps those requests to handler methods and processes the submitted input. A project can define multiple controllers for different purposes, and all of them are served by the same dispatcher servlet. As I mentioned in the previous post, the @RequestMapping annotation is used to map requests from the dispatcher servlet to the controller class.
A controller mapping can be of two types, GET and POST. Normally a controller has many GET methods and a single POST method. A GET method receives the user's request, does the desired work, and outputs the results into a view (a JSP page). A GET request is shown below.


A POST request is used to post data, such as a submitted form, from a page to the controller. A POST request is shown below.
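As a rough illustration of how GET and POST handlers for the same path differ, here is a plain-Java sketch of a route table keyed by HTTP method and path. It is not Spring code; in a real controller the same pairing would be declared with @RequestMapping(value = "/enterdetails", method = RequestMethod.GET) and its POST counterpart. The view names and the name form field are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class MethodMappingDemo {
    // Route table keyed by "METHOD path".
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();

    void map(String method, String path, Function<Map<String, String>, String> handler) {
        routes.put(method + " " + path, handler);
    }

    // Looks up the handler for this method and path and runs it with the request parameters.
    String handle(String method, String path, Map<String, String> params) {
        Function<Map<String, String>, String> handler = routes.get(method + " " + path);
        return handler == null ? "no handler" : handler.apply(params);
    }

    static MethodMappingDemo sample() {
        MethodMappingDemo controller = new MethodMappingDemo();
        // GET: show the empty form view.
        controller.map("GET", "/enterdetails", params -> "view:detailsForm");
        // POST: receive the submitted form fields and move on to a result view.
        controller.map("POST", "/enterdetails",
                params -> "view:detailsSaved for " + params.get("name"));
        return controller;
    }

    public static void main(String[] args) {
        MethodMappingDemo controller = sample();
        Map<String, String> form = Collections.singletonMap("name", "Alice");
        System.out.println(controller.handle("GET", "/enterdetails", Collections.emptyMap()));
        // prints view:detailsForm
        System.out.println(controller.handle("POST", "/enterdetails", form));
        // prints view:detailsSaved for Alice
    }
}
```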


That is the basic concept of the Spring controller. I hope the post is useful to you; feel free to leave a comment.

What is Dispatcher-Servlet in Spring

When you work with the Spring framework, you will find that an XML file called dispatcher-servlet is created when you create a new project. If we use Maven to build the project, we can choose a name for this file instead of the default "dispatcher-servlet.xml". Normally we use the project name as the prefix and keep the -servlet part as it is. For example, if the project name is Navigator, the file can be "navigator-servlet.xml". By default, the prefix has to match the servlet name declared in web.xml.
The purpose of the dispatcher is to receive requests and map them to the correct resources, such as models and views. That means that when a request is received, it is routed to the correct resources and the results are output. We therefore define all the request-mapping configuration in the dispatcher servlet file. It is an XML file that contains Spring bean definitions.
When we define the beans tag, we must include all the relevant namespace and schema URLs in it in order to map the resources. A sample beans tag definition is as follows.
After defining the beans tag, we can define the resource mapping tags. For example, we can define view resolvers, annotation configuration, or any DAO resources within the XML.
In order to map the requests, the request mapping code must be included in the Java controller classes. Such mapping code looks as follows.
@RequestMapping("/enterdetails")
This @RequestMapping denotes that requests for /enterdetails coming through the dispatcher servlet are mapped to the current controller. If the mapping is correct, the request is directed to this controller, and the controller does the rest.
Now you should have a clear idea of the dispatcher servlet. Stay tuned for the next post.




Saturday, September 10, 2011

How to Use Spring Framework With Maven

In an earlier post I described the Spring framework and its components. Now let me describe how we can use it in our development process. I have been using Ubuntu for development from the beginning, so this walkthrough is on Ubuntu as well.
First you need to download the latest version of the Spring framework from HERE.
Then you need to untar the tar file and move the folder to your preferred place. That is all you need to do for the installation. The folder contains subfolders such as dist, projects, and src, which contain all the Spring dependencies required for development.
In order to use Spring in your Java projects with Maven, you need to add Spring to the Maven POM file. I think all of you are now familiar with Maven and the POM, as I've described them in my earlier posts.
To add Spring to Maven, you add it as a dependency in the POM. This is how you do it.

1. Create a Maven-based project using the following command.
mvn archetype:generate -DgroupId=com.maven.test -DartifactId=TestProject -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

This creates a project folder containing a basic POM.

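2. Open the POM and you will see something similar to this.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.springsource.maven</groupId>
  <artifactId>example1</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>Our Simple Project</name>

</project>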

3. Now insert the following code within the <dependencies> </dependencies> tags.

    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-context</artifactId>
      <version>3.0.5.RELEASE</version>
    </dependency>



4. This dependency tells Maven to download spring-context and put it on the project's classpath, so Spring is ready to use in the project.

Now you can add the other Spring modules in the same way, as required for the project. Following are the most used dependencies. The version can vary, so it is given as a variable, ${org.springframework.version}, which is defined once at the top of the POM under properties, for example as <org.springframework.version>3.0.5.RELEASE</org.springframework.version>.


<dependency> <groupId>org.springframework</groupId> <artifactId>spring-expression</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Bean Factory and JavaBeans utilities (depends on spring-core). Define this if you use Spring Bean APIs (org.springframework.beans.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-beans</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Aspect Oriented Programming (AOP) Framework (depends on spring-core, spring-beans). Define this if you use Spring AOP APIs (org.springframework.aop.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-aop</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Application Context (depends on spring-core, spring-expression, spring-aop, spring-beans). This is the central artifact for Spring's Dependency Injection Container and is generally always defined -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-context</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Various Application Context utilities, including EhCache, JavaMail, Quartz, and Freemarker integration. Define this if you need any of these integrations -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-context-support</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Transaction Management Abstraction (depends on spring-core, spring-beans, spring-aop, spring-context). Define this if you use Spring Transactions or DAO Exception Hierarchy (org.springframework.transaction.*/org.springframework.dao.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-tx</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- JDBC Data Access Library (depends on spring-core, spring-beans, spring-context, spring-tx). Define this if you use Spring's JdbcTemplate API (org.springframework.jdbc.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-jdbc</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Object-to-Relation-Mapping (ORM) integration with Hibernate, JPA, and iBatis (depends on spring-core, spring-beans, spring-context, spring-tx). Define this if you need ORM (org.springframework.orm.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-orm</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Object-to-XML Mapping (OXM) abstraction and integration with JAXB, JiBX, Castor, XStream, and XML Beans (depends on spring-core, spring-beans, spring-context). Define this if you need OXM (org.springframework.oxm.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-oxm</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Web application development utilities applicable to both Servlet and Portlet Environments (depends on spring-core, spring-beans, spring-context). Define this if you use Spring MVC, or wish to use Struts, JSF, or another web framework with Spring (org.springframework.web.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-web</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Spring MVC for Servlet Environments (depends on spring-core, spring-beans, spring-context, spring-web). Define this if you use Spring MVC with a Servlet Container such as Apache Tomcat (org.springframework.web.servlet.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-webmvc</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Spring MVC for Portlet Environments (depends on spring-core, spring-beans, spring-context, spring-web). Define this if you use Spring MVC with a Portlet Container (org.springframework.web.portlet.*) -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-webmvc-portlet</artifactId> <version>${org.springframework.version}</version> </dependency>

<!-- Support for testing Spring applications with tools such as JUnit and TestNG. This artifact is generally always defined with a 'test' scope for the integration testing framework and unit testing stubs -->
<dependency> <groupId>org.springframework</groupId> <artifactId>spring-test</artifactId> <version>${org.springframework.version}</version> <scope>test</scope> </dependency>

So this is how you integrate Spring into a Maven project using POM dependencies. Enjoy developing with Spring until the next post.


View Iroshan Priyantha's profile on LinkedIn