Contents

Introduction
Data Source
Methods for Collecting CAZyme Sequence Data
Methods for Collecting Annotation Data
BLAST Page
Annotate Page
Download Page
Search Page
API
JavaScript API Library
PlantCAZymeDomain
PlantCAZymeAnnotate
PlantCAZymeBLAST
HTTP Requests
POST http://cys.bios.niu.edu/plantcazyme/api/domain.php
POST http://cys.bios.niu.edu/plantcazyme/api/annotate.php
POST http://cys.bios.niu.edu/plantcazyme/api/blast.php

Introduction

  • The current version contains data on 43790 CAZymes from 159 protein families in 35 plants (including angiosperms, gymnosperms, lycophytes and bryophyte mosses) and chlorophyte algae with fully sequenced genomes.
  • Useful features of the database include: 1) a BLAST server and an HMMER server that allow users to search their sequences against our pre-computed sequence data for annotation purposes, 2) a download page for batch downloading the data of a specific protein family or species, and 3) protein browse pages that provide easy access to comprehensive sequence and annotation data.
  • PlantCAZyme is the first web resource dedicated to providing pre-computed CAZyme sequence and annotation data for all sequenced plants and algae. We expect it to be a useful tool for the plant cell wall and bioenergy research communities.

  • Data Source

  • Genomes were downloaded from Phytozome and the spruce genome project.
  • 330 CAZyme family/domain HMMs were obtained from dbCAN.

  • Methods for Collecting CAZyme Sequence Data

    HMMER v3.0 was used to search the 330 dbCAN HMMs against the protein datasets of the 35 genomes. We tested the performance of the dbCAN-based search on all 330 CAZyme families as a whole (denoted as All), using different combinations of E-value and coverage cutoffs. Figure 1 shows the F-measure values of different parameter combinations for the All sets of Arabidopsis (Figure 1A) and rice (Figure 1B), where F-measure = 2 * (Sensitivity * Precision) / (Sensitivity + Precision). The combinations that gave the highest F-measure values are shown in Table 2 and Table 3.
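    To make the F-measure concrete, here is a minimal JavaScript sketch of how sensitivity, precision and the F-measure relate, and how the best cutoff combination could be picked from a grid of results. The function names and the result records are hypothetical illustrations, not the scripts used to build PlantCAZyme.

```javascript
// Sensitivity (recall): fraction of true CAZymes that the search recovers.
function sensitivity(tp, fn) {
  return tp / (tp + fn);
}

// Precision: fraction of reported hits that are true CAZymes.
function precision(tp, fp) {
  return tp / (tp + fp);
}

// F-measure = 2 * (Sensitivity * Precision) / (Sensitivity + Precision),
// the harmonic mean of the two; taken as 0 when both are 0.
function fMeasure(sens, prec) {
  if (sens + prec === 0) return 0;
  return (2 * sens * prec) / (sens + prec);
}

// Return the cutoff combination with the highest F-measure. Each entry is a
// hypothetical { evalue, coverage, sensitivity, precision } record.
function bestCombination(results) {
  let best = null;
  for (const r of results) {
    const f = fMeasure(r.sensitivity, r.precision);
    if (best === null || f > best.f) best = { ...r, f };
  }
  return best;
}

// Hypothetical counts: 90 true positives, 10 false positives, 10 false negatives.
console.log(fMeasure(sensitivity(90, 10), precision(90, 10)).toFixed(2)); // prints 0.90
```

    Because the harmonic mean is pulled toward the smaller of its two inputs, a cutoff that gains precision at a large cost in sensitivity (or vice versa) scores poorly.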

    Tables 2 and 3 show that the combination of coverage > 0.2 and E-value < 1e-23 gave the best F-measure for both Arabidopsis (F-measure = 0.91, sensitivity = 0.89 and precision = 0.92) and rice (F-measure = 0.85, sensitivity = 0.84 and precision = 0.85). We also evaluated the five classes separately, which suggests that the best F-measure, and the combination that achieves it, varies among CAZyme classes (Tables 2 and 3). Overall, the two largest classes, GT and GH (81% of CAZyme families), have higher F-measures in both plants than the three smaller classes CE, PL and CBM. The evaluation also suggests that: (i) to annotate GH proteins, one should use a very relaxed coverage cutoff, or the sensitivity will be low; (ii) to annotate CE families, very stringent E-value and coverage cutoffs should be used; otherwise the precision will be very low because of a high false-positive rate.

    Although it would work best to use different parameter combinations for different classes and different plants, we decided to use coverage > 0.2 and E-value < 1e-23 as universal cutoffs. This setting gave the best overall F-measure in both the dicot (Arabidopsis) and the monocot (rice), and a single setting keeps the parsing process simple and easy for others to reproduce. Domain sequences and full-length protein sequences were retrieved for further bioinformatics analyses.
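    A minimal JavaScript sketch of applying the universal cutoffs to domain hits follows. It assumes that coverage means the fraction of the family HMM spanned by the alignment, (hmm end - hmm start) / HMM length, which we believe is how dbCAN's parser script computes it. The hit objects and their field names are hypothetical.

```javascript
// Universal cutoffs stated above.
const MAX_EVALUE = 1e-23;
const MIN_COVERAGE = 0.2;

// Assumption: coverage = (hmmTo - hmmFrom) / hmmLength, i.e. the fraction of
// the family HMM covered by the alignment.
function coverage(hit) {
  return (hit.hmmTo - hit.hmmFrom) / hit.hmmLength;
}

// A hit is kept only if it passes both cutoffs.
function passesCutoffs(hit) {
  return hit.evalue < MAX_EVALUE && coverage(hit) > MIN_COVERAGE;
}

// Hypothetical hits; the numbers are made up for illustration.
const hits = [
  { family: "GT2", hmmLength: 170, hmmFrom: 1, hmmTo: 160, evalue: 1e-40 },  // kept
  { family: "GH16", hmmLength: 180, hmmFrom: 10, hmmTo: 30, evalue: 1e-30 }, // coverage too low
  { family: "CE8", hmmLength: 300, hmmFrom: 5, hmmTo: 290, evalue: 1e-10 },  // E-value too high
];
const kept = hits.filter(passesCutoffs).map((h) => h.family);
console.log(kept); // [ 'GT2' ]
```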

    Methods for Collecting Annotation Data

  • CDD search
    RPS-BLAST (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/node20.html) was run with the full-length CAZyme protein sequences as queries against the NCBI CDD database. CDD is a protein annotation resource that contains well-annotated sequence models. CDD domain matches with E-value < 1e-2 were kept.
  • GO data
    Gene Ontology annotations were retrieved from the Phytozome annotation of each genome. For an example, see ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Athaliana/annotation/Athaliana_167_annotation_info.txt.gz.
  • Hydropathy
    The pepwindow program (http://emboss.sourceforge.net/apps/release/6.0/emboss/apps/pepwindow.html) was run on the full-length sequences to generate a classic Kyte & Doolittle hydropathy plot.
  • TMHMM
    TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) was run on the full-length sequences to predict transmembrane regions.
  • SignalP
    Signal peptides were predicted using SignalP (http://www.cbs.dtu.dk/services/SignalP/).
  • Secondary structure
    Secondary structures were predicted using PSSpred (http://zhanglab.ccmb.med.umich.edu/PSSpred/).
  • Coiled-coil regions
    The COILS program (http://embnet.vital-it.ch/software/COILS_form.html) was run on the full-length sequences to predict coiled-coil regions.
  • EST data
    Plant EST (expressed sequence tag) data were downloaded from EBI (ftp://ftp.ebi.ac.uk/pub/databases/embl/release/). TBLASTN was run with the full-length proteins as queries to search for homologous ESTs. Matches with E-value < 1e-2 were kept as significant.
  • NCBI-nr
    The NCBI non-redundant protein sequence database (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) was downloaded, and BLASTP was run with the full-length proteins as queries to search for homologous proteins. Matches with E-value < 1e-2 were kept as significant.
  • PDB
    The Protein Data Bank protein sequence database (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz) was downloaded, and BLASTP was run with the full-length proteins as queries to search for homologous proteins. Matches with E-value < 1e-2 were kept as significant. A significant PDB match means that the browsed protein has a homolog whose 3D structure has been solved.
  • Orthologous groups
    CAZyme domain sequences were clustered into orthologous groups with the OrthoMCL program (http://orthomcl.org/orthomcl/) in the following steps:

    1. The domain FASTA sequences were sorted into files based on their family.
    2. Each family FASTA file was formatted into a BLAST database using makeblastdb with default settings.
    3. For each family, the FASTA file was searched against the BLAST database of the same family using blastp with default settings and tabular output.
    4. orthomclAdjustFasta was run for each family, using the family as the identifier.
    5. orthomclBlastParser was run on the BLAST results from step 3 and the compliant FASTA file created in step 4.
    6. orthomclInstallSchema was run using the default configuration file, except for the database login information.
    7. orthomclLoadBlast was run on the file generated in step 5.
    8. orthomclPairs was run with cleanup.
    9. orthomclDumpPairs was run.
    10. mcl was run on the mclInput file with the options --abc and -I 1.5.
    11. orthomclMclToGroups was run using the file generated in step 3, with the file from step 10 as input.
    12. The resulting file lists each orthologous group on a separate line.

  • Sequence alignment
    For each orthologous group, an alignment was generated with MAFFT (http://www.ebi.ac.uk/Tools/msa/mafft/). The alignment graph was then generated by feeding the resulting file into the PHP script that we provide (http://cys.bios.niu.edu/plantcazyme/scripts.php?script=alignment).
  • Phylogeny
    The alignment of each orthologous group was used to build a phylogenetic tree with FastTree (http://www.microbesonline.org/fasttree/). The Newick-format tree file was then rendered as a graph using a Biopython script (http://cys.bios.niu.edu/plantcazyme/scripts.php?script=tree).
  • Publication
    For Arabidopsis CAZymes, publication records were retrieved from the TAIR database (ftp://ftp.arabidopsis.org/User_Requests/Locus_Published_20130305.txt).
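    The hydropathy plot described above averages a hydropathy scale over a sliding window along the sequence. Below is a minimal JavaScript sketch of that calculation using the Kyte & Doolittle scale. It is not the pepwindow source. The window size is left to the caller, and the value of 7 in the example is chosen only for illustration, not taken from pepwindow's defaults. Scoring unknown residues as 0 is our own choice.

```javascript
// Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
const KYTE_DOOLITTLE = {
  A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5,
  G: -0.4, H: -3.2, I: 4.5, L: 3.8, K: -3.9, M: 1.9, F: 2.8,
  P: -1.6, S: -0.8, T: -0.7, W: -0.9, Y: -1.3, V: 4.2,
};

// Mean hydropathy of each window of `windowSize` residues. Entry i averages
// residues i .. i + windowSize - 1 (0-based). Unknown letters score 0.
function hydropathyProfile(sequence, windowSize) {
  const scores = [...sequence.toUpperCase()].map((aa) => KYTE_DOOLITTLE[aa] ?? 0);
  const profile = [];
  let sum = 0;
  for (let i = 0; i < scores.length; i++) {
    sum += scores[i]; // residue entering the window
    if (i >= windowSize) sum -= scores[i - windowSize]; // residue leaving it
    if (i >= windowSize - 1) profile.push(sum / windowSize);
  }
  return profile;
}

// First residues of the example protein used in the API section, window of 7.
const profile = hydropathyProfile("MAIKNILALVVLLSVVGVSVAIPQLLDLDYY", 7);
console.log(profile.map((v) => v.toFixed(2)).join(" "));
```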
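    The orthologous-group procedure above ends with a file that lists one group per line. The JavaScript sketch below reads such a file. It assumes the usual OrthoMCL groups layout, `groupID: member1 member2 ...`, with each member written as `identifier|proteinID`, where the identifier is the one given to orthomclAdjustFasta in step 4. The group and protein IDs in the example are made up.

```javascript
// Read an OrthoMCL groups file into a Map from group ID to member IDs.
// Assumed line format: "groupID: member1 member2 ...".
function parseGroups(text) {
  const groups = new Map();
  for (const line of text.split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon < 0) continue; // skip blank or malformed lines
    const id = line.slice(0, colon).trim();
    const members = line.slice(colon + 1).trim().split(/\s+/).filter(Boolean);
    groups.set(id, members);
  }
  return groups;
}

// Split an "identifier|proteinID" member into its two parts.
function splitMember(member) {
  const bar = member.indexOf("|");
  if (bar < 0) return { prefix: "", protein: member };
  return { prefix: member.slice(0, bar), protein: member.slice(bar + 1) };
}

// Made-up group and protein IDs.
const example = "GT2_1000: GT2|protA GT2|protB\nGT2_1001: GT2|protC\n";
for (const [id, members] of parseGroups(example)) {
  console.log(id, members.map((m) => splitMember(m).protein).join(","));
}
```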

  • BLAST Page

    On the BLAST page, you can submit your own protein (blastp) or DNA/RNA (blastx) sequences to search against our pre-computed CAZyme protein sequences. You may also choose to search against a specific species or the CAZy database. If you submit a large dataset, expect a long wait. It is best to leave your email address so that the results can be sent to you when the job has finished.

    Annotate Page

    This function is the same as that provided by the dbCAN server (http://csbl.bmb.uga.edu/dbCAN/annotate.php), except that we do not provide a graphical representation of the results. Instead, the hmmsearch output, a parseable table of per-domain hits, is provided. The format is described in the HMMER user guide (ftp://selab.janelia.org/pub/software/hmmer3/3.1b1/Userguide.pdf). The most informative columns are: target name, query name, tlen, qlen, c-Evalue, ali coord from and to, and hmm coord from and to.
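    As a sketch of how that per-domain table can be read, the JavaScript below splits one line into named fields. It assumes HMMER 3's per-domain table layout: 22 whitespace-separated columns, followed by a free-text target description that may itself contain spaces. Comment lines start with "#". The field names mirror the API's data keys, except that the c-Evalue and i-Evalue keys are renamed to valid identifiers. The sample line is made up.

```javascript
// Names for the 22 fixed columns of HMMER's per-domain table, in order.
const DOMTBL_FIELDS = [
  "targetName", "targetAccession", "targetLength", "queryName", "queryAccession",
  "queryLength", "fullEvalue", "fullScore", "fullBias", "domainResultNum",
  "domainResultTotal", "domainCEvalue", "domainIEvalue", "domainScore", "domainBias",
  "hmmFrom", "hmmTo", "aliFrom", "aliTo", "envFrom", "envTo", "acc",
];
// Columns kept as strings; every other fixed column is converted to a number.
const TEXT_FIELDS = new Set(["targetName", "targetAccession", "queryName", "queryAccession"]);

// Parse one line into an object; returns null for comments, blanks and short lines.
function parseDomtblLine(line) {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null;
  const parts = trimmed.split(/\s+/);
  if (parts.length < DOMTBL_FIELDS.length) return null;
  const row = {};
  DOMTBL_FIELDS.forEach((name, i) => {
    row[name] = TEXT_FIELDS.has(name) ? parts[i] : Number(parts[i]);
  });
  // Everything after column 22 is the free-text description of the target.
  row.description = parts.slice(DOMTBL_FIELDS.length).join(" ");
  return row;
}

// A made-up line, not real PlantCAZyme output.
const sample = "FAM1.hmm - 169 query1 - 325 1.1e-30 105.2 0.1 1 1 2e-33 1.5e-30 104.8 0.1 2 165 30 200 28 205 0.93 made up description";
const hit = parseDomtblLine(sample);
console.log(hit.targetName, hit.hmmFrom, hit.hmmTo, hit.aliFrom, hit.aliTo); // FAM1.hmm 2 165 30 200
```

    Which of tlen and qlen holds the HMM length depends on the program. hmmscan reports the HMMs as targets, while hmmsearch reports them as queries.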

    Download Page

    The Download page allows you to bulk-download CAZyme sequences by family or by plant species.

    Search Page

    The PlantCAZyme database can be queried using the search bar at the top of the page. Multiple types of criteria can be specified to narrow your search. There are two kinds of search: unformatted and formatted.

  • Unformatted searching
    An unformatted query contains no formatting, i.e. no field specifiers in brackets []. It is run only against the following fields:
    - ID, e.g. AT2G46570.1
    - Family, e.g. CBM10
    - Species, e.g. Arabidopsis Thaliana
    - Domain, e.g. Cellulose_synt

    See formatted searching for a description of these fields. Note that unformatted queries are split on spaces, and each word is matched separately. So entering the query "Arabidopsis Thaliana" will return results from both Arabidopsis Thaliana and Arabidopsis Lyrata, because both contain the word "Arabidopsis".

  • Formatted searching
    Formatted searching lets you be more specific and search more categories. A formatted query names the category to search, in brackets [], directly after the search term. For example, to search for the species Arabidopsis Thaliana, you can enter "Arabidopsis Thaliana[Species]". You can use more than one specifier in a query. To restrict the results further to the AA1 family, write "Arabidopsis Thaliana[Species] AA1[Family]". Specifiers are combined with AND, so a result appears only if it matches all of the criteria you have given.

    The categories that can be searched, with an example of each, are as follows:
  • ID, e.g. AT2G46570.1[ID]
  • Family, e.g. CBM10[Family]
  • Species, e.g. Arabidopsis Thaliana[Species]
  • Domain, e.g. Cellulose_synt[DomainID]
  • Domain Description, e.g. cellulose synthase[DomainDesc]
  • CDD ID, e.g. cd04188[CddID]
  • EST ID, e.g. HO801725[estID]
  • Gene Ontology ID, e.g. 0005886[GOid]
  • NR ID, e.g. AAD32031.1[NRid]
  • NR Description, e.g. pectinesterase[NRdescription]
  • PDB ID, e.g. 1asq_B[PDBid]
  • PDB Description, e.g. Nadph-Binding[PDBdescription]
  • Length, e.g. 100-500[Length]
  • Molecular Weight, e.g. 10000-20000[MW]
  • Isoelectric Point, e.g. 5-7[IP]

    Note that the words in these queries are still delimited by spaces, so each word is searched separately rather than as a phrase. For example, searching "Arabidopsis Thaliana[Species]" will return anything whose species contains "Arabidopsis" or "Thaliana". There are two ways to restrict this. The first is to split the query: "Arabidopsis[Species] Thaliana[Species]" returns only proteins from the species "Arabidopsis Thaliana". The second is to use parentheses. A phrase enclosed in parentheses is treated as a single term, so "(Arabidopsis Thaliana)[Species]" also returns only proteins from the species "Arabidopsis Thaliana".
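    The search rules described above can be modeled with the JavaScript sketch below. Under these rules, words under one specifier are ORed, separate specifiers are ANDed, parentheses turn a phrase into a single term, and bare words are searched against ID, Family, Species and Domain. The sketch is a hypothetical illustration of the documented behaviour, not PlantCAZyme's search code. It assumes case-insensitive substring matching, and it does not handle the numeric range categories such as [Length], [MW] and [IP].

```javascript
// Fields searched by words that carry no [Category] specifier.
const DEFAULT_FIELDS = ["ID", "Family", "Species", "Domain"];

// Split a query into groups. A "(phrase)" is one term, a "[Category]" closes
// the group of terms before it, and any other text is split on spaces.
function parseQuery(query) {
  const groups = [];
  let pending = [];
  const tokenRe = /\(([^)]*)\)|\[([^\]]*)\]|([^\s()[\]]+)/g;
  let m;
  while ((m = tokenRe.exec(query)) !== null) {
    if (m[1] !== undefined) {
      if (m[1].trim()) pending.push(m[1].trim());
    } else if (m[2] !== undefined) {
      groups.push({ fields: [m[2]], terms: pending });
      pending = [];
    } else {
      pending.push(m[3]);
    }
  }
  // Leftover bare words form one unformatted group over the default fields.
  if (pending.length > 0) groups.push({ fields: DEFAULT_FIELDS, terms: pending });
  return groups;
}

// Groups are ANDed; terms inside a group are ORed. A term matches when it
// occurs (case-insensitively) inside any of the group's fields.
function matches(record, groups) {
  return groups.every((g) =>
    g.terms.some((term) =>
      g.fields.some((field) =>
        String(record[field] ?? "").toLowerCase().includes(term.toLowerCase())
      )
    )
  );
}

// Made-up records for illustration.
const thaliana = { ID: "P1", Family: "AA1", Species: "Arabidopsis Thaliana", Domain: "Cu-oxidase" };
const lyrata = { ID: "P2", Family: "GT2", Species: "Arabidopsis Lyrata", Domain: "Cellulose_synt" };
console.log(matches(lyrata, parseQuery("(Arabidopsis Thaliana)[Species]"))); // false
```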


  • API

    The PlantCAZyme API can be used to query the PlantCAZyme database from an external program. The API works over HTTP requests. A JavaScript library for working with the API is provided and documented below. Alternatively, you can send the HTTP requests yourself. These requests are documented in the HTTP Requests section below.


    The JavaScript API Library

    The JavaScript API library provides everything you need to make the queries that PlantCAZyme supports. It contains three function objects, PCAZymeDomain, PCAZymeAnnotate and PCAZymeBLAST. Each is passed the query parameters and runs one type of query, and each requires jQuery to execute. The JavaScript API library can be found here. jQuery can be downloaded here, or you can link to Google's hosted copy of jQuery here.

    All of the objects work in the same way. First, construct the query by passing it values. Then assign it the functions to run when the query is done, when it fails, or in either case (always). Use the next() function to iterate over the results, and the get() function to retrieve information from the current result.
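    Because all three objects share the next()/get() pattern, a small helper can collect one field from every result. The JavaScript sketch below shows such a helper, named collectField for illustration. To try it without jQuery or a server, the sketch also includes a hypothetical in-memory stand-in for a finished query object. The stand-in only mimics the documented cursor behaviour and is not part of the PlantCAZyme library.

```javascript
// Gather one field from every row of a finished query object using the
// documented next()/get() methods. Call it inside the object's done() function.
function collectField(result, key) {
  const values = [];
  while (result.next()) {
    values.push(result.get(key));
  }
  return values;
}

// Hypothetical stand-in for a finished query object (NOT the PlantCAZyme
// library): next() advances a cursor and get() reads the current row.
function makeFakeResult(rows, fieldNames) {
  let index = -1;
  const data = {};
  for (const name of fieldNames) data[name] = name; // keys passed to get()
  return {
    data,
    next() {
      index += 1;
      return index < rows.length;
    },
    get(key) {
      const row = rows[index];
      // With no argument, return the whole current row as an array.
      return key === undefined ? fieldNames.map((n) => row[n]) : row[key];
    },
  };
}

const fake = makeFakeResult(
  [{ ID: "P1", Family: "GT2" }, { ID: "P1", Family: "GH16" }],
  ["ID", "Family"]
);
console.log(collectField(fake, fake.data.Family)); // [ 'GT2', 'GH16' ]
```

    With the real library, the helper would be called from done(), for example `result.done = function(){ alert(collectField(result, result.data.Family).join(", ")); };`.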


    PlantCAZymeDomain

    This function object allows you to look up the signature domains of a protein. The protein must already be in the PlantCAZyme database, and you must use its PlantCAZyme ID.

    Set up the object.
    var result = new PCAZymeDomain("AT1G18140.1");
    

    The next() function allows us to iterate over results. Use it in a while() loop.
    while(result.next()){
    	// read the current result with result.get()
    }
    

    The get() function retrieves information from the current result. Called with no parameters, it returns the whole result as an array. Otherwise, pass in one of the following values, which are members of the object's data variable:
    data.ID
    data.Family
    data.Start
    data.End
    data.Evalue
    data.Sequence
    

    Alternatively, you can pass in the corresponding integer to get the same results. Because the query runs asynchronously, the object's done() function must be set. Optionally, you can also set the fail() and always() functions.

    The following code queries PlantCAZyme for the domains of the protein "AT1G18140.1". On success it alerts the user with each domain family. On failure it alerts the user that the query failed.
    var result = new PCAZymeDomain("AT1G18140.1");
    result.done = function(){
    	while(result.next()){
    		alert(result.get(result.data.Family));
    	}
    };
    result.fail = function(){
    	alert("The PlantCAZyme query failed.");
    };
    


    PlantCAZymeAnnotate

    This function object allows you to annotate protein sequences to find likely CAZyme domains. This is done using the dbCAN HMMs.

    Set up the object. The ID and sequence are mandatory. The evalue option is optional and defaults to 10.
    var result = new PCAZymeAnnotate("AT1G05240.1", "MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGVTVQYVSRQKTLAAKL\
    LRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCA\
    DVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLN\
    AKDLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLN\
    MDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDS\
    MVKLGFVQILTGKNGEIRKRCAFPN", {evalue: 10});
    

    The next() function allows us to iterate over results. Use it in a while() loop.
    while(result.next()){
    	// read the current result with result.get()
    }
    

    The get() function retrieves information from the current result. Called with no parameters, it returns the whole result as an array. Otherwise, pass in one of the following values, which are members of the object's data variable:
    data.targetName
    data.targetAccession
    data.targetLength
    data.queryName
    data.queryAccession
    data.queryLength
    data.fullEvalue
    data.fullScore
    data.fullBias
    data.domainResultNum
    data.domainResultTotal
    data["domainc-Evalue"]
    data["domaini-Evalue"]
    data.domainScore
    data.domainBias
    data.hmmFrom
    data.hmmTo
    data.aliFrom
    data.aliTo
    data.envFrom
    data.envTo
    data.acc
    data.description
    

    Alternatively, you can pass in the corresponding integer to get the same results. Because the query runs asynchronously, the object's done() function must be set. Optionally, you can also set the fail() and always() functions. The following code runs an hmmscan for the domains of the protein "AT1G05240.1". On success it alerts the user with each matching domain. On failure it alerts the user that the query failed.
    var result = new PCAZymeAnnotate("AT1G05240.1", "MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGVTVQYVSRQKTLAAKL\
    LRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCA\
    DVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLN\
    AKDLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLN\
    MDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDS\
    MVKLGFVQILTGKNGEIRKRCAFPN", {evalue: 10});
    result.done = function(){
    	while(result.next()){
    		alert(result.get(result.data.targetName));
    	}
    };
    result.fail = function(){
    	alert("The PlantCAZyme query failed.");
    };
    


    PlantCAZymeBLAST

    This function object allows you to run a BLAST against the proteins of the PlantCAZyme database.

    Set up the object. The ID and sequence are mandatory. Options are passed inside braces, and their names are case-sensitive. The options are evalue, program, database, matrix, filter and mask. Their default values are 10, "blastp", "all", "BLOSUM62", "yes" and "no", respectively.
    var result = new PCAZymeBLAST("AT3G13560.1","MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGVTVQYVSRQKTLAAKL\
    LRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCA\
    DVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLN\
    AKDLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLN\
    MDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDS\
    MVKLGFVQILTGKNGEIRKRCAFPN", {evalue:10, program: "blastp", matrix: "BLOSUM62"});
    

    The next() function allows us to iterate over results. Use it in a while() loop.
    while(result.next()){
    	// read the current result with result.get()
    }
    

    The get() function retrieves information from the current result. Called with no parameters, it returns the whole result as an array. Otherwise, pass in one of the following values, which are members of the object's data variable:
    data.queryID
    data.subjectID
    data.identity
    data.alignmentLength
    data.mismatches
    data.gapOpens
    data.qStart
    data.qEnd
    data.sStart
    data.sEnd
    data.evalue
    data.bitScore
    
    Alternatively, you can pass in the corresponding integer to get the same results. Because the query runs asynchronously, the object's done() function must be set. Optionally, you can also set the fail() and always() functions.

    The following code runs a BLAST search against the PlantCAZyme proteins for the query sequence labeled "AT3G13560.1". On success it alerts the user with a comma-separated list of all hits. On failure it alerts the user that the query failed.
    var result = new PCAZymeBLAST("AT3G13560.1","MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGVTVQYVSRQKTLAAKL\
    LRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCA\
    DVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLN\
    AKDLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLN\
    MDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDS\
    MVKLGFVQILTGKNGEIRKRCAFPN");
    result.done = function(){
    	var hits = [];
    	while(result.next()){
    		hits.push(result.get(result.data.subjectID));
    	}
    	alert(hits.join(","));
    };
    result.fail = function(){
    	alert("The PlantCAZyme query failed.");
    };
    


    HTTP Requests

    Sometimes you may wish to get information from PlantCAZyme without using JavaScript. No library is provided for this, but the following requests are supported. Each request is sent by POST to the corresponding API page, and a successful response is JSON-encoded. The examples are written in Perl.

    POST http://cys.bios.niu.edu/plantcazyme/api/domain.php

    Sending a request here retrieves the domains of a given protein. The only field is "id".
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/domain.php', ['id' => 'AT1G18140.1']);
    my $response = $ua->request($request);
    

    The response is a JSON-encoded array of arrays. Each row is a domain match, with columns in the order ID, Family, Start, End, Evalue, Sequence. -1 is returned upon failure.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request::Common qw{ POST };
    use JSON::Parse 'parse_json';
    
    my $ua = LWP::UserAgent->new;
    
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/domain.php', ['id' => 'AT1G18140.1']);
    
    my $response = $ua->request($request);
    if($response->is_success){
            my $result = $response->decoded_content;
            # The API returns -1 on failure, JSON otherwise.
            if($result !~ /^\s*-1\s*$/){
                    my $rows = parse_json($result);
                    foreach my $row (@$rows){
                            print $row->[1], "\n"; # print the domain family
                    }
            }
            else{ print "Failure!\n"; }
    } else {
            print $response->code, "\n";
    }
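    For readers working in JavaScript outside the browser, the same domain request can be sketched with Node's built-in fetch (Node 18 or later assumed). The request body is form-encoded with URLSearchParams, which matches what HTTP::Request::Common's POST sends in the Perl example above. The function names are hypothetical, and the live call needs network access to a reachable server, so only the response handling is exercised offline.

```javascript
const DOMAIN_URL = "http://cys.bios.niu.edu/plantcazyme/api/domain.php";

// The API answers with JSON on success and -1 on failure.
// Returns the parsed rows, or null when the server reported failure.
function parseApiResponse(text) {
  if (text.trim() === "-1") return null;
  return JSON.parse(text);
}

// POST the "id" field as application/x-www-form-urlencoded form data.
async function fetchDomains(id) {
  const response = await fetch(DOMAIN_URL, {
    method: "POST",
    body: new URLSearchParams({ id }),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return parseApiResponse(await response.text());
}

// Usage (requires network access and a live server):
// fetchDomains("AT1G18140.1").then((rows) => {
//   if (rows === null) console.log("Failure!");
//   else for (const row of rows) console.log(row[1]); // Family column
// });
```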
    


    POST http://cys.bios.niu.edu/plantcazyme/api/annotate.php

    Sending a request here runs an hmmscan for a given protein. The fields are "id", "sequence" and "evalue". "evalue" is optional.
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/annotate.php', ['id' => 'AT1G05240.1', 'sequence' => 'MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGV'
            . 'TVQYVSRQKTLAAKLLRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCADVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLNAKDLVVLSGGHTIGISSCALVN'
            . 'SRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLNMDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDSMVKLGFVQILTGKNGEIRKRCAFPN', 'evalue' => 10]);
    my $response = $ua->request($request);
    

    The response is a JSON-encoded array of arrays. Each row is a domain match, with columns in the following order: target name, target accession, target length, query name, query accession, query length, full-sequence E-value, full-sequence score, full-sequence bias, this domain's number, total number of domains reported for this target/query pair, this domain's c-Evalue, this domain's i-Evalue, this domain's score, this domain's bias, HMM start position, HMM end position, alignment start position, alignment end position, envelope start, envelope end, mean posterior probability (acc), and description. -1 is returned upon failure.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request::Common qw{ POST };
    use JSON::Parse 'parse_json';
    
    my $ua = LWP::UserAgent->new;
    
    # The sequence is built from concatenated pieces so that no newlines end up inside it.
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/annotate.php', ['id' => 'AT1G05240.1', 'sequence' => 'MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPKAEEIVRGV'
            . 'TVQYVSRQKTLAAKLLRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCADVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLNAKDLVVLSGGHTIGISSCALVN'
            . 'SRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLNMDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDSMVKLGFVQILTGKNGEIRKRCAFPN', 'evalue' => 10]);
    
    my $response = $ua->request($request);
    if($response->is_success){
            my $result = $response->decoded_content;
            if($result !~ /^\s*-1\s*$/){
                    my $rows = parse_json($result);
                    foreach my $row (@$rows){
                            print $row->[0], "\n"; # print the target name (the matching domain HMM)
                    }
            }
            else{ print "Failure!\n"; }
    } else {
            print $response->code, "\n";
    }
    


    POST http://cys.bios.niu.edu/plantcazyme/api/blast.php

    Sending a request here runs a BLAST search for a given protein. The fields are "id" and "sequence", plus the options "evalue", "program", "database", "matrix", "filter" and "mask". The filter and mask options are turned on simply by including them in the request.
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/blast.php', ['id' => 'AT3G13560.1', 'sequence' => 'MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPK'
            . 'AEEIVRGVTVQYVSRQKTLAAKLLRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCADVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLNAK'
            . 'DLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLNMDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDSMVKLGFVQILTGKNGEIRKRCAFPN',
            'evalue' => 10, 'program' => 'blastp', 'matrix' => 'BLOSUM62']);
    
    my $response = $ua->request($request);
    

    The response is a JSON-encoded array of arrays. Each row is a match, with columns in the order query ID, subject ID, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value and bit score. -1 is returned upon failure.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request::Common qw{ POST };
    use JSON::Parse 'parse_json';
    
    my $ua = LWP::UserAgent->new;
    
    # The sequence is built from concatenated pieces so that no newlines end up inside it.
    my $request = POST('http://cys.bios.niu.edu/plantcazyme/api/blast.php', ['id' => 'AT3G13560.1', 'sequence' => 'MAIKNILALVVLLSVVGVSVAIPQLLDLDYYRSKCPK'
            . 'AEEIVRGVTVQYVSRQKTLAAKLLRMHFHDCFVRGCDGSVLLKSAKNDAERDAVPNLTLKGYEVVDAAKTALERKCPNLISCADVLALVARDAVAVIGGPWWPVPLGRRDGRISKLNDALLNLPSPFADIKTLKKNFANKGLNAK'
            . 'DLVVLSGGHTIGISSCALVNSRLYNFTGKGDSDPSMNPSYVRELKRKCPPTDFRTSLNMDPGSALTFDTHYFKVVAQKKGLFTSDSTLLDDIETKNYVQTQAILPPVFSSFNKDFSDSMVKLGFVQILTGKNGEIRKRCAFPN',
            'evalue' => 10, 'program' => 'blastp', 'matrix' => 'BLOSUM62']);
    
    my $response = $ua->request($request);
    if($response->is_success){
            my $result = $response->decoded_content;
            if($result !~ /^\s*-1\s*$/){
                    my $rows = parse_json($result);
                    foreach my $row (@$rows){
                            print $row->[1], "\n"; # print the subject (hit) ID
                    }
            }
            else{ print "Failure!\n"; }
    } else {
            print $response->code, "\n";
    }
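    The Perl examples read each row by index. In JavaScript, the same decoded rows could be turned into named objects and filtered with a sketch like the one below. It assumes the response has already been fetched and JSON-decoded. The column names reuse the JavaScript library's data keys, the 1e-2 cutoff in the example simply mirrors the one used for the annotation data, and the rows are made up.

```javascript
// Column order of each row returned by blast.php, as listed above.
const BLAST_COLUMNS = [
  "queryID", "subjectID", "identity", "alignmentLength", "mismatches", "gapOpens",
  "qStart", "qEnd", "sStart", "sEnd", "evalue", "bitScore",
];

// Turn each positional row into an object keyed by column name.
function rowsToHits(rows) {
  return rows.map((row) => Object.fromEntries(BLAST_COLUMNS.map((name, i) => [name, row[i]])));
}

// Keep hits below an E-value cutoff, best (lowest E-value) first. Number() is
// used in case E-values arrive as strings such as "1e-150".
function significantHits(hits, maxEvalue) {
  return hits
    .filter((h) => Number(h.evalue) < maxEvalue)
    .sort((a, b) => Number(a.evalue) - Number(b.evalue));
}

// Made-up rows in the documented column order.
const rows = [
  ["q1", "s1", 99.5, 300, 1, 0, 1, 300, 1, 300, "1e-150", 500],
  ["q1", "s2", 40.0, 200, 100, 5, 10, 210, 5, 205, "0.5", 30],
  ["q1", "s3", 60.0, 250, 90, 2, 1, 250, 1, 250, "2e-50", 180],
];
const best = significantHits(rowsToHits(rows), 1e-2);
console.log(best.map((h) => h.subjectID)); // [ 's1', 's3' ]
```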