Difference between revisions of "Alex's Elasticsearch Adventure"

From CoGepedia
Revision as of 17:31, 22 September 2014

I have been working on populating an Elasticsearch database with test data, in order to see what the system is capable of.

First, I went through all the steps at https://genomevolution.org/wiki/index.php/Install_Elasticsearch.

Next, I began looking into loading multiple JSON objects into Elasticsearch's system at once. Found useful information at http://httpkit.com/resources/HTTP-from-the-Command-Line/ under the heading "Use a File as a Request Body".

I created a JSON file (I called it sample1.json) that looked like this:

{
	1: {id: 1,
		type_name: "gene",
		start: 0,
		stop: 1,
		strand: "+",
		chromosome: 1
		feature_name: {
			name1: "blah1",
			name2: "name",
			name3: "George",
			name4: "obligatory"
		}
	},
	
	2: {
	        id: 2,
        	type_name: "exon",
	        start: 1776,
	        stop: 2014,
	        strand: "und",
	        chromosome: 3
	        feature_name: {
	                name1: "stuff",
	                name2: "at4g37764",
	                name3: "578926",
	                name4: "name_of_feature" 
	        } 
	},

	3: {
                id: 3,
               type_name: "cds",
               start: 1,
               stop: 4,
               strand: "-",
               chromosome: 2
               feature_name: {
                       name1: "stuff",
                       name2: "at4g37764",
                       name3: "578926",
               }
       }
}


I then tested the command

curl -X PUT \
    -H 'Content-Type: application/json' \
    -d @sample1.json \
    localhost:9200/testIndex/feature

and got a "No handler found" error.

So, I tried reorganizing the command:

curl -XPUT localhost:9200/testIndex/feature -H 'Content-Type: application/json' -d @sample1.json

Same error:

No handler found for uri [/testIndex/feature] and method [PUT]

Tried again, actually specifying an _id field of "test1" this time (the 1, 2, and 3 in the JSON file were supposed to be the _id fields):

curl -XPUT localhost:9200/testIndex/feature/test1 -H 'Content-Type: application/json' -d @sample1.json
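Supplying an ID matters because of how the index API routes requests: a PUT is expected to name the document ID in the path, while a POST to the type lets Elasticsearch generate one. A minimal sketch of that choice, assuming the typed /index/type/id URL layout used by the commands on this page. The helper name index_request and the sample document are hypothetical, and the request is only built, never sent:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl commands


def index_request(index, doc_type, doc, doc_id=None):
    """Build (but do not send) a request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    if doc_id is not None:
        # The caller chose the ID: PUT to /index/type/id.
        url += f"/{doc_id}"
        method = "PUT"
    else:
        # No ID given: POST to /index/type so the server can assign one.
        method = "POST"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = index_request("test_index", "feature", {"type_name": "gene"}, doc_id="test1")
without_id = index_request("test_index", "feature", {"type_name": "gene"})
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```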

Get yet another error:

{"error":"InvalidIndexNameException[[testIndex] Invalid index name [testIndex], must be lowercase]","status":400}

Alright, apparently it doesn't like the capital letter in "testIndex". In that case:

curl -XPUT localhost:9200/test_index/feature/test1 -H 'Content-Type: application/json' -d @sample1.json
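The lowercase rule from the error message can be checked before a request is made. A minimal sketch that tests only that one requirement; Elasticsearch applies other naming restrictions that this hypothetical helper does not cover:

```python
def is_lowercase_index_name(name):
    """Check only the rule reported in the error: no uppercase letters."""
    return name == name.lower()


for candidate in ("testIndex", "test_index"):
    verdict = "ok" if is_lowercase_index_name(candidate) else "rejected (must be lowercase)"
    print(candidate, "->", verdict)
```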

Woo more errors!

{"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unexpected character ('}' (code 125)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: [B@23276b35; line: 1, column: 838]]; ","status":400}

Okay, it looks like I forgot to put quotes around my object labels in the JSON file. That's easy enough to fix. New sample1.json:


Run the command, get the same error. Looking at it more closely, the quotes may not have been the issue (though it probably didn't hurt to add them). It appears to be having issues with one of my closing brackets ( "}" ).
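A "}" appearing where a field name was expected is consistent with the trailing comma after name3 in the third entry, since JSON does not allow a comma directly before a closing brace. One way to locate this kind of problem without involving Elasticsearch is to run the text through a strict JSON parser locally. A minimal sketch using Python's standard json module, with a shortened, hypothetical stand-in for the third entry (the helper name find_json_error is also an assumption):

```python
import json

# Shortened, hypothetical stand-in for the third entry of sample1.json:
# keys are quoted, but the trailing comma after "name3" is left in place.
snippet = '{"feature_name": {"name2": "at4g37764", "name3": "578926",}}'


def find_json_error(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# The first call reports where parsing failed; the second shows that
# removing the trailing comma makes the snippet valid.
print(find_json_error(snippet))
print(find_json_error(snippet.replace(",}", "}")))
```

Python's parser is stricter than the one that produced the error above (it also rejects unquoted keys), so it may flag problems in a different order.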

.....................

After talking to Matt, we figured out how the Bulk API is supposed to work (found at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html).

So, new JSON file (sample2.json):

{ "create" : { "_id" : "one" } }
{ "feature1" : "This is the first feature" }
{ "create" : { "_id" : "two" } }
{ "feature2" : "This is the second feature" }
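The file above follows the Bulk API's line-oriented layout: one action line, then one source line, each a complete JSON object on its own line, with a newline after the last line as well. A minimal sketch of generating that layout from Python data, so that quoting and commas are handled by the serializer rather than by hand. The helper name bulk_body and the (id, document) pair structure are assumptions for illustration:

```python
import json


def bulk_body(docs):
    """Turn (doc_id, source) pairs into a newline-delimited bulk body."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"create": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                       # source line
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    ("one", {"feature1": "This is the first feature"}),
    ("two", {"feature2": "This is the second feature"}),
]
body = bulk_body(docs)
print(body, end="")
```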

And new curl command:

curl -s -XPOST localhost:9200/testindex/feature/_bulk -d @sample2.json
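One thing to watch with this command: the Bulk API depends on the newlines between lines, and the Elasticsearch bulk documentation notes that curl's plain -d does not preserve newlines when reading from a file, recommending --data-binary instead. A minimal sketch of building the same request in Python, where the body bytes are passed through unchanged. The helper name bulk_request is hypothetical, the URL is taken from the command above, and the request is only built, not sent:

```python
import urllib.request

BULK_URL = "http://localhost:9200/testindex/feature/_bulk"  # from the curl command


def bulk_request(body, url=BULK_URL):
    """Build (but do not send) a bulk POST whose body keeps its newlines."""
    data = body.encode("utf-8")
    if not data.endswith(b"\n"):
        # The final line of a bulk body must also end with a newline.
        data += b"\n"
    return urllib.request.Request(url, data=data, method="POST")


sample = (
    '{ "create" : { "_id" : "one" } }\n'
    '{ "feature1" : "This is the first feature" }'
)
request = bulk_request(sample)
print(request.get_method(), request.full_url)
print(request.data.count(b"\n"), "newlines in body")
```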