This page describes common AWS stack management, file upload, and query problems and their workarounds.
AWS Stack Management
My AWS user account is not authorized to create IAM resources
The stack creation procedure requires you to create an Identity and Access Management (IAM) resource. If you are not signed into AWS as root (or a user with permission to create IAM resources), setup fails with
I deleted my stack and still see an S3 bucket
If you delete your stack, an S3 bucket might still exist. You can delete the bucket in S3 in the AWS Management Console. Note that if files are still in the bucket, they will be deleted after a day.
File Upload Errors
I don’t see my dataset in the client
Try clearing the cache if you don’t see your new dataset in the IQL web client: select Settings > Clear cache.
Files fail to upload
TSV Uploader marks any file that it could not upload as failed.
Ensure that the bucket names you defined when you created the stack match the names of the S3 buckets you created before setup. For example, if the error log shows that TSV Uploader couldn’t upload the converted files to the S3 bucket, you might have entered an incorrect name for the BuildBucket parameter during stack creation. In this case, you must recreate the stack. Don’t worry, you won’t lose any data, but do ensure that you point to the same S3 buckets when you recreate the stack.
A TSV file was uploaded to the wrong dataset
If a TSV file is indexed in the wrong dataset, you must delete the shard that contains the new data:
- Determine the time range of the shard by running an IQL query for fields in the TSV file.
- Sign in to AWS and navigate to your S3 data bucket.
- Open the folder for the dataset that contains the new data from the TSV file you uploaded.
- From the list of compressed shards, locate the shard with the timestamp from your IQL query.
- Delete the shard folder and corresponding document.
=, :, !=, =~, !=~, (, *, \, %, +, -, /, >=, >, <=, <, in, not or not in expected, EOF encountered.
You entered invalid text after a field name. Review the query syntax to ensure the query is not missing the operator (: or =). If the syntax is correct, add quotation marks around the field value. Example: country:"united states"
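If you build queries in a script, quoting every field value avoids this error for values that contain spaces. The following is a minimal Python sketch, not part of IQL: the helper name `term_filter` is hypothetical, and it rejects values that contain a double quote because the escaping rules are not covered here.

```python
def term_filter(field: str, value: str) -> str:
    """Return a field:"value" filter with the value in straight double quotes.

    Hypothetical helper for illustration only.
    """
    if '"' in value:
        # Escaping embedded quotes is outside the scope of this sketch.
        raise ValueError("value must not contain a double quote")
    return f'{field}:"{value}"'


print(term_filter("country", "united states"))
```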
INTEGER, string literal or IDENTIFIER expected, EOF encountered.
One of the fields has a typo.
If your queries are extremely slow, you are probably querying a large amount of data. The following tips can help with queries on large datasets.
Add Imhotep machines to the cluster
Recreate the stack to increase the value of NumImhotepInstances. If you have already uploaded data into your S3 buckets, ensure that you point to the same S3 buckets when you recreate the stack. You don’t need to upload your data again.
Test on a small time range
Start small and then ramp up to the required range if performance is sufficient.
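One way to ramp up is to run the same query over progressively larger time ranges and stop as soon as a run takes too long. The Python sketch below illustrates the idea under stated assumptions: `run_query` is a hypothetical callable that submits a query and waits for the result, and the ranges `1d today` and `7d today` are assumed to be valid in your environment (only `1h today` appears on this page).

```python
import time
from typing import Callable, Iterable, Optional


def ramp_up(
    query_body: str,
    ranges: Iterable[str],
    run_query: Callable[[str], object],
    budget_seconds: float,
) -> Optional[str]:
    """Run query_body over each time range in order, smallest first.

    Returns the largest range that finished within budget_seconds,
    or None if even the first range was too slow.
    """
    largest_ok = None
    for time_range in ranges:
        query = f"{time_range} {query_body}"
        started = time.monotonic()
        run_query(query)  # hypothetical: submit the query and wait for it
        elapsed = time.monotonic() - started
        if elapsed > budget_seconds:
            break  # too slow; do not try larger ranges
        largest_ok = time_range
    return largest_ok
```

For example, `ramp_up("group by accountid", ["1h today", "1d today", "7d today"], run_query, 30.0)` would stop at the first range that takes longer than 30 seconds.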
Determine the actual number of expected groups
If you think your query will return a large number of groups, run a DISTINCT query to return the actual number of expected groups before grouping your data:
1h today select distinct(accountid)
If the number of expected groups is a value that your system can handle, run the group by query:
1h today group by accountid
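If you script your queries, the two-step check above can be automated. This is a minimal Python sketch, assuming a hypothetical `run_distinct` callable that runs the DISTINCT query and returns the count as an integer, and a `max_groups` limit that you choose for your own system.

```python
from typing import Callable, Optional


def plan_group_by(
    time_range: str,
    field: str,
    run_distinct: Callable[[str], int],
    max_groups: int,
) -> Optional[str]:
    """Return the group by query only if the expected group count is acceptable.

    Returns None when the DISTINCT query reports more than max_groups groups.
    """
    distinct_query = f"{time_range} select distinct({field})"
    expected_groups = run_distinct(distinct_query)  # hypothetical query runner
    if expected_groups > max_groups:
        return None
    return f"{time_range} group by {field}"
```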
Make the largest group the last
If ascending order on all columns from left to right is not necessary, try making the largest group the last grouping and make it non-exploded by adding square brackets to the field name. This allows the result to be streamed instead of stored in memory.
group by city, country is especially problematic because IQL can’t verify in advance how many terms will be returned. If the requested number is too high, IQL uses too much memory and requires time to recover.
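The reordering described in this section can be sketched in Python as follows. The helper name `build_group_by` and the term counts are hypothetical. The sketch puts the field with the most terms last and appends square brackets to it, as described above.

```python
from typing import Mapping


def build_group_by(term_counts: Mapping[str, int]) -> str:
    """Build a group by clause with the largest field last and non-exploded.

    term_counts maps each field name to its (estimated) number of terms.
    """
    if not term_counts:
        raise ValueError("at least one field is required")
    # Sort ascending by term count so the largest field ends up last.
    ordered = sorted(term_counts, key=lambda name: term_counts[name])
    *leading, largest = ordered
    parts = leading + [f"{largest}[]"]  # square brackets mark it non-exploded
    return "group by " + ", ".join(parts)


print(build_group_by({"city": 50000, "country": 200}))
```

Because this changes the column order, use it only when ascending order on all columns from left to right is not required.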
Avoid using DISTINCT for large queries
Don’t use distinct() as a metric in a query that also uses group by on a large amount of data.
Heap memory size
The number of rows IQL can return on non-streaming queries depends on the heap size allocated to IQL.