Dataset Filtering Guide¶
This guide explains how to use filters when searching for datasets in the USNAN SDK. The filtering system is based on PrimeNG table filters and provides powerful search capabilities.
Overview¶
Dataset filtering is performed using the SearchConfig class, which allows you to build complex search queries by combining multiple filters with different match modes and operators.
Basic Usage¶
To search for datasets, create a SearchConfig object and add filters:
import usnan
client = usnan.USNANClient()
# Create a search configuration
search_config = usnan.models.SearchConfig()
# Add a filter
search_config.add_filter('is_knowledgebase', value=True, match_mode='equals')
# Execute the search
results = client.datasets.search(search_config)
# Iterate through results
for dataset in results:
print(f"Dataset: {dataset.dataset_name}")
SearchConfig Parameters¶
The SearchConfig class accepts the following parameters:
records(int, default=25): Number of records to fetch per batchoffset(int, default=0): Starting offset for resultssort_order(str, default=’ASC’): Sort order (‘ASC’ or ‘DESC’)sort_field(str, optional): Field to sort by
# Configure pagination and sorting
search_config = usnan.models.SearchConfig(
records=50,
offset=0,
sort_order='DESC',
sort_field='dataset_name'
)
Adding Filters¶
Use the add_filter() method to add search criteria:
search_config.add_filter(
field='field_name',
value='search_value',
match_mode='equals', # optional, default='equals'
operator='AND' # optional, default='AND'
)
Match Modes¶
The following match modes are supported:
Text Matching¶
equals- Exact matchnotEquals- Not equal tostartsWith- Starts with the specified valueendsWith- Ends with the specified valuecontains- Contains the specified valuenotContains- Does not contain the specified valuesimilarTo- Similar to (fuzzy matching)
Null Checking¶
isNull- Field is null/emptyisNotNull- Field is not null/empty
Numeric Comparison¶
greaterThan- Greater than the specified valuelessThan- Less than the specified value
Array Operations¶
includes- Array includes the specified valuenotIncludes- Array does not include the specified value
Examples by Match Mode¶
Exact Match¶
# Find datasets that are knowledge base entries
search_config = usnan.models.SearchConfig()
search_config.add_filter('is_knowledgebase', value=True, match_mode='equals')
Text Search¶
# Find datasets with names containing "protein"
search_config = usnan.models.SearchConfig()
search_config.add_filter('dataset_name', value='protein', match_mode='contains')
Numeric Comparison¶
# Find datasets with more than 2 dimensions
search_config = usnan.models.SearchConfig()
search_config.add_filter('num_dimension', value=2, match_mode='greaterThan')
Null Checking¶
# Find datasets with descriptions
search_config = usnan.models.SearchConfig()
search_config.add_filter('description', match_mode='isNotNull')
Multiple Filters¶
You can combine multiple filters to create complex search queries:
# Find 2D knowledge base datasets
search_config = (usnan.models.SearchConfig()
.add_filter('is_knowledgebase', value=True, match_mode='equals')
.add_filter('num_dimension', value=2, match_mode='equals'))
Operators¶
When adding multiple filters for the same field, you can specify the operator:
AND(default) - All conditions must be trueOR- Any condition can be true
# Find datasets with specific names (OR logic)
search_config = usnan.models.SearchConfig()
search_config.add_filter('dataset_name', value='protein', match_mode='contains', operator='OR')
search_config.add_filter('dataset_name', value='nucleic', match_mode='contains', operator='OR')
Important: All filters for the same field must use the same operator. Mixing operators for the same field will raise a ValueError.
Pagination¶
The search results are returned as a generator that automatically handles pagination:
search_config = usnan.models.SearchConfig(records=25)
results = client.datasets.search(search_config)
count = 0
for dataset in results:
count += 1
print(f"Dataset {count}: {dataset.dataset_name}")
# The generator will automatically fetch more results
# when the current batch is exhausted
if count >= 100: # Stop after 100 results
break
Cloning Search Configurations¶
You can clone a search configuration to create variations:
# Base configuration
base_config = usnan.models.SearchConfig()
base_config.add_filter('is_knowledgebase', value=True, match_mode='equals')
# Clone and modify
modified_config = base_config.clone()
modified_config.add_filter('num_dimension', value=2, match_mode='equals')
Error Handling¶
Common errors and how to handle them:
Invalid Filter Names¶
try:
search_config = usnan.models.SearchConfig()
search_config.add_filter('invalid_field_name', value=True, match_mode='equals')
except ValueError as e:
print(f"Invalid filter: {e}")
Mixed Operators¶
try:
search_config = usnan.models.SearchConfig()
search_config.add_filter('field', value='value1', operator='OR')
search_config.add_filter('field', value='value2', operator='AND') # Error!
except ValueError as e:
print(f"Operator mismatch: {e}")
Invalid Dataset IDs¶
try:
client = usnan.USNANClient()
dataset = client.datasets.get("invalid_id") # Should be integer
except TypeError as e:
print(f"Invalid ID type: {e}")
try:
dataset = client.datasets.get(999999) # Non-existent ID
except KeyError as e:
print(f"Dataset not found: {e}")
Complete Example¶
Here’s a comprehensive example showing various filtering techniques:
import usnan
def search_datasets():
client = usnan.USNANClient()
# Create a complex search
search_config = (usnan.models.SearchConfig(records=50)
.add_filter('is_knowledgebase', value=True, match_mode='equals')
.add_filter('num_dimension', value=2, match_mode='equals'))
print("Searching for 2D datasets in knowledge base...")
results = client.datasets.search(search_config)
count = 0
for dataset in results:
count += 1
print(f"{count}. {dataset.dataset_name}")
print(f" Experiment: {dataset.experiment_name}")
print(f" Facility: {dataset.facility.name if dataset.facility else 'Unknown'}")
print(f" Dimensions: {dataset.num_dimension}")
print()
if count >= 10: # Limit output
break
if count == 0:
print("No datasets found matching the criteria.")
else:
print(f"Found {count} datasets (showing first 10)")
if __name__ == "__main__":
search_datasets()
Best Practices¶
Use specific filters: Start with the most selective filters to reduce the result set quickly.
Handle pagination: Don’t assume all results will fit in memory. Process results as you iterate.
Clone configurations: When creating variations of searches, clone the base configuration rather than recreating it.
Error handling: Always wrap search operations in try-catch blocks to handle invalid filters or network issues.
Performance: When fetching many datasets, the SDK will automatically increase the number of records fetched at once to reduce network latency. Increasing the number of records returned is only helpful when it is known initially that a large number of results is expected and all results must be fetched.
Field validation: Refer to the dataset model documentation for valid field names and types.
Common Dataset Fields¶
Here are some commonly used fields for filtering:
dataset_name- Name of the dataset (may be edited by the user for readability)experiment_name- Name of the experiment (as captured on the spectrometer)is_knowledgebase- Boolean indicating if it’s a knowledge base entrynum_dimension- Number of dimensions (1D, 2D, etc.)description- Dataset description
For a complete list of available fields, refer to the usnan.models.datasets.Dataset documentation or inspect a dataset object’s attributes. All properties of a dataset
can be used for searching/ordering results.