Guide: Data Transformations with Mappers

Mappers are the core of the data transformation process. They are powerful, reusable functions that you use within your mapping dictionary to define how each column of your destination file should be generated.

This guide provides a comprehensive reference for all mappers available in the fluvo.lib.mapper module.


Data Quality Validation (Processor.check)

Before you start the main transformation process, it’s often a good idea to validate the quality and structure of your source data. The library provides a .check() method on the Processor object for this purpose.

You can call .check() multiple times with different “checker” functions to validate your data against a set of rules. If a check fails, a warning will be logged to the console, and you can prevent the transformation from continuing.

Using Checkers

In your transform.py script, after initializing the Processor but before calling .process(), you can add your checks:

transform.py
from fluvo.lib import checker
from fluvo.lib.transform import Processor

# Initialize processor
processor = Processor('origin/my_data.csv')

# --- Add Data Quality Checks ---
print("Running data quality checks...")
processor.check(checker.line_length_checker(15))
processor.check(checker.cell_len_checker(120))
processor.check(checker.id_validity_checker('SKU', r'^[A-Z]{2}-\d{4}$'))

# Now, proceed with the mapping and processing
# processor.process(...)

Available Checker Functions

The following checkers are available in the fluvo.lib.checker module.

checker.line_length_checker(expected_length)

Verifies that every row in your data file has exactly the expected_length number of columns. This is useful for catching malformed CSV rows.

checker.cell_len_checker(max_cell_len)

Verifies that no single cell (field) in your entire dataset exceeds the max_cell_len number of characters.

checker.line_number_checker(expected_line_count)

Verifies that the file contains exactly expected_line_count number of data rows (not including the header).

checker.id_validity_checker(id_field, pattern)

Verifies that the value in the specified id_field column for every row matches the given regex pattern. This is extremely useful for ensuring key fields like SKUs or external IDs follow a consistent format.


Basic Mappers

mapper.val(field, [postprocess])

Retrieves the value from a single source column, identified by field. This is the most fundamental mapper.

  • field (str): The name of the column in the source file.

  • postprocess (function, optional): A function to modify the value after it has been read.

mapper.const(value)

Fills a column with a fixed, constant value for every row.

  • value: The static value to use (e.g., string, bool, integer).

How it works

Input Data (source.csv)

AnyColumn

a

b

Transformation Code

'sale_type': mapper.const('service')

Output Data

sale_type

service

service


Combining and Formatting

mapper.concat(separator, *fields)

Joins values from one or more source columns together, separated by a given separator.

  • separator (str): The string to place between each value.

  • *fields (str): A variable number of source column names (field) or static strings to join.


Conditional and Boolean Logic

mapper.cond(field, true_value, false_value)

Checks the value of the source column field. If it’s considered “truthy” (not empty, not “False”, not 0), it returns true_value, otherwise it returns false_value.

mapper.bool_val(field, true_values=None, false_values=None, default=False)

Checks the value in the source column field and converts it to a boolean 1 or 0.

  • field (str): The column to check.

  • true_values (list, optional): A list of strings that should be considered True.

  • false_values (list, optional): A list of strings that should be considered False.

  • default (bool, optional): The default boolean value to return if the value is not in true_values or false_values.

How it works

Input Data (source.csv)

Status

Active

Inactive

Done

Transformation Code

'is_active': mapper.bool_val('Status', true_values=['Active', 'In Progress'], false_values=['Inactive']),

Output Data

is_active

1

0

0


Numeric Mappers

mapper.num(field, default='0.0')

Takes the numeric value of the source column field. It automatically transforms a comma decimal separator (,) into a dot (.). Use it for Integer or Float fields in Odoo.

  • field (str): The column containing the numeric string.

  • default (str, optional): A default value to use if the source value is empty. Defaults to '0.0'.

How it works

Input Data (source.csv)

my_column

01

2,3

Transformation Code

'my_field': mapper.num('my_column'),
'my_field_with_default': mapper.num('my_column', default='-1.0')

Output Data

my_field

my_field_with_default

1

1

2.3

2.3

0.0

-1.0


Relational Mappers

mapper.m2o_map(prefix, *fields)

A specialized concat for creating external IDs for Many2one relationship fields (e.g., partner_id). This is useful when the unique ID for a record is spread across multiple columns.


Many-to-Many Relationships

Handling many-to-many relationships often requires a two-step process:

  1. Extract and Create Related Records: First, you need to identify all the unique values for the related records (e.g., all unique “Tags” or “Categories”), create a separate CSV file for them, and assign each one a unique external ID.

  2. Link to Main Records: In the main record file (e.g., partners), you create a comma-separated list of the external IDs of the related records.

The library provides special mappers and a processing flag (m2m=True) to make this easy.

Example: Importing Partners with Categories

Let’s assume you have a source file where partner categories are listed in a single column, separated by commas.

Input Data (client_file.csv)

Company

Firstname

Lastname

Birthdate

Category

The World Company

John

Doe

31/12/1980

Premium

The Famous Company

David

Smith

28/02/1985

Normal, Bad Payer

Step 1: Extract and Create Unique Categories

We need to create a res.partner.category.csv file. The key is to use mapper.m2m_id_list and mapper.m2m_value_list combined with the m2m=True flag in the .process() method. This tells the processor to automatically find all unique values in the ‘Category’ column, split them, and create one row for each.

Transformation Code

# This mapping is specifically for extracting unique categories.
partner_category_mapping = {
   'id': mapper.m2m_id_list('res_partner_category', 'Category'),
   'name':  mapper.m2m_value_list('Category'),
}

# The m2m=True flag activates the special processing mode.
processor.process(partner_category_mapping, 'res.partner.category.csv', m2m=True)

Output File (res.partner.category.csv) This file will contain one row for each unique category found across all partner records.

id

name

res_partner_category.Premium

Premium

res_partner_category.Normal

Normal

res_partner_category.Bad_Payer

Bad Payer


Importing Product Variants: Legacy (v9-v12) vs. Modern (v13+)

Importing product variants (e.g., a T-shirt that comes in different colors and sizes) is a common but complex task. The data model for product attributes changed significantly in Odoo 13. The library provides two distinct workflows to handle both the old and new systems.

Modern Approach (Odoo v13+)

This is the recommended approach for all modern Odoo versions. Odoo can now automatically create product variants if you provide the attribute values directly.

  • Processor Class: ProductProcessorV10

  • Key Mapper: mapper.m2m_template_attribute_value

How it Works: You provide the attribute values (e.g., “Blue”, “L”) as a comma-separated string. The ProductProcessorV10 sets create_variant: 'Dynamically' on the attribute, telling Odoo to find or create the corresponding product.attribute.value records and link them to the product template automatically.

Example Transformation Script (Modern)

# In your transform.py
from fluvo.lib import mapper
from fluvo.lib.transform import ProductProcessorV10

# Initialize the modern processor
processor = ProductProcessorV10('origin/products.csv')

# --- 1. Create the product.attribute records ---
# This step tells Odoo which attributes can create variants
attributes = ['Color', 'Size']
processor.process_attribute_data(
    attributes, 'prod_attrib', 'data/product.attribute.csv', {}
)

# --- 2. Create the product.template records ---
# The key is to map the raw values to the attribute's technical name
template_mapping = {
    'id': mapper.m2o_map('prod_template_', 'template_id'),
    'name': mapper.val('Product Name'),
    'attribute_line_ids/Color/value_ids': mapper.val('Color'),
    'attribute_line_ids/Size/value_ids': mapper.val('Size'),
}
processor.process(template_mapping, 'data/product.template.csv')

Legacy Approach (Odoo v9-v12)

This approach is for older Odoo versions and requires a more manual, three-file process. You must create the attributes, then the attribute values with their own external IDs, and finally link them to the product template.

  • Processor Class: ProductProcessorV9

  • Key Mappers: mapper.m2m_attribute_value, mapper.val_att, mapper.m2o_att

Example Transformation Script (Legacy)

# In your transform.py
from fluvo.lib import mapper
from fluvo.lib.transform import ProductProcessorV9

# Initialize the legacy processor
processor = ProductProcessorV9('origin/products.csv')

# --- This single call creates all three required files ---
attributes = ['Color', 'Size']
attribute_prefix = 'prod_attrib'

# Mapping for product.attribute.value file
value_mapping = {
    'id': mapper.m2m_attribute_value(attribute_prefix, *attributes),
    'name': mapper.val_att(attributes),
    'attribute_id/id': mapper.m2o_att_name(attribute_prefix, attributes),
}

# Mapping for product.template.attribute.line file
line_mapping = {
    'product_tmpl_id/id': mapper.m2o_map('prod_template_', 'template_id'),
    'attribute_id/id': mapper.m2o_att_name(attribute_prefix, attributes),
    'value_ids/id': mapper.m2o_att(attribute_prefix, attributes),
}

processor.process_attribute_mapping(
    value_mapping, line_mapping, attributes, attribute_prefix, 'data/', {}
)

Advanced Mapping

mapper.map_val(map_dict, key, default=None, m2m=False)

Looks up a key in a map_dict and returns the corresponding value. This is extremely useful for translating values from a source system to Odoo values.

  • map_dict (dict): The Python dictionary to use as a translation table.

  • key (mapper): A mapper that provides the key to look up in the dictionary (often mapper.val).

  • default (optional): A default value to return if the key is not found.

  • m2m (bool, optional): If set to True, the key is expected to be a list of values. The mapper will look up each value in the list and return a comma-separated string of the results.

Example: Advanced Country Mapping

Transformation Code

# The mapping dictionary translates source codes to Odoo external IDs.
country_map = {
    'BE': 'base.be',
    'FR': 'base.fr',
    'NL': 'base.nl',
}

# Use map_val to look up the code and return the external ID.
'country_id/id': mapper.map_val(country_map, mapper.val('CountryCode'))

Binary Mappers

mapper.binary(field)

Reads a local file path from the source column field and converts the file content into a base64-encoded string.

  • field (str): The name of the column that contains the relative path to the image file.

How it works

Input Data (images.csv)

ImagePath

images/product_a.png

Transformation Code

# Reads the file at the path and encodes it for Odoo
'image_1920': mapper.binary('ImagePath')

Output Data

image_1920

iVBORw0KGgoAAAANSUhEUg… (etc.)

mapper.binary_url_to_base64(field)

Reads a URL from the source column field, downloads the content from that URL, and converts it into a base64-encoded string.

  • field (str): The name of the column that contains the full URL to the image or file.

How it works

Input Data (image_urls.csv)

ImageURL

https://www.example.com/logo.png

Transformation Code

# Downloads the image from the URL and encodes it
'image_1920': mapper.binary_url_to_base64('ImageURL')

Output Data

image_1920

iVBORw0KGgoAAAANSUhEUg… (etc.)


Advanced Techniques

Pre-processing Data

For complex manipulations before the mapping starts, you can pass a preprocess function to the Processor constructor. This function receives the CSV header and data and must return them after modification.

Adding Columns

def my_preprocessor(header, data):
    header.append('NEW_COLUMN')
    for i, j in enumerate(data):
        data[i].append('NEW_VALUE')
    return header, data

processor = Processor('source.csv', preprocess=my_preprocessor)

Removing Lines

def my_preprocessor(header, data):
    data_new = []
    for i, j in enumerate(data):
        line = dict(zip(header, j))
        if line['Firstname'] != 'John':
            data_new.append(j)
    return header, data_new

processor = Processor('source.csv', preprocess=my_preprocessor)

Sharing Data Between Mappers (The state Dictionary)

For complex, stateful transformations, every mapper function receives a state dictionary as its second argument. This dictionary is persistent and shared across the entire processing of a file, allowing you to “remember” values from one row to the next.

This is essential for handling hierarchical data, like sales orders and their lines.

Example: Remembering the Current Order ID

def get_order_id(val, state):
    # When we see a new Order ID, save it to the state
    if val:
        state['current_order_id'] = val
    return val

sales_order_mapping = {
    'id': mapper.val('OrderID', postprocess=get_order_id),
    'order_line/product_id/id': mapper.m2o_map('prod_', 'SKU'),
    'order_line/order_id/id': lambda line, state: state.get('current_order_id')
}

Conditionally Skipping Rows (_filter)

You can filter out rows from your source data by adding a special _filter key to your mapping dictionary. The mapper for this key should return True for any row that you want to skip.

Input Data (source.csv)

Name

Status

John

Active

Jane

Cancelled

Transformation Code

my_mapping = {
    '_filter': mapper.val('Status', postprocess=lambda x: x == 'Cancelled' or not x),
    'name': mapper.val('Name'),
    # ... other fields
}

In this example, the rows for “Jane” and the blank line will be skipped, and only the row for “John” will be processed.

Creating Custom Mappers

Any Python function can act as a custom mapper when used with postprocess. The function will receive the value from the source column as its first argument and the shared state dictionary as its second.

Updating Records With Database IDs

To update records using their database ID, map your source ID to the special .id field and provide an empty id field.

my_mapping = {
    'id': mapper.const(''),
    '.id': mapper.val('id_column_from_source'),
    'name': mapper.val('name_from_source'),
    # ... other fields to update
}

Data Cleaning

When importing data from external sources, values often need sanitization before they can be used in Odoo. The library provides two complementary cleaning modules for this purpose.

Choosing the Right Module

Module

Use Case

Performance

clean_expr

Polars expressions, large datasets

10-100x faster

clean

Legacy mapper integration, stateful operations

Flexible

Polars-Native Cleaning (clean_expr)

The clean_expr module returns Polars expressions for vectorized operations. Use this with the expr module for maximum performance.

from fluvo.lib import expr, clean_expr

mapping = {
    "phone": clean_expr.phone("Phone"),           # Keep digits + leading +
    "email": clean_expr.email("Email"),           # Lowercase, strip noise
    "website": clean_expr.url("Website"),         # Ensure https, fix www
    "vat": clean_expr.vat("VAT"),                 # Clean VAT number
    "name": clean_expr.name_clean("ContactName"), # Strip titles/suffixes
}

Row-by-Row Cleaning (clean)

The clean module returns callables for use with the mapper’s postprocess parameter. Use this for stateful operations or with existing mapper-based code.

from fluvo.lib import mapper, clean

mapping = {
    "phone": mapper.val("Phone", postprocess=clean.phone()),
    "email": mapper.val("Email", postprocess=clean.email()),
    "website": mapper.val("Website", postprocess=clean.url()),
    "vat": mapper.val("VAT", postprocess=clean.vat()),
}

Available Cleaners

Phone Cleaners

Function

Description

Example

phone()

Keep digits and leading +

"+31 (6) 12-34""+31612345678"

phone_digits()

Keep only digits

"+31 6 12""31612"

phone_normalize(country)

Country-specific rules

phone_normalize("NL")("0612")"+31612"

Email Cleaners

Function

Description

Example

email()

Lowercase, strip, remove noise

"John@Example.COM (John)""john@example.com"

email_domain()

Extract domain

"user@example.com""example.com"

URL Cleaners

Function

Description

Example

url()

All-in-one: https, fix www

"wwwexample.com""https://www.example.com"

url_https()

Convert http to https

"http://...""https://..."

url_fix_www()

Fix missing dot after www

"wwwtest.com""www.test.com"

VAT Cleaners

Function

Description

Example

vat()

Keep letters, digits, hyphen

"NL 123.456.789.B01""NL123456789B01"

vat_or_exempt(exempt_values, marker)

Handle exempt cases

See below

# Mark VAT-exempt values with "/" prefix for Odoo bypass
clean.vat_or_exempt(
    exempt_values=["no vat", "church", "non-profit"],
    marker="/"
)("no vat")  # Returns "/no vat"

Name Cleaners

Function

Description

Example

name_clean()

Strip titles, normalize space

"Mr.  John   Doe""John Doe"

name_strip_title()

Remove Mr., Mrs., Dr., etc.

"Dr. Jane Smith""Jane Smith"

name_filter_common()

Filter placeholder names

"Test User"None

Company Suffix Cleaners

Normalize business entity suffixes to their canonical forms. Handles variations across multiple countries.

Function

Description

Example

company_suffix()

Normalize legal suffix

"Acme BV""Acme B.V."

Supported countries and their canonical forms:

Country

Variations

Canonical

NL

BV, Bv, bv, B.V

B.V.

NL

NV, nv

N.V.

DE

gmbh, GMBH, GmbH

GmbH

DE

AG, ag

AG

BE

BVBA, bvba

B.V.B.A.

FR

SARL, sarl, S.A.R.L

S.A.R.L.

FR

SAS, sas

S.A.S.

UK

Ltd, LTD, ltd, Limited

Ltd.

UK

PLC, plc

PLC

US

Inc, INC, Incorporated

Inc.

US

LLC, llc

LLC

IT

SPA, spa

S.p.A.

ES

SL, sl

S.L.

DK/NO

AS, as

A/S

SE

AB, ab

AB

Usage with mapper (row-by-row):

from fluvo.lib import mapper, clean

mapping = {
    "name": mapper.val("CompanyName", postprocess=clean.company_suffix()),
}

# Input:  {"CompanyName": "Acme BV"}
# Output: {"name": "Acme B.V."}

# Input:  {"CompanyName": "Test gmbh"}
# Output: {"name": "Test GmbH"}

# Input:  {"CompanyName": "Corp Limited"}
# Output: {"name": "Corp Ltd."}

Usage with Polars expressions:

from fluvo.lib import clean_expr

mapping = {
    "name": clean_expr.company_suffix("CompanyName"),
}

Custom suffix mapping:

# Override with your own suffixes
custom_suffixes = {"xyz": "X.Y.Z.", "abc": "A.B.C."}
clean.company_suffix(suffixes=custom_suffixes)

# Or extend the default mapping
from fluvo.lib import clean
clean.COMPANY_SUFFIX_CANONICAL["myco"] = "MyCo."

String Cleaners

Function

Description

Example

strip()

Remove leading/trailing whitespace

"  hello  ""hello"

normalize_space()

Collapse multiple spaces

"hello   world""hello world"

lower() / upper() / title()

Case conversion

Standard behavior

truncate(max_len)

Limit string length

truncate(5)("hello world")"hello"

default(value)

Default if empty/None

default("N/A")(None)"N/A"

Zip Code Cleaners

Function

Description

Example

zip_code()

Remove spaces and commas, filter e- prefix

"1234 AB""1234AB", "e-mail"None

zip_strip_prefix()

Remove country prefix

"NL-1234AB""1234AB"

The zip_code() cleaner:

  • Removes all spaces and commas: "1234, AB""1234AB"

  • Filters out invalid values starting with e- (returns None): "e-12345"None

City & Street Cleaners

Function

Description

Example

city()

Clean city name with title case

"amsterdam (NH)""Amsterdam"

street()

Clean street address (preserves case)

"123 Main St (Apt 4)""123 Main St"

The city() cleaner:

  • Strips whitespace and normalizes to title case: "amsterdam""Amsterdam"

  • Removes parenthetical notes: "Amsterdam (Noord-Holland)""Amsterdam"

  • Removes trailing postal codes: "Amsterdam 1012 AB""Amsterdam"

  • Removes leading/trailing punctuation: ",Amsterdam.""Amsterdam"

  • Collapses multiple spaces: "New   York""New York"

  • Filters out invalid values starting with e- (returns None)

The street() cleaner:

  • Strips whitespace: "  123 Main St  ""123 Main St"

  • Removes parenthetical notes: "123 Main St (Apt 4)""123 Main St"

  • Removes leading/trailing punctuation: ",123 Main St.""123 Main St"

  • Collapses multiple spaces: "123   Main   St""123 Main St"

  • Preserves original case (unlike city())

  • Filters out invalid values starting with e- (returns None)

Numeric Cleaners

Function

Description

Example

digits()

Keep only digits

"abc123def""123"

numeric(decimal, thousands)

Parse decimal number

numeric(",", ".")("1.234,56")"1234.56"

Chaining Cleaners

Use pipe() to chain multiple cleaners:

from fluvo.lib import clean

# Chain cleaners left-to-right
mapping = {
    "code": mapper.val("Code", postprocess=clean.pipe(
        clean.strip(),
        clean.upper(),
        clean.truncate(10),
    )),
}

Stateful Cleaners

Some cleaners share data between fields. For example, deriving a website from an email domain:

from fluvo.lib import mapper, clean

mapping = {
    # email() stores the domain in shared state
    "email": mapper.val("Email", postprocess=clean.email()),
    # website_from_email() reads from state if website is empty
    "website": mapper.val("Website", postprocess=clean.website_from_email()),
}

# Input:  {"Email": "john@acme.com", "Website": ""}
# Output: {"email": "john@acme.com", "website": "https://www.acme.com"}

Common email providers (gmail.com, yahoo.com, etc.) are automatically filtered out.

Extending Constants

All default constants can be extended:

from fluvo.lib import clean

# Add your own email providers to exclude
clean.COMMON_EMAIL_PROVIDERS.add("yourcompany.com")

# Add placeholder names to filter
clean.COMMON_FILTER_NAMES.add("internal")

# Or override per-call
clean.name_filter_common(filter_names={"test", "demo", "acme"})

Available constants:

  • COMMON_EMAIL_PROVIDERS: Email domains to exclude from website derivation

  • COMMON_FILTER_NAMES: Placeholder names to filter out

  • TITLES: Titles to strip (Mr., Mrs., Dr., etc.)

  • VAT_EXEMPT_VALUES: Values indicating VAT exemption

  • PHONE_COUNTRY_RULES: Country-specific phone normalization rules

  • COMPANY_SUFFIX_CANONICAL: Business entity suffix mappings (BV → B.V., etc.)


GeoNames Integration

The geonames module provides utilities to download, cache, and query GeoNames data for geographic lookups. This is useful for city-to-country mapping, postal code validation, and coordinate lookups.

Why GeoNames?

Instead of hardcoding city/country mappings in the library (which become stale), you can use GeoNames data:

  • Comprehensive: 25,000+ cities with population > 15,000

  • Up-to-date: Data is downloaded from the official source

  • Cached: Downloaded once, reused across environments

  • Full data: Includes coordinates, population, timezone, alternate names

Basic Usage

from fluvo.lib import geonames, clean

# Load cities (downloads and caches on first use)
cities = geonames.get_cities_lookup()

# Use with detect_country
clean.detect_country(city="Amsterdam", cities=cities)  # Returns: 'NL'
clean.detect_country(city="Den Haag", cities=cities)   # Returns: 'NL' (alternate name)
clean.detect_country(city="Париж", cities=cities)      # Returns: 'FR' (Russian alternate)

Available Datasets

Dataset

Cities

Size

Use Case

cities15000

~25k

~5MB

Most imports (default)

cities5000

~50k

~10MB

More coverage

cities1000

~150k

~35MB

Comprehensive

cities500

~200k

~50MB

Maximum coverage

Loading City Data

import polars as pl
from fluvo.lib import geonames

# Load as Polars DataFrame for analysis
df = geonames.load_cities(dataset="cities15000", min_population=100000)

# Available columns: name, asciiname, alternatenames, latitude, longitude,
# country_code, population, timezone, and more

# Filter to specific country
dutch_cities = df.filter(pl.col("country_code") == "NL")

# Get coordinates
geonames.get_city_coordinates("Paris", country="FR")
# Returns: (48.85341, 2.3488)

Postal Code Lookups

from fluvo.lib import geonames

# Load postal codes for specific countries
lookup = geonames.get_postal_lookup(["NL", "BE", "DE"])

# Lookup place name by postal code
lookup["NL"]["1012AB"]  # Returns: 'Amsterdam'
lookup["BE"]["1000"]    # Returns: 'Bruxelles'

Caching

Data is automatically cached in ~/.cache/fluvo/geonames/:

from fluvo.lib import geonames

# Check cache directory
cache_dir = geonames.get_cache_dir()

# Force re-download
geonames.download_dataset("cities15000", force=True)

Integration Example

Combining GeoNames with detect_country for smart country detection:

from fluvo.lib import geonames, clean, mapper

# Load city lookup once at the start
cities = geonames.get_cities_lookup()

def detect_partner_country(row, state):
    """Detect country from available fields."""
    return clean.detect_country(
        phone=row.get("Phone"),
        postal=row.get("Zip"),
        city=row.get("City"),
        cities=cities,  # Pass the GeoNames lookup
    )

mapping = {
    "name": mapper.val("Name"),
    "country_id/id": mapper.val("Country", postprocess=lambda v, s:
        f"base.{detect_partner_country(s, s).lower()}" if detect_partner_country(s, s) else ""
    ),
}