Automatically detect data types in CSV columns, including integers, floats, dates, emails, URLs, phone numbers, booleans, UUIDs, and IP addresses. Understanding your data's actual types is essential for proper database design, type-aware processing, and data quality assessment, yet manual type inspection is tedious and error-prone, especially with large datasets. This tool analyzes each column and identifies its predominant type, with a confidence score showing what percentage of values match the detected type. Confidence scores reveal data quality issues: if a numeric column shows 90% confidence, the remaining 10% of values are anomalies worth investigating. You can also export schema suggestions that map detected types to SQL data types for direct database table creation. Perfect for planning database imports, understanding unfamiliar data, and validating data quality before processing.
Automatically generate database schema suggestions by detecting column types, enabling faster creation of properly typed tables.
Understand data types before import to configure import settings correctly, ensuring type conversion happens as expected.
Identify data quality issues by analyzing type consistency—low confidence scores in supposedly numeric columns indicate anomalies.
Generate data type documentation and metadata for datasets, improving understanding of data structure for new team members.
Understand column types to configure API endpoints, ETL pipelines, and data integration properly.
Determine correct type configurations for business intelligence tools and analytical platforms that require schema information.
Data type detection, also known as type inference or data profiling, is the process of automatically determining the semantic type of values stored as untyped strings in flat file formats like CSV. This capability bridges the gap between CSV's typeless nature—where every value is simply a sequence of characters—and the strongly typed schemas required by databases, programming languages, APIs, and analytical tools. The challenge lies in interpreting ambiguous string representations correctly: "123" could be an integer, a string identifier, or a ZIP code; "01/02/03" could be a date in multiple formats; "true" could be a boolean or a label.
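The ambiguity is easy to demonstrate: the same string parses successfully under several date formats, so a detector cannot resolve it from one value alone and must weigh column-wide evidence. This is an illustrative sketch, not the tool's implementation:

```python
from datetime import datetime

# "01/02/03" is a valid date under several plausible format strings,
# each yielding a different calendar date.
value = "01/02/03"
for fmt in ("%m/%d/%y", "%d/%m/%y", "%y/%m/%d"):
    print(fmt, "->", datetime.strptime(value, fmt).date())
# %m/%d/%y -> 2003-01-02
# %d/%m/%y -> 2003-02-01
# %y/%m/%d -> 2001-02-03
```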
Type inference algorithms analyze value patterns against a hierarchy of type definitions, from specific types (email addresses, IP addresses, UUIDs) to general types (numbers, dates, text). The detection process typically examines all values in a column, applying regular expression patterns and parsing attempts to classify each value. The predominant type—the one matching the highest percentage of values—becomes the column's inferred type. This majority-wins approach handles the reality that data columns frequently contain a small percentage of anomalous values: a mostly-numeric column with a few "N/A" entries should still be typed as numeric.
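A minimal sketch of this majority-wins approach (the pattern list and function names here are illustrative, not the tool's actual implementation):

```python
import re
from collections import Counter

# Illustrative patterns, ordered from more specific to more general.
TYPE_PATTERNS = [
    ("integer", re.compile(r"^-?\d+$")),
    ("float",   re.compile(r"^-?\d+\.\d+$")),
    ("boolean", re.compile(r"^(true|false)$", re.IGNORECASE)),
]

def classify(value: str) -> str:
    """Return the first type whose pattern matches, else fall back to text."""
    for name, pattern in TYPE_PATTERNS:
        if pattern.match(value.strip()):
            return name
    return "text"

def infer_column_type(values):
    """Classify every non-empty value; the most common type wins the column."""
    counts = Counter(classify(v) for v in values if v.strip())
    type_name, _ = counts.most_common(1)[0]
    return type_name, counts

# A mostly-numeric column with a stray "N/A" is still typed as integer.
col = ["1", "42", "N/A", "7", "100"]
print(infer_column_type(col))
# → ('integer', Counter({'integer': 4, 'text': 1}))
```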
Confidence scoring transforms type detection from a binary classification into a nuanced quality assessment. A column where 100% of values parse as integers is unambiguously numeric, while a column with 85% numeric values and 15% text values suggests data quality issues that need investigation. The non-conforming values may be legitimate exceptions, data entry errors, or indicators that the column actually contains mixed data types. Confidence thresholds help automate decisions: columns above 95% confidence might be automatically typed, while those between 80% and 95% are flagged for human review.
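The scoring and threshold logic might be sketched as follows; the 95% and 80% bands follow the text, while the helper names are hypothetical:

```python
from collections import Counter

def confidence(counts: Counter, detected: str) -> float:
    """Fraction of classified values that match the detected type."""
    total = sum(counts.values())
    return counts[detected] / total if total else 0.0

def review_action(score: float, auto: float = 0.95, review: float = 0.80) -> str:
    if score >= auto:
        return "auto-type"          # unambiguous: type the column automatically
    if score >= review:
        return "flag for review"    # likely right, but anomalies need a look
    return "treat as mixed/text"    # too inconsistent to trust the majority

# 85% integer, 15% text: the column lands in the human-review band.
counts = Counter({"integer": 17, "text": 3})
print(review_action(confidence(counts, "integer")))  # → flag for review
```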
The mapping from detected types to database-specific data types involves platform-specific knowledge. An integer column might map to INT, BIGINT, or SMALLINT depending on the value range. A decimal column requires precision and scale specifications, such as DECIMAL(10,2) for monetary values. Date columns map differently across databases: DATE, DATETIME, TIMESTAMP, or DATETIME2 depending on the platform. String columns require length specifications—VARCHAR(255) versus VARCHAR(MAX) versus TEXT—informed by the maximum observed value length in the data.
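A simplified sketch of such a mapping; the type names and integer ranges follow common SQL conventions, and the function itself is hypothetical:

```python
def suggest_sql_type(detected: str, values: list[str]) -> str:
    """Map a detected type to a SQL type using the observed value range/length."""
    if detected == "integer":
        nums = [int(v) for v in values]
        lo, hi = min(nums), max(nums)
        if -32768 <= lo and hi <= 32767:
            return "SMALLINT"
        if -2147483648 <= lo and hi <= 2147483647:
            return "INT"
        return "BIGINT"
    if detected == "float":
        return "DECIMAL(10,2)"  # precision/scale would come from the data
    if detected == "date":
        return "DATE"           # or DATETIME/TIMESTAMP, depending on the platform
    # String columns: size from the longest observed value.
    max_len = max((len(v) for v in values), default=0)
    return f"VARCHAR({max(max_len, 1)})" if max_len <= 255 else "TEXT"

print(suggest_sql_type("integer", ["12", "40000"]))  # → INT
print(suggest_sql_type("text", ["alpha", "beta"]))   # → VARCHAR(5)
```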
Advanced type detection extends beyond primitive types to semantic types. Email addresses follow the pattern defined in RFC 5322, URLs conform to RFC 3986, phone numbers match E.164 or regional formats, IP addresses follow IPv4 or IPv6 conventions, and UUIDs match the RFC 4122 hexadecimal pattern. Detecting these semantic types enables richer schema design, input validation rule generation, and data quality assessment, providing significantly more value than simple primitive type classification.
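Simplified versions of these semantic checks might look like the following; the patterns are illustrative and far looser than the full RFC 5322 and RFC 4122 grammars:

```python
import re
import ipaddress

# Loose illustrative patterns, not full RFC-grammar validators.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def is_ipv4(value: str) -> bool:
    """Use the standard library's parser rather than a hand-rolled regex."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False

print(bool(EMAIL.match("user@example.com")))                       # True
print(bool(UUID.match("550e8400-e29b-41d4-a716-446655440000")))    # True
print(is_ipv4("192.168.0.1"))                                      # True
print(is_ipv4("999.1.1.1"))                                        # False
```

Delegating IP validation to the `ipaddress` module avoids a subtle regex pitfall: a naive digit pattern would accept out-of-range octets like 999.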
The detector identifies integers, floats, dates, emails, URLs, phone numbers, booleans, UUIDs, IP addresses, and plain text. Each column is analyzed against all type patterns to find the best match.
The confidence score indicates what percentage of non-empty values in a column match the detected type. A score of 95% means that 95% of values conform to that type, with 5% being exceptions or errors.
Yes, the tool provides schema suggestions that map detected types to common SQL data types. You can export this as a starting point for your CREATE TABLE statement or use the CSV to SQL tool directly.
When a column contains mixed types, the tool reports the dominant type along with a lower confidence score. It also shows a breakdown of the different types found so you can identify data quality issues.
All processing happens directly in your browser. Your files never leave your device and are never uploaded to any server.