Aerospike Connect for Spark Release Notes

  • 3.0.1
    Release Date: February 24, 2021
    • [CONNECTOR-110] - Spark 3.x branch - Aerospike configuration passed through the Spark configuration is now accessible downstream.

    Known Issues

    • The DataSource v2 API does not support SQL “INSERT INTO” statements that target a temp view. Use DataFrame syntax for equivalent functionality.
    • Streaming write does not work with Apache Spark 3.1.0.
    • Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.

  • 3.0.0
    Release Date: February 18, 2021
    • [CONNECTOR-103] - Extend support for Apache Spark 3.0.0 Data Source V2.

    Known Issues

    • The DataSource v2 API does not support SQL “INSERT INTO” statements that target a temp view. Use DataFrame syntax for equivalent functionality.
    • Streaming write does not work with Apache Spark 3.1.0.
    • Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
    • Configuration set with spark.conf.set() is not passed to the connector, so the connector falls back to its defaults, which may produce unintended results. As a workaround, pass the configuration with .option() or .options() on the read and write statements. Fixed in version 3.0.1.
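
The .option() workaround described in the known issue above can be sketched as follows. This is a minimal illustration, not the connector's documented usage: the helper name apply_connector_options is hypothetical, the option values are placeholders, and the short format name "aerospike" in the usage comment is an assumption. The helper relies only on the .option(key, value) method that Spark's DataFrameReader and DataFrameWriter provide.

```python
from typing import Any, Mapping


def apply_connector_options(reader_or_writer: Any, options: Mapping[str, str]) -> Any:
    """Attach every setting with .option() rather than spark.conf.set().

    Works with any object whose .option(key, value) returns the next object
    in the chain, as Spark's DataFrameReader and DataFrameWriter do.
    """
    for key, value in options.items():
        reader_or_writer = reader_or_writer.option(key, value)
    return reader_or_writer


# Option names taken from these release notes; the values are placeholders.
AEROSPIKE_OPTIONS = {
    "aerospike.keyType": "string",
    "aerospike.partition.factor": "12",
}

# Hypothetical usage with an existing SparkSession named `spark`
# (the format name "aerospike" is an assumption):
#   reader = apply_connector_options(spark.read.format("aerospike"), AEROSPIKE_OPTIONS)
#   df = reader.load()
```

Keeping the settings in one mapping makes it easy to apply the same values to both the read and the write side of a job.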

    New Features

    • Data Source V2 implementation for Apache Spark 3.0.0.

  • 2.7.2
    Release Date: February 24, 2021
    • [CONNECTOR-111] - Spark 2.x branch - Aerospike configuration passed through the Spark configuration is now accessible downstream.

    Known Issues

    • The DataSource v2 API does not support SQL “INSERT INTO” statements that target a temp view. Use DataFrame syntax for equivalent functionality.

  • 2.7.1
    Release Date: January 25, 2021
    • [CONNECTOR-105] - Fixed a TLS issue in the Aerospike Spark 2.7.0 release.

    Known Issues

    • The DataSource v2 API does not support SQL “INSERT INTO” statements that target a temp view. Use DataFrame syntax for equivalent functionality.
    • Configuration set with spark.conf.set() is not passed to the connector, so the connector falls back to its defaults, which may produce unintended results. As a workaround, pass the configuration with .option() or .options() on the read and write statements. Fixed in version 2.7.2.

  • 2.7.0
    Release Date: January 19, 2021
    • Data Source V2 implementation.
    • Tested with Aerospike Enterprise Edition Database version 5.2.0 and Apache Spark version 2.4.0.

    Known Issues

    • The DataSource v2 API does not support SQL “INSERT INTO” statements that target a temp view. Use DataFrame syntax for equivalent functionality.
    • Configuration set with spark.conf.set() is not passed to the connector, so the connector falls back to its defaults, which may produce unintended results. As a workaround, pass the configuration with .option() or .options() on the read and write statements.

    New Features

    • [CONNECTOR-96] - Upgrade DataSource APIs used in the Spark Connector to v2.
    • [CONNECTOR-101] - Fixed Spark feature file verification expiring one day early.

    Improvements

    • The Aerospike data source format can be specified more concisely.

  • 2.6.0
    Release Date: October 29, 2020
    • Support for writes in Spark SQL format.
    • Tested with Aerospike Enterprise Edition Server version 5.2.0 and Apache Spark version 2.4.0.

    New Features

    • [CONNECTOR-94] - Support Writes in Spark SQL Format.

    Improvements

    • The Aerospike data source format can be specified more concisely.

  • 2.5.0
    Release Date: October 14, 2020
    • Flexible schema support in Spark, to read mixed data types from an Aerospike bin.
    • Tested with Aerospike Enterprise Edition Server version 5.2.0 and Apache Spark version 2.4.0.

    New Features

    • [CONNECTOR-85] - Support records with a different number of bins and types in a set.
    • [CONNECTOR-82] - Support pushdown of Spark DateType and TimestampType.

    Improvements

    • Additional error handling to address underflow and overflow in Short, Int, and Float types.

  • 2.4.0
    Release Date: September 3, 2020
    • Extended support for primary key types.

    New Features

    • Introduced the aerospike.keyType flag to hint the primary key type during schema inference.

  • 2.3.1
    Release Date: July 16, 2020
    • Fixed a broken API for creating an AerospikeConfig instance.

  • 2.3.0
    Release Date: June 19, 2020
    • Nested updateByKey support and prioritization of the __digest, __ttl, and __generation filters.

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.
    • updateByKey only supports keys which are accepted by the Java client.

    New Features

    • Record insertion can be done by nested updateByKey.
    • Spark filters are reordered so that __digest, __ttl, and __generation filters, if present, always come first.
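
The filter reordering described above amounts to a stable partition of the filter list. The sketch below illustrates the idea only and is not the connector's implementation: the (column, operator, value) tuple representation and the function name prioritize_filters are hypothetical, and the relative order among the three metadata filters is left as it was in the input, since the notes do not specify it.

```python
from typing import Any, List, Tuple

# Metadata columns named in the release note.
PRIORITY_COLUMNS = ("__digest", "__ttl", "__generation")

# Hypothetical filter representation: (column, operator, value).
Filter = Tuple[str, str, Any]


def prioritize_filters(filters: List[Filter]) -> List[Filter]:
    """Move filters on priority columns to the front, preserving input order
    within the priority group and within the remaining group."""
    first = [f for f in filters if f[0] in PRIORITY_COLUMNS]
    rest = [f for f in filters if f[0] not in PRIORITY_COLUMNS]
    return first + rest


example = [("age", ">", 30), ("__ttl", ">", 0), ("name", "=", "x"), ("__digest", "=", "ab")]
print(prioritize_filters(example))
# [('__ttl', '>', 0), ('__digest', '=', 'ab'), ('age', '>', 30), ('name', '=', 'x')]
```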

  • 2.2.0
    Release Date: May 12, 2020

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.
    • The default value of aerospike.partition.factor has changed from 0 to 12.
      • Before version 2.2, the number of Aerospike partitions was computed as 4096 >> f, where f is the aerospike.partition.factor.
      • From version 2.2 onwards, the number of Aerospike partitions is computed as 2^f, where f is the aerospike.partition.factor.
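
The two formulas above can be compared with a few lines of arithmetic. The function names below are hypothetical and simply restate the formulas from these notes; the conversion helper is derived from them (4096 >> f equals 2^(12 - f) for 0 <= f <= 12) and is not a connector API.

```python
def partitions_before_2_2(factor: int) -> int:
    """Partition count before version 2.2: 4096 >> f."""
    return 4096 >> factor


def partitions_from_2_2(factor: int) -> int:
    """Partition count from version 2.2 onwards: 2^f."""
    return 2 ** factor


def equivalent_factor_from_2_2(old_factor: int) -> int:
    """Factor that keeps the same partition count after upgrading.

    Valid only for old factors in the range 0..12.
    """
    if not 0 <= old_factor <= 12:
        raise ValueError("old_factor must be between 0 and 12")
    return 12 - old_factor


print(partitions_before_2_2(0))   # 4096 (old default factor)
print(partitions_from_2_2(12))    # 4096 (new default factor)
print(partitions_before_2_2(3))   # 512
print(partitions_from_2_2(3))     # 8
print(partitions_from_2_2(equivalent_factor_from_2_2(3)))  # 512
```

The old and new defaults both yield 4096 partitions, so only jobs that set aerospike.partition.factor explicitly see a different partition count after the upgrade.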

    New Features

    • Ability to extend Aerospike partitions up to 32768 (2^15).
    • Ability to specify the target set for Spark write operations through the aerospike.writeset flag.

  • 2.1.0
    Release Date: April 28, 2020

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.

    New Features

    • Added support for streaming writes to Aerospike.

  • 2.0.0
    Release Date: April 15, 2020

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.

    New Features

    • Ability to fine-tune up to 4096 scan partitions concurrently.
      • This can be further tuned by setting the aerospike.partition.factor value appropriately.
    • TLS and LDAP support.
    • Ability to query multiple primary keys through the connector.

    Improvements

    • Query engine improvements.
    • Ability to specify seed nodes through Aerospike configuration.
    • Ability to specify the feature file from configuration or HDFS.
    • Improved error handling in case of write/save failure.
    • Ability to enable client-server compression in the Spark connector.
    • Ability to set records per second for scans.
    • Fixed an issue where primary key calls accumulated duplicate data.

  • 1.1.2
    Release Date: October 21, 2019

    Known Issues

    • A primary key call fetches multiple copies of a record, accumulating duplicate data.

    New Features

    • Added explicit schema for saves.

  • 1.1.0
    Release Date: March 26, 2019
    • Initial Standalone Connector General Availability release.
    • Embedded Spark update.

    New Features

    • Spark 2.4.0 support.
    • Added the Dataset aeroIncrease function, which enables a Dataset to send add/increment operations to the Aerospike server.

  • 1.0.0
    Release Date: March 12, 2019
    • Initial Embedded Spark General Availability release.

    New Features

    • Reading from Aerospike to a DataFrame/Dataset.
    • Saving a DataFrame/Dataset to Aerospike.
    • Multiple Spark SQL filters pushed down to the Aerospike cluster.
    • Support for Geo points-within-region query using Aerospike.
    • Join a Spark Dataset that contains record keys to record data stored in Aerospike.