Data Modification

Photo by RetroSupply on Unsplash

Data Modification

Adding data

SQL has a set of statements that are used to manipulate data called DML (Data Manipulation Language). I used to think that DML involves statements that only modify data however, DML also involves data retrieval. In this article, I'll be discussing data modification statements that are used to insert data namely INSERT and SELECT.



Inserting data

In this section, I'll discuss the different statements that are used to insert data into tables like INSERT VALUES, INSERT SELECT, INSERT EXEC, SELECT INTO, and BULK INSERT.


The INSERT VALUES statement

The INSERT VALUES statement is used to insert rows into a table based on the values you provide. Let's take a look at how the statement works using the snippets below (assuming you've set up your server). In this example, I'll be using a database called BikeStores

USE BikeStores 

IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL
  DROP TABLE dbo.Orders;

-- Create the Orders table in the dbo schema
CREATE TABLE dbo.Orders 
(
     ordered INT NOT NULL,
          CONSTRAINT PK_Orders PRIMARY KEY,
     orderdate DATE NOT NULL,
           CONSTRAINT DFT_orderdate DEFAULT(SYSDATETIME()),
     emptied INT NOT NULL,
     custom VARCHAR(10) NOT NUL
)

We can now insert some data into the created table using INSERT VALUES statement as shown below.

INSERT INTO dbo.Orders (ordered, orderdate, empid, custid)
   VALUES(10001, '20090314', 3, 'A')

Specifying the column names is optional but this allows you to control the value-column associations instead of relying on the order in which the columns were presented when the table was defined.

If you specify a value for a particular column then SQL Server will use that value. If you don't, SQL Server will first check if the column has a default value defined for that column and if it does, it'll use that value. If it doesn't and the column accepts NULL marks, then a NULL mark will be used. If a NULL mark is used in a column that does not accept NULL marks then the whole insert operation will fail. In the snippet below, data is inserted into dbo.Orders without the specifying any value for the orderdate column because a default value is specified for that column.

INSERT INTO dbo.Orders(ordered, emptied, custid)
   VALUES (10002, 5, 'B');


The Enhanced VALUES clause

SQL Server 2008 and SQL Server 2012 support an enhanced VALUES clause that allows you to specify multiple rows separated by commas. For example, let's insert 4 rows into dbo.Orders.

INSERT INTO dbo.Orders 
   (ordered, orderdate, empid, custid)
VALUES
     (10003, '20090213', 4, 'B'),
     (10004, '20090214', 1, 'A'),
     (10005, '20090213', 1, 'C'),
     (10006, '20090215', 3, 'C');

The statement is processed as an atomic operation, therefore if any row isn't successfully entered into the table, the whole operation fails.

Here's another feature of the enhanced VALUES clause. You can use it create a derived table.

SELECT * 
FROM ( VALUES 
                (10003, '20090213', 4, 'B'),
                (10004, '20090214', 1, 'A'),
                (10005, '20090213', 1, 'C'),
                (10006, '20090215', 3, 'C'))
      AS O(orderid, orderdate, empid, custid) -- An alias MUST be assigned to the table

The query produces the following output

orderid     orderdate   empid       custid
---------  ----------   -------      ------
10003       20090213    4              B
10004       20090214    1              A
10005       20090213    1              C
10006       20090215    3              C



The INSERT SELECT statement

The INSERT SELECT statement is used to insert a set of rows into a target table with a SELECT statement. Let's take a look at an example.

INSERT INTO dbo.Orders(orderid, orderdate, empid, custid)
    SELECT 
        FIRST_VALUE(order_id) 
          OVER(PARTITION BY order_id, customer_id, order_date
            ORDER BY customer_id, order_id
            ROWS BETWEEN UNBOUNDED PRECEDING 
            AND UNBOUNDED FOLLOWING) + 4000 AS orderid,

        FIRST_VALUE(order_date) 
          OVER(PARTITION BY order_id, customer_id, order_date
            ORDER BY customer_id, order_id
            ROWS BETWEEN UNBOUNDED PRECEDING 
            AND UNBOUNDED FOLLOWING) AS orderdate,

        2 AS empid, 

        FIRST_VALUE(customer_id) 
          OVER(PARTITION BY order_id, customer_id, order_date
            ORDER BY customer_id, order_id
            ROWS BETWEEN UNBOUNDED PRECEDING 
            AND UNBOUNDED FOLLOWING) AS custid


    FROM sales.joinOrdersAndItems
    GROUP BY order_id, customer_id, order_date
    ORDER BY orderid, custid, orderdate 
    OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY

The INSERT SELECT statement also allows you the option of specifying the target column names, and the recommendations provided earlier regarding specifying those names remains the same with this statement. This statement is also processed as an atomic operation.

Prior to the implementation of the enhanced feature of the VALUES clause, if you wanted a virtual table based on values, you'd have to use multiple SELECT statements, each returning a single row based on values, and unify the rows with the UNION ALL set operator. For example, the query below uses the SELECT statement and UNION ALL set operator to insert rows into dbo.Orders.

INSERT INTO dbo.Orders(orderid, orderdate, empid, custid)
  SELECT 10007, '20090215', 2, 'B' UNION ALL
  SELECT 10008, '20090215', 1, 'C' UNION ALL
  SELECT 10009, '20090216', 2, 'C' UNION ALL
  SELECT 10010, '20090216', 3, 'A';

The use of the SELECT statement without the FROM clause is non-standard. It's better to use a table constructor based on the VALUES clause.


The INSERT EXEC statement

This statement is used to insert the resultset of a stored procedure into a target table. It's very similar to the INSERT SELECT statement as shown in the query below.

-- Delete the procedure if it already exists
IF OBJECT_ID('sales.usp_getorders', 'P') IS NOT NULL 
  DROP PROC sales.usp_getorders;
GO

CREATE PROC sales.usp_getorders
  @orderid AS INT 
AS 

  SELECT orderid + 1 AS orderid, orderdate, empid, custid, qty 
  FROM dbo.Orders
  WHERE orderid = @orderid
GO

INSERT INTO dbo.Orders (orderid, orderdate, empid, custid, qty)
  EXEC sales.usp_getorders @orderid=4001;

By using this the INSERT EXEC statement, you can direct the results of the procedure to the dbo.Orders table.



The SELECT INTO statement

The SELECT INTO statement is a non-standard T-SQL statement that is used to create a new table that is populated by the result set of a query. Non-standard means that it is not part of the ISO and ANSI SQL standards. This statement cannot be used to insert data into an existing table. Use INSERT SELECT for that instead. Here's how to use the SELECT INTO statement to create a copy of dbo.Orders.

IF OBJECT_ID('dbo.newOrders', 'U') IS NOT NULL
   DROP TABLE dbo.Orders;

SELECT orderid, orderdate, empid, custid 
INTO dbo.newOrders 
FROM dbo.Orders;

-- Clean up
DROP TABLE dbo.newOrders

The structure of the target table and the data it contains is based on the source table. The SELECT INTO statement copies the following from the source table to the target.

  • Column names
  • Types
  • Nullability
  • Identity Property
  • The data

There are 4 things that the statement does not copy, namely:

  • Constraints
  • Triggers
  • Permissions
  • Indexes

You'll have to implement these features manually if required.


Logging

Minimal logging involves logging only the information that is required to recover a transaction without supporting point-in-time recovery. Check out this blog post to read more.

One of the benefits of the SELECT INTO statement is that you can run it in minimally logged mode by setting a database property called Recovery Model to FULL. The INSERT SELECT statement can also benefit from minimal logging based on certain criteria. For details see Prerequisites for Minimal Logging in Bulk Import


SELECT INTO with SETS

There's a slight variation in the syntax but the query does the same thing.

IF OBJECT_ID('dbo.uniontable') IS NOT NULL
  DROP TABLE dbo.People;


SELECT first_name, last_name, email
INTO dbo.People
FROM sales.customers 

UNION 

select first_name, last_name, email 
from sales.staffs



The BULK INSERT statement

The BULK INSERT statement is used to insert data from a file into an existing table. In the statement, you specify the target table, the source file, and options. Check out the documentation to find all the options. Let's add an order to dbo.Orders from a file called orders.txt.

# Create the file
echo "4022,'20160101',2,600,500" > orders.txt

Now run the following query to insert the order

BULK INSERT dbo.Orders FROM './orders.txt'
  WITH
      (
           DATAFILETYPE = 'char',
           FIELDTERMINATOR = ',',
           ROWTERMINATOR = '\n'
       );

The BULK INSERT statement can be run in minimally logged mode in certain scenarios provided some requirements are met.



Conclusion

In this article I discussed the use of the INSERT and SELECT statements to add data to tables. Furthermore, I highlighted a few standard and non-standard ways of adding data.