欢迎各位兄弟 发布技术文章

这里的技术是共享的

You are here

hashbytes_T-SQL中的HashBytes函数 有大用

hashbytes

One of the paramount ways to guard data within a database is to utilize database encryption. However, no one encryption solution is perfect for all databases. Which encryption solution you select totally depends on the requirements of your application. Note that more powerful encryption for larger amounts of data requires a healthy amount of CPU. So, be prepared in the event that that introduction of encryption increases the system load.

保护数据库中数据的最重要方法之一是利用数据库加密。 但是,没有一种加密解决方案适合所有数据库。 您选择哪种加密解决方案完全取决于您的应用程序要求。 请注意,对大量数据进行更强大的加密需要健康的CPU数量。 因此,如果引入加密会增加系统负载,请做好准备。

This article will start with the divergence of hashing and encryption, and give all the details of the HashBytes function used in T-SQL

本文将从散列和加密的分歧开始,并提供T-SQL中使用的HashBytes函数的所有详细信息

In addition, I’ll discuss …

另外,我将讨论...

  • Details about the function with syntaxes and clear examples


    有关具有语法和清晰示例的函数的详细信息

  • How to store and check passwords with hashbytes function


    如何使用哈希字节功能存储和检查密码

  • How to deal with the restriction of the return value 8000 bytes limit


    返回值限制8000字节限制如何处理

  • Some restrictions such as Collation difference, or data type with Unicode data


    一些限制,例如排序规则差异或带有Unicode数据的数据类型

散列与加密 (Hashing versus Encryption)

There are two main methodologies to safeguard your data: hashing and encryption. Encryption is accomplished via one of several different algorithms that return a value that can be decrypted through the correct decryption key. Each of the different encryption options provides you with a different strength of encryption. As I have mentioned earlier, the stronger level of encryption you use, the greater the CPU load on the Microsoft SQL Server.

保护数据的主要方法有两种:散列和加密。 加密是通过几种不同的算法之一完成的,这些算法返回可以通过正确的解密密钥解密的值。 每个不同的加密选项都为您提供了不同的加密强度。 如前所述,您使用的加密级别越强,Microsoft SQL Server上的CPU负载就越大。

However, if we can talk about hashing values, we are mainly referring to hashing algorithms. Hashing algorithms provide us a one-way technique that has been used to mask data, in which we have a minimal chance that someone could reverse the hashed value back to the original value. And with hashed techniques, every time you hash the original value you get the same hashed value.

但是,如果我们可以讨论哈希值,则主要是指哈希算法。 散列算法为我们提供了一种用于掩盖数据的单向技术,在这种方法中,我们极少有机会有人将散列的值反转回原始值。 使用散列技术时,每次对原始值进行散列时,您将获得相同的散列值。

支持的算法 (Supported algorithms)

Microsoft SQL Server has supported the same hashing values from Microsoft SQL Server 2005 to Microsoft SQL Server 2008 R2. You can use MD2, MD4, MD5, SHA, or SHA1 to create hashes of your data. These algorithms are limited up to 20 bytes only.

Microsoft SQL Server支持从Microsoft SQL Server 2005到Microsoft SQL Server 2008 R2相同的哈希值。 您可以使用MD2,MD4,MD5,SHA或SHA1创建数据的哈希。 这些算法最多只能限制20个字节。

In SQL Server 2012, we have an enhancement in this function and now it supports SHA2_256, SHA2_512 algorithms that can generate 32 and 64 bytes hash codes for the respective input.

在SQL Server 2012中,我们对该功能进行了增强,现在它支持SHA2_256,SHA2_512算法,可以为相应的输入生成32和64字节的哈希码。

Beginning with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

从SQL Server 2016开始,不推荐使用SHA2_256和SHA2_512以外的所有算法。 较旧的算法(不建议使用)将继续工作,但是会引发弃用事件。

T-SQL中的HashBytes函数 (The HashBytes function in T-SQL )

Hashing can be created, regardless of the algorithm used, via the HashBytes system function. A hash is an essential calculation based on the values of the input, and two inputs that are the same ought to produce the same hash.

无论使用哪种算法,都可以通过HashBytes系统函数创建哈希。 哈希是基于输入值的必要计算,两个相同的输入应该产生相同的哈希。

句法 (Syntax)

If we talk about the syntax for SQL Server, Azure SQL Database, Azure SQL Data Warehouse, and Parallel Data Warehouse the below images describe the syntax in detail.

如果我们谈论SQL Server,Azure SQL数据库,Azure SQL数据仓库和并行数据仓库的语法,则以下图像详细描述了该语法。


The key word in syntax

语法中的关键词

‘<algorithm>’
In this, we have to initiate the hashing algorithm to be used to hash the input. This is a mandatory field with no default. And its also requires a single quotation mark. Starting with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.

'<算法>'
在这种情况下,我们必须启动用于对输入进行哈希处理的哈希算法。 这是必填字段,没有默认值。 并且它还需要单引号。 从SQL Server 2016开始,不推荐使用SHA2_256和SHA2_512以外的所有算法。 较旧的算法(不建议使用)将继续工作,但是会引发弃用事件。

@input
In this, we have to specify a variable that contains the data to be hashed. @input is varcharnvarchar, or varbinary.

@输入
在此,我们必须指定一个变量,其中包含要散列的数据。 @inputvarcharnvarcharvarbinary

‘input’
this specifies an expression that evaluates to a character or binary string to be hashed.

“输入”
它指定一个表达式,该表达式的结果为要哈希的字符或二进制字符串。

The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256 and 512 bits (64 bytes) for SHA2_512.

输出符合算法标准:MD2,MD4和MD5为128位(16字节); SHA和SHA1为160位(20字节); SHA2_256为256位(32字节),SHA2_512为512位(64字节)。

The below image depicts all supported algorithms with their respective lengths

下图描绘了所有受支持的算法及其各自的长度


The HashBytes function accepts two values: the algorithm to use and the value to get the hash for.

HashBytes函数接受两个值:要使用的算法和要获取其哈希值的值。

The HashBytes system function does not support all data types that Microsoft SQL Server supports before SQL server 2016. The biggest problem with this lack of support is that the HashBytes function doesn’t support character strings longer than 8000 bytes (For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.)

HashBytes系统函数不支持Microsoft SQL Server在SQL Server 2016之前支持的所有数据类型。缺少支持的最大问题是HashBytes函数不支持超过8000字节的字符串(对于SQL Server 2014及更早版本) ,则允许的输入值限制为8000个字节。)

To be more specific, when using ASCII strings with the CHAR or VARCHAR data types, the HashBytes system function will accept up to 8000 characters. When using Unicode strings with the NCHAR or NVARCHAR data types, the HashBytes system function will accept up to 4000 characters. We will provide a solution how to come up with this particular restriction.

更具体地说,当将ASCII字符串与CHAR或VARCHAR数据类型一起使用时,HashBytes系统函数最多可以接受8000个字符。 当使用具有NCHAR或NVARCHAR数据类型的Unicode字符串时,HashBytes系统函数将接受最多4000个字符。 我们将提供一个解决方案,以解决这一特殊限制。

HashBytes <2014年 (HashBytes < 2014)

I will now explain the HashBytes function as it existed before 2014, not allowing large string in SQL Server and its solution.

我现在将解释HashBytes函数,因为它在2014年之前就已存在,不允许在SQL Server及其解决方案中使用大字符串。

Let’s quickly look at the example.

让我们快速看一下示例。

  1. CREATE TABLE dbo.Sql_Shack_Demo (Filed NVARCHAR(MAX));  
  2. INSERT  into Sql_Shack_Demo Select Replicate('Temp',50001)
  3. SELECT hashbytes('MD2',Filed) From Sql_Shack_Demo

The above code will throw the exception “String or binary data would be truncated.” as shown below:

上面的代码将引发异常“字符串或二进制数据将被截断”。 如下所示:


To overcome the limitation, I have come up with a solution, a SQL function, to break down the string into multiple sub-parts and apply the hashing separately and later re-constitute them back into a single string

为了克服此限制,我提出了一个解决方案,即SQL函数,将字符串分解为多个子部分,分别应用哈希,然后将它们重新构造为单个字符串

The script is as below:

脚本如下:

  1. Create Or Alter FUNCTION [dbo].[Get_HASH_val_for_Large_String]
  2. (  
  3.     @St_val nvarchar(max)
  4. )
  5.  
  6. RETURNS varbinary(200)
  7.  
  8. AS
  9. BEGIN
  10.  
  11. Declare @Int_Len as integer
  12. Declare @varBina_Val as varbinary(20)
  13. Declare @Max_len int  = 3999
  14.     
  15. Set @Int_Len = len(@St_val)
  16.     if @Int_Len > @Max_len
  17.     
  18. Begin
  19. ;With hashbytes_val
  20. as
  21. (
  22. Select substring(@St_val,1, @Max_len) val, @Max_len+1 as st, @Max_len lv,
  23.    hashbytes('SHA1', substring(@St_val,1, @Max_len)) hashval
  24. Union All
  25. Select substring(@St_val,st,lv), st+lv ,@Max_len  lv,
  26.    hashbytes('SHA1', substring(@St_val,st,lv) + convert( varchar(20), hashval ))
  27. From hashbytes_val where Len(substring(@St_val,st,lv))>0
  28. )
  29. Select @varBina_Val = (Select Top 1 hashval From hashbytes_val Order by st desc)
  30. return @varBina_Val
  31. End
  32. else
  33.     Begin
  34. Set @varBina_Val = hashbytes('SHA1', @St_val)
  35. return @varBina_Val
  36.     End
  37. return NULL
  38. END

Now we execute scripts and the error has vanished.

现在我们执行脚本,错误消失了。

  1. CREATE TABLE dbo.Sql_Shack_Demo (Filed NVARCHAR(MAX));  
  2. INSERT  into Sql_Shack_Demo Select Replicate('Temp',50001)
  3. SELECT dbo.Get_HASH_val_for_Large_String (Filed) From Sql_Shack_Demo


I hope this will help you whenever you may need to generate hashes for larger strings in SQL Server versions prior to 2014

我希望这对您有所帮助,只要您可能需要在2014年之前SQL Server版本中为较大的字符串生成哈希值

使用加密检查和存储密码 (Checking and storing Passwords with Encryption)

First of all, we have to make sure that the field or column we have used to preserve password for store the hash code is of data type varbinary. Then, use the HashBytes function in the insert statement to generate the hash for the password and store it in the column. Below is the salient example of storing a password in hash code with SHA2 512 algorithm and comparing the hash-coded password in a select statement.

首先,我们必须确保用于保存密码以存储哈希码的字段或列的数据类型为varbinary。 然后,在insert语句中使用HashBytes函数生成密码的哈希并将其存储在列中。 以下是使用SHA2 512算法在哈希代码中存储密码并在select语句中比较哈希编码的密码的显着示例。

(Example)

  1. CREATE TABLE [dbo].[Users_detail](
  2.     [ID] [int] NOT NULL,
  3.     [Name] [nvarchar](20) NOT NULL,
  4.     [Password] [varbinary](150) NOT NULL
  5. ) ON [PRIMARY]
  6. GO
  7. Insert into Users_detail values (1, 'sql_shack_User1', HASHBYTES('SHA2_512', 'Sanya'))
  8. Insert into Users_detail values (2, 'sql_shack_User2', HASHBYTES('SHA2_512', 'Luna'))
  9. GO
  10. Select * from Users_detail
  11. GO
  12.  
  13. CREATE or alter FUNCTION Password_Check
  14. (
  15. @v1 varchar(500)
  16. )
  17. RETURNS  varchar(7)
  18. AS
  19. BEGIN
  20. DECLARE @result varchar(7)
  21.  
  22.  
  23. Select
  24. @result  =( CASE when [Password] =  HASHBYTES('SHA2_512', @v1) THEN 'Valid'
  25. ELSE 'Invalid'
  26. END )
  27. from Users_detail
  28. where  [Password] =  HASHBYTES('SHA2_512', @v1)
  29.  
  30. RETURN @result
  31. END
  32. GO
  33. SELECT [dbo].[Password_Check] ('Sanya')

一些限制 (Some restrictions)

We should review some gray areas before uses it.

在使用之前,我们应该检查一些灰色区域。

First, we have to take care of the collation; if the collation is different, the output will be different.

首先,我们必须注意整理。 如果排序规则不同,则输出将不同。

Second, the column definition, if we used same data type but the length is different, then there should be the same result. To elaborate in depth, review below example of VARCHAR(50) and VARCHAR(100), we can test the output:

其次,列定义,如果我们使用相同的数据类型但长度不同,那么结果应该相同。 为了进一步详细说明,请查看下面的VARCHAR(50)和VARCHAR(100)示例,我们可以测试输出:


Both SELECT statements return the similar hashes value such as below:

两个SELECT语句都返回类似的哈希值,如下所示:

0x640AB2BAE07BEDC4C163F679A746F7AB7FB5D1FA

0x640AB2BAE07BEDC4C163F679A746F7AB7FB5D1FA

However, although, it has a similar data type, one should be aware that VARCHAR and NVARCHAR will not produce the same HashBytes value, even with the same string. To illustrate, review the following select statements.

但是,尽管它具有相似的数据类型,但应注意,即使使用相同的字符串,VARCHAR和NVARCHAR也不会产生相同的HashBytes值。 为了说明,请查看以下select语句。


It can be noted that collations are only reviewed when we compare values between two files. The reason why the n[var]char produces a different result because it’s two bytes per character whilst the[var]char is a single byte per character.

可以注意到,只有在我们比较两个文件之间的值时才检查归类。 nchar产生不同结果的原因是,每个字符两个字节,而char是每个字符单个字节。

For further understating, HashBytes, as the name implies, hashes a set of bytes, and so the two inputs return different results.

为了进一步简化,顾名思义,HashBytes散列了一组字节,因此两个输入返回不同的结果。

结论 (Conclusion)

After reviewing all the points elaborated above it can be said that, there should be another hash for the similar values by using different algorithms.

在回顾了上面阐述的所有要点之后,可以说,应该使用不同的算法为相似的值提供另一个哈希。

Furthermore, it can be easily seen if something has changed while comparing the same string to itself if hashes algorithm is different than it is the exception. Especially, I prefer to use it for password protection, so, one can hash a password in the database table and then have the user enter their own version, hash it, and compare the results. In this way, the system end (front end) never knows the value. Just make sure to double check you have used same the same algorithm

此外,如果哈希算法不同于它,则可以很容易地看出在将相同字符串与自身进行比较时是否有所更改。 特别是,我更喜欢将其用于密码保护,因此,可以在数据库表中对密码进行哈希处理,然后让用户输入自己的版本,对其进行哈希处理,然后比较结果。 这样,系统端(前端)永远不会知道该值。 只要确保仔细检查您是否使用了相同的算法

翻译自: https://www.sqlshack.com/the-hashbytes-function-in-t-sql/

hashbytes

文章知识点与官方知识档案匹配,可进一步学习相关知识
算法技能树首页概览61220 人正在系统学习中
普通分类: