This blog is a bit different from my recent ones. For this one, I’m assuming the reader already knows a bit of C# and Entity Framework. I’m going to discuss some do’s and don’ts when communicating with a database. I will be using the source code from my first blog, which is on GitHub here. (Note: I updated the GitHub repository to contain this new code.)
// Never do this with large datasets! (think more than 10k items)
public async Task<List<Enrollment>> GetAllEnrollments()
{
    var result = await db.Enrollments.Include(x => x.Course)
        .AsNoTracking()
        .Include(x => x.Student)
        .OrderBy(x => x.EnrollmentID)
        .ToListAsync();
    return result;
}
In the above sample, I’ve added this function to the SchoolService.cs file, and all of the Enrollments are returned, along with the Course and Student that each Enrollment contains. Since the code does not modify the table, it uses the AsNoTracking extension method, which can lead to a performance boost. However, if you used the seed code to create the SQL database, then this will return 10k enrollments, and the query is inefficient. TIP: Never just return a full table unless you’re certain that there aren’t very many records in that table (the exact number is platform dependent, but I’d say test with 250, 500, and 1k Enrollments). Benchmark it with k6 or BenchmarkDotNet.
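If you want a starting point on the BenchmarkDotNet side, here’s a minimal sketch (the EnrollmentBenchmarks class, the connection string, and the constructor call are my assumptions – adjust them to match your project):

using System.Collections.Generic;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;

public class EnrollmentBenchmarks
{
    // Hypothetical connection string; point it at the seeded School database.
    private const string dbString = "Server=localhost;Database=School;Trusted_Connection=True;";
    private SchoolService service = null!;

    [GlobalSetup]
    public void Setup()
    {
        var options = new DbContextOptionsBuilder<SchoolContext>()
            .UseSqlServer(dbString)
            .Options;
        // Adjust the constructor call to match your SchoolService.
        service = new SchoolService(new SchoolContext(options));
    }

    [Benchmark]
    public Task<List<Enrollment>> GetAll() => service.GetAllEnrollments();
}

// Run from a Release build with: BenchmarkRunner.Run<EnrollmentBenchmarks>();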
Instead, use pagination:
public async Task<List<Enrollment>> GetEnrollments(int start, int count)
{
    var result = await db.Enrollments.Include(x => x.Course)
        .AsNoTracking()
        .Include(x => x.Student)
        .OrderBy(x => x.EnrollmentID)
        .Skip(start)
        .Take(count)
        .ToListAsync();
    return result;
}
Notice above, the query takes two parameters – a start position and a count (how many to return). This is better than just returning the entire table. However, it still has the problem that the caller can set count to a very high number. If this service feeds an API, I’d check that count doesn’t exceed some constant – maybe 100-1000 records max – and, if it does, return a Bad Request. Or, you could always check the count variable in this function and just return an empty list if it’s too high.
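Here’s a minimal sketch of that guard (MaxPageSize is a hypothetical constant; pick whatever cap your benchmarks support):

public async Task<List<Enrollment>> GetEnrollments(int start, int count)
{
    const int MaxPageSize = 500; // hypothetical cap; tune it with your benchmarks
    if (count > MaxPageSize)
    {
        // Or throw here and let the API layer translate it into a 400 Bad Request.
        return new List<Enrollment>();
    }
    var result = await db.Enrollments.Include(x => x.Course)
        .AsNoTracking()
        .Include(x => x.Student)
        .OrderBy(x => x.EnrollmentID)
        .Skip(start)
        .Take(count)
        .ToListAsync();
    return result;
}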
public async Task CallFuncOnEnrollment(Func<Enrollment, Task> func)
{
    foreach (var enrollment in db.Enrollments.AsNoTracking()
        .Include(x => x.Course)
        .Include(x => x.Student)
        .OrderBy(x => x.EnrollmentID))
    {
        await func(enrollment);
    }
}
This one calls an asynchronous function for each enrollment in the table; use it for processing larger lists. AsNoTracking reduces the footprint of the returned objects, and each enrollment becomes eligible for garbage collection once it goes out of scope, so the whole table never has to sit in memory at once.
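A quick usage sketch (the service variable is just a resolved SchoolService, and the body of the callback is a placeholder):

await service.CallFuncOnEnrollment(async enrollment =>
{
    // Stand-in for real per-row work (sending an email, writing to a queue, etc.).
    Console.WriteLine($"Processing enrollment {enrollment.EnrollmentID}");
    await Task.CompletedTask;
});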
What if you need to return a bunch of Enrollments? The naïve way is to do something like this:
public async Task<List<Enrollment>> GetEnrollments(List<long> enrollmentIds)
{
    var result = await db.Enrollments.AsNoTracking()
        .Include(x => x.Course)
        .Include(x => x.Student)
        .Where(x => enrollmentIds.Contains(x.EnrollmentID))
        .OrderBy(x => x.EnrollmentID)
        .ToListAsync();
    return result;
}
Beware of these types of queries. This “Contains” call translates to a SQL WHERE … IN (…) clause on the table and can be very inefficient. I generally limit the size of the enrollmentIds list to 1k or less (depending on table and CPU size).
If you must return a thousand or more records like in the previous example (and your database CPU isn’t being overrun), you can implement what I like to call the Split Task DB-Factory pattern:
First, we need to set up the Split extension method for the enrollment IDs.
public static class ListExtensions
{
    public static IEnumerable<List<T>> Split<T>(this List<T> source, int size = 500)
    {
        for (int i = 0; i < source.Count; i += size)
        {
            yield return source.GetRange(i, Math.Min(size, source.Count - i));
        }
    }
}
This simply takes a list of anything and returns buckets, each containing at most size items (the last bucket may be smaller). Size is pretty arbitrary depending on CPU and table size. I like to start with 500 records at a time and go from there (the goal being to have a result within 500ms).
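For example, splitting twelve IDs into buckets of five looks like this:

var ids = Enumerable.Range(1, 12).Select(i => (long)i).ToList();
foreach (var bucket in ids.Split(size: 5))
{
    Console.WriteLine(string.Join(", ", bucket));
}
// Prints:
// 1, 2, 3, 4, 5
// 6, 7, 8, 9, 10
// 11, 12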
Next, we need to create a database factory in the Program.cs file:
var dbOptions = new DbContextOptionsBuilder<SchoolContext>()
    .UseSqlServer(dbString)
    .Options;
builder.Services.AddSingleton(ctx => new Func<SchoolContext>(() => new SchoolContext(dbOptions)));
This registers a singleton Func<SchoolContext> that returns a new SchoolContext every time it’s invoked, so you can use it to create as many database connections as you see fit (more on this later).
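(Side note: if you’re on EF Core 5 or later, there’s a built-in alternative, AddDbContextFactory, which registers an IDbContextFactory<SchoolContext> for you. I’m sticking with the hand-rolled Func<SchoolContext> here to keep the rest of the post simple.)

// Built-in alternative to the hand-rolled factory above (EF Core 5+):
builder.Services.AddDbContextFactory<SchoolContext>(options =>
    options.UseSqlServer(dbString));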
Next, let’s add to the School Service. First, modify the constructor to take the function to create new database objects:
private readonly SchoolContext db;
private readonly Func<SchoolContext> dbFactory;

public SchoolService(SchoolContext db, Func<SchoolContext> dbFactory)
{
    this.db = db;
    this.dbFactory = dbFactory;
}
This is the constructor and fields in the SchoolService.cs file. It simply takes the database context and the database factory and assigns them to private fields.
Now, let’s implement the new function:
public async Task<List<Enrollment>> GetEnrollmentsSplit(List<long> enrollmentIds)
{
    var tasks = new List<Task>();
    var results = new List<Enrollment>();
    foreach (var enrollmentIdsSplit in enrollmentIds.Split())
    {
        tasks.Add(Task.Run(() =>
        {
            using (var dbc = dbFactory())
            {
                var batchResult = dbc.Enrollments.AsNoTracking()
                    .Include(x => x.Course)
                    .Include(x => x.Student)
                    .Where(x => enrollmentIdsSplit.Contains(x.EnrollmentID))
                    .OrderBy(x => x.EnrollmentID)
                    .ToList();
                lock (results)
                {
                    results.AddRange(batchResult);
                }
            }
        }));
    }
    await Task.WhenAll(tasks);
    return results;
}
There’s quite a bit here, and I’ll explain as best I can. First, I create a list of tasks. Next, I create the list of results. Then, I iterate through the enrollment IDs using the Split extension method we created earlier, which yields buckets of at most 500 IDs each. For each bucket, I add a task to the tasks list via Task.Run, and inside of that, I retrieve the matching enrollments on a fresh context. From there, I lock on the results list and add the retrieved enrollments. Then, I await all of the tasks generated previously. Finally, I return the results. (One caveat: the batches complete in whatever order they finish, so the combined list is no longer globally sorted by EnrollmentID; sort the results afterwards if the order matters.)
So what does this do? It simply splits the enrollment IDs into smaller buckets and runs the queries at the same time (which will be faster if the CPU(s) can handle it). So, we’re using more than one database connection, which speeds things up at the cost of more work for the database server. That’s pretty cool! (BTW – I use this technique with AutoMapper all of the time.) NOTE: you should benchmark this for your particular application; the queries must be slow enough to make up for the overhead of the extra connections and thread context switches.
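Calling it looks the same as the naïve version; here’s a quick sketch (the ID range is arbitrary, and schoolService is just a resolved SchoolService):

var ids = Enumerable.Range(1, 2000).Select(i => (long)i).ToList();
var enrollments = await schoolService.GetEnrollmentsSplit(ids);
// With the default bucket size of 500, that's four queries running in parallel,
// each on its own SchoolContext (and its own connection).
Console.WriteLine($"Fetched {enrollments.Count} enrollments");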
Thanks for coming by and I hope you learned something!