This isn’t too hard to do, but there are a couple important things you need to get right.
Firstly, avoid creating large python lists inside of loops when you have large amounts of data to iterate over (like mesh faces). Python in itself is actually pretty fast, but the memory overhead in creating large dynamic lists in every loop iteration is what will cause most of your slowdown. Adding to existing lists is usually OK, but copying, combining, or removing from lists (not by index) can get heavy because it is actually iterating under the hood.
Instead, use an iterator from the API, like MItMeshPolygon
. The difference here is that the iterator doesn’t construct any new lists at all, but rather gives you a handle to each polygon in memory as you’re iterating through the loop, so the overhead is minimal. Or, even better, if you can get the lists you need from any of the get functions on MFnMesh
, they’re created in C++ and all you have to do is read their data in python. I’ve used this method to process uvs for meshes with 30k+ verts in under 1/8th a second with no issues. For your use case though, I think MItMeshPolygon
is going to be the easiest way to access the data you need. Particularly I’d look at the following methods:
MItMeshPolygon.index() # get the index of the current face
MItMeshPolygon.getVertices() # get the face-relative indices of the current face's verts
MItMeshPolygon.vertexIndex() # get the object-relative indices of a particular vert
MItMeshPolygon.getPoints() # get the position of all the verts
You can use MItMeshPolygon
like so…
# get dag path of object...
meshPolyIter = om.MItMeshPolygon(dagPath)
while not meshPolyIter.isDone():
# do work here...
meshPolyIter.next()
Documentation for MItMeshPolygon
is here
…
Ok.
Now that the API part of this is out of the way, let’s talk algorithms. I’m going to be using pseudocode from here on because frankly, it’s easier that way.
Essentially, a duplicate face is one that shared all its verts with at least one other face on the mesh (by your definition here). The most straightforward, naive way to find faces that match that criteria is by comparing every face to every other face:
for every face on mesh:
for every otherFace on mesh:
if face.vertIDs matches otherFace.vertIDs:
// we found a duplicate face, so track it
duplicates.add(otherFace.faceID)
Now, if you understand computational complexity, this method is O(n^2), which is a fancy way of saying that you have to iterate over the number of faces times the number of faces. This is going to be really slow for high poly meshes because the computation time of the algorithm increases exponentially as the number of polys grows.
So that means we’ve gotta get rid of that inner loop. Instead, let’s try storing each face as we iterate through them the first time. For every face, we can store it by picking an arbitrary axis and using the ID of it’s most positive vert along that axis. That way when we go to compare faces, we only have to compare with faces that we know share the same topmost vert as the current face. This works because we can assume that duplicate faces are overlapping exactly. Again, the actual axis you use is arbitrary, so I’m just gonna use the Y axis.
// keep in mind, this is pseudocode
facesDictionaryByID = dictionary()
for every face in mesh:
maxY = max(v.y for v in face.verts)
topmostVertID = vert where vert.y == maxY
if topmostVertID in facesDictionaryByID:
otherFaces = facesDictionaryByID[topmostVertID]
for otherFace in otherFaces:
if all face.vertPositions match otherFace.vertPositions:
// we found a duplicate
duplicates.add(face)
break (go back to outer loop to look for more dupes)
if no duplicate was found:
// add the current face to the list of faces with a matching topmost vert
facesDictionaryByID[topmostVertID].append(face)
else:
facesDictionaryByID[topmostVertID] = list(face)
Now we still have an inner loop, but it’s only iterating over faces which could be potential duplicates for the current one, which is going to be a relatively very small number. The important point is that now we aren’t checking against every other face, we’re only checking a few other faces, if any, per face.
EDIT: I also want to note that storing faces based on a particular vert index rather than position also has the benefit of allowing us to use an int instead of a float as a dictionary key and thus avoid false negatives due to floating-point error.
However, there’s a caveat. The speed it all runs depends on how you check whether the topmostVertID
value is in that dictionary.
If you use the python keyword in
or use dictionary.keys()
then you’re actually creating a new list of all the keys currently in the dictionary, and python will try to determine if it contains matching values by iterating over it. However, accessing the value via dictionary.has_key()
[the in
keyword] or dictionary.get()
doesn’t have to iterate over all its values since it uses a hash lookup (you don’t have to know what that means, just that it’s fast).
If you’re still following me when I talk about computational complexity, the dictionary.keys()
version is expensive because it causes an iteration over a list that gets bigger each time we go through the loop. This runs in O(n(n+1)/2), which is effectively not much better than O(n^2) - what we had with the naive solution. But since [using the in
keyword on dictionaries] dictionary.has_key()
and dictionary.get()
functions are considered to run in O(1) time (constant), then effectively, their usage reduces the algorithm to O(n) time, which is not too shabby. It means that we only need to go through the loop about once for every poly, instead of that number squared.
Hopefully you find all of this helpful and not confusing. This turned out to be a much longer post than I meant for it to and it got a bit computer-sciency.
EDIT: in
for dictionaries is actually O(1) in Python 2.7, so using it is encouraged. Also, dictionary.has_key()
still works in Python 2.7, but is deprecated and should in best practice be avoided as it was removed in later versions.